This workflow describes how to prepare content so that it is ready to be preserved at the next stage. If any terms on this page are unfamiliar, please refer to the glossary, which covers the most common terms used in digital preservation.
Step 2.1 Understand what you have
This is an essential step
- Use software, such as DROID, to identify what you have and create a list of the content. The list should include details such as file names, file paths, sizes, file formats and last modified dates.
- Identifying the file formats accurately is particularly important.
- Save the list in an open format (e.g. CSV or XML) and store it in the ‘metadata’ folder you created in step 1.2.
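If none of the tools below is available, a basic list can be sketched with a short script. The example below is a minimal sketch using only the Python standard library; the function names and column names are hypothetical and chosen for illustration. It records file names, paths, sizes and last modified dates, but it does not identify file formats, so it is not a substitute for DROID, Fido or Siegfried.

```python
import csv
import os
from datetime import datetime, timezone


def build_inventory(content_dir):
    """Walk content_dir and return one row (a dict) per file found."""
    rows = []
    for folder, _subfolders, filenames in os.walk(content_dir):
        for name in sorted(filenames):
            path = os.path.join(folder, name)
            info = os.stat(path)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                "name": name,
                # Path relative to the content folder, so the list stays
                # valid if the whole folder is moved.
                "path": os.path.relpath(path, content_dir),
                "size_bytes": info.st_size,
                "last_modified_utc": modified.isoformat(timespec="seconds"),
            })
    return rows


def write_inventory(rows, csv_path):
    """Save the rows as a CSV file (an open format)."""
    fields = ["name", "path", "size_bytes", "last_modified_utc"]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_inventory(build_inventory("content"), "metadata/inventory.csv")` would write the list into a ‘metadata’ folder, assuming folders with those (hypothetical) names already exist.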
Software
- DROID (identifies file format and other information).
- Fido (identifies file format only).
- Siegfried (file format identification tool).
- MediaInfo (useful for identifying audiovisual files).
- Karen’s Directory Printer (useful for creating lists of files but does not identify file formats with the same degree of certainty as DROID or Fido).
Further guidance
- DROID: User Guide
- DROID Video Demo
- University of Hull Idiot Guide No. 5: Droid
- Fido for format identification, and why it matters (Open Preservation Foundation Webinar)
- Bodleian Libraries: Introduction to Digital Preservation: Identification
Step 2.2 Validate content
- Validation software checks whether the content conforms to its file format specification. In some cases it can also fix issues.
- Validation is not always seen as an essential step, but it can help flag issues. For example, content that does not conform to its specification may be more difficult to read or manage in the future.
- It can also be useful for checking the quality of digitised content.
Validation software
- JHOVE (validates certain file formats and also carries out identification).
- Jpylyzer (validates JP2 images).
- veraPDF (validates PDF/A).
- MediaConch (validates audiovisual files).
Further guidance
- Bodleian Libraries: Introduction to Digital Preservation: Validation (includes good links to various open source tools)
- JHOVE Documentation
Step 2.3 Analyse and investigate
- You may wish to analyse the metadata you captured during steps 2.1-2.2 and flag any issues for investigation.
- This includes looking out for corrupt files, compressed files, encrypted files and password-protected files. You will probably need to go back to the depositor to resolve these.
- The analysis can also flag unidentified formats, which may require further research.
- Some archives also convert file formats to a preferred file format for preservation (see step 3.5).
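A first pass over the list can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for Freud: it assumes a CSV list with hypothetical columns named `path`, `size_bytes` and `format`, which you would need to adapt to the column names in your own export. It flags empty files, rows with no identified format, and files whose extension suggests a compressed or container file. It cannot detect corruption, encryption or password protection.

```python
import csv
import os

# Extensions treated as compressed or container files (an illustrative list).
COMPRESSED_EXTENSIONS = {".zip", ".7z", ".gz", ".tar", ".rar"}


def flag_issues(row):
    """Return a list of reasons why this row may need a closer look."""
    issues = []
    if int(row["size_bytes"]) == 0:
        issues.append("empty file")
    # A blank or missing format value is treated as "unidentified".
    if not (row.get("format") or "").strip():
        issues.append("unidentified format")
    extension = os.path.splitext(row["path"])[1].lower()
    if extension in COMPRESSED_EXTENSIONS:
        issues.append("compressed or container file")
    return issues


def review_list(csv_path):
    """Read the list and return (path, issues) pairs for flagged rows only."""
    flagged = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            issues = flag_issues(row)
            if issues:
                flagged.append((row["path"], issues))
    return flagged
```

The flagged rows are only a starting point for investigation; each one still needs a person to decide what to do.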
Software
- Freud (used by The National Archives to analyse a DROID export and pick up common issues to mark for investigation).
- HxD Hex Editor (displays the bytes of a file and helps with file format research).
Further guidance
- How to research and develop signatures for file format identification (The National Archives).
- My first file format signature (University of York blog).
Step 2.4 Describe
This is an essential step
- As a minimum, create a high-level description of the content.
- You may decide to do more detailed cataloguing in accordance with your organisation’s cataloguing standards (either now or at a later date).
- You can add the descriptions to the list you created in step 2.1 or create them in a separate CSV or XML file.
- If you use a collection management system, you may wish to record the descriptions there (e.g. the accession record or catalogue).
Software
- Quick View Plus (allows you to view over 300 file formats; $99 per year)
- VLC (for playing audio and video files)
Further guidance
- Levels of Born-Digital Access (pages 10-13 cover the topic of description)
- Paradigm – Arranging and cataloguing digital and hybrid archives
- Digital Cataloguing Practices at The National Archives (2017)
- Quick View Plus Product Fact Sheet and Supported File Format List
Step 2.5 Appraise
- You may have already carried out appraisal at step 1.2. At this stage, you may wish to carry out further appraisal.
- As a minimum, you could consider identifying and removing duplicates by comparing the checksums of the content. There is software that can help you do this (see below).
- However, you may decide to keep duplicates if they have useful contextual information (e.g. file name).
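Comparing checksums can be sketched as follows. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and SHA-256 is used here as an assumption (any checksum algorithm your organisation already uses would work in the same way). Files with identical content produce the same checksum, whatever their names or locations.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(content_dir):
    """Group files by checksum and return only the groups with more than one file."""
    by_checksum = defaultdict(list)
    for folder, _subfolders, filenames in os.walk(content_dir):
        for name in filenames:
            path = os.path.join(folder, name)
            by_checksum[sha256_of(path)].append(path)
    return {
        checksum: sorted(paths)
        for checksum, paths in by_checksum.items()
        if len(paths) > 1
    }
```

The sketch only reports duplicates and deletes nothing, so you can review each group and keep any copies whose file name or location carries useful context.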
De-duplication software
- CSV Validator and deduplication schema (can be used for de-duplication)
- TreeSize Free
- ePADD (supports the appraisal of email archives as well as processing, preservation, and discovery)
Further guidance
- Seeing double (Blog by Rachel MacGregor on deduplication using the CSV Validator)
- Paradigm Project – Appraisal and Disposal
- DPC Handbook: Acquisition and Appraisal
- Susanne Belovari (2017) Expedited digital appraisal for regular archivists: an MPLP-type approach, Journal of Archival Organization, 14:1-2, 55-77
- Victoria Sloyan (2016) Born-digital archives at the Wellcome Library: appraisal and sensitivity review of two hard drives, Archives and Records, 37:1, 20-36
Step 2.6 Apply access restrictions
This is an essential step
- Some of the content may contain personal, sensitive or confidential information.
- If the content is subject to the Freedom of Information Act, you will need to use the Act’s exemptions to inform any restrictions.
- The depositor should help you identify such information during transfer at step 1.2. Cataloguing at step 2.4 can also help with this.
- There is software that can help you identify personal information. Some of it is commercial and expensive, but a list of free software can be found below.
- Record any access restrictions or risks (e.g. in the list you created in step 2.1 and/or in any collection management system).
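To give a rough idea of what such software does, the sketch below searches plain-text files for email addresses. It is a deliberately simple illustration and not how Bulk Extractor or ePADD work: the function names and the regular expression are assumptions made for this example, it only reads files that can be decoded as text, and it will miss most kinds of personal information. A result with no matches does not mean a file is safe to release.

```python
import re

# A simple pattern for email addresses; it is illustrative, not exhaustive.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_email_addresses(text):
    """Return the unique email addresses found in a piece of text, sorted."""
    return sorted(set(EMAIL_PATTERN.findall(text)))


def scan_text_file(path):
    """Scan one plain-text file, skipping any bytes that cannot be decoded."""
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return find_email_addresses(handle.read())
```

Any matches should be treated as prompts for manual review, and the resulting restrictions recorded alongside the rest of your metadata.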
Software
- Bulk Extractor
- BitCurator (digital forensics tools for digital preservation including Bulk Extractor)
- ePADD (can help identify sensitive information in email archives)
Further guidance
- Victoria Sloyan (2016) Born-digital archives at the Wellcome Library: appraisal and sensitivity review of two hard drives, Archives and Records, 37:1, 20-36
- BitCurator: Using Bulk Extractor to Locate Potentially Sensitive Information (video)
For the next stage of the digital preservation workflow, head over to the Preserve page.