The South West Heritage Trust used the opportunities of New Burdens funding and a move to an independent IT network to work with Arkivum and Metadatis in developing a digital preservation solution.
Introduction
The South West Heritage Trust is an independent charity and limited company. The Trust operates with a board of trustees and is founded on long-term legal agreements with Somerset and Devon County Councils, its principal funders.
We hold fairly standard types of born-digital records for county record offices, including born-digital minutes of organisations; indexes and databases made for family and local history research; photographs; sound recordings; a limited quantity of film; and published records. These come in a great variety of physical and file formats, including DAT tapes, 5¼” floppy disks and other obsolete media. We are aware that there are some types of records which we will have to deal with in the future which have particularly strong requirements in terms of integrity, reliability and authenticity, such as magistrates’ court records, adoption records and electoral registers.
Both Somerset and Devon started receiving digital records in the early 1990s, but it was not until 2013 that we started to deal with them in a systematic manner. In 2015 we implemented a specific accessions process for digital records, using three pieces of free, open-source software. This was extremely time-consuming and cumbersome, but we persisted until we had the opportunity to do something better. The feedback from our Accreditation in early 2017 acknowledged our work to date and encouraged planned developments to our digital preservation provision.
The opportunity came in 2017, when two major events took place. The first was the Trust’s move to an independent IT network and appointment of a dedicated IT Manager. This gave us increased flexibility to develop our own direction with IT systems, including the decision to move from our CALM online catalogue to a new and innovative online catalogue program called Epexio, developed by Metadatis. The second event was receiving New Burdens funding, which we decided to use solely for digital preservation. We invested this in the implementation of a digital preservation system and in staff time over 12 months to test and implement the system and to start ingesting our huge digital backlog.
Description of your digital preservation architecture
Since September 2018 our main tools have been:
- Archivematica (hosted and supported by Arkivum in their Perpetua model)
- Describe (our back-end archives catalogue)
- Epexio (our online catalogue)
We use Describe to accession and catalogue new material, in just the same way as for analogue records. Currently, we then use Exactly and NextCloud to create a Submission Information Package (or SIP). Exactly creates a ‘bag’ for the data so it can all be transferred and moved around in one chunk, without any data going missing (it is similar to BagIt). This ‘bag’ of data is then uploaded to Archivematica using an upload tool called NextCloud, which is similar to DropBox. Archivematica ingests the SIP and performs the main preservation actions: creating and verifying checksums, assigning UUIDs, virus-scanning, creating a copy of the record in the format which is best for long-term storage, and creating a user copy in an easily-readable format. At the end of the Archivematica process, we have an Archival Information Package (AIP) for long-term storage and a Dissemination Information Package (DIP), containing the user copies, ready for transfer back to the archive catalogue. Lastly, we use Describe again, to link the DIP to the catalogue entry; at this stage, we establish controls over who can see the DIP and, if necessary, edit the presentation and viewer experience. Digital objects can be viewed in our searchrooms and we have the functionality to make them available through our online catalogues, although this has yet to be implemented. The system is evolving rapidly and we expect to upgrade to a more streamlined version in the near future.
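The idea behind the ‘bag’ and the checksum steps described above can be illustrated with a short sketch. This is not how Exactly or Archivematica are implemented; it is a minimal illustration, assuming SHA-256 checksums and using hypothetical function names (`build_manifest`, `verify_manifest`), of how recording a checksum for every file before transfer allows missing or altered files to be detected afterwards.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its checksum (hypothetical helper)."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose content has changed.

    Files added after the manifest was built are not reported here.
    """
    problems = []
    for rel, expected in manifest.items():
        target = data_dir / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {rel}")
    return problems
```

In use, the manifest would be built before upload and checked again once the files arrive; an empty list from `verify_manifest` indicates that every listed file is present and unchanged.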
Our data storage is as follows: the AIPs are stored by Arkivum on two separate UK-based servers, and the data are also replicated on to magnetic tape. One copy of each DIP is stored on our own storage, another on Metadatis’ servers, for use with the archive catalogues.
We have two instances of the system, one for Somerset and one for Devon, with separate storage for each. This is because we need to keep the data for each county separate, both for intellectual and legal reasons.
Rationale for a predominantly open-source approach
In 2016 we started investigating the systems which were available, and came to three conclusions.
Firstly, we decided that we would prefer a system based on open-source software rather than something proprietary, so that it could be more adaptable and not tied to one provider. Secondly, we realised that we did not have the knowledge and capacity to build and customise it ourselves, and so we would need to purchase a managed system. Lastly, we wanted a system which had a clear exit strategy, as our data will outlast all of our lifetimes and companies will come and go. We also ideally wanted something that would link to our archive catalogue (which was CALM at that point) so that we did not have to maintain both that and a separate catalogue in the digital preservation system. Archivematica seemed the best fit for our needs, but it lacked a solution that linked to CALM so we were somewhat in a quandary.
However, following an open tendering process it became evident that a blended approach was possible which would enable us to use Archivematica as the underlying digital preservation solution, linked to a new catalogue system called Describe, provided by Metadatis. We were able to work with our chosen suppliers Arkivum and Metadatis to develop a solution that met our requirements.
Pressure points when designing or implementing the architecture
Being the first to link Archivematica with Describe and Epexio, and having digital objects going to different servers, meant that the links between the systems had to be built from scratch, so the development and testing phases were very important. Now that they are in place we have a streamlined, integrated system which is very easy to use. We can provide easy access to digital records in the searchroom, and can instantly change the access to make a record available online (or to remove it from public access if we need to). The way in which Epexio displays some digital objects has needed adaptation in some cases (for example, a film viewer had to be written), but this has all been achievable.
The variety of file formats that need preserving has been a challenge and a lot of work has been necessary to customise Archivematica to suit our needs. However, we now have what we aspired to in terms of an open-source but managed system, which enables us to meet the necessary standards for data storage, and with a clear exit strategy.
Our main pressure point is the on-going capacity needed to ingest our backlog. We had initial funding for a twelve-month project, and used that time to survey all our digital records, procure the system, develop and test it, and get it live. In Somerset we were able to successfully bid for a twelve-month “Museum Futures” trainee post from the British Museum, specifically focussing on digital records relevant to the Somerset Museums Service. The trainee was able to ingest over 1 TB of Somerset data during his time with us, which has been a great benefit. However, we still face challenges with the Devon data and remaining Somerset material and will be seeking further opportunities to develop capacity to meet this need.
Manual processes you would like to automate but haven’t been able to yet
We still manually approve each ingest, after checking whether or not it has worked correctly, as unfortunately the error reporting is currently problematic, especially in terms of flagging up where files have not been properly normalised for preservation or access. Eventually, we would like to allow the system to work automatically, but first we need to be confident that errors can be reported and rectified.
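The kind of check that would need to be reliable before approval could be automated can be sketched as follows. This is purely illustrative and does not use any Archivematica interface: the `FileResult` record and both functions are hypothetical, and the sketch assumes that every original file is expected to have both a preservation copy and an access copy, which may not hold for formats that are deliberately kept as they are.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileResult:
    """Hypothetical per-file outcome of an ingest."""
    original: str
    preservation_copy: Optional[str]  # None if normalisation for preservation failed
    access_copy: Optional[str]        # None if normalisation for access failed


def normalisation_gaps(results: list[FileResult]) -> dict[str, list[str]]:
    """Return, for each original file, which expected copies are missing."""
    gaps: dict[str, list[str]] = {}
    for result in results:
        missing = []
        if not result.preservation_copy:
            missing.append("preservation")
        if not result.access_copy:
            missing.append("access")
        if missing:
            gaps[result.original] = missing
    return gaps


def can_auto_approve(results: list[FileResult]) -> bool:
    """Approve automatically only when no file has a normalisation gap."""
    return not normalisation_gaps(results)
```

An ingest with any gap would be held for a member of staff to review, which mirrors the manual check carried out at present.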
Contact the archive
Somerset Heritage Centre (South West Heritage Trust): somersetarchives@swheritage.org.uk