Main section

The sections below explore this theme in more detail, including recommendations and their potential for implementation, together with the related case studies.


About this theme


Content in cultural collections is engaging and insightful; however, much of it remains undiscovered. The true value of the UK’s collections is locked away and audiences are poorly served. Even where collections have been digitised, there is often no easy way to search across them (by subject, person or place) to support personal interest, academic study, or commercial and scholarly research.

Mapping digitised cultural collections in the UK

Complementary to the digitisation taskforce survey, DCMS commissioned a feasibility study to look at a recommended framework for mapping and connecting digitised cultural collections within England. The aim was to better understand the application of emerging technologies in making collections searchable across institutions. This study was undertaken by The Collections Trust, a member of the Taskforce, in collaboration with Knowledge Integration. 

The study successfully built and demonstrated a test aggregator that could ingest data from a range of institutions using various technical means, and without the institutions having to format their contributions to any kind of set template. The potential and pitfalls of agreed emerging technologies were demonstrated, as various generic AI services were applied to the test data with mixed success. 

This, however, reflected the need to train such AI services with lots of data relevant to cultural heritage collections, rather than any inherent shortcomings of the technologies themselves. The proposed framework is widely applicable and does not take the ‘one size fits all’ approach of previous aggregators. In particular, an institution could initially be represented with just collection-level records that would still be useful, adding item-level records and digitised assets as and when they were ready, in whatever form and through whatever means they were able to provide them. The report can be found here.

Recommendations

Recommendation 3

Nationally-funded programmes focused on interoperability and data-sharing in the cultural and heritage sector, such as AHRC’s Towards a National Collection (TaNC), should take account of this report and its recommendations. 

The Network will: 

  • Continue to work with UK Research and Innovation (UKRI), the Arts and Humanities Research Council (AHRC) and others, sharing the work of the Taskforce and inviting collaboration and support.

Recommendation 4

There should be increased investment, publicity and support for sector-approved content aggregators.

The Network will: 

  • Continue to support national and sector specific content aggregators that act as major gateways to digitised collections, such as Discovery, Archives Hub, Art UK and the Global Biodiversity Information Facility, promoting them to their communities. 

British Library


Case study: 3D Digitisation of the British Library’s historic globe collection      

In 2019 the British Library embarked upon an ambitious project to unlock one of its most beautiful yet technically challenging collections.  

Historic globes form a small but unique subset of the UK’s national collection of 4 million maps. Ranging in date from 1600 to 1900 and in size from an inch to a metre in diameter, these terrestrial and celestial spheres provide crucial insights into the history of science and society, and in particular the development of the British mapmaking industry. However, because of their format, age, precision craftsmanship and materials, these objects pose particular challenges for handling and interpretation.  

Globes 3D brings these objects to a wider online audience whilst preserving them for future generations. Utilising the British Library’s in-house multi-camera 3D imaging system, the complete surface areas of 30 globes were photographed using focus stacking.  

Working alongside the digitisation company Cyreal, Library technicians were then able to construct 3D digital models showing historical geospatial data to sub-millimetre accuracy using photogrammetry.   

On the left, a computer screen displays images of a 3D globe map. On the right, a man is using scanning equipment to create more images of the globe.

Historic globe 3D photography (credit: Tony Grant, British Library)

These 3D digital models of the globes are now available on the Library’s main website (https://www.bl.uk/collection-guides/globes) via Sketchfab, where anyone online can access and interact with them for the first time. A variety of previously illegible surface features can now be read, some with the assistance of multi-spectral imaging.

Alongside the obvious benefit of preserving the precious original items, the project has generated high-quality interactive content at no cost to the public purse, enabling the Library to fulfil its role of facilitating research and enjoyment of the nation’s cartographic heritage.

This digital innovation will generate further benefits to the Library in terms of licensing agreements, and for use in displays, exhibitions, loans and partnerships, cultural activities, and in leading learning workshops and education programmes exploring the history of maps, science and society. 

Yet digitisation on its own is not enough. Digitised collections need to take account of both human and machine access, ideally following the FAIR data principles to be findable, accessible, interoperable and reusable. In addition, user research, communications, and access and inclusion strategies all need to be deployed in order to achieve the desired impact.

In some instances, the aim may be to provide a new channel for an existing audience. The Natural History Museum’s Digital Collections Programme, for example, aims to support the global audience of scientific researchers. However, the aim is not to replicate existing access online, but to increase access by removing physical and cost barriers, and to transform what research is possible and how, for example by enabling big-data modelling approaches in addition to physical visits.

The Natural History Museum


Case Study: The Natural History Museum Data Portal

“It has been over five years since we launched the Data Portal back in 2015 and in that time we have transformed access to the collections, creating an audience that is more than ten times greater than the number of scientists able to visit our physical collections.”

Vince Smith, Head of Biodiversity Informatics.

Since 2015, the Natural History Museum, London has made its research and collections data available through its Data Portal – this includes specimens digitised through the Museum’s Digital Collections Programme, which in many cases are now available on the portal within a day or two of imaging.

As of June 2020, more than 27 billion records had been downloaded from the portal and aggregators in over 400,000 download events since 2015. The portal contains over 4.8 million records from the specimen collection and over 6 million further records from other research datasets, including 3D scans, images, video and audio recordings as well as other structured data in tables. More than 1,000 scientific publications have cited data from the Data Portal, either directly or through aggregators such as the Global Biodiversity Information Facility (GBIF), showing the power of joining up collections data and observations of nature globally; many more citations are currently impossible to track.

Planned portal functionality will increase data linkage, making it easier to discover data associated with the Museum’s collections. For example, integration with the ORCID system for researcher identification, and increased use of links from specimens to genetic sequence data and 3D resources, will enable users to find related datasets associated with the specimens they are viewing.
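The aggregation route described here can be seen in miniature in GBIF’s public occurrence-search API, which exposes specimen records harvested from the NHM Data Portal and many other sources. The sketch below simply composes a query URL against that endpoint without sending it; the taxon name and the ‘NHMUK’ institution code are illustrative assumptions, not figures from this report.

```python
from urllib.parse import urlencode

# GBIF's public occurrence-search endpoint; queries are ordinary
# HTTP GETs with URL parameters, so any client can use them.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"

def build_occurrence_query(scientific_name, institution_code=None, limit=20):
    """Return a GBIF occurrence-search URL for a given taxon name.

    The institution code filter narrows results to one contributing
    collection (e.g. "NHMUK" is commonly used for the Natural History
    Museum, London) -- an assumption here, not data from the report.
    """
    params = {"scientificName": scientific_name, "limit": limit}
    if institution_code:
        params["institutionCode"] = institution_code
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"

# Compose (but do not send) a query for lion specimens held at the NHM:
url = build_occurrence_query("Panthera leo", institution_code="NHMUK")
```

Because the interface is just a URL scheme over JSON, the same record can be reached by a human in a browser or by a big-data pipeline, which is the point the case study makes about machine access.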

There are many aspects to successful interoperability, and these have implications for the kinds of skills and resources needed to implement them. These include adoption of common schemas (mostly applicable to metadata); common open Application Programming Interfaces (APIs), for example, the International Image Interoperability Framework (IIIF); common vocabularies (to categorise content consistently); and adoption of a legal framework that makes this possible e.g. non-restrictive licensing or terms.
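Of the interoperability ingredients listed above, the IIIF Image API is the most concrete: it fixes a URI path pattern, {identifier}/{region}/{size}/{rotation}/{quality}.{format}, that any compliant image server understands. A minimal sketch, assuming a hypothetical service at images.example.org (the base URL and identifier are placeholders, not a real collection):

```python
def iiif_image_url(base, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Compose a IIIF Image API request URI.

    The Image API defines the fixed path pattern
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format},
    so a client can request crops, scaled versions or rotations of an
    image without any server-specific knowledge.
    """
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Request a half-scale JPEG of the whole image from a hypothetical server:
url = iiif_image_url("https://images.example.org/iiif", "ms-canonici-001",
                     size="pct:50")
```

This shared URL grammar is what lets one viewer, annotation tool or research pipeline work against images held by many different institutions.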

The Bodleian, Oxford University


Case Study: Presenting digitised manuscripts from German-speaking lands

The International Image Interoperability Framework (IIIF) is a global partnership of libraries, archives, museums, and cultural heritage organizations that seeks to promote open standards for access to digital media. IIIF publishes technical specifications that define how institutions can make their digital image collections interoperable and reusable (https://iiif.io). The inclusion of audio and video content in the upcoming version will make IIIF the pre-eminent specification for sharing cultural heritage media. At the Bodleian Libraries in Oxford, IIIF services are a central part of the strategy to increase access to the collections and promote greater discovery and engagement, by making digitized content accessible to users as data that can be re-mixed, re-used, and integrated into their own research.

For example, from 2019 to 2021, the Bodleian and the Herzog August Bibliothek (HAB) in Wolfenbüttel, Germany collaborated to digitize over six hundred manuscripts from their collections. The project sought to virtually ‘re-unite’ items from the collections of former religious houses (convents and monasteries) of central Germany whose medieval libraries had been dispersed through war and dissolution. The resulting project website uses IIIF to present the digitized manuscripts: each institution makes its manuscript data available over the IIIF APIs, and the items are presented to the user as a single collection that can be searched and browsed within one interface.
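At the data level, making manuscript data available over the IIIF APIs means each institution publishes a JSON manifest per object that any IIIF viewer can consume. The hand-written IIIF Presentation 3.0 manifest below is illustrative only (placeholder ids, not records from the Bodleian/HAB project), and shows how a client might pull the painted image URLs out of one:

```python
import json

# A minimal, hand-written IIIF Presentation 3.0 manifest: one Canvas
# carrying one painting Annotation whose body is the page image.
manifest_json = json.dumps({
    "@context": "http://iiif.io/api/presentation/3/context.json",
    "id": "https://example.org/iiif/ms-1/manifest",
    "type": "Manifest",
    "label": {"en": ["Prayer book (illustrative placeholder)"]},
    "items": [{
        "id": "https://example.org/iiif/ms-1/canvas/1",
        "type": "Canvas",
        "items": [{
            "type": "AnnotationPage",
            "items": [{
                "type": "Annotation",
                "motivation": "painting",
                "body": {"id": "https://example.org/images/ms-1/f1r.jpg",
                         "type": "Image"},
                "target": "https://example.org/iiif/ms-1/canvas/1",
            }],
        }],
    }],
})

def painted_images(manifest):
    """Yield the image URLs painted onto each canvas of a v3 manifest."""
    for canvas in manifest.get("items", []):
        for page in canvas.get("items", []):
            for anno in page.get("items", []):
                if anno.get("motivation") == "painting":
                    yield anno["body"]["id"]

urls = list(painted_images(json.loads(manifest_json)))
```

Because every participating institution publishes the same structure, a single project website can aggregate manuscripts from Oxford, Hildesheim and Wolfenbüttel without hosting any of the images itself.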

Search results from the project website, showing images from six illustrated manuscripts.

Search results from the project website, showing six manuscripts once held in the library of the Cistercian Nunnery in Medingen. Three are now at the Bodleian Libraries in Oxford, two at the Dombibliothek in Hildesheim, and one in the Herzog August Bibliothek in Wolfenbüttel.

This approach has allowed the institutions to build a thematically unified international collaborative project, while simultaneously ensuring that the digital images and metadata are maintained in well-supported digital collections infrastructure for long-term curation and preservation. Adopting this approach removes the need for large numbers of images to be sent to a centralized, project-specific system for presentation, for lengthy website set-up, and for even lengthier and more complex maintenance of project-specific digital resources to ensure long-term sustainability. Even though the German Manuscripts digitization project has ended, the digitized materials, and the cataloguing and description effort, will live on in well-supported institutional digital collections systems.

The use of IIIF has allowed collaborating institutions to reap the benefits of a shared platform for presenting engaging, thematic content for the duration of the project, while also providing for the long-term sustainability of the outputs of the projects and their integration into the larger collections of the individual institutions. 

 

For both object data and metadata, new technologies to automatically create or enhance data, such as Optical Character Recognition (OCR), handwriting recognition, Artificial Intelligence (AI) and machine learning, could transform the speed and cost at which data can be extracted, enhanced and discovered, for example by creating natural language descriptions of objects. At present, however, these technologies are often designed to work best with more standardised material than is typical in collections, so further research and development is required. One can imagine a time when a user might interact with a collection of specimens, photos, sound and archival documents through a single search. The Taskforce welcomes the programme of research projects Towards a National Collection,1 and the work of the UK Research and Innovation (UKRI) Infrastructure Roadmap.

This Roadmap, Opportunities to grow our capabilities, was published in November 2019, having been requested by the government as a strategic guide to inform investment decisions for the next generation of infrastructure. The theme ‘maintaining and preserving cultural heritage’ has a sub-theme on ‘digitisation and interoperability’ that is highly relevant to this report. Indicative actions include: 

  • National aggregator for existing collections 
  • Phased cataloguing, digitisation and connecting of national collections 
  • Creation of a digital archive for the second half of the twentieth century 
  • Understanding the heritage of empire and colonialism 
  • Video and sound search facility 

Moreover, while correctly calling for better storage, conservation and access facilities for physical collections, the roadmap also notes that ‘phased creation of a distributed network of advanced physical and digital facilities would transform access to and search and discovery capability for national collections.’

One key route to access and interoperability already exists: aggregators. Portals such as The National Archives’ Discovery, Jisc’s Archives Hub and Art UK are online sites where different collections can be ‘discovered’ in one place, enabling researchers to find material across organisations, and smaller institutions to benefit from the scale, resources and established audiences of larger platforms. Aggregators also exist for more specialised access, such as the Global Biodiversity Information Facility, which combines natural science collections data with biodiversity observation data. The Taskforce concluded that, rather than one huge system, transformation for the collections sector will come from linking data. Any approach needs to be scalable so that small institutions are not left behind. Search engine indexing and optimisation are also important to collections discovery.

The Taskforce also noted the conclusions of the report ‘Mapping Digitised Collections in England’. This report described a feasibility study “to develop and evaluate a practical framework for collecting relevant data in order to map cultural collections and consider what functionalities a tool based on this framework might possess”. The work, led by the Collections Trust, concluded that:

  • “The test data demonstrated that, with a suitable user interface, the proposed framework would allow end users a single point of access to data at multiple levels from a wide range of institutions.  
  • The test demonstrated that the prototype could ingest data from a range of institutions using various technical means, and without the institutions having to format their contributions to any kind of set template.  
  • Although museums, libraries and archives follow different cataloguing standards (often even within the same institution), the flexible nature of the prototype means there is no technical reason why data from all three could not be harvested by an aggregator built using the same architecture, allowing searches to be made across the different collection types.”5

Licensing, Intellectual Property, copyright and information management (including GDPR) were all issues mentioned by many survey respondents that again impact publication of and access to digital collections. There is often a key tension for those holding collections between their duty or mission to make collections available and the need to raise funding by, for example, charging for image rights. Different collection types raise very different issues: copyright is less relevant for natural specimens, but there may be sensitivities about geographic data, for example in disclosing the location of endangered species. Use of Creative Commons machine-readable licensing can support interoperability, but may not be suitable, or well understood, across the sector. There can also be significant issues of perception around the balance of risks between extending the reach of collections data and loss of curatorial control and informed interpretation. This was an area of work taken forward elsewhere in the Culture is Digital project – for example, The Space produced a Digital Rights Toolkit6. However, the Taskforce is aware of the need for any work on digitisation to factor in these challenges.
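In practice, the ‘machine-readable’ aspect of Creative Commons licensing usually amounts to publishing the licence URL alongside the object metadata in a structured form such as schema.org JSON-LD, so that aggregators and search engines can filter by reuse rights automatically. A minimal sketch (the object described is invented, not an item from this report):

```python
import json

# An illustrative schema.org JSON-LD record for a digitised image,
# carrying its Creative Commons licence as a machine-readable URL.
record = {
    "@context": "https://schema.org",
    "@type": "ImageObject",
    "name": "Digitised oil painting (illustrative example)",
    "contentUrl": "https://example.org/images/painting-42.jpg",
    # The licence is just a URL: clients compare it against known
    # CC licence URIs to decide what reuse is permitted.
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creditText": "Example Collection",
}
jsonld = json.dumps(record, indent=2)
```

Embedding a block like this in an object page is what allows, say, an image search engine to surface only openly licensed collection items, without any human reading the rights statement.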

Art UK


Case Study: Art UK

At least 80% of the UK’s national art collection is not on view and the majority has not been photographed. Art UK, the art education charity and online home for every public art collection in the UK, has undertaken two major UK-wide digitisation programmes. First, it digitised all oil paintings in public ownership and then digitised sculptures, both within collections and outdoors.

In both programmes a network of researchers (‘coordinators’) worked alongside freelance art photographers – on a county-by-county basis for paintings and across 25 regions for sculpture. Key tombstone data was imported from collections on spreadsheets, most of which needed processing by coordinators and Art UK’s editorial team. For smaller collections, coordinators often entered data themselves. A style guide ensured consistency and artists were ‘linked’ to ensure unique artist records.

Photographs, typically of 20MB or more, were taken and then imported and linked to data records. For the oil paintings, some 90% of the 212,000 objects were photographed by Art UK, with the remainder provided by collections. For this project, photographers visited 3,200 UK venues, of which 46% had ten or fewer paintings. For the indoor sculpture digitisation project, data records were imported for every object and 36% were photographed by professional photographers, most from multiple angles. Outdoors, all public sculptures were photographed by volunteer photographers trained by Art UK, with over 500 volunteers involved.

Image reproduction agreements now cover the 3,400 venues on Art UK. Venues include museums, universities, hospitals, historic houses within the likes of the National Trust, libraries and other civic buildings. There are currently 54,000 artists represented on Art UK with 57% in copyright. Art UK’s copyright team works to trace, record due diligence and gain consent from rights holders with an insurance policy covering orphan works.

The artwork database and images are stored in Art UK’s cloud infrastructure, built on top of AWS. This includes layers to manage long-term image preservation. Tags added by the public and machine learning support searching by subject matter. The Art UK website is rich in story content (almost 2,000 articles), learning resources, public engagement opportunities, maps, onward links and shopping opportunities. In the last 12 months 5 million unique users visited the site with over 50% from overseas.

Art UK’s initiative means that researchers have access to artworks in the national collection that they would not otherwise have been able to see. In Art UK’s November 2022 collection survey, 76% of small collections, 46% of medium collections, 41% of large collections and 14% of very large collections said they only showed their art on Art UK.

Subject to funding, Art UK’s next digitisation programme will be outdoor murals. But in future the vast majority of artworks added to the site will come through automated ingest from the Museum Data Service.