UK Government Web Archive
Web archiving and web continuity guidance
On this page you can find web archiving technical guidance, an overview of the technical limitations of web archives, web continuity technical guidance, suggestions on how to link to the UK Government Web Archive, and information on overcoming firewall issues when accessing the web archive. You can also contact us.
For more background on the UK Government Web Archive (UKGWA), see our Information pages.
Web archiving technical guidance
To be archived successfully, a website needs to meet a number of technical criteria. These requirements are described in detail in the following document:
Additional guidance for creating an official public inquiry website suitable for web archiving is also available.
Most websites are archived according to a regular schedule which can be obtained by contacting the web archiving team.
We need at least eight weeks' notice to complete the web archiving process. The eight-week period runs from the launch of a crawl to when the archived website becomes publicly accessible. If your website is closing, you will need to tell us in advance. Please see Annex B in the above document for a checklist for closing sites.
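The notice requirement is simple date arithmetic. A minimal planning sketch follows. It assumes that the eight weeks are counted back from the planned closure date. The function names are hypothetical, and you should confirm the actual schedule with the web archiving team.

```python
from datetime import date, timedelta

# Minimum notice stated above: eight weeks.
NOTICE_PERIOD = timedelta(weeks=8)


def latest_notice_date(closure_date: date) -> date:
    """Latest date to notify the web archiving team of a site closure.

    Assumption: the eight weeks are counted back from the planned
    closure date (hypothetical planning helper, not an official rule).
    """
    return closure_date - NOTICE_PERIOD


def notice_is_sufficient(notified_on: date, closure_date: date) -> bool:
    """True if notice given on `notified_on` meets the eight-week minimum."""
    return notified_on <= latest_notice_date(closure_date)


print(latest_notice_date(date(2024, 6, 28)))  # 2024-05-03
```

For a site due to close on 28 June 2024, this sketch gives 3 May 2024 as the last day to get in touch. Contacting us earlier leaves room for any follow-up crawls.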
Technical limitations of web archives
Technical design is the most important factor in determining whether a website can be archived successfully. Most problems encountered in web archiving fall broadly into two groups. Capture problems, as the name suggests, arise during the initial capture of a resource through the crawling process. Access problems, by contrast, arise later, when users try to view captured resources in the web archive.
The most common technical problems experienced with web archives are:
- The original website's in-built search will not work, because our web archiving technology cannot capture the underlying search engine it relies on.
- Any content that can only be reached by logging in cannot be captured.
- Navigational features such as drop-down menus, tick boxes and interactive maps will not usually work. These are normally both capture and access problems.
- Content hosted on websites other than the one being archived, especially websites outside the UK Central Government web estate, is unlikely to be captured unless the web archiving team is made aware of it before a crawl is made.
- Flash animations and games, streaming media and embedded social media are unlikely to work in the web archive. In many cases this is both a capture and an access problem.
- While we are able to capture files of any size, we are not, at present, able to provide access to files larger than 20MB, with the exception of PDF files.
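Before a crawl, site managers may find it useful to list the external hosts a page depends on, so they can tell the web archiving team about them. It is also worth flagging non-PDF files that exceed the access limit. The sketch below is a hypothetical audit helper that uses only the Python standard library. It assumes that "20MB" means 20 × 1024 × 1024 bytes. Please check the exact threshold with the team.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: "20MB" interpreted as 20 * 1024 * 1024 bytes.
ACCESS_LIMIT_BYTES = 20 * 1024 * 1024


class LinkCollector(HTMLParser):
    """Collects every href/src attribute value found in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def external_hosts(html: str, page_url: str) -> set[str]:
    """Hostnames referenced by the page that differ from the page's own host."""
    own_host = urlparse(page_url).hostname
    collector = LinkCollector()
    collector.feed(html)
    hosts = set()
    for link in collector.links:
        parsed = urlparse(urljoin(page_url, link))  # resolve relative links
        # Ignore mailto:, javascript: and other non-web schemes.
        if parsed.scheme in ("http", "https") and parsed.hostname:
            if parsed.hostname != own_host:
                hosts.add(parsed.hostname)
    return hosts


def oversized_non_pdf(file_sizes: dict[str, int]) -> list[str]:
    """Paths of non-PDF files larger than the assumed access limit."""
    return sorted(
        path
        for path, size in file_sizes.items()
        if size > ACCESS_LIMIT_BYTES and not path.lower().endswith(".pdf")
    )
```

This check treats only an exact hostname match as "internal". A real audit might also need to treat subdomains of your own site as internal.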
Web continuity technical guidance
The National Archives is working with departments across government to reduce broken web links on central government websites. Our web continuity service involves conducting high-quality captures of websites and provides an innovative redirection solution to improve user experience.
Web redirection software components can be installed to make sure that links persist over time. The components run on Apache and Microsoft IIS web servers. They have been independently tested by a validation facility accredited by UKAS (the United Kingdom Accreditation Service) under ISO/IEC 17025:2005.
The following guidance explains where website managers can find the software required and how it can be installed and configured:
- Government Web Archive: Redirection Technical Guidance for Government Departments (PDF, 0.42MB)
- Apache Accreditation Documentation (PDF, 0.10MB)
- Ionics ISAPI Accreditation Documentation (PDF, 0.10MB)
There are other ways to redirect users to content in the UKGWA. The Government Digital Service outlines its approach in this blog post.
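The general idea behind archive redirection is that requests the live site can no longer serve are sent to the archived copy instead of returning a broken link. The sketch below illustrates that idea only. It is not the supported Apache or IIS components. It is a hypothetical WSGI application that redirects unknown paths to the UKGWA snapshot index, using the `*` URL pattern described under "Linking to the UK Government Web Archive". The set of live paths, the use of a 301 status and the choice of the index rather than a dated snapshot are all assumptions.

```python
from wsgiref.util import request_uri

ARCHIVE_BASE = "http://webarchive.nationalarchives.gov.uk/"

# Hypothetical: paths the live site still serves itself.
LIVE_PATHS = {"/", "/contact"}


def archive_fallback_app(environ, start_response):
    """Serve known paths; redirect anything else to the archive's snapshot index."""
    path = environ.get("PATH_INFO", "/") or "/"
    if path in LIVE_PATHS:
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Live page\n"]
    # Rebuild the full original URL (scheme, host, path, query string).
    original = request_uri(environ, include_query=True)
    location = f"{ARCHIVE_BASE}*/{original}"
    start_response("301 Moved Permanently", [("Location", location)])
    return [b""]
```

Under this sketch, a request for `http://www.example.gov.uk/old/page` would be redirected to `http://webarchive.nationalarchives.gov.uk/*/http://www.example.gov.uk/old/page`.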
Linking to the UK Government Web Archive
You can link either to an index of all available snapshots of a resource, such as a website's homepage, or to a specific, dated crawl or snapshot. Let's take the Foreign and Commonwealth Office website (http://www.fco.gov.uk/) as an example:
This predictable URL with the * prefix shows all available dates for the URL:
And the following example shows a URL pattern that links to a specific crawl or snapshot of the URL, in this case from 12 January 2000, with the numbers in a YYYYMMDDHHMMSS datestamp format:
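The two link patterns can be generated programmatically. The minimal sketch below assumes that archive URLs take the form `<base><prefix>/<original URL>`. Here `<base>` is the UKGWA address given in the firewall guidance on this page, and `<prefix>` is either `*` for the snapshot index or a 14-digit YYYYMMDDHHMMSS datestamp for a single snapshot. The function names are hypothetical. The source gives only the date of the example snapshot, so the time of day used below (00:00:00) is a placeholder.

```python
from datetime import datetime

# Assumed URL pattern: <ARCHIVE_BASE><prefix>/<original URL>
ARCHIVE_BASE = "http://webarchive.nationalarchives.gov.uk/"
DATESTAMP_FORMAT = "%Y%m%d%H%M%S"  # YYYYMMDDHHMMSS


def snapshot_index_url(original_url: str) -> str:
    """Link to the index of all available snapshots of `original_url`."""
    return f"{ARCHIVE_BASE}*/{original_url}"


def snapshot_url(original_url: str, captured_at: datetime) -> str:
    """Link to the snapshot of `original_url` with the given datestamp."""
    return f"{ARCHIVE_BASE}{captured_at.strftime(DATESTAMP_FORMAT)}/{original_url}"


def parse_datestamp(stamp: str) -> datetime:
    """Convert a 14-digit YYYYMMDDHHMMSS datestamp back into a datetime."""
    if len(stamp) != 14 or not stamp.isdigit():
        raise ValueError(f"expected 14 digits (YYYYMMDDHHMMSS), got {stamp!r}")
    return datetime.strptime(stamp, DATESTAMP_FORMAT)


print(snapshot_index_url("http://www.fco.gov.uk/"))
print(snapshot_url("http://www.fco.gov.uk/", datetime(2000, 1, 12)))
```

Because real crawls have specific capture times, it is safest to copy the exact datestamp of a complete crawl rather than construct one from a date alone.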
Complete crawls go deep into the website and normally pass through our quality assurance processes. Occasionally, partial crawls occur; these are limited in depth and may be missing content. It is therefore important to link to complete crawls where possible. You can contact us to find out the dates of complete crawls.
When linking to the UKGWA, please state that the site has been archived by The National Archives and is available through the UK Government Web Archive.
Departmental security policies may require that firewalls are configured to block access to archived websites including those hosted by the Internet Memory Foundation. This may prevent staff from accessing the collection.
As the UKGWA is hosted at http://webarchive.nationalarchives.gov.uk/, accessing the collection through a firewall requires, at a minimum, that this URL be opened up.
Before the archiving of websites was contracted to the Internet Memory Foundation in 2005, some sites were archived using the tools of other organisations involved in web archiving. As a result, the UK Web Archive and the Internet Archive host some sites in our collection that were archived before 2005. Accessing these collections through a firewall requires, at a minimum, the following URLs to be opened up:
- http://www.webarchive.org.uk/ for sites archived by the UK Web Archive and
- https://archive.org/ for sites archived by the Internet Archive
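Network teams may want to check an existing firewall allowlist against the minimum URLs listed above. The sketch below is a hypothetical helper. It assumes that the allowlist is a simple list of hostnames. Real firewall rules may also depend on schemes, ports or additional hosts that archived content is served from, and this check does not cover those.

```python
from urllib.parse import urlparse

# Minimum URLs from this page that need to be reachable through the firewall.
REQUIRED_URLS = [
    "http://webarchive.nationalarchives.gov.uk/",  # UK Government Web Archive
    "http://www.webarchive.org.uk/",               # pre-2005 sites, UK Web Archive
    "https://archive.org/",                        # pre-2005 sites, Internet Archive
]


def required_hosts() -> set[str]:
    """Hostnames extracted from the required URLs."""
    return {urlparse(url).hostname for url in REQUIRED_URLS}


def missing_hosts(allowlisted_hosts: list[str]) -> set[str]:
    """Required hostnames absent from a (hypothetical) hostname allowlist."""
    # Normalise case, whitespace and any trailing dot before comparing.
    allowed = {host.strip().lower().rstrip(".") for host in allowlisted_hosts}
    return required_hosts() - allowed
```

For example, an allowlist containing only `webarchive.nationalarchives.gov.uk` and `ARCHIVE.ORG` would report `www.webarchive.org.uk` as missing.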
We undertake quality assurance testing on the vast majority of archived websites in the collection, though problems may develop over time. We welcome feedback from departments on any issues relating to particular sites. If you wish to notify us of any issues, or have any comments or queries about the web archive in general, please email us: firstname.lastname@example.org
This page contains PDF files. See plug-ins and file formats for help in accessing these file types.