UK Government Web Archive
Information for webmasters
UK Government website review programme
The Government is reviewing the number of websites it runs in order to provide a more user-friendly experience for the public. This activity is being managed by the Government Digital Service (GDS).
Please do not close your site ahead of schedule, as this would prevent us from archiving it. Generally, we need at least eight weeks' notice of changes to the closure schedule, because the web archiving process takes a full eight weeks to complete. Please do not make major changes to your site during the eight weeks before closure or convergence, as those changes may not be archived.
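As a rough planning aid, the eight-week window can be worked out from a planned closure date. This is a minimal sketch that assumes the eight weeks are counted back from the closure date itself; the function name `archiving_window_start` and the example date are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the eight-week archiving period is counted back from the
# planned closure date.
ARCHIVING_PERIOD = timedelta(weeks=8)


def archiving_window_start(closure_date: date) -> date:
    """Return the date eight weeks before a planned closure (hypothetical helper).

    Under the assumption above, this is a rough guide to the latest date for
    notifying schedule changes and the date after which no major changes
    should be made to the site.
    """
    return closure_date - ARCHIVING_PERIOD


if __name__ == "__main__":
    planned_closure = date(2013, 3, 29)  # hypothetical closure date
    print(archiving_window_start(planned_closure))  # 2013-02-01
```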
UK Government Website Database
The UK Government Website Database was developed by The National Archives and GDS to track and manage the web archiving and website review programmes. It is accessible to webmasters and records officers of central government departments and bodies. To request access, please email firstname.lastname@example.org.
Web archiving guidance
- Cabinet Office Web Standard TG105: Web archiving guidance, providing a step-by-step guide and best-practice advice.
- Cabinet Office Web Standard TG122: sitemap guidance, including the use of sitemap generation software. Using XML sitemaps can help search engine optimisation and the comprehensive capture of website content in the archive.
- Cabinet Office Web Standard TG125: guidance on managing the persistence of URLs and links, and on using redirection technology.
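As a generic illustration (not taken from TG122 itself), a minimal XML sitemap following the sitemaps.org protocol can be produced with Python's standard library. The function name `build_sitemap` and the page URLs are hypothetical placeholders; a real site would more likely use dedicated sitemap generation software.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal XML sitemap with one <url><loc> entry per page URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in urls:
        url_el = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url_el, "loc")
        loc.text = page_url  # ElementTree escapes characters such as "&"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages on a placeholder domain.
    print(build_sitemap(["http://www.example.org/", "http://www.example.org/about"]))
```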
Guidance relating to web continuity and redirection components can be found on the web continuity pages.
Contact details on archived websites
Website owners are reminded that users may attempt to use contact email addresses, mailto links and telephone numbers found in archived versions of sites. For this reason, it is best practice to keep the email addresses and telephone numbers used on websites live. Site owners may find this easier if they use generic (i.e. central or team) email addresses and telephone numbers.
How we archive your websites
The National Archives captures websites by remote harvesting, performed under contract by the Internet Memory Foundation (formerly known as the European Archive).
The crawler (archiving robot) identifies itself as being from the Internet Memory Foundation or the European Archive. Please ensure that your sites are set up to allow access to our crawler. Sites are crawled (archived) at a very polite request rate, so our crawler should not cause your site any difficulties. If you experience any problems with our crawler, please contact us immediately.
We archive sites according to a regular schedule. Full details of our archiving schedules are available in the UK Government Website Database. Website owners are encouraged to contact us before removing content from the live web, to make sure it has been archived.
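If your site restricts crawlers through a robots.txt file, one way to check that the rules do not accidentally exclude the archive crawler is Python's standard `urllib.robotparser`. This is a sketch under stated assumptions: the exact user-agent token sent by the crawler is not given here, so `ExampleArchiveBot` is a hypothetical placeholder, and the rules and URLs are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent token; confirm the crawler's real token before use.
CRAWLER_AGENT = "ExampleArchiveBot"


def crawler_allowed(robots_txt: str, url: str, agent: str = CRAWLER_AGENT) -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Illustrative rules: the (hypothetical) archive crawler may fetch
    # everything, while other robots are kept out of /private/.
    rules = (
        "User-agent: ExampleArchiveBot\n"
        "Allow: /\n"
        "\n"
        "User-agent: *\n"
        "Disallow: /private/\n"
    )
    print(crawler_allowed(rules, "http://www.example.org/private/report.html"))
```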
Technical limitations
Some limitations of web crawling technology are outlined on our Information on Web Archiving page. Outside these known limitations, it should be possible for us to archive sites that comply with the COI guidance mentioned above. However, unexpected technical difficulties can arise, so we recommend that website owners satisfy themselves that significant content has been successfully added to the UK Government Web Archive before removing it from the live web.
While we can add files of any size to the archive, we are not at present able to serve files larger than 20MB from the UK Government Web Archive. If your site contains files larger than 20MB that you want to remain accessible through the web archive, we recommend that you split them into smaller files before the site is crawled. They can then be crawled as several smaller files, each of which can be made accessible individually.
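Before a crawl, a site owner with a local copy of the site's files could list the candidates for splitting with a short script. This is a sketch assuming the files are available on a local file system (the `site_root` argument and the function name `oversized_files` are hypothetical) and that 20MB is taken to mean 20 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Assumption: "20MB" is interpreted as 20 * 1024 * 1024 bytes.
MAX_SERVABLE_BYTES = 20 * 1024 * 1024


def oversized_files(site_root: str, limit: int = MAX_SERVABLE_BYTES) -> list[Path]:
    """Return files under `site_root` larger than `limit` bytes, sorted by path."""
    root = Path(site_root)
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.stat().st_size > limit
    )


if __name__ == "__main__":
    import sys

    # Usage: python oversized.py /path/to/local/site/copy
    for path in oversized_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path, path.stat().st_size)
```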
Linking to archived sites hosted in the UK Government Web Archive
The majority of the sites in our collection are hosted by the Internet Memory Foundation. You can provide links to an index of all available snapshots of a website, or to specific, dated snapshots of a website in the collection.
An example of how to create an href linking to an index of all available snapshots of a website:
This predictable URL will link to an index page showing all available snapshots of the Ministry of Defence website. The URL after '/*/' can be changed to retrieve the index for another website.
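The index-link pattern can be expressed as a small helper: the archive's base URL, then `/*/`, then the live site's URL. This is a sketch; the function name `snapshot_index_url` is hypothetical, and the site URL in the usage line is a placeholder.

```python
ARCHIVE_BASE = "http://webarchive.nationalarchives.gov.uk"


def snapshot_index_url(site_url: str) -> str:
    """Return the URL of the index of all archived snapshots of `site_url`."""
    return f"{ARCHIVE_BASE}/*/{site_url}"


if __name__ == "__main__":
    # Placeholder site URL; substitute the address of the site you want.
    print(snapshot_index_url("http://www.example.org/"))
```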
Please note that because the web archive is indexed at page and file level, additional snapshots of a site can appear in its index when the web crawler, while archiving another site, follows links outside that site's domain. The result is a partial crawl, which is limited in depth and may have missing content but is still a valuable part of the archive. To find out which crawls in the index are complete crawls, please check the crawl schedule in the UK Government Website Database or email email@example.com.
To link to a specific archived instance of a website, follow the links from the index page and copy the URL of the snapshot you want. The first eight digits of the numeric code in that URL give the date the crawl of the site began, in YYYYMMDD format. For example, the following link is to the crawl of www.number10.gov.uk that began on 5 October 2009: http://webarchive.nationalarchives.gov.uk/20091005102710/http://www.number10.gov.uk/
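The crawl start date can be read back out of a snapshot URL by taking the first eight digits of the numeric code, as described above. This is a minimal sketch; the function name `crawl_start_date` is hypothetical, and it only accepts URLs on the webarchive.nationalarchives.gov.uk host.

```python
import re
from datetime import date, datetime

# Captures the first eight digits of the code between the archive host and
# the archived site's URL, e.g. ".../20091005102710/http://www.number10.gov.uk/".
SNAPSHOT_CODE = re.compile(
    r"^https?://webarchive\.nationalarchives\.gov\.uk/(\d{8})\d*/"
)


def crawl_start_date(snapshot_url: str) -> date:
    """Return the date a crawl began, parsed from the YYYYMMDD prefix of the code."""
    match = SNAPSHOT_CODE.match(snapshot_url)
    if match is None:
        raise ValueError(f"not a dated snapshot URL: {snapshot_url}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


if __name__ == "__main__":
    url = "http://webarchive.nationalarchives.gov.uk/20091005102710/http://www.number10.gov.uk/"
    print(crawl_start_date(url))  # 2009-10-05
```

An index URL (one using `/*/` instead of a numeric code) is rejected with a `ValueError`, since it does not refer to a single dated snapshot.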
Please contact us if you wish to link to a specific archived instance of a website hosted in one of our smaller collections.
When linking to the UK Government Web Archive, please state that the site has been archived by The National Archives and is available through the UK Government Web Archive.
Departmental security policies may require that firewalls are configured to block access to archived websites including those hosted by the Internet Memory Foundation. This may prevent staff from accessing the collection.
The site http://webarchive.nationalarchives.gov.uk provides access to The National Archives' web archive of UK government websites hosted by the Internet Memory Foundation. Access to the collection through a firewall will, at a minimum, require the following URL to be opened up:
Before the archiving of websites was contracted to the Internet Memory Foundation in 2005, some sites were archived using the tools of other organisations involved in web archiving. As a result, the UK Web Archive and the Internet Archive host some sites in our collection that were archived before 2005. Access to these collections through a firewall will, at a minimum, require the following URLs to be opened up:
- http://www.webarchive.org.uk/ for sites archived by the UK Web Archive and
- http://www.archive.org/ for sites archived by the Internet Archive