In addition to archiving central government websites, we have archived the websites of primary local authorities and of the National Health Service (NHS) through two separate large-scale crawls.

The initial crawl, performed between September and November 2011, captured almost 60 million links across nearly 3,000 websites. A second, similar crawl will take place between July and September 2012.

Aims of the project

The purpose of the project is to:

  • ensure that transparency datasets released on these websites as part of the government’s Transparency and Open Data initiative, and linked to from data.gov.uk, are archived and remain permanently accessible;
  • continue to lead the archives sector in combating the potential loss of this information from the historical record, and in providing perpetual access to it; and
  • support our local authority web archiving pilot project, which aims to raise awareness and build the skills participants need to choose a web archiving model that meets their needs.

How we carried it out

We carried out these crawls for many of the same reasons as our main web archive, but the methodology differs in the following ways:

  • we performed very little quality assurance on the crawls, because the project was designed to be low-cost and largely automated
  • we prioritised breadth of capture over depth, to gather as much data as possible
  • we did not engage in active dialogue with website owners

View the websites

You can access the results of the crawls by viewing the NHS A-Z list of archived websites and the local authority A-Z list of archived websites.

Please note that the index page for each website may show additional capture dates, because content linked to from central government websites is occasionally crawled.