Understanding web legal deposit
The BnF is responsible for the legal deposit of the French Web. Its collection of archived sites, which is one of the oldest and richest in the world, is open to anyone wishing to carry out research.
Legal framework
Legal deposit, introduced in the 16th century, aims to preserve the memory of all French publishing output, whatever the target audience (scientific, artistic, leisure, etc.). It has adapted to successive changes in publishing media, creating a unique and irreplaceable heritage collection.
Since the DADVSI law (law on copyright and related rights in the information society) was passed on 1 August 2006, followed by its implementing decree in 2011, the BnF has been responsible for collecting, preserving, referencing and providing public access to websites in the French domain under the legal deposit scheme. Unlike traditional legal deposit for printed publications, web legal deposit requires no action on the part of the website producer: the data is collected automatically by a bot. The DADVSI law, which transposed the European directive on copyright and related rights in the information society into French law, also introduced a number of exceptions to copyright and related rights into French legislation.
In particular, it introduced into the French Heritage Code (Articles L.132-4, L.132-5 and L.132-6) an exception to intellectual property rights (copyright, related rights and the rights of database producers) in favour of organisations responsible for legal deposit. Organisations responsible for legal deposit may now legally, without having to request prior authorisation or pay any remuneration (French Heritage Code, Articles L.131-1 to L.133-1 and R.131-1 to R.133-1):
- reproduce works on any medium and by any process for the purposes of legal deposit (collection, preservation, consultation),
- make these works available for consultation by accredited researchers on individual consultation workstations.
How the web crawler works
Frequency and depth
Points of attention and limitations
How crawls are organised
- broad crawls: carried out once a year, these crawls aim to sample as many sites as possible. The list of sites is provided by partner registrars, such as the Association française pour le nommage de l’internet en coopération (Afnic) and OVH. Every year, the BnF strives to improve its web coverage: between 2007 and 2022, the number of domains collected rose from 0.9 million to 5.8 million, the latter figure representing around 60% of the French Web.
- focused crawls: these crawls vary in frequency and depth and cover several tens of thousands of sites selected by librarians at the BnF and in the printer legal deposit libraries of the French regions, as well as by specialists and researchers.
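To put the broad crawl figures in perspective, the short sketch below works through the arithmetic. It uses only the numbers quoted above; the implied total size of the French Web is a derived estimate, not a figure given by the BnF, and it inherits the imprecision of "around 60%". The function and constant names are illustrative.

```python
# Figures quoted in the text for the annual broad crawl.
DOMAINS_2007 = 0.9e6   # domains collected in 2007
DOMAINS_2022 = 5.8e6   # domains collected in 2022
COVERAGE_2022 = 0.60   # "around 60% of the French Web" (approximate)


def growth_factor(start: float, end: float) -> float:
    """Return how many times larger the end value is than the start value."""
    return end / start


def implied_total(collected: float, coverage: float) -> float:
    """Return the total population implied by a collected count and a coverage share."""
    return collected / coverage


if __name__ == "__main__":
    factor = growth_factor(DOMAINS_2007, DOMAINS_2022)
    total = implied_total(DOMAINS_2022, COVERAGE_2022)
    print(f"Growth 2007-2022: about {factor:.1f}x")
    print(f"Implied French Web in 2022: about {total / 1e6:.1f} million domains")
```

On these figures, the number of domains collected grew roughly 6.4-fold, and a coverage of around 60% would correspond to a French Web of somewhat under 10 million domains in 2022.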
Collection building policy
- Collect objects that are now natively digital: candidate manifestos, research blogs, performance programmes, etc.;
- Cover current research in a given discipline: academic sites, organisation of a disciplinary field, conferences and events, training bodies and programmes, etc.;
- Capture the appropriation of a field by various stakeholders, the diversity of actions and representations (academic web and also amateur history blogs, blogs by well-known writers and readers’ blogs, participatory science, resources on both art music and popular music);
- Document amateur and emerging practices (online writing, digital art), and everyday sites (video games);
- Record debates and discussion, and diversity of opinion;
- Document the forms of civic engagement and activism that the internet has renewed (online voting, digital public services, etc.).