BnF and artificial intelligence
What do the following have in common: a query to find your little brother’s doppelganger on Gallica, a comparison of several handwritten music notations to identify the scribe who copied an ancient score, and the ability to anticipate how BnF collections should be processed to ensure their best possible conservation?
All three tasks can be supported by artificial intelligence (AI) technologies. AI applications span all the activities and services of the Library, opening up exciting prospects and research approaches.
Fields
AI projects at the BnF can be organised into five main fields. These fields relate to its missions of collecting, preserving, cataloguing and disseminating collections that are remarkable in their volume, variety, and historical scope:
- Support for cataloguing activities
- Collections management
- Searching, analysing content and improving access to content
- User engagement, putting content in perspective
- Decision-making and governance
Roadmap
Building a consistent, unifying, and accountable AI policy
AI-based processes, developments and projects require a consistent policy, one that can engage an AI community within the Library and address ethical issues such as the evolving relationship between humans and machines. To meet these challenges, a roadmap has been designed for 2021-2026.
Five actions
Presented in December 2021 at the BnF, during the 3rd international conference about Artificial Intelligence in Libraries, Archives and Museums, the document sets out five actions:
- Making AI challenges and projects a part of the Institution’s overall strategy
- Improving R&D organisation and implementation at the BnF
- Developing new skills
- Adapting infrastructure and data management
- Designing a multi-year programme with other stakeholders
Read the AI Roadmap (visual summary)
Key projects and experiments
The multi-year programme described in the roadmap brings together six key projects intended to incorporate AI into the Library’s daily processes and services. Such integration requires a shift from experimentation towards industrialisation, and this is precisely the aim of the image mining project in Gallica. In this project, technologies such as IIIF and machine learning are used to detect and locate images in documents of any kind (books, newspapers, etc.), and to add tags or analyse visual content in order to make digital collections easier to explore. Like image mining, all the BnF’s AI initiatives build on existing tools and current projects at the BnF. For instance, they relate to the ongoing creation of a new cataloguing application (called NOEMI) to support the bibliographic transition, to the physical management of items, and to the construction of a new conservation building in Amiens (Northern France).
The six main projects of the roadmap are the following:
- Image mining in Gallica (querying images in Gallica based on similarity and generated keywords)
- Handwritten text recognition (HTR), to be integrated into Gallica (this technology applies not only to handwritten texts but also to early printed works and to texts written in less widely spoken languages)
- Cataloguing (daily cataloguing support, expansion and improvement of automatic mechanisms, implementation of the LRM model…)
- Personalised content recommendation with an ethical perspective (that is, respectful of diversity, data privacy…)
- Identifying autonomous documents in web archives (academic articles, official publications, etc. may be detected by AI within the huge web archives collection, and metadata can be extracted to create and enrich basic records in the Library’s catalogue…)
- Monitoring tools for the preservation and management of collections: this project is closely related to item management and to the future conservation site in Amiens. AI assists librarians in undertaking the relevant treatments to better conserve damaged or brittle documents, and in preparing the design of stacks and storage rooms, etc.
Experiments are carried out continuously in connection with these projects, and also through other opportunities (in particular research partnerships). In this respect, the BnF DataLab provides essential support.
Archiving content related to AI on the web
The BnF doesn’t only implement AI technologies to process and disseminate its collections: it also captures web resources that deal with AI, as part of its legal deposit mission. For example, during the summer of 2021, more than 700 websites or Twitter accounts selected by the staff were harvested by the Library’s crawlers, amounting to more than 10 million URLs. This collection includes resources about the ethical issues and socio-economic consequences of AI, as well as scientific and artistic uses of AI and literary experiments.
Contact: depot.legal.web@bnf.fr
Illustrations of this page: C.Ardenti and L.Giocanti
Resources
BnF Roadmap on AI, 2021-2026
- Roadmap: a visual summary (download the PDF)
AI at the BnF, in libraries, in cultural institutions
- BnF and AI: an itinerary through the fundamentals of AI
- “Futurs fantastiques” 2021: 3rd international conference about Artificial Intelligence in Libraries, Archives and Museums