
Automating metadata extraction and cataloguing: experiences from the National Libraries of Norway and Finland

Formal Metadata

Title
Automating metadata extraction and cataloguing: experiences from the National Libraries of Norway and Finland
Series Title
Number of Parts
15
Author
Contributors
License
CC Attribution - ShareAlike 3.0 Germany:
You may use, modify, and reproduce the work or its content for any legal purpose, and distribute it and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify and that you pass on the work or content, including in modified form, only under the terms of this license.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
The increasing volume of grey literature, such as reports produced by public sector organizations and academia, poses significant cataloguing, discoverability, and accessibility challenges in digital libraries. To help address these challenges, the National Library of Norway (NLN) and the National Library of Finland (NLF) have explored different strategies to automatically extract bibliographic metadata from PDF files. This presentation will first discuss METEOR, an open-source tool developed by the NLN that uses rule-based logic and keywords and is already integrated into the production workflow as a suggestion engine for librarians. Meanwhile, the NLF is exploring the potential of fine-tuned, locally hosted large language models for extracting bibliographic metadata. The strengths and weaknesses of both approaches are analyzed, as well as the common obstacles they face. This talk will also present our joint efforts to prepare high-quality datasets for training and evaluating metadata extraction systems, along with newly developed metrics suited to the task. Finally, the discussion will focus on the integration of external catalogues and authority registries into these processes, enabling the use of persistent identifiers for entities in the metadata. Our presentation seeks to share practical solutions, promote methodology exchange, and inspire community collaboration.