Improving data quality at Europeana
Formal Metadata
Title: Improving data quality at Europeana
Number of Parts: 16
License: CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.
Identifiers: 10.5446/47570 (DOI)
Transcript: English (auto-generated)
00:08
And our next speaker is Peter Kiraly from Europeana on improving data quality at Europeana.
00:22
And while we are setting up, the full title is "Improving data quality at Europeana: new requirements and methods for better measuring metadata quality."
00:53
Yeah, in the meantime I have started. So the project I will be talking about is how we can improve data quality in Europeana.
01:10
And this has lots of parts. The first part is how to measure. Okay, so how to measure data quality in general and specifically in Europeana.
01:49
This is a joint venture of different parties. The biggest part is Europeana, but there's a community called the Europeana Network, whose members
02:02
are voluntary, pro bono members. I'm one of the members of this community, and at the beginning of this year we formed a specific group focusing on these questions.
02:24
Yeah, so first, some of you already know this, but for understanding what I will talk about, this is the big picture of the data workflow in Europeana. There's the Europeana Data Model, the schema Europeana uses.
02:44
But in order to get this, Europeana collects records in different kinds of metadata schemas from more than 3,000 organizations, so the schemas are Dublin Core, LIDO, EAD, MARC and
03:05
so forth. There are lots of them, and some are custom, or custom variations of these standards. The second step is the data aggregators, so there's a layer between the organizations
03:22
and Europeana, which collects the data either regionally or in a domain-specific way. And then there's the Europeana ingestion process, which collects all this data and enriches it with
03:41
the help of semantic technologies. So, as you can see, there are lots of transformations during this process, and yeah, that's one of the reasons we are doing this research. The problem is that there are good and bad metadata records, we'll see some examples,
04:08
but it's not easy to evaluate whether a record is good or bad, and we don't have a clear metric. A clear metric would be something like this: we have a scale
04:21
of functional requirements, and if a record fulfills the requirements it's good, if it doesn't fulfill them it's bad, and in between there's an acceptable region. So some examples.
04:40
This is a semantic problem: we have non-informative and informative titles. For example, we have several thousands of photographs with the title "photograph" without any additional information, so a photograph of what, and where was it created, and so forth,
05:03
and there's no description either. On the other side, there are good examples: "Photograph of Sir Douglas Clerk" is a photograph of a specific person; that's quite good, understandable and searchable. Another kind of problem is that some records are created via templates, or during the transformation
05:30
some templating system was used, so sometimes there's a real record where the title is "unknown unknown", and the subject is also "unknown", and the description is also "unknown",
05:44
so that's a problem. Last year there was a report on metadata quality which collects similar issues in 60 pages, so if you are looking for more details, you should read that report.
06:05
So our aim is fitness for use, yeah, the purpose is data usage. Europeana is a meta-collection in the sense that it collects only metadata, and the purpose is that the final users go
06:26
to the individual collections and find the data; but if the metadata is bad, not searchable, not findable and so forth, then in the end there will be no data usage. There's a very nice working group at the W3C, Data on the Web Best Practices; they
06:53
listed lots of examples about this purpose. And in January this year, this group called the Data Quality Committee was formed, about 50
07:12
members, well, about 10 or 12 active members, and bi-weekly we discuss things, we
07:25
find examples, we try to define the problems and so forth. One of our hypotheses is that by measuring structural elements, we can predict metadata
07:43
record quality, but this is not a direct signal: what we measure is structure, and somehow there's a relationship between structure and data quality, but it's not a direct connection. So what we measure is a kind of metadata smell; some of you are developers,
08:07
so you might know the term "code smell", which means that we see something in the code structure which might lead to some problems in the future. We don't know yet, but there's
08:26
a chance. With the measurements, we have the same situation. It's not quite true that we found real problems, but there's a chance that something is there and it's
08:42
worth examining by humans. So the purpose is, yeah, to improve the metadata. If we improve the metadata, then we will have good data and we can create better services on the good data. And also
09:05
to find the weak points, so we can improve the schema itself and its documentation. And finally, if we can find good examples, we can propagate them: please follow
09:24
those examples. What do we measure? There are three different things. The first is structural and semantic features. These can be measured schema-independently, which means
09:44
that the methods we find here can be applied to MARC or whatever metadata schema, and the tool is built in that way. The second one is discovery scenarios. So we collected the
10:02
most important functionalities of Europeana and examined what metadata features should be there to support those functionalities. And finally, we have a catalog of anti-patterns, which
10:20
are known bad practices, or things that just happen, but we can somehow find them. So this is the catalog of discovery scenarios. We collected the 14 most important functionalities
10:42
of Europeana, like cross-language recall, entity-based facets and so forth. And for each we created a kind of user story that describes the purpose of this functionality,
11:03
what kind of analysis we should do on the metadata level, and what the measurement rules are. And here is the problem catalog. It's an ongoing project, so we will find more
11:22
and more problems, like the title and the description being the same, or bad Unicode characters and so forth. I guess these are familiar to everybody, independently of what kind of schema they use. And yeah, we did a somewhat similar analysis for these patterns as well.
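Some of these anti-patterns, such as a title equal to the description, placeholder values like "unknown", or bad Unicode characters, can be expressed as simple record-level checks. The sketch below is purely illustrative: the field names, the placeholder list, and the rules are my assumptions, not Europeana's actual implementation.

```python
import unicodedata

# Illustrative placeholder terms; not Europeana's actual list.
PLACEHOLDERS = {"unknown", "photograph", "untitled", "n/a"}

def find_antipatterns(record):
    """Return the anti-pattern labels found in one flat metadata record
    (a dict mapping field names to string values)."""
    problems = []
    title = (record.get("title") or "").strip()
    description = (record.get("description") or "").strip()

    # Non-informative or templated title: empty, or built only from
    # placeholder words ("photograph", "unknown unknown", ...).
    if not title or all(w.lower() in PLACEHOLDERS for w in title.split()):
        problems.append("non-informative title")

    # Title and description are the same.
    if title and title.lower() == description.lower():
        problems.append("title equals description")

    # Bad Unicode: control or replacement characters in any value.
    for value in record.values():
        if any(c == "\ufffd" or (unicodedata.category(c) == "Cc" and c not in "\t\n")
               for c in (value or "")):
            problems.append("bad Unicode characters")
            break
    return problems
```

Run over all records, the counts per problem label would feed the kind of collection-level views described in the talk.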
11:51
And finally, the measurements. Right now we measure lots of things. From one record we extract about 500 features, and in Europeana we create an overall view of these features
12:14
which gives you one picture of the whole situation for every record. Then there's
12:23
a collection-level view, so each organization can check their own features. And finally there's a record-level view, so for each record you can check whether that record has a problem or not, and so forth. And yeah, I did not mention so far that measuring
12:50
metadata quality is not a new thing in science. There are some metrics
13:02
which were created in the literature, and we follow those metrics. Sometimes we modify them and introduce new ones. So here are some slides from the results. This is one
13:21
field in this example, dcterms:alternative, the alternative title. You can see that some collections don't have it at all, some collections have an alternative title for all records, and there are collections in between. And this is an interactive visualization, so you
13:50
can filter out some values. All these things are constantly a work in progress, so these
14:02
are the first results, and we will improve all these things in the user interface. We also provide some detailed statistics. This is called cardinality: how many
14:21
field instances there are in one record. In this example it is dc:subject; there are minimum and maximum values and all the basic statistics, and we draw histograms and
14:40
we point to the minimal and maximal records, so you can find examples at the lower and upper ends of the ranges. In Europeana a very important feature
15:05
is multilinguality; several of the discovery scenarios build on it. It means that one of the main ideas behind Europeana is that the content should be available
15:24
in multiple languages. Europeana is based on RDF syntax, which means that there are three types of values. One is a simple literal, where you cannot see the language.
15:44
The second one is the language-tagged string: a string plus we tell, or the organizations tell, that it's written in English or French or whatever. And the third one is when the
16:02
value is a resource URI, pointing to a linked open data dictionary entry, which is hopefully multilingual. We created statistics about the languages themselves, so yeah,
16:23
unfortunately most of the strings don't have a language specification, but about one third do, and there's a vast range of language notations: more than 400 different
16:46
notations of languages. Some, unfortunately, are bad; for example, the most prominent, most frequent language is English, and English is encoded in six different ways, "en", "En", "EN"
17:04
and so forth. There's an ongoing discussion about what we should do, but it's relatively easy to improve and fix. The second thing about multilinguality is the level of multilinguality,
17:23
so not the list of individual languages, but whether one instance has multiple language tags. We set up a scale: at the lower end there is the missing field, and
17:42
text without any language tags; if the string, or the field, has multiple languages, so we suppose that it's a translation, then we give more points; and finally,
18:05
if it's a link to a controlled dictionary, that's the best. And there's a penalty if there's a mix, so translated values together with values without translation get
18:21
a penalty. We try to visualize this in different ways. This is a heat map, the darker the better: more language tags. And yeah, this is an interactive
18:49
map, so you can click and you can see the scores. Another metric is the information content. The basic idea is that the more frequent a term, like "photograph", the
19:07
less valuable it is; the less frequent, the more unique, the more important. And yeah, we have box plots and histograms and Q-Q plots to find outliers and the distribution of the
19:29
quality scores. And finally, this is the architecture graph. The most important part is that we use big data analysis tools, Apache Hadoop and Spark, in order to process this whole
19:49
record set. Right now there are 53 million records in Europeana, which is more than 400 gigabytes; it takes a lot of time to process them. These big data technologies
20:07
help us distribute the task across processors and nodes. And the further steps: there are human and technical steps we plan, so turning the results into
20:24
documentation and recommendations to the organizations, communication with the data providers; we would like to do some human evaluation of these scores for some selected sets, and
20:43
we would like to cooperate with other projects like DPLA or other similar institutions. And one thing, for me quite important, is the Shapes Constraint Language (SHACL), which is a W3C draft right
21:07
now, or perhaps already published, I don't know, to define the data patterns. And here are
21:21
some links if you are interested. Thank you very much. Thank you, Peter. And now questions.
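The multilinguality scale described in the talk (missing field, text without a tag, tagged text, multiple translations, link to a controlled vocabulary, with a penalty for mixing tagged and untagged values) might be scored roughly along these lines. The point values and the URI heuristic are invented for illustration, not Europeana's actual scoring.

```python
def multilinguality_score(values):
    """Score one field instance: a list of (text, language_tag_or_None) pairs.
    A text starting with 'http' is treated as a resource URI."""
    if not values:
        return 0                          # missing field
    if any(text.startswith("http") for text, _ in values):
        return 4                          # link to a controlled, hopefully multilingual, vocabulary
    tags = {lang for _, lang in values if lang}
    untagged = sum(1 for _, lang in values if lang is None)
    if not tags:
        return 1                          # text without any language tag
    score = 3 if len(tags) > 1 else 2     # translations score higher than a single tag
    if untagged:
        score -= 1                        # penalty: mix of tagged and untagged values
    return score
```

Applied per field and averaged per record, such scores would feed the heat-map view mentioned above.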
21:43
Thank you. My name is Susanti, I'm from the Indonesian Parliament Library. I would like to ask whether the validation process for metadata quality, by measuring the structural elements, can be implemented or adopted by another application outside of the Europeana collection,
22:01
or is this scope only for the Europeana collection? The data quality scope is Europeana, but I'm also a researcher, so I have another hat, and yeah, I'm working with MARC as well, and with research data as well.
22:22
So it is a possibility for another journal or another collection or set of references? Thank you very much. Do you have by chance any metric for provenance of the information? Provenance, where it originated?
22:42
Yes, yes. Or some score of quality of the source? Yeah, in the literature there are specific metrics; we haven't introduced them yet, but yes, the
23:00
basic idea about provenance is that you measure the other metrics and find which source is the best, and so forth. And then in the next round, when the same source provides
23:20
new records, it automatically gets a little bit higher score. And how many sources do you have so far? 3,500. Okay, thank you. All right, any more questions? The other side of the room? Oh, okay.
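The provenance idea sketched in this answer, giving a source whose earlier records scored well a small bonus on its new records, might look like the following. The class name and the 0.1 weight are my own illustration, not a metric from the literature.

```python
from collections import defaultdict

class SourcePrior:
    """Keep a running mean quality score per data source and use it
    as a small prior bonus when scoring new records from that source."""

    def __init__(self, weight=0.1):
        self.weight = weight              # arbitrary illustrative weight
        self.totals = defaultdict(float)  # sum of observed scores per source
        self.counts = defaultdict(int)    # number of observed records per source

    def observe(self, source, score):
        """Record the quality score of one processed record."""
        self.totals[source] += score
        self.counts[source] += 1

    def adjusted(self, source, raw_score):
        """Raw score plus a bonus proportional to the source's mean so far."""
        if self.counts[source] == 0:
            return raw_score              # unknown source: no prior
        mean = self.totals[source] / self.counts[source]
        return raw_score + self.weight * mean
```

A source with a good track record thus starts each new round slightly ahead, which matches the "a little bit higher score" behaviour described above.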
23:46
Did you try to apply some kind of automatic language recognition to the fields where you didn't have any language tags? Yes. We have a student who just yesterday defended his thesis, and in his research he did it.
24:14
And if I remember correctly, the language of more than half of the unspecified strings can be detected.
24:25
But we haven't had time to evaluate the results. The problem is that language detection usually works better on longer texts.
24:42
So the longer the text, the more reliable the results. And usually in metadata there are extreme values, but the normal values are small, around 100 characters. All right, we have time for some more questions.
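The point about string length can be illustrated with a toy stopword-based guesser, nothing like the student's actual detector: with only a few words there are often zero stopword hits, so short metadata values stay undecided. The stopword lists here are tiny, hand-picked examples.

```python
# Toy stopword-based language guesser. Real detectors use character
# n-gram models, which also degrade on short strings such as typical
# metadata values (titles of ~100 characters or less).
STOPWORDS = {
    "en": {"the", "of", "and", "a", "in"},
    "fr": {"le", "la", "et", "de", "un"},
    "de": {"der", "die", "und", "das", "ein"},
}

def guess_language(text, min_hits=1):
    """Return the language whose stopwords overlap the text most,
    or None if no language reaches min_hits."""
    words = set(text.lower().split())
    best, hits = None, 0
    for lang, stops in STOPWORDS.items():
        n = len(words & stops)
        if n > hits:
            best, hits = lang, n
    return best if hits >= min_hits else None
```

A one-word title like "Foto" yields no hits and stays undecided, which is exactly the short-text problem raised in the question.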
25:08
Could you give us some examples of metadata quality improvements you achieved as a consequence of your analysis?
25:24
Well, it's hard to communicate because I don't want to hurt anybody. It's very important that when we prepare slides on these matters, it's a little bit of a political thing.
25:50
The purpose is not to find bad things. The purpose is to improve things and communicate directly with the data providers.
26:05
So, yeah, we found good examples and bad examples, yes. We haven't started with this communication. The idea is that this tool will be built into Europeana and it will be part of the so-called ingestion process.
26:27
So when Europeana pulls in data, it will run the process and find the problems and they will communicate directly with the data providers and provide suggestions for them to improve this or that.
26:51
And what are the most important things that data providers can do to improve their data quality at Europeana?
27:01
Yeah, right now, one thing is multilinguality, so that's an area to improve. The other thing is that there are lots of examples where I suppose there were too many transformations between the steps:
27:28
they started from Dublin Core, which provides about 15 fields, and at the end Europeana provides 150 fields.
27:42
So the information is very dense; the granularity is different. So it would be good if everybody, well, I don't want to say that everybody should leave Dublin Core, but try to provide more granular metadata.
28:09
All right, and is there one more question? All right, there were a lot of questions already. Thanks a lot. Thank you.