Integrating Distributed Data Sources in VIVO via Lookup Services

Recording information about countries, conferences, organizations and concepts in a Linked Data application such as VIVO initially requires importing a large number of data items. Before the import, these items must be transformed into RDF and manually enriched with persistent identifiers, geographic positions, short descriptions and multilingual labels. Collecting, enriching and converting this much information takes considerable time and administrative effort, and storing all of this data in the application can slow down its performance, responsiveness and reasoning processes. Lookup services, already developed for VIVO, DSpace-CRIS, Linked Data for Libraries (LD4L) and other projects, aim to facilitate the integration of external authority data. Whereas some vocabularies and data sources, such as EuroVoc and Wikidata, offer a SPARQL endpoint, other authority data sources, such as the Integrated Authority File of the German National Library (GND), provide only data dumps. Our objective is to enable combined access to external sources via a single interface, using Named Entity Recognition tools, APIs and SKOSMOS in the background. Besides concepts, we also aim to integrate data items such as events, organizations and languages, supplemented with additional information. This requires mappings between source and target systems in order to insert and display the attributes and relations of the selected entities. Furthermore, we investigate how changes made in external vocabularies can be transferred automatically to the data in the target system. This presentation outlines our achievements and lessons learned in integrating semantically structured and enriched data from distributed sources via lookup services, similar to the external vocabulary services in VIVO and related projects.
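The idea of a single lookup interface over heterogeneous sources, each needing its own mapping into the target system, can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it assumes a great deal: the names `Candidate`, `SourceAdapter`, `combined_lookup` and `to_ntriples` are hypothetical, and the in-memory records merely stand in for a SPARQL-backed source and an index built from a data dump. No real VIVO, SKOSMOS, Wikidata or GND API is called, and all URIs are placeholders under example.org. Each adapter searches one source and maps its records onto a common shape. The combined lookup merges hits that share a URI, and a selected hit is then mapped to `rdfs:label` triples for insertion into the target store.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

@dataclass
class Candidate:
    """A lookup hit normalized to one shape, whatever source it came from."""
    uri: str
    labels: Dict[str, str]  # language tag -> label
    sources: List[str] = field(default_factory=list)

@dataclass
class SourceAdapter:
    """Hypothetical adapter: how to search one source and map its raw records."""
    name: str
    search: Callable[[str], Iterable[dict]]
    to_candidate: Callable[[dict], Candidate]

def combined_lookup(adapters: List[SourceAdapter], query: str) -> List[Candidate]:
    """Query every source through one interface and merge hits sharing a URI."""
    merged: Dict[str, Candidate] = {}
    for adapter in adapters:
        for raw in adapter.search(query):
            cand = adapter.to_candidate(raw)
            if cand.uri in merged:
                hit = merged[cand.uri]
                # Keep labels already found; only add languages not seen yet.
                for lang, label in cand.labels.items():
                    hit.labels.setdefault(lang, label)
                hit.sources.append(adapter.name)
            else:
                cand.sources.append(adapter.name)
                merged[cand.uri] = cand
    return list(merged.values())

def to_ntriples(cand: Candidate) -> List[str]:
    """Map a selected candidate to N-Triples lines with language-tagged labels."""
    lines = []
    for lang, label in sorted(cand.labels.items()):
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{cand.uri}> <{RDFS_LABEL}> "{escaped}"@{lang} .')
    return lines

# Placeholder records: one source shaped like SPARQL result rows,
# one shaped like entries from an index built from a data dump.
SPARQL_LIKE = [{"item": "https://example.org/a/1", "lbl": "Germany", "lang": "en"}]
DUMP_LIKE = [
    {"id": "https://example.org/a/1", "prefLabel": {"de": "Deutschland", "en": "Germany"}},
    {"id": "https://example.org/b/7", "prefLabel": {"de": "Deutsche Sprache", "en": "German language"}},
]

ADAPTERS = [
    SourceAdapter(
        name="sparql",
        search=lambda q: [r for r in SPARQL_LIKE if q.lower() in r["lbl"].lower()],
        to_candidate=lambda r: Candidate(r["item"], {r["lang"]: r["lbl"]}),
    ),
    SourceAdapter(
        name="dump",
        search=lambda q: [r for r in DUMP_LIKE
                          if any(q.lower() in v.lower() for v in r["prefLabel"].values())],
        # Copy the label dict so merging never mutates the source records.
        to_candidate=lambda r: Candidate(r["id"], dict(r["prefLabel"])),
    ),
]

if __name__ == "__main__":
    for hit in combined_lookup(ADAPTERS, "germ"):
        print(hit.uri, hit.sources)
        print("\n".join(to_ntriples(hit)))
```

Merging by URI only combines hits when the sources already share identifiers. Matching the same entity across sources with different URIs, for example via `owl:sameAs` links, would need an additional step that this sketch leaves out.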