VIVO-DataConnect: Towards an Architectural Model for Interconnecting Heterogeneous Data Sources to Populate the VIVO Triplestore

Héon, Michel Dickner, Nicolas Jerabek, Alexander Belkouch, Rachid

Formale Metadaten

Titel

Serientitel

11th International VIVO Conference, June 16-18, 2020

Anzahl der Teile

Autor

Lizenz

CC-Namensnennung - keine kommerzielle Nutzung 3.0 Deutschland:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen und nicht-kommerziellen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen.

Identifikatoren

10.5446/48010 (DOI)

Herausgeber

Technische Informationsbibliothek (TIB)

Erscheinungsjahr

2020

Sprache

Englisch

Inhaltliche Metadaten

Fachgebiet

Information und Dokumentation

Genre

Konferenz/Talk

Abstract

In a large organization, corporate data is rarely stored in a single data source. Data is most often stored sparsely in distributed systems that communicate more or less well with each other. In this context, the integration of a new data source such as VIVO is sometimes perceived as a complexification of the infrastructure already in production, making it difficult or impossible to exchange data between the VIVO instance and the databases in use. Important and common obstacles to each new integration are encountered by organizations. A first problem is the conversion of data from a tabular format specific to relational databases to the RDF graph specific to the triplestore; and also, the updating (adding, modifying, deleting) of data through different data sources. In our work currently in progress, we plan to build a generalizable and adaptive solution to different organizational contexts. In this presentation we will present the architectural solution that we have designed and that we wish to implement in our institution. It is an architecture based on message processing of the data to be transferred. The architecture should make it possible to standardize the data transformation process and the synchronization of these data in the different databases. The target architecture considers the VIVO instance as a node in a network of data servers rather than considering a star architecture based on the principle that VIVO is the centre of data sources. In addition to presenting this distributed architecture based on Apache Kafka, the presentation will discuss the advantages and disadvantages of the solution.