VIVO-DataConnect: Towards an Architectural Model for Interconnecting Heterogeneous Data Sources to Populate the VIVO Triplestore

Héon, Michel Dickner, Nicolas Jerabek, Alexander Belkouch, Rachid

Formal Metadata

Title

Title of Series

11th International VIVO Conference, June 16-18, 2020

Number of Parts

Author

License

CC Attribution - NonCommercial 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/48010 (DOI)

Publisher

Technische Informationsbibliothek (TIB)

Release Date

2020

Language

English

Content Metadata

Subject Area

Information Science

Genre

Conference/Talk

Abstract

In a large organization, corporate data is rarely stored in a single data source. Data is most often stored sparsely in distributed systems that communicate more or less well with each other. In this context, the integration of a new data source such as VIVO is sometimes perceived as a complexification of the infrastructure already in production, making it difficult or impossible to exchange data between the VIVO instance and the databases in use. Important and common obstacles to each new integration are encountered by organizations. A first problem is the conversion of data from a tabular format specific to relational databases to the RDF graph specific to the triplestore; and also, the updating (adding, modifying, deleting) of data through different data sources. In our work currently in progress, we plan to build a generalizable and adaptive solution to different organizational contexts. In this presentation we will present the architectural solution that we have designed and that we wish to implement in our institution. It is an architecture based on message processing of the data to be transferred. The architecture should make it possible to standardize the data transformation process and the synchronization of these data in the different databases. The target architecture considers the VIVO instance as a node in a network of data servers rather than considering a star architecture based on the principle that VIVO is the centre of data sources. In addition to presenting this distributed architecture based on Apache Kafka, the presentation will discuss the advantages and disadvantages of the solution.