VIVO ETL using open source tools

Technische Informationsbibliothek (TIB)

Conlon, Mike

Formal Metadata

Title

VIVO ETL using open source tools

Title of Series

12th International VIVO Conference, June 23 - 25, 2021

Author

Conlon, Mike

Contributors

Technische Informationsbibliothek (TIB)

License

CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/51872 (DOI)

Publisher

Technische Informationsbibliothek (TIB)

Release Date

2021

Language

English

Content Metadata

Subject Area

Information Science

Genre

Conference/Talk

Abstract

Loading data into VIVO requires creating triples that conform to the VIVO ontologies. Data may come from a variety of sources and in a variety of formats. vivo-etl (https://github.com/mconlon17/vivo-etl) is a simple open source command-line pipeline that uses available open source tools to extract data from a source, transform it to VIVO triples, and load the triples into a VIVO TDB data store. The method extracts data from an API using wget, transforms the resulting CSV or JSON data to "raw" RDF, and then transforms the "raw" RDF to VIVO RDF with a SPARQL CONSTRUCT query executed from the command line using robot, an open source tool (http://robot.obolibrary.org/). The VIVO triples can then be loaded using tdbloader. The method can be used to transform data from any source (CERIF, PubMed, Dimensions, local repositories) to the current VIVO ontologies, or to ontologies under development by the VIVO Ontology Interest Group. The presentation includes a demonstration that gathers data from ROR (Research Organization Registry) and provides the data as VIVO triples.
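
As an illustration of the "raw" RDF step only, the following is a minimal sketch of how flat JSON records might be turned into N-Triples before any ontology mapping is applied. It is not taken from vivo-etl: the namespace `http://example.org/raw/`, the function names, and the sample record are all hypothetical, and the sketch assumes each record carries its subject IRI under an `id` key and that field names are simple enough to be used as IRI local names.

```python
import json
from typing import List

# Hypothetical namespace for "raw" predicates; an assumption, not from vivo-etl.
RAW_NS = "http://example.org/raw/"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_raw_triples(record: dict, id_key: str = "id") -> List[str]:
    """Turn one flat JSON record into a list of N-Triples lines.

    The value under id_key is used as the subject IRI. Every other
    scalar field becomes one triple whose predicate is RAW_NS + field name
    and whose object is a plain string literal.
    """
    subject = f"<{record[id_key]}>"
    triples = []
    for key, value in record.items():
        if key == id_key or value is None or isinstance(value, (dict, list)):
            continue  # nested and empty values are skipped in this sketch
        literal = escape_literal(str(value))
        triples.append(f'{subject} <{RAW_NS}{key}> "{literal}" .')
    return triples


def json_to_raw_ntriples(text: str) -> str:
    """Convert a JSON array of records into an N-Triples document."""
    lines: List[str] = []
    for record in json.loads(text):
        lines.extend(record_to_raw_triples(record))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Made-up sample record, not real ROR data.
    sample = json.dumps(
        [{"id": "https://example.org/org/1", "name": "Example University"}]
    )
    print(json_to_raw_ntriples(sample), end="")
```

The output deliberately uses source-shaped predicates rather than VIVO terms, so that the mapping to the VIVO ontologies can be expressed entirely in the SPARQL CONSTRUCT step the abstract describes.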