VIVO-DataConnect: Towards an Architectural Model for Interconnecting Heterogeneous Data Sources to Populate the VIVO Triplestore
Formal Metadata
Number of Parts: 22
License: CC Attribution - NonCommercial 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI: 10.5446/48010
Transcript: English (auto-generated)
00:05
Welcome everyone to this presentation on the architecture for interconnecting VIVO with other data sources in an organization.
00:24
The objectives of the presentation are: to present the problem of data-exchange architecture with VIVO; to discuss an enterprise vision of a decentralized data architecture in which VIVO integrates into an existing data ecosystem; and to describe the architectural solution, based on the Apache Kafka messaging framework, and its benefits for data exchange.
00:49
Finally, we complete the architectural description with scenarios illustrating data exchange between VIVO and the data sources in its organizational context, using Kafka.
01:01
Through this presentation we hope to generate a debate around an architectural proposal that introduces new concepts in the way data is ingested into VIVO. When the choice is made to integrate VIVO within an institution, the question that quickly
01:21
arises is: how can data be exchanged between VIVO and the other data sources within the organization? A series of questions then emerges in the architect's mind, for example: How can I ensure the integrity and security of the data sources? How can VIVO itself become a data source for other databases?
01:42
Is there a solution that is extensible, scalable, modular, and reusable? How can data from a relational database be translated into an RDF graph? Can data be synchronized in real time with other data sources?
02:01
VIVO's current feeding process consists of harvesting data from various external data sources, whether a relational database, a CSV file, another triple store, or any other kind of source, and turning that data into a graph. This architecture puts VIVO at the center of the data-transfer mechanism. In an enterprise architecture, the emphasis is instead on the exchange mechanisms between data sources: enterprise
02:28
architecture aims to make the data sources, and their exchange mechanisms, equivalent to one another. Each source can harvest or be harvested in an equivalent way. There is therefore a kind of interoperability of connections between data sources that is independent of each source's internal architecture.
02:47
Introducing a messaging framework makes it easier to achieve this enterprise-architecture vision, along with the integration of a real-time data-stream synchronization mechanism.
03:01
Kafka is a real-time data-streaming system. It was invented at LinkedIn and is now an open-source project of the Apache Foundation. Kafka leverages the architectural principles of microservice development, which are highly appreciated in agile development methodologies.
03:25
Apache Kafka is a publish-subscribe messaging system. The message is Kafka's atomic data unit. By definition, it has no predefined data structure: it is a set of bytes that could, for example, carry ontology-based data serialized as JSON-LD.
03:40
The message is transmitted by a producer and received by a consumer. The topic is the channel through which the message is communicated. A note about the message structure: the use of JSON-LD elevates the enterprise architecture to an ontological data exchange, which would notably allow the design of intelligent producers and consumers able to adapt their behavior to the semantics of the exchanged data.
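To make the producer-topic-consumer vocabulary concrete, here is a minimal sketch of a producer publishing a JSON-LD message, assuming the kafka-python client and a broker at localhost:9092; the topic name and the payload vocabulary are purely illustrative.

```python
# Minimal sketch: publish a JSON-LD message to a Kafka topic.
# Assumes the kafka-python package and a broker at localhost:9092;
# the topic name "vivo.researcher-profile" is hypothetical.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    # Serialize the JSON-LD document to UTF-8 bytes (a Kafka message is just bytes).
    value_serializer=lambda doc: json.dumps(doc).encode("utf-8"),
)

# A JSON-LD payload describing a researcher-profile update (illustrative vocabulary).
message = {
    "@context": {"foaf": "http://xmlns.com/foaf/0.1/"},
    "@id": "http://example.org/researcher/42",
    "@type": "foaf:Person",
    "foaf:name": "Jane Doe",
}

producer.send("vivo.researcher-profile", value=message)
producer.flush()  # Block until the message is actually delivered to the broker.
```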
04:07
The topic is like a table in a relational database: it is the reference under which data is stored. Each topic can be divided into partitions to distribute the message traffic.
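A hedged sketch of how such a partitioned topic might be created, again assuming kafka-python, a local broker, and a hypothetical topic name:

```python
# Minimal sketch: create a topic with several partitions so message traffic
# can be spread across brokers and consumers. The topic name and partition
# count are illustrative assumptions.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([
    NewTopic(name="vivo.researcher-profile", num_partitions=3, replication_factor=1)
])
```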
04:23
Partitioning is the mechanism by which Kafka ensures its scalability. An important challenge of data exchange between data sources with heterogeneous architectures is the transition from an entity-relationship representation mode to a subject-predicate-object representation mode.
04:44
Several questions arise. How do we extract the semantics of the relational data? How do we process the content of a junction table? How do we identify what is a class, a property, a resource? How do we ensure a good mapping between all of these elements?
05:02
What are the transformation steps for translating an attribute value into its representation as an IRI? All of these questions must be answered in order to carry out an adequate data mapping from one system to another. There are several solutions proposed by the W3C.
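As a toy illustration of that mapping (not the W3C solutions themselves), the sketch below turns one row of a hypothetical professor table into subject-predicate-object triples with rdflib; the base IRI, column names, and vocabulary are all assumptions.

```python
# Minimal, illustrative sketch: map one relational row to RDF triples.
# Production mappings would typically use W3C R2RML or a virtual-graph tool.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, FOAF

EX = Namespace("http://example.org/")

# A row as it might come out of a relational "professor" table.
row = {"id": 42, "name": "Jane Doe", "department_id": 7}

g = Graph()
subject = EX[f"professor/{row['id']}"]              # primary key -> IRI (resource)
g.add((subject, RDF.type, FOAF.Person))             # table -> class
g.add((subject, FOAF.name, Literal(row["name"])))   # column -> property + literal value
g.add((subject, EX.memberOf, EX[f"department/{row['department_id']}"]))  # foreign key -> object property

print(g.serialize(format="turtle"))
```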
05:21
I would like to draw your attention to the concept of the Stardog Virtual Graph, which is, in my opinion, the most advanced solution in terms of mapping relational data to a graph. This diagram presents a hypothetical scenario implementing communication
05:41
mechanisms between several heterogeneous data sources using the Kafka framework. In the scenario, three data sources are linked: a VIVO RDF triple store, just right here; ORCID, just right here; and the relational database of professors.uqam.
06:01
With each data source, a VIVO researcher-profile topic is associated, to which the other data sources can subscribe through the publish-subscribe mechanism. Each data source has its own producer and consumer: you can see them just right here and right here, right here for ORCID and right here for professors.uqam.
06:24
Let's now analyze the mechanism that would be implemented in the case where a professor wishes to update his or her research profile via professors.uqam. So the professor is just here and wants to make an update here.
06:40
First, the update operation is serialized into a message by a producer, just right here. Second, the producer sends the message to the researcher-profile topic, just right here; this is the topic. Third, each consumer that subscribes to the topic receives the message to be processed.
07:02
So the message is received by this consumer here and by this consumer here. The received message is deserialized and stored in the host data source in its native format. At the end of the process, all data sources have a synchronized image of the update performed by the professor.
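The consumer side of this scenario could look like the following minimal sketch, assuming kafka-python, a local broker, and the same hypothetical topic name; the write into the host data source is only stubbed out with a print.

```python
# Minimal sketch of a consumer that keeps one data source in sync with
# researcher-profile updates. The group id and topic name are hypothetical,
# and the actual write into the host data source (triple store, ORCID,
# relational database) is only stubbed.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "vivo.researcher-profile",
    bootstrap_servers="localhost:9092",
    group_id="vivo-triplestore-sync",   # one consumer group per data source
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for record in consumer:
    doc = record.value                  # the JSON-LD profile update
    # Here the message would be translated into the host's native format
    # (for example, a SPARQL UPDATE for the VIVO triple store) and stored.
    print(f"Applying profile update for {doc.get('@id')}")
```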
07:27
Here is a second example scenario, in which we want to standardize, against the CRDC, the expertise of a professor in ORCID. The scenario starts with the VIVO-ORCID application
07:42
receiving a standardization message. ORCID sends the expertise-translation message to the CRDC via the Kafka topic of the VIVO-CRDC vocabulary. So you have the full Kafka chain here: producer, topic, consumer, and the application receives the message.
08:01
The VIVO-CRDC vocabulary application translates the expertise contained in the message into the CRDC standardized expertise format, using semantic AI to perform the matching. Then it transmits an expertise-conversion message to the ORCID CRDC-expertise consumer via the VIVO-CRDC vocabulary Kafka topic.
08:25
So once it has performed the matching, it passes the result to the producer. The producer serializes it into a message and sends it to the topic, and the consumer receives the message.
08:40
When the consumer receives the message, it transmits it to the application that updates the ORCID database via the REST API. So the application is making this update just right here. Other expertise consumers that subscribe to the CRDC topic are also updated with this new standardization. So if you have another consumer here connected to another database, it will receive the message and update its database.
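A hedged sketch of that consume-translate-produce pattern follows; the topic names are hypothetical and a simple dictionary lookup stands in for the semantic-AI matching described above.

```python
# Minimal sketch of the translation service in scenario 2: consume an expertise
# message from one topic, "standardize" it, and publish the result to another.
# Topic names are hypothetical; the dictionary lookup is a stand-in for the
# semantic-AI matching.
import json
from kafka import KafkaConsumer, KafkaProducer

CRDC_LOOKUP = {"machine learning": "CRDC-50206"}    # illustrative mapping only

consumer = KafkaConsumer(
    "orcid.expertise-to-standardize",
    bootstrap_servers="localhost:9092",
    group_id="vivo-crdc-vocabulary",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda doc: json.dumps(doc).encode("utf-8"),
)

for record in consumer:
    expertise = record.value            # e.g. {"researcher": "...", "term": "machine learning"}
    code = CRDC_LOOKUP.get(expertise["term"].lower(), "CRDC-UNMATCHED")
    producer.send("crdc.standardized-expertise", value={**expertise, "crdc_code": code})
```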
09:15
VIVO-DataConnect: here is a summary of the main features of VIVO-DataConnect.
09:22
The first one is modularity: publish-subscribe enables secure control of the message flow between data sources. Interoperability: the serialization and deserialization performed by producers and consumers ensure data transport regardless of the nature of the data notation used by the data source.
09:40
Reusability: the producer and consumer implementations can be reused for other applications. Security and data protection: to subscribe to a topic, a consumer and/or producer must have the necessary authorizations. Scalability: the architecture facilitates the integration, addition, and interconnection of
10:03
new data sources while providing load-balancing capabilities through the physical architecture. Innovation: the use of ontology-based messages, as well as the design of semantic AI services, is certainly an innovation to be highlighted. In conclusion, the transition toward a decentralized, distributed architecture is
10:29
facilitated by the integration of a messaging framework such as Kafka. The ontology-based message structure broadens the scope of use of semantic technology beyond the VIVO triple store.
10:44
In particular, it introduces the ability to include semantic AI in producers and consumers. The publish-subscribe mechanism and the consumer-producer architecture provide a deployment and component architecture that is secure, evolvable, extensible, scalable, modular, interoperable, and reusable.
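On the security point, a producer or consumer typically has to authenticate to the broker before it can publish or subscribe; a minimal sketch with kafka-python follows, in which the security mechanism, host, credentials, and certificate path are all placeholder assumptions.

```python
# Minimal sketch: a producer that must authenticate before it can publish.
# Assumes a broker configured for SASL_SSL with SASL/PLAIN; host, credentials,
# certificate path, and topic name are placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker.example.org:9093",
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="vivo-producer",
    sasl_plain_password="change-me",
    ssl_cafile="/etc/kafka/ca.pem",
    value_serializer=lambda doc: json.dumps(doc).encode("utf-8"),
)
# Only principals with the proper ACLs on the topic can actually publish to it.
producer.send("vivo.researcher-profile", value={"@id": "http://example.org/researcher/42"})
producer.flush()
```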
11:05
Finally, VIVO-DataConnect does not aim to replace VIVO's current population mechanisms. Rather, the vision is to enrich them with an add-on that elevates VIVO to an enterprise-level data source. Thank you for your attention.