Integrating Distributed Data Sources in VIVO via Lookup Services
Formal Metadata

Title: Integrating Distributed Data Sources in VIVO via Lookup Services
Number of Parts: 15
License: CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/47585 (DOI)
Transcript: English (auto-generated)
00:01
Well, good afternoon, and welcome to the last talk. Thank you for still being here, for staying to the end of this conference. We appreciate this very much. I am Tatiana Walther, and this is Martin Barber. We are from the German National Library of Science and Technology, TIB.
00:22
And the topic we are going to address today is integrating data from distributed sources via lookup services. It is very similar to the previous topic, but our work focuses on VIVO and lookup services in VIVO.
00:44
So let's take a short look at the agenda of this talk. I will begin with some key aspects of the topic, and then, as our work concerns lookup services in VIVO, I will give you the current state of those lookup services.
01:04
And I am going to introduce our work with its motivation, its goal, and the approaches we have applied to implement some additional sources in VIVO. Martin is then going to continue by describing Skosmos and our technical implementations.
01:24
This will be followed by demos of the implemented sources, and we will conclude our presentation with challenges and future plans. Well, when conceptualizing the integration of data
01:43
from distributed services, some key aspects should be taken into account: for example, the purposes, the types of data, the formats in which the data is available on the web, and also the feasible ways of data integration.
02:01
Well, for what purposes do we need data from external sources? We reuse this data for subject cataloging, for annotating our data, and for enriching our data with further information through linking to referenced objects
02:20
in external authorities. When we speak about the data, we can distinguish between, on the one hand, subjects from thesauri and controlled vocabularies, and on the other hand, speaking in terms of the Semantic Web, things like persons, organizations, events, places, and so on.
02:42
And this data can be available in non-machine-readable form, like PDF, Excel, or TXT, or in machine-readable form, like RDF, SKOS, XML, and so on. And we can integrate this data by, for example,
03:05
importing the whole data dump into our system and using it, or we can provide direct access to external sources via lookup services. So, you have already heard a lot about lookup services
03:24
in the previous talk; thank you, Huda and Lynette, for your inspiring talk. Our work concerns lookup services in VIVO, as I have already mentioned. I assume that many of you in this audience
03:41
might be familiar with VIVO already, because it has been a recurring topic at SWIB in recent years. But for those who don't know VIVO, just a brief description: it is a community-supported open-source software
04:00
for representing scholarly activities. It is based on linked data technologies, and it provides a number of external vocabulary sources. The external vocabulary services currently available in VIVO
04:21
are the agricultural vocabulary AGROVOC, the General Multilingual Environmental Thesaurus (GEMET), the Library of Congress Subject Headings, and the Unified Medical Language System. All of these vocabulary services use different APIs in the background.
04:41
For example, AGROVOC is hosted on a Skosmos server, and VIVO communicates with this Skosmos server via its REST API. In the course of this presentation, I will show you how the lookup services in VIVO work.
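As an illustration of this kind of lookup, here is a minimal sketch of querying a Skosmos REST API from Java. The base URL and vocabulary ID are assumptions for illustration, not necessarily the exact values VIVO uses.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class SkosmosLookup {
    public static void main(String[] args) throws Exception {
        // Skosmos exposes a concept search under /rest/v1/search.
        // Base URL and vocabulary ID here are illustrative assumptions.
        String base = "https://agrovoc.fao.org/browse/rest/v1/search";
        String query = URLEncoder.encode("maize", StandardCharsets.UTF_8);
        URI uri = URI.create(base + "?vocab=agrovoc&lang=en&query=" + query);

        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();

        // The response body is JSON carrying the matching concepts.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```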
05:04
Now to our motivation. There are a number of projects in which we have detected a strong need for more external sources in VIVO. One of these projects I would like to mention is the use of the Research Core Dataset in VIVO.
05:23
This dataset is a German national standard, aimed at harmonizing and standardizing reporting in German academic and research institutions. This standard determines a set
05:40
of core elements for research reporting, and it recommends the subject classification of the German Federal Office of Statistics for subject-specific annotation. This classification is, so to say, a special case,
06:02
which I am going to show you later. And apart from the Research Core Dataset, there is an increasing need for a wider range of concepts
06:24
in VIVO for precise data annotation. One more strong motivation is to integrate non-SKOS data items, as I have already mentioned: organizations, events, places, languages, and so on,
06:44
on demand, via lookup services. Therefore, the goal of our work is to extend the scope of the external vocabularies and sources in VIVO. These services are intended to be used by the TIB staff,
07:02
by the German users of the Research Core Dataset in VIVO, and also by other interested parties. So now to the approaches we have applied. According to the different types and formats of data
07:22
and the different types of access to the data, we can basically distinguish between two approaches. The first one concerns SKOS-based vocabularies, and the second one other data in RDF, that is, data items which are not SKOS concepts.
07:44
The first approach can, in turn, be divided into two cases. The first case concerns vocabularies like the subject classification of the German Federal Office of Statistics, which was initially only available in non-machine-readable form.
08:02
For this case, we have chosen Skosmos, because we can publish the classification on a Skosmos server and make it browsable for normal users, for non-developers and domain experts.
08:24
And on the other hand, we can provide a connection between VIVO and the Skosmos server. For that purpose, we had to transform this classification into a SKOS concept scheme
08:41
by means of Skosify. Thank you, Osma, and your team for this great tool; it worked immediately and very smoothly. The second case concerns subject authorities which are already available in SKOS format on the web
09:02
and provide their own public API, which we can use for the lookups in VIVO. In this case, we have implemented the lookup service for the Standard Thesaurus for Economics (STW), provided by our host, the ZBW.
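To make the target format concrete, here is a minimal sketch, using the Apache Jena API already present in this stack, of the kind of SKOS structure a Skosify conversion produces. The namespace, URIs, and labels are invented for illustration.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.SKOS;

public class SkosSchemeSketch {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();

        // Hypothetical namespace for the converted classification.
        String ns = "http://example.org/faechersystematik/";

        // A concept scheme groups all concepts of the vocabulary.
        Resource scheme = model.createResource(ns + "scheme")
                .addProperty(RDF.type, SKOS.ConceptScheme);

        // One concept with a preferred label and scheme membership.
        model.createResource(ns + "bioinformatics")
                .addProperty(RDF.type, SKOS.Concept)
                .addProperty(SKOS.prefLabel,
                        model.createLiteral("Bioinformatics", "en"))
                .addProperty(SKOS.inScheme, scheme);

        model.write(System.out, "TURTLE");
    }
}
```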
09:28
Well, and now I hand over to Martin, who is going to tell you about Skosmos and show you our technical implementations.
09:40
Yeah. First, a few words on why we are using Skosmos. Skosmos is a web publishing tool for vocabularies in the SKOS format, and it has been developed by the National Library of Finland. Another reason why we decided to use Skosmos is that it is open source and has a GitHub community
10:06
where we are able to report bugs or feature requests. And as already heard in the previous presentation, Skosmos provides a REST API. Other useful features of Skosmos are,
10:23
as Tatiana already said, that the user is able to browse the vocabulary in a web interface and use different ordering systems, like an alphabetical index, and so on. And yeah, it is quite easy to set up a Skosmos server supporting different languages.
10:45
In the next slides, I will show you a small demo of our Skosmos installation. Yeah, well, that comes later. The technical implementation of Skosmos is mainly done in PHP and JavaScript.
11:00
It also uses other open-source libraries and plug-ins, for example jQuery, Bootstrap, and Composer. The requirements to run such a Skosmos server are a PHP-capable web server, and it makes sense to have at least one vocabulary in SKOS.
11:23
And you need a triple store; in our case, we use the Apache Jena Fuseki server. What is important is that the Skosmos server and the Apache Jena Fuseki server are configured with matching Jena models.
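As a rough sketch of this setup, the converted SKOS file can be loaded into a Fuseki dataset over HTTP with Jena's RDFConnection; the dataset URL and file name below are assumptions.

```java
import org.apache.jena.rdfconnection.RDFConnection;

public class LoadVocabulary {
    public static void main(String[] args) {
        // Hypothetical Fuseki dataset endpoint; Skosmos must be
        // configured to query the same dataset (matching models).
        String dataset = "http://localhost:3030/skosmos";

        try (RDFConnection conn = RDFConnection.connect(dataset)) {
            // Load the Skosify output into the default graph.
            conn.load("faechersystematik.ttl");
        }
    }
}
```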
11:43
As mentioned before, there is a REST API through which you can get data from the Skosmos server, and the data is accessible as linked data. Here, you can see the technical implementation of our VIVO installation and our Skosmos.
12:04
The vocabulary is the Fächersystematik Tatiana already mentioned. We started by including a lookup service in VIVO that queries the Fuseki server, and this lookup service uses the SPARQL endpoint
12:23
of the Fuseki server. After a bit of trouble getting the Skosmos server up and running, we then implemented a lookup service using the REST API of the Skosmos server.
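A minimal sketch of the first variant, a label search against the Fuseki SPARQL endpoint using Jena; the endpoint URL and the exact query shape are assumptions.

```java
import org.apache.jena.query.ParameterizedSparqlString;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;

public class SparqlLookup {
    public static void main(String[] args) {
        // Hypothetical SPARQL endpoint of the Fuseki server.
        String endpoint = "http://localhost:3030/skosmos/sparql";

        // Find concepts whose preferred label contains the search term.
        ParameterizedSparqlString pss = new ParameterizedSparqlString(
                "PREFIX skos: <http://www.w3.org/2004/02/skos/core#> " +
                "SELECT ?concept ?label WHERE { " +
                "  ?concept skos:prefLabel ?label . " +
                "  FILTER(CONTAINS(LCASE(STR(?label)), LCASE(?term))) " +
                "}");
        pss.setLiteral("term", "bioinformatics");

        try (QueryExecution qe = QueryExecutionFactory
                .sparqlService(endpoint, pss.asQuery())) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                System.out.println(results.next());
            }
        }
    }
}
```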
12:40
And then, if you request some information, the Skosmos server answers with a JSON response. The symbol on the right-hand side stands for a normal user interacting with our Skosmos server in a browser. Yes, as you can see here, and as was also shown
13:02
in the presentation before, we have our query URL, our search term bioinformatics, and our vocabulary, the Fächersystematik. And below that, you can see the JSON response.
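To show what consuming such a response looks like, here is a small sketch that parses an abbreviated search result with Jackson. The JSON is a trimmed, hypothetical example; real Skosmos responses carry additional fields such as a JSON-LD @context.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ParseSearchResponse {
    public static void main(String[] args) throws Exception {
        // Abbreviated, invented example of a Skosmos search response.
        String json = "{\"results\":[{"
                + "\"uri\":\"http://example.org/faechersystematik/bioinformatics\","
                + "\"prefLabel\":\"Bioinformatics\",\"lang\":\"en\"}]}";

        JsonNode root = new ObjectMapper().readTree(json);
        for (JsonNode hit : root.get("results")) {
            // Each hit pairs a concept URI with its matching label.
            System.out.println(hit.get("prefLabel").asText()
                    + " -> " + hit.get("uri").asText());
        }
    }
}
```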
13:22
Yes, the next slide is a moving image, a GIF, in which you will see our Skosmos installation. I will say something about the features Skosmos already includes. On the left-hand side, you can see the alphabetical index
13:43
and the hierarchy. Right now, we are looking for bioinformatics, which is shown there. We are also able to see which are the broader and narrower concepts of informatics.
14:05
And here, you can see the general information about our vocabulary dataset, where it comes from, and so on. So yeah, Tatiana will now show you how this lookup service is included in our VIVO installation.
14:21
Thank you. Right now, you can see a screenshot of my test profile in a test VIVO. If, for example, I would like to add one more research area, I go to this icon and click Edit.
14:41
Then I am in the Manage Concepts menu, where I can add one more concept. And here, in comparison to one of the first screenshots, where I showed you the currently available lookup services in VIVO, you can see that there are many more.
15:01
I select here the subject classification, which I have already mentioned, type the search term bioinformatics, and send the request. So here, we can see the response we get.
15:25
It appears to be the best match; I think there is only one concept for bioinformatics in this classification. And then the concept is added to the list.
15:40
It gets its own profile page, and we can see that it has preserved the URI from the source. Normally, broader and narrower concepts can also be shown, but this was not included in the query. So that's it.
16:05
So yes, here you can see the technical implementation of our VIVO installation using an external source. And here, you can see how the VIVO installation uses the Thesaurus for Economics from the Leibniz Information Centre
16:20
for Economics, the ZBW. VIVO sends the SPARQL request to the Fuseki server, and then the JSON response comes back. Yeah, here are some of the Java classes which were already implemented in VIVO. What we have done is refactor them a bit
16:42
and change, of course, the target URL. Yes, as Tatiana showed before, you will now see an animation, this time using a different lookup service and a different search
17:01
term. And the new concept is added to VIVO.
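The Java classes mentioned above follow a common pattern: take a search term, query an external endpoint, and map the hits to concept objects. The sketch below is a simplified, hypothetical version of that pattern; the interface and class names are illustrative, not VIVO's actual API.

```java
import java.util.List;

// Hypothetical, simplified version of the lookup-service pattern;
// the actual classes in the VIVO code base differ in their details.
interface ConceptLookupService {
    List<Concept> lookup(String searchTerm) throws Exception;
}

// Minimal concept holder: URI, label, and source vocabulary.
record Concept(String uri, String label, String source) {}

// Supporting a new external source mostly means changing the target
// URL and the response mapping, i.e. the refactoring described above.
class StwLookupService implements ConceptLookupService {
    // Illustrative endpoint; ZBW provides public endpoints for the STW.
    private static final String ENDPOINT =
            "https://zbw.eu/beta/sparql/stw/query";

    @Override
    public List<Concept> lookup(String searchTerm) throws Exception {
        // Query ENDPOINT for matching labels (see the SPARQL sketch
        // earlier) and map each binding to a Concept; omitted here.
        return List.of(new Concept(
                "http://zbw.eu/stw/descriptor/example",
                searchTerm, "STW"));
    }
}

public class LookupSketch {
    public static void main(String[] args) throws Exception {
        ConceptLookupService service = new StwLookupService();
        service.lookup("bioinformatics").forEach(System.out::println);
    }
}
```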
17:27
So, to the challenges we are facing right now; something was said about this in the previous presentation as well. One of the challenges is to keep the data in our VIVO installation up to date.
17:43
There are two different ideas we are looking to implement. One of the ideas is to have a cron job which is executed nightly, weekly, or monthly. The problem with such a cron job is that the external sources would be hit quite often
18:04
and could also go offline because of too much load. The preferred idea, which I am looking forward to implementing, is to include an update function which is executed when a specific web page in VIVO is loaded.
18:25
So when somebody is interacting with the VIVO interface, the data from the external lookup services is always up to date, and we have an up-to-date VIVO model.
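As a rough sketch of the cron-style variant under discussion, a scheduled task can refresh cached concepts at a low frequency to avoid stressing the external sources. All names and the cache structure here are hypothetical.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ConceptRefresher {
    // Cache of concept URI -> last known label (hypothetical structure).
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start() {
        // A weekly refresh keeps the load on the external source low;
        // nightly or monthly would just change the period below.
        scheduler.scheduleAtFixedRate(this::refreshAll,
                0, Duration.ofDays(7).toHours(), TimeUnit.HOURS);
    }

    private void refreshAll() {
        // Re-resolve each cached label from the external source;
        // the actual lookup call is omitted here for brevity.
        cache.replaceAll((uri, label) -> label);
    }
}
```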
18:45
The future plans include the integration of data in RDF from external authorities, so data entities like organizations, events, places, and languages, as Tatiana mentioned already,
19:04
and possible sources are Wikidata, the GND, or an open persistent institutional identifier registry; we also heard about that in an earlier presentation. Yeah. Another important future plan
19:20
is to implement a standardized, dynamic integration of external sources in VIVO via the admin side. At the moment, the existing lookup classes in VIVO were added over time by different committers, and they work similarly, but differently.
19:42
So the plan is to include a dynamic integration of an external source, so that the user, the administrator, is able to add a new lookup service to the VIVO installation without restarting or rebuilding
20:01
the VIVO software. So here we can see such an interface. The user is logged in as an administrator and is able to see, in the table down here, which lookup services are already included
20:21
and what the status of each lookup service is. He is also able to add a new source. When adding a new source, he has to provide some basic information, such as the target URL of the REST API or the SPARQL endpoint, and other things
20:43
like general information, so that users can inform themselves when using such a lookup service. Yes, now we are on the user side of VIVO. As you can see, if I, as an administrator,
21:02
have added a new lookup service to the VIVO installation, it shows up here. I am able to select different kinds of formats, if the REST API or the SPARQL endpoint offers them. Yes, one of the things which should be discussed
21:25
is whether the user should be able to write their own query, because a malicious user could write queries that stress the external lookup service quite badly. A minimal sketch of this registration and query-guard idea follows below.
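A minimal sketch of the dynamic-registration idea under stated assumptions: lookup services are plain configuration entries held in a runtime registry, so adding one needs no rebuild, and a simple validation hook guards against abusive user-defined queries. All names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LookupServiceRegistry {
    // Hypothetical configuration entry matching the admin form described
    // above: a label, the target URL, and the source type (REST/SPARQL).
    public record ServiceConfig(String label, String targetUrl, String type) {}

    private final Map<String, ServiceConfig> services =
            new ConcurrentHashMap<>();

    // Called from the admin page; takes effect immediately,
    // without restarting or rebuilding VIVO.
    public void register(String id, ServiceConfig config) {
        services.put(id, config);
    }

    // Naive guard against abusive user-defined queries; a real
    // implementation would need a proper query parser and limits.
    public boolean isQueryAllowed(String query) {
        String q = query.toUpperCase();
        return !q.contains("DELETE") && !q.contains("INSERT")
                && q.length() < 2000;
    }

    public ServiceConfig get(String id) {
        return services.get(id);
    }
}
```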
21:44
Another thing we are looking forward to integrating is this Run Query button to execute the query, but also letting the user see where the query is going, as you see down here. The idea behind that is that the user
22:02
should be able to figure out whether there is an error in the query, or whether there is an error in general with the lookup service or the URL the lookup service is targeting. Yes, thank you very much.
22:21
If you have any questions or ideas, you're welcome. All right, thank you for that great presentation. We are two minutes over, but that's OK. Does anyone have a pressing question for these folks?
22:42
Any questions, comments, or feedback? Other than that it was fantastic? All right. Just a quick comment. I just want to say that this is really good work, and I want to see the mock-ups somewhere soon
23:00
so that I can see them, and also, contributing back to the main VIVO code on this would benefit everyone, because the issues you've discussed, especially the dynamic integration part, everybody wants a piece of that. So thank you. Thank you, Huda. I hope we keep in touch, also because of the integration
23:21
of the data items, the non-SKOS data. All right. Well, thank you very much. Thank you again. Thank you.