
LINKING AUTHORITIES - Engaging information professionals in the process of authoritative interlinking


Formal Metadata

Title
LINKING AUTHORITIES - Engaging information professionals in the process of authoritative interlinking
Series Title
Number of Parts
16
Author
Contributors
License
CC Attribution - ShareAlike 4.0 International:
You may use, change and reproduce the work or its content in unchanged or changed form for any legal purpose, and may distribute it and make it publicly accessible, provided that you credit the author/rights holder in the manner specified by them and that you pass on the work or its content, including in changed form, only under the terms of this license.
Identifiers
Publisher
Publication Year
Language
Production Place: Bonn, Germany

Content Metadata

Subject Area
Genre
Abstract
Through the use of Linked Data (LD), Libraries, Archives and Museums (LAMs) have the potential to expose their collections to a larger audience and to allow for more efficient user searches. Despite this, relatively few LAMs have invested in LD projects and the majority of these display limited interlinking across datasets and institutions. A survey was conducted to understand Information Professionals' (IPs') position with regard to LD, with a particular focus on the interlinking problem. The survey was completed by 185 librarians, archivists, metadata cataloguers and researchers. Results indicated that, when interlinking, IPs find the process of ontology and property selection to be particularly challenging, and LD tooling to be technologically complex and unsuitable for their needs. Our research is focused on developing an authoritative interlinking framework for LAMs with a view to increasing IP engagement in the linking process. Our framework will provide a set of standards to facilitate IPs in the selection of link types, specifically when linking local resources to authorities. The framework will include guidelines for authority, ontology and property selection, and for adding provenance data. A user interface will be developed which will direct IPs through the resource interlinking process as per our framework. Although there are existing tools in this domain, our framework differs in that it will be designed with the needs and expertise of IPs in mind. This will be achieved by involving IPs in the design and evaluation of the framework. A mock-up of the interface has already been tested and adjustments have been made based on results. We are currently working on developing a minimum viable product so as to allow for further testing of the framework. We will present our updated framework, interface, and proposed interlinking solutions.
Transcript: English (automatically generated)
I like how this is what we are all used to doing: establishing links, in this case in our professional network, by means of technology. Sometimes it is hard, and we just keep on doing it and trying. Well, our next talk, the first talk in this session, is going to be given by Lucy McKenna,
who is working at the ADAPT Centre in Trinity College Dublin. As opposed to Mia Rich's keynote, where she was including external expertise from specialists,
Lucy is working on including, or improving the inclusion of, internal specialists, so I'm looking forward to hearing how that's going. Thank you, sorry.
So as mentioned, my name is Lucy and I'm part of the ADAPT Centre in Trinity College Dublin, where I'm currently trying to complete my PhD. At the beginning of my PhD, I started looking at current library linked data projects.
There has obviously been an increase in the number of libraries implementing linked data, but I found that these are mostly large institutions and organizations, possibly because they have access to the technical and financial resources needed.
Also, few of these implementations use an abundance of datasets: they are often single-institution initiatives with limited interlinking across datasets, and the interlinking is mostly done to large authorities and controlled vocabularies. So I decided to investigate this further,
exploring information professionals' knowledge and use of linked data, exploring the challenges that they experience with linked data, particularly in the area of interlinking (why isn't there more interlinking with smaller institutions, and why aren't smaller institutions producing linked data?),
and also exploring how these challenges could potentially be overcome. In order to investigate this, I developed an online questionnaire and surveyed 185 information professionals,
for the most part across Europe and the States. They were primarily information professionals from the library domain, and the majority had prior knowledge of the semantic web and linked data, although this wasn't a requirement for the survey.
So, the key findings. The benefits were very similar to those already mentioned in the literature: improved data discoverability and accessibility; cross-institutional linking and integration, which provides additional context for data interpretation; and the fact that using linked data can enrich metadata and improve authority control.
However, the challenges we found fell mainly into three areas. First, resource issues, such as issues with dataset availability and quality, issues with provenance data availability and quality, a lack of guidelines and use cases that could help institutions start a linked data project
and figure out where to go, issues with funding and training, and issues with creating and maintaining URIs. There were also issues around linked data tooling. These included usability issues: the librarians found the tools quite difficult
to learn and to apply to what they needed to do with their data, and the tools were unsuitable for the needs of libraries, archives and museums. The software was at times quite immature, technologically complex and, again, difficult to learn. Finally, there were issues with interlinking and integration.
These were for the most part due to the difficulty of selecting an appropriate ontology, the difficulty of selecting the appropriate link type, or predicate, when interlinking two resources, data reconciliation, and vocabulary mapping.
The survey also tried to identify a potential solution to these challenges by asking the information professionals how they would perceive the usefulness of a linked data tool specifically designed for their needs, and 89% of the participants
rated such a tool as useful. They felt that it would reduce the technical knowledge gap and encourage increased numbers of LAMs to use linked data. They specified a number of requirements for such a tool: it should be attuned and adaptable to LAM workflows;
it should hide the more complex linked data technicalities, especially functionality that is not strictly necessary; and it should be aware of common library data sources and also provide information on the data quality of the source being used.
We also looked at which measures the participants found most important when assessing data quality. The primary ones they mentioned were data trustworthiness, the interoperability of the data,
whether licensing information was provided, the completeness of the dataset, understandability, provenance, and the timeliness or currency of the data. With this, I decided to focus my research on how information professionals can be facilitated
to engage with the process of authoritative interlinking with greater efficacy, ease and efficiency. So what is authoritative interlinking? We know that interlinking is the creation of a link between two linked data resources; what makes it authoritative is that the interlink is known
to be reliable and trustworthy. That means providing provenance data about how the interlink was created, why it was created, and information about the interlinking process itself. And of course, why involve information professionals in this process? Well, as we know, they are experts in metadata creation, knowledge discovery and authority control,
so they are well placed to play a role in the semantic web in terms of interlinking. I then looked at some of the current interlinking frameworks. The majority require technical knowledge of linked data, and they primarily support owl:sameAs links. Two tools I found, RDF Refine and Marimba,
were aimed at the library domain in particular, and they provided access to large-scale datasets. So, looking at what is already out there, I came up with a few additional requirements: to provide information professionals with the ability to create more than just owl:sameAs links
to describe a relationship between two resources, to be able to interlink with datasets coming from smaller authoritative institutions and not just these large-scale datasets, and to remove the need for expert technical or linked data knowledge.
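As a rough illustration of what "more than just owl:sameAs" could look like in practice, the sketch below maps plain-language relationship descriptions to a few well-known link predicates. The relationship labels and the particular predicates chosen are illustrative assumptions on my part, not the framework's actual recommendation logic.

```python
# Hypothetical mapping from plain-language relationship descriptions to
# candidate link predicates; labels and choices are illustrative only.
CANDIDATE_PREDICATES = {
    "exactly the same real-world entity": [
        "http://www.w3.org/2002/07/owl#sameAs",
        "http://www.w3.org/2004/02/skos/core#exactMatch",
    ],
    "closely related, but not identical": [
        "http://www.w3.org/2004/02/skos/core#closeMatch",
        "http://purl.org/dc/terms/relation",
    ],
    "provides further information about the resource": [
        "http://www.w3.org/2000/01/rdf-schema#seeAlso",
    ],
}

def suggest_predicates(description: str) -> list[str]:
    """Return candidate link predicates for a human-readable relationship description."""
    return CANDIDATE_PREDICATES.get(description, [])

print(suggest_predicates("closely related, but not identical"))
```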
So the aims of my research were to develop an authoritative interlinking framework specifically designed with the workflows and expertise of information professionals in mind, to develop a provenance model that expresses the required provenance of interlinks created by information professionals, and to design an interlinking interface
for information professionals that would guide users through the interlinking process, including ontology and link type selection and provenance data generation. I applied a design science methodology to creating this framework, which basically involves iterative cycles
of design and testing, and I involved information professionals at all phases, so both the design and the testing phases. I'm currently in phase two, still developing the prototype and evaluating it.
This diagram shows the framework that I developed. It's called NAISC, which stands for Novel Authoritative Interlinking of Schema and Concepts, and it's also the Gaelic word for links. Where NAISC fits into
a linked data application is as part of the identity resolution stage: this is the interlinking phase, providing interlinking abilities and interlink provenance. The NAISC framework has four primary steps
that are repeated. The first step is resource selection: searching an internal dataset and an external dataset for the two resources that you want to interlink, and validating the URIs of the resources that you've selected.
You then move on to the interlinking process, which is determining the relationship between the resources. We hope to make this process somewhat semi-automatic, depending on the kind of information a user enters about the resources they have chosen, by providing options for the user
to select that would represent the relationship. The tool would recommend a predicate based on that information, and the user could then select the most appropriate ontology and predicate from the recommendations. The user would then be able
to enter the provenance information for the interlink they have created, and we would publish the interlink RDF and the provenance RDF and also generate graphs for these. Again, this would be done as a repeated cycle as the user modifies or adds to their interlinks.
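To make the four-step cycle just described a little more concrete, here is a minimal Python sketch of what one pass through it could look like. All function names, variable names and example URIs are hypothetical; this is not code from the actual tool.

```python
from rdflib import Graph, URIRef

def create_interlink(internal_uri: str, external_uri: str, predicate: str) -> Graph:
    """Step 3 of the cycle: assert one interlink triple between two resources."""
    g = Graph()
    g.add((URIRef(internal_uri), URIRef(predicate), URIRef(external_uri)))
    return g

# Steps 1-2: the resources would normally come from searching an internal and an
# external dataset and validating their URIs (see the validation sketch further on).
internal = "http://example.org/local/person/123"        # illustrative URI
external = "http://example.org/authority/person/456"    # illustrative URI

# Step 3: relationship selection, here a skos:exactMatch chosen from recommendations.
link_graph = create_interlink(
    internal, external, "http://www.w3.org/2004/02/skos/core#exactMatch"
)

# Step 4: provenance about who made the link and why would be recorded in a
# separate graph (see the provenance sketch later) before both are published.
print(link_graph.serialize(format="turtle"))
```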
Just discussing the interface: it allows a user to search an internal RDF dataset using a semantic faceted search tool, a SPARQL endpoint or a web resource, and they can then enter and validate the resource URI.
To select a related resource, there are a number of authorities, thesauri and controlled vocabularies that the user can choose from, and they can also add other linked datasets from other institutions. Ideally, in the future,
they would be able to assess the data quality of the datasets they have added, rating their trustworthiness, interoperability, licensing, et cetera, along the criteria mentioned before, and then be able to make an informed decision as to whether to interlink with that resource
or whether another resource is preferred. Again, the user is able to enter and validate the URI of the resource that they want to interlink with, which is the external resource. In the interlinking phase, the user then selects a predicate that describes the relationship between the resources.
In future iterations, we hope to make this a little more automatic, whereby the descriptions of the resources that the user wants to interlink are used to inform a recommender system that recommends a potential link and tries to find what that relationship might be,
for users who are less familiar with ontologies and predicates.
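The interface asks the user to enter and validate a resource URI at two points, for the internal and for the external resource. One plausible way to implement such a check, sketched below under the assumption that "validation" means the URI is well-formed and dereferences to an RDF serialisation, is plain HTTP content negotiation; the actual tool's validation logic may differ.

```python
import requests
from urllib.parse import urlparse

# Media types accepted here as evidence that the URI serves RDF (an assumption).
RDF_MEDIA_TYPES = (
    "text/turtle", "application/rdf+xml",
    "application/ld+json", "application/n-triples",
)

def validate_resource_uri(uri: str) -> bool:
    """Return True if the URI is a well-formed HTTP(S) URI that serves RDF."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    try:
        resp = requests.head(
            uri, allow_redirects=True, timeout=10,
            headers={"Accept": ", ".join(RDF_MEDIA_TYPES)},
        )
    except requests.RequestException:
        return False
    content_type = resp.headers.get("Content-Type", "")
    return resp.ok and any(mt in content_type for mt in RDF_MEDIA_TYPES)
```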
So, after selecting the interlink, the next phase is to enter the provenance for that interlink. We developed a provenance model based on the requirements distilled from the survey and on a set of provenance competency questions, which essentially capture the who, what, why and when of an interlink.
We use three graphs: an interlink graph, which is a named graph that contains a set of interlinks; a provenance graph, which is a PROV bundle containing a set of provenance descriptions for a set of interlinks; and a relationship graph, which represents the relationship between an interlink graph and a provenance graph.
And as a PROV bundle is itself an entity, we can also describe the provenance of the provenance being recorded. So this looks like this: the interlink graph contains the triples for the interlinks created; the provenance graph below it contains the provenance information
about how and why an interlink was generated; and the relationship graph shows the relationship between a set of interlinks and a provenance graph. As these graphs are updated, you can also create further relations between the different graphs, showing, for example, which graph is a modification
of another graph, thereby providing a sort of history of the interlinks. We use the PROV ontology as the basis for our provenance model: we use PROV-O to describe the who, where and when of interlinks being created, modified or deleted, and we extended the ontology
to describe the what, how and why of interlink creation by adding interlink-specific subclasses and properties. We also used the VoID ontology to describe the datasets, and Dublin Core and FOAF to further describe any entities in the provenance bundle.
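A minimal sketch of this three-graph layout, using rdflib named graphs to stand in for the interlink graph, the PROV bundle and the relationship graph, might look as follows. The example URIs, the agent, and the property linking the two graphs are illustrative assumptions, not the model's published terms.

```python
from rdflib import Dataset, Namespace, URIRef, Literal
from rdflib.namespace import RDF, XSD

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/")  # illustrative namespace

ds = Dataset()

# Interlink graph: a named graph holding the interlink triples themselves.
interlinks = ds.graph(EX["interlink-graph-1"])
interlinks.add((
    EX["local/person/123"],
    URIRef("http://www.w3.org/2004/02/skos/core#exactMatch"),
    EX["authority/person/456"],
))

# Provenance graph: stands in for a PROV bundle describing who created the
# interlink graph and when; the "why" would be a justification statement.
provenance = ds.graph(EX["prov-graph-1"])
provenance.add((EX["interlink-graph-1"], RDF.type, PROV.Entity))
provenance.add((EX["interlink-graph-1"], PROV.wasAttributedTo, EX["cataloguer-42"]))
provenance.add((EX["interlink-graph-1"], PROV.generatedAtTime,
                Literal("2020-01-01T00:00:00", datatype=XSD.dateTime)))

# Relationship graph: records which provenance graph describes which interlink graph.
relationship = ds.graph(EX["relationship-graph-1"])
relationship.add((EX["prov-graph-1"], EX["describesProvenanceOf"], EX["interlink-graph-1"]))

print(ds.serialize(format="trig"))
```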
This just shows how we extended the PROV ontology: we added an Interlink entity, which is a subclass of prov:Entity; we added an interlink creation activity and an interlink modification activity, which are subclasses of prov:Activity; and we added a property that allows the user to add a justification for why a link was created.
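The sketch below shows one way such an extension could be declared with rdflib. The namespace and the exact class and property names are assumptions made for illustration; the framework's published vocabulary may use different identifiers.

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS, OWL

PROV = Namespace("http://www.w3.org/ns/prov#")
ILP = Namespace("http://example.org/interlink-prov#")  # hypothetical namespace

g = Graph()
g.bind("prov", PROV)
g.bind("ilp", ILP)

# Interlink as a subclass of prov:Entity.
g.add((ILP.Interlink, RDF.type, OWL.Class))
g.add((ILP.Interlink, RDFS.subClassOf, PROV.Entity))

# Creation and modification activities as subclasses of prov:Activity.
for activity in (ILP.InterlinkCreation, ILP.InterlinkModification):
    g.add((activity, RDF.type, OWL.Class))
    g.add((activity, RDFS.subClassOf, PROV.Activity))

# A free-text property for recording why a link was created (the "why").
g.add((ILP.linkJustification, RDF.type, OWL.DatatypeProperty))
g.add((ILP.linkJustification, RDFS.domain, ILP.Interlink))
g.add((ILP.linkJustification, RDFS.comment,
       Literal("Free-text reason given by the user for creating the interlink.")))

print(g.serialize(format="turtle"))
```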
Following this, we used an R2RML mapping to uplift the data entered into the relational database to RDF, and with that we were able to produce an RDF graph for the interlinks created by the user. The RDF can be published or viewed as a graph, depending on what the user finds easiest to understand. And that slide is just an example of the mapping that we used.
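For readers unfamiliar with R2RML, the snippet below is a minimal, illustrative mapping of the kind referred to here: rows of a hypothetical interlink table are turned into RDF triples. The table name, column names and URI template are assumptions; this is not the mapping shown on the slide.

```python
from rdflib import Graph

# A small R2RML mapping (R2RML mappings are themselves RDF, usually in Turtle).
R2RML_MAPPING = """
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/> .

ex:InterlinkMap a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "interlink" ] ;
    rr:subjectMap [ rr:template "http://example.org/resource/{local_id}" ] ;
    rr:predicateObjectMap [
        rr:predicateMap [ rr:column "predicate_uri" ; rr:termType rr:IRI ] ;
        rr:objectMap    [ rr:column "external_uri"  ; rr:termType rr:IRI ]
    ] .
"""

# Parsing the mapping as RDF; actually executing it against a database would
# require an R2RML processor, which is out of scope for this sketch.
mapping_graph = Graph().parse(data=R2RML_MAPPING, format="turtle")
print(len(mapping_graph), "triples in the mapping graph")
```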
So, our future directions: as yet, we haven't tested our framework. As mentioned, we developed it with information professionals, so the next phase is to put it through a round of usability testing and then to modify the framework based on the feedback,
and after that, as mentioned, to add the dataset quality criteria and the predicate recommender elements of the framework, which will lead to further testing. Yeah, so that's everything. Thank you.
Thank you very much, Lucy. That was quite interesting, I thought. Are there questions for Lucy?
I have a technical question. You linked two graphs together. The interlink graph and the provenance graph. How is it done technically? Are they two named graphs that have some relationship, or do you annotate the triples in one graph?
Yeah, so the interlink graph is a named graph, and the provenance graph is a provenance bundle, which is very similar to a named graph. It acts the same way as a named graph does, so then we can create a relationship between both these graphs, and then if a graph is edited, you create a new provenance bundle that describes the edits that were made in a graph,
and again, you can link those graphs together. Are there more questions? Thank you for the great talk. I'm really interested in the extent of the predicates
that you might have in the recommender, and in the training you might provide to get information professionals to use the right ones. So we hope to provide some of the most common ontologies that people use, and to have a way for the user to express what kind of relationship
they want to state that the two resources have, and then to provide a recommendation, saying, for example, you could use these three types of predicates, and perhaps also providing information on which ontologies are most frequently used, or which ontology is already used in the dataset, so why you might choose one over the other;
so there are a few different steps to what that might entail. More questions? It would be interesting to learn what can be recorded
in one of the extended provenance elements that you mentioned, about why a link was established. What could that be, and how did the need arise? So that could express
a number of things: why you chose to interlink with a particular dataset, or why you chose a particular predicate to express a relationship, so it's somewhat open to interpretation as to what the user might want to express; or also indicating why one resource is related to another,
so is it part of something, is it the same person, being able to provide a little richer information as to why that interlink was created, maybe for future users who aren't as familiar with RDF, so it's more of a human-friendly explanation. Sounds very sensible to me.
Well, another question? Then thanks again, thank you Lucy. Thank you. Thank you.