
DATA MODELS - BIBFRAME as a data model for aggregating heterogeneous data in a search portal


Formal Metadata

Title
DATA MODELS - BIBFRAME as a data model for aggregating heterogeneous data in a search portal
Number of Parts
14
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
The Lin|gu|is|tik portal is a research tool for the field of linguistics that has been developed at the University Library Frankfurt am Main since 2012. It provides an integrated search for discipline-specific scientific resources: printed as well as electronic publications, websites, and research data. In order to facilitate the inclusion of language resources that are part of the Linguistic Linked Open Data cloud, linked data technologies have increasingly been incorporated into the portal since 2015. As a major part of this effort the established thesaurus of the Bibliography of Linguistic Literature (BLL) comprising more than 9,000 subject terms has been re-modeled as an ontology (BLL Ontology) and made freely available online. Since our goal is to make full use of the opportunities linked data technologies offer, we decided to replace the underlying proprietary data scheme of the portal by a standardized data model that has been expressly designed for linked data applications. BIBFRAME has been selected to fulfill this role. In this presentation we will give an overview of the ongoing work in the current project phase (2020-2022) and discuss why we chose BIBFRAME for this use case, how we adapted it in terms of an application profile, how it fits into the overall LOD-centric architecture of the portal, and in what way it interacts with the BLL Ontology that is used throughout the portal as authority data. We will also show specific features that the chosen data model makes possible as well as give a brief technological overview.
Transcript: English (auto-generated)
I'm going to talk first a bit about the work we do in our project regarding the Semantic Web and BIBFRAME. So first, let me briefly introduce our project. The project is called Fachinformationsdienst Linguistik. We are based at the University of Frankfurt am Main in Germany,
and we are two project partners, which are two subunits of the university: on the one hand the university library, which does the librarian part of the project, and on the other hand the computational linguistics working group located at the computer science department,
headed by Professor Dr. Christian Chiarcos. He specializes in applying Semantic Web and linked data technologies to the field of linguistics, and when he joined our project, that was basically also what prompted us to become interested in these topics, which led me to present here today.
So, our project, the Fachinformationsdienst Linguistik, is part of a funding program of the German Research Foundation (DFG). We try to provide researchers in the field of linguistics with resources, tools and services so that they are able to conduct their studies and their research. This is basically the latest offspring of a long tradition of comparable funding. We have been running this project since 2017 and are currently in the second phase.
We offer a vast number of services, and the pivot point of all of them is our web portal, Linguistik.de, whose centerpiece is a search portal for resources relevant to the field of linguistics. These are for the most part traditional library sources. One peculiarity is perhaps that we have a large share of articles, about 62 percent of approximately 2.7 million unique resources. We also have
some additional things that are specific to the field of linguistics and directly connected to our interest in linked data: electronic language resources, for example electronic dictionaries and
corpora. A corpus, in the sense of this talk, is prototypically some sort of text that is annotated in a specific way, for example with part-of-speech tags like verb or noun, to be employed in linguistic research.
There are also other types of corpora, for example with audio or video, but this is the working definition for today. What we wanted to do to get these resources into our portal is to link it to the linked data cloud. Most of you will
probably be familiar with these bubble graphics, so to say, which try to map the linked data cloud; this is a specific version for linguistic linked
data, the Linguistic Linked Open Data (LLOD) cloud. The bubbles you see there are linguistic resources: electronic dictionaries, corpora and so on. We basically wanted to become a bubble in this graph ourselves, so that we can connect to the other ones and
thereby gain additional linguistic resources, such as the aforementioned corpora. The way we wanted to do that was to employ research done by Professor Chiarcos, whom I mentioned before. He has developed what I would call a meta-ontology that maps the different ontologies used in these resources onto a common ground. We wanted to map something we already have (I will talk about that later) to that repository, so that we can, via inferences, for example with OWL, map our material to material in the cloud and get it into our portal that way.
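To make the idea concrete, here is a minimal sketch in Python with rdflib of how such inference-based retrieval could work. All URIs and property names are invented for illustration, and the single SPARQL property path stands in for the much richer OWL reasoning of the real setup:

```python
# A minimal sketch, with invented URIs, of the inference idea: local subject
# terms and external vocabularies are both mapped onto a shared reference
# ontology, and a subclass query then finds resources annotated with any
# term that falls under the same reference concept.
from rdflib import Graph, Namespace
from rdflib.namespace import RDFS

BLL = Namespace("http://example.org/bll/")    # hypothetical local namespace
REF = Namespace("http://example.org/ref/")    # stand-in reference ontology
EX  = Namespace("http://example.org/llod/")   # stand-in external resource

g = Graph()
g.add((BLL.Verb, RDFS.subClassOf, REF.Verb))          # our term -> reference
g.add((EX.FiniteVerb, RDFS.subClassOf, REF.Verb))     # external term -> reference
g.add((EX.corpus1, EX.annotatedWith, EX.FiniteVerb))  # an external resource

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?res WHERE {
  <http://example.org/bll/Verb> rdfs:subClassOf ?hub .
  ?concept rdfs:subClassOf* ?hub .
  ?res <http://example.org/llod/annotatedWith> ?concept .
}
"""
for row in g.query(query):
    print(row.res)  # -> http://example.org/llod/corpus1
```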
This is basically the approach we tried to take at the University Library Frankfurt. Because of this long tradition of linguistic funding, we also have the Bibliography of Linguistic Literature (BLL), which has been maintained for about
50 years. It comprises about 500,000 bibliographic references and a dedicated thesaurus of subject terms specific to the field of linguistics, ranging from terms for specific languages to terms of grammatical description, like verb, noun,
you name it. What we wanted to do first was to convert this existing thesaurus, which only existed as authority data in our library system, into SKOS, so that we could become basically this bubble here.
The idea was then to take this thesaurus and link it to other vocabularies that already exist inside this cloud. One was the aforementioned OLiA, the Ontologies of Linguistic Annotation, by our project partner, which is
basically concerned with grammatical terms; but there are others as well. Everyone in linguistics, for example, knows Glottolog, a very astonishing work by colleagues from the Max Planck Institute in Leipzig, which holds a vast amount of information about languages. We wanted to have some sort of linking from our subject
terms to their entries so that we can obtain resources via this route. Here, just as an example, is how such a re-modeled thesaurus item looks; it is a specific
language I chose. You can see it looks like a very basic thesaurus entry: you have a broader relation, you have some notation, and you have different labels.
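As a minimal sketch, this is roughly what such an entry could look like when built with rdflib in Python. The URI, notation value and labels are invented for illustration:

```python
# Sketch of one re-modeled BLL thesaurus entry in SKOS: a preferred and an
# alternative label, a notation, and a broader relation (all values invented).
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

BLL = Namespace("http://example.org/bll/")   # hypothetical namespace

g = Graph()
g.bind("skos", SKOS)

concept = BLL["1234"]
g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.prefLabel, Literal("Swahili", lang="en")))
g.add((concept, SKOS.altLabel, Literal("Kiswahili", lang="sw")))
g.add((concept, SKOS.notation, Literal("14.2")))          # invented notation
g.add((concept, SKOS.broader, BLL["Bantu_languages"]))    # broader relation

print(g.serialize(format="turtle"))
```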
And here is the ontological re-modeling, which was done by my colleague, who is a linguist. She basically looked at each of these subject terms and re-modeled them according to how each is actually used in linguistics, not just by librarians who assign subject headings. It is also important that, as in many sciences, I'm sure,
many terms in linguistics are disputed and there are different opinions about them. So it made sense to use an ontology, with proper class relations, instead of just the taxonomic relations that can be modeled in a thesaurus.
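Hypothetically, the difference looks like this: a term becomes an OWL class with explicit class relations rather than a mere skos:broader link (URIs invented for illustration):

```python
# Sketch: the same kind of term as an OWL class with class relations, which
# reasoners can work with, instead of a purely taxonomic skos:broader link.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS, OWL

BLL  = Namespace("http://example.org/bll/")   # hypothetical namespace
OLIA = Namespace("http://example.org/olia/")  # stand-in for OLiA

g = Graph()
g.add((BLL.AuxiliaryVerb, RDF.type, OWL.Class))
g.add((BLL.AuxiliaryVerb, RDFS.subClassOf, BLL.Verb))
# An explicit cross-ontology alignment that inference can exploit:
g.add((BLL.AuxiliaryVerb, OWL.equivalentClass, OLIA.AuxiliaryVerb))
```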
So we did all that, and we actually got resources via this route: my colleague built the ontology, we put it into software our colleagues wrote, and we could find resources that could be integrated into our portal. But we also noticed that, despite all this cool linked data technology, it was not
really possible to fully leverage what we had developed by building this ontology; our technology was simply not ready for it. So we decided, in the current project phase, to put the ontology first, as the centerpiece of what I would call the
technological backbone of the portal, and to completely replace the previous data model, which was more of an ad hoc invented scheme, basically indexing fields
for Solr. It was not a well laid-out or unified data model; it was just made-up labels for key-value pairs, so to say. We wanted a dedicated data model, we wanted everything in the portal to have persistent URIs, we wanted to connect the previously separate data pools,
and we wanted to really use authority data, so that, for example, we can enrich the language terms we had before from other sources, such as Glottolog or Wikidata, to enable further search scenarios.
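For instance, a minimal sketch of such an enrichment; the external identifiers below follow the real Glottolog and Wikidata URI patterns but are placeholders, not real records:

```python
# Sketch: enriching a BLL language concept with external identifiers via
# skos:exactMatch, so that data from Glottolog or Wikidata can be pulled in
# for further search scenarios. The specific identifiers are placeholders.
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import SKOS

BLL = Namespace("http://example.org/bll/")    # hypothetical namespace

g = Graph()
concept = BLL["1234"]                         # a language concept
g.add((concept, SKOS.exactMatch,
       URIRef("https://glottolog.org/resource/languoid/id/xxxx1234")))
g.add((concept, SKOS.exactMatch,
       URIRef("http://www.wikidata.org/entity/Q00000")))
```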
Then we sat down together, and this is also where the BIBFRAME part finally starts, so to say. We looked at different data models and, together with our project partners, came up with what we would need and what we
would not need. We decided that we wanted an established, well-defined data model with dedicated and coherent properties and classes that had been made to work together. We also have some peculiarities that are maybe not so common in regular library
applications. I also have to say: when we first became interested in BIBFRAME, I tried to find out who else uses it, who the community around BIBFRAME actually is. My impression was, maybe I'm wrong, that BIBFRAME
is still mostly used by large university libraries and many large national libraries in the US and in Europe, and I had a hard time finding smaller community portals such as ours, which may have somewhat diverging interests or needs for this kind of data. For example, we are not
interested in any kind of cataloging; we don't do it, we just take other people's data and aggregate it. We are also not really using library holdings and similar features that BIBFRAME has and which I feel many people talk about,
but it's good that they are there, I guess. So then we sat together and tried to find out what a data model would actually need to cover, given our requirements.
We have different formats that go into our portal. There are the regular library formats, which I guess most of you know; PICA is used a lot in our library and also by the German National Library. But then we have several bibliographic formats and also
custom stuff, such as Microsoft Access databases or even HTML, and all these resource types have to come together and be usable in one portal.
For us specifically, besides the regular catalog data, we also need to be able to represent things like the corpora I mentioned before, electronic resources and so on, and these were, to some extent, the trickier part to model with BIBFRAME. The approach we then took was to
use our local library data, a vast amount of data, several hundred thousand records, covering a broad spectrum of fields and use cases, and to implement a prototypical
BIBFRAME mapping directly from this data, that is, from our local data model to BIBFRAME. I basically tried to work out how it could be done, and then discussed it on a weekly basis with our librarian colleagues, and together we tried to make sense of it.
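As a rough illustration of what such a mapping can look like; the field names, URIs and record structure below are invented, and the real mapping covers far more cases:

```python
# Simplified sketch of the prototyped mapping: take a record in the local
# ad hoc key-value scheme and emit BIBFRAME triples with a persistent URI.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

BF  = Namespace("http://id.loc.gov/ontologies/bibframe/")
EX  = Namespace("http://example.org/resource/")   # hypothetical URI base
BLL = Namespace("http://example.org/bll/")        # hypothetical namespace

def record_to_bibframe(record: dict, g: Graph) -> URIRef:
    """Map one local key-value record to a BIBFRAME Work."""
    work = EX["work/" + record["id"]]
    g.add((work, RDF.type, BF.Work))
    title = EX["title/" + record["id"]]
    g.add((work, BF.title, title))
    g.add((title, RDF.type, BF.Title))
    g.add((title, BF.mainTitle, Literal(record["title"])))
    for subj in record.get("subjects", []):
        g.add((work, BF.subject, BLL[subj]))   # link to the BLL ontology
    return work

g = Graph()
record_to_bibframe({"id": "123", "title": "On Verbs", "subjects": ["Verb"]}, g)
print(g.serialize(format="turtle"))
```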
I also have to say that when we started out, I tried to find out who else in our rather large library knew about BIBFRAME or had used it before, and it was a bit hard to find anyone. So we looked at the available documentation and at what other people had done, and drew inspiration from that: specifically the work done by the
Library of Congress, the German National Library, and the National Library of Sweden, which was also a large inspiration for me because they had solved some things we needed and that I did not find anywhere else, like modeling articles, for example.
I think my time is now coming closer to an end, so here is a very narrowed-down, skeletal representation. I'm not sure if you can really see it; I'll zoom in a bit, I hope that works. This just shows a very bare-bones,
reduced version of how a BIBFRAME Work level looks in our portal. One interesting part is how we link to our local ontology. Another interesting point is that we decided to stay compatible
with original BIBFRAME to some extent, but also to make it possible to have custom elements, for example our own genre terms; we made a little ontology to model these. And
here is a second part: you can see this is an article that is part of a collection, a collective work, and here you can see how we modeled that. I see now that my time is coming to an end, so I will go to the last slide.
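The article-in-a-collection case just mentioned could, as a sketch, look like this in BIBFRAME terms (URIs invented for illustration):

```python
# Sketch: the article is its own bf:Work that points to the Work of the
# collective volume via bf:partOf; the inverse bf:hasPart is optional.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

BF = Namespace("http://id.loc.gov/ontologies/bibframe/")
EX = Namespace("http://example.org/resource/")   # hypothetical URI base

g = Graph()
article, volume = EX["work/article-1"], EX["work/volume-1"]
g.add((article, RDF.type, BF.Work))
g.add((volume, RDF.type, BF.Work))
g.add((article, BF.partOf, volume))   # article is part of the collective work
g.add((volume, BF.hasPart, article))  # inverse link, if both directions are kept
```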
So here are some tricky cases, as I mentioned before. One was how to model articles and specific additional information about articles, and the other was how to get
our classification in there. And some notes on technology, because the reviewers asked me to say something about that: in our library we use Python a lot, also for this work; for our website we use Bechtel and Zengel, and the search
runs on Solr. As for the triple store, we evaluated several, but it is still a bit hard for me to find one that I really like, because some of the features that I would like, or that we need, are only available in large commercial triple stores, and that is always a
bit of a problem when you want to enable other people to reuse your work. Thank you all for your time, and if you are interested in further information, you can find it here on the slide.