
DATA MODELS - RDA/RDF at the University of Washington Libraries


Formal Metadata

Title
DATA MODELS - RDA/RDF at the University of Washington Libraries
Number of Parts
14
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
The Linked Data Team at the University of Washington Libraries describes their work using RDA/RDF. This work includes ongoing creation and use of RDA/RDF machine-readable application profiles compatible with the Sinopia Linked Data Editor, a mapping and conversion project between the RDA/RDF and BIBFRAME data models, and a recently launched project to create a mapping from MARC21 to RDA/RDF with future large-scale metadata conversion in mind. The team's goal in doing this work is to demonstrate that RDA/RDF is a better tool for representing RDA bibliographic descriptions than BIBFRAME, and to lay the groundwork for expanding the use of the RDA/RDF ontology in the international GLAM community.
Transcript: English (auto-generated)
So, this is not just one speaker, but no fewer than four speakers from the University of Washington Libraries: Theodore Gerontakos, Benjamin Riesenberg, Melissa Morgan, and Crystal Clements. You will be giving the talk in pieces. So, yes, the floor is yours.
Thank you. We're going to talk about Resource Description and Access bibliographic metadata, or RDA, expressed as RDF at the University of Washington Libraries. Our focus today will be application profiles, mappings and alignments of data models, and data transformations, because we think these are proficiencies essential to current metadata practice. Metadata practice in the 2020s, over the last 20-plus years, has gone through a lot of changes, but there has been some continuity. For example, we still want to increase interoperability, and we still want to adhere to standards. The main interoperability issue we'll address today is the merging of data that's modeled differently, and we would assert that this is not a trivial problem. For example, if institution one has RDA data, institution two has BIBFRAME data, and institution three has Dublin Core data, how do we merge that into a useful graph? Our proposed partial solution is to increase metadata professionals' fluency in moving across data models, and to create tools and proficiencies that allow this: in this case, profiles, mappings, and transformations. We also want to adhere to standards. The standards we'll talk about today are RDA and BIBFRAME, plus a little MARC: three well-established standards.
It could have been other standards, but those are the main ones we'll talk about. Among the changes in 2020s metadata, perhaps the most striking is the shift to linked data. Fortunately, in the last few years there has been an emergence of linked data infrastructure, including publishing platforms, and one such publishing platform is Sinopia. We've heard a little bit about that this week. As you know, it's a linked data platform, and I think what's important here is that it allows us to enter data using virtually any data model, and it provides a good testing ground for this fluency across data models. So, we've completed one workflow where we input RDA and output BIBFRAME outside Sinopia, and then inside Sinopia we publish both side by side. In progress right now is a second workflow where we input MARC data and, outside Sinopia, output RDA aligned with the IFLA Library Reference Model as well as BIBFRAME; then in Sinopia we publish RDA and BIBFRAME side by side.
We think this demonstrates a way to move across data models, a way to increase interoperability, and a commitment to standards. It is a labor-intensive approach; the alignment and mapping work especially requires significant intellectual effort. It would be great if we could increase the amount of sharing we do of this work, increase the shared methods we have, share our mappings across institutions more, and have more efficient ways to find this work. So, in summary, the shift in metadata practice in the 2020s includes metadata professionals' ability to move fluently across data models, to grow community mapping and alignment work, to commit to application profiles of the standards, and to be able to write transformation code. And with that, I'll let Benjamin talk about application profiles.
Thank you, Theo. I want to talk a bit about work at the University of Washington with RDA/RDF metadata application profiles, and I'll focus specifically on profiles for use with a particular tool we're using, the Sinopia Linked Data Editor. I should mention that Sinopia requires what is called a resource template for describing a typed RDF resource, and it's maybe not quite right to simply use "resource template" and "metadata application profile" as synonyms, but a resource template includes the kind of information we expect from a metadata application profile. We're approaching the design of RDA/RDF templates for use in Sinopia in a comprehensive way, making all those decisions: which elements to use to describe a resource, constraints on values, even things along the lines of additional guidance. Our work in this area really began in 2019 and 2020 when, along with other institutions, the University of Washington was piloting and testing use of this linked data editor, so it was a chance for us to start from scratch
and build templates drawing on the classes and properties in the RDA Registry. I've included some resources about that work from past presentations, as well as the actual templates and data we've made available online, at least in a human-readable format. I should also mention that this followed on earlier work at the University of Washington, a proof-of-concept project to create an RDA input form that output RDA/RDF, so it wasn't a completely new effort at our institution. At present, we are revamping our workflow for designing and loading templates into Sinopia, and this will allow us to do a couple of things. One note is that the templates we were using in 2019 and 2020 did not reflect recent changes to the RDA ontology, especially changes that were part of the so-called 3R Project, which was officially released at the end of last year. So, at this point, building new templates will allow us to reflect changes to the ontology, and of course, having a new workflow
for building templates will allow us to take advantage of a lot of excellent ongoing work on the Sinopia platform itself: development, improved functionality, changes in the UI, and things like that. The immediate future for us here is, of course, to create and implement the new templates, and then to continue building on the RDA/RDF metadata description sets we've already generated. We were lucky to have a number of UW catalogers participate in this project during 2019 and 2020, so we'll be building on that store of RDA/RDF data. I expect that for us this will involve a lot of learning around implementing the RDA ontology, again especially the changes that were part of the 3R Project and the alignment of RDA with the Library Reference Model: things like the introduction of new entities, new relationships, and new attributes. So, next, to talk a little bit about what we've done with the data created in these projects and what we hope to do in the future, I'll turn things over to Melissa.
Thank you. Today I'm going to be talking about the technical side of how we're converting our RDA/RDF data into BIBFRAME using a combination of Python and RML. The conversion code we've created has two parts. The first part is a Python script that takes in the RDA-to-BIBFRAME mapping we've created, which is in a spreadsheet; our mappings are written in a syntax we call Kegel syntax. We'll see an example of what that looks like in the next slide. This Python script takes that Kegel and converts it into RML, and this is what I'll be talking about today. The second part is a separate Python script, which takes our RDA/RDF data and converts it into BIBFRAME using the RML generated in the first step. Here we can see an example of the mapping we wrote for the RDA property P1004, "has category of work", as written in Kegel syntax, and at the bottom we have that same mapping written in Turtle syntax, to make clearer what the Kegel mapping is saying. On the first line of the Kegel syntax we have the BIBFRAME property genreForm; the BIBFRAME namespace is assumed in our mapping unless stated otherwise, and an asterisk indicates that the property takes an IRI as a value. On the next line we have "or", indicating that there are multiple mapping options for this property, and on the final line we have our mapping for when the value of P1004 is a literal: in that case the genreForm property takes a blank node, that blank node is classed as a BIBFRAME GenreForm, and it carries an rdfs:label, which takes the literal value of the P1004 property.
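To make those two branches of the mapping concrete, here is a minimal rdflib sketch of the two output shapes just described. The subject and object IRIs are placeholders, not actual UW data, and only the bf:genreForm / bf:GenreForm / rdfs:label pattern comes from the talk.

```python
from rdflib import Graph, Namespace, BNode, Literal, URIRef
from rdflib.namespace import RDF, RDFS

BF = Namespace("http://id.loc.gov/ontologies/bibframe/")

g = Graph()
g.bind("bf", BF)
g.bind("rdfs", RDFS)

# Placeholder subject IRI; real descriptions would use the RDA/RDF resource IRIs.
work = URIRef("https://example.org/work/1")

# Option 1: the value of P1004 is an IRI, so bf:genreForm points directly at it.
g.add((work, BF.genreForm, URIRef("https://example.org/genreform/novels")))

# Option 2: the value is a literal, so bf:genreForm takes a blank node
# classed as bf:GenreForm, with the literal carried by rdfs:label.
gf = BNode()
g.add((work, BF.genreForm, gf))
g.add((gf, RDF.type, BF.GenreForm))
g.add((gf, RDFS.label, Literal("novels")))

print(g.serialize(format="turtle"))
```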
So, when our mapping goes into our Python script, this is how the script interprets the Kegel we saw on the previous slide. At the top we have our Kegel; just below it, a Python dictionary; and below that, a graphic that illustrates the dictionary. We generate this dictionary for each RDA property we have mapped. We iterate through each dictionary, each list within that dictionary, and each item in that list, and then pass each item through a series of if-then statements in order to generate our RML.
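The actual Kegel parser and dictionary layout are not shown in the talk, so the following is only an illustrative guess at that step: a per-property dictionary holding a list of mapping options, walked with if-then checks that decide which RML pattern to emit. All key names and values here are hypothetical.

```python
# Hypothetical dictionary for the P1004 mapping: one entry per RDA property,
# each holding a list of mapping options (IRI-valued vs. literal-valued).
mappings = {
    "P1004": [
        {"property": "genreForm", "value_type": "iri"},
        {"property": "genreForm", "value_type": "literal",
         "node_class": "GenreForm", "label_property": "rdfs:label"},
    ],
}

rml_fragments = []
for rda_property, options in mappings.items():
    for option in options:
        # A series of if-then checks decides which RML pattern to generate.
        if option["value_type"] == "iri":
            rml_fragments.append(
                f"predicate-object map: bf:{option['property']} "
                f"-> IRI value of {rda_property}"
            )
        elif option["value_type"] == "literal":
            rml_fragments.append(
                f"predicate-object map: bf:{option['property']} "
                f"-> blank node typed bf:{option['node_class']} "
                f"with {option['label_property']} from {rda_property}"
            )

print("\n".join(rml_fragments))
```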
So, what is RML? RML stands for RDF Mapping Language. It is meant for converting non-RDF data into RDF data, which isn't exactly what we're doing; we're converting RDF data into a different kind of RDF data, but it still works for us. The advantages of using RML are that it's relatively simple, it's repetitive, and RML maps are themselves RDF graphs. What this meant for us is that instead of writing out conversion code for each of our thousands of mappings individually, we could write a Python script that generates that code iteratively, with some variables swapped out as needed. Without getting too deep into the weeds of RML, I have a few slides that show what I call the building blocks of RML and what that ends up looking like in our Python script, with some color coding to help you visualize exactly what we're generating. Here we have the building blocks on the left and what the RML code actually looks like on the right. We start with the triples map, which is the container for the three essential pieces of an RML mapping rule. We have the logical source, which gives the location and basic structure of our source data; here it's an RDF/XML file. We have the subject map, which contains the rules for generating a subject for our triples; here it's the IRI for our RDA/RDF data. And we have the predicate-object map, which contains the rules for generating a predicate and an object for our triples.
Here our predicate is bf:genreForm and our object is the IRI value of the RDA property P1004. On this slide we still have the RML on the right, but now on the left is the section of our Python script that generates it. We're using the Python library RDFLib, which allows us to create an RDF graph, add all of our RML mapping rules as triples to that graph, and output it as Turtle. So here I open a graph, set up a few blank nodes as variables, and then add our triples to the graph: I start with the triples map, then add to it a logical source, a subject map, and a predicate-object map.
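As a rough sketch of that RDFLib step, the following builds one RML triples map (logical source, subject map, predicate-object map) as triples and serializes it as Turtle. The file path, iterator, IRI template, and reference expressions are placeholders, and the XPath reference formulation over the RDF/XML source is an assumption for illustration, not a detail confirmed in the talk.

```python
from rdflib import Graph, Namespace, BNode, Literal, URIRef
from rdflib.namespace import RDF

# RML / R2RML vocabularies.
RR = Namespace("http://www.w3.org/ns/r2rml#")
RML = Namespace("http://semweb.mmlab.be/ns/rml#")
QL = Namespace("http://semweb.mmlab.be/ns/ql#")
BF = Namespace("http://id.loc.gov/ontologies/bibframe/")

g = Graph()
for prefix, ns in [("rr", RR), ("rml", RML), ("ql", QL), ("bf", BF)]:
    g.bind(prefix, ns)

# Blank nodes set up as variables, as described in the talk.
triples_map = URIRef("https://example.org/rml/P1004TriplesMap")  # placeholder name
logical_source = BNode()
subject_map = BNode()
po_map = BNode()
object_map = BNode()

# Triples map: the container for the mapping rule.
g.add((triples_map, RDF.type, RR.TriplesMap))

# Logical source: location and basic structure of the source data
# (an RDF/XML file; the XPath iterator is assumed here).
g.add((triples_map, RML.logicalSource, logical_source))
g.add((logical_source, RML.source, Literal("rda-description.rdf")))
g.add((logical_source, RML.referenceFormulation, QL.XPath))
g.add((logical_source, RML.iterator, Literal("/rdf:RDF/rdf:Description")))

# Subject map: rule for generating the subject IRI of the output triples.
g.add((triples_map, RR.subjectMap, subject_map))
g.add((subject_map, RR.template, Literal("https://example.org/work/{@rdf:about}")))

# Predicate-object map: bf:genreForm with the IRI value of P1004 (placeholder reference).
g.add((triples_map, RR.predicateObjectMap, po_map))
g.add((po_map, RR.predicate, BF.genreForm))
g.add((po_map, RR.objectMap, object_map))
g.add((object_map, RML.reference, Literal("rdaw:P1004/@rdf:resource")))
g.add((object_map, RR.termType, RR.IRI))

print(g.serialize(format="turtle"))
```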
So, when we're iterating through our RDA-to-BIBFRAME mappings in Kegel, as we looked at earlier, this is the Python code that is executed based on our if-then statements. Hopefully that gives you a basic picture of our process. By doing this, we can create detailed data using RDA and then output that data to BIBFRAME without data loss, for interoperability with other institutions. So, now I will pass off to Crystal. I'm going to talk briefly about our MARC 21 to RDA/RDF mapping project; I've included a link to the repository where you can find more detailed information and get involved. Some of you may wonder why we would take on this
laborious task. We're committed to doing our RDA cataloging using the RDA/RDF ontology: we have application profiles, an editing environment, and original descriptive metadata in LD4's Sinopia environment. What we're missing is our legacy MARC data expressed as RDA/RDF. Before we can create a conversion tool to achieve this goal, we need a detailed and authoritative mapping between the two standards. We're basing our mapping on the existing RDA-to-MARC 21 mapping published in many formats by the RDA Steering Committee, the internal CSV mapping used in RIMMF3, and the human-readable mapping in the original RDA Toolkit. Each of these mappings was created with purposes slightly different from ours. Extensive human analysis is needed to combine them with the MARC 21 standard and the RDA/RDF ontology to create the mapping we need to bring our MARC 21 data into the RDA/RDF metadata ecosystem we've built.
In addition to meeting notes and project management information, our GitHub repository holds working mapping documents representing the entire MARC 21 bibliographic standard, including actual mapping work beginning with the fields included in the PCC (Program for Cooperative Cataloging) BIBCO Standard Record. We started that last month. Each CSV file pictured here represents a shared mapping document with the columns shown. This mapping is necessarily complex and labor-intensive: our cataloging rules are complex, MARC is complex, and so is RDA/RDF. We are seeking collaborators to help with this important work, so please get in touch if you're interested. Upon completion of this mapping, we hope to establish it as a common published mapping for all stakeholders and to launch another project to build a conversion tool based on that mapping. Thank you. That is the end.