
Linked Data for Production


Formal Metadata

Title
Linked Data for Production
Number of Parts
16
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
The Mellon Foundation recently approved a grant to Stanford University for a project called Linked Data for Production (LD4P). LD4P is a collaboration between six institutions (Columbia, Cornell, Harvard, Library of Congress, Princeton, and Stanford University) to begin the transition of technical services production workflows to ones based in Linked Open Data (LOD). This first phase of the transition focuses on the development of the ability to produce metadata as LOD communally, the enhancement of the BIBFRAME ontology to encompass multiple resource formats, and the engagement of the broader academic library community to ensure a sustainable and extensible environment. As its name implies, LD4P is focused on the immediate needs of metadata production such as ontology coverage and workflow transition. In parallel, Cornell also has been awarded a grant from the Mellon Foundation for Linked Data for Libraries-Labs (LD4L-Labs). LD4L-Labs will in turn focus on solutions that can be implemented in production at research libraries within the next three to five years. Their efforts will focus on the enhancement of linked data creation and editing tools, exploration of linked data relationships and analysis of the graph to directly improve discovery, BIBFRAME ontology development and piloting efforts in URI persistence, and metadata conversion tool development needed by LD4P and the broader library community. The presentation will focus on a brief description of the projects, how they interrelate, and what has been accomplished to date. Special emphasis will be given to extensibility and interactions with the broader LOD community.
Transcript
So our next, oh, you're right here, please come on up. So our next presenter is Philip Schreur from Stanford University, who's going to talk to us about the Linked Data for Production project, perhaps more colloquially known as LD4P.
So thank you. Thank you very much. The title of my talk is Linked Data for Production: a multi-institutional approach to technical services transformation. So LD4P is a two-year project newly funded by the Mellon Foundation to change, in a very practical way, our basic technical services production, such as cataloging and acquisitions, to ones
that are rooted in linked data. I think the keys here are practical and production. I think we've reached the point where we feel that we are ready to make the shift at the very core of what is done in technical services.
So Linked Data for Production or LD4P is a collaboration between six institutions, Columbia, Cornell, Harvard, the Library of Congress, Princeton, and Stanford to begin this transition of technical services production workflows to ones that are rooted in linked data.
So the first phase of the transition has three broad areas of development. The first is the ability to produce metadata as linked data communally. The second is the enhancement of the BIBFRAME ontology to encompass the multiple formats that all libraries must process. And the third is the engagement of the broader academic library community to ensure
that whatever we do is both sustainable and extensible. In parallel to LD4P's application to the Mellon Foundation for funding, the original Linked Data for Libraries team also applied to Mellon for the next stage of their development. This grant was also approved under the name Linked Data for Libraries Labs (LD4L Labs).
As the focus of LD4P is on the adaptation of existing tools to immediate production needs, LD4L Labs will focus on solutions that can be implemented in production at research libraries within the next three to five years. So their efforts are focused on the enhancement of linked data creation and editing tools,
the exploration of linked data to directly improve discovery, BIBFRAME ontology development and piloting efforts in URI persistence, and metadata conversion tool development needed by LD4P and the broader library community.
The LD4P team sees five major outcomes of this project. The first will be the development of the ability for libraries to work in an open network environment in the construction of their metadata, allowing for more immediate and simple exchange of that data.
A second major outcome will be the extension of the BIBFRAME ontology to include domains such as performed music and geospatial data sets. A third major outcome will be the development of multiple open-source tools for use in metadata creation and transformation in a linked open data environment.
Fourth, a key part of the LD4P program will be the engagement of LD4P with other strategic linked data projects elsewhere in the United States and the world. And last, LD4P will proactively reach out to the traditional cataloging community to help develop a cohesive library perspective on this transition to linked data.
So these are the benefits we hope for LD4P as a whole. Now I'd like to focus on what we've been doing to date and also on those individual institutional projects. So first, BIBFRAME 2.0 evaluation. Now much of our initial effort has gone into the evaluation and change recommendations
for BIBFRAME 2.0. After the original Linked Data for Libraries grant, the Library of Congress commissioned a study of BIBFRAME 1.0 by Rob Sanderson and implemented many of the change recommendations to form BIBFRAME 2.0. A group of ontologists from our six institutions have had weekly calls with the Library of
Congress on suggested changes to that revised framework. And these recommendations will be coming out shortly as open documents for community discussion. A second focus has been on the development of what we call the target ontology. So the target ontology is the core of what we hope BIBFRAME will eventually evolve
into and will be the core that our tooling will be developed around. So now a bit about that tooling. The initial tooling effort for LD4L Labs is centered on the creation of a MARC to BIBFRAME converter, including some entity reconciliation as part of the conversion
process. They are also working on the Vitro/VIVO stack to create an editor for the native creation of RDF. And then general tool evaluation will also be part of the LD4P project.
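To make the converter idea concrete, here is a minimal sketch of the MARC-to-BIBFRAME pattern such a tool automates, using the pymarc and rdflib Python libraries. This is not the LD4L Labs tool itself, just an illustration of the Work/Instance split that BIBFRAME 2.0 calls for; the input file name and the local URI base are placeholders.

```python
# Minimal MARC-to-BIBFRAME sketch -- NOT the LD4L Labs converter.
from pymarc import MARCReader
from rdflib import Graph, Literal, Namespace, URIRef, BNode
from rdflib.namespace import RDF

BF = Namespace("http://id.loc.gov/ontologies/bibframe/")
BASE = "http://example.org/"  # hypothetical local URI base

g = Graph()
g.bind("bf", BF)

with open("records.mrc", "rb") as fh:  # placeholder input file
    for i, record in enumerate(MARCReader(fh)):
        work = URIRef(f"{BASE}work/{i}")
        instance = URIRef(f"{BASE}instance/{i}")

        # BIBFRAME 2.0 separates the abstract Work from its Instance.
        g.add((work, RDF.type, BF.Work))
        g.add((instance, RDF.type, BF.Instance))
        g.add((instance, BF.instanceOf, work))

        # Map MARC 245 $a to a bf:Title node on the Instance.
        field = record["245"]
        if field and field["a"]:
            title = BNode()
            g.add((title, RDF.type, BF.Title))
            g.add((title, BF.mainTitle, Literal(field["a"].rstrip(" /"))))
            g.add((instance, BF.title, title))

print(g.serialize(format="turtle"))
```

A real converter has to cover hundreds of field mappings and, as noted above, fold reconciliation into the same pass; this shows only the basic shape.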
Together the partners will be evaluating tools such as LC's suite of BIBFRAME tools and ALIADA, which has already been mentioned. We're also looking at Karma. We'll be using Fedora 4 and the RML editor.
And also there is another project being developed at Stanford called CEDAR, which is basically the development of a very extensible RDF editor that should be flexible enough to create just about any type of data that we would like. So I'd like to move on now quickly to what the individual institutions will be working on.
Columbia is looking at the intersection between the museum and the library communities. This sub-project will focus on testing BIBFRAME's suitability for the description of art objects, both two-dimensional and three-dimensional. To date, they have mapped the metadata currently being used to record art objects in Columbia's art properties database to BIBFRAME 2.0 as an experiment to see if BIBFRAME 2.0 could handle art objects in its current state without extension. I think the upshot of their work so far is that much of the data they require ends up in generic notes, which is not very helpful.
The group has also made an alignment between the Visual Resources Association's VRA Core RDF ontology and BIBFRAME 2.0, and they are beginning an analysis of the CIDOC CRM and FRBRoo data models in order to relate them to BIBFRAME 2.0 as well.
And then finally, they have formed a group to create an art object ontology extension for BIBFRAME 2.0 in association with the art community. Cornell will be doing two projects for LD4P. First is the community development of a library ontology extension for the rare
materials community, focusing on the instance and the item level, because data such as provenance and binding information are currently not well-defined in library ontologies. A second is the original metadata creation for non-commercial LPs from their hip hop collection. Now, the rare materials group has focused on developing use cases, categorizing them,
and they have mapped them to entities in order to develop their framework. Their goal is to give useful feedback to the Bibliographic Standards Committee of the Rare Books and Manuscripts Section of ALA as they continue with their ontology development. For the part of the project dealing with the hip hop collection, some of the LPs will be processed
with traditional MARC cataloging, and some with BIBFRAME 2.0, to let the group analyze which is the better data model for representing their data. Next, Harvard. This sub-project of LD4P explores best practices for creating native linked data descriptions for library cartographic resources, including printed maps, atlases, and geospatial data sets. The project will evaluate BIBFRAME's effectiveness in describing cartographic resources for research needs.
So far, the group has built target use cases for geospatial and cartographic data, and turned these into modeling patterns for their ontology development. As part of this work, they have developed a mapping of the Harvard Geospatial Library's geospatial metadata elements to BIBFRAME, and have solicited feedback on these activities
from the map and the geospatial communities. Once this is done, they'll be creating metadata for cartographic and geospatial data based on their extension work to see whether it is actually a practical way of recording geospatial data. The Library of Congress is working on four separate projects.
The first one will be focused on metadata creation for its archival film and recorded sound collections. Their second project will explore best practices for creating linked data descriptions of material in their print and photographic resources collections, taking into account the different cataloging rules for those materials. Their third project is BIBFRAME 2.0 vocabulary development. In early 2015, LC began to analyze the comments that had come in from the community, and there were many. They commissioned that review of BIBFRAME, and now they're making changes to their vocabulary based on the feedback. And LC's last project will be to explore the BIBFRAME and RDA data models and best practices for creating linked data descriptions for monographic, serial, and notated music resources, as these areas use RDA most often.
LC is starting their next big test phase of BIBFRAME after ALA, so in February. They'll be training 40 or 50 catalogers to produce their next big data set to really test this out. Also, as part of that work, Index Data will be converting the entire back file of LC's cataloging, making use of a new MARC to BIBFRAME 2.0 mapping that they have developed. Princeton. In March 2015, Princeton acquired the personal library of Algerian-born French philosopher Jacques Derrida.
Taking this collection, the overarching goal of Princeton's LD4P project is to explore, develop, and implement linked data standards for the description of special collections materials and the annotations they contain. They are including digital surrogates for some of the dedication pages because of the graphical nature of a lot of the annotations.
We just had an LD4P meeting last week at the Library of Congress, and we had an interesting side discussion about perhaps using IIIF to capture those images and being able to make annotations about them.
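That IIIF idea was only a side discussion, but for concreteness, an annotation on a digitized dedication page might look like the following W3C Web Annotation targeting a region of a IIIF canvas. Every URI and value in this sketch is hypothetical.

```python
# Sketch of a Web Annotation on a region of a (hypothetical) IIIF canvas.
import json

annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno/derrida-1",
    "type": "Annotation",
    "motivation": "commenting",
    "body": {
        "type": "TextualBody",
        "value": "Handwritten dedication from the author to Derrida.",
        "format": "text/plain",
    },
    # Target a rectangular region of the page image via a IIIF canvas
    # and a media-fragment (xywh) selector.
    "target": {
        "source": "http://example.org/iiif/derrida-book/canvas/p3",
        "selector": {
            "type": "FragmentSelector",
            "conformsTo": "http://www.w3.org/TR/media-frags/",
            "value": "xywh=120,80,640,220",
        },
    },
}

print(json.dumps(annotation, indent=2))
```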
Stanford will be working on two projects. The first is the Performed Music Ontology project. The project aims to develop a BIBFRAME-based ontology for performed music in all formats, with particular emphasis on the modeling of works, events, and their contributors. So this part of the project had a 12-month timeline, which is coming up in maybe another three months,
so I think their results will be ready to be published sometime soon. The second project that we are working on is called Tracer Bullets. So according to the Agile Dictionary, a tracer bullet is a set of work where interfaces are developed from the beginning to the end of a process.
These interfaces may be very simplified or may just pass data through. The purpose of the tracer bullet is to examine the feasibility of an end-to-end process. So we will be developing a linked data processing stream for our traditional cataloging workflows, from the acquisitions process
through to a Blacklight discovery-based environment. The testing will be done with actual library resources and actual library staff so that a true measure of effort and most importantly the cost to implement this new paradigm can be evaluated. So we have chosen four key workflows for conversion to linked data.
The first one is traditional vendor-supplied cataloging, or copy cataloging. This will be cataloging that is delivered to us from Casalini, and we have engaged them to enhance the MARC data with as many identifiers for entities and for RDA vocabularies as possible, so that conversion to linked data will be as smooth and clean as possible.
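The talk doesn't describe Casalini's actual tooling, but the general enrichment pattern is easy to sketch: add $0 subfields carrying entity URIs to access-point fields, so that a later MARC-to-BIBFRAME conversion can emit real links instead of text strings. In this hypothetical Python sketch using pymarc, the lookup table, file names, and identifier are all placeholders.

```python
# Hypothetical sketch of vendor-style MARC enrichment with entity URIs.
from pymarc import MARCReader

# Hypothetical reconciliation results: heading string -> authority URI.
NAME_URIS = {
    "Derrida, Jacques.": "http://id.loc.gov/authorities/names/nXXXXXXXX",  # placeholder id
}

with open("casalini.mrc", "rb") as fh, open("enriched.mrc", "wb") as out:
    for record in MARCReader(fh):
        for field in record.get_fields("100", "700"):
            heading = field["a"]
            uri = NAME_URIS.get(heading)
            # Only add a $0 if we found a match and none exists yet.
            if uri and not field.get_subfields("0"):
                field.add_subfield("0", uri)
        out.write(record.as_marc())
```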
Pathway 2 is original cataloging, so we will be training all of the catalogers in our catalog department to be able to create BIBFRAME descriptions, and they will set aside maybe one day a week to do nothing but original RDF cataloging.
Pathway 3 is self-deposit of a single item to the digital repository; for that we will be looking at the CEDAR tool that I mentioned at Stanford for the creation of that metadata. And the last will be the ingestion of a collection into the digital repository.
So for us that's very different, because we'll get things maybe two or three thousand at a time. The metadata comes in a spreadsheet, so we have to convert that to something like BIBFRAME in order to deal with those types of deposits.
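As a rough sketch of that spreadsheet pathway, each row could become a bf:Instance in an RDF graph. The CSV file, the 'id' and 'title' columns, and the URI base below are all assumptions; a real deposit would need a much fuller column-to-ontology mapping.

```python
# Sketch of bulk ingest: one spreadsheet row -> one bf:Instance.
import csv
from rdflib import Graph, Literal, Namespace, URIRef, BNode
from rdflib.namespace import RDF

BF = Namespace("http://id.loc.gov/ontologies/bibframe/")
BASE = "http://example.org/deposit/"  # hypothetical URI base

g = Graph()
g.bind("bf", BF)

with open("collection.csv", newline="") as fh:
    for row in csv.DictReader(fh):  # assumes 'id' and 'title' columns
        instance = URIRef(BASE + row["id"])
        g.add((instance, RDF.type, BF.Instance))

        title = BNode()
        g.add((title, RDF.type, BF.Title))
        g.add((title, BF.mainTitle, Literal(row["title"])))
        g.add((instance, BF.title, title))

g.serialize("collection.ttl", format="turtle")
```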
We have focused on the modeling of our copy cataloging and original cataloging workflows and have completed that analysis, so we'll be set to go on those. Part of our efforts have been in identifying tools to test. We most likely will use ALIADA for our MARC to BIBFRAME converter because it's extremely malleable and extensible, and they have included a mapping to BIBFRAME 2.0 within the tool itself,
so it will be very useful for us. So last I would just like to talk about the next steps for the project. So we have two broad areas of development that we are focusing on in the next six to twelve months.
So the first one is reconciliation and reconciliation in a number of different contexts. So first I have to say when we made our application to Mellon for this grant there were two large areas that we did leave out of the grant. One was reconciliation and the other one was discovery because we asked for what we felt was an exorbitant amount of money
and we thought we'd have to double or triple it to handle those things as well, and we simply did not have the staff or time. But we also realized we need to work on reconciliation before we come to the end of this project as it is described. So I think there are four major areas of reconciliation that we are looking at as part of this project.
So the first is the reconciliation of a single institution's data as it goes through the conversion process. The second is that we realize we'll probably have to convert that data more than once, either because we have changed the model or because we have updates to the data, so we want to be able to ensure that when we reconvert the same data, the same URI is assigned to the same entity (there's a small sketch of this below). The third will be the reconciliation of that local data across the six institutions as we work together as a group. And the last part of reconciliation will be the reconciliation of local identifiers to global identifiers.
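On the reconvert-to-the-same-URI point: the talk only states the requirement, but one common way to meet it is to mint URIs deterministically from a stable hash of a normalized entity key, rather than from a random or sequential identifier. A minimal sketch, with a hypothetical base URI:

```python
# Deterministic URI minting: same entity key -> same URI on every run.
import hashlib

BASE = "http://example.org/entity/"  # hypothetical local URI base

def mint_uri(entity_type: str, *key_parts: str) -> str:
    """Mint a URI from a stable hash of an entity's normalized key."""
    # Normalize so trivial whitespace/case differences don't change the hash.
    norm = "|".join(p.strip().lower() for p in key_parts)
    digest = hashlib.sha256(f"{entity_type}|{norm}".encode("utf-8")).hexdigest()[:16]
    return f"{BASE}{entity_type}/{digest}"

# Reconverting the same record yields the same URI:
assert mint_uri("person", "Derrida, Jacques", "1930-2004") == \
       mint_uri("person", "derrida, jacques ", "1930-2004")
```

The trade-off is that the key must really be stable: if the normalized heading itself changes, the hash changes too, which is exactly why reconciliation has to be part of the conversion pipeline.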
So the second part of our work is something which is actually the first goal of the grant and is the most puzzling to us. The very first goal of LD4P was the establishment of the ability to create linked open data communally.
So we have to openly admit, though, we think it's a great idea, but we have absolutely no idea what it means or how it is supposed to work. We know that the shift to the web and web standards for the core of technical services production should allow us to work in a coordinated and decentralized way.
But what does this actually mean in a practical sense for the synergistic creation of library metadata in real time? So in the new year, we'll be turning our attention to developing the use cases for this new environment and then doing our very best not to simply focus on the recreation of our current workflows
but try to truly understand what it will mean to work in a new, open, and immediate environment and what those workflows need to be. So we feel that this will be the true heart of LD4P and provide the best path for the transition of traditional library technical services operations
to ones that are truly based in linked data. So thank you very much. Thank you. So we have a little extra time for questions after this presentation, so would someone like to begin with a question?
Is everyone awake? I don't have any questions. I have many questions. Oh, Osman, Osman.
Just a comment. I'm very much looking forward to your conversion tools, and also I'm interested in hearing more about how you use ALIADA and what kind of support it has for BIBFRAME 2.0. I wasn't aware of that. So just general encouragement to, you know, do some good work on the conversion tools,
because we really need better ones than what we currently have. Great. As we looked at conversion, I think the two things that were most important to us were, first, flexibility in the conversion. I think we realized that although we love BIBFRAME, we don't imagine that we will only use BIBFRAME for our work, so we want whatever the converter is there
to be able to take multiple inputs and do multiple outputs. And also we're beginning to think about the reconciliation process as part of the conversion process. It may be a separate module that's attached to it, but it will be imperative, as you do the conversion, to be able to do that reconciliation at the same time. And yeah, we're really happy with ALIADA.
We've been working with Casalini a lot. ALIADA is a very flexible converter. They're very invested in BIBFRAME 2.0. They've already done the mapping for the current BIBFRAME 2.0 conversion, and the actual structure of ALIADA is very open, so that as LC continues to go through and make changes, it should be a fairly easy task to plug those in
to get out the result that you'd like. And just a comment that came from the Twitter discussion about this talk: I think there was some applause that LD4P had looked at other tools that exist, but also a note that there are some existing tools that weren't mentioned and might also be worth considering for the project as well.
Please join me in thanking again Philip.