
BIBFRAME Pilot

Formal Metadata

Title
BIBFRAME Pilot
Title of Series
Number of Parts
15
Author
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
The Library of Congress has begun a second Pilot simulating the cataloging environment with Linked Data, a triple store, and the BIBFRAME data model. A great deal of information resulting from the Pilot will soon be available and the session will report on it. The Pilot and the whole BIBFRAME development are carried out in an open environment with specifications, conversion tools, and system components made downloadable via a web site or GitHub as they are developed. The BIBFRAME 2 ontology is also available as an OWL file and many of the controlled vocabularies used for the data, such as subjects and names, are available in RDF. The vocabulary services have been operational for over 5 years and were basic building blocks for this further development. The Pilot has taken a total-environment approach by converting the whole of the Library of Congress MARC catalog to RDF according to the BIBFRAME data model which the 60 catalogers in the Pilot “catalog against” as they create new descriptions of items. Pilot catalogers are specialists (with various language assignments) who deal with monographs, serials, moving image, recorded sound, still image, cartography, and music. The presentation will discuss the tools used (what they did well and less well), aspects of converting a very large file to a very different data model, RDF and ontology issues, and cataloger efficiencies and problems in the new environment. We will also share thoughts on how this effort may fit into the global Linked Data environment, including how it can benefit from further engagement with other communities and services.
Transcript: English (auto-generated)
Hi, I'm Nate Trail from the Library of Congress, and I'm the lead developer for BIBFRAME. Our colleague and boss, Sally McCallum, is not able to be here today, so it's just Ray and I. Ray will talk a little bit about ontologies, and I'll be talking about the Pilot. We've had Pilot number two going on for a number of months so far, so this is just a preliminary overview of what we're finding and what we're doing.

In order to get things started, we had to build a BIBFRAME database so that our catalogers could catalog against it. That consisted of all of the name/title and title authority records in our LC NAF at id.loc.gov, converted to BIBFRAME, plus a complete dump of our ILS bibliographic records converted to the new BIBFRAME 2 vocabulary, and then also native BIBFRAME from our BIBFRAME editor as the catalogers key records into it.
The main focus of the Pilot is really data entry and conversion of the data, and not enough about what you do on the back end or on the website side of it, which I think is something we need to pivot toward now that we've built a database for this. The database is living: it has daily feeds from both the ILS and the name authority records, and when the catalogers add new records, those are merged into the database. The records that they also have to key into the ILS, to maintain the real ILS going forward, are kept separate from that.
Between BIBFRAME 1 and BIBFRAME 2 we've changed a lot of things. We've overhauled the original vocabulary, so now we have BIBFRAME 2 as a vocabulary. We also realized that our infrastructure was inadequate for the amount of linking that was going to happen, both between our own data and others' and internally, so we have replaced all of the servers that were involved in id.loc.gov and added additional machines to make it much more robust. Our triple store was 4store, which is an open source product that is no longer being supported, so we've switched that out. And just recently we moved to allow HTTPS for all of our links.

We've also changed all the software. Most of the code base that we had originally was in XQuery, but now for our conversion to the BIBFRAME 2 vocabulary we're using XSL. One of the reasons for that was so that we could embed the resulting tools in Metaproxy and YAZ, so when we make code changes they're instantly available to any library that updates their Metaproxy and YAZ tools. The comparison service online is now updated for BIBFRAME 2, and we've added an authority conversion as well, so you can see what a name/title authority looks like as a BIBFRAME work.

When we ingest everything into the database, we do a merge for a bibliographic record to see if it can match to an existing work, and all the merge programs had to be updated for the new vocabulary as well. The BIBFRAME catalog that we created had to have a front end, so we've got a new search and display interface for that, and we're just starting to use SPARQL to augment the display; not for a whole lot of other stuff, just a little bit of extra display. The editor itself was written for BIBFRAME 1, so we've modified that to handle BIBFRAME 2, and the profiles that we have are also now updated for the new vocabulary.

A little comparison between ID and BIBFRAME as far as triples:
ID has about ten and a half million records, representing names, subjects, and other smaller vocabularies. It has about 300 million unique triples, 21 million subjects, and only 768 predicates, which is interesting when you look down below at how many predicates there are for the BIBFRAME database. I have not really delved into that to see why exactly there might be 14,000 unique predicates, but we'll see. In the BIBFRAME database there are 65 million descriptions, and that ends up being 4 billion triples, because a BIBFRAME conversion is very wordy in order to allow us to do this merging and matching. At some point soon I'm going to be deleting the things that were necessary for ingest but not necessarily for the ongoing BIBFRAME data.
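To put the "very wordy" remark in numbers, here is a small back-of-the-envelope calculation using only the rounded figures quoted in the talk. The constant and function names are made up for this illustration, and the results are only as precise as the spoken numbers.

```python
# Rounded figures as quoted in the talk.
ID_RECORDS = 10_500_000          # names, subjects, smaller vocabularies
ID_TRIPLES = 300_000_000
BF_DESCRIPTIONS = 65_000_000
BF_TRIPLES = 4_000_000_000


def triples_per_record(triples: int, records: int) -> float:
    """Average number of triples used to describe one record."""
    return triples / records


id_density = triples_per_record(ID_TRIPLES, ID_RECORDS)
bf_density = triples_per_record(BF_TRIPLES, BF_DESCRIPTIONS)

print(f"ID: about {id_density:.1f} triples per record")
print(f"BIBFRAME: about {bf_density:.1f} triples per description")
print(f"BIBFRAME descriptions are roughly {bf_density / id_density:.1f}x as dense")
```

On these figures a BIBFRAME description carries roughly twice as many triples as an ID record, which is consistent with the plan to delete ingest-only statements later.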
When we merged, we started with the base file of name/titles. There are about 1.2 million of those, and 19 million bibliographic descriptions were added to that. After the merge, about 1.2 million works had something merged onto them, not necessarily just their own instance. But only about 530,000 instances were merged onto one of the name authority works, so the other 700,000 or so merged onto bibs that came before them. I'll show you an example of one of those.
This is a title, Runaway Mittens, and there are apparently two different editions. If you look on the right-hand side, the one in bold is this particular instance, and the sibling right below it, with the hyperlink, is the instance that was merged onto the same work, so it's available for viewing as well.

Just a little bit of the SPARQL that we started to use for that: on an instance, you don't necessarily know what its parent's title is, or what other siblings are available, what other instances belong to the same work. So we did a simple query that says: if you have an instance of some work, go find that work, get its BIBFRAME title, and bring it back. You can also say: go find all of the things that are instances of that same object and get their titles.
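The query itself is not shown in the talk, so the following is only a sketch of what such a lookup might look like. The SPARQL text assumes the BIBFRAME 2 properties `bf:instanceOf`, `bf:title` and `bf:mainTitle` and expects the caller to bind `?instance`; the `ex:` identifiers, the sample triples, and the helper functions are invented for illustration. The Python functions mimic the same two steps over a tiny in-memory set of triples so the logic can be run without a triple store.

```python
# Illustrative SPARQL only; not the Library's actual query.
# ?instance is expected to be bound by the caller.
SIBLINGS_QUERY = """
PREFIX bf: <http://id.loc.gov/ontologies/bibframe/>
SELECT ?work ?workTitle ?sibling ?siblingTitle WHERE {
  ?instance bf:instanceOf ?work .
  ?work bf:title/bf:mainTitle ?workTitle .
  ?sibling bf:instanceOf ?work ;
           bf:title/bf:mainTitle ?siblingTitle .
  FILTER (?sibling != ?instance)
}
"""

# Invented sample data: one work with two instances.
TRIPLES = {
    ("ex:work1", "bf:title", "ex:title0"),
    ("ex:title0", "bf:mainTitle", "Runaway mittens"),
    ("ex:inst1", "bf:instanceOf", "ex:work1"),
    ("ex:inst2", "bf:instanceOf", "ex:work1"),
    ("ex:inst2", "bf:title", "ex:title2"),
    ("ex:title2", "bf:mainTitle", "Runaway mittens"),
}


def objects(subject, predicate):
    """All objects of (subject, predicate, ?o), sorted for stable output."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def subjects(predicate, obj):
    """All subjects of (?s, predicate, obj), sorted for stable output."""
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == obj)


def main_titles(resource):
    """Follow bf:title, then bf:mainTitle, like the property path in the query."""
    return [t for node in objects(resource, "bf:title")
            for t in objects(node, "bf:mainTitle")]


def work_and_siblings(instance):
    """Find the parent work's title and the titles of sibling instances."""
    results = []
    for work in objects(instance, "bf:instanceOf"):
        siblings = {s: main_titles(s)
                    for s in subjects("bf:instanceOf", work) if s != instance}
        results.append({"work": work,
                        "work_titles": main_titles(work),
                        "siblings": siblings})
    return results


print(work_and_siblings("ex:inst1"))
```

Note that `ex:inst1` carries no title of its own in the sample data, which is the situation described: the display has to go up to the work to find one.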
And so we have that. Some of the issues that we've encountered: we're using RDF/XML for the conversion of the bib records, but the BIBFRAME editor is a JSON editor, so before we start doing ingest we have to bring everything to the same structure so that we can ingest it the same way. There's a huge number of triples involved, and we're still trying to figure out how to limit that and index only the stuff that is necessary.

When we do our merges, there are a lot of candidates that maybe shouldn't be merged. For example, a photograph doesn't necessarily have a title, so it says "untitled", but not all untitled works should be merged together. So we actively suppress some things from being merged.

The conversion from MARC is still a moving target. Every time the conversion spec changes and Index Data makes an update, we have to figure out how that update flows into the database. We can easily change the conversion on the way in, but for all the prior records we have to figure out: can we update those on the fly in some way, or do we need to reload everything?
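The merge-and-suppress idea mentioned above can be pictured with a minimal sketch. The talk does not describe the actual matching rules, so everything here is an assumption: the match key (normalized creator plus title), the `SUPPRESSED_TITLES` list, the record layout, and the sample records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical list of generic titles that should never drive a merge.
SUPPRESSED_TITLES = {"untitled"}


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences still match."""
    return " ".join(text.lower().split())


def match_key(record):
    """Return a (creator, title) key, or None when merging is suppressed."""
    title = normalize(record["title"])
    if title in SUPPRESSED_TITLES:
        return None
    return (normalize(record.get("creator", "")), title)


def merge_into_works(records):
    """Group record ids that share a match key; keep suppressed ones apart."""
    works = defaultdict(list)   # match key -> ids merged onto one work
    unmerged = []               # ids that each keep their own work
    for record in records:
        key = match_key(record)
        if key is None:
            unmerged.append(record["id"])
        else:
            works[key].append(record["id"])
    return dict(works), unmerged


# Invented sample records.
RECORDS = [
    {"id": "bib1", "creator": "Example, Author", "title": "Runaway Mittens"},
    {"id": "bib2", "creator": "Example, Author", "title": "Runaway  mittens"},
    {"id": "bib3", "title": "Untitled"},
    {"id": "bib4", "title": "Untitled"},
]

works, unmerged = merge_into_works(RECORDS)
print(works)
print(unmerged)
```

The two editions end up on one work, while the two untitled photographs stay separate even though their titles are identical.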
Some of the work we can still look forward to is figuring out how to expose the BIBFRAME catalog. Right now it's behind the firewall, and we're still trying to figure out what it looks like and how to come to grips with it for ourselves, but we might do some kind of bulk download. I would really like to see something less cataloging-focused and more web-focused, so maybe a new RDF navigation interface, but we have not got a plan for that yet. We're still definitely looking at the data that's coming in.

We would also like to ingest new CIP records and ONIX records and convert those to BIBFRAME. The editor has a lot of issues, and I'm sort of tempted to just change it to a simple HTML form and convert to BIBFRAME on the back end. We're always looking for new services at ID, so we need your input on what else you want to see from the systems that we have. Soon we'll have a spec for holdings, so we'll start ingesting items as well. These are some links to the converters, documentation, and stuff like that. Thank you very much.
Hi, my name is Ray Denenberg. I'm also with the Library of Congress. I'm going to talk a little bit about the BIBFRAME ontology. This is pretty much a condensed version of what I talked about yesterday in a two-hour session, for those of you who were there. For those of you who weren't there and who are interested in some of the points I talk about, my understanding is that yesterday's presentation will also be among the conference proceedings, so you can see a number of further examples of what I give today.
I start by saying that the development of the BIBFRAME ontology was driven by a number of principles, two of which were simplicity and extensibility. By that I mean: when a particular feature was suggested to us (and keep in mind that BIBFRAME is intended to be a core bibliographic ontology), in deciding whether to support it or not we would usually evaluate whether it was a core bibliographic function. If we decided not to support it, then we would go to whatever lengths necessary to try to ensure that the ontology would be extensible, and we encouraged extension ontologies, where the people developing them have much more expertise in special collections and things like that. So I'll mention a few of these.
Let me make sure this can be seen, because this one's in black. Everybody see this? So there's art objects, which is being led by Columbia University. Harvard is leading the effort for extensions in both maps and moving images. There's a performed music group led by Stanford; that's the one I have probably worked the closest with, and I'm not sure of this, but I think it's probably the furthest along among the extension ontologies. Somebody can correct me on that. Rare materials is by Cornell. And then there's bibliotek-o, which is loosely described as a BIBFRAME extension, but they're going to be speaking next, so they can probably characterize that better than I can. First, I want to give a very quick review of the BIBFRAME model,
just to provide some context for what I'm going to talk about. The basic BIBFRAME model begins with a work, and a work can have one or more instances. So, for example, the work Candide, the book, might have a print version published and an electronic version; those would be two instances. Every instance can have one or more items, and those are the copies of the given instance.

We also define work-to-work relationships, which is an important aspect of BIBFRAME. For example, the book Candide and the play Candide are two distinct works, and they are connected by the property that we've defined, bf:relatedTo, which is basically a superproperty for a number of subproperties. I just want to say at this point that the relatedTo property was pretty much conceived for the purpose of work-to-work relationships; there were some work-to-instance and instance-to-item relationships, but primarily work-to-work relationships. But the extension groups, primarily the music extension group, wanted to relate works to other sorts of things, not just instances and items. So, in the interest of extensibility, we dropped all of the domains and ranges on the relatedTo property, and I'll talk more about that in a moment or two.

Finally, in the BIBFRAME model we have a number of subclasses of work. For example, a book is a bf:Work, but it's also a bf:Text, as bf:Text is a subclass of bf:Work. A painting would be a bf:StillImage, which is a subclass of bf:Work.

With that, I want to talk just a bit about the music extension and how we've related to it. Back to the subclasses of work: there are two in particular that are of interest to music, bf:Audio and bf:NotatedMusic. Take, for example, the Mozart Clarinet Quintet. You could have a score, or you could have a recording of it. The score would be a bf:NotatedMusic, the recording would be a bf:Audio, and these are two distinct works.
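As a rough illustration of the model just described, the sketch below writes the Candide and Mozart examples as plain subject/predicate/object tuples and checks class membership through the subclass hierarchy. The `ex:` identifiers and helper functions are invented; the class and property names (`bf:Work`, `bf:Text`, `bf:StillImage`, `bf:Audio`, `bf:NotatedMusic`, `bf:Instance`, `bf:Item`, `bf:hasInstance`, `bf:hasItem`, `bf:relatedTo`) are taken from the BIBFRAME 2 vocabulary as I understand it, so treat the details as an assumption rather than a definitive rendering.

```python
# Subclasses of work named in the talk.
SUBCLASS_OF = {
    "bf:Text": "bf:Work",
    "bf:StillImage": "bf:Work",
    "bf:Audio": "bf:Work",
    "bf:NotatedMusic": "bf:Work",
}

# Invented identifiers for the two examples.
TRIPLES = [
    ("ex:candide-book", "rdf:type", "bf:Text"),
    ("ex:candide-play", "rdf:type", "bf:Work"),
    ("ex:candide-book", "bf:relatedTo", "ex:candide-play"),   # work-to-work
    ("ex:candide-book", "bf:hasInstance", "ex:candide-print"),
    ("ex:candide-book", "bf:hasInstance", "ex:candide-ebook"),
    ("ex:candide-print", "rdf:type", "bf:Instance"),
    ("ex:candide-ebook", "rdf:type", "bf:Instance"),
    ("ex:candide-print", "bf:hasItem", "ex:candide-copy-1"),
    ("ex:candide-copy-1", "rdf:type", "bf:Item"),
    ("ex:quintet-score", "rdf:type", "bf:NotatedMusic"),
    ("ex:quintet-recording", "rdf:type", "bf:Audio"),
]


def objects(subject, predicate):
    """All objects of (subject, predicate, ?o), sorted for stable output."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def is_a(resource, cls):
    """True if the resource has type cls directly or through a subclass."""
    for declared in objects(resource, "rdf:type"):
        current = declared
        while current is not None:
            if current == cls:
                return True
            current = SUBCLASS_OF.get(current)
    return False


def instances_of(work):
    """Instances attached to a work."""
    return objects(work, "bf:hasInstance")


print(is_a("ex:quintet-score", "bf:Work"))      # score is a work via NotatedMusic
print(instances_of("ex:candide-book"))          # print and electronic versions
print(objects("ex:candide-print", "bf:hasItem"))
```

The score and the recording each count as a `bf:Work` in their own right, which matches the point that they are two distinct works rather than two instances of one.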
But the music extension adds a layer of... I've got a few typos in this, and I'm going to correct those and issue a revised version. It's not a "later" of abstraction, it's a layer of abstraction that music adds to the basic model. You won't find this published anywhere, but this is my understanding from reading what they're working on: they're coming up with a work model, and in that work model they would actually define an abstract work. In other words, this particular piece of music, the Mozart Clarinet Quintet, would have an abstract work, which is actually the music as it existed in Mozart's head. When it was committed to paper it would become, in BIBFRAME terms, a bf:NotatedMusic, and that's a work, but the actual abstract work would be a layer above that.

So this is sort of an extension to the BIBFRAME model that the music extension is developing. BIBFRAME doesn't define an abstract work of that type, but as far as BIBFRAME is concerned it's perfectly compatible with the model; it just extends the model. When they define this additional layer of abstraction, they define a property, "realized in", so this Mozart Clarinet Quintet would be realized in a bf:NotatedMusic, and it would also be realized in a bf:Audio if it's recorded. PMO here refers to the Performed Music Ontology.
The music people took a look at the BIBFRAME event model, which I'm going to talk about in just a moment, and said that this relationship, relatedTo, isn't good enough: we want to relate works and events. So here, in this case, an event, a performance of the Mozart Clarinet Quintet, is a bf:Event, and in music terms it's a PMO performance. Then you also have the event-to-work relation, "has recording".
So let me talk about BIBFRAME events for just a moment. How much more time? What? Oh, okay. Late in the game we added an additional core class, bf:Event. Let's say there's a concert, the concert is recorded, and a book is written about the concert. The concert is an event, a bf:Event; the recording of the concert is a work; the book written about the concert is a work; and the concert is the subject of the book.

Let me say what I mean by subject of the book, and digress for a moment to talk about BIBFRAME subjects. When we express a subject in BIBFRAME, we want to give a type: for example, this subject is a person, and then you give the actual subject. This is a bit of hand-waving; that subject could have a direct object, such as a MADS record and so forth, but this is a little more human-expressible. So you express it as: this is a person, and the person is John Wilkes Booth. As opposed to, say, a work, because it could be a bf:Work; it could be a book about John Wilkes Booth. It could be a BF geographic. The point I was coming to is that it could be a bf:Event; this is one of the main reasons for describing... I'm getting the choke signal here, so just give me another 30 seconds.

bf:eventContent and bf:eventContentOf are two properties that were defined for the purpose of the event model, and we've extended bf:relatedTo so that you could have work-to-event relations.
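The concert example and the typed-subject idea can be sketched with the same kind of tuple data. This is only an illustration: the `ex:` identifiers, the label, and the helper functions are invented, and the direction assumed for `bf:eventContentOf` (from the recording to the event) is my reading rather than something stated in the talk.

```python
# Invented sample data for the concert and John Wilkes Booth examples.
TRIPLES = [
    ("ex:concert", "rdf:type", "bf:Event"),
    ("ex:recording", "rdf:type", "bf:Audio"),
    # Assumed direction: the recording is the content of the event.
    ("ex:recording", "bf:eventContentOf", "ex:concert"),
    ("ex:concert-book", "rdf:type", "bf:Text"),
    ("ex:concert-book", "bf:subject", "ex:concert"),
    ("ex:booth-book", "rdf:type", "bf:Text"),
    ("ex:booth-book", "bf:subject", "ex:booth"),
    ("ex:booth", "rdf:type", "bf:Person"),
    ("ex:booth", "rdfs:label", "Booth, John Wilkes"),
]


def objects(subject, predicate):
    """All objects of (subject, predicate, ?o), sorted for stable output."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def label_of(resource):
    """Use rdfs:label when present, otherwise fall back to the identifier."""
    labels = objects(resource, "rdfs:label")
    return labels[0] if labels else resource


def typed_subjects(work):
    """Return (type, label) pairs: first what kind of thing, then which one."""
    return [(kind, label_of(subject))
            for subject in objects(work, "bf:subject")
            for kind in objects(subject, "rdf:type")]


print(typed_subjects("ex:booth-book"))     # a person, then who
print(typed_subjects("ex:concert-book"))   # an event as the subject
```

Because the type comes with the subject, a display can say "this is a person" or "this is an event" without dereferencing an authority record first.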
The music people created this property, PMO "created for", because they didn't think that (I can't go back) the relatedTo property was specific enough. They have also defined additional PMO classes, concert, performance, and festival, and in addition they've developed a whole hierarchy of event types. The rare materials folks have created a custodial event type and a whole lot of custodial event types that are subtypes of custodial event.

I have a lot of material that I want to discuss on BIBFRAME titles. I would suggest you go to the presentation I did yesterday; there's a wealth of examples on BIBFRAME titles. So I guess that's it. Thank you.

Okay, thank you. There's only so much you can fit into ten minutes, so I think we have to move on to the next speaker. Any questions I suggest you take up during the coffee break.