
De l'Une à l'Autre: Towards Linked Data in Special Collections Cataloging


Formal Metadata

Title
De l'Une à l'Autre: Towards Linked Data in Special Collections Cataloging
Number of Parts
15
License
CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract
A member of the Mellon-funded Linked Data for Production (LD4P) initiative, LD4P at Princeton is participating in defining linked data ontologies specifically for the description of special collections materials. It uses a hand-selected set of 525 items from the Library of Jacques Derrida, all of which bear inscriptions by persons who gifted the books to Derrida or his household, to investigate tools, workflows, and data models that will make the creation of linked descriptive data for annotated material viable in a production environment. To this end, LD4P at Princeton is exploring modeling these inscriptions on an adaptation of the W3C Web Annotation Data Model and linking them to bibliographic data converted to BIBFRAME 2.0 or related models. Since the VitroLib ontology editor, which was developed at Cornell University as part of the LD4L-Labs initiative, is designed to use bibliotek-o (a supplement to BIBFRAME developed by LD4L Labs and the LD4P Ontology Group), LD4P at Princeton anticipates using bibliotek-o at least part of the time. The presentation will give an overview of the workflows and tools the group has developed, demonstrate the current data model, and discuss our current thinking on implementation issues such as front- and back-end interfaces, APIs and the curation of external data sources, and lessons learned along the way.
Transcript: English (auto-generated)
Good morning, my name is Regine Heberlein. I'm a metadata specialist in the Department of Special Collections at Princeton University, and it is my great pleasure to talk to you today about a project we're involved in as a participant in the Linked Data for Production initiative.

Our group consists of five people: Joyce Bell, who is our project lead; Jennifer Baxmeyer; Peter Green; Lydia Santorelli; and myself. Until March of this year, when he defected to Yale, Tim Thompson was also a member of our group.
Many of you will know this: our group is part of the Mellon-funded Linked Data for Production (LD4P) project, which includes six research libraries: Columbia, Cornell, Harvard, the Library of Congress, Stanford, and Princeton. The purpose of the grant is to explore ways to move from MARC-based production workflows to production workflows based on linked open data. Each partner library in the grant works independently on a local use case, which includes the data modeling and, as the case may be, the domain-specific ontology development. LD4P is also part of the larger LD4 family, which includes the Mellon-funded Linked Data for Libraries (LD4L), which concluded in 2016, and LD4L Labs, which runs from 2016 to 2018 and is focused on tool production. LD4P is a two-year grant that started in April 2016, so we're fast approaching the finish line.

Princeton is using a selection from the library of Jacques Derrida as its project test case. This is an author library of about 19,000 items that was acquired by Princeton University in 2014.
Jacques Derrida, of course, was a French philosopher. He is one of the major figures associated with post-structuralism and postmodern philosophy, and the collection has met with correspondingly high interest from researchers.

The collection includes Derrida's working library, which is one of the things that makes it so interesting, as well as books belonging either jointly or separately to Jacques and his wife Marguerite, and in a handful of cases also to their two sons. It includes a variety of formats (monograph volumes, certainly, but also serial issues, offprints, clippings, typescripts, etc.) that were largely accumulated in the course of Jacques and Marguerite's professional activities. But there is also a set that represents the family's leisure reading.
This is another view of the library. What makes this collection so interesting from a linked data perspective is that it embodies many different types of relationships that aren't easy to expose, let alone make functional, in a library environment with our current descriptive tools. I'm going to give you three examples, the last of which is our use case.

For example, Derrida extensively annotated his books, so there is a lot of marginalia and also a lot of insertions. For those interested in studying Derrida's reading habits or intellectual formation, those represent a record of a dialogue between Derrida and the text. There is currently another project exploring this at Princeton University, called Derrida's Margins, and I believe they're not using a linked data approach, even though I think it lends itself to one.

Another really interesting type of relationship in this collection is the shelving order, meaning the precise location of the books, because it is meaningful: where a book was kept, for example in the study or somewhere else in the house, and who its neighbors were, contains information about how it was used by Derrida. I'm an archivist by training, and archivists call this kind of meaningful background "context": the kind of context that allows you to know what you're looking at, but that is not contained in the resource itself. We tried to address the presence of substantial contextual information in this collection by treating it as an archival collection. So it is described in a finding aid rather than in individual MARC records, which allowed us to display the original shelf order and to let users sort by it, among other data points. Of course, what that gives you is still basically a browse list, so the functionality is very limited.
Last but not least, to get to our LD4P use case: the collection contains a large run, over 6,000 items, of books that were gifted, and in many cases inscribed, to Derrida and members of his household. Derrida famously never threw away a book, including those he never opened: there are some shrink-wrapped ones that we got as part of the collection, which he obviously never read. There has been active scholarly interest in these dedications, because they represent a physical manifestation of Derrida's reader network and intellectual influence. Currently this sort of information is buried in a note in the record, which I'll show you in just a bit, and we really have no mechanism to include it in our discovery functionality. Here you see some examples.
For our project we selected about 500 of these dedications, pretty randomly, even though we did try to filter out living gifters to be sensitive to privacy concerns. The goal of our project is to explore and develop a production-oriented workflow that allows us to capture and make discoverable the relationships embedded in the dedications. To that end, we have been working on creating a data set that combines, in linked open data format, the bibliographic item data (and when I say bibliographic item, I really mean the BIBFRAME Item) and the identified dedication data.

This one is an amusing example, and also an interesting one. It's amusing because Derrida has annotated a dedication, and the dedication is pretty warm and cordial, to the effect of "in gratitude for your friendship and with my strong and faithful and great admiration", yada yada yada. His response: "hypocrite", exclamation point. So this is actually also a very interesting case of a meta-annotation, an annotation on an annotation, and I'll use this example for a while to demonstrate what we can and cannot do with it in this project.

As mentioned before, we currently describe this collection in a finding aid, not in a MARC record. This is a snippet of the XML; it's Encoded Archival Description, an XML schema for archival data. You notice the big blob in the center there: it's just a string of text. This, by the way, would be a very similarly sad situation in a MARC record, where this would also be in a note field. The data is completely unstructured, and the entities are just embedded in a string. I've circled the low-hanging fruit here, even though you could go further.
So we developed a data model to structure this data, based on the W3C Web Annotation Data Model, which is currently in Recommendation status at the W3C. By the way, those of you who attended the IIIF workshop yesterday got a much fuller and better overview of this model, so I'll just give a brief summary of what we're doing with it. The Web Annotation model models relationships between information objects. It was designed to model born-digital annotations that reference web-based targets, as opposed to physical data carriers, but it actually generalizes well to a library environment, where a catalog record references information that could be on the physical data carrier or on a digital surrogate.

The spec says, I quote: "An annotation is considered to be a set of connected resources, typically including a body and a target", which you see here, "and conveys that the body is related to the target. The exact nature of this relationship changes according to the intention of the annotation, but the body is most frequently somehow about the target." End quote. This, of course, is equally true for our stringy dedications in the books from Derrida's library, where the annotation, that is to say the metadata package provided by the cataloger, references a body, the content of the dedication, on a target, the data carrier or digital surrogate. And then, as you can see on the right here, conceptually the target hooks nicely into the existing BIBFRAME structure.
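The body/target structure just described has a JSON-LD serialization in the Web Annotation spec. A minimal sketch, expressed here as a Python dictionary, might look like the following; all of the example.org identifiers are illustrative placeholders, not the project's actual URIs:

```python
import json

# Minimal Web Annotation in the model's JSON-LD serialization.
# The example.org identifiers are placeholders, not real project URIs.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno/1",
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "transcribed text of the dedication",
    },
    # The target is the carrier of the inscription: a page of the
    # physical item or of its digital surrogate.
    "target": "http://example.org/item/1/page-image",
}

print(json.dumps(annotation, indent=2))
```

The `@context` URL is the context published with the Recommendation; everything else here is a stand-in for the project's own data.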
I should say that the idea for using Web Annotations came out of conversations between members of my group, Tim Cole, and some others, and was then developed and published in the Journal of Library Metadata by my colleagues and former colleague: Tim Thompson, Jennifer Baxmeyer, Joyce Bell, and Peter Green.
Applying this model to the "hypocrite" dedication and meta-annotation, we see here anno one, or annotation one, the information package regarding the dedication itself; then there's anno two, the package regarding the comment on the dedication; and then we have anno three, which describes our, meaning the cataloging staff's, identification activities.

If I start plugging in some data, this is what it would look like. I don't know that you can really read this, but I'll walk you through it. So this is anno one. Anno one has a body, called body one, which is the text of the dedication, and it has a target, page X; the X is because I neglected to look up the actual page number. Page X has as its scope item one, which in turn is our BIBFRAME Item. The purpose of body one in anno one is dedicating. Anno two, in turn, would have as its body a body two and then point to body one, and it would look like this: body two, the text of "hypocrite", has the purpose of commenting on the target, body one, the text of the dedication.

This is a very, very simplified view, irresponsibly simplified even, of how we created and consolidated the bibliographic and annotation data into a linked data record.
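Sketching that chain in the same JSON-LD shape, with "ex:" identifiers and the dedication text as placeholders (the purpose value "dedicating" is the project-specific term from the slide, not a standard Web Annotation motivation):

```python
# anno1: the cataloger's package describing the dedication itself.
anno1 = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "ex:anno1",
    "type": "Annotation",
    "body": {
        "id": "ex:body1",
        "type": "TextualBody",
        "value": "placeholder for the transcribed dedication text",
        "purpose": "dedicating",  # project-specific, not a standard motivation
    },
    # Page X carries the inscription; its scope is the BIBFRAME Item.
    "target": {
        "type": "SpecificResource",
        "source": "ex:pageX",
        "scope": "ex:item1",
    },
}

# anno2: the meta-annotation; its body comments on body1, not on the page.
anno2 = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "ex:anno2",
    "type": "Annotation",
    "body": {
        "id": "ex:body2",
        "type": "TextualBody",
        "value": "hypocrite !",
        "purpose": "commenting",
    },
    "target": "ex:body1",
}

print(anno2["body"]["value"], "comments on", anno2["target"])
```

Targeting the first body rather than the page is what makes anno2 a meta-annotation in this sketch.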
We're still in the process of doing that. In very simple terms, we started with two data sources. That's a gross lie, but conceptually there were two data sources: the data for the bibliographic item and the data for the dedication. We modeled one on BIBFRAME and the other in Web Annotations, put both through their respective sausage makers, and voilà, at the end we got linked data out of it.

This is the non-romanticized version, in Peter Green's retrospective workflow analysis, of all the steps that we actually had to take. The details aren't really as important as the fact of the multiple contortions that we had to twist ourselves and our data into on the path to a consolidated record.
I will mention two things, though, that were annoyances or major complications. One is that we had to go through MARC to get to BIBFRAME, because right now we can't produce BIBFRAME natively, and that, of course, is kind of a roadblock. We also ended up having to build our own editor to create the records for the annotations; I'll talk a little more about that later.

This is that editor. It is a tool that Peter Green built for us so we could move forward, because our timeline didn't sync up with the development of the editing tools being developed by LD4L, and we just needed something to move on with. There are a lot of disclaimers associated with this tool. It is not meant as a community tool; it's a one-off for our internal use. It doesn't quite do all that we want it to do. For example, it cannot yet accommodate the meta-annotation, the "hypocrite" bit, unless we put it in the code manually. That is why Peter calls it MacGyver.
This is the landing page. You can expand a whole suite of background documents here, including step-by-step instructions; that is because the editor assumes as its users our regular cataloging staff, who are not necessarily familiar with linked data. So here, if they want to, they have all the background materials.

You can retrieve records by ID. We're starting from a prepared set of four hundred and eighty-nine, and I've randomly selected record 450 here, for no good reason at all. What you get is an image of the dedication, which you can open up to double-check against the transcription; a text box with the transcription; an identification box, which we'll use in a moment (that's the green box here); and the code template below. The buttons in the center both move you to the next step and show you the steps in the process, what you still have to do. Here you can expand the view of the incoming metadata: there's a link to the finding aid record, the OCLC record, and the WorldCat work record, among other things. And then this is the view of the actual dedication, for double-checking purposes. Assuming that is okay, we can start identifying. That is done by highlighting the entity, in this case a person, so I click "person" in my radio buttons there. What that does is populate the identification field with twelve characters to the left and to the right of the highlighted entity.
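That twelve-character context capture is easy to picture as a small function. This is a sketch only: the real editor is an XQuery/BaseX application, and both the helper name and the sample string here are hypothetical:

```python
def identification_context(text: str, start: int, end: int, pad: int = 12) -> str:
    """Return the highlighted span plus up to `pad` characters of
    surrounding context on each side, clipped at the string edges."""
    return text[max(0, start - pad):min(len(text), end + pad)]

# Hypothetical dedication string; highlighting "Jacques Derrida" (chars 5-20).
dedication = "pour Jacques Derrida, en témoignage de ma fidèle admiration"
print(identification_context(dedication, 5, 20))
```

The snippet of left and right context is what lets a reviewer verify which occurrence of the name was tagged.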
Then it appends the code you see below to the template. The next step is to manually edit the template. The dedicatee is presumed to be Jacques Derrida, so he's here in an RDA property, already filled in. I added a line for Marguerite; I would also have to do a second pass to identify her, but I've skipped over that in the interest of time. The author of the dedication is the translator Marie-Claire Pasquier, which we know from the EAD record, so I've given her a VIAF URI here, too. Then the tool allows you to view this graph, which you probably can't read, but that's okay: the blue bubbles are the BIBFRAME entities, the red bubble is the OCLC work record, and the yellow bubbles are what I just did.
So there is one annotation motivated by describing, the body of which has the purpose of inscribing, and the other two, where I tagged the names, are motivated by identifying, and the body in those cases has the purpose of tagging. I'll just quickly walk you through this. We are not sending the files to a triple store yet; it all lives on the file system. The tool outputs a Turtle file, an SVG file, and an HTML file. It's a BaseX application. This is the file structure: there are two main files, the form itself and an XQuery module. We will make the tool available at some point in the future, just not yet; it actually keeps changing on a daily basis.
Finally, just a note on our lessons learned and challenges. This is our original grant timeline with our goals. When we first started the project, we quite underestimated how long the data modeling in particular would take, and we didn't quite count on having to build our own editor, because at the time we were still thinking that something would come out of LD4L and we would use that; the timelines just didn't converge that way, so we spent time building this editor. What that meant for us in the end was that we had to give up on some of our goals, like working on the cataloger's workbench editor, involvement in RareMat or other community ontology development efforts, and also our anticipated use of VitroLib and bibliotek-o.

Our next steps are now to refine our tool and the production workflow, make our test data set available, and then hopefully start building some queries on it. I'll be happy to answer any questions to the best of my ability,
but please feel free to reach out to any of these good people, who will probably be able to answer them more competently. Thank you.

Thanks a lot, Regine. Are there any questions on this project?

Thank you again; this was quite inspiring. But if I were a philosopher, I would be quite skeptical now, because, according to Derrida, two people using this tool would use it in very different ways, and I think the annotations they produce would be quite different. So, for a philosopher, what is the outcome of this?

I'm going to take the pedestrian approach to this question and say that few of our catalogers are philosophers, or would take the philosophical approach to this.
If you remember (is this still up?) we have a step-by-step instruction manual here; the first bullet point, the how-to, really tells you what you need to do, step by step. I might be wrong, but I think it leaves very little room for doubt. So it's just a matter of doing the work of tagging the name: is it a person or is it a place? Is this person also the dedicator or not? The tool does the rest. So I'm actually a little more optimistic now. We've just started producing records this way, so it might very well turn out that you are right and we have to go back and be more restrictive about what we do, but for the moment it seems very streamlined, and I'm a little more optimistic that we'll get uniform output.
There was another question. Could you explain a little bit the outcome? What could we expect if you finished the project? I didn't really understand the use case, I'm sorry. Will there be a portal for researchers, or will it just be part of your catalog?

So the use case is that, in the end, you want to be able to query this data, and you're right, I apologize, I didn't really give examples of that. Linking to external data sources is of course something that we haven't even begun to do, but let's assume we get to that step: we publish our data set, and it is in fact pulling in data from, say, Wikipedia and DBpedia, from a variety of sources. You could then begin to answer questions like: how many people publishing in French, alive in a certain time frame, and members of a particular school of thought gave books to Derrida? Researchers can then draw inferences about the circles in which Derrida was active, how he was read, who he interacted with, and that sort of thing.
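The kind of question just described could eventually be expressed in SPARQL over the published annotations plus external sources. This is a rough sketch in which every prefix and property choice is an assumption, not the project's actual vocabulary:

```python
# Sketch of a SPARQL query over the annotation data set. The dcterms:creator
# property for "who wrote the dedication" is an assumption; the real model
# may attribute gifters differently.
QUERY = """
PREFIX oa:      <http://www.w3.org/ns/oa#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?gifter (COUNT(?anno) AS ?gifts)
WHERE {
  ?anno a oa:Annotation ;
        oa:hasBody ?body .
  ?body dcterms:creator ?gifter .
  # Joins against external sources (language, life dates, school of
  # thought) could be federated in via SERVICE clauses to e.g. DBpedia.
}
GROUP BY ?gifter
ORDER BY DESC(?gifts)
"""

print(QUERY)
```

A query like this would rank gifters by how many dedicated books they gave; the federation step is the part the speaker notes has not yet begun.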
Yes, see, you have your answer.

I thought it was very interesting what you showed, and I was wondering about the conceptual model. We now have a lot of data hidden in note fields, in annotations, and I think the assumption still is that you have a work, the printed book, and then you have notes. Did you consider the idea that you actually have three works, the text, the text of the dedication, and the work of Derrida, and then connecting them as a complex work?

I was kind of worried that would come up. It's not that the thought hasn't occurred to us, but it's maybe a little overkill for what we're trying to do, because really all we're trying to do is make the entities in these annotations available for discovery, and for that the Web Annotation model seems to do the trick; it's a lightweight, fairly easily implementable solution. That said, we keep saying to ourselves that we really need to tweak the model, and we really need to rethink this and rethink that, so this is not written in stone. But I don't think that at this point in the project we'll go back to that level of conceptual rethinking.
We'll just, I think, stick with this lightweight solution and tweak it a little bit. Any questions?

I just have one regarding the big diagram with the whole workflow. I was wondering about the decision to have all these separate tools: was it only guided by the fact that you were expecting something from this other LD4 project, or was it also because you had requirements such as using only open source tools? I'm just wondering, in the world of tools, whether you looked at everything and nothing was suitable; what was the process that led to a workflow that is so fragmented?

In a way, it is partly to do with dependencies that didn't work out the way we thought they would, and partly to do with the fragmented nature of our incoming data, because it really was an Excel spreadsheet here, a Word document that we had to scrape there; we consolidated it all in a database; and so it went. Part of it, which is a challenge that I actually had in my notes and then kind of glossed over because I thought, well, everybody has that challenge, is that
we didn't really plan from the first moment. We couldn't have, because we didn't quite know yet what we were dealing with, but if we had been in a position to do that, it would have made everything so much easier. So that's part of the answer. The other part is that we are a group of five people, each of us bringing certain skills, basically a certain toolbox, to this project, and we use the tools that we find in our toolbox. So if we know how to get it from Excel to this other thing, that's what we're going to do, because it does the job. And so, yes, there are a couple of eddies in here, and it's a little tortuous, but it got us to where we needed to be. Had we known better what we were dealing with from the first moment, we would have planned it much differently.

Thank you. I think that's quite a good point for people who want to start such an effort on their own as well. Okay, well, thank you very much for your presentation. Thank you.