
Implementing the IIIF Presentation 2.0 API as a Linked Open Data Model in the Fedora Repository


Formal Metadata

Title
Implementing the IIIF Presentation 2.0 API as a Linked Open Data Model in the Fedora Repository
Title of Series
Number of Parts
16
Author
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt, and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
"The IIIF Presentation API specifies a web service that returns JSON-LD structured documents that together describe the structure and layout of a digitized object or other collection of images and related content." IIIF website The dynamic serialization of IIIF JSON-LD structured manifests via SPARQL CONSTRUCT is an interesting possibility that has great potential for cross-domain discovery and rendering of digitized objects with variable criteria. I have explored this possibility by implementing a data model in the Fedora Commons Repository that matches the specifications of the IIIF Presentation API. Fedora has the facility to index objects via Apache Camel directly to a triplestore. With SPARQL CONSTRUCT, the triplestore can serialize normalized JSON-LD as a graph. The use of "ordered lists" (aka collections) is a fundamental component of JSON-LD and necessary feature of the IIIF manifest sequence which is represented in a canonical RDF graph as a cascade of blank nodes. In order to dynamically create the sequence with SPARQL requires that the data is modelled identically to the IIIF specification. This gist is a representation of a compacted and framed JSON-LD graph that was serialized from a SPARQL query of Fedora metadata. The ability to assemble parts of distinct, disparate and disassociated digital objects on demand in one cohesive presentation becomes a real possibility. For example, the "range" object is equivalent to a part of a sequence, like a chapter in a book. With SPARQL, it is possible to target ranges from different "editions" based on a metadata specification (i.e. a person, place, or date) and unify them in a manifest object which is then rendered by a client viewer like OpenSeadragon.
Transcript: English (auto-generated)
Okay, so the final speaker in this session will be Christopher Johnson and he started work on semantic web and linked data as a software developer for Wikimedia Germany.
The title for his talk is Implementing the IIIF Presentation 2.0 API as a Linked Open Data Model in the Fedora Repository. Hello. In the interest of time, I've sort of cheated and made a video. This is, I think, the most efficient way to show how to create and publish IIIF manifests using the Fedora repository and SPARQL. The starting point is the Fedora repository version 4 with two existing collections, Test and Addition. I'll be creating a new collection called SWIB16
with an object ID 0098 in the next steps. I've extended the Library of Congress Bagger Spring rich-client application with a Fedora Java client to make an implementation tool called the Modeler, which features primarily in the creation of the data model. There are many benefits to the bagging paradigm
for file and metadata packaging, and this is another topic entirely. It's sufficient to note for this presentation that the file set for the 0098 object has already been saved as a bag, and I'll simply load this bag into the Modeler as a starting point. The files must be ordered with a numerical sequence in accordance with our expected presentation sequence. The bag-info.txt contains key-value pairs that comprise the primary variables in the data model: the collection ID, the object ID, the service host names, and the containers. These variables can be changed in this form. The flexibility of this metadata packaging approach is a significant advantage for deployment to multiple instances, for example local dev, remote testing, and remote production. All that is required to redeploy is to change the host names. It's possible to add optional metadata labels like title, author, attribution, etc. here as well. Also, you may note the hOCR resource field, which can be used as a source for the creation of a text model that allows full-text searching and annotation display on the images. The modeling process is a control sequence that starts with creating containers. This is a PUT request that happens instantly using the Fedora REST API. Switching back to the Fedora interface, refreshing the page
reveals the newly created collection SWIB16, its child resource 0098, and its children, which are the default containers of the IIIF model. The next step is to upload the binaries to the resource container. Using the serialization extension to Fedora, these binaries are subsequently serialized to the file system, where they're ingested by the image server. This group of 28 files is 751 megabytes, and putting them into Fedora is quite fast, even over the wire to a remote instance. Here it takes about 15 seconds to finish. After the images are uploaded, the image metadata needs to be patched. This is a practical benefit of using Fedora: the binary metadata descriptions are essentially attached as a child object called fcr:metadata, so they can be independently referenced. For the IIIF manifest, the image dimensions are required, as well as the service URI of the image server instance. Annotation lists are created next. The lists are required before the canvases. The canvases are then created. Patching the canvases depends on the resources and lists existing first. Switching back to the Fedora interface, dereferencing a canvas URI shows the metadata that was just patched, which includes the width and height and its associated image annotation and its
label, which matches the label and sequence of the image itself. Now here's where it gets interesting: the sequences. Most manifests will only have one sequence, and this model makes the assumption that it has the identifier "normal". This is probably the most important step in the whole process, patching the sequence. This creates the semantic glue that connects the canvas and image resources together as an RDF list. In the Fedora interface, we can see how the RDF list is represented with the rdf:first and rdf:rest properties. The first node in the list, called the head, must be a resource node and not a blank node, and the last node in the list, called the tail, must terminate with rdf:nil. After the sequence has been created, the manifest can be patched. Manifests have many optional properties, but they must have at least one sc:hasSequences property. The last step that is required for a normal image manifest in this system is the creation of XML
files that are used by the image service for identifiers. These XML files are serialized into the same directory as the image binaries, where they are post-processed by the image server ingestor. Now we can use the manifest service to get the JSON-LD representation of all this linked data that we just created, which has now been indexed into the triplestore. The manifest service is an API interface directly to SPARQL. This service expects an IRI template node parameter, gets the manifest graph from the triplestore as N-Triples, and then uses the JSON-LD processing library's fromRdf and framing methods to convert the graph and make it look beautiful like this.
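The rdf:first/rdf:rest glue that links a sequence together can be sketched in plain Python. This is a minimal illustration, not the Modeler's actual code: the sequence and canvas IRIs are invented, and real Fedora list nodes are skolemized resource nodes rather than the blank node labels used here.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def rdf_list_triples(subject, predicate, items):
    """Emit N-Triples linking `items`, in order, as an RDF list on `subject`."""
    if not items:
        return ["<{}> <{}> <{}nil> .".format(subject, predicate, RDF)]
    # One list node per item; the head node hangs off the subject, each
    # node points at its item with rdf:first and onward with rdf:rest,
    # and the tail terminates the list with rdf:nil.
    nodes = ["_:node{}".format(i) for i in range(len(items))]
    triples = ["<{}> <{}> {} .".format(subject, predicate, nodes[0])]
    for i, item in enumerate(items):
        triples.append("{} <{}first> <{}> .".format(nodes[i], RDF, item))
        rest = nodes[i + 1] if i + 1 < len(items) else "<{}nil>".format(RDF)
        triples.append("{} <{}rest> {} .".format(nodes[i], RDF, rest))
    return triples

# Hypothetical sequence and canvases, in presentation order.
triples = rdf_list_triples(
    "http://example.org/sequence/normal",
    "http://iiif.io/api/presentation/2#hasCanvases",
    ["http://example.org/canvas/{:03d}".format(n) for n in (1, 2, 3)],
)
print("\n".join(triples))
```

The same head/tail constraints described in the talk apply: the subject points at the first node, and the final rdf:rest is rdf:nil.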
This is the expected serialization format for an IIIF manifest and can be used without modification by a client like the Universal Viewer. The manifest URI can also be dereferenced back to the Fedora container so that supplementary data, if available, can be implemented by a client. Moving on to the next process, creating the text model. This is optional, but I think that it's pretty cool and worth demonstrating. All the following steps read the hOCR document and create nodes and values based entirely on that document. The first step is to create the pages. The second step, create the areas.
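The pages, areas, lines, and words all come out of the hOCR markup, which stores each element's geometry in its title attribute as "bbox x0 y0 x1 y1". Here is a minimal stdlib sketch of that extraction; the sample markup is invented, and a real hOCR file nests ocr_page, ocr_carea, ocr_line, and ocrx_word elements the same way.

```python
import re

# hOCR encodes geometry in the `title` attribute as "bbox x0 y0 x1 y1".
# This minimal extractor pulls every ocrx_word span with its box.
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"][^'\"]*bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>"
    r"([^<]+)</span>"
)

def extract_words(hocr):
    """Return (text, (x0, y0, x1, y1)) for each ocrx_word span."""
    return [
        (m.group(5).strip(), tuple(int(g) for g in m.group(1, 2, 3, 4)))
        for m in WORD_RE.finditer(hocr)
    ]

sample = (
    "<span class='ocrx_word' title='bbox 100 80 160 104'>Dixie</span>"
    "<span class='ocrx_word' title='bbox 170 80 210 104'>land</span>"
)
words = extract_words(sample)
print(words)
```

Each extracted (text, bbox) pair is what would become a word resource with its literal value and geometric target.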
The third step, create the lines. Here I pause the recording and check the status of the indexing and serialization in the Hawtio console log. The creation of thousands of new triples does not happen instantaneously, so indexing the words may take hours, depending on how large a document is being processed. Apache Camel is quite robust, however, and can still handle routing large message queues. During the pause, I have also started the Universal Viewer so that we can view the newly created manifest. This is the Universal Viewer.
I'm just showing here that the manifest URI loaded into the viewer points back to the manifest that we just created. You may notice that when the
images load, the tiles have not yet been cached, so this is the first time they have been loaded from the image server. Subsequent loads will appear much faster. Everything appears to work, so we've created and published the IIIF book representation in less than 10 minutes.
Returning to the text model creation, the next step is to patch the pages, areas, lines, and words. This basically means creating the RDF lists that create the sequences that connect them together. It is possible that some pages may be blank. In this case, page 18 has no area, so the client simply reports that a patch request for that page cannot be done and moves on. Another benefit of the REST interface of
Fedora. The same patching process is repeated for areas and lines and words. Switching back
to the Fedora interface, we can see how a word resource looks. It has a content-characters (cnt:chars) property that contains the literal value of its text. It also has the oa:hasTarget property, which is the pointer to the geometric location on the image. The final step in the process is patching the annotation lists. This
builds pages of word annotations and assigns them to their list containers that were created earlier. In order for a client like Mirador to get these annotations for a particular page, in this model, a service URI is used that executes an annotation list query that uses the list
node resource rather than dereferencing the list node itself. In the process of generating the annotation list, all the words for that page have to be retrieved from the Fedora repository. If we check the list here, you
can see that it's made again of the rdf:first and rdf:rest properties, where each one of the blank nodes contains a word resource. Switching back to Hawtio now, we can check the indexing process, which takes some time. Looking inside the manifest service now, I demonstrate here how the example manifest SPARQL query looks when executed directly from the endpoint. Note that it is a CONSTRUCT-type query. The WHERE part of the query uses property paths to traverse the RDF list. It's a rather complex query actually, but it works. Here I demonstrate that while it is possible to serialize JSON-LD directly from
SPARQL, this looks nothing like the JSON-LD format that is expected by the IIIF Presentation API. There are several important technical differences. The big issue is with the @graph keyword. This is not accepted by the IIIF client. The second is that fromRdf does not know how to read a skolem IRI and expects the list to be constructed with blank nodes, so it cannot format the RDF list as arrays. I wrote a method, incorporated into the fromRdf API, that takes skolem IRIs and changes them into blank nodes so that the service can convert the RDF lists into arrays.
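Skolem IRIs are the RDF 1.1 convention of minting "/.well-known/genid/" IRIs in place of blank nodes, which keeps list nodes dereferenceable in the repository. Below is a rough sketch of this kind of rewrite, turning skolem IRIs back into labeled blank nodes in N-Triples so a fromRdf-style processor can rebuild the lists as arrays. The hostnames and labels are invented, and the actual method was written against the Java JSON-LD library, not in Python.

```python
import re

# RDF 1.1 skolemization mints IRIs under /.well-known/genid/ in place of
# blank nodes. Rewriting them back to _: labels lets a JSON-LD fromRdf
# processor recognize the rdf:first/rdf:rest chain and emit arrays.
GENID_RE = re.compile(r"<https?://[^>]*/\.well-known/genid/([^>]+)>")

def deskolemize(ntriples):
    # Strip non-word characters from the genid so it is a valid _: label.
    return GENID_RE.sub(lambda m: "_:" + re.sub(r"\W", "", m.group(1)), ntriples)

nt = ("<http://repo.example/.well-known/genid/node-1> "
      "<http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "
      "<http://repo.example/canvas/001> .")
print(deskolemize(nt))
```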
The third is that JSON-LD has a variety of shapes, namely expanded, compacted, flattened, and framed. The IIIF format is framed. The manifest service is able to take the fromRdf result JSON-LD and then frame it with a nested callback from the same request. This is the N-Triples format that the fromRdf method expects and what is delivered to the manifest service.
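Framing is essentially what turns a flat list of nodes into the nested tree a viewer expects. The toy embedder below mimics that re-nesting by following @id references; the real frame() call in a JSON-LD library also does pattern matching and @context handling, and the node ids here are invented.

```python
# JSON-LD framing, roughly: turn a flat @graph of nodes that reference
# each other by @id into one nested tree rooted at the manifest node.
def embed(node, index, seen=None):
    seen = seen or set()
    out = {}
    for key, value in node.items():
        if (isinstance(value, dict) and set(value) == {"@id"}
                and value["@id"] in index and value["@id"] not in seen):
            # Replace the bare {"@id": ...} reference with the full node.
            out[key] = embed(index[value["@id"]], index, seen | {value["@id"]})
        else:
            out[key] = value
    return out

flat = [  # a flattened @graph; ids are invented
    {"@id": "m", "@type": "sc:Manifest", "sequence": {"@id": "s"}},
    {"@id": "s", "@type": "sc:Sequence", "canvas": {"@id": "c"}},
    {"@id": "c", "@type": "sc:Canvas", "label": "p. 1"},
]
index = {n["@id"]: n for n in flat}
framed = embed(index["m"], index)
print(framed)
```

The framed result nests Sequence under Manifest and Canvas under Sequence, which is the tree shape an IIIF client consumes.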
Switching back to the manifest service IIIF serialization of the same data, it is pretty clear to me why the JSON-LD is so nice. A human can actually read it.
Now I demonstrate how the text model is used by the client.
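At bottom, the text model gives the client word resources with a literal text value and a canvas target. A toy sketch of the kind of search-within lookup a client drives against it follows; all IRIs, property names, and coordinates are invented stand-ins, not the model's actual ones.

```python
# Toy "search within": filter word annotations whose text matches q and
# return IIIF-style annotations whose target ("on") carries an xywh
# media fragment, so a viewer can highlight the hit on the canvas.
words = [
    {"chars": "Dixie", "canvas": "http://example.org/canvas/012",
     "xywh": (420, 310, 96, 28)},
    {"chars": "land", "canvas": "http://example.org/canvas/012",
     "xywh": (520, 310, 70, 28)},
]

def search(q):
    return [
        {
            "@type": "oa:Annotation",
            "motivation": "sc:painting",
            "resource": {"@type": "cnt:ContentAsText", "chars": w["chars"]},
            "on": "{}#xywh={},{},{},{}".format(w["canvas"], *w["xywh"]),
        }
        for w in words
        if q.lower() in w["chars"].lower()
    ]

hits = search("dixie")
print(hits[0]["on"])
```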
The Universal Viewer's search-within feature accesses the search service API that is specified in the manifest. My implementation does this with SPARQL, but others may use Solr. Either way, it is just a q= keyword search. The query results are returned as an annotation set from the manifest service, not a list, and the client reads the target values of the results and then highlights them on the canvases. Here I show the raw results of the query that was just executed from the client. The client expects this exact
format for the results. Another benefit of using Fedora resource identifiers rather than hash-type URIs is that they can be dereferenced. I now switch to the Mirador viewer to demonstrate how an annotation list can be rendered. This functionality is not yet supported by the Universal Viewer. Going back to page 12 and the keyword Dixie that we just located, if I toggle the annotations view using the button in the upper left-hand corner, then all the content characters that have been assigned the oa: linking property will be rendered on the image. While the label pop-up seems simple, it is possible to extend the annotation display to include any properties of that word resource. Here I switch back to the manifest and show the raw value of the annotation list that was just used by Mirador. Voila, there is Dixie as word resource 12174 in the annotation list. In my view, specific and concrete client implementations have vast possibilities when backed by a flexible linked open data model like the one I just presented. Thank you. I'm sure there are questions, so I'll leave the remaining time for discussion.
We don't have very much time here, and I could talk for hours on the subject. The video is really the most efficient way to present a very complicated process. Are there any specific questions?
Any questions from the audience at this point? It wasn't completely clear to me how much work you had to do on the metadata first before you were able to sort of expose it as IIIF. None, actually. There is actually zero work in the modeling. It just starts with a bag. The Java code is actually using Fedora's built-in process to generate the metadata as it exists in the model. I mean, it's really natural. I think it's a very pure approach to metadata modeling because there's no intermediate transformation at all, because Fedora represents the data as RDF internally. I can understand that it's RDF, but I'm surprised if everything is already there that you need to create the IIIF, so that there is sort of... IIIF doesn't require anything that you wouldn't have already in there. Well, I mean, the IIIF Presentation API is a data model. It has a very specific format for what the clients expect. Modeling the relationships of the different objects requires this semantic glue,
which is comprised of the RDF lists. I think really the challenge in the development of this was to understand how to represent the lists that are part of the format and then serialize them into the triple store and then query them so they can be represented in JSON-LD,
because the JSON-LD library itself does not really facilitate some of the transformation using the from RDF methods for the scolums. So I had to understand a lot about the scolum to blank node interaction and also some of the limitations for the JSON-LD framing algorithms that,
particularly with the at graph keyword, that limited the expected serialization of the presentation API. So yeah, there was a lot of research I think I had to do into understanding JSON-LD and also Fedora into making this process work. Also the process of using
Apache Camel to serialize from Fedora into the triple store is quite sophisticated, I guess you could say, because it's using active MQ and a lot of messaging components. But I think basically this is pretty pre-built code.
I didn't write any module for Apache Camel. I'm using what's been developed by the Fedora developers themselves. Okay, any questions? Okay, there's one.
Hi, Christopher. This is great. This is a real tour de force, I think, demonstrating a pipeline, and perhaps the value is not just in the fact that you've done this once; it's the fact that most of what you've done is presumably extraordinarily reusable. Do you think there are any points which are very specific to your data set? No, I think I really tried to make my design completely abstract from my data set, particularly through the use of the bagging paradigm from the Library of Congress as a basis for the portability
of the data itself. I think this concept makes it very reusable, and that's the intent of this architecture, so that anyone can do it with any type of image. The text modeling depends, in my implementation, on the hOCR format, but there are other formats for correlating image geometry to particular text characters. I think ALTO is another one, and METS. So it's possible to extend the Fedora modeler to incorporate
other formats for text modeling but I think the text modeling is a real benefit of using the linked data model because it's extensible to other APIs. So for instance, the client may develop its own API
to query for a particular type of annotation and represent it differently in the client. So for instance, if you wanted to show entity types or whatever and specifically highlight them on the image, this is entirely possible with this model versus another type of representation.
Anyone else? Just to follow up on his question, so is this something that you've made available already or are planning to make available for other users of Fedora so that they could apply this to their image collections?
Yeah, I mean, if someone's already got images in a Fedora repository, they may have to re-implement the process because of the serialization; they have to incorporate the image server into the serialization of the images. So for existing consumers it may not be so easy to migrate, but I think if you're starting from nothing and you want to have an instant manifest service, this is the way to go, because other processes for generating manifests are probably more rigid and not geared towards linked data. So, I mean, of course anyone can write a manifest as a one-off, but this allows you to change variables and manipulate your data in a way that is compatible with the Fedora linked data approach, which is flexible and extensible.
So, yeah. Okay. If there are no further questions then we will thank the speaker. And it turns out we finished a little bit early
which is good because then we have more time to mingle during the coffee break. So, the coffee break will start after I give up the microphone and we will continue at 20 minutes to four. So, please be early because we will have the lightning talks after the coffee
and, I mean, if somebody loses one minute from a normal presentation that's not so bad, but if somebody loses one minute from a lightning talk then that's worse. So, please be early. Okay, let's go for coffee.