
RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation

Formal Metadata

Title
RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation
Title of Series
Number of Parts
16
Author
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
RDF is a graph data model, so the best way to understand RDF data schemas (ontologies, application profiles, RDF shapes) is with a diagram. Many RDF visualization tools exist, but they either focus on large graphs (where the details are not easily visible), or the visualization results are not satisfactory, or manual tweaking of the diagrams is required. We describe a tool *rdfpuml* that makes true diagrams directly from Turtle examples using PlantUML and GraphViz. Diagram readability is of prime concern, and rdfpuml introduces various diagram control mechanisms using triples in the puml: namespace. Special attention is paid to inlining and visualizing various Reification mechanisms (described with PRV). We give examples from Getty CONA, Getty Museum, AAC (mappings of museum data to CIDOC CRM), Multisensor (NIF and FrameNet), EHRI (Holocaust Research into Jewish social networks), Duraspace (Portland Common Data Model for holding metadata in institutional repositories), Video annotation. If the example instances include SQL queries and embedded field names, they can describe a mapping precisely. Another tool *rdf2rml* generates R2RML transformations from such examples, saving about 15x in complexity.
Transcript: English(auto-generated)
I hope everybody had a good lunch break, so welcome to this session.
My name is Osma Suominen, I'm from the National Library of Finland, and we have four speakers today. But first of all, an announcement concerning the lightning talks. The lightning talks will be held after the coffee break, after this session.
We have eight lightning talks registered, so every speaker will get four minutes to talk on their subject, so you know how much you can fit into that. Okay, but the first speaker in this session is Vladimir Alexiev, and he leads the
data and ontology management group at the Ontotext Corporation from Bulgaria, which is one of the leading semantic technology companies, has been doing this for a long time, and they have about 70 people, and one of their products is the GraphDB database, a triple
store. He has a PhD in computer science from the University of Alberta, and he has done some projects, including the ResearchSpace project with the British Museum and the Yale Center for British Art, and he's also been working on publishing the Getty Trust vocabularies,
including the Art and Architecture Thesaurus, for example, as linked open data, and he has also done several projects for Europeana. But his talk today is titled RDF by Example: rdfpuml for true RDF diagrams and rdf2rml
for R2RML generation. What a mouthful. So, welcome Vladimir. Anyone who attempts to pronounce these abbreviations is quite a brave man. So you see a lot of diagrams in my presentation, you won't be able to read most of them, but
in addition to this version, which is a presentation, there is also a continuous HTML version where the diagrams are much bigger and you can read it at your leisure. I'm sure the organizers will put up the URL for this. It's on GitHub. So the link for this continuous HTML is here.
There is no PDF yet; I'll probably also make a PDF. So in my daily work, I do a lot of data modeling for all kinds of domains, you'll see examples further on, and I've always wanted a good visualization tool. I've tried several.
And I think that this is very important for RDF modeling because that's a graph data model, right? And I think that the people who hold the data, the subject matter experts, the cultural heritage professionals, the librarians, they have to be able to understand it and say
whether the mapping is right or not right. So I made a tool that uses PlantUML, which itself uses GraphViz. Both are languages where you can define a diagram in textual form. PlantUML is used widely in the software industry for doing UML diagrams, just describing them in text.
And the benefit of generating diagrams directly from RDF is that they are true, they are exactly what you mean in your model. It's not just hand waving and you don't need to update them or tweak them, they're
all laid out from the RDF. So here's a very simple example. I don't know how many of you are familiar with Turtle, but on top is the Turtle code. You see the last line there is a little instruction, in this case it just says that one of the nodes should be displayed in the node where it is referenced instead of
being displayed as a separate thing. And you see that the graph is easy to read, it corresponds to the Turtle basically one to one. Everything that I can put in the node is put in the node to save space, to reduce clutter. These are called inlines, so the types and RDF literals are inlined, but you can
also inline more nodes. This is the generated PlantUML, you see that it's not a very complex language, you have arrows there, node names and so on and so forth, but it can quickly get tricky when you have more features. So because readability is a very important concern for these sorts of diagrams, I've
done several features. For example, if you have parallel arrows, then I only show one arrow with several labels. If you have several values for the same predicate, then they are collected inside the node with parentheses, basically shortcuts that you can also see in Turtle.
I also handle reification and other similar specialized kinds of things, we'll see them later. So this is an example of arrows that collect the property names. In this case we have several properties connecting two nodes and to save space we display them this way.
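The inlining and arrow-merging ideas described above can be sketched roughly as follows. This is not the rdfpuml implementation, only a minimal Python illustration under stated assumptions: triples are plain string tuples, literals are recognised by a leading double quote, `a` stands for rdf:type, PlantUML object-diagram syntax is used for the output, and multiple values are joined with a comma (a choice made here, borrowed from Turtle). The function name `triples_to_plantuml` is hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "a"  # Turtle shorthand for rdf:type, assumed for this sketch


def is_literal(term):
    # Assumption: literals are written with a leading double quote.
    return term.startswith('"')


def triples_to_plantuml(triples):
    """Turn (subject, predicate, object) string triples into PlantUML text.

    Types and literals are inlined into the subject node; parallel arrows
    between the same two nodes are merged into one arrow with several labels.
    """
    attrs = defaultdict(lambda: defaultdict(list))  # subject -> predicate -> values
    edges = defaultdict(list)                       # (subject, object) -> predicates
    nodes = []                                      # nodes in order of first appearance

    def remember(node):
        if node not in nodes:
            nodes.append(node)

    for s, p, o in triples:
        remember(s)
        if p == RDF_TYPE or is_literal(o):
            attrs[s][p].append(o)       # inline: no separate node, no arrow
        else:
            remember(o)
            edges[(s, o)].append(p)     # collect parallel properties

    alias = {n: f"n{i}" for i, n in enumerate(nodes, 1)}
    lines = ["@startuml"]
    for n in nodes:
        lines.append(f'object "{n}" as {alias[n]}')
        for p, values in attrs[n].items():
            lines.append(f"{alias[n]} : {p} {', '.join(values)}")
    for (s, o), preds in edges.items():
        label = "\\n".join(preds)       # PlantUML reads \n in a label as a line break
        lines.append(f"{alias[s]} --> {alias[o]} : {label}")
    lines.append("@enduml")
    return "\n".join(lines)
```

Feeding it two properties that connect the same pair of nodes yields a single arrow carrying both property names, which is the space-saving behaviour described in the talk.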
This is from CIDOC CRM modeling of the Getty CONA aggregation of cultural objects. Now we can do a bit with the arrows, for example, change the direction. By default the direction goes down, but in this case because supposedly this thing on the left
which was the motivation for doing the thing on the right happened earlier in time, I just want to put it on the left to emphasize the chronological order. We can also change the shape of the arrow, dashed versus solid and that kind of stuff.
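As a rough illustration of arrow direction and shape at the PlantUML level, the following Python helper builds one arrow line. It is a sketch only: the helper name `arrow` and its parameters are hypothetical, and it says nothing about how rdfpuml expresses these controls in RDF. It assumes PlantUML's convention of writing a direction keyword inside the arrow and using dots instead of dashes for a dashed line.

```python
def arrow(source, target, label, direction=None, dashed=False):
    """Build one PlantUML arrow line.

    direction: None (PlantUML default layout) or "left", "right", "up", "down".
    dashed:    use a dotted line instead of a solid one.
    """
    if direction not in (None, "left", "right", "up", "down"):
        raise ValueError(f"unknown direction: {direction!r}")
    stroke = "." if dashed else "-"
    if direction:
        shaft = f"{stroke}{direction}{stroke}>"   # e.g. -left-> or .left.>
    else:
        shaft = f"{stroke}{stroke}>"              # e.g. --> or ..>
    return f"{source} {shaft} {target} : {label}"
```

For example, `arrow("production", "motivation", "crm:P17_was_motivated_by", direction="left")` places the target to the left of the source, which is the kind of chronological hint mentioned in the talk; the node and property names in this call are illustrative.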
We can also put in what are called stereotypes, these are these colored circles plus the italicized names in guillemets on top. So this is from the Getty thesauri, they have different kinds of nodes that are implemented using on one hand SKOS, on the other hand this on the bottom comes from ISO 25964, the latest standard on thesauri.
So here we're just showing G is guide term, A is thesaurus array, some particular construct in that mapping we're emphasizing with these circles and stuff. I mentioned reification, well the thing here is if you want to say more about the
relation, how do you do it, let's say confidence or provenance of that relation, some date or who created it, things like that. And there is something called the Property Reification Vocabulary, which basically allows you to describe which properties are
used in the reification to address the relation and so on. So the tool recognizes RDF reification and CIDOC CRM reification constructs and displays them like this. So basically the idea is, see this node on the bottom, rather than connecting it with
two arrows to the subject and the object, we just attach it to the relation and the addressing properties are here on the bottom and they have a little arrow on the left, on the right and, you know, this colon, just showing the
reader where to look to figure out how it relates to the arrow. Now we're getting to some more complex diagrams. So this is from the Getty CONA, modeling sources and contributors. You see a rather complex diagram, luckily it's just
a tree. We also see here collecting of values, so for example over here we have more than one value, and also comments, so if you want to describe to the reader what this code corresponds to, you can put an rdfs:label and, for an inline node, it will be shown as a comment after a hash sign. This is another part of the
Getty CONA, the Iconography Authority. It is similar in scope to Iconclass, which was described in the presentation before the break, and here again we have reification, we have a custom arrow
because has spouse is a bidirectional property, a symmetric property, so we show it to the left and without any arrowhead. Or I did some work for the American Art Collaborative, which is 12 museums in the States trying to map their data to CIDOC CRM and establish a
demo service. This is one alternative of modeling the concept of cast after, so one sculpture is cast after another sculpture from the same mold. It could be interpreted as a different sort of network. This is from the Europeana task force on FRBRoo, and
FRBRoo is a CIDOC CRM extension for bibliographic data. If you have four classes in FRBR, you have I think about 35 in FRBRoo. So here are some works after Don Quixote and you know the various connections between them and so on and so forth,
rather complex. This example is from the European Holocaust Research Infrastructure project. One of the research problems they're trying to tackle there is to investigate Jewish social networks and how they influence the chance of survival of people. And this is just the model here showing it uses CIDOC CRM
and the German National Library's AgRelOn. It stands for Agent Relations Ontology, for the social network. This is an example from a European project called Multisensor. It dealt with video
annotation, news annotation, social network stuff. So it used a bunch of related ontologies for media fragments, open annotation and also, to a large extent, NIF, the NLP Interchange Format for describing NLP results over text. This is a bit of a model for
social network analysis showing influence and centrality for a person in a social network. This is again modeling open annotation and confidence, which comes from another ontology, the Stanbol FISE. Here is an example of again using these
stereotypes. So on the left you have an original article in Spanish, these letters SSS, and on the right a translation in English. And you see that with just a bit of control, just saying this translation should go to the left rather than down,
we see very well the parallelism in the two parts of the network. Now because in Multisensor one of the partners also does FrameNet analysis, we devised a way to embed FrameNet into NIF. This is again generated with PlantUML but not from RDF.
It uses PlantUML packages, just to show the grouping of nodes. Now this here is an actual example of a FrameNet annotation. And it's not a model, it's generated from actual triples about one sentence, even though it's totally unreadable to anyone who cannot zoom very deeply into it.
And this is only half of that diagram, this is the other half. But it was important for us to see the connectivity of this network and to make sure that the triples we were making were right. I redid one of the examples in the Open Annotation specification. So you have here blank nodes, lists, and interesting kinds of stuff. I proposed this to the PCDM people, this is
the Portland Common Data Model, a common metadata model for institutional repositories. And over there the idea is that you use the circles to designate
different types of metadata. It's quite easy to write. This is a handmade diagram from LinkedTV. We discussed this project before the break. It's about video annotation. And pretty much an equivalent thing is in this diagram, which is generated from RDF. And of course it took a lot less
effort to create this one. This is the model of, if you have heard of the International Consortium of Investigative Journalists, the Panama Papers, and now the Bahamas Leaks. So we did an RDF rendition of this, and this is sort of the data model that we used. It turns out that GitHub
can show a diff of two images. So as the model was evolving, on the left you see the old version and on the right is the new version. Okay, and now in the last several minutes I want to talk a
little bit about whether we can use these models to generate conversion to RDF. I mean everybody's data is in different systems, a lot of the data is in relational systems, and the W3C standard for conversion from relational to RDF is called R2RML. And yeah, it turns out that if instead of
sample values we use field names, another tool that I made can generate R2RML conversions out of that. So here is a model of exhibitions from the Getty Museum, and just this node in the middle, it describes a particular sub-exhibition, if you will.
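The idea of putting field names where sample values would go can be illustrated with a small Python sketch that expands one model node into an R2RML triples map. This is not rdf2rml itself and does not reproduce its conventions: the function name `node_to_r2rml`, the table and column names, and the example URL are all hypothetical. Only the R2RML vocabulary terms (rr:TriplesMap, rr:logicalTable, rr:tableName, rr:subjectMap, rr:template, rr:class, rr:predicateObjectMap, rr:predicate, rr:objectMap, rr:column) and the rr: namespace come from the W3C standard.

```python
RR_PREFIX = "@prefix rr: <http://www.w3.org/ns/r2rml#> ."


def node_to_r2rml(map_name, table, subject_template, rdf_class, field_map):
    """Expand one model node into an R2RML TriplesMap, returned as Turtle text.

    subject_template: IRI template with {COLUMN} placeholders, as R2RML expects.
    field_map:        dict of predicate -> column name (the embedded field names).
    Prefixes other than rr: (e.g. crm:, rdfs:) must be declared by the caller.
    """
    parts = [
        f'  rr:logicalTable [ rr:tableName "{table}" ]',
        f'  rr:subjectMap [ rr:template "{subject_template}" ; rr:class {rdf_class} ]',
    ]
    # Every property of the node needs its own predicate-object map,
    # which is why one compact model node grows into many R2RML nodes.
    for predicate, column in field_map.items():
        parts.append(
            f"  rr:predicateObjectMap [ rr:predicate {predicate} ; "
            f'rr:objectMap [ rr:column "{column}" ] ]'
        )
    body = " ;\n".join(parts)
    return f"{RR_PREFIX}\n\n<#{map_name}> a rr:TriplesMap ;\n{body} ."


if __name__ == "__main__":
    # Hypothetical exhibition-at-a-venue node; all names are illustrative.
    print(node_to_r2rml(
        "ExhibitionVenue",
        "EXHIBITION_VENUE",
        "http://example.org/exhibition/{EXHIBITION_ID}/venue/{VENUE_ID}",
        "crm:E7_Activity",
        {"rdfs:label": "TITLE", "crm:P3_has_note": "NOTE"},
    ))
```

Even in this toy form, a node with two properties already expands into a triples map with a logical table, a subject map, and two predicate-object maps, each with its own object map.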
An exhibition being at a particular site, in case it's a travelling exhibition. Out of this we generate R2RML, and one node generates about 15, because in R2RML you have to be very specific about the subject, the property, the object, every object you have
to describe with a separate node, and so on. And so what this generator does is save you a lot of work, and it also allows a subject matter expert to inspect the model, and guarantees that the transformation will be consistent with the model. After we feed it this relational data,
it produces this actual RDF. The shape of this RDF is pretty much the same as in the model, but because you have two exhibitions over three venues, there are more nodes in it, right? This is a more involved example, the central node of the Getty Museum RDF, which would
be the museum object and nodes around it. And so this R2RML generation is working well for converting relational sources, but we also have to deal with XML and JSON sources. And then the question is can we extend this, and we're currently working to extend it
for other types of input. So there is RML, which is an extension of R2RML to deal with JSON and XML. We're currently experimenting with it. There is XSPARQL, which is a melding of XQuery and SPARQL, and I think that we might be able to generate XSPARQL, or at least
a subset of it. For tabular stuff, there is Tarql. And here are just a few models to finish off. So up to now I've been showing stuff from cultural heritage, but here are things from clinicaltrials.gov. So we have rather extensive experience with
life sciences, things that are important to pharmaceutical companies and so on. And this is a model for just one part of clinical study results, which basically describes the statistical outcome. Lately I have been working a lot with company data, so this is
Dun & Bradstreet data that is mapped to the Financial Industry Business Ontology, FIBO. Or this is the Legal Entity Identifier. This is a global initiative to make a sort of
global trade register, basically to make all of the US funds that created the crash of a few years ago at least register and make known their shareholding and control structures. And again, mapping this GLEIF data to FIBO in this model. The difference is that over here we have XML, XPaths, and inside the nodes are XML fields rather than relational
fields. In the further future, we hope to extend this towards RDF shapes. What you have been seeing here are RDF shapes, but there is a standard for that called SHACL. And I think it's a more modern approach compared to ontologies to describe your semantic data
model. And first of all, to be able to visualize RDF shapes, and secondly, to be able to generate them from a more succinct representation, I think can be quite useful. Thanks.
I'll generate some myself. Any questions from the audience? Thank you for your presentation. Out of curiosity, which visualization tools did you try first? And just
how scalable are the visualizations? To what degree can you actually fit stuff on your screen and still make sense of it? Well, a good example is VOWL. VOWL can visualize an ontology, and it's integrated in several toolkits for working with ontologies. But
in order to really be able to read the VOWL diagram, because you have overlap of the nodes and of the labels, you need to drag them around to review stuff. Before that, there were visualizations that rely on GraphViz, but because they put every node out, and
because, for example, they don't use prefixes, don't shorten the node URLs or the property URLs, they're very hard to read. As for your second question, it's very important. I think what you've seen here is kind of the maximum you can cram on a screen, but
it's not a problem. You don't try to describe a complete mapping of, let's say, 200 fields on one screen. You split it up in four or five screens. Then you can just run the generated R2RML files in succession, and they will spit out whatever is needed. I think maybe the strongest visualization tool I've seen is by AllegroGraph. I've played with it just
a bit. I cannot make it work outside of AllegroGraph. And with us being a competitor to AllegroGraph, it doesn't do a good enough job for me.
Any other questions? Maybe I'll ask one. It wasn't obvious to me, so is rdfpuml available somewhere so that I could play with it, or is it still a work in progress? It's still not clean enough to put it out, and we're still wondering whether this is
such a smart tool that we can make some money out of it or whether we want to open source it. It is being used by others in the American Art Collaborative, but yeah, we will decide this later on. Okay. Anyone else? If there are no further questions, then we will thank the speaker.