
Performing LOD: Using Europeana Data Model for the aggregation of metadata from the performing arts domain


Formal Metadata

Title
Performing LOD: Using Europeana Data Model for the aggregation of metadata from the performing arts domain
Title of Series
Number of Parts
16
Author
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Imagine a theatre play. There are contributors such as the playwright, director, actors, etc. The play may have several performances with changing casts while actors may contribute to other plays. The play might be based on a drama which also has a screen adaption. All this is documented in manuscripts, photos, videos and other materials. The more relations you find among these performance-related objects, the more it emerges as a perfect use case for linked data. At the University Library Frankfurt am Main, the Specialised Information Service Performing Arts aggregates performing arts-related metadata of artefacts gathered by German-speaking cultural heritage institutions. It is funded by the German Research Foundation and aims to give researchers access to specialized information by providing a VuFind-based search portal that presents the metadata modeled as linked and open data. The Europeana Data Model (EDM) offers a universal and flexible metadata standard that is able to model the heterogeneous data about cultural heritage objects resulting from the data providers’ variety of data acquisition workflows. Being a common aggregation standard in digitization projects a comprehensive collection of mappings already exists. With the amount of delivered manuscript data in mind, the DM2E-extension of EDM was used and further extended by the ECLAP-namespace covering the specific properties for the performing arts domain. The presentation will show real life examples and focus on the modeling as linked data and the implementation within the VuFind framework.
Transcript: English(auto-generated)
So next up, we have Julia Beck and Marko Knepper. Yes. They are going to be talking about a project they're working on to aggregate and display specialized information on performing arts.
I'm happy to hand it over to Julia. Yeah, thank you very much. Yeah, after Andromeda's talk, I'm now very proud to stand here as a female computer scientist to present, so yay. Yeah, I'm working on the project,
Specialized Information Service Performing Arts. It's a project funded by the German Research Foundation. And what we are developing is a search portal, which is now online in a beta version, so it's not perfect yet, but we're working on that.
And to make you understand our product a bit better, I thought we could imagine doing some research in this portal. Let's say you're interested in a theater play, like Leonce and Lena by Georg Büchner. So now we type it in, and we see we already get some hints about what we might be looking for,
and then we get some hits. Here we get about 100 hits from the different theater data providers that are with us, and the material types range widely, from normal books to costumes and theater bills
and also scene models, so we're dealing with a lot of different object types. What we can do then is say we are interested in photographs from one Leonce and Lena performance, and then we get four hits from the Theater Museum
in Düsseldorf, and let's say we are very interested in the last one, Woman with the Guitar. Looks quite nice, so let's have a look. We get to the result page with the detailed view, and you get the typical information about the photo. It has a title, a description, the place
where it was created, and the date. What we also get here is some related content. So we can now, thanks to linked data, go to the performance description of Leonce and Lena that is linked to this photo.
And what is especially interesting here in this description of the performance is the roles we have, which are quite different from what you normally see for books. There is a theater director, a set designer and the costume designers that were involved in the production, and then we also have a performance place as a special kind of place
where the performance took place. And what we also do, apart from linking to performances and back, is link to person pages, like you saw before in the swissbib talk. We have something similar, so that you can get
more information about, for example, the creator, Georg Büchner: when he was born and the variants of his name. We also get links to Wikipedia, VIAF and many more, as you see. So what's behind all that, the interesting part?
At the moment we have 16 data providers, but there are more to come from the performing arts domain, especially the German-speaking community at the moment, so Germany, Switzerland, and Austria. The institutions range from museums
and archives to libraries, where I'm from, for example. As for the technologies: to model the data as linked data, we use the Europeana Data Model, and this gives us the flexibility to model our data
as linked data, because the data we got was very heterogeneous; the way the data is presented or described really differs. And because in theater productions you have to deal with many manuscripts and documents,
we also used the DM2E namespace, which stands for Digitised Manuscripts to Europeana. What we also used is the ECLAP ontology, which is a special ontology for performing arts, coming from an online archive about performing arts called ECLAP. And what we also used in the background
as the framework that you see is VuFind, which swissbib is also using. It gives us the search and filter functionality, and the Solr index in the background where we gather our data together.
For the agent pages, we have information from our data providers about the persons, but what we also sometimes get are GND numbers, and with the GND numbers we can use the service from the German National Library called Entity Facts, which gives us fact sheets about persons in JSON format,
so we can display persons with that. As I told you before, we have a lot of different institutions involved. We have libraries, archives, and museums, and this is where the problem starts,
because it's like Europeana, basically. From libraries we basically get PICA data and MARC data, and from archives we get EAD or some kind of individual standard. So we get Excel sheets, FileMaker files, databases, everything that's on the market,
basically. And from museums we also get LIDO data. So you can see there's obviously some work to be done. But the good thing, now that we are transforming to EDM, is that we have a strong community in Europeana,
and they really helped us to transform the data by providing us with some mappings. So we didn't implement everything on our own, we could reuse other products from the market. So, what we then do, is we have the data split into the normal title data,
and the authority data with our agents, and we enrich that with Entity Facts, as I said before, and then we display our content in VuFind with a so-called record driver, which helps us to,
yeah, to show the data, because VuFind is normally based on MARC data, and there's obviously no MARC data coming, so we had to implement a record driver that handles EDM. So, it still might be a bit unclear what we are actually doing.
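The Entity Facts enrichment mentioned above could look roughly like the following sketch. It is not the project's implementation: the base URL and the JSON field names (`preferredName`, `dateOfBirth`, `variantName`, `sameAs`) are assumptions that should be checked against the German National Library's documentation, the GND number is a placeholder, and the fact sheet is a hand-written sample rather than a real response, so no network access is needed.

```python
import json
from urllib.parse import quote

# Assumed base URL of the Entity Facts service (verify against the DNB documentation).
ENTITY_FACTS_BASE = "https://hub.culturegraph.org/entityfacts/"


def entity_facts_url(gnd_id):
    """Build the lookup URL for one GND number."""
    return ENTITY_FACTS_BASE + quote(gnd_id.strip())


def person_summary(fact_sheet):
    """Pick the fields a person page might display from a fact sheet.

    The field names are assumptions about the JSON structure.
    """
    same_as = fact_sheet.get("sameAs", [])
    return {
        "name": fact_sheet.get("preferredName"),
        "born": fact_sheet.get("dateOfBirth"),
        "variants": list(fact_sheet.get("variantName", [])),
        "links": [entry["@id"] for entry in same_as if "@id" in entry],
    }


# Hand-written sample standing in for a downloaded fact sheet.
SAMPLE_FACT_SHEET = json.loads("""
{
  "preferredName": "Büchner, Georg",
  "dateOfBirth": "17. Oktober 1813",
  "variantName": ["Buechner, Georg"],
  "sameAs": [{"@id": "https://example.org/placeholder-link"}]
}
""")

if __name__ == "__main__":
    print(entity_facts_url("000000000"))  # placeholder GND number
    print(person_summary(SAMPLE_FACT_SHEET))
```

In a real pipeline the URL would be fetched once per GND number and the result cached, so that person pages do not depend on the external service at display time.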
In the first part, in my workflow, I normally analyze the data, and obviously normalize it quite often, because there are really different standards involved. When the data is not coming as XML data, we are also transforming it to XML,
to have some kind of portable format to cope with, and then we are mapping, and as I said, often reusing, but also implementing our own mappings. Then what we do is enrichment, like Entity Facts, and what we also want to do is enrich with GeoNames and other enrichment tools,
but this is work to be done, and what's also work to be done is deduplication, because every institution has some book like an introduction to dance studies or something, so it would be nice to deduplicate that, as well as deduplication of the persons, because I think we have 50 Shakespeares
in our portal at the moment, due to the different spellings, so I don't know. And what we then do is obviously the mapping to the Solr index format, and then it's in our index. What's really nice about that is that we have this aggregation format,
so that the last steps are really the same for all the data, and we have some control over how we do that. I guess, no idea. Didn't know that. So I'm continuing with a few words about the model behind it. We had to take a very pragmatic approach,
because we are working with real data, and we want to have a real portal, and so I start with the model as it is now, as we are using it now, and then think a bit about how it might be developed. We start with an example. You have already seen the photograph.
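The last workflow step described above, mapping aggregated EDM records to the Solr index format, can be illustrated with a minimal sketch. It assumes the EDM records are serialized as RDF/XML; the Solr field names (`id`, `title`, `format`, `related_ids`), the use of `dcterms:isPartOf` as the link to the production, and the example.org URIs are all hypothetical and not taken from the project.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
RDF = "{" + NS["rdf"] + "}"

# Hand-written sample record; URIs and the linking property are placeholders.
SAMPLE_EDM = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:edm="http://www.europeana.eu/schemas/edm/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <edm:ProvidedCHO rdf:about="http://example.org/cho/photo-1">
    <dc:title>Woman with the Guitar</dc:title>
    <dc:type>photograph</dc:type>
    <dcterms:isPartOf rdf:resource="http://example.org/cho/production-1"/>
  </edm:ProvidedCHO>
</rdf:RDF>
"""


def edm_to_solr_docs(edm_xml):
    """Flatten every edm:ProvidedCHO into one dict shaped like a Solr document."""
    root = ET.fromstring(edm_xml.strip())
    docs = []
    for cho in root.findall("edm:ProvidedCHO", NS):
        docs.append({
            "id": cho.get(RDF + "about"),
            "title": [e.text for e in cho.findall("dc:title", NS) if e.text],
            "format": [e.text for e in cho.findall("dc:type", NS) if e.text],
            # Links to other records stay as URIs so the portal can resolve them.
            "related_ids": [
                e.get(RDF + "resource")
                for e in cho.findall("dcterms:isPartOf", NS)
            ],
        })
    return docs


if __name__ == "__main__":
    for doc in edm_to_solr_docs(SAMPLE_EDM):
        print(doc)
```

Because every provider's data passes through the same EDM shape first, a single function like this can serve all sources, which is the benefit of the common aggregation format the speaker points out.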
It's in the EDM model. It's a ProvidedCHO; CHO is the short form of Cultural Heritage Object. And yes, we have a lot of properties. I don't show them here. I just show the links between the resources,
just to get a better view of everything. This is the subject, the person in the picture, and we have another ProvidedCHO. We define it as the dramatic production, so it's the play itself, not the play,
but the performance or set of performances of Leonce and Lena, and yes, we define it somehow as a set of objects that are in the archives
that show what happened in this event. We have, of course, an author. Here we use already an extension of DM2E to be precise enough, and we have more people involved. Set Designer, where we use the ECLAP vocabulary
to extend the EDM, because otherwise we have only a contributor, and we have another contributor. He's the director. If we only use the standard properties of EDM, we can't reflect the different roles,
and we have a lot of loss of information in the data. But this is not perfect. We would go in this direction because we have a play. We have not only the performance, we have a play which was written one day, and the performance or the dramatic production is an event.
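The difference between role-specific properties and plain contributors can be shown with a small sketch. The talk does not give the exact DM2E and ECLAP property names, so `ex:author`, `ex:setDesigner` and `ex:director` below are hypothetical stand-ins, as are the resource identifiers. The sketch only illustrates the idea that a precise property can always be generalized to `dc:creator` or `dc:contributor`, while the reverse is impossible once the role is lost.

```python
# Hypothetical role properties mapped to the generic Dublin Core property
# each one specializes (stand-ins for the DM2E / ECLAP extensions).
SUPER_PROPERTY = {
    "ex:author": "dc:creator",
    "ex:setDesigner": "dc:contributor",
    "ex:director": "dc:contributor",
}

PRODUCTION = "cho:leonce-and-lena-production"  # placeholder identifier

# Statements about the dramatic production as (subject, property, object).
TRIPLES = [
    (PRODUCTION, "ex:author", "agent:georg-buechner"),
    (PRODUCTION, "ex:setDesigner", "agent:set-designer"),
    (PRODUCTION, "ex:director", "agent:director"),
]


def generalize(triples):
    """Replace each role property by its generic super-property."""
    return [(s, SUPER_PROPERTY.get(p, p), o) for s, p, o in triples]


def agents_with(triples, prop):
    """List the agents attached through one property."""
    return [o for _, p, o in triples if p == prop]


if __name__ == "__main__":
    print(agents_with(TRIPLES, "ex:director"))
    # After generalization the two contributors are indistinguishable.
    print(agents_with(generalize(TRIPLES), "dc:contributor"))
```

A consumer interested only in who was involved can work with the generalized statements, while the portal keeps the precise roles for display.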
So we would like to model it with events. Where the object was present, has something to do with the event, and of course, the play has something
to do with the event. And then we can connect the resources of the persons with this event. And on the other hand, we have the author, which has a creation event, and we connect the author with the creation event.
This looks very nice, but it is not, because now we have the creation event, of course. The creator is the author. This is not really a loss of information, but here we don't have any roles.
So we need properties pointing from the event to the person or something like that to get more information. And of course, we have to work on the data to have the information in the data, because it doesn't make sense to model it
if we don't get the data in. But all together, we are happy with the model. It's flexible. There's no, not necessary to mention here. We have contextual entities that we can use from Entity Facts, and we can model all kinds of cultural objects,
because we have a mixture, as mentioned, from libraries and archives and so on. We can extend it by adding the roles to be more precise and not to lose information
of our data. Of course, we don't want to do this. It can be consumed anyway if we offer the data, which we intend to offer as linked open data. So if someone is only interested in the connection with the persons without the roles, it can be consumed very easily.
And we can model all structures we have, and even more structures we don't have yet, but probably we get them. And of course, as mentioned, we can reuse mappings. We are dealing with standards.
Once we have started with the mentioned Excel sheets, and if they are in one standard, we can work with them. Yes, and the export for linked open data is ready. We don't need conversion for this.
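The event-based direction discussed earlier in the talk, where objects and persons are connected through a creation event and a production event, can be sketched as a plain data structure. This is only an illustration under assumptions: `was_present_at` loosely mirrors the EDM property `edm:wasPresentAt`, while the role-bearing link from an event to a person is hypothetical, since the speaker notes that such properties are exactly what is still missing. All identifiers are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    uri: str
    kind: str  # for example "creation" or "production"
    # Hypothetical role-bearing links: (role, agent URI) pairs.
    participants: list = field(default_factory=list)


@dataclass
class ProvidedCHO:
    uri: str
    # URIs of events the object is connected to.
    was_present_at: list = field(default_factory=list)


def roles_of(agent_uri, events):
    """Return (event URI, role) pairs for one agent across all events."""
    return [
        (event.uri, role)
        for event in events
        for role, agent in event.participants
        if agent == agent_uri
    ]


creation = Event("event:creation-of-play", "creation",
                 [("author", "agent:georg-buechner")])
production = Event("event:production", "production",
                   [("director", "agent:director"),
                    ("set designer", "agent:set-designer")])

photo = ProvidedCHO("cho:photo", ["event:production"])
play = ProvidedCHO("cho:play", ["event:creation-of-play", "event:production"])

if __name__ == "__main__":
    print(roles_of("agent:georg-buechner", [creation, production]))
```

Keeping the role on the event-to-person link is what preserves the information that would otherwise be lost when persons are attached to an event without any qualification.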
Yes, and thank you for your attention so far, but there's one thing left to see. Thank you very much.
Do we have questions for our speakers? Yeah, thanks for the presentation. It was quite interesting, but I asked myself, where do you really store the data?
You mentioned the Solr index, but where do you store the data then? Or did I miss the point? You could say that we don't store it for real. We have that index where the information is.
No, that are links, actually, that we get from our data providers. So they are hosted somewhere else, and we just get them. That's also a problem because we have some data providers that have more pictures, but we can't host them for them. But we also don't have that many pictures, actually,
because theater and dance studies don't have that many digitization projects yet. So this is work for the future, I guess. In case there are digital representations, we just link to the providers themselves.
So they have to have it online. It's the same idea as with DDB or Europeana. Another question? Sorry, you're in here. Yeah, thank you very much. I found it very exciting because we are just in the process
of doing something around exhibitions, and we have all the same problems. It's an event, a temporary event, but the different library systems do not treat it as an event; they treat it as an institution. And around this event, there are lots of different things, stored in libraries, archives, et cetera, and I hope we can reuse your experiences.
Thank you. I have some questions about the data model. It was presented, it's very non-specific, as you said. It's a sub-property of creator, contributor, and whatever else.
Also, you need more precise modeling. For example, a photo of a performance, the author is different. The photo was not present at the performance. It's a different event that captures something. I would recommend you take a look at FRBRoo, which has a whole bunch of classes
that have to do with performing arts. I'm not saying necessarily you should use it. It's kind of complex, an extension of the CIDOC CRM. But I think that you need a richer model to keep the data in a faithful way, and then maybe a simpler model to submit to Europeana. Because, for example, events are in the EDM, but they're not supported by the portal.
If you submit events to Europeana, they'll just get lost. It's something we're discussing in the data quality group, that events are necessary, at least for the purpose of capturing the precise contribution of somebody. So maybe the set design happened at a different time
from the performance. Maybe you have that date, where do you put it? And other details. But unfortunately, it's not yet supported by Europeana. So I think you need to have dual-purpose models, a richer model to capture exactly the circumstances and the simple model for Europeana submission.
Yes, I'll start with the second point. Regarding transmitting to Europeana: since we don't have digital representations, we are not planning to submit to Europeana.
We would, of course, do so if the portal were different and not restricted to digital objects. But at the moment, we are just collecting metadata, and we are just reusing the model, because we liked it and we wanted to reuse
all the mappings we mentioned. And this is the only thing we are doing. So I know Europeana is not handling events at the moment. But it doesn't matter, because we are using the model for this portal. And on the other point: of course, it can be modeled in a more complicated way, with an event for the photograph and so on.
But we have to be very pragmatic, because it doesn't make sense to model something you don't have in your data. So we always look at the data, what makes sense, and learn from the data what needs to be modeled. And hopefully we get some,
together we can develop the EDM model. It doesn't make sense for us now to improve the events, to introduce properties that point from the events to the person without contact to Europeana. And this is what we are going to do,
to be in contact and try to develop it. And one other thing is that the example we showed is a very nice example from the Theater Museum in Düsseldorf, because they use the LIDO model. And in LIDO you also have events. So they already try to describe their data with events.
But most of our other data providers describe their data very differently. This is also a problem we have in the transformations: some are coming from the perspective of what happened in this theater house, others are describing objects, others are describing the performances,
and some of them are describing objects and performances in the same record, and we have to kind of sort it out what it is. So sometimes the data is not really prepared to model these events. So yeah, it's hard to map that in such a nice model when the information is not there.
Do we have other questions? Do you receive authority data from all your institutions,
or data describing persons, or if not, how do you address the problem? Yeah, so we have two different cases. One case is we have a GND number, but this is mostly for the library data. The other ones are not that introduced to GND yet.
But what we also get is we have some data providers that provide a lot of files with information about persons. So I want to also reflect that by putting them in the same index with all that information. But what I have to do now is to first de-duplicate this information by implementing algorithms
that really check the birth date and name, whether it is all the same, and then hopefully be able to de-duplicate that. And for example, for the Swiss theater collection, I could do it really well, because we have a librarian working in this project who was enriching the Swiss theater dictionary,
which is online, with GND data. And I could then map these Swiss people to the Swiss people in our Swiss theater collection, which works quite well. So there we could do some enrichment. About the other enrichment, I don't really have the master plan now,
so I will work on that. But I guess it's really matching. Yes, same as we have seen before. Same. So future magic to be seen. Fantastic. All right, another round of applause for our speakers.
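The person de-duplication described in the answer above, checking whether name and birth date agree, could be sketched as follows. This is an assumption-laden illustration, not the project's algorithm: the record fields (`name`, `born`, `source`) and the sample records are hypothetical, and the name normalization only handles diacritics, case, punctuation and "Surname, Forename" ordering. Genuinely different spellings would need fuzzy matching or authority identifiers such as GND numbers.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Reduce a name to sorted lowercase ASCII tokens.

    'Shakespeare, William' and 'William Shakespeare' yield the same key.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    tokens = re.findall(r"[a-z]+", without_marks.lower())
    return " ".join(sorted(tokens))


def group_persons(records):
    """Group records that share a normalized name and a birth date.

    Records without a birth date are never merged, because the name
    alone is too weak a signal.
    """
    groups = defaultdict(list)
    for index, record in enumerate(records):
        born = record.get("born")
        if born:
            key = (normalize_name(record["name"]), born)
        else:
            key = ("no-birth-date", index)
        groups[key].append(record)
    return list(groups.values())


# Hypothetical records from three data providers.
RECORDS = [
    {"name": "Shakespeare, William", "born": "1564", "source": "provider-a"},
    {"name": "William Shakespeare", "born": "1564", "source": "provider-b"},
    {"name": "William Shakespeare", "born": None, "source": "provider-c"},
]

if __name__ == "__main__":
    for group in group_persons(RECORDS):
        print([record["source"] for record in group])
```

Requiring both keys to match is a conservative choice: it leaves some duplicates unmerged but avoids merging two different persons who happen to share a name.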
So our final.