
A distributed Network of Heritage Information


Formal Metadata

Title
A distributed Network of Heritage Information
Series Title
Number of Parts
15
Author
License
CC Attribution - ShareAlike 3.0 Unported:
You may use, modify, and reproduce, distribute and make publicly available the work or content, in unchanged or modified form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner specified by them and pass on the work or content, including in modified form, only under the terms of this license.
Identifiers
Publisher
Year of Publication
Language

Content Metadata

Subject Area
Genre
Abstract
The Dutch Digital Heritage Network (NDE) was started in 2015 by the national cultural heritage institutions as a joint effort to improve the visibility, usability and sustainability of the cultural heritage collections maintained in the GLAM institutions. One of the goals is the realization of a distributed network of heritage information that no longer depends on aggregation of the data. This talk will focus on our approach for developing a new, cross-domain, decentralized discovery infrastructure for the Dutch heritage collections. A core element in our strategy is to encourage institutions to align their information with formal Linked Data resources for people, places, periods and concepts, and to publish their data as Linked Open Data. The NDE program works on making all relevant terminology sources available as Linked Data and provides facilities for term alignment and building new thesauri. Another important goal is to provide means for browsing the collections in a cross-domain, user-centric fashion. Based on possibly relevant URIs identified in user queries, we want to be able to browse the available Linked Data in the cultural heritage network. The bi-directional use of Linked Data without aggregation is still a technological challenge. We decided to build a registry that records the back links for all the URIs used in our network. Next to Linked Data definitions of organizations and datasets, we will also record fingerprints of the object descriptions. This information will provide the back links which make it possible to navigate from a term URI to the objects that have a relation with this term. We are currently developing a Proof-of-Concept and will show the first results at the SWIB conference.
Transcript: English (automatically generated)
Good morning, everybody. Very nice to be here on this stage at this conference. The talk was announced as being about the distributed network, and that's in the title of my slides as well. But the distributed network is the ambition we have; it's the goal we're trying to reach. In this talk I'll explain how far we are in approaching that goal, and what vision we have behind doing this.
In my presentation I'll explain a little bit more about the initiative in the Netherlands for a digital heritage network, which is actually a lot broader than only the technical infrastructure. Then I'll look for a short while at the current way of doing things: the way we aggregate collection data from heritage collections, what in our perspective the problems are with doing that, how we could think about doing it differently, and what the strategies could be to improve it. In that part I'll address a number of problems we have as a linked data community, and as a semantic web community in general. That's for me a very important topic to discuss here, because I'm really interested in how we can think about doing things differently, and I think there are some shortcomings in the current technology of the semantic web. But I'll talk about that later on.
Then, based on these ideas, I'll show how we are trying to implement this in the Dutch digital heritage network, and what steps we are taking. The Dutch Digital Heritage Network is an initiative of the five national organisations. I'm from the national library. We work together with the national archives, with the national scientific institute, and also with the Cultural Heritage Agency.
And with a number of research institutes as well. A few years ago we sat together around the table at the Ministry of Culture and said: we really should think about the way we are doing things, and how we can improve it in order to increase the social value of what we are doing. Because we do a lot of things; we are investing a lot of money in making our collections available digitally, but the results from that are not optimal, to put it politely. So the idea is: how can we look at all the things we are doing, the way we are approaching our problems within the domains, and how can we talk to each other and see what the possibilities are to improve that?
That was written down in a strategic plan in 2015. At the moment it is being worked out by founding a new organisation to create this way of working together. So it's really a big thing, in which all the national institutes are involved, but also the networks they are part of. So it's also about provincial, regional and local institutes, and we are really trying to talk to them, work together and see if we can improve things.
Doing it from the user perspective is really something very important for us. So it's not thinking from the perspective of the domains and the things we are doing. We are doing great things, of course, but the user doesn't know our structure. The user doesn't know our data. The user doesn't know what is where, basically. And if you think backwards from the user, you really need to do things in a different way to get it right; I'll show some examples of that in the linked data problems as well. We heard that yesterday too, of course. We came up with a three-layered model, which is probably not really a new concept for you.
But we are really focusing on three levels. We look at the sustainability of all the things we are doing, the digitisation processes we have and the digital material there is, and we see if we can share our services in the network so people can profit from the things we have. We look at the visibility layer and see if we can make flexible services that are really user-based. And the user can of course be anybody; it can be any type of use of our data, and we want to provide that from a flexible network. The middle part is about the usability of the things we have, and thinking in interoperability technologies like linked data is a very essential part of this strategy.
If you look at the current infrastructure, I always like to show this slide. This is an example of all the great things we are doing. There is Europeana in there. There is our national publications environment that we are running from the KB together with all the university libraries, with millions of digitized papers in there. There are regional portals. There are specific portals for material types. If, being a user, you want to get the whole picture, then you have a lot of problems, because you end up searching all the different portals in order to find the right answers. So there is really no easy way to get an idea of what is actually available. And all these portals are built on the same kind of infrastructure. As you all know, if you want to build a portal like that, you start aggregating the data. We have these marvelous solutions for that, and then you build an index and show the people what you have.
And you do that based on source data that is around, which you harvest through an aggregation process. I think it really brought us a lot. The OAI-PMH protocol is a very useful protocol; we are all using it, and it helps a lot in delivering the information to the customers. It did open up our silos: before OAI-PMH we had closed systems, each with its own specific protocols, and now we have a general way of getting to the data.
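As a rough illustration of what that harvesting step looks like in practice, here is a minimal Python sketch of an OAI-PMH ListRecords request. The endpoint URL is hypothetical; the verb and parameter names come from the OAI-PMH specification.

```python
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
endpoint = "https://example.org/oai"  # hypothetical repository endpoint

# Fetch one ListRecords page; a real harvester follows the
# resumptionToken element to page through the full set.
resp = requests.get(endpoint, params={"verb": "ListRecords",
                                      "metadataPrefix": "oai_dc"})
resp.raise_for_status()
root = ET.fromstring(resp.content)

for record in root.iter(OAI + "record"):
    header = record.find(OAI + "header")
    print(header.findtext(OAI + "identifier"),
          header.findtext(OAI + "datestamp"))
```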
And it also urged us to start thinking about our own data, because the data was merging into central places and we needed to think about data models. I think the work Europeana did with the EDM model is really great: it shows how you can look at the same data from a general perspective.
And it supported cross-domain, cross-collection visibility. So I think we have been doing great things. But it's not very optimal.
It's not the smartest way to do things. This is a mapping of a real-life situation in the Netherlands, where we aggregate data and bring it to one aggregation platform, and then another aggregation platform says: hey, that's convenient, I'll get my feed from there. And so we build trees of aggregators. And if you think about it, the real data is living beneath, and the user is somewhere way further at the end of the line, so the connection between the user and the organisation that maintains the data is very distant. The information is being copied, enriched and corrected, but none of those things go back to the source, because at the moment I don't know of any system that has really good provisions for feeding back enhancements that have been made in another place. We are just copying and processing further down the line.
And if you look at it (I'll stop bashing the aggregation approach, because it is very useful and we still need to do it), there are two major problems. One problem is that we have very poor semantic alignment: we pay no general attention to naming the same things with the same identifiers and the same definitions. And I think the integration of the data itself is problematic, because we keep copying stuff and we have license problems and version problems: data living in one portal is not the same version as data living in another portal. So we thought: well, how can we do this smarter? And for that we started to think about building a discovery infrastructure based on different principles.
One very basic idea in that is that we want to keep the data in the source. We want the source to be the leading point for publishing the data. Of course, at some point you need to build an index and do some things with it, but it should always reference a clear source. That's a core design principle. And with it we try to catch on to the general movement away from a repository-centric way of approaching the data living on the web, towards a really web-centric idea.
So let data live in the web and be usable. I was here on Monday enjoying the workshop about dokieli, which is really this way of thinking, and that is basically what we are trying to do as well; we try to approach it using the linked data principles.
Well, there's no real need to explain the linked data principles to this audience, I think. But it comes down to using, in your source data, URIs for shared references. So for the terminology sources that are available, make sure that people can use the right URI for an author, for a place, for a person, for a concept, and make that available in the source system.
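As a minimal sketch of what this looks like at the source, assuming a hypothetical object URI and using the Getty AAT as the shared terminology source (the object name and concept ID are illustrative), an object description built with rdflib in Python could be:

```python
from rdflib import Graph, Literal, Namespace, URIRef

SDO = Namespace("https://schema.org/")

g = Graph()
g.bind("sdo", SDO)

# Hypothetical object URI minted by a collection management system.
obj = URIRef("https://example.org/collection/object/123")

g.add((obj, SDO.name, Literal("Molen De Gooyer", lang="nl")))
# Shared reference: point at the Getty AAT concept for 'windmill'
# instead of a local keyword string (concept ID shown is illustrative).
g.add((obj, SDO.about, URIRef("http://vocab.getty.edu/aat/300006846")))

print(g.serialize(format="turtle"))
```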
Another thing: yesterday in the keynote there was a discussion about how much data you should make available, say the top ten fields. Thinking in linked data terms, I think there's no need for that, because you can adjust the way you open up your data according to the user's needs. You can select models that support them: schema.org, for example, is a flattened model for search engines, so search engines understand who you are and what you are doing. But if you need to expose the complete deep model for your domain, there's no obstacle to doing that, because linked data supports multiple models. I know that in the Dataset Exchange Working Group there are even thoughts about supporting this at the HTTP level, so that you can ask for one kind of format for your data or another. I think that's very interesting.
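A rough sketch of that idea, assuming a hypothetical object URI and profile URI: plain HTTP content negotiation already selects the serialisation, and the Dataset Exchange Working Group's Accept-Profile header (a draft at the time of the talk, not a settled standard) would additionally select the model.

```python
import requests

obj = "https://example.org/collection/object/123"  # hypothetical object URI

# Flat view for search engines: schema.org as JSON-LD.
flat = requests.get(obj, headers={"Accept": "application/ld+json"})

# Deep domain model: ask for Turtle and name the desired model via the
# draft Accept-Profile header (header semantics and profile URI are
# assumptions, shown only to illustrate the idea).
deep = requests.get(obj, headers={
    "Accept": "text/turtle",
    "Accept-Profile": "<https://example.org/profiles/full-domain-model>",
})
```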
At the network level, we try to provide things for the institutions. We have shared terminology sources being published in the right way, and we provide APIs that can be implemented in systems. And we're really building on previous work, because a lot of these ideas have been around for a number of years. We use commercial tools, like the PoolParty tool for the shared building of authority lists, but also specialised tools that have been created in the Netherlands.
So basically we say: build your data, use the right URIs for the right references, and make your data available. That is of course a big problem, because source systems often can't provide this; they're not fit to do that. So the IT suppliers of the source systems, the collection management systems, are really in our focus as well, and we are organising sessions with them. They actually see this as a strategic advantage and really want to go along with it, which is very interesting. But even if you do publish it like that, we still have a problem.
This is a little bit of the linked open data cloud. Yesterday there was the question of who has their data in the cloud, but the question should have been who has their data registered in Datahub, because that's the basis for the linked open data cloud. It means that if you have linked data and you want it to be found, you need to register at Datahub, which is kind of a weird model for the semantic web, I think. That's actually something that's also being picked up in the new DBpedia strategy: DBpedia will really focus on making the connections between datasets and will provide supporting structure for finding relations between datasets. That's a new strategy being developed at the moment for DBpedia. Then there's another problem. This is what we all do:
We have an object and we point to a definition, in this case from the Art and Architecture Thesaurus. We say: OK, this object is about a windmill. And that's great, but the user doesn't know the object. The user knows the term windmill and wants to find all the things connected to it when searching for windmill. So what you want is basically to have it the other way around as well: a backlink from the term back to the source description. And this is something we cannot do at the moment. I think the development of Linked Data Notifications and protocols like that is very interesting for solving this part of the problem. So what do we do today? There are a few approaches to make it work. You can publish your data in schema.org and then let the search engines do the magic.
We can copy it to a triple store; that's what we normally do: we aggregate all the data into a triple store and then build very nice things with it, but then we are building the same kind of aggregation platform again. We could maybe do a federated query, but based on the current SPARQL endpoints that is not really feasible, I think. And then, as we saw earlier as well, we're really looking into using Linked Data Fragments, and we are working together with Ruben Verborgh and colleagues to see if we can make this work for our network.
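A rough sketch of what querying such an interface could look like, assuming a hypothetical Triple Pattern Fragments endpoint for one collection (fragments servers advertise their exact query template in hypermedia controls; the subject/predicate/object parameter layout shown here is the common one):

```python
import requests
from rdflib import Graph

# Hypothetical Triple Pattern Fragments endpoint for one collection.
fragments = "https://example.org/fragments/collection"

# Ask for all triples that reference the shared term URI.
resp = requests.get(fragments,
                    params={"object": "http://vocab.getty.edu/aat/300006846"},
                    headers={"Accept": "text/turtle"})
resp.raise_for_status()

g = Graph()
g.parse(data=resp.text, format="turtle")
for s, p, o in g:
    print(s, p, o)  # matching triples plus hydra paging/count metadata
```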
Doing federated search (I have to speed up, sorry) is an interesting idea, but we will end up with 1,500 institutions, and doing federated queries over 1,500 endpoints is not going to work. So what we also want to do is create a layer with the backlinks, so that we know: OK, for these kinds of topics and these kinds of places, these kinds of sources are the most feasible to query. That brings me to the last part.
We are building a strategy for this distributed network. At the source level, we'll make sure that linked data becomes available in all the sources. At the network level, we want people to register their linked data, so that we at least know the linked data is there. And we will build a knowledge graph with all the backlinks in it, so that we have a discovery infrastructure for the linked data in the network. Based on that, we can drive portals and platforms that select the information, and even do aggregations.
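To make the backlink idea concrete, here is a minimal in-memory sketch (names and structure are illustrative, not NDE's actual design): the registry records, per term URI, which datasets hold object descriptions referencing that term, so a query for a term can be routed to just the relevant sources.

```python
from collections import defaultdict

class BacklinkRegistry:
    """Toy network-level registry: which datasets reference which term URIs."""

    def __init__(self):
        self._backlinks = defaultdict(set)

    def register(self, term_uri, dataset_uri):
        # Recorded when an institution registers a dataset, or when its
        # object descriptions are fingerprinted.
        self._backlinks[term_uri].add(dataset_uri)

    def sources_for(self, term_uri):
        # The only sources worth querying for this term.
        return self._backlinks[term_uri]

registry = BacklinkRegistry()
registry.register("http://vocab.getty.edu/aat/300006846",
                  "https://example.org/museum-a/dataset")
registry.register("http://vocab.getty.edu/aat/300006846",
                  "https://example.org/archive-b/dataset")
print(registry.sources_for("http://vocab.getty.edu/aat/300006846"))
```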
I mean, that's a practical approach, of course. If you want to know more about this: in June this year we published a high-level design, and it's available on GitHub. We're working on a roadmap, and at the moment we are really in the prototyping phase. We're actually trying to work from the current infrastructure and make additions to it; we're not planning to build a completely new, different thing, we're just trying to add new and smarter technologies to the current infrastructure in order to make steps. We do that in multiple projects. And I have a few links in here, because I promised in my abstract that I would show things.
If you go to these links, you'll find the first ideas. It's not the distributed network working full-fledged yet, but it is the first work on these ideas: creating the shared references and using the linked data from different sources. One of the projects is from AdamNet, and Lukas Koster is involved there as well, so I find it always nice to mention his name. OK. Thank you for your attention. Thank you very much, Enno, for these insights.
I think we have a little time for some questions, if there are any. The comment about linked data notifications: is that something you're actively working on, or have scheduled? Because we've asked similar questions in some of our projects. We have been looking at it, and for me the thing keeping us from doing it is the implementation part on the local side: you really require institutions to handle the Linked Data Notifications protocol. I think at some point we will use it internally in our infrastructure, but requiring suppliers to really implement the protocol is too early for us.
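For reference, a minimal sketch of what such a notification could look like, assuming a hypothetical inbox URL and object URIs (per the LDN spec, the inbox would be discovered via an ldp:inbox link on the target resource rather than hard-coded):

```python
import requests

# Hypothetical inbox of the term publisher.
inbox = "https://terms.example.org/inbox"

# Announce that a collection object references the term.
notification = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Announce",
    "actor": "https://example.org/museum-a",
    "object": "https://example.org/museum-a/object/123",  # the describing object
    "target": "http://vocab.getty.edu/aat/300006846",     # the term it references
}

resp = requests.post(inbox, json=notification,
                     headers={"Content-Type": "application/ld+json"})
resp.raise_for_status()
```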
So that's why we are now focusing on the Linked Data Fragments approach, because I think institutions should be able to implement that locally, and then we can build the backlinks from that. So that's our second prize.
Thank you. Thanks again, Enno. Thank you.