
Publication of Inspire Datasets as Linked Data


Formal Metadata

Title
Publication of Inspire Datasets as Linked Data
Number of Parts
295
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
In order to increase interoperability and facilitate the reuse of geospatial data, a methodology is proposed for publishing INSPIRE-compliant datasets as Linked Data, using the RDF format and various ontologies, such as those derived from ARE3NA for the Annex I themes, or GeoSPARQL from the OGC. This methodology covers the whole process of generating RDF triples from GML sources, setting up a triple store to persist the information, and issuing SPARQL queries against the exposed endpoint. A working example is presented using the Spanish CNIG endpoint, where several datasets from Annex I are hosted. A series of queries joining external information from other endpoints, such as DBpedia or GeoNames, is then used as a means to demonstrate the interoperability and the potential applications for enriching spatial data, extracting meaningful insights from it, and using it to support information systems.
Keywords
Linked data, Semantic Web, Ontology, Data model, Query language, Search engine, Geometry, Addresses
Transcript: English(auto-generated)
So now we start with the first presentation, Publication of INSPIRE Datasets as Linked Data, by Enrico. Okay, hello everyone.
As Torsten has commented, the presentation is about providing, or trying to provide, alternatives for the use and publication of the INSPIRE datasets as linked data, so they can be better exploited by third parties. I will start with a review of the motivation for this work, which we have done for the Spanish National Mapping Agency; then the technologies and the status of the different standards; followed by the workflow we have created as a proposal for converting and publishing the INSPIRE data as linked data using RDF; and finally some use cases and the conclusions of the work.
Well, starting with the motivation of this whole project: the question is how to make the information that the member states already have available through INSPIRE reach the largest possible number of people, the general public. As you know, there are open data initiatives, and many initiatives to use, reuse and add value to public data. In this line we have INSPIRE as a motivation: a European directive for all the member states to make the public information in each country available. The goal of INSPIRE is that it has harmonized, or is harmonizing, all the data models and the way the information and the services are presented, the data model itself and also the services. It was conceived at a European scale, but the truth is that for each member state it has also been a challenge, and it has had a very positive impact on how the information is distributed and reused inside each country. This directive, well, INSPIRE, makes the information public via all the OGC standards: WMS, CSW, SOS, WFS, all the download services. But this information is accessible only to a certain group of people. It is public, of course, but only for people who know how to explore it, how to use it, how to discover it, how to download it, how to deal with it. So what we have tried here is a kind of proof of concept on how to use other technologies on top of INSPIRE. I mean, we have the information in INSPIRE
and we make an adaptation of that information, so it can be reused and spread to reach more people. This is where the semantic web comes in. We have chosen linked data and RDF as the way to use the semantic web, translating the published INSPIRE data into RDF. So we can make the information public in this different language, not only for people with a SPARQL endpoint and so on, but also findable by search engines. Right now, with the CSWs and the WFS, you can publish a full dataset of addresses or hydrography or whatever, but you cannot type into Google "give me the Danube River" or "how do I find this address", even though it is already published in INSPIRE. Google, or any other search engine, is not indexing that information. So that information is official, it is there, it is public, but normal people, outside the geospatial world, outside the FOSS4G community, do not know how to find it or how to use it. So we thought a possible approach is to make a kind of proxy that publishes it as RDF, so it could eventually be indexed by any search engine. It also offers possibilities because the semantic web is long established in the non-spatial world, so the data can be combined with any other sources that speak the same RDF language. We can make this official data linkable to other sources. As commented, an advantage of linked data is that it is a long-term, long-standing technology supported by several standards.
Also, the people from ARE3NA, the ARE3NA activity, have realized that this linked data option can be a possibility worth exploring for publishing the information, so there has also been an activity in this case to support this kind of publication. With the publication of the INSPIRE information in linked data format, we can integrate different data sources and reuse the official geospatial information together with external ones like DBpedia or GeoNames. But the first goal here is to define a methodology that could help us do this consistently across different INSPIRE themes, or even in different use cases, or for a different country.
So we have been defining this methodology as a kind of workflow, and also choosing the technology framework for its implementation. As for the state of the art of the semantic web: we have RDF and OWL from the W3C, which standardize the way the semantic web and linked data are structured. It is quite abstract; I think the best way to understand RDF is that it is just another data model in which you define several hierarchies, and at certain points, at a certain level of that hierarchy or data model, you link out, and then you can start to relate different data sources.
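To make that concrete, here are a couple of hypothetical RDF triples in Turtle syntax. All URIs are invented for illustration, not real INSPIRE vocabularies: a feature is just a node, its properties are predicates, and a link points at another node that a different data source can also describe.

```turtle
# Hypothetical example, not a real INSPIRE vocabulary:
@prefix ex: <http://example.org/def#> .
@prefix au: <http://example.org/au/> .

au:ES.IGN.AU.28  ex:name       "Madrid" ;
                 ex:inCountry  au:ES .       # link up the hierarchy
```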
Then, to exploit this semantic web and linked data spatially, there is the GeoSPARQL standard, supported and developed by the OGC, which provides spatial queries and predicates for RDF to perform the most usual operations, intersections and so on, playing with the geometry in RDF. So you can mix things, for example "give me the natural areas intersecting this river", that kind of query, mixing different data sources in RDF via the GeoSPARQL endpoint. So we have defined what we think is a simple workflow to migrate the information from what we have here:
in step zero, the features published according to INSPIRE in a WFS service. From that, the first step is mapping to the ontologies, which I will explain later; then an ETL to generate the RDF; then uploading the RDF to the SPARQL endpoint; and then starting to work with the SPARQL endpoint, not only directly but also building applications on top of it. For the extraction of the INSPIRE datasets,
we selected a few of them, after researching which ones could bring added value to external actors: anyone who wants to compute routes, run environmental analyses and so on. We picked the administrative units (municipalities and so on), geographical names, the transport network for optimal paths, hydrography for environmental uses, and addresses, in case you want to reach places.
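The "natural areas intersecting this river" query mentioned a moment ago can be sketched in GeoSPARQL. This is only a sketch: the geo/geof namespaces are the standard GeoSPARQL ones, but the data is assumed to expose WKT geometries, and ?river would need to be bound to a concrete feature in a real query.

```sparql
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?area WHERE {
  ?area  geo:hasGeometry/geo:asWKT ?areaWkt .
  ?river geo:hasGeometry/geo:asWKT ?riverWkt .
  # in practice, bind ?river first, e.g. by name or URI
  FILTER(geof:sfIntersects(?areaWkt, ?riverWkt))
}
```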
I think those are the usual ones. From these INSPIRE-compliant WFS services that we picked, the next step has been to map the schema to the ontology. Here there is an activity from the ARE3NA project where they have defined a series of good practices and also vocabularies for the INSPIRE themes in RDF. So you can find the data model in the WFS and also, in ARE3NA, the RDF corresponding to that INSPIRE data model. What you have to do then is the mapping, which is quite straightforward; of course there are always small differences that you have to struggle with and tackle while doing the mapping, but in principle it is a one-to-one thing. Then what we have to do is take the GML and, with the mappings defined in step number two, create the RDF triples, which is practically what we said before: the WFS GML on one side and the ontologies from ARE3NA on the other, one to one. We tried several options for this step, and finally we ended up with an ETL script in Python.
We tried to use some existing tools, and also RML, to define the mapping from the XML to the RDF format, but with such a large amount of data (the GMLs are huge) they finally crashed. So we couldn't make it work; the existing tools were not reliable enough because of the huge GML produced by the WFS.
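A streaming parser in plain Python avoids loading the whole GML into memory, which is what the existing tools stumbled on. Below is a minimal sketch of such a batch conversion to N-Triples; all names in it (the au:AdministrativeUnit element, the example.org URIs, the "name" predicate) are hypothetical, not the real INSPIRE/ARE3NA vocabularies.

```python
# Sketch only: streaming GML -> N-Triples in plain Python.
import io
import xml.etree.ElementTree as ET

GML = "{http://www.opengis.net/gml/3.2}"

SAMPLE = """\
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml/3.2"
                       xmlns:au="http://example.org/au#">
  <gml:featureMember>
    <au:AdministrativeUnit gml:id="ES.IGN.AU.28">
      <gml:name>Madrid</gml:name>
    </au:AdministrativeUnit>
  </gml:featureMember>
</gml:FeatureCollection>
"""

def gml_to_ntriples(stream, base="http://example.org/au/"):
    """Yield one N-Triples line per feature.

    iterparse keeps memory flat: each featureMember subtree is cleared
    as soon as its triple has been emitted, so the script survives GML
    files that crash DOM-based tools.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == GML + "featureMember":
            feature = elem[0]                       # the wrapped feature
            fid = feature.get(GML + "id", "unknown")
            name = feature.findtext(GML + "name", default="")
            yield f'<{base}{fid}> <http://example.org/def#name> "{name}" .'
            elem.clear()                            # free the subtree

triples = list(gml_to_ntriples(io.StringIO(SAMPLE)))
```

In the real pipeline the stream would be the WFS GetFeature response, and one line would be emitted per mapped property, following the ontology mapping from the previous step.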
So finally, we ran a batch process with Python. It was, anyway, quite straightforward to translate from the GML to the RDF. Then, once we have the RDF, the next thing is to upload it to an endpoint. Here, from our research and experience, we located two main RDF servers for linked data: Virtuoso and also Parliament. Parliament is a database with an endpoint for the triples, and a little more. In our previous experience we chose Virtuoso, and it worked with points, but for complex operations we were not able to make it work. So we tested Parliament with more complex features, like lines and polygons instead of points, and it worked. So we finally chose Parliament as the endpoint for the RDF and SPARQL.
Then, once you have generated the RDF files, it is quite easy to upload them and have the endpoint available to test the RDF and make the queries in SPARQL. So, once you have uploaded everything, you automatically have an endpoint: you have the URL, and a window where you can write the SPARQL query and start to mix these sources with other RDF sources.
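Querying that endpoint needs nothing more than HTTP. The sketch below builds such a request with the Python standard library; the endpoint URL and the gn:name predicate are illustrative, while the mechanics (form-encoded "query" parameter, JSON results) follow the standard SPARQL 1.1 Protocol.

```python
# Sketch only: issuing a SPARQL SELECT to an endpoint over HTTP.
import urllib.parse
import urllib.request

def build_sparql_request(endpoint, query):
    """Return a ready-to-send urllib Request for a SPARQL query."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,  # POST keeps long queries out of the URL
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )

QUERY = """\
PREFIX gn: <http://www.geonames.org/ontology#>
SELECT ?unit ?name WHERE { ?unit gn:name ?name . } LIMIT 10
"""

req = build_sparql_request("http://example.org/parliament/sparql", QUERY)
# To actually run it (network required):
# import json
# rows = json.load(urllib.request.urlopen(req))["results"]["bindings"]
```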
With this, we achieve one of the main goals we wanted to address in this project, which is to publish the INSPIRE information in linked data format, so it can be exploited in a machine-to-machine process. But we also wanted to integrate it with external sources such as DBpedia or GeoNames. Here, what we have to do in the SPARQL query is define a property saying that two terms are equivalent. If you remember from the previous slides, we said we can think of this linked data thing as data models: they are all different, but at certain points you can link them with a tag like "same as". Say the name of a municipality is called one thing in one ontology and "cityName" in the other: you declare that they are the same, and then you can link the ontologies. By adding this tag to the query, you have a query linking external sources. So you achieve a second goal, which is connecting the INSPIRE datasets to external sources. And now here are a few samples
of several of the steps we commented on before. Here you have the conversion from GML to RDF; as commented, it can be done with any ETL, but because of the size we did it with Python scripts. Here is a SPARQL query that mixes the information from the INSPIRE RDF and the DBpedia endpoint, setting up filters and combining both sources of data.
This is the Parliament endpoint, and this is the result, where we have the WKT of the geometry and where it was obtained from. And then, on top of it, as another proof of concept, we made a map viewer that graphically shows the results of those queries on a map, just to make it more user-friendly, because this SPARQL thing is not really designed for a person to use directly. You can see it is like SQL, but even more complex; it is conceived for machine-to-machine queries.
So with these SPARQL queries we have achieved good performance with Parliament, and we have been able to link DBpedia and GeoNames to the RDF of the INSPIRE themes we published. And here we have another example: you can see the "same as" tag linking both resources, and what you finally obtain, that little pop-up with the name and the population, which seems a very small thing, is the result of mixing two different data origins in linked data. So, as conclusions, the results that we have obtained from this work
have been a pipeline, or workflow, for the WFS-to-RDF translation and publication, and the tools, scripts and stack of technologies to carry it out for those INSPIRE themes. And the Spanish national agency now has an endpoint with all these INSPIRE themes translated to RDF, so eventually a search engine could index it. Well, this is it: a use case on how to translate from INSPIRE to RDF and connect it to external sources of data. So, if you have any questions.
Okay, thank you. Are there any questions? Thanks a lot. I have three quick questions. The first one: the transformation script, is it accessible for reuse? The second question is whether you are aware of any demand already indicated for the data you made available via the linked data approach. And the third question concerns this linking of INSPIRE linked data with additional data resources: is the linkage executed during the generation of the query, or is there some specific environment where these links are created? Thanks. Thank you. For the first one: it is still under development; it is not in production, only for testing. You have the endpoint here, but the scripts are not available yet. However, the Spanish National Mapping Agency is committed to open source, so they will be available.
What I can tell you is to try emailing them to find out when it will be available. For the third question: the linkage is made on the fly when you make the query, which can eventually lead to performance issues; that is a weakness of this kind of approach. There are also caches, but we have not used them in this project. And the second question, I'm sorry, I was not able to catch it. Whether you already see some demand from outside: public sector, academia, NGOs? Well, universities are researching this; we know they are digging into it as a possible way to make data more available and to link information from different sources. There are also some public entities with complementary data that have been interested in this approach. But they are still studying it because, in the end, it is a matter of resources: maybe for them it is easier to ask for the database and process it at home rather than build a SPARQL endpoint and so on.
I think this is more focused on publishing the data for the web search engines; I think that is the goal there. Do we have more questions?
I think we still have time for a short one. Nope, okay, if not, then thank you once again.