
Spatial data and the Search engines

Transcript
Hello, and welcome to this very nice room with a view. Some of this will overlap with yesterday's presentation by me, so there are a couple of duplicated slides, because there are a lot of relations between these communities and their programs. María is here again, Clemens is here as well, and Lieke Verhelst was unfortunately not able to be at the conference.
This is a slide from Geonovum, the Dutch SDI organization, shown as the kick-off of the testbed "Spatial data on the web" that ran in the Netherlands at the beginning of this year; the next phase is running now. The SDI community is quite disconnected from the rest of the web: there are a lot of CSW and WFS services in the Netherlands, and most of those services are quite invisible to search engine spiders. So it was presented as a brick wall. Users that are not aware of the OGC services also get, well, frustrated, or just don't understand them. I also want to mention the Spatial Data on the Web working group, which OGC and W3C set up together last year and which is actively gathering a set of best practices to bring those two worlds together; that picture is from one of their presentations. There were four topics in the testbed. Topic 1 was cancelled because no sensible proposals were made; it was redefined for phase 2, which is currently running. Topic 2 was a usable spatial data publication platform; I'm not going much into that — they identified a platform to use. Topic 3 was crawlable geospatial data using the ecosystem of the web: they took spatial data and defined best practices from the API world. They have a lot of interesting documentation on GitHub, so you can go there and check it out.
The main conclusion is that they introduced interesting terms like developer experience, instead of user experience, because they focus on developers, with metrics like time to first successful service call. Their APIs are very much built in the style that seems to be becoming a standard within the API world. One of the findings was that search engines have limited content negotiation: the testbed expected to make heavy use of content negotiation, but it is apparently not used by the search engines. That aligns with our findings — search engines are just quite unpredictable. Now we get to our topic, and this is where I hand the slides to Clemens.
So our work was research topic 4 — that's why it is called a research topic. It was very similar to topic 3, because the idea is the same: how can we make spatial data accessible on the web, crawlable by search engines, and usable via APIs? But where topic 3 didn't really use anything from the spatial data infrastructure, our topic focused more on the fact that we already have a spatial data infrastructure — CSW, as we heard in the previous presentation, WFS, WMS — and we have GIS software, clients that can connect to it, and developers familiar with the OGC standards who can also use it. So we reach those already, but the question was: how can we get through the wall to the rest of the web community? The approach that we proposed, and that was accepted, was to introduce a proxy.
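The proxy idea can be sketched as a thin translation layer: the web-facing side speaks plain HTTP in the formats that browsers, crawlers, and developers expect, and decides per request which representation to produce. The following is a minimal, illustrative dispatch only — the format names and defaults are assumptions for the sketch, not LDproxy's actual implementation:

```python
# Sketch of a proxy's format selection: an explicit query parameter wins,
# then the HTTP Accept header, then an HTML default for browsers/crawlers.
# Illustrative only — not LDproxy's actual behaviour.

FORMATS = {
    "text/html": "html",
    "application/geo+json": "geojson",
    "application/gml+xml": "gml",
}

def select_format(accept_header, query_format=None):
    if query_format:          # e.g. ?f=geojson — explicit beats negotiation
        return query_format
    for media_type in accept_header.split(","):
        media_type = media_type.split(";")[0].strip()
        if media_type in FORMATS:
            return FORMATS[media_type]
    return "html"             # safe default for crawlers

print(select_format("application/geo+json, text/html;q=0.9"))  # geojson
print(select_format("*/*"))                                     # html
print(select_format("*/*", query_format="gml"))                 # gml
```

Offering an explicit `?f=` parameter alongside negotiation matters later in the talk, when crawler behaviour around content negotiation comes up.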
So, an additional layer, a transparent proxy: we don't really cache the data, but we have individual components on the back end that act as clients — to CSW (Paul will talk more about the metadata and the CSW part that you see on the left-hand side) and, in the part in the centre that I'll talk a little more about, to the WFS that serves the data. We tried to map the principles that we have in spatial data infrastructures, which support the very complex things that the GI people are interested in, to the simpler things and the different representations that the web is interested in. The result is also an open-source project, LDproxy — the Linked Data proxy — which is available. It supports just WGS 84, it supports schema.org markup as the mechanism that search engines understand, and it also supports content negotiation. Most importantly for crawlability, we create links to every feature in the WFS — the address dataset has eight or nine million features, and you can click through to each of them — and that makes it indexable by search engines, if the search engine would index everything. As was mentioned, there is some unpredictability there.
There are still quite a few open questions about how quickly search engines index, and what exactly they do. But with this we reach the search engines. We also make the data available not just as HTML but as GeoJSON and GML, so we can offer it to developers through web APIs as well. And we established links: if you go to Google and search within ldproxy.net — that is the deployment of the proxy software — you can find the pages that are currently indexed.
When you click on one of these links and click through from the landing page, you see at the top the INSPIRE address WFS — that is basically the entry page, the capabilities document of the WFS, so to speak. Then you click through to the feature type, addresses, you can page through all the data, and you can link to individual features, like the one address that you see here. You can then find those pages via searches as well.
What we have also done is create a Docker image; you can pull it, get the software up and running, and connect it to your own WFS in a few minutes. That makes it possible for you to try it out very quickly. We try to support as many WFS implementations as possible, working around their limitations as well as we can, and we are also interested in hearing about your experiences — the link to the Docker image is at the top.
With that I'll hand back to Paul. — The experience is similar for the CSW part: if you search Google for a certain dataset title — in this case an open data catalog that we used as a test instance — it will give you the dataset metadata.
That brings you to an HTML page showing the data. This is a slide I also showed yesterday: we identified four main data communities — there are probably more — each using its own kind of metadata standard. In our world we use ISO 19115; on the search engine side there are the schema.org Dataset ontology and the DCAT ontology, which most of my presentation yesterday focused on. There is another community I don't have a slide for, but which is really interesting: the Linked Data and Linked Open Data community. The second line here is also interesting, because in the search engine world, but especially in the Linked Data world, linking to common ontologies and common vocabularies is very important for being discoverable and connected. What you see in the geo world is that a lot of governments define code lists with a legal background — which makes complete sense in their legal world — but the risk is that these are quite disconnected from the other communities, which use DBpedia as a kind of centre of the cloud, or the Google Knowledge Graph in the case of the Google search engine. So the point is: if you want to make your data available, make sure that links between these vocabularies exist, or serve the data in the form each of these communities expects, because that is what allows each of them to consume it.
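The vocabulary-linking point can be made concrete: one minimal way to connect a government code-list entry to the hubs other communities use (DBpedia, Wikidata) is to publish `owl:sameAs` triples alongside it. All URIs in this sketch are made-up placeholders, not real code-list entries:

```python
# Sketch: emitting owl:sameAs links as N-Triples, connecting a (hypothetical)
# government code-list entry to DBpedia and Wikidata so that linked-data
# consumers can follow their noses from either side.

def same_as_triples(subject, targets):
    """Render owl:sameAs links from `subject` to each target URI as N-Triples lines."""
    pred = "<http://www.w3.org/2002/07/owl#sameAs>"
    return [f"<{subject}> {pred} <{t}> ." for t in targets]

triples = same_as_triples(
    "http://example.gov/codelist/building-type/office",  # hypothetical entry
    [
        "http://dbpedia.org/resource/Office",
        "http://www.wikidata.org/entity/Q0000000",       # placeholder id
    ],
)
print("\n".join(triples))
```

The same triples can be served from either side; the point in the talk is only that the links must exist somewhere for the communities to connect.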
A couple of slides about schema.org Dataset. Schema.org is an initiative of the main search engines to have a shared ontology for things, and there is also a community website where anybody can participate in developing the standard.
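The kind of annotation involved looks roughly like the following — a minimal, hypothetical schema.org/Dataset description serialized as JSON-LD for embedding in an HTML page. The dataset name and URL are placeholders, not the testbed's actual pages:

```python
import json

def dataset_jsonld(name, description, landing_page):
    """Build a minimal schema.org/Dataset annotation as a JSON-LD string,
    suitable for a <script type="application/ld+json"> block in HTML."""
    doc = {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": landing_page,
    }
    return json.dumps(doc, indent=2)

# Hypothetical dataset; the name and URL are placeholders.
print(dataset_jsonld(
    "Addresses (INSPIRE)",
    "Address features exposed through a WFS proxy.",
    "http://example.org/datasets/addresses",
))
```

A crawler that understands schema.org can extract this block without parsing the visible HTML at all, which is what makes it attractive for catalogs and proxies alike.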
This is what it looks like on one of those popular search engines. If I search for this building, I usually find this type of result, but in this case the search engine has stored some structured data about the thing: it knows this is a building, and this is the matching schema.org markup. It would be really great to have that for datasets. However, the search engines haven't implemented that yet: we publish a schema.org Dataset annotation in our HTML, and the search engine could see that we are representing a dataset there and could easily render a nice dataset summary — but that is still to come.
An interesting tool is the structured data testing tool, which helps you see, for the HTML that you created in your catalog or in your proxy layer, which structured data the crawler is able to extract from it. This is where the code lives for the ISO 19139-to-schema.org mapping: it sits in a plugin of GeoNetwork, so any national profile plugin can have its own customized schema.org or DCAT mappings — it is not in GeoNetwork core itself.
We also identified some challenges. The first is persistent identifiers for WFS records. There is the INSPIRE ID, so in theory this could be solved if everybody filled that unique field in the WFS, but we have seen a lot of WFSes that don't have a unique identifier, and then it is really hard to create persistent, unique URIs for WFS records. We have also seen a lot of bugs and dead links in existing catalogs and WFSes, and Google punishes you for that directly: it says, OK, I found 2,000 dead links on that page. Then there is content negotiation: because the crawler goes through all these features and you don't control the negotiation, you really don't know what it will fetch, and we found that content negotiation really confuses search engines. So if you want to hand data to a search engine, just give it an explicit URL for an explicit format — in this case HTML. Another challenge: we wanted to keep the relations that exist in the WFS in the proxy as well. Then you have to make sure that a dataset which is linked to is also exposed via the proxy, and that the link points to the proxy URL. So, topic 1
is still continuing, now as topics 5 and 6, carried out by Triply and others. Their mission is to proceed on the findings that we made and bring them into practice.
It is really nice to see that their work is coming — it will be presented this September — but they have already given us something. This slide shows the LOD Laundromat from Triply: a big spider that crawls the web for triples and already holds a lot of them. They set up the LOD Laundromat to harvest the proxy data into a triple store, so all the WFS data exposed by LDproxy goes in there, and it has a SPARQL endpoint. We can now answer questions that browsers and search engines could not, like: give me everything within a 90-kilometre radius that has opening hours between this and that — difficult questions, for which there is no interface on a search engine. So we are really anxious to see, in a couple of weeks, the results of what they found.
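A query of that shape could be sketched as follows. This is a hypothetical GeoSPARQL query: the prefix `ex:`, the `openingHours` property, and the exact geometry layout are assumptions for illustration, not the actual LOD Laundromat schema:

```python
# Sketch of a radius query in SPARQL, assuming a GeoSPARQL-capable endpoint.
# The ex: namespace and property names are hypothetical; a real query needs
# the predicates actually used in the triple store.

def radius_query(lat, lon, km):
    return f"""
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>
PREFIX ex:   <http://example.org/ns#>

SELECT ?feature ?hours WHERE {{
  ?feature geo:hasGeometry/geo:asWKT ?wkt ;
           ex:openingHours ?hours .
  FILTER (geof:distance(?wkt,
          "POINT({lon} {lat})"^^geo:wktLiteral,
          uom:metre) < {km * 1000})
}}
"""

print(radius_query(50.73, 7.10, 90))  # hypothetical point near Bonn, 90 km
```

This is exactly the kind of question a keyword search box cannot express, which is the speakers' point about the two audiences being different.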
What we found is that being findable by a search engine is not the same as Linked Data: those are two different communities, and if you want to target those audiences, you have to set up specific rules for each of them. Being exposed via a search engine also helps you, because the search engine will tell you which of your data is used and is valuable. And of course the proxy approach is interesting for some datasets — buildings, addresses, and the like — but it will not be valid for every dataset: it doesn't make sense to expose every pixel of a coverage as its own web page.
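Coming back to the persistent-identifier challenge mentioned earlier: one common approach — sketched here with a made-up namespace — is to mint a URI from the feature type plus a stable attribute, falling back to a content hash when no identifier field exists at all:

```python
import hashlib

# Sketch: minting stable URIs for WFS features. The BASE namespace and the
# attribute names checked are hypothetical. If the feature carries a unique
# id (e.g. an INSPIRE id), use it; otherwise fall back to a hash of the
# properties, which stays stable as long as the feature itself doesn't change.

BASE = "http://example.org/id"  # hypothetical URI namespace

def feature_uri(feature_type, properties):
    fid = properties.get("inspireId") or properties.get("gml_id")
    if fid:
        return f"{BASE}/{feature_type}/{fid}"
    digest = hashlib.sha1(
        repr(sorted(properties.items())).encode("utf-8")
    ).hexdigest()[:12]
    return f"{BASE}/{feature_type}/x{digest}"

print(feature_uri("address", {"inspireId": "NL.12345"}))
# falls back to a hash when no identifier field is present:
print(feature_uri("address", {"street": "Main St", "number": 7}))
```

The hash fallback is only a stopgap: it breaks the moment any attribute of the feature changes, which is why the speakers stress filling a real unique field like the INSPIRE ID.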
Moderator: Thank you, Paul; thank you, Clemens. There is plenty of time for questions.
Question: LDproxy is working with WFS only. Would it be possible to extend it to other back ends, like REST-based APIs and things like that?

Answer: Yes, of course. We basically wrote a WFS client because that is the data we have in the SDI, and we were reusing existing code that we had and just made it available as open source. We also have another development that provides an Esri ArcGIS-style API on top of WFS, made available for the web. So we have the WFS client, but you could easily create other clients as well if there is a specific API that you want to support.

I also have a question for the audience: is anybody here from the deegree, GeoServer, or MapServer communities? Because this is a proxy approach, and it is not the final goal — the final goal is that the actual server implementations implement these APIs. So I would challenge the GeoServer and MapServer communities to think about: do we only support the OGC standards, or do we also want to support this type of API? Maybe somebody has an opinion on that.

Comment: I'm not from GeoServer, but I know there is a community extension for GeoServer, if I remember correctly, which publishes WFS data directly as HTML that can be crawled by search engines. I don't know whether it goes as far as your approach, though.

Question: Thanks for the talk. In the report you propose an extension to the schema.org vocabulary for the geographical domain. If one of you was involved in that, could you elaborate on the need for a specific geo extension, and how it compares to the existing geo vocabulary in schema.org?

Answer: It is great to hear that somebody actually read the report. Yes — what we found is that the
current location model in schema.org is just limited: it is very basic, and it is not even defined, for example, what the separator between latitude and longitude is. This is really Lieke's work — she proposed an extension, or optimization, of the current schema.org location model.

I think one of the concerns Lieke had was that the way it has been defined is not really consistent with how you would structure the data from a linked data point of view. We had a discussion with Dan Brickley from Google, who is, I would say, the main person behind the schema.org work, and there have been ongoing discussions in the schema.org community about the geo extension for a long time. My guess is that they know it is kind of broken and are trying to work out what it should be, but they are still trying to find the right way — I don't think it will go exactly in the direction that Lieke proposed in our report. It is also related to a discussion that is currently going on if you look at how GeoJSON does things: if you then want to move that to JSON-LD, so a more RDF-like representation, it doesn't work. So there is an ongoing discussion between these communities as well, and I think that will influence how schema.org evolves in the future.

Moderator: Any more questions or comments? That is not the case — thank you.
Metadata

Formal Metadata

Title: Spatial data and the Search engines
Series title: FOSS4G Bonn 2016
Part: 184
Number of parts: 193
Authors: Reyna, María Arias de; Portele, Clemens; Simoes, Joana; Verhelst, Lieke; Genuchten, Paul van
License: CC Attribution 3.0 Germany: You may use, adapt, and copy, distribute and transmit the work or content in changed or unchanged form for any legal purpose, provided you credit the author/rights holder in the manner they have specified.
DOI: 10.5446/20309
Publisher: FOSS4G; Open Source Geospatial Foundation (OSGeo)
Release year: 2016
Language: English

Content Metadata

Subject area: Computer Science
Abstract Why is it so hard to discover spatial data on search engines? In this talk we'll introduce you to an architectural SDI approach based on FOSS4G components, that will enable you to unlock your current SDI to search engines and the www in general. The approach is based on creating a smart proxy layer on top of CSW and WFS which will allow search engines (and search engine users) to crawl CSW and WFS as ordinary web pages. The research and developments to facilitate this approach have been achieved in the scope of the testbed "Spatial data on the web", organised by Geonovum in the first months of 2016. The developments are embedded in existing FOSS4G components (GeoNetwork) or newly released as Opensource software (LDproxy). We'll introduce you to aspects of improving search engine indexing and ranking, setting up a URI-strategy for your SDI, importance of URI persistence, introducing and testing schema.org ontology for (meta)data. We’ll explain that this approach can also be used in the context of linked data and programmable data, but it is important not to mix it up. María Arias de Reyna (GeoCat bv) Clemens Portele (interactive instruments) Joana Simoes (GeoCat) Lieke Verhelst (Linked Data Factory) Paul van Genuchten (GeoCat bv)
Keywords: GeoCat bv
interactive instruments
Linked Data Factory
GeoCat
