Serving Geospatial Data using Modern and Legacy Standards: a Case Study from the Urban Health Domain
Formal Metadata
Title: Serving Geospatial Data using Modern and Legacy Standards: a Case Study from the Urban Health Domain
Number of Parts: 351
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/68899 (DOI)
Production Year: 2022
Transcript: English (auto-generated)
00:00
So, as it was said before, we work in an R&D company which operates in the geospatial domain. And the research you will see in this presentation was produced in the context of the Emotional Cities project. So, the Emotional Cities project is a research project which aims to map emotions
00:30
to the urban landscape, so to the natural and built environment. So, basically, we would like to understand what emotions are triggered when someone walks in a park
00:45
or near an empty building or in a particular part of the city. As you can imagine, this task demands a lot of data, many different datasets: from the traditional geospatial domain, the datasets that characterize the human environment.
01:06
So, anything you can think of, like the road network, the buildings, and also environmental characteristics like temperature, sun and so on. And then we have neuroscience data, and perhaps this is a bit newer for us
01:25
because it's not data that is traditionally used in geospatial domains. So, it's data that has perhaps not much diversity in terms of geospatial attributes, but it has a huge diversity in terms of timestamps.
01:43
So, it's very dense in the time dimension. So, the idea is to collate all these datasets and we do that by creating a spatial data infrastructure. I'm going to talk a little bit more about that in a few slides.
02:02
But this spatial data infrastructure needs to be able to deal with these sometimes large-volume datasets. At the same time, they are very heterogeneous. And they have the characteristics that I mentioned before: they are both geospatial and temporal.
02:21
So, time series data. So, we want to do this according to the FAIR principles. The FAIR principles say that information needs to be findable, accessible, interoperable and reusable.
02:43
So, ultimately, we all want our information to be reusable. We want people to be able to use our data for building other research, for creating products, services, etc. Otherwise, the data remains in silos.
03:02
In order to do that, standards play a very important role. The standards are what make sure that we can do this in an efficient way. So, this brings both challenges and opportunities.
03:20
So, on one hand, in this project, we have the opportunity to choose what we want to do from scratch. We are not bound by any legacy system. We can choose which standards we want to use. But on the other hand, we have a user base, which are mostly scientists,
03:43
some of them not from the geospatial domain, even some neuroscientists. So, they are not used to using GIS standards, or even standards at all. And the people who are used to using standards,
04:00
maybe they are not used to the standards that we have in mind. So, we have some challenges here, we're going to talk about that in a second. So, before I move on and talk more about the SDIs, I thought I should stop for a moment and define this. So, basically, an SDI can be seen as a framework.
04:22
It includes the geographic data, but also everything associated with it: the metadata which describes this data, the services, and ultimately also the users. So, when we think about an SDI, we need to think about who is using this data and what tools they are using.
04:42
And then the whole idea is that they are all connected in a smooth way and they make the use of spatial data efficient, ultimately. So, SDIs have been around for a very long time.
05:01
We can trace the beginnings of SDIs to the early 90s. In 1994, there was an executive order in the United States to coordinate the acquisition of geographic data. In the same year, OGC, the Open Geospatial Consortium, was founded.
05:23
So, not a coincidence. And then we have seen many standards coming from OGC. I put WFS on the timeline, which is perhaps a standard that some of you are familiar with.
05:42
And then, after that, we had other standards as well: WMS, WMTS, and so on. And here in Europe, we also had a milestone, which perhaps you are familiar with: the INSPIRE initiative. And this was also a kind of rule, something that had to be implemented,
06:08
and it forced a lot of agencies all over Europe to face what standards are and how to implement them. And then we started to see a shift around the 2010s.
06:24
There was the explosion of REST APIs, and the web started to become more like what we are used to seeing today. In 2011, we had the appearance of Swagger. Swagger is a specification which allows us to document web APIs in an interactive manner.
06:47
And this later evolved into the OpenAPI specification. And in OGC, things were also moving. So, in 2018, there was a hackathon called the WFS3 hackathon,
07:08
which later led to the first release of WFS3. And this was kind of a game changer, because WFS3 was the first of what we call the OGC APIs,
07:23
so a modern family of web APIs. If you think about the first generation of web services in OGC, you will see that they do not leverage many of the practices that we see in the modern web today.
07:41
So, if you think about using status codes or content negotiation, or even using JSON, this was not the case in most of the OWS services. And it's okay that it was like that, because they were developed a long time ago, when these things were not common practice.
08:05
But the web changed, right? And so did the OGC standards. So, now we have this family of APIs, the OGC APIs, and they are doing that. They are leveraging the practices that we see in the modern web.
08:21
So, they are using the HTTP methods, status codes, content negotiation and so on. They are using schema.org to make sure that the data is seen by the search engines, because people are searching for data in Google these days. So, it's important that the data is visible there.
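To illustrate the search-engine point: the usual mechanism is schema.org markup embedded as JSON-LD in the HTML pages of the API. The sketch below, written as a Python dict purely for illustration, shows the kind of schema.org Dataset description this involves; all concrete values are invented, and how the markup is actually produced depends on the server implementation.

```python
import json

# Illustrative schema.org "Dataset" description (all values invented). Embedded
# as JSON-LD in a page's <script type="application/ld+json"> tag, this is what
# dataset search engines pick up.
dataset_markup = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Emotions survey points",
    "description": "Hypothetical survey dataset served through an OGC API",
    "keywords": ["urban health", "emotions"],
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "spatialCoverage": {
        "@type": "Place",
        "geo": {"@type": "GeoShape", "box": "38.6 -9.3 38.8 -9.0"},
    },
}

print(json.dumps(dataset_markup, indent=2))
```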
08:43
They are using the OpenAPI specification, and most of all, they are very flexible. They are developed in separate parts, and they are not very strict about which encodings you should use.
09:00
So, you can use JSON, you can use HTML, and so on. So, it's a completely new approach. And the ultimate goal is to make sure that these APIs, these standards, are more developer-friendly. So, they are easier to implement, even if you don't have a lot of OGC experience,
09:23
or even GIS experience. So, this is the picture of the standards that were used in the first generation of web services. In that first generation, when you wanted to build an SDI, a spatial data infrastructure,
09:40
typically you would use these standards: WMS, WMTS, WFS and WCS. When you think about this in the new paradigm of the OGC APIs, you would replace WMS by OGC API Maps, WMTS by OGC API Tiles,
10:04
WFS by WFS3, which is also called OGC API Features, and WCS by OGC API Coverages. And all these services would be discoverable through an OGC API Records endpoint.
10:26
Now, in this project, because we have mostly vector data for now, we decided to focus on two of these standards: OGC API Features and OGC API Tiles, considering that we are using vector tiles.
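To make the point about modern web practices concrete, here is a small sketch of what talking to such an endpoint looks like from a client. The base URL and collection id are invented; the paths and the bbox, datetime, limit and f parameters come from the OGC API Features standard, and the tile URL template from OGC API Tiles, so treat this as an illustration rather than a reference.

```python
import requests

# Hypothetical deployment; the paths below are the ones standardised by the OGC APIs.
BASE = "https://example.org/ogcapi"     # landing page of the service (invented)
COLLECTION = "air_temperature"          # invented collection id

# List the datasets the server offers. Asking for JSON via f=json; an
# "Accept: application/json" header works as well (content negotiation).
collections = requests.get(f"{BASE}/collections", params={"f": "json"}).json()
print([c["id"] for c in collections["collections"]])

# Fetch features for an area and a time interval -- relevant here because the
# project's data is both geospatial and temporal.
items = requests.get(
    f"{BASE}/collections/{COLLECTION}/items",
    params={
        "bbox": "-9.3,38.6,-9.0,38.8",            # lon/lat bounding box
        "datetime": "2022-06-01T00:00:00Z/..",    # open-ended time interval
        "limit": 100,
        "f": "json",                              # ask for GeoJSON explicitly
    },
).json()
print(len(items["features"]))

# A vector tile, following the OGC API Tiles URL template
# /collections/{collectionId}/tiles/{tileMatrixSetId}/{tileMatrix}/{tileRow}/{tileCol}
# (the encoding is negotiated via the Accept header).
tile = requests.get(f"{BASE}/collections/{COLLECTION}/tiles/WebMercatorQuad/12/1552/1938")
```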
10:45
So, the question that we asked ourselves is: is it possible today to create a complete SDI using these new OGC API standards? Do we have the software available, do we have the tools, do we have the knowledge?
11:02
And this is the question that Antonio is going to answer now. So, the reason we also submitted this paper to FOSS4G
11:22
was to share this experience with this SDI development. The quick answer to this question is in this slide. It is possible, as long as we accept a definition of spatial data infrastructure not as something that must necessarily cover every use case,
11:45
but as something that can cover quite an interesting number of use cases. In our case, it was possible to serve OGC API Tiles, Features and Records, which cover pretty much the whole array of use cases that we had.
12:01
This is the schema of what we call the modern stack, which is built around pygeoapi, a one-stop solution for the OGC API standards, because it offers support for several different standards.
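As an illustration of what "one-stop" means in practice, below is a rough sketch of a pygeoapi resource definition, written as a Python dict that mirrors the YAML configuration file. The collection name, the Elasticsearch index URL and the field names are invented, and the exact keys should be checked against the pygeoapi documentation for the version in use.

```python
# Sketch of a pygeoapi "resources" entry (mirrors the YAML config; values invented).
resources = {
    "emotions_survey": {                    # collection id exposed by the API
        "type": "collection",
        "title": "Emotions survey points",
        "description": "Hypothetical example dataset",
        "keywords": ["urban health"],
        "extents": {
            "spatial": {
                "bbox": [-180, -90, 180, 90],
                "crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84",
            }
        },
        "providers": [
            {
                # OGC API Features, backed by an Elasticsearch index
                "type": "feature",
                "name": "Elasticsearch",
                "data": "http://localhost:9200/emotions_survey",
                "id_field": "id",
            },
            # A second provider of type "tile", pointing at the directory of
            # pre-rendered vector tiles, would be added here in the same way
            # to expose the collection through OGC API Tiles.
        ],
    }
}
```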
12:22
And in the back end, I don't want to go into the detail of every component, but just to mention that we asked the partners in the project to share data using a data lake. The data lake is used in the way data lakes typically are:
12:43
we don't know the structure, we don't know which kind of data is going to be shared, so we put everything in the data lake and later decide where to store this data. In our case, the data is stored both on the file system, for the tiles, and in Elasticsearch.
13:04
We did that knowing that, of course, there is the overhead of having to develop the pipelines or scripts to ingest this data into the storage, but after that it was quite straightforward, since pygeoapi has among its providers
13:21
the possibility to connect to both the tile repository and Elasticsearch, and to serve this data to different clients. As for clients, here we have some examples like QGIS, Leaflet, ArcGIS and some Python libraries that already support the OGC API standards.
13:43
There is also Kibana, which is part of the Elastic Stack and has quite interesting support for geospatial functions. This is an extra. A few words about pygeoapi. pygeoapi is a Python implementation
14:00
of the OGC API suite of standards. It's now an OGC project. It's, of course, free and open source software, and it's an OGC reference implementation for OGC API Features. OGC API Features is, to date, one of the approved standards,
14:20
the others are still work in progress, but pygeoapi is already supporting these standards, while also collaborating in their definition. On the side of clients, we are a bit less well supported,
14:40
because, yes, we have support for OGC API Features in QGIS, and there are some popular libraries where it's already possible to connect to these OGC API Features endpoints, but it could be better,
15:01
considering that we have to serve these datasets to citizens. The final purpose of the SDI is to publish this data. That's something that is going to be interesting in the next year of the project. What are the limitations? Yes, it's possible to create an SDI with the OGC APIs,
15:22
but there are some limitations, like the evolving nature of the standards, which also means the evolving nature of the software supporting these standards, but also, as we said, the lack of client implementations and of know-how about them.
15:43
To face that, and also to cover what we call the legacy standards, the OWS standards of OGC, so the ones that Joanna mentioned before, WMS, WFS, et cetera,
16:00
we put in place the solid solution of GeoServer, in a quite similar setup. We have GeoServer in the place of pygeoapi, connecting to the file system and to a PostGIS database, and connected to different clients. Both stacks share the data lake,
16:21
so we didn't overload our users with this duplication, with having to use two different SDIs. They just put data in the data lake, and scripts and pipelines put this information in the proper stack.
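As a minimal sketch of what such a pipeline can look like, the snippet below reads a GeoJSON file dropped in the data lake and bulk-indexes its features into Elasticsearch so that the API layer can serve them. File paths, index name and the id field are invented, and the real pipelines in the project are certainly more involved.

```python
import json
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")   # invented endpoint
INDEX = "emotions_survey"                     # invented index name

# Read a GeoJSON FeatureCollection dropped into the data lake (invented path).
with open("/datalake/incoming/emotions_survey.geojson") as f:
    features = json.load(f)["features"]

# Create the index with a geo_shape mapping so spatial queries work
# (elasticsearch-py 8.x style keyword arguments).
if not es.indices.exists(index=INDEX):
    es.indices.create(
        index=INDEX,
        mappings={"properties": {"geometry": {"type": "geo_shape"}}},
    )

# Bulk-index each GeoJSON feature as one document.
actions = (
    {"_index": INDEX, "_id": feat["properties"].get("id", i), "_source": feat}
    for i, feat in enumerate(features)
)
helpers.bulk(es, actions)
```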
16:43
So we, in the end, have a hybrid approach, taking the best of both solutions. Having this solution, of course, introduces complexity for us in developing the SDI, because we have to maintain more tools, we have to
17:01
take care of more tools. But we live in a fortunate era where we have tools like Docker that make life easy when developing and shipping several tools that have to be interconnected. All the pipelines that we mentioned are quite easy to manage with
17:20
these tools. I don't know if we could have done the same without Docker. In the end, what is the structure around the data lake? We put this slide to show that, in the end, we have two stacks, the modern and the legacy one, but we have the data lake for users on one hand and,
17:41
on the other hand, for what we have to present to the users, so how to find this data, we put a catalog that is shared between the two stacks. The catalog is done with OGC API Records: since it's supported by pygeoapi, we decided to take advantage of this solution
18:02
to implement the most modern approach to the catalog topic. We expect, in the end, end users to be able to find the data via this catalog.
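As a rough sketch of what that discovery step could look like for a script (end users would more likely use a web client), the snippet below searches a hypothetical OGC API Records catalog; the URL and collection id are invented, and the q full-text parameter comes from the Records specification as implemented by pygeoapi.

```python
import requests

# Invented catalog endpoint: a Records catalog is itself a collection of records.
CATALOG = "https://example.org/ogcapi/collections/metadata-records"

# Full-text search over the metadata records, returned as GeoJSON features.
hits = requests.get(
    f"{CATALOG}/items",
    params={"q": "temperature", "f": "json", "limit": 10},
).json()

for record in hits["features"]:
    print(record["id"], record["properties"].get("title"))
```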
18:21
What we learned from this experience is that it is possible to start covering some use cases with the OGC API stack. It's definitely possible, with all the advantages of having standards that are going to be more and more important in the upcoming years.
18:43
The only thing is that it is not yet possible to have one solution for both, for supporting all the legacy standards and the modern standards. You still have to use more tools, like in our case GeoServer and pygeoapi. And yes,
19:01
in the end, if you can cover your use cases with the OGC APIs, you can go with that. Otherwise, you can still take a similar approach, supporting the previous standards for the time being with some other solution. And that's it.
19:20
If you have any questions, we will be happy to answer. Thank you.