
Serving Geospatial Data using Modern and Legacy Standards: a Case Study from the Urban Health Domain


Formal Metadata

Title
Serving Geospatial Data using Modern and Legacy Standards: a Case Study from the Urban Health Domain
Title of Series
Number of Parts
351
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2022

Content Metadata

Subject Area
Genre
Abstract
Urban planning and design play an important role in amplifying or diminishing built environmental threats to health promotion and disease prevention (Keedwell 2017; Hackman et al. 2019). However, there is still a lack of good evidence and objective measures of how environmental aspects impact individual behavior. The eMOTIONAL Cities project (eMOTIONAL Cities - Mapping the cities through the senses of those who make them 2021) sets out to understand how the natural and built environment can shape the feelings and emotions of those who experience it. It does so with a cross-disciplinary approach which includes urban planners, doctors, psychologists, neuroscientists and engineers. At the core of this research project lies a Spatial Data Infrastructure (SDI) which assembles disparate datasets that characterise the emotional landscape and built environment in different cities across Europe and the US. The SDI is a key tool, not only to make the research data available within the project consortium, but also to allow cross-fertilisation with other ongoing projects from the Urban Health Cluster and, later on, to reach a wider public audience. The notion of SDIs emerged more than 20 years ago and has been constantly evolving, in response to both technological and organisational developments. Traditionally, SDIs adopt the OGC W*s service interfaces (e.g. WMS, WFS, WCS), which follow a service-oriented architecture with XML payloads and optional SOAP (Simple Object Access Protocol) bindings. In recent times, however, we have seen the rise of new architectural approaches, which can be characterised by their data-centrism (Simoes and Cerciello 2021). Web-based APIs have numerous advantages, which speak for their efficiency and simplicity.
They provide a simple approach to data processing and management functionalities, offer different encodings of the payload (e.g. JSON, HTML, JSON-LD), can easily be integrated into different tools, and can facilitate the discovery of data through mainstream search engines such as Google and Bing (Kotsev et al. 2020). These APIs often follow a RESTful architecture, which simplifies their usage while minimising bandwidth usage. Moreover, the OpenAPI specification (OpenAPI Initiative 2011) allows APIs to be documented in a vendor-independent, portable and open manner, and provides an interactive testing client within the API documentation. OGC has embraced this new approach in its new family of standards called OGC APIs (OGC 2020a). Although this family is still under active development, it has already produced several approved standards: ‘OGC API - Features’ (OGC 2022b), ‘OGC API - EDR’ (OGC 2022c), ‘OGC API - Common’ (OGC 2022d) and ‘OGC API - Processes’ (OGC 2022e), which provide standardised APIs for ensuring modern access to spatial data and to processes using those data. There are many similarities in the process of designing and implementing open source software and open standards. OSGeo encourages the use of open standards, like those from OGC, and there is even a Memorandum of Understanding between the two organisations (OSGeo 2012). In practice, many long-standing OSGeo projects implement OGC standards, and they often contribute to the standards' development (e.g. GDAL, GeoServer, QGIS, OpenLayers, Leaflet). However, in the majority of cases they still implement the legacy W*s standards, rather than the new OGC APIs. In the eMOTIONAL Cities project we set out to create an SDI based on the OGC APIs, but realised that we needed to support some legacy standards, because an OGC API equivalent was not yet widely supported. This led us to create two stacks: one using the OGC APIs (i.e. modern) and another using the W*s services (i.e. legacy).
Both stacks rely on FOSS/OSGeo software, and whenever relevant we have contributed to some of those projects. The modern stack includes Elasticsearch and Kibana (Elastic), which add extra capabilities in terms of search, analytics and visualisation. For the sake of reproducibility, all software components were virtualised into Docker (Wikipedia 2022) containers, and they are orchestrated using docker-compose. The results are published in the eMOTIONAL Cities public GitHub repository (eMOTIONAL Cities H2020 Project 2021). Despite their numerous advantages, we still see a lack of adoption of the OGC APIs within most SDIs. In part this could be due to the standards not being well known, but it could also be due to a lack of knowledge about which implementations are available, especially as FOSS. In this paper we would like to share our modern SDI architecture, and the reasons for choosing pygeoapi (Kralidis 2019) for publishing data as OGC API Features, Vector Tiles and Records. Although the standards we selected target the Urban Health use case, we believe they are generic enough to be useful for sharing data in other contexts (e.g. climate change, cross-border datasets). We are confident about a transition to the OGC APIs, but we are also conscious that this may take time, and that for a period many solutions will have to offer both modern and legacy standards.
Transcript: English (auto-generated)
So, as was said before, we work in an R&D company operating in the geospatial domain, and the research you will see in this presentation was produced in the context of the eMOTIONAL Cities project. The eMOTIONAL Cities project is a research project aiming to map emotions to the urban landscape, so to the natural and built environment. Basically, we would like to understand which emotions are triggered when someone walks in a park, near an empty building or in a particular part of the city. As you can imagine, this task demands a lot of data, many different datasets. From the traditional geospatial domain, we have the datasets that characterise the urban environment: anything you can think of, like the road map, the buildings, and also environmental characteristics like temperature, sun and so on. And then we have neuroscience data, and perhaps this is a bit newer for us, because it is not data that is traditionally used in the geospatial domain. It is data that has perhaps not much diversity in terms of geospatial attributes, but a huge diversity in terms of timestamps.
So, it's very dense in the time dimension. So, the idea is to collate all these datasets and we do that by creating a spatial data infrastructure. I'm going to talk a little bit more about that in a few slides.
But this spatial data infrastructure needs to be able to deal with these sometimes large volume datasets. At the same time, they are very heterogeneous. And they have these characteristics that I said before, they are both geospatial and temporal.
So, time series data. We want to do this according to the FAIR principles, which state that information needs to be findable, accessible, interoperable and reusable.
So, ultimately, we all want our information to be reusable. We want people to be able to use our data for building other research, for creating products, services, etc. Otherwise, the data remains in silos.
In order to do that, there is a very important role of the standards. So, the standards are what make sure that we can do that in an efficient way. So, this has both challenges and opportunities.
So, on one hand, in this project we have the opportunity to choose what we want to do from scratch: we are not bound by any legacy system, and we can choose which standards we want to use. On the other hand, we have a user base of mostly scientists, some of them not from the geospatial domain, some even neuroscientists. So, they are not used to using GIS standards, or even standards at all. And the people who are used to using standards may not be used to the standards that we have in mind. So, we have some challenges here; we are going to talk about that in a second. Before I move on and talk more about SDIs, I thought I should stop for a moment and define the term. Basically, an SDI can be seen as a framework.
It includes the geographic data, but also everything associated with it: the metadata which describes this data, the services, and ultimately also the users. So, when we think about an SDI, we need to think about who is using this data and what tools they are using.
And then the whole idea is that they are all connected in a smooth way and they make the use of spatial data efficient, ultimately. So, SDIs have been around for a very long time.
We can think about the beginnings of SDIs in the early 90s. In 1994, there was an executive order in the United States to coordinate the acquisition of geographic data. In the same year, OGC, the Open Geospatial Consortium, was founded. So, not a coincidence. Since then we have seen many standards coming from OGC. I put WFS on the timeline, which is perhaps a standard some of you are familiar with.
After that, we had other standards as well: WMS, WMTS and so on. And here in Europe we also had a milestone, the INSPIRE initiative, which perhaps you are familiar with. This was also a kind of rule, something that had to be implemented, and it forced a lot of agencies all over Europe to face what standards are and how to implement them. And then we started to see a shift around the 2010s.
There was the explosion of REST APIs, and the web started to become more like what we are used to seeing today. In 2011, we had the appearance of Swagger. Swagger is a specification which allows us to document web APIs in an interactive manner.
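As a rough illustration of what such a description looks like, here is a hypothetical, minimal OpenAPI 3.0 document written out as a Python dict; the service title and path are made up for this sketch and are not from the project:

```python
# A minimal OpenAPI 3.0 description, written out as a Python dict.
# The service title and the path are hypothetical placeholders.
openapi_doc = {
    "openapi": "3.0.3",
    "info": {"title": "Demo feature service", "version": "0.1.0"},
    "paths": {
        "/collections/trees/items": {
            "get": {
                "summary": "Fetch features from the 'trees' collection",
                "responses": {
                    "200": {"description": "A GeoJSON FeatureCollection"}
                },
            }
        }
    },
}

# Tools such as Swagger UI render a document like this into
# interactive, try-it-out API documentation.
print(sorted(openapi_doc["paths"]))
```

The point is that the whole API surface becomes a machine-readable document, which any generic tool can turn into documentation or client code.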
And this later originated the OpenAPI specification. In OGC, things were also moving: in 2018, there was a hackathon called the WFS3 hackathon, which later led to the first release of WFS3. And this was kind of a game changer, because WFS3 was the first of what we call the OGC APIs,
so a modern family of web APIs. If you think about the first generation of web services in OGC, you will see that they are not leveraging much of the practices that we see in the modern web today.
So, if you think about using status codes or content negotiation, or even using JSON, this was not the case in most of the OWS services. And it is okay that it was like that, because they were developed a long time ago, when these things were not yet common practice.
But the web changed, right? And so did the OGC standards. So, now we have this family of APIs, the OGC APIs, and they are doing that. They are leveraging the practices that we see in the modern web.
So, they are using the HTTP methods, status codes, content negotiation and so on. They are using schema.org annotations to make sure that the data is seen by search engines, because people are searching for data on Google these days. So, it's important that the data is visible there.
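To make the content-negotiation idea concrete, here is a toy sketch (an editor's illustration, not pygeoapi's or any server's actual code) of how a server might pick a response format from the HTTP Accept header:

```python
def negotiate(accept_header: str, default: str = "application/geo+json") -> str:
    """Pick a response media type from an HTTP Accept header.

    A toy version of the content negotiation an OGC API server performs;
    real implementations also honour quality values (;q=) and a ?f=
    query-parameter override.
    """
    supported = ["application/geo+json", "text/html", "application/ld+json"]
    for candidate in accept_header.split(","):
        media_type = candidate.split(";")[0].strip()  # drop ;q=... parameters
        if media_type in supported:
            return media_type
    return default

# The same URL can answer a browser with HTML and a script with GeoJSON.
print(negotiate("text/html,application/xhtml+xml"))  # text/html
print(negotiate("application/json, */*"))            # falls back to the default
```

This is why one endpoint can serve humans, search engines and programs at the same time.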
They are using the OpenAPI specification, and most of all, they are very flexible. So, they are developed in different parts, and they are not very strict about which kind of encodings you should use.
So, you can use JSON, you can use HTML, and so on. It's a completely new approach, and the ultimate goal is to make these APIs, these standards, friendlier to developers, so that they are easier to implement even if you don't have a lot of OGC experience, or even GIS experience. So, this is a picture of some standards that were used in the first generation of web services. When you wanted to build an SDI, a spatial data infrastructure, typically you would use these standards: WMS, WMTS, WFS and WCS. In the new paradigm of the OGC APIs, you would replace WMS with OGC API - Maps, WMTS with OGC API - Tiles, WFS with WFS3, which is also called OGC API - Features, and WCS with OGC API - Coverages. And all these services would be discoverable through an OGC API - Records endpoint.
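To see the contrast in practice, here is roughly how the same data request looks in both generations; the server address and the layer name below are hypothetical placeholders, not the project's endpoints:

```python
from urllib.parse import urlencode

base = "https://example.org"   # hypothetical server
layer = "buildings"            # hypothetical layer / collection name

# Legacy WFS 2.0: one endpoint, where the operation is a key-value parameter.
legacy = base + "/wfs?" + urlencode({
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": layer,
    "outputFormat": "application/json",
})

# OGC API - Features: resources are plain, addressable web paths,
# returning GeoJSON by default.
modern = f"{base}/collections/{layer}/items?limit=10"

print(legacy)
print(modern)
```

The modern form is just a URL you can open in a browser, bookmark, or crawl, which is a large part of the developer-friendliness argument.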
Now, in this project, because we have mostly vector data for now, we decided to focus on two of these standards: OGC API - Features and OGC API - Tiles, considering that we are using vector tiles.
So, the question that we asked ourselves is: is it possible, today, to create a complete SDI using these new OGC API standards? Do we have the software available, do we have the tools, do we have the knowledge? And this is the question that Antonio is going to answer now. So, the reason we also submitted this paper to FOSS4G was to share this experience of SDI development. The quick answer to the question is on this slide. It is possible, as long as we accept a definition of spatial data infrastructure that does not necessarily include every use case, but that can cover quite an interesting number of use cases. In our case, it was possible to serve OGC API Tiles, Features and Records, which covered pretty much the whole array of use cases that we had.
This is the schema of what we call the modern stack, which is developed around pygeoapi, a one-stop solution for the OGC API standards, because it offers support for several different standards. In the back end, I don't want to go into the detail of every component, but just to mention that we asked the partners in the project to share data using a data lake. The data lake is used the way many data lakes are: I don't know the structure, I don't know which kind of data is going to be shared, so I put everything in the data lake and later we decide where to store this data. In our case, the data is stored both in the file system, for the tiles, and in Elasticsearch.
Of course, there is the overhead of having to develop pipelines or scripts to ingest this data into the storage, but after that it was quite straightforward, since pygeoapi has, among its providers, the possibility to connect to both the tile repository and Elasticsearch and to serve this data to different clients. Here we have some examples of clients, like QGIS, Leaflet, ArcGIS and some Python libraries that already support the OGC API standards.
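As a rough sketch of what this wiring looks like, here is the general shape of a pygeoapi resource entry, shown as a Python dict rather than the actual YAML configuration file; the collection name, title, URL and path are placeholders, and the exact provider names and options vary between pygeoapi versions:

```python
# Shape of one pygeoapi resource: a collection backed by an
# Elasticsearch feature provider plus a vector-tile provider on disk.
# All names, URLs and paths below are illustrative placeholders.
resource = {
    "observations": {
        "type": "collection",
        "title": "Survey observations",
        "providers": [
            {
                "type": "feature",
                "name": "Elasticsearch",
                "data": "http://localhost:9200/observations",
                "id_field": "id",
            },
            {
                "type": "tile",
                "name": "MVT",
                "data": "/data/tiles/observations",
                "format": {"name": "pbf"},
            },
        ],
    }
}
print(list(resource["observations"]["providers"][0]))
```

One collection, two providers: the same dataset is exposed as OGC API - Features and as OGC API - Tiles purely through configuration, with no custom server code.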
There is also Kibana, which is part of the Elastic Stack and has quite interesting support for geospatial functions. That is an extra. A few words about pygeoapi: pygeoapi is a Python implementation of the OGC API suite of standards. It is now an OSGeo project, it is of course free and open source software, and it is an OGC reference implementation for OGC API - Features. OGC API - Features is, to date, one of the approved standards; the others are still a work in progress, but pygeoapi already supports them, while also collaborating on their definition. On the side of clients, we are a bit less well supported: yes, we have support for OGC API - Features in QGIS, and there are some popular libraries where it is already possible to connect to OGC API - Features endpoints, but it could be better,
considering that we have to serve these datasets to citizens. The final purpose of the SDI is to publish this data, and that is something that is going to be interesting in the next year of the project. What are the limitations? Yes, it is possible to create an SDI with the OGC APIs, but there are some limitations, like the evolving nature of the standards, which also means the evolving nature of the software supporting them, and also, as we said, the lack of client implementations and of know-how about all this.
To face that, and also to cover what we call the legacy standards, the OWS standards of OGC, so the ones that Joanna mentioned before, WMS, WFS, et cetera, we put in place the solid solution of GeoServer, in a quite similar setup: we have GeoServer in the place of pygeoapi, connecting to the file system and a PostGIS database, and connected to different clients. Both stacks share the data lake, so we didn't overload our users with this duplication, with having to use two different SDIs. They just put data in the data lake, and scripts and pipelines put this information into the proper stack.
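The routing logic behind those pipelines can be sketched like this; it is a deliberate simplification with made-up rules and file names, not the project's actual ingestion scripts:

```python
from pathlib import Path

def route(filename: str) -> str:
    """Decide which stack a file dropped into the data lake feeds.

    A deliberately simplified rule set, not the project's real
    pipelines: vector tiles go to the file system served by pygeoapi,
    GeoJSON is indexed into Elasticsearch, and everything else is
    loaded into PostGIS behind GeoServer.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in {".pbf", ".mvt"}:
        return "tile-store"      # modern stack: OGC API - Tiles
    if suffix in {".json", ".geojson"}:
        return "elasticsearch"   # modern stack: OGC API - Features
    return "postgis"             # legacy stack: GeoServer WMS/WFS

# A hypothetical file dropped by a project partner:
print(route("sentiment_survey.geojson"))
```

The users never see this dispatch step: they interact only with the data lake, and each stack is fed automatically.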
So, in the end, we have a hybrid approach, taking the best of both solutions. Of course, having this solution introduces complexity for us in developing the SDI, because we have to maintain and take care of more tools. But we live in a fortunate era, where tools like Docker make it easy to develop and ship multiple tools that have to be interconnected. All the pipelines we mentioned are quite easy to build with these tools; I don't know if we could have done the same without Docker.
In the end, what does the overall structure look like? We put this slide to show that we have two stacks, the modern and the legacy, but a single data lake for the users on one hand and, on the other hand, a catalog shared between the two stacks, so that users can find the data. The catalog is done with OGC API - Records; since it is supported by pygeoapi, we decided to take advantage of this solution and implement the most modern approach to the catalog topic. We expect end users to be able to find the data via this catalog.
What we learned from this experience is that it is definitely possible to start covering some use cases with an OGC API stack, with all the advantages of building on standards that are going to be more and more important in the upcoming years. The only thing that is not yet possible is to have one solution supporting both all the legacy standards and the modern standards; you still have to use more tools, like GeoServer and pygeoapi in our case. So, if you can cover your use case with the OGC APIs, you can go with that; otherwise, you can still take a similar hybrid approach, supporting the previous standards with some other solution for the time being. And that's it. If you have any questions, we will be happy to answer. Thank you.