A crawler for spatial (meta)data as a base for Mapserver configuration
Formal Metadata
Number of Parts | 351
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/69087 (DOI)
Production Year | 2022
Transcript: English (auto-generated)
00:04
Hello, Florence! I have five minutes, so I'll be quick. I'm from ISRIC, World Soil Information. We maintain a reference collection of soils from all over the world: our scientists go everywhere to collect them, we show them in our museum, and we keep them in storage.
00:23
We also maintain a dataset: we gather any soil data we can get from all over the world, and our statisticians use it to build a global soil model, which we make available as open access. At the conference you may see these people running around; that's us, the OSTO team
00:45
at ISRIC. George is not here, unfortunately, but you may know him. This is roughly our stack: we run a DevOps environment on Kubernetes, and we use a lot of open source software. The problem I want to raise today is that data dissemination is just too difficult.
01:06
It's only needed occasionally, at the end of a project, and then you really have to think: what do I have to do, where do I have to click? It involves multiple tasks in multiple environments, and the process is not reproducible.
01:20
This is where we think DevOps brings in good conventions. Deploying data can be done with Git workflows, the content itself is versioned in Git, and metadata equals data, and the other way around. This results in transparent data management, and our soil community can suggest improvements
01:44
via Git issue management. Let me first introduce the sidecar concept. We have a GeoPackage or a shapefile somewhere on our system, and there should always be a metadata file next to it.
02:00
Esri introduced that concept in the 90s, or at least they made it big, and we build on it. So I'm going to present the set of tools we have around this concept to support data DevOps. First, PyGeoDataCrawler: it runs on a folder of files. When you start a new project and get some data from a customer, you run the
02:24
tool just to see what's in that folder. If there's metadata, you import it. If there's no metadata, the tool uses GDAL to extract metadata from the file, and users can then suggest additional metadata via Git pull requests.
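A minimal sketch of the sidecar-crawling idea, assuming a JSON sidecar named after the data file plus `.json` (for example `soils.gpkg.json`, mirroring Esri's `.shp.xml` naming). The sidecar format, the extension list and the function names are hypothetical, not PyGeoDataCrawler's actual API, and the fallback records only what the Python standard library can see, where the real tool would query GDAL.

```python
import json
from pathlib import Path

# Hypothetical: file types treated as data (everything else is ignored).
DATA_EXTENSIONS = {".gpkg", ".shp", ".tif"}


def sidecar_for(path: Path) -> Path:
    """Sidecar path next to the data file, e.g. soils.gpkg -> soils.gpkg.json."""
    return path.with_name(path.name + ".json")


def derive_metadata(path: Path) -> dict:
    """Fallback when no sidecar exists.

    Placeholder only: the tool described in the talk uses GDAL at this point;
    this sketch records just what the standard library can see about the file.
    """
    return {
        "title": path.stem,
        "format": path.suffix.lstrip(".").upper(),
        "size_bytes": path.stat().st_size,
    }


def crawl(folder: str) -> dict:
    """Return {file name: metadata} for each data file in folder."""
    records = {}
    # sorted() materialises the listing, so sidecars written below are not revisited.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() not in DATA_EXTENSIONS:
            continue
        sidecar = sidecar_for(path)
        if sidecar.exists():
            # Existing sidecar: import it as-is.
            meta = json.loads(sidecar.read_text(encoding="utf-8"))
        else:
            meta = derive_metadata(path)
            # Persist the derived metadata so users can refine it in a pull request.
            sidecar.write_text(json.dumps(meta, indent=2), encoding="utf-8")
        records[path.name] = meta
    return records
```

Running `crawl()` a second time picks up the sidecars written in the first pass; those committed files are what users would then refine through pull requests.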
02:41
Then there's PyMapFileCrawler. We use the sidecar metadata to automatically generate a MapServer mapfile. The URL to access the service defined by that mapfile is then written back into the metadata, so if that metadata ends up in a catalog, you can access the WMS or WFS from there.
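The mapfile generation and write-back step could be sketched as below. This is an illustration under assumptions, not PyMapFileCrawler's implementation: the metadata keys (`name`, `title`, `path`, `ogr_layer`, `geometry`, `epsg`, `extent`), the service URL and the `distribution` link schema are all hypothetical. The template uses standard MapServer mapfile keywords and reads the data through OGR, so one template can serve both GeoPackages and shapefiles; the `style` argument stands in for a default style that a caller can replace.

```python
# Hypothetical default style; a caller may pass a different STYLE block.
DEFAULT_STYLE = """      STYLE
        COLOR 200 200 200
        OUTLINECOLOR 80 80 80
      END"""


def build_mapfile(layer: dict, service_url: str, style: str = DEFAULT_STYLE) -> str:
    """Render a minimal MapServer mapfile for one vector layer (sketch)."""
    minx, miny, maxx, maxy = layer["extent"]
    return f"""MAP
  NAME "{layer['name']}"
  EXTENT {minx} {miny} {maxx} {maxy}
  PROJECTION
    "init=epsg:{layer['epsg']}"
  END
  WEB
    METADATA
      "ows_title" "{layer['title']}"
      "ows_onlineresource" "{service_url}"
      "ows_srs" "EPSG:{layer['epsg']}"
      "ows_enable_request" "*"
    END
  END
  LAYER
    NAME "{layer['name']}"
    TYPE {layer['geometry']}
    STATUS ON
    CONNECTIONTYPE OGR
    CONNECTION "{layer['path']}"
    DATA "{layer['ogr_layer']}"
    PROJECTION
      "init=epsg:{layer['epsg']}"
    END
    METADATA
      "ows_title" "{layer['title']}"
    END
    CLASS
{style}
    END
  END
END
"""


def register_service(layer: dict, service_url: str) -> dict:
    """Return a copy of the metadata with WMS/WFS links added (hypothetical schema)."""
    links = [
        {"protocol": "OGC:WMS", "url": service_url, "name": layer["name"]},
        {"protocol": "OGC:WFS", "url": service_url, "name": layer["name"]},
    ]
    return {**layer, "distribution": layer.get("distribution", []) + links}
```

Values are interpolated without escaping, so a title containing double quotes would break the mapfile; a production version would need to quote them.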
03:00
The initial style is a default, but you can override it with a style sidecar file. We're looking forward to seeing OGC API - Maps land in pygeoapi, so we can also support the pygeoapi toolchain. Then there's the next step: community-driven data harmonization.
03:20
Of course, with all that soil data coming in from all over the world, there are thousands of formats. Luis, our ETL guy, puts a lot of effort into harmonizing it to a common model. But here we want to ask the community to help out. We set up an initial transformation script, and people in the community can suggest improvements
03:43
to the transformation script, such as "rename table X to Y". Each change to the transformation script triggers a CI/CD process that runs the harmonization again, and in the CI/CD logs you can see whether things fail.
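A hedged sketch of what a community-editable transformation script and its CI check might look like. The rule table `RENAMES`, the table names and the functions are hypothetical; the only ideas taken from the talk are that rules like "rename table X to Y" live in an editable script, that each change reruns the harmonization, and that failures show up in the CI/CD log.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("harmonize")

# Hypothetical community-editable rules: source table name -> common-model name.
RENAMES = {
    "horizon_data": "horizons",
    "site_info": "sites",
}


def harmonize(tables: dict, renames: dict) -> tuple:
    """Apply table renames; return (renamed tables, list of problems)."""
    problems = []
    for old, new in renames.items():
        if old not in tables:
            problems.append(f"rename {old!r} -> {new!r}: source table not found")
        elif new in tables and new != old:
            problems.append(f"rename {old!r} -> {new!r}: target table already exists")
    out = {renames.get(name, name): rows for name, rows in tables.items()}
    for problem in problems:
        log.error(problem)  # ends up in the CI/CD job log
    return out, problems


def main(tables: dict, renames: dict = RENAMES) -> int:
    """Entry point for a CI job: a non-zero return value marks the run as failed."""
    harmonized, problems = harmonize(tables, renames)
    log.info("harmonized %d tables, %d problems", len(harmonized), len(problems))
    return 1 if problems else 0
```

In a CI job this would be invoked as `sys.exit(main(tables))`, so that a broken rule fails the pipeline run and the error message is visible in its log.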
04:00
So that's very transparent. And then, to my surprise, it turned out such a thing already existed: it's called Data2Services, by Maastricht University, from the medical RDF domain. But that kind of confirmed to me that this is a really interesting approach. This is a project under active development.
04:22
We're actually using this tooling in our own work processes, but the code itself is still very beta. So I'd like to exchange ideas with you: maybe you already have tooling available that we could use in our workflows, or new ideas. I really appreciated the lightning talk before this one, because I think that's also a very good use case.