Mapchete - tile-based geodata processing
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 295 | |
Author | ||
Contributors | ||
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/43493 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
| |
Keywords |
FOSS4G Bucharest 201991 / 295
15
20
28
32
37
38
39
40
41
42
43
44
46
48
52
54
57
69
72
75
83
85
87
88
101
103
105
106
108
111
114
119
122
123
126
129
130
131
132
137
139
140
141
142
143
144
147
148
149
155
157
159
163
166
170
171
179
189
191
192
193
194
195
196
197
202
207
212
213
214
215
216
231
235
251
252
263
287
00:00
PiService (economics)Thermodynamischer ProzessScale (map)Process (computing)Array data structureSatelliteoutputRaster graphicsFunction (mathematics)Simultaneous localization and mappingConfiguration spaceCodeParameter (computer programming)Directory serviceComputer fileSatelliteFunctional (mathematics)Directory serviceTesselationSoftware developerData structureProcess (computing)Projective planeMedical imagingFunction (mathematics)Library (computing)Scaling (geometry)Web browserArray data structureThermodynamischer ProzessWeb 2.0Raster graphicsMappingZoom lensNumberType theorySingle-precision floating-point formatTessellationCore dumpBitQuery languageDiagramPlug-in (computing)Computer fileoutputSoftware frameworkFormal languageDevice driverParameter (computer programming)Electronic mailing listSuite (music)Revision controlSoftware testingImage processingReading (process)File archiverMultiplicationUniform resource locatorWebsiteAreaMessage passingMatching (graph theory)Client (computing)Network topologyService (economics)Electronic data processingRight angleFirst-person shooterLink (knot theory)CASE <Informatik>Data compressionClosed set2 (number)Level (video gaming)Open setTape driveOffice suiteHeegaard splittingComplex (psychology)Numbering schemePiMultiplication signForcing (mathematics)Repository (publishing)FrequencyArchaeological field surveyComputer animation
06:40
Computer configurationArc (geometry)Interface (computing)Line (geometry)Revision controlFile formatoutputFunction (mathematics)Electronic mailing listMessage passingProcess (computing)Local ringPrice indexData conversionTerm (mathematics)Server (computing)Subject indexingProcess (computing)TessellationStandard deviationoutputCASE <Informatik>DivisorPlug-in (computing)Endliche ModelltheorieTesselationPoint cloudZoom lensModule (mathematics)Function (mathematics)Set (mathematics)BitLevel (video gaming)Sheaf (mathematics)Parameter (computer programming)Musical ensembleMultiplication signData structureComputer fileShape (magazine)File formatGeometryPhotographic mosaicWeb browserCuboidBlock (periodic table)Thermodynamischer ProzessOcean currentPersonal area networkScripting languageResultantOpen setAnnihilator (ring theory)Rational functionDevice driverException handlingObject (grammar)FrequencyNumbering schemeNumberFilter <Stochastik>Functional (mathematics)Latent heatLie groupTraffic reportingMereologyMappingQuicksortMenu (computing)Computer configurationMedianWindow functionLine (geometry)Configuration spaceVideo gameThumbnailHill differential equationCategory of beingSource code
13:08
Digital filterTerm (mathematics)CASE <Informatik>Computer fileTime seriesPhotographic mosaicArray data structureFile formatGAUSS (software)Process (computing)Endliche ModelltheorieFunction (mathematics)Field (computer science)Network topologyComplex (psychology)Line (geometry)Multiplication signWorkstation <Musikinstrument>NumberShape (magazine)Source code
14:18
Digital filterLine (geometry)Active contour modelProcess (computing)Computer fileRaster graphicsGeometryBit rateMappingGraph coloringDifferent (Kate Ryan album)Expected valueOffice suitePolygonTouchscreenEndliche ModelltheorieNatural numberWater vaporFunction (mathematics)Active contour modelSource code
15:26
Active contour modelLine (geometry)BildsegmentierungProcess (computing)Device driveroutputFunction (mathematics)Directory serviceArray data structureData structureHypothesisWeb browserParsingSource codeMatrix (mathematics)Client (computing)Raster graphicsMathematical analysisSatelliteLevel (video gaming)Visualization (computer graphics)Photographic mosaicMedical imagingDirectory serviceClient (computing)WritingOnline helpType theorySingle-precision floating-point formatOcean currentCASE <Informatik>Function (mathematics)Server (computing)Process (computing)HypothesisFile formatReading (process)Direction (geometry)Installation artDifferent (Kate Ryan album)Point (geometry)TessellationWeb browserOpen setSubject indexingPlug-in (computing)Parameter (computer programming)TesselationCore dumpFunctional (mathematics)outputInverse functionBitSet (mathematics)Data structureZoom lensArithmetic meanWorkstation <Musikinstrument>Decision theoryLine (geometry)QuicksortComputer chessMultiplication signKey (cryptography)State of matterRepetitionPhysical systemAreaCuboidDevice driverVideo gameLevel (video gaming)2 (number)Fingerprint
22:33
Client (computing)Directory serviceRaster graphicsWeb browserMathematical analysisSatelliteLevel (video gaming)Service (economics)BitThermodynamischer ProzessLecture/Conference
23:02
Source codeCore dumpElectronic visual displayBitDifferent (Kate Ryan album)Vector potentialMultiplication signType theoryOffice suiteInternet service providerTesselationMatrix (mathematics)TessellationProjective planeMeta elementWeb 2.0Numbering schemePixelBuffer solution
25:16
2 (number)InformationTessellationMeta elementLevel (video gaming)AlgorithmPresentation of a groupMultiplicationMultiplication signNumbering schemeTesselationPreprocessorOrder (biology)Array data structureImage warpingModal logicView (database)Time zoneProcess (computing)Scaling (geometry)RepetitionThermodynamischer ProzessPerformance appraisalGreatest elementBitAlgebraLibrary (computing)Control flowLecture/Conference
29:07
Control flowBit
Transcript: English(auto-generated)
00:08
Okay. It's 3.30. I think we can start now. So, hello everybody. My name is Joachim Unger. Daniel already introduced me together with him. We are working for EoX, a small company,
00:24
not so small anymore, from Austria, Vienna. We mainly do projects with ESA in the area of satellite imagery, so processing, writing clients, and today I will be talking about MapChatter. It's a tool developed by us, and it does tie-based geodata processing, and
00:48
let's see whether it can break the curse of raster sites server processing. So, the project is available in GitHub, so it's a Python package. You have here
01:02
the URLs to the GitHub repository, as well as some documentation on Readthedocs. It's been published under a MIT license, and at least our test suite tested against Python versions 3.5 to 3.7. We first created it four years ago, and it has been in development
01:31
since, serving us for various projects, so like creating web maps and doing some satellite imagery and processing. So the initial idea was to use Python code, because Python is
01:47
a very nice language to learn, so it's not as complex as other ones, so that's why Python is so popular, and to use this and build a framework around it to facilitate global scale processing, and the core idea is simply that you write your Python function
02:08
and then MapChatter takes it and reads the data and applies this function on your data, and then writes an output. That's basically it. So here we created a short
02:22
diagram that should explain it a little bit better, so as in every type of processing, you have inputs, so data inputs and data outputs, and there's something happening in between. So what's happening in between is, as I mentioned before, the Python process, it could also be Cython, if you wrap it under using Python. The nice thing is you
02:46
have access to NumPy or SciPy or any image processing libraries like Pillow, Scikit, and whatnot. And as inputs, we are heavily using Rasterio and Fiona to read Raster and
03:02
vector inputs, so MapChatter takes, for example, a Raster image and converts it into an NumPy array, which then is available within the process function. Likewise, with vector data, we use Fiona to open it and basically use the GeoJSON-like dictionary
03:20
and pass it on to the process function. It's also modular, which means you can write plugins. So internally, we created a satellite archive reader, which basically reads the Sentinel-2 archives and Sentinel-1 archives from AWS and Mundi, and converts it into a four-dimensional array, which then we will
03:44
have available within the process function again. And as outputs, likewise, we use Rasterio and Fiona to store either single GeoTips or so-called tile directories. I will go into a little bit more detail afterwards, or use
04:01
or write in GeoJson if you have a vector output. And there are also plugins available, and can be extended by plugins. So, for example, we also have an X-ray writer, which allows to write a tile directory where the tiles are not GeoTips, but X-ray dumps, so, for example, NetCDF or SAR files.
04:30
So what you need to create a process is have a so-called MapChatter file. It's basically a YAML, so it follows the YAML syntax. And here you
04:43
should be stored which driver to use, which compression, whatever. You define your pyramid or grid, because, as I told you before, we are processing in tiles, so every input data has to be brought to the same grid, or to the same pyramid. So you have to define it here, and, of course,
05:03
you can also add some custom parameters for your process function. The process function is being stored in a Python file, so we just need a Python file containing an execute function, and this execute function can open the data and do whatever you want with it, and then you return
05:25
either a numpy array if you write the raster output, or a GeoJSON-like feature list for vector outputs. The tile directory output is something we borrowed the concept from WMTS.
05:41
Ivan was mentioning the web maps, so web maps are split up, as you know, in tiles and process levels, and so when you access, for example, OpenStreetMap, your browser would query PNGs or JPEGs according to a certain tile directory structure. So the first number you see here
06:02
is the zoom level, the second you see here is the column, and then there's zero, and we simply use geotiffs or x-ray dumps or whatever to store the data. It has some big advantages,
06:20
so, for example, multiple processes can write into the same output because it's not a single file, it's based on many files, so you can do a great parallel processing, and you can store really, really large data, so we are processing Sentinel-2, cloudless mosaics, so we have a worldwide 10-meter coverage,
06:41
and we store it in this format, so it's not one big tiff, but it's thousands of tiffs, and if we know the structure and the tiling scheme, we can easily query the tiles we need for a given bounding box. So you could say it's a little bit like a cog, but split up into multiple files because we have
07:02
internal tiling and overviews, but it might be an exaggeration as well. And, yeah, as I mentioned before, we are not bound to geotiffs, we can also do, of course, PNGs, as already possible, and geochasings, NetCDFs, whatever. And natively, Mobjetec can also read and write from
07:22
and to S3 buckets, which also helps with the cloud processing use case. So you could have various workers, and they all write into the same bucket, so that's very handy. It's being run via the command line.
07:41
So the command line interface has a couple of commands I will quickly pass through it. There's a command called formats, which simply lists the already installed formats. I mentioned the plugins before, here you can see whether your plugin already installed or not. Likewise, with processes, we can also register processes, so you don't have to provide the path to a Python file,
08:03
but you can also point it to a Python module path, and if you register it into the main package, then it gets listed here. The create command simply creates a new Mobjetec and dummy Python files, so it's been there to make it easier
08:25
to create a new process. There's a serve command, so the serve command is very interesting for debugging. It starts a flask server on your local host, and then you can open the browser and open layers,
08:42
a map would show up, and as you zoom and pan through the map, it would actually process the tiles and show you your current process outputs. So it's quite handy when you're developing your process and playing around with scripts and parameters that you immediately see the results.
09:04
Then there's the main command, it's the execute command, it simply executes the whole process on your given AOI. Because T-Dog cannot read tile directories, of course, there's an index command, and index would simply create either a VRT file, or if you later on use it with map server,
09:26
you can also create a shape index file. Because the VRT is not really optimized for a large mosaic of data containing thousands and millions of files, so map server has this special format, it's a shape file,
09:41
or can be a geo package, where just the bounding boxes of your tiles are stored, and then it can easily read it. And with this index command, you can create it in one line of command. And there's a convert command, it heavily borrows from the syntax of rio-convert, so the ristiro command line tool.
10:01
It simply extends it by being able to read and convert to tile directories, and vice versa. So you can use it if you have a global dataset, so for example when we get a customer request, they want a certain country or AOI from our global mosaic, we use this command to extract a subset.
10:23
I'm now going to show you some examples, so the first, oh, it's barely legible. Okay. So the first example I wanted to show you is the hillshade, so it's mainly, or I think you could say it's the hello world or ndvi example of cartography.
10:41
So Ivan already mentioned it. Unfortunately, yeah, it's very, yeah, it's very dark. Yeah. So on the left side you see the yaml configuration, and you see that you can just provide the path to the python file. Then the next block should be the pyramid,
11:02
so here we have one of our standard pyramids, it's called the geodetic pyramid. You can set performance settings like meta-tiling, so it would not process the 256 times 256 tiles, but two by two of these tiles, which makes much of the processes a lot faster.
11:23
Then you have an input section, and input can be anything, so it can be a path to a TIF, either remote or local, or some special input from one of your plugins. And at the output, likewise, you just specify which of the output formats you want to have and which bands and some driver-specific settings.
11:42
Then you have optional parameters, so you can limit the zoom levels you want to process, because we are always talking about tile pyramids. And the last section is the process parameters, so in this case you have the hillshade parameters, so like the azimuth, the altitude, and the exaggeration factor, the set factor.
12:01
And what you probably don't see here, we can have zoom level dependent parameters, so what I did here below is to have the exaggeration factor of the elevation model being, it changes between the zoom levels, so the more you zoomed out, the more exaggeration we want to have,
12:21
because we want to have the mountains shine out. And on the right side you have the execute function, it's quite straightforward, so you have this magic MP object, which opens data. Then you have the hillshade function, which just takes this numpy array,
12:41
and then you return the numpy array and you're done. So my initial idea was to do a live demo, but it didn't really work out, so here you have a browser window, and on the top you see the command I would have to run. It's the map chat to serve, and then the path to the file. And then in the background,
13:01
it would process the tiles that you're looking at, so whether you pan or zoom in or zoom out. The second example is, the first example but with filters, so it also calculates the hillshade, but you can also import pillow and scipy
13:20
and do some gauss filters, median filters, to play around to generalize the elevation model a little bit, so to optimize for your visualization, for your hillshade output. It's really bad that you cannot see it. So I will have to explain.
13:42
So it just has more lines, so what it does is to convert the numpy array to the format that pillow requires, and then applies this filter, and then you go to the next step. So you see, the more complex your use case will be,
14:00
your process file would grow with it, but you can do a lot of stuff here. So this would be the output of the filtered mosaic. I have the time series. Oh, what I forgot to mention is you can also do clipping. So the reason you won't see any pathometry here is because we also provided a shapefile
14:22
and told the process file to clip the raster via the geometries. So we were using this a couple of years ago already to create the hillshadings of our WMS maps, and it was quite fun and quite handy to work with it.
14:41
What we also did was to extract contour lines out of elevation models. I will skip through this a little bit faster. What we did in our maps was to, we wanted to show the contour lines in different colors depending on the underground they're in. So, for example, on water, that should have been blue. On land, brown.
15:02
And we wanted to have light blue contour lines on glaciers. So what this process does is simply reads them, extract the contour lines, intersect it with a different geometry. So in this example, it's simply the natural earth land polygon, and returns the geochasen output,
15:26
which then we could, so this is a screenshot from QGIS. So I visualized the contour lines depending on whether they are over land or over sea. So you can do different visualizations here.
15:43
This is a more recent example because we are doing a lot of mosaics, so we are trying out different approaches, and here we tried out some image segmentation. So we are using one of our existing mosaics. One of the reasons I'm showing you the example is if you see on the top left corner,
16:01
you see a path to an S3 bucket, but it's not a TIF, it's just a directory. It's because it's a tile directory. So you can, out of the box, just point it to this input, and it would recognize it and the tiling scheme, and then read the data accordingly. So this would have been very cool to show live,
16:22
but we have to do it via screenshots here. So what we did with this example was to take the RGB mosaic and use a scikit-learn or a scikit function that does image segmentation and use the output segments
16:43
and visualize it in a nice way. So just draw the red outlines over the... Is it visible? Okay, a little bit. And here again, the nice thing was you can zoom and pan, and in the background, it would load the TIFs
17:00
and extract the lumpy arrays, run the segmentation, and then renders the PNGs, which then can be visualized in the browser. And here I've assumed inversion of book harvest. So this makes it really nice to fiddle around with the process parameters and how many segments to use and how big should they be, and does it even work,
17:21
or is the output the way I want it to be? And yeah, stuff like that. So wrapping it up, we switched to a plug-in-based system, so we try to keep the core package as small as possible, but using plug-ins to extend it
17:42
via processes or input or output formats. Therefore, we use the Python setup tools entry points, which is quite easy to use. And there's also a... It's also in GitHub, it's our X-Array driver, so you can try it out there. Simply put, pip install mapjetter and mapjetter xarray,
18:03
and you can read and write multi-dimensional data as well. There was already a PostGIS plug-in, but it's been written like four years ago, and I didn't manage to keep it updated since then. But it could be extended via this way as well. So you see, we can be very flexible
18:22
towards different types of data and data structures. Because Ivan mentioned GeotiffJS, we already... So last year, more than one year ago, we supervised the master thesis
18:41
that would... Well, the topic was first, can we read those tile directives in open layers, and how would it work, and then under which circumstances would it be efficient or not.
19:00
So I mean, it was already discussed, but for the use case that the user just wants to have one single visualization, it's probably better to have it processed on the server, or rendered on the server and just visualized on the client, but if the user wants to interact with the data, then it's better. But then you have to weigh the pros and cons,
19:22
because if you transfer the data, it takes time, so it depends really on the download speed. And if you want to interact a lot with the data, then you pay the price. If you don't want to, then the server simply gets slower. What we are now working on
19:41
is to make GeotiffJS able to read tile directories, because GeotiffJS now can just read single TIFs, but the idea is to point GeotiffJS to a tile directory, and it reads the tile structure, and while rendering the data, or while being provided with the bounding box, it should figure out which TIF to read,
20:02
and which subsets, and how to warp it to put it into the correct output. But there we still have a big chunk of conceptual work in front of us, because the current tile pyramid definition is very minimal.
20:21
Let's put it this way. It worked for our use cases the past years, but to make it interoperable with other tools, we might have to change here and there something of the concept. Yeah, and the target of this would be to have... So the OpenLayers interface I showed you,
20:40
it just works with PNGs, so the target would be to have GeotiffJS there so that your process output could be any type of TIF, and you could interact with and query it while your Geotiff tiles get rendered, or get processed by the server, and then you can directly look into the data,
21:03
because the other possibility would just have to be that you create a bunch of TIFs, run the vrt index command, and then open it in QGIS, and having it directly in the browser would be much more nicer and efficient. So next step and some follow-up talks I recommend.
21:24
So the next step would be to work on the GeotiffJS client. Before that, we have to figure out the tile directory spec. So if you think this data structure will be useful or would be useful for any of your use cases, then feel free to contact either me or Fabian.
21:41
We'll be happy to get some help there, because more brains working on the same problem is usually better than having just two. And what's also on my agenda is to implement the possibility to write single geotiffs with custom overviews,
22:01
meaning that the overviews in the geotiff are not being generated from the base level, but you can write each overview individually with different settings. So for the hillshade example, this would really be nice, because then you could store the hillshade in one TIF, and depending on the zoom level, it's a little bit different, at least for this use case.
22:21
It would be great. Yeah, follow-up talks. Fabian is the creator of GeotiffJS. He will talk tomorrow afternoon. So if you found these talks in this session interesting, you should go there. And also tomorrow in the same session, there's my colleague Peter,
22:40
going a little bit more into detail on our cloudless processing. And yeah, it's also, I think, in the same session tomorrow afternoon in the Oprah room. So thank you for your attention. And yeah, feel free to get in contact if you have any questions or suggestions.
23:06
All right, so I think we managed to catch up a little bit, a little delay in the display at the beginning. So we're in time. We don't need this five minutes of changing rooms now. So you have five minutes for questions.
23:24
So yeah, I think for everyone that has been working with different sources and bringing things together and larger data sets, this can have or has some potential or you see some potential in this tool.
23:41
So maybe give it a look. And so if you have some questions for Joachim. Yeah, sure. So the time you are going to work on this presentation, but right now is it like a similar approach like when you should like to type? Like you started as a data item and then write type?
24:01
Yeah, currently we follow the WMTS simple approach, which just defines the Web Mercator and the Geodetic Tiling Scheme. And yeah, we added on top, you know, the possibility to store the tiles in meta tiles
24:20
and even using the paddings, the pixel buffers. But we, yeah, we just wanted to define it a little bit more explicit. So there's a Python package in the core of MapChatter called tile matrix. And the possible type of definitions there
24:43
are very, very basic. So not at all the whole scope that WMTS, for example, provides. But it can do whatever projection you want as well. It's just not that obvious to use a non-default tiling scheme. And we want to improve on that.
25:03
All right, so many more questions? Actually, all right, yeah. I wonder if you can give us some, for example, how long it takes in general?
25:24
Well, it takes just a couple of seconds. Okay, sorry. The question was whether I could provide some performance information on, for example, how long the hillshade would take. Yeah, it's just a couple of seconds, yeah.
25:45
I don't know for the quick view. I mean, it strongly depends on what the underlying geotiff is, so I usually create the geotiffs with overviews. Because otherwise, if there are no overviews and you try to assume level three, it will take a while.
26:01
But then, it's super fast. I mean, for the heavy lifting, we are using Rasterio, which has GEDAL on the bottom, which uses C. So the warping is fast. Then, we have access to NumPy arrays. So if the algorithms are fast, the thing works, yeah.
26:25
But as I mentioned, the tiling scheme or the meta tile setting also plays a big role. So usually, it's better to have bigger tiles than to have many, many smaller ones.
26:56
So the question was, if there is a big tiff, whether there should be some pre-processing
27:00
in order to be used with this library. Short answer, no. So everything that can be read with Rasterio should work with MapChatter as well. Long answer, yes. You might get some more performance if you use internal tiling and overviews.
27:25
So the question is whether we use multi-processing to process multiple tiles, yeah. So we use either the multi-processing package, or we can alternatively use billiard, which is the multi-processing fork used by Celery so that we can have a Celery worker use
27:43
a spawn multiple process as well. So the question is whether it's possible to distribute it
28:01
on multiple workers. Well, we have a wrapper around it, but it's not yet open-sourced, which we might do in the next months. Maybe, maybe not. I don't know. But basically, yeah. So you have your MapChatter workers, and you send them a process, and they will do the same.
28:23
We don't distribute the tile processes themselves, because on the large-scale processing, we found out that it's better to cache some of the data, and therefore it makes more sense to have one zone with multiple tiles being processed by one worker
28:42
that has cached data. But yeah, there's something in the pipeline, yeah. All right, so I think one last question, depending on your necessity for coffee. All right, so great.
29:02
I think thank you very much all for your time. Thank you for all the great presentations. And yes, I guess I see some of you in a little bit after the coffee break. So yeah, thank you all.