Mapchete - parallelized batch geoprocessing using Python

Video in TIB AV-Portal: Mapchete - parallelized batch geoprocessing using Python

Formal Metadata

Mapchete - parallelized batch geoprocessing using Python
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Subject Area
Processing geodata can be fairly simple until the input data reaches a certain size. Creating a hillshade or extracting contour lines from a DEM can be done quickly, but if you want to do this with e.g. the global SRTM dataset (1296001 x 417601 pixels), the process will crash (unless you are visiting from the future). Besides, if there are additional steps required, like clipping the data to the 400MB landpolygon behemoths from OSM or applying custom filters, you will probably find yourself starting to write your own tool for chunking the data. mapchete tries to solve this issue by helping you to focus on developing your geoprocess written in Python and applying this process to the data. It does so by automatically reprojecting and chunking the input datasets into tiles (based on the “WMTS simple profile”) and running your Python process for each tile individually and in parallel on all available CPU cores. mapchete offers two command line tools. mapchete execute runs the process on the full dataset, similar to tile pyramid seeding for map caches. mapchete serve hosts an OpenLayers interface and processes only the data in the areas and zoom levels you are currently inspecting. This allows you to test and assess your process on the full dataset on the server, instead of clipping and downloading subsets to your laptop. mapchete is used as the data preprocessing backbone of EOX Maps, a service which provides background maps, for example to the European Space Agency.
OK, so, welcome to the sessions this afternoon. I will be chairing the next three sessions; Sean was supposed to chair this session but could not make it, so I am here. The first talk is going to be given by [name unclear].

OK, thank you very much. Hello everybody. I work at a company called EOX. We are a space company, and it's really true: we mainly do work for the European Space Agency, so we are working mainly in the Earth observation field. But we are also doing maps, kind of, and so we are also offering global background maps as services; we have WMS endpoints and are currently working on a new version.

Here I want to talk about a tool we have been developing over the last year to create these datasets. As you might know, if you have ever made a map, in recent years this has become fairly easy. There are a lot of great tools out there where you can just say: OK, here is the data, here are my style files, and then the tool will provide you with the map.

But if you, for example, want to render a map with hillshading, then you somehow have to get the hillshade. Of course the GDAL command line tools can compute it, but it is probably not a good idea to apply them to the global SRTM dataset, because this dataset is really huge, it's 30 meters globally, and it wouldn't work. The same would apply if you want to compute contour lines, for example. It won't work, first, because it is not really scalable, since the input is so big, and second, because there is no customization available: if you want to tweak your hillshading afterwards by applying image filters, for example, or do some clipping by the land polygons, that wouldn't work.

Well, for generating maps, tile pyramids were introduced a couple of years ago. If you go to OpenStreetMap, the global map is basically cut into tiles and you just get served the area and the scale you requested, which is a pretty good approach: you deal with the size simply by chopping the data into smaller parts, which are more easily digestible for processing.

There, too, basically we have our data and we define how the map should look, and then the tool will provide it. So we thought it would be great if we could again have our input data and then some kind of process, not a styling but any kind of geoprocess, and we would get our processed output, tiled or whatever. And we thought: this approach works for maps, so there must be something out there that does this. But we could not really find anything, so we started to create our own. It is called mapchete, and it is a tool written in Python.
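The tiling idea borrowed here from web maps can be made concrete with the tile numbering commonly used for OpenStreetMap-style (spherical Mercator) tiles. The following is a minimal sketch of that scheme and not code taken from mapchete; the function name `lonlat_to_tile` is chosen only for this illustration.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the web map tile containing a point.

    Uses the common OpenStreetMap-style numbering: at zoom level z the
    world is cut into 2**z columns and 2**z rows, with tile (0, 0) in
    the upper left corner. The function name is illustrative only.
    """
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points exactly on the right or bottom edge stay valid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# Each zoom level quadruples the number of tiles, so the same point falls
# into ever smaller chunks of data.
for zoom in (0, 1, 10):
    print(zoom, lonlat_to_tile(-0.1278, 51.5074, zoom))
```

Because every tile covers a bounded piece of the world, a process only ever has to hold one such piece in memory, which is the property the talk relies on.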
Python, I mean, I could give you a long list of why I think it is great, but the main reason I started with it is that I was not really used to programming, and the one language I felt comfortable with was Python. On the other hand, Python has a really great ecosystem, so you have a lot of modules that will help you with any kind of geoprocessing task: the SciPy packages, for example, NumPy for raster and array processing, or Shapely for vector processing.

What mapchete basically does is chop the input data into tiles and then apply any kind of user-defined process. You write the process in Python, and mapchete applies it to every single tile. Under the hood we are using rasterio and Fiona to read the raster and vector data, which are both great tools, by the way. We support both the spherical Mercator projection, which is the most used projection, and the WGS 84 lat/lon grid, which we use a little bit more because we have to show global data, and this projection is slightly better for that because you can also cover the poles, for example.

mapchete comes with two command line utilities. The first one is mapchete serve. It starts a server, written in Flask; you can then open a browser, go to port 5000, and you have an OpenLayers instance. As you zoom and pan the map, the tiles are processed on demand. There is also an overwrite flag: if you turn it on, you can edit your code, reload the map, and immediately see what effect your edits had on the output. So it is very easy to immediately test what you have written. Usually, if you have a big amount of data, you would put it on the server and either download regions of interest and process them locally on a laptop, or do the processing on the server and download the outputs afterwards. With mapchete serve you get a much faster development cycle.

The second tool is mapchete execute. It takes the same process and can process the full tile pyramid, or you can limit it to certain zoom levels, so you can say: I just want levels 0 to 8. You can also simply process a specific tile if you know the tile index, or of course define a spatial subset with a bounding box.

This is what the configuration should look like. It is a YAML file, and you need certain input values.
The path to your Python file, of course, then the input files (here we are using a raster and a vector layer), and some information on the output format: the path where the output has to be stored, and the format. We currently support PNG, and a special PNG version which we developed for hillshading. What it does is write the hillshade information into the alpha channel and make everything else black, because then you can overlay the hillshade on any kind of other map layers. We also support GeoTIFF, of course, which we use to store our digital elevation models; we use PostGIS for vector data; and we also experimented with dumping raw NumPy arrays. You specify your output projection, so geodetic or mercator, and the data type, the number of bands, and whatever else you need for raster data.

Then there are also options for zoom levels, so you can specify the minimum and maximum zoom level validity of the process; for example, your process can be made available only between zoom levels 0 and 8. You can define spatial bounds here as well. We also support metatiling, which means that in the configuration you can say it should use a metatiling of 8, which basically just increases the tile size. On the web client you will still get the small tiles, but in the backend the tile size is increased by a factor of 64, which in some cases really makes the processing faster.

And then of course you can specify your own process-specific parameters. The configuration also allows you to specify zoom-dependent parameters. For example, for the hillshade, if you want a higher exaggeration of the elevation before calculating the hillshade in zoom levels below 5, you can specify that. When processing a tile, mapchete always takes the configuration value of that tile's zoom level, which comes in really handy if you are dealing with differently simplified vector data.

And this is what your process file should look like. We tried to use the same paradigm as PyWPS: basically you have a process class here, and then an execute function, and you can write anything you like into this execute function. You can import any kind of Python modules you need for your processing. To open data there is an open function, which can be used both for vector and for raster data. It also has an is-empty function, which returns true if the tile of the source data is empty, so if your process depends on this data you can quit immediately, which makes it faster and easier. To read RGB data, for example, you simply write this; it is very similar to rasterio. There is also a write function, to which you pass either a tuple of bands if you are using raster data, or a single array if it is just one band, or a list of features, and it will simply write them, depending on how you defined the output.

So we really tried to make it very easy, so that you don't have to care about reading or writing and can simply focus on what you want to do with the data. There are also some common functions we introduced: a hillshade function, a contour extraction function, and a clip function. You provide the clip function with the raster data and a vector dataset and it clips the data. You can also specify a buffer, so you can say: I don't want the exact coastline, but a buffer around the coastline. This could be extended, but for our purposes these three functions are the most used.
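The zoom-dependent parameters and the per-tile execute function described above can be sketched as follows. This is an illustration under stated assumptions and not mapchete's actual API: the condition syntax (`"zoom<5"`), the helper `resolve_param`, and the signature of `execute` are all hypothetical, and the tile is simply a NumPy masked array.

```python
import numpy as np

# Comparison operators for the hypothetical "zoom<5" syntax.
# Two-character operators come first so that "<=" is not read as "<".
_OPERATORS = {
    "<=": lambda zoom, level: zoom <= level,
    ">=": lambda zoom, level: zoom >= level,
    "<": lambda zoom, level: zoom < level,
    ">": lambda zoom, level: zoom > level,
    "=": lambda zoom, level: zoom == level,
}


def resolve_param(value, zoom):
    """Return the parameter value that applies at the given zoom level.

    `value` is either a plain value, valid for all zoom levels, or a dict
    mapping conditions such as "zoom<5" to values (hypothetical syntax).
    Returns None if no condition matches.
    """
    if not isinstance(value, dict):
        return value
    for condition, param in value.items():
        expression = condition.replace(" ", "")
        if not expression.startswith("zoom"):
            raise ValueError("unsupported condition: %s" % condition)
        rest = expression[len("zoom"):]
        for symbol, matches in _OPERATORS.items():
            if rest.startswith(symbol):
                if matches(zoom, int(rest[len(symbol):])):
                    return param
                break
        else:
            raise ValueError("unsupported condition: %s" % condition)
    return None


def execute(dem_tile, zoom, params):
    """Example user process for one tile: exaggerate the elevation.

    Returns None for tiles without any valid data, so that the caller
    can skip them instead of processing and writing empty output.
    """
    if np.ma.getmaskarray(dem_tile).all():
        return None
    factor = resolve_param(params["exaggeration"], zoom)
    return dem_tile * factor


params = {"exaggeration": {"zoom<5": 2.0, "zoom>=5": 1.0}}
tile = np.ma.masked_array([[100.0, 200.0]], mask=[[False, False]])
print(execute(tile, 3, params))  # exaggerated at a low zoom level
print(execute(tile, 7, params))  # unchanged at a high zoom level
```

Resolving the parameter once per tile keeps the process function itself free of any zoom logic, which matches the idea that the user only writes the geoprocess.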
As for what we used it for: we took the list of elevation models collected by the Open Terrain community and created our own global elevation model, based on global sources and some regional datasets. We used mapchete to merge this elevation data, clip it to the coastline boundaries, and also combine it with data for the sea, so elevation there as well. But we also created our own specific hillshading.

We worked on this quite a lot and tried to get as close as possible to the great Swiss topographic maps. It is not that easy to get there, but we are quite satisfied with it. To get this hillshading, a normal hillshade function wouldn't work, because we use hillshades from different resolutions, combine them, and then apply some median filters to get the best out of it.

I also want to show you the vector functionalities. I don't know whether it can be seen, but we actually have multicolored contour lines. We make the contour lines blue on glaciers, like the great Swiss maps do, and brown on land, so we also used the tool to do the clipping and the intersecting with glaciers.

And we also dug into the Sentinel-2 data. For this example we collected all the images for, I think, a time span of two or three months, sorted the pixels by brightness, and then extracted the pixel that is most likely not to be cloudy. For this version we experimented with NumPy dumping, because when you dump a NumPy array you are not bound to two or three dimensions, but can basically dump time series arrays. There is still room for improvement, but for a first version we are quite satisfied with the results. So this is basically a pixel-by-pixel mosaic from within a time frame of three months.

mapchete is available on PyPI as well, so you can use pip to install it. It also has a Twitter account. And yes, thank you very much.

OK, thank you. Let's go into questions.
First, cool thing, great talk. I have a question: I think I read in the abstract that you can run the tiles in parallel. Is that much faster than running it serially? Because I know that Python is sometimes not that great at parallel processing.

Yes, it's definitely faster. I mean, it depends on the kind of process: if the bottleneck is disk I/O, for example, it won't make much sense. But for most of our use cases it worked well, although we had to tweak a lot with the metatile size and with the number of processes.

OK, cool, thanks.

A question about the projections: you mentioned that you support only spherical Mercator and WGS 84. Is it possible to also add other projection systems?

Well, it should be theoretically possible. For our use cases we just needed these two projections, so maybe we will do it along the way. What we also did so far for the global projections is this: if you read a tile with a buffer near the antimeridian, it takes the data from the other side as well and stitches it together, and this would make it a little bit difficult to define your own regional grids. I mean, it would be possible, but it is difficult.

Thank you. Very interesting tool. I'm just wondering why you didn't use PyWPS directly, which has a lot of stuff that is already there.

Yeah, I mean, I found PyWPS a little bit late. I didn't think about it, that PyWPS could do this, and by then I was in the middle of everything.

And I had another question: which technology did you use for parallelizing?

Multiprocessing, the Python multiprocessing module.

So I'll just leave you with the challenge to integrate this with PyWPS.
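The buffered read across the antimeridian mentioned in the answer on projections can be illustrated on a toy global grid. This is only a sketch of the wrap-around idea, not mapchete's implementation; `read_buffered` is a hypothetical helper, and the grid is a small NumPy array whose first and last columns are assumed to be neighbors on the globe.

```python
import numpy as np


def read_buffered(grid, col_start, col_stop, buffer):
    """Read columns [col_start, col_stop) plus `buffer` columns per side.

    Column indices that fall outside the grid wrap around to the other
    side, which is what has to happen at the antimeridian of a global
    grid. Rows are not buffered in this simplified sketch.
    """
    columns = np.arange(col_start - buffer, col_stop + buffer)
    return np.take(grid, columns, axis=1, mode="wrap")


# A toy "global" raster with 2 rows and 8 columns.
grid = np.arange(16).reshape(2, 8)

# The westernmost window borrows one column from the eastern edge ...
print(read_buffered(grid, 0, 2, 1))
# ... and the easternmost window borrows one from the western edge.
print(read_buffered(grid, 6, 8, 1))
```

A regional grid has no "other side" to borrow from, which is one way to read the remark that custom grids would make this behavior harder to define.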
I have a rather programming-related question. As I see it, you have multiple tiles and you process them in parallel, and I guess you use a pool and apply map. Now my question: have you tried running this on other Python interpreters, like PyPy?

No. In the beginning I calculate the tiles that intersect with the input data, and then I start the pool. We haven't used PyPy; I just didn't dive into that.

So we have time for more questions, by the way; three or four more questions or remarks are OK.

How did you get the cloudless images? Did you just select them by hand?

No, I took all the images I could get. There is a great blog post from Mapbox about it; they do the same with Landsat data. You basically build up pixel stacks and then sort the pixels by brightness, because the brightest pixel would most probably be cloudy and the darkest one would most probably be a cloud shadow, and then you have to tweak around to pick some pixel in the middle that is mostly cloud-free.

Thank you. If I understand correctly, it's mostly for mapping, doing this preprocessing. But if you go a bit further in terms of analysis, how do you handle borders? Because you can have types of processes where you get border effects if you don't work with an overlap.

You can provide a buffer value around the tile. It is required for hillshading, for example, because the hillshading process always needs the neighboring pixels at the border. And as I mentioned, at the antimeridian it retrieves the data from the other side.

OK, anyone, one last question? OK, then we are finishing the session. Thank you.
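The pixel sorting described in the talk and in the answer about cloudless images can be sketched with NumPy. This is a simplified illustration and not the code used for the mosaic shown: it assumes a single brightness band per scene, and the function name `pixel_sorted_composite` and its `rank` parameter are hypothetical.

```python
import numpy as np


def pixel_sorted_composite(stack, rank=0.5):
    """Build a composite from a time series of co-registered scenes.

    `stack` has the shape (time, rows, cols) and holds one brightness
    value per pixel and scene. Every pixel's time series is sorted by
    brightness and the value at the relative position `rank` is taken
    (0.0 = darkest, 1.0 = brightest). Bright outliers (clouds) and dark
    outliers (cloud shadows) end up at the two ends of the sorted series.
    """
    ordered = np.sort(stack, axis=0)
    index = int(round(rank * (stack.shape[0] - 1)))
    return ordered[index]


# Five scenes of a single pixel: one cloudy (0.9), one in cloud shadow
# (0.05), and three clear observations around 0.3.
stack = np.array([0.9, 0.05, 0.3, 0.32, 0.28]).reshape(5, 1, 1)
print(pixel_sorted_composite(stack))
```

The `rank` value is the knob one would tweak: moving it away from the middle trades cloud contamination against shadow contamination.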