OSM planet data to vector tiles in a few hours: OpenMapTiles & Planetiler
Formal Metadata
Number of Parts | 351
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/68995 (DOI)
Production Year | 2022
FOSS4G Firenze 2022 (248 / 351)
Transcript: English (auto-generated)
00:00
Perfect. Thank you. Hi, everyone. My name is Yuri. I work for Elastic. I do maps. So does everyone else here, I know. All right. Today I'm going to give a little presentation about how to get from raw data to vector tiles. We all know what vector tiles are.
00:20
I'll try to go briefly into them, but not enough to bore you. So to the vector tiles and beyond. The what and the why. You need to show a map. It has to be zoomable. It has to be really fast, responsive, have enough data to go into details, but not enough
00:42
data to make your computer extremely slow when you're kind of getting the overview. So the tiles. The main goal is to basically transform the raw data, usually OpenStreetMap. We don't have anything better really. As far as open and good quality and everything else and community governed and many things
01:05
like that. The problem is that vector tiles are not really standardized. I mean, we all kind of agree that MVT is the way to go, but on what's inside the tile, we haven't really agreed. And performance is also kind of important.
01:21
You know, you want to generate those tiles really fast, otherwise, well, your boss or your bank account kind of comes knocking on your door saying like, how come you spent three years worth of Amazon or Google Cloud instances? We all have that experience, right? Yeah? No? Yeah. Okay. Some of us did.
01:40
I didn't. I'm very frugal. Yes. Anyway, the choices. There's OpenMapTiles. We just had an awesome presentation by Tom right there in this room about OpenMapTiles. Wonderful project. I've participated in it since the very, very start. I was there even before it started, kind of hovering there in the clouds, looking.
02:04
There's also Shortbread, which is a newcomer, very, very rudimentary. There is the Protomaps specification, also fairly rudimentary, but it does a lot of wonderful things. We have the author right there. There's Baremaps.
02:21
There are many others up and coming. It's so easy you can do it yourself, really, until you realize there are some corner cases and more corner cases and more specifics for that specific region, because in OSM they decided to map Lithuania differently compared to all the other countries. We all know how that progresses.
02:43
You always need a lot of expertise on how local mapping is done, and then you need to put it all into your vector tile generation to normalize it. OpenMapTiles. There are pros, there are cons, as with any other project. It's a very mature project.
03:02
It has been around for, what, seven years now, six, seven, something like that. It has huge community adoption, huge customer adoption. There are a lot of paying customers of MapTiler, and a few other projects, like Stadia Maps, for example, use the same schema, and quite a few others; I don't recall right now what their
03:20
names are. So it's pretty popular. There's some cons. There's no community governance. It is, I'll get to what you mentioned last in a few minutes. There's no community governance. It's limited to just the base map, the one style. Like it has to be, one solution fits all and it never does.
03:41
So we need to go beyond that. It needs to become like à la carte, a catalog where you pick and choose what you actually want. And it's CC with attribution, so that also causes some friction. Yet, thanks to Tom and Peter and many other wonderful folks at MapTiler, they just
04:04
announced about 10 minutes ago that they are considering changing the schema license to CC0. Yay! I'm very excited about that one. It should become a catalog of layers where everyone can say, I want really, really detailed points of interest because my barbershop is critical on that map.
04:24
I just like, yeah. I got to have my barbershop. There's open governance where people will be able to actually decide as a community. This is how we want to progress. This is the tooling we need. This is the schema changes we allow or don't allow.
04:43
This is the flexibility we want to provide to the users or we don't want to provide to the users. So there's all these wonderful things that MapTiler is putting on the table and I'm extremely excited about that. And there's some work that hopefully will happen in that direction.
05:03
There are others. I mean, Shortbread, Protomaps, others. They're fully open source, CC0 licensed from the beginning. That kind of gives you a very solid, good base to go forward. They have very good ties with the OSM community, in that most of the people who created
05:25
them came directly from the depths of the OSM community, people with a very good understanding of how the process works. And they have a very clean separation of specification: this is what the schema contains, and this is how to get there.
05:43
If you separate those and publish it, you can have multiple implementations, which is another positive thing. There's cons. There's no community governance of those either. It's usually a few people or one person running the project. Wonderful person, very smart, intelligent, but you can never cover all the bases with one
06:02
person or a few people. They're very severely limited in functionality. There's not that much, simply because those are much less mature. And they're not widely adopted. And they usually also take a single-solution-fits-all approach, which means if you want, again,
06:21
your favorite restaurant there or a patisserie or they have amazing desserts here, you're out of luck because the authors of the map decided to have just one type of food place and it's like some food icon, like probably a burger, and not very good, especially for Italy, which has such a huge variety.
06:42
So, OpenMapTiles overview. Now I'm going to go deeper into the actual projects. How much time? We're doing good on time? Yeah. I'll speed up a little bit, not to bore you. So it's an open-source platform, lots of support, Dockerized, I mean, there's a lot of goodies there. I'm not going to go too deeply into it simply because there were so many other wonderful
07:03
presentations about OpenMapTiles. One of the best indicators of the success of the project is these little fork and star indicators on GitHub that just show you how wonderful these things are. They're used by a lot of different customers and companies, including Elastic, which I
07:22
work for. It's been running for over five years. There's active community participation, constant improvement, constant iteration, lots of wonderful tooling. They can even generate everything. This is how we use it in Elastic, by the way, in Elasticsearch right there, that map is based on OpenMapTiles to show buildings and all sorts of other things, to do in-depth
07:45
analysis on the dashboard. Or this is actually something that Jorge right there did, he's my colleague, even though he's very active in the OpenStreetMap community as well, and just the geo community in general. This is the, ooh, it's the Canary Islands volcano eruption, and lots of data points,
08:06
and Elasticsearch is capable of processing all that data and visualizing it. The end of the advertising. Tooling. How do we get there? We go from OSM to database to tiles, usually. So what you do is you take either Imposm, a well-known tool, but kind of stabilized by
08:26
now. Let's call it stabilized. It has not improved in any way in the recent five-ish years. It's still a great tool, but it's not moving forward with new features as much as anyone would hope. And there is osm2pgsql, which was practically dead for a while, until some wonderful people
08:43
revived it from the dead and added Lua to it, and did all sorts of other wonderful things, and now it's extremely fast and does all sorts of really good conversion from the OSM data dumps into Postgres databases. We all love Lua. I do.
09:01
And then the second step. How do you convert from the database, from Postgres in this case obviously, to vector tiles? Well, ST_AsMVT was really a blessing for all of us, to instantly generate any vector tiles from any of your data in Postgres. It's just been amazing.
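To make the database-to-tile step concrete, here is a minimal sketch of how a single-layer query around the PostGIS function `ST_AsMVT` could be assembled. It is not the OpenMapTiles SQL: the helper name, the table and column names, and the assumption that the geometry column is called `geometry` and stored in Web Mercator are all illustrative. The `%(z)s`-style placeholders are meant for a PostgreSQL driver such as psycopg.

```python
from typing import Sequence


def mvt_layer_sql(layer: str, table: str, columns: Sequence[str]) -> str:
    """Build a query that returns one MVT layer for the tile z/x/y.

    Hypothetical helper: `table` and `columns` are placeholders, and the
    geometry column is assumed to be named "geometry" (EPSG:3857).
    """
    attrs = ", ".join(columns)
    return (
        # ST_AsMVT aggregates the rows of the subquery into one encoded layer.
        f"SELECT ST_AsMVT(t, '{layer}') FROM ("
        f"SELECT {attrs}, "
        # ST_AsMVTGeom clips the geometry and converts it to tile coordinates.
        "ST_AsMVTGeom(geometry, ST_TileEnvelope(%(z)s, %(x)s, %(y)s)) AS geom "
        f"FROM {table} "
        # Bounding-box filter so only features touching the tile are read.
        "WHERE geometry && ST_TileEnvelope(%(z)s, %(x)s, %(y)s)"
        ") AS t"
    )


if __name__ == "__main__":
    print(mvt_layer_sql("water", "osm_water", ["class"]))
```

A full tile would concatenate the binary results of several such layer queries; the string is only built here, not executed.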
09:20
There's also a new kid on the block. I'll talk about it later. And there is Mapnik. Anyone use Mapnik? I am so sorry. I mean, it's a wonderful project, but yeah, that was an awesome, awesome project. Rest in peace. OpenMapTiles in depth. So there's like all these steps to get started.
09:42
You just run it. There's a Makefile, and it just runs everything in Docker, and it just works. It's a very mature platform. You can customize it. You can change layers, but again, those will be your layers, your custom layers, not the original ones.
10:00
And it's all done by Postgres now. You can just throw tons of Postgres servers at it. Postgres servers in parallel will just run ST_AsMVT and massively, massively generate those tiles. Please watch out for your Google, Azure, or other bills coming in later.
10:21
So that gives you the horizontal scalability, because the more servers you have, the faster you generate your tiles. Yay. In Elastic, I think I spent about three to four days with three to four really, really, really large Google Cloud machines to churn through all the data. And there's still a lot of optimizations in there to skip like oceans if like a lower
10:43
zoom has the same, like if zoom 12 had water in it and nothing more, I just assumed that everything underneath it probably has water too, or is not important anyway. So it saves quite a lot of generation time, and this is how it looks inside.
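The ocean-skipping shortcut can be sketched as a pruned walk over the tile pyramid. This is only an illustration of the idea as described in the talk, not the code used at Elastic: the `is_water_only` predicate is hypothetical and would in practice inspect the tile that was just generated.

```python
def children(z, x, y):
    """Return the four tiles at zoom z + 1 that cover tile (z, x, y)."""
    return [(z + 1, 2 * x + dx, 2 * y + dy) for dy in (0, 1) for dx in (0, 1)]


def tiles_to_render(root, max_zoom, is_water_only, check_zoom=12):
    """Walk the pyramid below `root`, skipping everything under water-only tiles.

    `is_water_only(z, x, y)` is a caller-supplied (hypothetical) predicate.
    It is only consulted from `check_zoom` downwards, mirroring the
    "zoom 12 has only water" rule of thumb.
    """
    stack = [root]
    rendered = []
    while stack:
        z, x, y = stack.pop()
        rendered.append((z, x, y))
        if z >= max_zoom:
            continue  # deepest zoom reached, nothing below
        if z >= check_zoom and is_water_only(z, x, y):
            continue  # assume all descendants are water too, skip them
        stack.extend(children(z, x, y))
    return rendered


if __name__ == "__main__":
    everything_is_ocean = lambda z, x, y: True
    print(len(tiles_to_render((12, 0, 0), 14, everything_is_ocean)))
```

Each pruned zoom-12 tile removes 4 tiles at zoom 13 and 16 at zoom 14, which is why the saving adds up quickly over open ocean.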
11:01
So basically, all of the OpenMapTiles code gets compiled in this gigantic SQL statement. When Postgres people saw it, they're like, you have how many functions in the one call? We never anticipated this use case. We cannot optimize it. Actually, please turn off the just-in-time optimization because it actually becomes, instead
11:21
of 400 milliseconds, the query becomes like six seconds. They're like, no, no, just turn it off. They couldn't optimize it. Yeah. Basically, the internals of it look like this. Each layer is an ST_AsMVT call, a wonderful function, which in turn converts the geometries
11:42
and then just puts together all the actual attributes from the needed tables. Community, active community, as I said, lots of contributions, including from myself. And then we had this wonderful Planetiler thing.
12:00
When it came over, we all, I just love this picture. Peter, don't worry, there's going to be a redemption right after this. So the way it worked is Planetiler, this guy from Twitter, Michael Barry, came over and said, well, this is not what I do for the company he works for.
12:22
I just like maps. I'll just hack together a little tool to convert all of the OSM database, which is only, what, 80 gigs now in compressed, highly compressed protobuf format, to tiles. And he did it. It's only two-plus orders of magnitude faster compared to the Postgres approach.
12:43
And yet, it's not dead. So there is a lot of hope because of all the changes in the licensing and governance and all the other things, OpenMapTiles is still very relevant to all this because OpenMapTiles defines what we need to generate.
13:00
Planetiler is just yet another wonderful tool that will make it really, really fast. Well, not will, it already is making it really, really fast. It's basically you generate the whole planet on a beefy machine in under one hour. Single machine, laptop, three, four hours if it's beefy. That's just, as I said, two orders of magnitude faster.
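As a rough sanity check of the "two orders of magnitude" figure, the numbers quoted in the talk can be compared in machine-hours. This is back-of-the-envelope arithmetic only: the hardware in the two cases is not the same, and "under one hour" is taken as a one-hour upper bound.

```python
import math


def machine_hours(days, machines):
    """Total machine time for a run spread over several servers."""
    return days * 24 * machines


# Postgres-based run quoted in the talk: three to four days on three to four machines.
postgres_low = machine_hours(3, 3)
postgres_high = machine_hours(4, 4)

# Planetiler: "under one hour" on one beefy machine (assumed upper bound).
planetiler = machine_hours(1 / 24, 1)

orders_low = math.log10(postgres_low / planetiler)
orders_high = math.log10(postgres_high / planetiler)

if __name__ == "__main__":
    print(f"{postgres_low}-{postgres_high} machine-hours vs about {planetiler:.0f}")
    print(f"speedup: 10^{orders_low:.1f} to 10^{orders_high:.1f}")
```

Under these assumptions the ratio lands between two and three orders of magnitude, consistent with the speaker's "two plus".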
13:23
Written in Java, I was surprised myself. Apache 2.0, there's some statistics there. Schema is now defined in YAML, plus some complex stuff is defined in Java. So like complex operations, like building merges. For example, if you have lots and lots of small buildings, it's kind of hard to define that as a YAML thing,
13:43
but then you say this is the algorithm you use for merging small polygons into a bigger one. And there are already two implementations, which support Shortbread and OpenMapTiles. Protomaps PMTiles support is coming, so there's a lot of active work on that.
14:05
Well, how does it do it? Do we have techies here? Any techies? No one? No one considers themselves a techie? Okay, a few, all right. So there's some technical understanding. Let's go a little bit into tech. So what does it do? It takes a geometry, a lake, a street, or anything really,
14:24
and it just works with one geometry: it slices it and dices it into vector tiles. But not whole vector tiles, it's like a vector tile slice. Like just that one geometry as it will be represented in the vector tiles.
14:43
And then it just puts them together with a special sort key into a dump file. And because this is a very simple and independent process for each individual geometry, it can do it massively in parallel. So each geometry just gets sliced, dumped into a file. Each thread can dump into a separate file just to make the operating system happy.
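A simplified sketch of the slicing step: for one feature, work out which tiles it touches at a given zoom and emit one record per tile, tagged with a sort key. Two things here are assumptions rather than Planetiler's actual implementation: only the feature's bounding box is used instead of clipping the real geometry, and the bit layout of `tile_key` is invented for illustration.

```python
import math


def lonlat_to_tile(lon, lat, z):
    """Standard Web Mercator (slippy map) tile index for a lon/lat point."""
    n = 2 ** z
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_key(z, x, y):
    """Pack (z, x, y) into one integer that sorts by zoom, then x, then y.

    Hypothetical encoding, chosen only so that equal tiles get equal keys.
    """
    return (z << 56) | (x << 28) | y


def slice_bbox(feature_id, west, south, east, north, z):
    """Emit one (sort_key, feature_id) record per tile the bounding box touches."""
    x0, y0 = lonlat_to_tile(west, north, z)  # top-left tile
    x1, y1 = lonlat_to_tile(east, south, z)  # bottom-right tile
    return [
        (tile_key(z, x, y), feature_id)
        for x in range(x0, x1 + 1)
        for y in range(y0, y1 + 1)
    ]


if __name__ == "__main__":
    for record in slice_bbox("lake", -1.0, -1.0, 1.0, 1.0, 1):
        print(record)
```

Because each feature is processed without looking at any other feature, many workers can run this at once and append their records to separate chunk files.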
15:04
And then after all that is done, there's like a merge sort. It just takes all these files, because they're, oh sorry, wait, it sorts each file individually. And then it merges all these files together, producing the result. And because they're all sorted, it means that at the tip of each file,
15:25
when it kind of scans them all in parallel, the needed tile is like right there. So it can just kind of concatenate all these pieces together and dump them to the output, which is, again, extremely efficient process. This is slightly more in-depth.
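The sort-then-merge step described above is essentially an external merge sort. A minimal sketch, with in-memory lists standing in for the per-thread chunk files and `(sort_key, payload)` tuples standing in for the encoded tile slices (both are simplifications, not Planetiler's file format):

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_chunks(chunks):
    """Sort each chunk on its own, then k-way merge them by sort key.

    Yields (sort_key, [payloads]) with every payload that belongs to the same
    tile grouped together, ready to be concatenated into one output tile.
    """
    by_key = itemgetter(0)
    # Step 1: each chunk is sorted independently (this part parallelizes well).
    sorted_chunks = [sorted(chunk, key=by_key) for chunk in chunks]
    # Step 2: heapq.merge only ever looks at the head of each sorted chunk.
    merged = heapq.merge(*sorted_chunks, key=by_key)
    # Step 3: consecutive records with the same key form one tile.
    for key, group in groupby(merged, key=by_key):
        yield key, [payload for _, payload in group]


if __name__ == "__main__":
    chunk_a = [(2, "road"), (1, "lake")]
    chunk_b = [(1, "park"), (3, "river")]
    for key, payloads in merge_chunks([chunk_a, chunk_b]):
        print(key, payloads)
```

The merge never needs more than the current head of each chunk, which is the property the speaker describes as the needed tile being "right there" at the tip of each file.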
15:41
Again, I don't want to go too far into the details, but basically it does the resolution of the, how many people love the fact that we have nodes and ways and ways refer to node IDs, and then you have to resolve from node ID to point. Oh, you love that?
16:01
I liked it from the start, yes. That was a very, very painful process. We all would love to get rid of that. And we all know that it was a bad technical decision. I've made my good share of bad technical decisions that I'm very proud of. So this was one of them. And hopefully there's work by Jochen Topf, who is trying to get rid of this.
16:25
And there's a lot of other suggestions that are, hopefully someday we will get rid of it. For now, we're doing like two-pass resolution, and it goes into workers that really slice and dice things, and then it goes into chunks which get sorted, and those sorted chunks get dumped into the output.
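The two-pass resolution can be illustrated on a toy data set. In this sketch the OSM elements are plain dictionaries with made-up field names (`type`, `id`, `lon`, `lat`, `refs`), and an in-memory list stands in for reading the planet file twice; real tools use far more compact node-location storage.

```python
def resolve_ways(elements):
    """Turn ways that reference node IDs into lists of coordinates.

    `elements` is a simplified, hypothetical representation of OSM data:
    nodes carry lon/lat, ways carry `refs`, an ordered list of node IDs.
    """
    elements = list(elements)

    # Pass 1: remember where every node is.
    node_location = {
        e["id"]: (e["lon"], e["lat"]) for e in elements if e["type"] == "node"
    }

    # Pass 2: replace each node reference in a way by the stored point.
    ways = {}
    for e in elements:
        if e["type"] == "way":
            ways[e["id"]] = [
                node_location[ref] for ref in e["refs"] if ref in node_location
            ]
    return ways


if __name__ == "__main__":
    data = [
        {"type": "node", "id": 1, "lon": 11.25, "lat": 43.77},
        {"type": "node", "id": 2, "lon": 11.26, "lat": 43.78},
        {"type": "way", "id": 10, "refs": [1, 2]},
    ]
    print(resolve_ways(data))
```

Only after this step does a way have a real geometry that the workers can slice into tiles.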
16:44
One hour is over. Which means you really don't need all this updating stuff unless you're OSM itself. I mean, it's a valid use case, but it's just very rarely needed, because one hour is fast enough to regenerate the planet. An hour later, you just run it again if you have enough money in your cloud account.
17:07
Current performance, there's some stats at the top. Basically, if you have a beefy machine, it's already pretty low. I think Mike got it to even lower numbers now. It's quite impressive, as I said.
17:22
And Planetiler, getting involved. Well, try it out. Play with it. If you love Rust, talk to me. Maybe someday in the future we'll rewrite it in Rust just for the coolness factor. If you love Java, please contribute already, because that will give us a good blueprint of how to implement it in Rust.
17:41
Yes, that's pretty much it. Questions?