Building a national vector tile set for the Netherlands
License: CC Attribution 3.0 Germany. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI: 10.5446/43469
Transcript: English (auto-generated)
00:07
Anyway, are we okay on the technical side? Okay. Welcome everybody. How many of you were at the random hot air balloon talk?
00:22
Huh, that's reasonable. This is the technology behind it. If you weren't there, look it up on the live stream or the video stream; that was a really good talk. So, my name is Steven Ottens, and I work at Webmapper.
00:41
We are four people, based in Utrecht, the Netherlands, and we work at the intersection of data, design and technology; this being a geo conference, it's obviously geodata. So, a little bit of background on why we started building this tile set for the entire Netherlands.
01:09
So the Netherlands is known for its maps. We have lots of beautiful maps; like probably any country, you have the 1:25,000
01:22
maps and the 1:50,000 maps and everything, and it's all open data nowadays. So we have this nice series of government data sets at 1:500,000, 1:250,000, 1:100,000, and so on. All
01:43
this data is there, and it's all nice and consistent, and then suddenly you get to the large-scale topography, which always trips me up, because it's called large scale but you zoom in, so it's a bit weird for me, but anyway.
02:02
That's completely different from the other ones, and it's a lot of data. This is the actual data as the government gives it to you: it's weird data. And we don't really care about the data, we want beautiful maps.
02:24
So this is the same data as the other map; this one we created for the municipality of Amsterdam, and this one works: you can zoom in, it's consistent in styling and everything. This is what we want.
02:40
So my colleague Niene, who did the awesome talk on random hot air balloons, wrote the style for this. This is one of three map files needed for this style. No, we're not there yet.
03:01
So you can see why we're not entirely happy with this government open data. Yeah, there we are, and there are two more files; trust me, they are the same size. So that's why we created Cartiqo. We want it to be fast, fast is good; simple, so
03:21
you can spend more time on designing and less time on writing insane map files; accurate, so we could actually use it for, amongst others, government people; and consistent. The consistent thing is the trick. Niene asked the company: I want a six-colour
03:47
map. By a six-colour map she meant: we have six types of data, we give each of them one colour, and it should still be a map, and it should work everywhere.
04:02
So this is a six-colour map: you have a colour for water, a colour for natural, or whatever you call it, rural or agricultural, urban, roads, and labels. That was our goal, and to do that you need consistency. By consistency
04:23
I mean: this is a natural area, we have bare and we have high; high is forest, not high as in smoking, we're Dutch but not that high. And it should work on all zoom levels, it should be the same. And, this is the big trick,
04:46
it should work across all your source data. We actually have eleven layers instead of six, for a slightly technical reason that doesn't really matter. So vertically are the eleven layers we have, and then
05:04
over here are the eleven source data sets we have, and we basically have tiles from zoom 5 to 16. So you can see it's kind of complex. To make it slightly easier: the built-up
05:21
layer consists of eight different sources, and they all have a different data model and everything. And like I said, we wanted to create beautiful maps, and not just beautiful maps, lots of beautiful maps.
05:41
So we figured that the thing we will do most is rendering maps, so rendering should be easy and fast. Then what's next? Generating tiles: we'll be generating tiles, but a lot less often than rendering maps. And we have to import all this open data, which we'll do even less often.
06:06
So we put all the hard stuff in the data-importing bit, so everything else becomes easier. For that you need a simple data model. This is ours, it's on GitHub, and it's hierarchical: over there there's natural,
06:26
and then you have high, low, and bare, but then you can go further, like what kind of forest you have, so you have deciduous, or mixed forest, or whatever. Ordnance Survey had a similar idea.
06:41
I was at the OpenMapTiles talk this morning, and they said they also have their own data model. I didn't dare to change my slides anymore. So this idea of creating a consistent data model across all your zoom levels is not a bad idea,
07:03
apparently, because other people are doing it as well. Which means you need to map your source data to your data model. This is a lot of work; this is for the topographic map at 1:1,000,000,
07:24
and you have to do that for all your source data. What you do is just create shitloads of database tables, because storage is cheap and it will help you later on. So SQL is your friend, you can do magic stuff in SQL, and you want to do this; a sketch of what such a mapping could look like follows below.
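A minimal sketch of mapping one source data set onto your own data model; the schema, table and column names here are made up for illustration, not the actual Cartiqo or TOP10NL schema:

```sql
-- Map a hypothetical source table (terrain polygons with a Dutch
-- land-use classification) onto the "natural" layer of our data model.
CREATE TABLE cartiqo.natural_z12 AS
SELECT
    ogc_fid      AS id,
    CASE typelandgebruik              -- source classification
        WHEN 'bos: loofbos'  THEN 'high'
        WHEN 'bos: naaldbos' THEN 'high'
        WHEN 'heide'         THEN 'low'
        WHEN 'zand'          THEN 'bare'
    END          AS type,             -- our own classification
    wkb_geometry AS geom
FROM top10nl.terrein_vlak
WHERE typelandgebruik IN ('bos: loofbos', 'bos: naaldbos', 'heide', 'zand');
```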
07:46
You want to get your data nicely sorted, and yep, then we're there. Cool: we have our data, we have our data model, and that's everything. Well, not quite.
08:04
We still need to talk about generating, serving, and updating our tiles. This, by the way, is the floor of the palace in Amsterdam; they have a really nice projection there where you can see how the map evolves over time.
08:22
If you're in Amsterdam, you have to go to that palace. So, you need to pick a vector tile engine; there are lots of those. We had a few very specific requirements. We have all our data in PostGIS, so we needed a PostGIS backend, which meant Tippecanoe was out, along with a couple of other tools.
08:47
And we wanted fast and simple, so we put everything in an object store. I call it S3; it's actually DigitalOcean Spaces, but that's too long, so: S3. It's the same protocol.
09:02
Which meant GeoServer, T-Rex, and tilelive were out. Pirmin will talk about T-Rex, and he told me he might be inclined to add an S3 backend, so maybe we'll switch to T-Rex.
09:22
T-Rex is awesome, because it does custom tile projections. So Tegola was our friend, or is our friend at the moment, because it does those two things, and it does a couple of other things which are nice. It's very much inspired by MapProxy, the raster tile proxy.
09:44
So it has this concept of providers, maps, and caches, so you can say, my data is here, that's your provider. My data goes into this layer, and the layer is natural,
10:02
but depending on the zoom level, we load in different data. And then you can have a cache where you store it; this is S3 for us, but you can also do MBTiles, or local files, or whatever. Put together, a config might look like the sketch below.
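A hedged sketch of how those three concepts fit together in a Tegola TOML config, reusing the hypothetical table from earlier; the names, credentials, and endpoint are placeholders, and the key names should be checked against your Tegola version's documentation:

```toml
[[providers]]
name = "netherlands"              # provider: where the data lives
type = "postgis"
host = "localhost"
port = 5432
database = "cartiqo"
user = "tegola"
password = "..."

  [[providers.layers]]
  name = "natural_z12"
  geometry_fieldname = "geom"
  id_fieldname = "id"
  sql = "SELECT id, type, ST_AsBinary(geom) AS geom FROM cartiqo.natural_z12 WHERE geom && !BBOX!"

[cache]
type = "s3"                       # any S3-compatible object store
bucket = "tiles"
region = "ams3"
endpoint = "ams3.digitaloceanspaces.com"

[[maps]]
name = "cartiqo"

  [[maps.layers]]
  name = "natural"                # layer name inside the tile
  provider_layer = "netherlands.natural_z12"
  min_zoom = 12
  max_zoom = 16
```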
10:22
So the configuration is nice and easy to understand. That is very pretty. One thing you should have done already, and if you didn't, you should go back to your SQL:
10:40
you want to sort your data. It's only used to generate tiles, so make sure the data is optimized on disk. I didn't invent this, Paul Norman did; he wrote a really nice article, the blue thing on the slide is an actual link, and if you click it you get the entire story.
11:02
But basically you order all your data by GeoHash, and then the cluster step means moving the data on disk in such a way that everything that is close together in a tile is also close together on disk, which makes it faster; in SQL that could look like the sketch below.
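A sketch of that trick in SQL, using the hypothetical table from earlier; this follows the pattern from Paul Norman's article, not the literal Cartiqo code:

```sql
-- Order features by GeoHash (which needs lon/lat, hence the transform),
-- then physically rewrite the table in that order so features that fall
-- in the same tile sit close together on disk.
CREATE INDEX natural_z12_geohash
    ON cartiqo.natural_z12 (ST_GeoHash(ST_Transform(geom, 4326)));
CLUSTER cartiqo.natural_z12 USING natural_z12_geohash;
```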
11:27
So, like I said, Tegola is nice. With it, we have our tiles. We seed them to S3, and we're done. Mission accomplished?
11:42
Well, we wanted it to be fast, and we just served the tiles straight from S3, but S3 is really slow. It's cheap, which is good, but it's slow. That's why the internet invented content delivery networks.
12:07
A CDN gives you web speed. There are a couple of things you need to do. You need gzip encoding of your tiles; Tegola does that natively, so don't worry about that.
12:20
You want to set your cache headers, because unless your tiles change very often, you just want to tell the browser: keep them for a week, it's fine. You can configure that in Tegola, and when I was writing this presentation I realized it's not really documented anywhere.
12:40
So, here you have it; this is the thing you need to do, and the value is in seconds. Is that a day or a week? Who's good at math? It's a week, right? Yeah, it's a week. Something like the snippet below.
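A sketch of that setting in the webserver section of the Tegola config; this is from memory rather than official documentation, so double-check it against your Tegola version:

```toml
[webserver]
port = ":8080"

  [webserver.headers]
  # 604800 seconds = 7 days * 24 hours * 3600 seconds: one week.
  Cache-Control = "public, max-age=604800"
```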
13:00
The other thing is that, for some reason, browsers still think you can only do four requests at a time to one server. So if you want to load multiple tiles, which you do, you just use multiple subdomains; that's probably something you have to configure in the CDN. And if you do it all correctly, you get this kind of headers: it says it's a Mapbox vector tile, and it's gzipped.
13:22
Oh, and of course cross-origin resource sharing; make sure you have that correct. And these tiles, in a fresh browser, take 50 milliseconds to arrive. So that's what you want: the tiles are there quickly, and then the browser can do its thing. Put together, the response headers look something like the sketch below.
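For reference, the response headers on a correctly served tile look roughly like this; the values are illustrative:

```http
HTTP/1.1 200 OK
Content-Type: application/vnd.mapbox-vector-tile
Content-Encoding: gzip
Cache-Control: public, max-age=604800
Access-Control-Allow-Origin: *
```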
13:40
318% complete, wow. So, let's talk updates. Tegola has updates: just add the overwrite flag, and then you have updates. It's so nice.
14:02
Well, except that doesn't update our data. You can also say: I only want to regenerate these tiles, or this set of tiles, for these zoom levels. So if you know which data was updated, you can tell Tegola
14:22
to update those specific tiles; see the sketch below. However, we have lots of data, like I mentioned: eleven different data sets, and some of those look like this. So, updating is hell.
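A hedged sketch of such a targeted reseed; the flag names are from memory of Tegola's cache command, so check `tegola cache seed --help`, and the bounds are a rough Amsterdam bounding box, nothing official:

```sh
tegola cache seed \
    --config config.toml \
    --bounds "4.7,52.3,5.1,52.45" \
    --min-zoom 12 \
    --max-zoom 16 \
    --overwrite
```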
14:44
So, we have a solution: we cheat. We're not actually using the official open data from the government in its weird format. We use NLExtract, and one of its authors is sitting here.
15:04
It's the same government data, as Postgres dumps and other formats; we use the Postgres dumps. It has stable URLs, which are regularly updated, and that is nice. Oh, and the whole project is open source,
15:21
which is really good. And stable URLs are nice, because now we can automate shit. Automation is nice. Since it's a stable URL, something like latest, you just go there every day and check: is it newer than what I have?
15:42
If not, you do nothing; if it is, you run your scripts, you run your SQL, and that's why you want to do everything in SQL. So we run our SQL, and everything is updated again. It's very simple: you check, is there anything new?
16:01
You import it, you map it. Since they are Postgres dumps, you just load the dump into your database, and once that's done, you copy the data from the government tables into your data model. Then you do a seed with overwrite; the whole loop might look like the sketch below.
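A hypothetical daily update loop; the URL, file names, and script names are placeholders, not the actual Webmapper setup:

```sh
#!/bin/sh
# Download only if the server copy is newer than our local file
# (--time-cond compares against the file's timestamp, --remote-time
# keeps that timestamp in sync with the server's).
URL="https://example.com/nlextract/bag-latest.backup"   # stable "latest" URL
curl --silent --remote-time --time-cond bag-latest.backup \
     --output bag-latest.backup "$URL"
# ...bail out here if nothing new was downloaded...

pg_restore --clean --dbname=cartiqo bag-latest.backup   # load the dump
psql --dbname=cartiqo --file=map_to_datamodel.sql       # government data -> our model
tegola cache seed --config config.toml --overwrite      # regenerate the tiles
```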
16:20
And that last bit is a thing, because, well, there are two hard things in programming: one is naming, and the other one is caching. We want to cache everything, but if you have updates, what do you do? You can bust the CDN,
16:42
that is, tell it to remove everything from the cache. It's not elegant, but you can do it. Oh, I'm way too fast. So, to wrap up: it is key to create a sensible data model. We have one, Ordnance Survey has one,
17:03
so for instance we don't have contour lines in our data model; I checked, Ordnance Survey has them every 50 metres, which for the Netherlands would mean there are seven. So that's, like, meh. Map your source data to it. Do all the data wrangling in
17:22
Postgres, because, like I said, you import your data far less often than you render or generate tiles. So, we use Tegola; there are a couple of other sensible tile engines. Put the tiles in an object store,
17:40
and serve them through a CDN. And the bonus one is: get your own country-code extract, whatever your two characters are; ISO codes are really weird. Anyway: NLExtract, or if you are in Romania, ROExtract. Create such a project to make every
18:01
geo fellow in your country happy, because I don't know about your government's open data, but it's probably as messy as ours. This is by far the most useful project we have in the Netherlands. So, that's it. Any questions?
18:39
Hi. In addition to the
18:41
supporting multiple subdomains to get around the browser limitations, have you looked at the benefits that HTTP/2 gives you? Not yet. I'm aware it's there; I'm not entirely sure how well it's supported in mapping libraries. Does that matter?
19:01
From my experience, all modern browsers support it, and most server technology supports it. It's usually only available over HTTPS, but if you've got it, it works really, really well for lots of small content like tiles. So it's worth looking at. Thank you, that's interesting.
19:25
Thanks, very interesting. I'm currently creating tiles with GeoWebCache, and in order to get all the information in one single file, I create group layers. Is there any way to do the same here?
19:42
Like, to achieve better performance, by not transferring multiple files to the client? You mean our eleven layers? So, each tile is one file transferred: all our layers are in one tile. I should have mentioned
20:01
that you want to have all the layers in one tile; you don't want multiple tiles. That's what Tegola does, and T-Rex as well, I guess. On the client side it ends up as shown below.
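Roughly, in a Mapbox GL or MapLibre style fragment, every layer references the same single tile source; the URL and layer names are placeholders:

```json
{
  "sources": {
    "cartiqo": {
      "type": "vector",
      "tiles": ["https://tiles.example.com/maps/cartiqo/{z}/{x}/{y}.pbf"]
    }
  },
  "layers": [
    { "id": "water",   "type": "fill", "source": "cartiqo", "source-layer": "water" },
    { "id": "natural", "type": "fill", "source": "cartiqo", "source-layer": "natural" }
  ]
}
```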
20:21
More questions? Are you raising your hand because... yeah, okay. Hi. Awesome presentation, that's just, like, ballin'. So, are you using just one Tegola server? In my experience,
20:40
one of the bottlenecks is Postgres. Do you just, like, ramp up the number of connections it can make? Or how do you process the tiles quickly? We use only one Postgres server. Since the data is well-sorted on disk,
21:00
the bottleneck is not Postgres anymore; it's actually the CPU generating the tiles. So, yeah. And it's the Netherlands, we're not a big country: it takes a couple of hours to do all the tiles for the whole country, which is fine.
21:24
Any additional questions? Thank you very much, Steven.