Using COMTiles to reduce the hosting costs of large map tilesets in the cloud
Formal Metadata
Title: Using COMTiles to reduce the hosting costs of large map tilesets in the cloud
Title of Series: FOSS4G Firenze 2022 (talk 159 of 351)
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/69085 (DOI)
Production Year: 2022
Transcript: English (auto-generated)
00:00
Okay, so hello everybody and thank you for having me. Today I want to talk about a project called COMTiles, which I worked on during the last year and have successfully used in several projects, significantly reducing the hosting costs of global-scale tilesets. The main problem COMTiles tries to solve is the limitation of the
00:26
most widely used tile archive or tile container formats, like GeoPackage and especially MBTiles: they are designed only with POSIX-conformant file system access in mind and not for usage within a cloud-native
00:41
environment. COMTiles tries to overcome these limitations by being a cloud-native, cloud-optimized format from scratch for large or global-scale tilesets. But before we start, let me introduce myself.
01:03
My name is Markus Tremmel and I work as a geospatial software architect at a company called Rohde & Schwarz in Germany. I also work as a lecturer at the Deggendorf Institute of Technology, a university of applied sciences in Germany, where my focus is on teaching vector data processing and spatial systems.
01:29
Let me start by giving you a quick overview of the conventional way of deploying global-scale map tilesets on a
01:42
dedicated server or virtual machine, so without the usage of a cloud environment. You have something like the following: at the top we start with an OpenStreetMap planet file, and you want to deploy this planet to a
02:02
slippy map client like MapLibre, OpenLayers, or Cesium. The first step is to download the planet from a mirror; then you have about 65 gigabytes of data, but this planet is unstructured. There are no zoom levels
02:21
or overviews and in general no cartographic generalization. So you have to transform this dataset into Mapbox vector tiles, for example, and you can use tools like the MapTiler tools or Planetiler. Then you have an
02:41
MBTiles file, which is simply an SQLite database with a special schema. If you have converted that, you end up with around 360 million tiles for a planet-scale tileset. We're talking about zoom levels zero to 14 for the vector tiles and about
03:01
90 gigabytes in size. To make this MBTiles database accessible to a slippy map client, you have to translate the HTTP XYZ requests into SQL queries via a tile server; for example, you could use tileserver-gl. A sketch of that translation follows below.
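As a minimal sketch of what such a tile server does, the following TypeScript maps an XYZ request onto the standardized MBTiles schema, using the better-sqlite3 package; the file name is illustrative:

```typescript
import Database from "better-sqlite3";

// Open the MBTiles archive read-only; it is a plain SQLite database.
const db = new Database("planet.mbtiles", { readonly: true });
const stmt = db.prepare(
  "SELECT tile_data FROM tiles WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?"
);

// Translate an XYZ request into the MBTiles lookup. MBTiles stores rows in
// the TMS scheme, so the y axis has to be flipped.
function getTile(z: number, x: number, y: number): Buffer | undefined {
  const tmsY = Math.pow(2, z) - 1 - y;
  const row = stmt.get(z, x, tmsY) as { tile_data: Buffer } | undefined;
  return row?.tile_data;
}
```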
03:21
In our projects we have done this for a lot of other tilesets as well. For example, you want to overlay the base map with contour lines, or you want to display hillshading. You also want to use the new
03:41
MapLibre feature where you can extrude terrain, so displaying 2.5D terrain data; the basis for this is, for example, terrain DEM tilesets. Or, if you're using the state-of-the-art 3D map renderer Cesium, you want quantized-mesh tiles for the terrain and, for
04:02
displaying 3D buildings, the 3D Tiles format. If you aggregate all this together, you have over a billion tiles and terabytes of data. We also faced this challenge and had
04:20
some requests to bring this architecture into the cloud, because it should become more scalable and more cost-efficient. But the main problem we faced is that in the cloud you generally have a separation between storage and compute, which means your tile server is running in something like a
04:42
functions-as-a-service environment like AWS Lambda or Azure Functions, or in a Docker container. But all of these services are ephemeral, which means you don't have direct access to the file system. For the storage of the tiles, or in general for static assets,
05:02
in the cloud you use an object storage. That's a way to deploy tiles cheaply, and it's also very scalable; it scales near-infinitely. But the point is that you no longer have
05:21
POSIX-compliant file access as on Unix distributions; you now have the HTTP interface that the object storage offers. So this architecture does not work anymore.
05:40
We faced this problem in a lot of projects, and without the use of a cloud-optimized format you have two major options. The first option is to host the map tiles directly as individual objects on the object storage. The problem is that you have to upload hundreds of millions of tiles, and you have to pay for every single
06:04
request, because every PUT request is billed by the cloud provider. As a rough example, at a typical price of around 0.005 dollars per 1,000 PUT requests, uploading 360 million tiles already costs about 1,800 dollars, before any storage or transfer costs. And if you add the aggregated tilesets, for example 3D Tiles, then you have billions of tiles and terabytes of data to upload.
06:24
The second option, which we favored at first, is to use a dedicated database into which you import the tiles from the MBTiles file, for example an Amazon RDS database. But the problem is that this gets
06:44
very expensive. We calculated this and were really surprised how expensive it was, because you have to pay for the database and also for the container or service where the tile server is running.
07:01
So we stopped because it was, like I said, too expensive, and we decided we needed something like an MBTiles or GeoPackage for the cloud: more cost-efficient and easier to handle, especially for deployment.
07:22
Maybe you have heard something about the new categories of cloud-optimized formats; there have been a lot of talks about them, because there is now a wide range of cloud-native formats. For example, there is the mother of all cloud-optimized formats,
07:44
the COG, the Cloud Optimized GeoTIFF, which is for satellite or raster data and which was also the inspiration for building such a cloud-optimized archive for tiles. Then, for point clouds, you have
08:02
the Cloud Optimized Point Cloud (COPC) file format. Then you have the newer generation of vector formats, for example FlatGeobuf or GeoParquet. And then you have the tile-based cloud-optimized formats. On the one hand, you have formats for n-dimensional or multi-dimensional arrays like Zarr or TileDB,
08:24
which are not geo-specific; we also thought about extending those and bringing geo support into them, but we ended up building our own tile archive solution with COMTiles. There is also another solution you may have heard of,
08:42
which is also very widely used: PMTiles. At the time we developed COMTiles we were not aware of that format, and we took a different approach, especially in the design of the index and regarding the metadata.
09:03
A quick distinction between tile archives and the vector data formats, because you could now ask: why not use one of those new cloud-optimized vector data formats like FlatGeobuf or GeoParquet? In my opinion, they are mainly optimized for analytical workflows and
09:26
processing, whereas the tile archives are really focused on visualization, on bringing the slippy map user experience to the cloud. Because in my opinion, such a tile archive only has a chance of wider adoption if it
09:43
offers the same user experience you get when using a full tile backend with a database and a server. This was also one of my main concerns: is it possible to achieve this slippy map user experience, meaning that when you zoom or pan in a global, browser-based base map,
10:04
you get instant feedback, with no delay and a very fluent workflow? Achieving such a user experience was one of the main challenges.
10:21
Just a short recap of the principles of cloud-native formats, which you have probably heard about at this conference. The basis of all these formats is that you deploy them on an object storage like AWS S3, Azure Blob Storage, or Google Cloud Storage, for example,
10:41
because, like I said, it's very scalable and very cost-efficient, very cheap for storing large amounts of data: storing about a hundred gigabytes costs between one and five euros per month depending on the provider. The base principle is then that you can read portions,
11:00
so chunks of a file, via HTTP range requests. If you have the whole file, like on the right here, which is about 90 gigabytes in size, you can query only the specific tile you are interested in within that archive via an HTTP range request.
11:21
The server answers with a 206 Partial Content response. To know where the tiles are located within the archive and how large they are, you need something like a spatial index. A minimal sketch of such a ranged tile fetch is shown below.
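A minimal sketch of such a ranged tile read, assuming the byte offset and size have already been looked up in the index (the archive URL is illustrative):

```typescript
// Fetch one tile out of a large archive on object storage. The server
// answers with 206 Partial Content and only the requested bytes travel
// over the wire.
async function fetchTile(
  archiveUrl: string,
  offset: number,
  size: number
): Promise<ArrayBuffer> {
  const response = await fetch(archiveUrl, {
    headers: { Range: `bytes=${offset}-${offset + size - 1}` },
  });
  if (response.status !== 206) {
    throw new Error(`Expected 206 Partial Content, got ${response.status}`);
  }
  return response.arrayBuffer();
}
```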
11:43
Another very important concept: you have only one file, and the metadata describing the tiles is contained within it; that's all you need. Okay, so what is COMTiles in general? It's based, like I said, on the ideas of the Cloud Optimized GeoTIFF, extended
12:00
for the usage of raster, but in particular vector map tiles. It's streamable and read-optimized for hosting map tiles. It scales globally and you can directly access the map tiles from the browser, so you don't need a tile backend with a database or a server.
12:22
You just store the archive on S3, for example, and query it directly from the browser. Now I want to give you a short overview of the basic concepts, of how COMTiles basically works, with a short sequence diagram; but first, a sketch of how such direct browser access could be wired into a map client.
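A rough sketch using MapLibre's custom protocol hook (the maplibre-gl 2.x/3.x callback API); note that the actual COMTiles provider package may differ, and parseTileCoordinates plus getTile are hypothetical stand-ins for the index lookup and range fetch described in this talk:

```typescript
import maplibregl from "maplibre-gl";
import { getTile, parseTileCoordinates } from "./comt-provider"; // hypothetical module

maplibregl.addProtocol("comt", (params, callback) => {
  // params.url carries the tile coordinates, e.g. "comt://planet/12/2177/1420".
  const { z, x, y } = parseTileCoordinates(params.url);
  getTile("https://storage.example.com/planet.comt", z, x, y)
    .then((tile) => callback(null, tile, null, null))
    .catch((err) => callback(err));
  return { cancel: () => {} };
});
```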
12:42
On the left you have your slippy map client, for example MapLibre, and on the right you have an object storage or a CDN. The first step is to query the header and the metadata, which describe the tileset. The metadata is based on the OGC Two
13:01
Dimensional Tile Matrix Set and Tileset Metadata specifications. They describe the tileset and the tile matrix set, so the extent of the tileset, and this is also the basis
13:22
for having random access to the index: because we know the extent, or the bounds, of the tileset, we can calculate the location of portions of the index from those boundaries, as sketched below.
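A sketch of this idea, assuming the WebMercatorQuad tiling scheme and fixed-size index records; the real format orders records on a space-filling curve, so the row-major order used here is a simplification:

```typescript
interface TileMatrixLimits { minX: number; maxX: number; minY: number; maxY: number }

// Number of tiles a zoom level contributes within the tileset bounds.
function tilesInLevel(l: TileMatrixLimits): number {
  return (l.maxX - l.minX + 1) * (l.maxY - l.minY + 1);
}

// Byte offset of a tile's index record, computed purely from the known
// bounds per zoom level; no tree traversal needed, hence random access.
function indexRecordOffset(
  limits: TileMatrixLimits[], z: number, x: number, y: number, recordSize: number
): number {
  let records = 0;
  for (let level = 0; level < z; level++) records += tilesInLevel(limits[level]);
  const l = limits[z];
  records += (y - l.minY) * (l.maxX - l.minX + 1) + (x - l.minX);
  return records * recordSize;
}
```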
13:41
The next step, and this is where I spent most of the time and tried different approaches, is how to design the index to be very efficient. The goal is to need as few requests as possible in order to get, like I said, this slippy map user experience.
14:05
What I chose is a compressed index pyramid, which is about 20 kilobytes in size. With that index pyramid you get the index records of all tiles for zoom levels zero to seven, around 21,000 records,
14:25
with which you can then access the actual tiles. So with one request you have covered the frequently accessed tiles: you have all
14:42
the addresses and sizes of the tiles for zoom levels zero to seven, and now you can access the actual tiles via range requests. A feature I have worked on, and which really improved performance for HTTP/1 requests
15:03
and in particular reduced the transfer costs, is tile batching, because you can now batch together different tile requests. For example, on an HD monitor in full screen you have around 15 visible tiles, and because these tiles are ordered within the
15:22
archive on a space-filling curve, you can reduce those to two or three range requests, which cuts the number of tile requests down to around 19% of the original. A sketch of that range merging follows below.
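A minimal sketch of that merging step: collect the byte ranges of the visible tiles and fuse ranges that touch, or nearly touch, into single requests:

```typescript
interface ByteRange { offset: number; size: number }

// Merge sorted byte ranges; ranges closer together than maxGap are fused,
// trading a few wasted bytes for far fewer HTTP requests.
function mergeRanges(ranges: ByteRange[], maxGap = 0): ByteRange[] {
  const sorted = [...ranges].sort((a, b) => a.offset - b.offset);
  const merged: ByteRange[] = [];
  for (const r of sorted) {
    const last = merged[merged.length - 1];
    if (last && r.offset <= last.offset + last.size + maxGap) {
      last.size = Math.max(last.size, r.offset + r.size - last.offset);
    } else {
      merged.push({ ...r });
    }
  }
  return merged;
}
```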
15:46
The next step is then, if you zoom deeper into the map and go beyond zoom level seven in a planet-scale tileset, so zoom eight to 14: there we use a concept called index fragments; the naming is also used, I think, in TileDB.
16:05
A fragment is just an aggregation of index records, by default 4,096 index records with a total size of 12 kilobytes; the addressing arithmetic is sketched below.
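A sketch of the fragment addressing, assuming fixed-size records; the numbers in the talk (4,096 records in 12 kilobytes) imply 3 bytes per record, but the record size is kept as a parameter here:

```typescript
const RECORDS_PER_FRAGMENT = 4096;

// Given a tile's position on the space-filling curve, compute which index
// fragment holds its record and the byte range of that fragment.
function fragmentFor(curveIndex: number, recordSize: number, indexBaseOffset: number) {
  const fragmentId = Math.floor(curveIndex / RECORDS_PER_FRAGMENT);
  const size = RECORDS_PER_FRAGMENT * recordSize;
  return { fragmentId, offset: indexBaseOffset + fragmentId * size, size };
}
```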
16:21
These fragments are also ordered on a space-filling curve. If you deploy that on a CDN, we are talking about maybe 20 milliseconds on an average internet connection for this prefetch, and it can be even faster. So it's not really noticeable to the user in terms of user experience.
16:48
The next step is that you can then query the map tiles again, also in a batch request, so batching the requests together.
17:03
And then, again, you can lazy-load the next index fragments and go up to zoom level 14 that way. My concern
17:20
when I started the project, like I said at the beginning, was that the latency of that prefetch request would be too high, but it turned out it works really nicely, and we have a lot of projects which are fine with it.
17:41
Still, you have that additional fetch for portions of the index, maybe 20 milliseconds or so when deploying on a CDN. One option is to use a serverless tile server, for example hosted on AWS Lambda or Cloudflare Workers, to eliminate that additional index request; a sketch of that variant follows.
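A sketch of that serverless variant; loadIndexRecord is a hypothetical, module-scope-cached index lookup, and the storage URL is illustrative:

```typescript
// Hypothetical index lookup, cached across invocations in module scope.
declare function loadIndexRecord(
  z: number, x: number, y: number
): Promise<{ offset: number; size: number }>;

// The function resolves z/x/y server-side, so the browser issues a single
// plain tile request and never fetches the index itself.
async function handleTileRequest(z: number, x: number, y: number): Promise<Response> {
  const { offset, size } = await loadIndexRecord(z, x, y);
  const tile = await fetch("https://storage.example.com/planet.comt", {
    headers: { Range: `bytes=${offset}-${offset + size - 1}` },
  });
  return new Response(await tile.arrayBuffer(), {
    status: 200,
    headers: { "Content-Type": "application/x-protobuf" },
  });
}
```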
18:03
Another challenge is that storing large tilesets on an object storage is, like I said in the beginning, very cheap, but the transfer or egress costs can get very expensive, because on an object storage you can't restrict access to range requests only; a user can always download the full dataset.
18:24
That is a bit of a problem, but the cloud providers now support requester pays, where the user has to pay the egress, so the transfer, costs. You then no longer have anonymous access, which can be a problem, but at least
18:43
your costs cannot run away. Another point: sadly, multipart range requests are not supported by the cloud providers on object storage, but they are supported on a CDN. So if you're using a CDN,
19:02
you can batch those requests together even further. Another point a CDN solves: most cloud providers don't support HTTP/2 for object storage, but they do for a CDN. This is also a very important feature for tiles, because with HTTP/1,
19:23
for example with an HTTP/1 endpoint on an object storage, Chrome allows just six concurrent requests to one origin. So you have to download the first six tiles, then the next six, and then, for example,
19:41
the last three tiles. HTTP/2 overcomes this with its multiplexing feature, which is very important. Another problem can be that compression via the Content-Encoding header is not supported for HTTP range requests;
20:03
you have to bake the compression into your tile archive. There would be a workaround using the Compression Streams API, but this API is not yet supported in all browsers, so you have to ship your own decoding library when you're using vector tiles; a sketch of that workaround follows.
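A sketch of that workaround: use the native DecompressionStream where the browser has it, otherwise fall back to a bundled decoder (pako is assumed here as the fallback library):

```typescript
import { ungzip } from "pako";

// Decompress a gzip-compressed tile fetched via a range request.
async function decompressGzip(compressed: ArrayBuffer): Promise<Uint8Array> {
  if (typeof DecompressionStream !== "undefined") {
    const stream = new Blob([compressed])
      .stream()
      .pipeThrough(new DecompressionStream("gzip"));
    return new Uint8Array(await new Response(stream).arrayBuffer());
  }
  // Fallback for browsers without the Compression Streams API.
  return ungzip(new Uint8Array(compressed));
}
```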
20:22
Another feature which will hopefully be supported in the future, but currently is not, is dynamic compression for range requests. It is supported on a CDN for plain HTTP requests, but not for range requests: the ability to tell the storage to
20:41
compress responses in different formats like Brotli, gzip, or deflate. This would be very nice for COMTiles, because then the index fragments could be compressed dynamically. One more point,
21:02
which applies to mostly all cloud-native formats: they are really optimized for reading, not for writing. Updates are somewhat expensive because you mostly replace the whole dataset. Okay, so the next steps
21:23
we are working on. First, I have a student who is working on his final thesis on bringing COMTiles to a final 1.0 version; at the moment we have some release candidates. He is also working on
21:41
integration into Planetiler, for example, and currently we only have libraries for MapLibre, not for Leaflet, for instance, so he is working on supporting further libraries. Some things I want to work on, or have already started to work on, are support
22:04
for 3D Tiles, because this is a topic I worked on together with a colleague. Maybe four to five years ago we wrote our own, probably one of the first, 3D Tiles converters, which converts buildings,
22:24
OSM buildings or other high-resolution buildings, to 3D Tiles. But before the 3D Tiles Next spec it was not possible to have random access to such a 3D Tiles dataset.
22:42
Now there is a new feature called implicit tiling, an extension of 3D Tiles, through which 3D Tiles supports fixed spatial data structures, for example a quadtree for 2D or an octree for 3D, which allows random access; see the sketch below. We are now trying to combine this with COMTiles.
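A sketch of why implicit tiling gives random access: with a fixed quadtree subdivision, a tile's (level, x, y) maps to a deterministic position, so nothing has to be looked up in an explicit tile tree:

```typescript
// Interleave the bits of x and y into a Morton (Z-order) index.
function mortonIndex(x: number, y: number): number {
  let result = 0;
  for (let bit = 0; bit < 16; bit++) {
    result |= ((x >> bit) & 1) << (2 * bit);
    result |= ((y >> bit) & 1) << (2 * bit + 1);
  }
  return result >>> 0; // keep the value unsigned
}

// Breadth-first position of a tile in a complete quadtree: all coarser
// levels together contain (4^level - 1) / 3 nodes.
function implicitTileIndex(level: number, x: number, y: number): number {
  const nodesAbove = (Math.pow(4, level) - 1) / 3;
  return nodesAbove + mortonIndex(x, y);
}
```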
23:01
The goal is to also deploy 3D Tiles on a cloud object storage. Then there is another point we are working on:
23:20
Mapbox vector tiles are already really optimized. They use dictionary encoding for the text, and for geometries they use delta encoding, zigzag encoding, and varints because of Protobuf. But I think if you bring in concepts from the new cloud-native formats,
23:41
like Parquet, for example the column-oriented approach for the properties of the features, and especially for the geometries, I think you can get considerably better compression, especially if you have a lot of points; a conceptual sketch of that column layout follows below.
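A conceptual sketch of the difference (plain data, no real encoding):

```typescript
// Row-oriented, as in a classic per-feature key/value layout:
const rows = [
  { class: "restaurant", name: "Trattoria" },
  { class: "restaurant", name: "Osteria" },
];

// Column-oriented: one array per property. Columns with many repeated
// values (like "class") become trivially run-length or dictionary
// encodable, which is where the better compression comes from.
const columns = {
  class: ["restaurant", "restaurant"],
  name: ["Trattoria", "Osteria"],
};
```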
24:02
If you go beyond zoom level 14, and even on zoom level 14 tiles, you will gain a lot. I'm running late, okay, so just quickly, the last two points. The next point is that we want to achieve faster decoding,
24:24
something like zero-copy decoding, inspired by glTF. But this is somewhat hard; it's not like in 3D Tiles with complex 3D models, because in maps you have something called layouting: depending on the style, the tessellated geometries can change.
24:41
And the last thing we are working on is making the tiles queryable via an additional secondary index. This can reduce the number of features, depending on the style, by up to 50%, because with a light or a dark scene you mostly don't need, for example,
25:03
POIs or addresses. You can just cut them out because they are indexed, you compress them, and then you can use multipart range requests on the CDN to batch them together again. I think this can have a big
25:21
influence on the size of the tiles. So yeah, thank you, and if you have any questions, go ahead.