
Web mapping at any scale: Bite-size, full-stack cartography with Protomaps + PMTiles open source tools

Formal Metadata

Title
Web mapping at any scale: Bite-size, full-stack cartography with Protomaps + PMTiles open source tools
Title of Series
Number of Parts
351
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2022

Content Metadata

Subject Area
Genre
Abstract
Protomaps is a new, open source set of tools for vector cartography on the web. It’s designed to enable projects of any scale - from hobby projects of a neighborhood, to dense datasets covering the entire planet. It finally makes it simple to both host tiles and render them using web standards, and accomplishes it in the most affordable way possible. This talk will be an overview of the entire mapping stack, driven by an ethos of simplicity. Component projects include:
- The OSM Express database for syncing and querying fresh OpenStreetMap data
- The PMTiles cloud-optimized archive format for serverless hosting on platforms like S3
- The Protomaps JS renderer for custom cartography on the web using Canvas 2D
- The relationship to complementary projects like GDAL, Leaflet, MapLibre, Tippecanoe and FlatGeobuf
I’ll also describe successes and failures in adoption among users over the past two years, as well as future development plans.
Keywords
Transcript: English (auto-generated)
Hey everyone, I'm Brandon. I'm here to talk about a project called Protomaps I've been working on for several years now. And the tagline is web mapping at any scale. So Protomaps is an indie cartographic stack. So what does that mean? I like to describe it, or someone described it to me as an end-to-end system for bundling and styling
vector web maps using only static files. And this is sort of a system that I've been building based on about eight or nine years of building these cartographic projects. And it's kind of an umbrella project. So one principle is that it's like an open source buffet.
You might not be interested in using the entire end-to-end system, but there might be some component in it that is useful for your application. So some core values of this project that are more like technical design oriented is that firstly, it works the same at any scale.
That means that if you have a mapping project that is like your neighborhood on OpenStreetMap, or if you have a civic tech project that is like your city that you live in, or even the entire planet of data, the principles are the same. It's not like when you extend your application to work with the planet, you have to suddenly bring in
like an entire fleet of servers and databases. So the way that's accomplished is by favoring really simple technology. So static files wherever possible. Favor being modular via established standards, like GeoJSON or like vector tiles that are sort of the de facto standard. But also don't be afraid to try something really new.
And the way I think about trying something really new is by taking a systems programming approach, which means being willing to reinvent a database or reinvent a format, if you need to, to have sort of optimal data structures for the things you're trying to accomplish. Some core values that are not technical,
but more sort of social, like this project out there in the world, is as an indie project, I can be very hypothesis driven. I don't have to have a defined use case in mind. I can be like, is this problem able to be solved in this way? And sort of develop an experiment.
Yeah, I mean, see how that technique works. And I think it's kind of like this faster horse problem we have in GIS especially where, like if you ask people what they want, they want a faster horse. If you ask people how to make their deployment simpler and they're using PostGIS, then it's, you know, put PostGIS in a Docker container and put it on Kubernetes
and have like 16 threads that are running all the time to make it work. And never about can we fundamentally re-architect things to be simpler. And the relationship to open source, you know, as an independent developer, it's really important that it is open source because otherwise people would not have confidence in it. If it's, you know,
if it's a bus factor of one, then they want to make sure that they have access to the improvements as the project goes on. You know, it's because the other technology might have, you know, a big company behind it. And open source is also key to building an audience inclusive of sort of hobbyist and civic minded use cases.
The kinds of applications that can't necessarily pay for Google Maps just because they have no budget, they're doing something really sort of socially useful, but things like geocoding and map tiles are really scarce because of a business model. So talk overview. I'm gonna talk about the three core open source projects
and sort of the hypothesis I had for why that project should exist. Talk about whether or not they succeeded or failed and those are all open source. So it's sort of an end-to-end system from OpenStreetMap, the open source dataset, all the way to a tile format
and finally a rendering system. So the first hypothesis, which will be familiar if any of you do web mapping, is about Leaflet.js. It's a library everyone loves. It's been around for a long time. There are modern competitors to it that are fancier, but there's something about Leaflet that people really love.
It has great plugins, but it has never rendered vector tiles. So I decided to build a new canvas-based renderer for Leaflet that's really lightweight called Protomaps.js. And the important thing about this, it is non-real-time. It does not do fractional zoom. It's sort of the Leaflet experience. When you change zoom levels,
it jumps from one zoom level to the next. I think people generally like the Leaflet experience. It's good enough for a lot of use cases. One really important thing is it implements labeling. It's fairly decent. It supports things like compound labels with different languages. You can do localization on the client side. Supports actually many more languages than MapLibre
because it uses the browser APIs for font shaping. And it is a good enough renderer for a lot of base mapping use cases. One interesting thing is that the styling language is not like a sort of embedded in JSON language like in other renderers, but actually TypeScript. So it is extensible and also composable.
So just an example, you can also use it as just a Leaflet layer. So you might have like the EOX cloudless satellite layer underneath it and just use this renderer to put vector labels or vector tile labels on top, possibly localized to a local language. It does label layout with no rotation.
So those are just axis-aligned bounding boxes. What challenges or failures has the project gone through? One thing is I've spent a lot of time working on the engine and not a lot of time working on like themes or open source styles. There's this like game development quip,
which is like the best way to never write a game is to write a game engine. So kind of a similar thing where I think what it really needs is some open source styles that people can drop in immediately. In general, I think people do have high expectations from using Apple Maps, from using Google Maps around real-time rendering, fractional zoom, like a really smooth experience you can get
with things like the new OpenLayers GL and also MapLibre GL or Mapbox GL. And if you need map rotation, this obviously won't work because Leaflet just does not work that way. So what do I want to do next with it? Focus on differentiating use cases. It's sort of a lightweight renderer and like about 30 kilobytes.
Do interesting things with custom symbology, build more styles. Here I've sort of ported Stamen's Toner design to vector tiles in this sort of styling language. Also a really great drop-in if you already have an application that uses Leaflet and plugins that for whatever reason you don't want to migrate to a different renderer.
So that's that first project. Second sort of hypothesis for a downstream project is about OpenStreetMap. So OpenStreetMap is, as people know, is a pretty tough dataset and users, they want access to fresh datasets of on-demand areas. So they don't really want to deal with
like a national level thing. Maybe they're only interested in one county and they want to be able to select OpenStreetMap data for that county. So I built this web service called, it's called Protomaps downloads or Protomaps extracts and it's backed by an open source database that's called OSM Express. The way it works is you can choose an AOI
up to about a 100 million node limit and it's backed by this transactional embedded database that stores OSM nodes, ways, relations. It can be updated. It uses S2 cells if you're like a geek about spatial indexing.
So it's pretty efficient for weirdly shaped areas. There is a free on-demand UI where you can just go download this stuff. You can download an extract that is totally up to date by the minute. 15,000 times it's been used and it's also been used as a building block for other OSM infrastructure at other companies.
So what things have I learned about maybe problems with this project? Still we're in the case where it only outputs raw OSM data and it's still sort of an exotic format and the audience for that is usually just OpenStreetMap devs. If you're feeding that into a tiler or a router or a geocoder, it's pretty useful
but there is a wider audience for OSM than that. Updated OSM data I've also discovered is not actually a huge demand among a lot of organizations if they have a weekly data set. It's still pretty good enough. It's also a C++ project so there's packaging I'm trying to improve.
So what I'm working on next is to have tabular output out of this, probably in FlatGeobuf format where it can interact with GDAL. And also think about adding a way to automate this. So if you have, if you're an organization that needs a dump like once a day, then you can just access that area sort of on demand via API.
So the third project is called PMTiles and this is something that I'll spend most of the time on and the sort of motivation for designing this was I found that storing and serving read-only web map tiles can be made much simpler than it is
if we sort of adopt these cloud-optimized principles into tiles. So one really popular solution in the space is called MBTiles. It's based on SQLite which is an embedded database. It's quite simple but most people don't really write to it more than once. They write a bunch of tiles into it and then read-only from that database.
So it's kind of overkill for a lot of use cases. If you want to read, then you need to bring in the SQLite library which is a C dependency. It can have a lot of duplicate data which bloats the indexes. So the goal here was to build a sort of next generation of MBTiles, hence the name,
and make it really compelling by making it a cloud-optimized format. So you might have seen a complex rendering stack that has GIS datasets imported into PostGIS, imported into tile servers, and then there's load balancing and SSL certificates and processes to monitor.
So anyways, let's get rid of all that. So PMTiles, or some people say you could just have each one of your tiles individually on S3 in a Z, X, Y folder. It works pretty well for small tile sets but it's kind of like here drinking espresso
in single shots. If you want to wake up, you got to drink a bunch of them because every single time you update a tile on S3, you're paying some transaction cost for writing. It's going to take you weeks to upload a planet-scale dataset that has 300 million tiles. So PMTiles is a cloud-optimized tile archive.
Doesn't matter what the tile is. Uses range requests. Constrained to Z, X, Y indexing. Has deduplication. So if you have, you know, 70% of the world is ocean. So 70% ocean tiles are just going to be stored once and that's it. And direct plugins for Leaflet and also MapLibre GL, you need a pretty recent version
to have that right plugin support. So quick layout. It's a single file, the tiles there, the directory at the front. So this idea of being scale-free is like, you can just upload one file, like a big pot of Americano coffee. Just one big thing instead of tons of little things. That might be 10 megabytes.
It might be 500 megabytes. Planet scale is like 80 gigabytes. So it's a single file that can store an entire tile set and be accessed remotely, so directly to the browser. One use case that has come up is somebody has been building an RStudio plugin for this.
So for a data scientist using R, they don't necessarily want to have to run a tile server to share some data set. But the idea of just putting that visualization tile set onto like S3 or another storage provider and being able to share the visualization is really useful.
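As a rough illustration of the ideas above (one file, a directory at the front, deduplicated tile content, reads by byte range), here is a toy sketch in Python. It is not the PMTiles binary layout: the JSON directory, the 8-byte length prefix, and the names `build_archive`, `read_range` and `get_tile` are all assumptions made up for this example, and `read_range` only stands in for what would be an HTTP range request against S3 or similar storage.

```python
import hashlib
import json
import struct


def build_archive(tiles):
    """Pack {(z, x, y): bytes} into one blob.

    Toy layout (an assumption for this sketch, not the real format):
    [8-byte directory length][JSON directory][tile data section]
    """
    data = bytearray()
    stored = {}      # content hash -> (offset, length) in the data section
    directory = {}   # "z/x/y" -> [offset, length]
    for (z, x, y), body in sorted(tiles.items()):
        digest = hashlib.sha256(body).digest()
        if digest not in stored:
            # First time this content is seen: append it to the data section.
            stored[digest] = (len(data), len(body))
            data += body
        # Identical tiles (for example, empty ocean) point at the same bytes.
        directory[f"{z}/{x}/{y}"] = list(stored[digest])
    dir_bytes = json.dumps(directory).encode("utf-8")
    return struct.pack("<Q", len(dir_bytes)) + dir_bytes + bytes(data)


def read_range(blob, start, length):
    """Stand-in for an HTTP request with 'Range: bytes=start-(start+length-1)'."""
    return blob[start:start + length]


def get_tile(blob, z, x, y):
    """Fetch one tile using only byte-range reads; returns None if absent."""
    (dir_len,) = struct.unpack("<Q", read_range(blob, 0, 8))
    directory = json.loads(read_range(blob, 8, dir_len))
    entry = directory.get(f"{z}/{x}/{y}")
    if entry is None:
        return None
    offset, length = entry
    return read_range(blob, 8 + dir_len + offset, length)
```

A client that caches the directory after the first read needs only one range request per tile, which is why no tile server has to sit in front of the file.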
So the outcomes from this, it's gotten a good amount of uptake and I found that it enables new use cases, this model of not having sort of an API or a server to talk to. It could be shareable notebooks. It could be offline if you're on an airplane or in space.
And the pricing, because it's sort of scaled to zero, like if you're not using it, you're not paying for a server. So it's affordable for sort of small scale projects and it's already being used for a bunch of projects. The big thing is I'm at this conference to kind of talk to people to figure out
what needs to go into the next spec version. Motivation is V2 was kind of get it out the door, don't optimize, just get it running on all the environments. So you know, server, different languages, Python, JavaScript. V3 introduces compression, so by default, GZIP for tile data and also compressed indexes.
So there's like some overhead with fetching the directory structures for each tile request that can be cached. In general for V2, it was just like, do something really basic, don't compress. Like half a megabyte of overhead and with V3, it's generally under 50 kilobytes. So it's an order of magnitude better. Also introducing an optional clustering
on space filling curves that can make certain types of access patterns more efficient. So I wrote a blog post on how this works just so I can justify my design. So inside PMTiles, tile content can be de-duplicated but also now directory entries can be de-duplicated
because each entry is no longer a ZXY but a position on a Hilbert curve or a series of Hilbert curves going from order zero to order infinity. And you can encode a run length in Hilbert space. So if we look at this diagram on the bottom right,
it shows you a single entry that covers like 170,000 different tiles that all have the same value so that entire segment of the world can be captured by a single very small entry. And this is just a cool visualization of all of the oceans or all of the repetitions.
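The Hilbert-position and run-length idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the reference implementation: it assumes tile IDs are numbered zoom level by zoom level (all of zoom 0, then zoom 1, and so on) with a standard Hilbert curve inside each level, and the function names and the `[start_id, run_length, content_key]` entry shape are made up for illustration.

```python
def hilbert_index(order, x, y):
    """Position of cell (x, y) along a Hilbert curve over a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is in canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def zxy_to_tile_id(z, x, y):
    """Single integer ID for a tile (assumed numbering: lower zooms first)."""
    # Tiles on all lower zoom levels: 1 + 4 + 16 + ... = (4^z - 1) / 3
    base = (4 ** z - 1) // 3
    return base + hilbert_index(z, x, y)


def run_length_encode(entries):
    """Collapse (tile_id, content_key) pairs into [start_id, run_length, content_key].

    Consecutive tile IDs that point at the same content become one entry.
    """
    runs = []
    for tile_id, key in sorted(entries):
        last = runs[-1] if runs else None
        if last and last[2] == key and last[0] + last[1] == tile_id:
            last[1] += 1
        else:
            runs.append([tile_id, 1, key])
    return runs
```

Because neighboring positions on the Hilbert curve are neighboring tiles on the map, a large uniform region such as open ocean tends to map to long stretches of consecutive IDs, which is what lets one small entry stand for many tiles.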
So even things like national parks in Canada that have repetitive tiles can be run length encoded. Directory compression, so just to give you an idea of what kind of ratios we get here. So I've been working with this test, it's a vector basemap data set I generated from zoom zero to 14. For V2, the total directory size is about six gigs
because there's no compression. So after you do a couple of things, delta encoding and like variable length integers, run length encoding, you can get it down to 92 megabytes. One thing that I've been working on as of just yesterday
is talking to people. A question I get is why not just use sort of quad keys which are the same thing as z-order curves or Morton codes. Hilbert is more expensive but has better locality because there's much more continuity. If you're trying to run length encode a continuous sort of area, it can find a lot more
of those, generally about a 5% improvement. And if you're using PMTiles, you're probably waiting the most for doing the IO to actually grab the tile over a disk or the network. So the CPU overhead I think is negligible. And PMTiles version three will come
with a lot more ecosystem tools. So a new standalone binary to do really fast conversion from MBTiles with no installation. The current one is in Python. Interactive viewer for PMTiles, whether local or remote. Direct output from popular tile generation. So the tool Tippecanoe, if you've used that,
I've been working on adding PMTiles native support to that as well as readers and writers in more languages. So what the PMTiles viewer looks like, this is on GitHub now, but you can load a URL or drag and drop a local file. It works the same way and get a list of every single tile, be able to hover over a feature
and see its attributes, sort of like an inspector. Another new thing that I've open sourced recently is a serverless function-as-a-service implementation. So if you'd like to adopt PMTiles but don't want to change your front end, you can simulate a ZXY endpoint
with no server involved via Lambda or Cloudflare Workers. And you can also cache those tiles at the edge. So it is blazing fast. And the implementation can be deployed just by copy and paste. It's a tiny little zip file that's about two kilobytes because the spec is so simple. So you can find that on GitHub.
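To make the "simulate a ZXY endpoint" idea concrete, here is a hypothetical handler sketched in Python. It is not the open-sourced Lambda or Workers code: the URL shape `/z/x/y.ext`, the in-memory `directory` mapping, the `fetch_range` callback (standing in for a range request to object storage), and the cache header value are all assumptions for illustration.

```python
import re

# Assumed URL shape: /{z}/{x}/{y}.{extension}
TILE_PATH = re.compile(r"^/(\d+)/(\d+)/(\d+)\.\w+$")


def parse_tile_path(path):
    """Return (z, x, y) for a valid tile path, or None."""
    match = TILE_PATH.match(path)
    if match is None:
        return None
    z, x, y = (int(group) for group in match.groups())
    # Reject zooms that are implausibly deep and coordinates outside the grid.
    if z > 30 or x >= (1 << z) or y >= (1 << z):
        return None
    return z, x, y


def handle_request(path, directory, fetch_range):
    """Translate a ZXY request into one byte-range read on the archive.

    directory:   {(z, x, y): (offset, length)}  (hypothetical in-memory form)
    fetch_range: callable (offset, length) -> bytes
    Returns (status, headers, body).
    """
    zxy = parse_tile_path(path)
    if zxy is None:
        return 400, {}, b""
    entry = directory.get(zxy)
    if entry is None:
        return 404, {}, b""
    offset, length = entry
    body = fetch_range(offset, length)
    # A cache header lets a CDN in front of the function keep the tile at the edge.
    return 200, {"Cache-Control": "public, max-age=86400"}, body
```

The front end keeps requesting ordinary `/z/x/y` URLs, while the function behind them only does a directory lookup and a single range read.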
And you can see our previous spider web of different dependencies is now just S3 and some kind of CDN like Lambda all the way to the browser. So one really great part of this is having these open source components and being able to collaborate with people
around the world and meet a lot of them for the first time here at FOSS4G. So here's just some of the people that have made commits to any of these three projects. So I wanted to thank them. And please talk to me if you're here. And next steps for me, I am in the midst of turning this into a business. So if some part of this or all of it
sounds interesting to your organization, I am doing commercial support for organizations, whether that's developing those open source tools more or somehow deploying it in your environment, working with your data sets. And here's how to contact me. Thanks.
Thank you. Thank you, Brandon.