
CartoDB Basemaps: a tale of data, tiles, and dark matter sandwiches

Formal Metadata

Title
CartoDB Basemaps: a tale of data, tiles, and dark matter sandwiches
Series title
Number of parts
183
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Germany:
You may use, modify, reproduce, distribute, and make publicly available the work or its content, in unchanged or modified form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and pass on the work or its content, including in modified form, only under the terms of this license.
Identifiers
Publisher
Release year
Language
Producer
Production year
2015
Production place
Seoul, South Korea

Content Metadata

Subject area
Genre
Abstract
CartoDB is an open source tool and SaaS platform that allows users to make beautiful maps quickly and easily from their own data. To complement our users' needs, last year we launched our free-to-use, open source, OSM-based basemaps Positron and Dark Matter (https://github.com/CartoDB/CartoDB-basemaps), designed in collaboration with Stamen to complement data visualization. While architecting them, we had several constraints in mind: they had to be powered by our existing infrastructure (with Mapnik and PostGIS at its core); they had to be scalable, cacheable but frequently updated, and customizable; they had to match with data overlays; and, last but not least, they had to be beautiful. This talk is the tale of the development process and the tools we used, how we implemented and deployed the basemaps, and the technology challenges that arose while adapting a dynamic mapping infrastructure such as CartoDB to the data scale of OSM, including styling, caching, and scalability, and how (we think) we solved most of them. I will also talk about the future improvements we are exploring around combining basemap rendering with data from other sources, and how you can replicate and tweak those maps on your own infrastructure.
Transcript: English (automatically generated)
Okay, let's see if this works. Okay, sorry. So, hey everyone. I'm Alejandro Martinez, a systems engineer at CartoDB, and I wanted to give this talk about the CartoDB basemaps: a tale of data, tiles, and dark matter sandwiches.
After all, it's a tale, a story of how we ended up serving basemaps, starting from just an evening hack by one of our co-founders, Sergio, who tried something out.
On Friday evenings we have something that we call the Leapfrog Fridays, which is basically about spending the evening hacking on top of the CartoDB stack, on experimental stuff or on things that we want to improve or fit better into the stack.
We do this a lot because we like to push our own limits, and it's a way of developing. We build a lot of the CartoDB stack, a lot of the new pieces, on top of the existing pieces of the CartoDB stack.
For example, the geocoding is just SQL functions on top of a CartoDB account that has all the data ready, and it uses PostgreSQL's own full-text search capabilities to search for geocoding names and polygons.
Or take the data library datasets: if you log into your CartoDB account and go to create a new visualization, you get a lot of open data that you can use out of the box. That data is actually fetched from a different CartoDB account that holds it, and copied into your own account.
So a lot of the internal APIs and things we use, both for development on the systems team and for everything else, are built on top of CartoDB, because we think it's a way to improve the experience both for us and for everyone who wants to build things on top of our platform. So, back to the basemaps. A basemap is simple and yet complex.
I mean, a simple basemap is just a layer of data (in our case we want it to be open data) with a matching style, which most of the time is the most difficult thing.
So it made sense to spend an evening trying to create some basemaps using CartoDB, even though CartoDB, since it began, wasn't envisioned as a platform for making basemaps but for putting layers of information on top of basemaps: overlays of quite a small amount of data compared to OSM, or compared to any other data that might be worth calling a basemap to show information on top of.
But most of our stack was already based on PostgreSQL and PostGIS, which happens to be the most common stack for serving basemaps, even though we've focused a lot on serving dynamic data that changes frequently and is not as big as the OSM dataset.
We believed it could be worth a shot. So we got to work, and we obviously started with a less detailed dataset, which is Natural Earth, and got all the polygons and related things to make a bare basemap, which is not even province level, just country level.
We made three or four basemaps using the CartoDB editor itself, with a big account and data we uploaded using the CartoDB UI, and we used this to explore how far we were, as data visualization overlay specialists of a kind, from being a basemap editor.
And we were almost there. I mean, you could make a basic basemap using CartoDB by just uploading the datasets and doing the styling, which can get really tricky and difficult as you deal with different zooms and things.
But we found that the CartoDB editor was not the best-suited tool for this. Starting with the UI: it wasn't designed to handle such a big number of layers one on top of another, and there was a point where they overlapped each other and it broke. Then there were other things, like the dataset size.
I mean, you can upload datasets of two gigabytes, three gigabytes tops, but that's not enough: if we wanted to make a worldwide dataset, we could not import it using the CartoDB UI. And that was fine, because you usually don't want to upload and display a 100-gigabyte table at once. Basemaps are the exception for us, not the rule.
But yeah, despite all these hurdles in the editor, making a simple basemap was quite easy, and it was simple to make it work because the CartoDB tiler already serves XYZ tiles.
It does so with a code in the URL, which we call the layergroup ID, that depends on both the time the datasets last changed and the style. But we didn't want that, because we wanted a fixed URL. So we took the quick route, which is just adding a rule in nginx to make a fixed URL point to the real one for the visualization.
That was the easy way to have a basemap that you could access simply from Leaflet, without even using CartoDB.js, or from any tool that accepts XYZ URLs for serving basemaps, without very much work.
So, achievement unlocked: we got the first basemap. We launched those simple basemaps about a year and a half ago, I think, maybe a little more, and they have been available in the CartoDB editor for a long time.
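To make the XYZ addressing mentioned here concrete, the following is a minimal Python sketch of the standard Web Mercator tile arithmetic that Leaflet-style clients use. The URL template and host name are hypothetical placeholders, not CartoDB's actual endpoint.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) indices of the XYZ tile containing a WGS84 point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(template, zoom, x, y):
    """Fill a fixed {z}/{x}/{y} URL template, the form XYZ clients expect."""
    return template.format(z=zoom, x=x, y=y)


# Hypothetical fixed URL, standing in for the one exposed through nginx.
TEMPLATE = "https://tiles.example.com/basemap/{z}/{x}/{y}.png"

if __name__ == "__main__":
    x, y = lonlat_to_tile(126.978, 37.5665, 10)  # a point in Seoul
    print(tile_url(TEMPLATE, 10, x, y))
```

Because the URL contains nothing but zoom, column, and row, any client that understands this scheme can consume the basemap without knowing anything about the rendering stack behind it.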
But then, almost a year ago, we wanted to go a bit further, because we wanted to remove the map views limit. If you want people to make social maps, maps that get shared by the community,
without being afraid of how many times they will be viewed, and you actually want those maps to be popular, you have to remove that restriction. So we made all the maps in CartoDB have unlimited map views.
But then we had the data that is overlaid, and we also needed something unlimited to put behind it. Of course there are a lot of people serving basemaps who do it much better than us, but we wanted to give it a try
and make an OSM basemap that we could host ourselves, that we could be responsible for, and whose usage we could pay for. It was also to be designed for data visualization, in the sense that this basemap was really going to be the default basemap in the CartoDB editor,
so a hundred percent, or at least ninety percent, of its usage is going to come from visualizations made on top of it, with data overlaid. So we wanted it to be as close to data visualization as we could. That's why we decided to cross the OSM limits, and we got the help of Omniscale to use imposm, make a
definition that makes sense, and have the whole of OSM inside a table, which happened to be inside a CartoDB database and inside a CartoDB account. We got a SELECT * FROM planet with all the geometries in OSM.
Well, it doesn't make sense to query 150 gigabytes of OSM data with only PostgreSQL for rendering, but we had it there and we could do stuff on top of it. And then, to cross the matter limits (yes, this is a bad pun on the name of the basemap, Dark Matter), we got the help of Stamen, who helped us make two awesome
open source basemaps, Positron and Dark Matter, the white one and the dark one. They were designed with data visualization in mind, as they're the ones that are going to be used in CartoDB by default, so they had better be good.
That's how we got a bunch of interesting stuff on top of the already imported OSM data to handle the zooming and visualization, all inside the CartoDB platform, our basemap infrastructure. For example, we used materialized views to filter the data, the sections of OSM that
were going to be relevant at each zoom, to avoid transferring too much data to the tiler server. We used materialized views because they're very handy, and since PostgreSQL 9.4 they can be refreshed concurrently, so it made sense to use them as
a kind of mirror of the OSM data, which we keep updating with imposm, holding the filtered data for the basemap, plus a lot of SQL magic to make sure that the data matches its zoom, etc.
This was done by the awesome folks at Stamen, who helped us with all these things. Then we got to the development process. We started developing the basemaps in TileMill, which also uses CartoCSS, so it pretty much matched what we wanted to display.
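Returning to the materialized views described just before the development process: the idea of keeping a filtered, per-zoom mirror of the imported OSM table can be sketched as below. This Python helper only builds the SQL strings. The view name, the table name `planet_polygons`, the columns, and the area threshold are all hypothetical; they are not taken from the CartoDB-basemaps repository. What is standard PostgreSQL is that `REFRESH MATERIALIZED VIEW CONCURRENTLY` exists from version 9.4 on and requires a unique index on the view.

```python
def zoom_view_sql(view, source_table, min_area):
    """Build SQL for a filtered materialized view over an imported OSM table.

    All identifiers are illustrative assumptions, not the real schema.
    """
    create = (
        f"CREATE MATERIALIZED VIEW {view} AS\n"
        f"  SELECT osm_id, name, the_geom_webmercator\n"
        f"  FROM {source_table}\n"
        f"  WHERE ST_Area(the_geom_webmercator) >= {min_area};"
    )
    # CONCURRENTLY needs a unique index so rows can be matched during refresh.
    index = f"CREATE UNIQUE INDEX {view}_osm_id_idx ON {view} (osm_id);"
    # Readers keep seeing the old contents while the refresh is running.
    refresh = f"REFRESH MATERIALIZED VIEW CONCURRENTLY {view};"
    return create, index, refresh


if __name__ == "__main__":
    for statement in zoom_view_sql("water_z8", "planet_polygons", 1000000):
        print(statement)
```

The trade-off is the usual one for materialized views: low-zoom tiles query a small precomputed table instead of the full planet, at the cost of a periodic refresh after each OSM update.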
But then we found some issues: TileMill's CartoCSS handling is slightly different from the one in our CartoDB Maps API, so we decided to look for something else. While the initial development was done in TileMill,
we went iterating and made another draft editor, in HTML with just CartoDB.js. But we ended up creating another cool way to make basemaps on top of CartoDB, which is the Atom basemap editor. It's not really a basemap editor per se;
it's a plugin you can put on top of Atom that connects to your CartoDB account and lets you easily edit some CartoCSS, automatically push it to CartoDB,
and have a preview window displaying the dataset you've chosen and the visualization you're creating. So this is it, and it's cool because you can just change anything. For example, I'm going to change the water color, because water is like blue, don't you think?
With a bunch of cool plugins for editing and linting, and CartoCSS editing on top of Atom, we felt it was the right ecosystem to fit in. You just save, and it automatically gets pushed to CartoDB, which generates a new basemap with the style you've sent. And you can not only change
the style, which has its own hierarchy and variables and all the usual CartoCSS things, but also change the queries that are applied to the map and to each layer. So you can, for example (I just opened a new one),
apply any kind of query. This is a cool example: you can just apply an ST_Transform and have the same map dynamically rendered and reprojected into another projection. This is Robinson. Another advantage of this is that all the basemaps are rendered using our existing infrastructure,
which is focused on dynamic mapping and on making sure that things get updated quickly. So between the layers of that basemap you could also mix in your own dataset or your own
information, which you can keep updating using the CartoDB editor, the SQL API, or whatever way of accessing CartoDB you want to use. You can use them inside the basemap as masks, overviews, filtering using SQL, or any kind of combination you want to achieve, as you're pretty much setting the SQL and the CartoCSS and we're just rendering.
We also have all the caching and invalidation mechanisms, both with our local Varnish and on a CDN, which is Fastly, to make sure the assets are kept updated in almost real time. So we kept experimenting on these basemaps and
developing new features that I'm going to go through briefly. The first one is sandwiches. Sandwiches in the sense that our basemaps, as I already said a couple of times, were thought out for data visualization, and with data visualization you often get things like this.
This is a map where the data layer doesn't have very much transparency, so it's covering all the layers below. So what do you do? Well, it sounds quite simple, but it has a little slice of complexity: you can just put the labels on top.
As the basemap project was quite structured in layers, we just released another layer that only has the labels. So, using Leaflet and CartoDB.js, you can put down the basemap, which is the
thing you see behind the blue mass; on top of it you put whatever you want to visualize; and on top of that you put the labels. It makes for a nice visual change. It's a small detail, and there were a few styling issues (you have to alter the style and adjust the labels), but
it was pretty simple to make. We then went and implemented it all over CartoDB, which ended up exploding and affecting a lot of pieces of our stack, because right now the CartoDB editor is using the sandwich labels by default, without telling you, and
most people didn't even notice, which I think is pretty cool because it feels natural. We implemented quite a few things in our tiling server to be able to cope with all of this, and with
the preview visualizations of your maps. In the first iteration of the editor the previews were actually using Leaflet, and they got all three layers, but we wanted to go a step further and be able to serve an actual image of your whole visualization, including the basemap,
the labels and, of course, your data. We went about this using the Maps API that we already have, which is based on Windshaft, which in turn is based on Mapnik,
adding what we call the sandwich mode. It basically means that you are no longer limited to requesting Mapnik layers with CartoCSS styles and queries: you can also request an HTTP layer, any HTTP layer that serves XYZ tiles. In this case it's our own basemap, but you can put pretty much any other basemap in it.
It will request all the layers, compose them together, and serve them as another map, which is just a combined version of that map in PNG. Another thing we added for the dashboard is that you're no longer confined to requesting XYZ tiles: we also made a static map API, which means you can request "give me this image at a given zoom,
centered on these coordinates, and of a given size". You can just alter the parameters that you see here at the end of the URL to change and tweak the map you want to display.
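The "sandwich" composition described above (basemap at the bottom, data in the middle, labels on top, flattened into one image) comes down to alpha compositing the layers in order. The following is a small pure-Python sketch of that idea using the standard "over" operator on RGBA pixels. It illustrates the principle only; it is not Windshaft's implementation, which does its compositing in the rendering stack.

```python
def over(top, bottom):
    """Composite one RGBA pixel over another (non-premultiplied, 0-255)."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    a_top = ta / 255.0
    a_bot = ba / 255.0
    a_out = a_top + a_bot * (1.0 - a_top)
    if a_out == 0.0:
        return (0, 0, 0, 0)  # both pixels fully transparent

    def blend(t, b):
        return round((t * a_top + b * a_bot * (1.0 - a_top)) / a_out)

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), round(a_out * 255))


def compose_sandwich(layers):
    """Flatten same-sized RGBA layers given bottom-to-top as pixel lists."""
    result = list(layers[0])
    for layer in layers[1:]:
        result = [over(top, bottom) for top, bottom in zip(layer, result)]
    return result


if __name__ == "__main__":
    basemap = [(20, 20, 20, 255)] * 3                         # dark background
    data = [(0, 0, 255, 255), (0, 0, 255, 255), (0, 0, 0, 0)]  # opaque overlay
    labels = [(0, 0, 0, 0), (255, 255, 255, 255), (0, 0, 0, 0)]
    print(compose_sandwich([basemap, data, labels]))
```

In the example, the middle pixel shows why the label layer is split out: the opaque data would hide any label baked into the basemap, but a label composited last stays visible.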
The last part is the systems part, which is the one I'm most involved in, because during the development of the basemaps we ended up improving our infrastructure by load testing, comparing new settings, and trying new things that we
tested with the basemaps but then expanded to the whole of CartoDB. The first of them is metatiling. Metatiling is a simple concept: when you ask the tiling server, which is the one that serves the tiles, for a tile, instead of rendering just that tile,
it will paint not only this tile but also a whole bunch of adjacent tiles and keep them saved. Most of the time it's going to be people looking at a big map and requesting a lot of tiles, so it's quite smart to reduce SQL queries
by painting a bit more, assuming that the user will request it. In Windshaft, metatiling is done by tilelive-mapnik, in fact with an internal cache, so when you request a tile, it automatically generates the adjacent tiles.
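As a rough illustration of metatiling, the sketch below lists the tiles that would be rendered together with a requested tile. The metatile size of 2 (a 2x2 block) is an assumption chosen for the example; the talk does not state the value used in production.

```python
def metatile_tiles(z, x, y, size=2):
    """List every (z, x, y) tile that shares a metatile with the request.

    `size` is the metatile edge length in tiles (assumed here, not sourced).
    """
    n = 2 ** z                # tiles per axis at this zoom
    x0 = (x // size) * size   # top-left corner of the metatile
    y0 = (y // size) * size
    return [
        (z, tx, ty)
        for ty in range(y0, min(y0 + size, n))
        for tx in range(x0, min(x0 + size, n))
    ]


if __name__ == "__main__":
    # Requesting tile 3/5/2 renders its three neighbours in the same pass.
    print(metatile_tiles(3, 5, 2))
```

One database query and one render pass then cover the whole block, which pays off as long as the neighbouring tiles are requested soon afterwards from the same cache.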
The problem with our existing stack is that it looks more or less like this (this is the overall stack of the CartoDB system, our software-as-a-service system): we don't have only one tiler, we have more than one, and we balance among them using another upper layer, which is not shown here, which is nginx. By default
the routing it uses is quite dumb, in the sense that it is just a round robin that spreads the requests across all tilers. The problem with that is that when you request a tile from a tiler, the tiler will paint all the adjacent tiles,
but that tiler will not get to serve them, because round robin will probably send those requests to another one. So we just end up painting something like four times the number of tiles we wanted to paint, for nothing. So we went exploring and hacking around, and we found a very interesting way to solve this, which is consistent hashing:
each request is assigned a hash that determines which server will serve it. With nginx and OpenResty, which is a Lua environment on top of nginx that you can use to hook into requests, you can decide what to use
to calculate the hash. After some exploring we came up with this simple piece of code, which just does some math operations, given how the quadtree works, to make sure that all the tiles contained in the same metatile for the same zoom are served by the same tiler server.
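The code shown on the slide is not part of the transcript, and it ran as Lua inside OpenResty. As a stand-in, here is a Python sketch of the same routing idea under stated assumptions: the hash key is built from the metatile coordinates rather than the tile coordinates, and the key is looked up on a consistent-hash ring with virtual nodes. The tiler names, the layer name, the metatile size of 2, and the choice of MD5 are all hypothetical.

```python
import bisect
import hashlib


def _hash(value):
    """Stable 64-bit hash of a string (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class TilerRing:
    """Consistent-hash ring that maps metatiles to tiler backends."""

    def __init__(self, tilers, replicas=100):
        # Each tiler gets many virtual points so load spreads evenly.
        self._ring = sorted(
            (_hash(f"{tiler}#{i}"), tiler)
            for tiler in tilers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def tiler_for(self, layer, z, x, y, metatile=2):
        # Integer division collapses every tile of a metatile to one key.
        key = f"{layer}/{z}/{x // metatile}/{y // metatile}"
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


if __name__ == "__main__":
    ring = TilerRing(["tiler-1", "tiler-2", "tiler-3"])
    for x, y in [(4, 2), (5, 2), (4, 3), (5, 3)]:
        print((x, y), ring.tiler_for("dark_matter", 3, x, y))
```

The four requests in the usage example land on the same backend, so the tiles rendered as a by-product of the first request are served from that tiler's cache. Using a ring rather than a plain modulo means that adding or removing a tiler only remaps the metatiles that belonged to it.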
So it's a more optimal kind of routing for distributed tile-serving environments. And then the last thing we did, to squeeze all the performance out of the serving, was
ditching WKB. WKB is the default transport format for geometries coming out of PostgreSQL, and it uses an 8-byte float for each coordinate of each position. For example, imagine that you have a polygon with
a hundred points, or vertices: it will transfer each of those vertices with that much precision, and you usually never have that precision. I mean, you don't usually have subatomic precision in your visualizations; if you do, then you're the coolest man I've ever met.
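To put numbers on the hundred-vertex example, this sketch computes the size of a single-ring 2D WKB polygon from the standard WKB layout (byte-order flag, geometry type, ring count, per-ring point count, then two doubles per vertex) and the spacing between adjacent doubles near the edge of the Web Mercator extent. Treating coordinates as Web Mercator metres is an assumption made for the example; `math.ulp` needs Python 3.9 or later.

```python
import math
import struct


def wkb_polygon_size(num_points):
    """Byte size of a single-ring 2D WKB polygon with `num_points` vertices."""
    header = struct.calcsize("<BII")     # byte order, geometry type, ring count
    ring_header = struct.calcsize("<I")  # number of vertices in the ring
    point = struct.calcsize("<dd")       # x and y as 8-byte doubles
    return header + ring_header + num_points * point


# Gap between neighbouring doubles at about 20,000 km from the origin, in metres.
coordinate_resolution = math.ulp(20037508.34)

if __name__ == "__main__":
    print(wkb_polygon_size(100), "bytes")
    print(coordinate_resolution, "metres")
```

Almost all of the payload is the 16 bytes per vertex, and the representable resolution is on the order of nanometres, far finer than anything a rendered tile can show.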
We decided to explore how we could change this, and we ended up with something we call Tiny Well-Known Binary (TWKB), which is a specification that we open sourced and want to build upon, and we worked with some other people to do this. It's equivalent to
Well-Known Binary, but it uses delta encoding and variable precision to make sure it fits, more or less, the precision that you want to display in a tile, because you don't want subatomic precision for displaying a 256-by-256-pixel tile; usually, knowing in which pixel a point falls is enough.
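The two ideas named here, delta encoding and variable precision, can be sketched in a few lines of Python. This is an illustration in the spirit of TWKB, not a spec-compliant encoder: geometry headers and metadata are omitted, and the sample coordinates are made up. Coordinates are rounded to a chosen number of decimals, each vertex is stored as the difference from the previous one, and the resulting small integers are written as zigzag-encoded variable-length integers.

```python
def zigzag(n):
    """Map signed integers to unsigned ones so small magnitudes stay small."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def varint(n):
    """Encode an unsigned integer in 7-bit groups, least significant first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_coords(points, precision=0):
    """Delta-encode (x, y) pairs rounded to `precision` decimal places."""
    scale = 10 ** precision
    out = bytearray()
    prev_x = prev_y = 0
    for x, y in points:
        ix, iy = round(x * scale), round(y * scale)
        out += varint(zigzag(ix - prev_x)) + varint(zigzag(iy - prev_y))
        prev_x, prev_y = ix, iy
    return bytes(out)


if __name__ == "__main__":
    # Made-up Web Mercator coordinates a few metres apart.
    line = [(1000000.4, 2000000.6), (1000003.2, 1999998.1), (1000007.9, 1999995.0)]
    print(len(encode_coords(line)), "bytes instead of", 16 * len(line))
```

Only the first vertex pays for its full magnitude; every following vertex costs a byte or two per coordinate because neighbouring vertices are close together, which is where most of the saving comes from.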
And, well, I have another talk about this in a later session, but Tiny Well-Known Binary basically helped us squeeze out a lot of performance and get a huge improvement, because the network, in this case of basemaps, was one of our main bottlenecks.
By moving to Tiny Well-Known Binary (I think you can guess on this graph when we moved) we reduced the traffic between the database server and the tile server to 10% of what it was.
So yeah, this is how an evening hack ended up causing improvements all over the stack. And that's all. [unclear] So if you have any questions... Thank you. Any questions or comments?
No questions? No? Okay, thank you so much.