MapCache: The Fast Tiling Server From The MapServer Project

Formal Metadata

Title
MapCache: The Fast Tiling Server From The MapServer Project
Title of Series
Number of Parts
95
Author
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language
Production Place
Nottingham

Content Metadata

Subject Area
Genre
Abstract
MapCache is a new member in the family of tile caching servers. It aims to be simple to install and configure (no need for intermediate glue such as mod-python, mod-wsgi or FastCGI), to be (very) fast (written in C and running as a native module under Apache or nginx, or as a standalone FastCGI instance), and to be capable (it serves WMTS, Google Maps, Virtual Earth, KML, TMS and WMS). When acting as a WMS server, it will also respond to untiled requests by merging its cached tiles vertically (multiple layers) and/or horizontally. Multiple cache backends are included, allowing tiles to be stored in and retrieved from file-based databases (SQLite, MBTiles, Berkeley DB), memcached instances, or even directly from tiled TIFF files. Support for dimensions allows storing multiple versions of a tileset, and time-based requests can be served dynamically by interpreting and reassembling the entries matching the requested time interval. MapCache can also be used to transparently speed up existing WMS instances, by intercepting GetMap requests that can be served from tiles and proxying all other requests to the original WMS server. Along with an overview of MapCache's functionality, this presentation will also address real-world use cases and recommended configurations.
Transcript: English (auto-generated)
No, not at all. Oh, this one. Okay, that's just for the record.

So, I'm Thomas Bonfort. I work with the MapServer team, and I'm going to be presenting MapCache, which is the new (well, more or less new by now) tiling server from the MapServer team. I realize that, with this being a newbie presentation, it was probably meant for people who don't know MapCache. I'm still going to put up a slide about what a tiling server is, just in case: instead of rendering images and data on demand, tiling means pre-generating small image tiles and storing them beforehand for faster access by web clients.

I'm going to use a bit of terminology here that I'll explain now. A grid is how you've cut up the space under a given projection: the different resolutions you've chosen and the extent of your area of interest. A source is what I'll call the upstream data provider. The best known is WMS, so something that can produce an image for a given size and extent. Other data sources could be native MapServer, GDAL, Mapnik, whatever. A cache is where you're going to be storing the tiles once they've been created. A tileset is the mixing together of a grid, a source and a cache, served under a specific name. And the service is the protocol under which you're going to be serving those tiles; there are multiple protocols.

What's MapCache itself, now? It's actually a tiling library more than a server, and it's hooked up to multiple server front ends. It can run natively as a module inside Apache or Nginx.
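
The grid terminology above can be made concrete with a small sketch. This is not MapCache code: the `Grid` class, its field names and the choice of a lower-left origin are all assumptions made for illustration. It only shows how an extent, a list of resolutions and a tile size together determine the bounding box of any tile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grid:
    """Hypothetical grid: an extent, one resolution per zoom level, a tile size."""
    minx: float
    miny: float
    maxx: float
    maxy: float
    resolutions: tuple  # map units per pixel, indexed by zoom level
    tile_size: int = 256

    def tile_extent(self, x, y, z):
        """Bounding box of tile (x, y) at level z, counting from the lower-left corner."""
        span = self.resolutions[z] * self.tile_size  # tile width in map units
        left = self.minx + x * span
        bottom = self.miny + y * span
        return (left, bottom, left + span, bottom + span)


# A WGS84-like grid in degrees: level 0 is two 180-degree tiles side by side.
wgs84 = Grid(-180.0, -90.0, 180.0, 90.0,
             resolutions=tuple(0.703125 / 2 ** z for z in range(4)))
print(wgs84.tile_extent(1, 0, 0))
```

In this vocabulary, a tileset would pair such a grid with a source and a cache under one name.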
There are also some Node bindings to run it under Node.js, and it can also run as a standalone CGI or FastCGI.

It's versatile in the sense that you have lots of choices as to what kind of cache backends you can use to store the tiles, and it supports multiple protocols. You've got some tile management features, which allow you to seed specific areas, reseed old tiles, and interpolate between tiles to create new data. It's written in native C code, so it's very fast. And while I put "ease of deployment" on the slide, easily deploying a tiling server is probably not a very strong selling point, but at least you have a small demo interface that gives you the JavaScript code you should cut and paste if you want to use the service.
The history: it's three years old, basically, and it's been under MapServer governance since 2012.

First, the protocols that are supported, that is, the web protocols by which you can request tiles. There are all the classical XYZ protocols, KML and WMS. For the standard tile addressing, you can address tiles in TMS, which is the original OSGeo tile addressing spec; WMTS, which is OGC's response to TMS, to have something on the OGC side; Virtual Earth quadkeys, which I don't think are used very much, and MapGuide addressing neither; or just plain old X, Y and Z.

It also has a WMS extension, which means it will understand WMS requests and can answer full GetMap requests out of the cached image data. The typical example: if you have one layer that is a base layer for the roads of an area, and a second layer that is the tiles of a radar, you can request both layers from MapCache, and it will glue the tiles together and send you back a single image with both layers' data inside it. So in terms of bandwidth, you can always display a single layer instead of stacking up multiple layers.
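
To answer an untiled WMS request from cached tiles, a server first has to work out which tiles intersect the requested bounding box. The sketch below shows that step only; the function name, its parameters and the lower-left grid origin are assumptions for illustration, not MapCache's actual implementation.

```python
import math


def tiles_covering(bbox, origin, resolution, tile_size=256):
    """Return the (x, y) indices of the tiles intersecting bbox at one zoom level.

    origin is the lower-left corner of the grid (an assumption of this sketch).
    """
    minx, miny, maxx, maxy = bbox
    span = resolution * tile_size  # tile width in map units
    x0 = math.floor((minx - origin[0]) / span)
    y0 = math.floor((miny - origin[1]) / span)
    # A bbox edge lying exactly on a tile boundary does not pull in the next tile.
    x1 = math.ceil((maxx - origin[0]) / span) - 1
    y1 = math.ceil((maxy - origin[1]) / span) - 1
    return [(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]


# 90-degree tiles on a grid anchored at (-180, -90).
tiles = tiles_covering((-10.0, -10.0, 100.0, 10.0),
                       origin=(-180.0, -90.0), resolution=0.3515625)
print(tiles)
```

The remaining work, which this sketch leaves out, would be fetching those tiles for each requested layer, compositing the layers, and cropping or resampling the result to the requested size.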
It can also mirror a full WMS server. In that case, you just stick it in front of a WMS server and it will cache each individual layer, and when it receives requests it will stack the layers together; or it can also store the layers already stacked together. It also forwards GetFeatureInfo requests, in case that's something you need: they are forwarded to the upstream WMS server. So, that's what I wanted. And there's a last mode, which is a kind of OGC proxy. You can put it in front of any "OGC" server (OGC in parentheses, because it doesn't have to be OGC): it will intercept the requests that can be served from tiles, and whatever else it doesn't understand can be forwarded to a number of other backend servers, depending on parameters. So basically you can have it as a single entry point that caches your frequently accessed layers and forwards the WFS requests, or WMS requests in grids that you haven't cached, or whatever, to another service.

Miscellaneous features, which are usually found in tile caches: it adds the HTTP headers that tell your browser it can cache a tile for a given amount of time. It can automatically expire tiles: if you say that your tileset should only last five minutes, then a request for a tile that's older than five minutes will get it regenerated instead of being sent stale. You can choose what kind of error reporting you send back to your client: if you want to hide your errors, you can just return an empty image, or an image with a logo saying you don't have any data, or an error code, or whatever. Metatiling, I won't go into details. You can watermark your tiles. And there's a nifty new feature: if you have low-resolution data that you want to serve up to higher resolutions, instead of having to seed or cache tiles at all the higher resolutions, it will upsample your lower levels and return an upsampled image for the higher levels.

It has some image operations to optimize the image data that's stored in the caches. Most of the time you make the WMS server return an uncompressed, lossless image format, and then you cut that up into tiles; that way you avoid double JPEG compression or whatever. So you have a full... it will come later. You can be aggressive when you're storing the tiles, so they have the smallest size possible, or be lean on the compression algorithm when you're producing images on the fly, to put less load on your CPU. That's not what I wanted.
It will detect empty or uniform tiles. When you're tiling down to very high resolutions, most of your tiles are actually empty: you have no data in them. So we're not storing the whole image data, just the fact that we have an empty image of a certain color. For fully seeded caches, you can also say: if I don't have a tile at some place, there's no need to go and ask the WMS server for the data; that means I have no data at this place, so return a transparent image.
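
The uniform-tile idea can be sketched in a few lines. Everything here is hypothetical (the function names, the tagged-tuple return value, pixels as plain RGBA tuples); it only illustrates storing "one color" in place of a whole image when every pixel is identical.

```python
def uniform_color(pixels):
    """Return the single RGBA value shared by all pixels, or None if they differ."""
    if not pixels:
        return None
    first = pixels[0]
    for pixel in pixels:
        if pixel != first:
            return None
    return first


def encode_for_storage(pixels):
    """Store only the color for uniform tiles, the full pixel data otherwise."""
    color = uniform_color(pixels)
    if color is not None:
        return ("uniform", color)
    return ("image", list(pixels))


# A fully transparent 256x256 tile collapses to a single RGBA quadruplet.
blank = [(0, 0, 0, 0)] * (256 * 256)
print(encode_for_storage(blank))
```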
Grids: you can define multiple grids per tileset, so you can cache multiple resolutions. Basically your most used one is the Google Mercator one; you can also cache WGS84 for the global Google Earth ones. It also understands non-standard projection codes: you can alias them, in case your favorite mapping agencies have come up with their own non-standard codes. And you can also use a generic grid but say, I'm only interested in a subset of it: I'm only displaying zoom levels zero to eight, or I'm only serving tiles that cover a given extent.

So now, the meat of the presentation: where you're going to be storing the tiles. There are many backends possible for this. A cache is actually a very simple interface that has just four methods: does a tile exist, get me this tile, put a new one, or delete a given tile. And then there are some backend-specific hacks that I'll talk about.

The first kind of cache is the disk cache: you're basically storing tiles in a file system. It's very fast, because you're very close to the file system. You can do symbolic linking of uniform tiles to save on disk space, and you can choose whatever layout you want to store your tiles in, or reuse an existing layout: the standard TileCache disk layout, or you can read ArcGIS caches, or provide your own.
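
The four-method cache interface (exists, get, put, delete) is easy to picture with a toy disk backend. This is a sketch under stated assumptions: the `DiskCache` class, its method names and the plain `z/x/y.ext` directory layout are invented for illustration and are not MapCache's API or its default layout.

```python
import os
import tempfile


class DiskCache:
    """Hypothetical disk backend exposing the four operations named in the talk."""

    def __init__(self, root, ext="png"):
        self.root = root
        self.ext = ext

    def _path(self, z, x, y):
        # Simple z/x/y layout chosen for this sketch; other layouts are possible.
        return os.path.join(self.root, str(z), str(x), f"{y}.{self.ext}")

    def exists(self, z, x, y):
        return os.path.isfile(self._path(z, x, y))

    def get(self, z, x, y):
        with open(self._path(z, x, y), "rb") as f:
            return f.read()

    def set(self, z, x, y, data):
        path = self._path(z, x, y)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def delete(self, z, x, y):
        os.remove(self._path(z, x, y))


cache = DiskCache(tempfile.mkdtemp())
cache.set(3, 4, 5, b"fake tile bytes")
print(cache.exists(3, 4, 5), cache.get(3, 4, 5))
```

Any other backend (database, memory store, tiled file) only has to provide the same four operations, which is what makes the backends interchangeable.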
The cons of this cache: it's very difficult to manage once you've got thousands, tens of thousands, millions of files. Just knowing how many files you have, or knowing the space they're taking, are very slow operations. Pushing a million files up to a remote server takes a lot of time. You also very often hit file system limits, so you don't have any inodes left to store tiles. And it's not very optimal storage-wise, because the block size of the file system doesn't match the actual tile sizes, so there's wasted space.

The other kind of cache stores tiles as blobs in a SQLite database. You only have a single file to manage, which is much more tractable, and more efficient storage-wise. You can extend it to use any query you want of the form "given an X, Y, Z, return me a blob for this tile"; that's how it can also support the MBTiles format. So it's efficient in space. There's also something to detect blank tiles: in that case it only stores the RGBA quadruplet instead of the image data, then dynamically creates a paletted PNG from it by just injecting the color into the PNG header. Disadvantages: it does need some tweaking if you overshoot some default SQLite limits, for caches that are more than around a terabyte.
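
A minimal sketch of tiles stored as blobs, using Python's built-in `sqlite3` module. The table name, column names and helper functions are assumptions for illustration (the MBTiles format, for instance, uses its own schema), and treating a 4-byte blob as "uniform color" is a convention invented for this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE tiles (z INTEGER, x INTEGER, y INTEGER, data BLOB, "
    "PRIMARY KEY (z, x, y))"
)


def put_tile(z, x, y, data):
    """Insert or overwrite the blob stored for one tile address."""
    db.execute(
        "INSERT OR REPLACE INTO tiles (z, x, y, data) VALUES (?, ?, ?, ?)",
        (z, x, y, data),
    )
    db.commit()


def get_tile(z, x, y):
    """Given an x, y, z, return the blob for that tile, or None if absent."""
    row = db.execute(
        "SELECT data FROM tiles WHERE z = ? AND x = ? AND y = ?", (z, x, y)
    ).fetchone()
    return None if row is None else bytes(row[0])


def is_uniform(data):
    """In this sketch a 4-byte blob is an RGBA quadruplet, not an encoded image."""
    return len(data) == 4


put_tile(5, 10, 12, b"\x89PNG...full encoded image...")
put_tile(5, 10, 13, bytes((0, 0, 0, 0)))  # blank tile: only the color is stored
print(is_uniform(get_tile(5, 10, 13)))
```

Because lookup is just a parameterized query, pointing the same code at a different schema is a matter of swapping the SQL strings.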
There are some configuration options you have to pass to SQLite when you're opening databases larger than that; I don't know what the exact limit is, maybe around one or two terabytes. Concurrent insertions are also quite slow: SQLite isn't made for concurrent insertion, so if you have multiple seeding instances, you're going to hit some locks that slow down the insertion.

It can also store tiles in memcache-compatible instances. That's very nice if you have data that's very short-lived, basically traffic info, forecasts or whatever, where you know it's going to be valid only for a given amount of time. You can distribute that across multiple servers. Memcache does the pruning for you, so the expiration and deletion of old tiles. The cons would be limited storage, basically because you're storing everything in memory (well, or in some of the memcache-compatible disk backends), but that's it.

A very interesting cache for people with, basically, satellite imagery use cases.
We store the tiles in large TIFF files, which are 10,000 or 20,000 pixels, and we use the internal JPEG tiled representation to store the JPEG data inside the TIFF file. So when you want to access tile data, you just have to know where to hit inside the TIFF file and you get the JPEG data directly. This means you have to set up a hierarchy of files, to know which file you must be hitting for a given X, Y, Z. But it's very, very efficient on disk space, because the TIFF format doesn't duplicate the JPEG header for every tile; the header is the same for all of them, because you're always storing RGB tiles that are 256 by 256. So it's only stored once for all the tiles inside the TIFF file. And it becomes much more tractable, because you're storing basically 10 or 100,000 tiles per file; compared to the disk cache, you've divided your number of files by 100,000 or so.

The disadvantage of this format is that it's limited to JPEG. The TIFF specification only supports JPEG as an internal representation, so we can't store PNG data; basically, you can't store data that has transparency in it. Concurrent writing to a TIFF file is also problematic, so you have to lock around it. And if you update or delete tiles, the TIFF library doesn't reclaim the storage space that was lost by a tile.

The use case here: there are some very large instances at satellite imagery providers who are using this. They have caches on the order of more than 50 terabytes of data stored in this format, with tens or hundreds of millions of requests per month. So it's rather robust, and it's a very, very good format if the disadvantages don't limit your use case.
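
The "hierarchy of files" for the TIFF cache boils down to integer arithmetic on the tile address. The sketch below is an assumption-laden illustration: the function name, the `z/x/y.tif` naming, the figure of 64 tiles per side and the row-major index are all made up here, and MapCache's real layout and row ordering may differ.

```python
def locate_tile(x, y, z, tiles_per_side=64):
    """Map a tile address to (file name, index of the tile inside that file).

    Each hypothetical TIFF holds tiles_per_side x tiles_per_side tiles.
    """
    file_x, file_y = x // tiles_per_side, y // tiles_per_side  # which file
    col, row = x % tiles_per_side, y % tiles_per_side          # where inside it
    filename = f"{z}/{file_x}/{file_y}.tif"
    return filename, row * tiles_per_side + col


print(locate_tile(130, 65, 10))
```

With 64 tiles per side, one file replaces 4,096 individual tile files, which is the kind of reduction in file count the talk describes.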
We also have a seeder, which has some advanced seeding functionality. It seeds with multiple threads, in case your source, so the WMS server, has lots of CPUs, so you can fill up your source WMS. There's a drill-down mode, so you take advantage of the file system caches on the WMS server. You can seed particular subsets: for a given dimension, or regenerate tiles that are older than a certain date. And you can also restrict it to a geometry that you can filter with the OGR SQL syntax. So you can say: seed to level 18 all the cities with more than some density. That way you've pre-seeded the areas with a lot of people, where you know there's going to be a lot of access.

There's another feature, which is dimensions. For a given tile, you can access it along multiple dimensions. Typically, you want imagery, the dimension is the date, and you want the imagery from 2012 or 2013, whatever. That's used for forecasts: a weather forecast is valid for a certain date and has been created at a certain date, so you have two dimensions in play in that case. It could also be elevation: the temperature at a given level under the sea, or whatever. I have a use case here, which is the front-facing site of the Bureau of Meteorology of Australia, who are using that. They have dozens of tilesets, each accessible for different times, so you can access the forecast for three hours, six hours; that's the time dimension. And this is data that's constantly being updated as new forecasts come in.

We also specifically support a time dimension, basically for satellite imagery, where at a given time you only cover a certain extent. One image is taken at a precise instant in time and only covers a small area, but then you can say to MapCache: okay, create me a mosaic of all the images that were taken in 2012. It will look into the caches for all the images that match the interval you're giving it and create a single mosaic. There'll be much more detail about this, and all the other stuff in this server, at the presentation at three o'clock this afternoon in the other location. So I encourage you to go there.
If that interests you.

What's coming next? These are all possibilities; I mean, not all of them are definite. Being able to use already existing TMS or HTTP caches as a source. Using GDAL natively as a source for your data. "Native..." I don't know what that is; that's probably an unfinished sentence. Caching data directly to Amazon S3 or some Riak clusters. Support for UTFGrid-encoded tiles, which has come into MapServer very recently, so I definitely want to support that. And I'm over.

So the time dimension kind of works for any time division? So you can do yearly, or monthly, or down to hours?

So the question was: is the time dimension limited in its granularity? More or less, you just give it a start and end date, and then it extracts whatever intersects that interval. So it can be a year; in that case your interval is from January the first at midnight to December the 31st at midnight. Well, 24 hours.
It's whatever you want. It supports the OGC time format, so you have the granularity of year, month, day, hour, minute, second, millisecond.
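
The "start and end date, then whatever intersects" rule is a plain interval-overlap test. The sketch below is only an illustration: the entry tuples, scene names and function names are hypothetical, and it parses full ISO 8601 timestamps with the standard library instead of implementing the complete OGC time syntax.

```python
from datetime import datetime


def parse_interval(text):
    """Parse a 'start/end' string of ISO 8601 timestamps into two datetimes."""
    start, end = text.split("/")
    return datetime.fromisoformat(start), datetime.fromisoformat(end)


def matching_entries(entries, interval):
    """Keep the entries whose [start, end] range intersects the requested interval."""
    lo, hi = parse_interval(interval)
    return [name for name, start, end in entries if start <= hi and end >= lo]


# Hypothetical cache entries: (name, acquisition start, acquisition end).
entries = [
    ("scene-a", datetime(2011, 12, 30), datetime(2012, 1, 2)),
    ("scene-b", datetime(2012, 6, 1), datetime(2012, 6, 1)),
    ("scene-c", datetime(2013, 3, 1), datetime(2013, 3, 1)),
]
selected = matching_entries(entries, "2012-01-01T00:00:00/2012-12-31T23:59:59")
print(selected)
```

A mosaic for "all of 2012" would then be assembled from the selected entries; the cost grows with the number of matches, which is the scaling concern raised in the next answer.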
That's what it does. If you have many, many tiles in there, basically it will become slow, of course. If you have 10,000 data sets in there, you do have to read 10,000 images. So you'd probably be better off in that case creating a cache: you can use that, but put another cache in front of it. It won't be quite that many, but maybe Stefan knows roughly when it stops scaling. With 10,000, it has probably stopped already.

Anybody else?

Sorry. Is there a utility, maybe the seeder or another one, that one could use to extract a portion of the cache and copy it to another storage? For instance, if you've got an existing cache instance and you would like to export a subset, for an area, to MBTiles.

Yes, yes. That's [unclear] transfer mode. You give it two caches and it will put [unclear], depending... You guys did it.

Does it scale? I mean, if you have twice as many users, can we put in twice as many servers?

It's very difficult to benchmark, because it hits your bandwidth limits long before you hit the CPU limits. So it scales to your bandwidth; it fills your bandwidth. The network bandwidth? Yep, network bandwidth. Running as an nginx module, it can without problems serve up to 80,000 tiles a second, which is probably bandwidth on the order of multiple gigabytes per second. And even in that case, the CPUs aren't at 100%. Yes, I have. Seeding in other cases, and in network access also.

I used to include benchmarks in the presentations. I don't like to do it now, because I don't find that benchmarks made by a single person are very valid. If it's a team effort, with the other tile server teams working on it, then that can be interesting, but having one person from one team benchmark different implementations always makes that person's server the fastest.

Okay. Other questions?

[unclear] data set... I've come across use cases where it's nice to be able to access that by... So I was wondering, is there any plan to use something like a GDAL driver, sort of similar to a NetCDF container, that lists the data sets that are in there, so you can then work on them and extract them?
So, a VRT that lists multiple data sources, for example?

Well, GDAL has some mini drivers for TMS; I don't know if it has WMTS also, and WMS. It does. So you can point it to the capabilities document of your MapCache instance, and in that case it will see the cache as a data source through the network. But I don't know the answer to that question.
If you have some bugs, then do report them. But I don't know. You can definitely do it in QGIS: you can open WMS from MapCache. And ArcGIS with WMS, I don't know.

What version of WMS do you support?
1.1.1, and stuff to do the coordinate swapping for 1.3, but not the full 1.3. Well, it's not a fully WMS-compliant server, because it doesn't do all the GetStyles stuff, or the SLD parsing or whatever. But basically, for GetMap requests, you can go up to 1.3, and it knows which CRSs have inverted axes, so it understands that and returns the correct tile for 4326.
Last question.

Maybe some people have done it; I haven't. It's supposed to work. Yeah. With the new CMake stuff now, I don't know. But it's supposed to, and if it doesn't, it's a bug. Okay, thank you.