
Why is popularity the biggest enemy of WMS?


Formal Metadata

Title
Why is popularity the biggest enemy of WMS?
Title of Series
Number of Parts
266
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
FOSS4G 2023 Prizren

The Web Map Service (WMS) is the most popular standard for sharing spatial data remotely. It is commonly used for basemaps, as a way of presenting governmental spatial data, and as a data source when creating vector datasets. Creating a WMS requires the original data to be read and then rendered. This process can be slow, especially if the source data is heavy and not optimized. This is the case, for example, with Sentinel-1 global satellite data, a collection of daily revisions with a total volume of 250 GB per day. Here we demonstrate an efficient way to share such a very large dataset as WMS using MapServer scaled with Kubernetes. MapServer is used as the engine of our WMS because of its speed and ease of automation. To optimise the performance of the service, and therefore the user experience, it is recommended to store the data in the right format and file structure, while being aware of the limitations of storage, bucket or disk read speed. GDAL provides a set of options that can be executed in a single command to overwrite the original data with a new, cloud-optimized copy. It is usually good practice to store selected zoom levels as a cache, but for time-series data that is enriched daily, the cache is not overwritten as new data arrives; it is incremented. Despite its popularity and advantages, WMS as a standard for serving data has its limitations. The potentially large disk read time is multiplied by the number of users sending requests. Tests using JMeter (100 users sending 100 GetMap requests in a loop) have shown that on a relatively strong processor (32 CPUs), the greatly increased traffic acts like a distributed denial-of-service (DDoS) attack: the server stops responding. This problem is solved using Kubernetes (K8s), which allows metric-based automatic horizontal scaling of containerised applications, in this case MapServer. Prometheus, as a K8s cluster monitoring tool, allows custom metrics to be defined, e.g. the number of HTTP requests per time interval, and makes it possible to distribute the traffic between newly created pods so that all requests can be answered. The aim of the talk is to stimulate discussion, confront the idea with experts, and demonstrate good practice in creating a publicly accessible WMS, with a focus on optimising speed under heavy source-data conditions, supported by a working example and statistics.
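As an illustration of that single-command rewrite, here is a minimal sketch using GDAL's Python bindings, assuming GDAL 3.1 or newer with the COG driver; the file names and creation options are placeholders, not the exact settings used for the Sentinel data:

    from osgeo import gdal

    gdal.UseExceptions()

    # Rewrite a source raster as a Cloud Optimized GeoTIFF with internal
    # overviews and compression, so a WMS renderer only reads the blocks
    # and overview levels it actually needs.
    gdal.Translate(
        "s1_scene_cog.tif",       # placeholder output path
        "s1_scene_original.tif",  # placeholder input path
        format="COG",
        creationOptions=["COMPRESS=DEFLATE", "BLOCKSIZE=512", "OVERVIEWS=AUTO"],
    )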
Transcript: English (auto-generated)
My name is Martin Nemeski, I'm a junior data scientist at GloFero, and today I would like to talk to you about why popularity is the biggest problem when it comes to serving data through WMS.
So, let's start. Maybe let's try to define the problem before solving it. When I was working on our EO4UA initiative, which aims to serve spatial data related to the web.js platform,
I realized that the OpenStreetMap basemap WMS which I was using works better at specific hours; more precisely, it works better when everyone is asleep in the middle of the night. It helped me to define two problems: that WMS is not capable of managing a huge load of requests,
and that it struggles when there is heavy traffic reading the source data.
So I tried to take this use case and recreate it using the possibilities offered by my company. I took the satellite imagery from the EO Data repository, which is over 35 petabytes of data on S3 storage with a server backend (over 10,000 of these), and it solves the problem of reading the data,
at least when it comes to satellite imagery stored on efficient storage with a server backend. The second problem, the huge load of requests, is solved by deploying MapServer on a Kubernetes cluster,
with defined auto-scaling. So, how does it look on the Kubernetes side? The happy user sends a request, which hits the Kubernetes cluster,
and more precisely, it hits the load balancer. From the load balancer the request is redirected to one of the nodes. And what's important, the number of nodes depends on the number of GET requests hitting the cluster.
We define it using the KEDA ScaledObject definition and Prometheus metrics.
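A minimal sketch of such a ScaledObject, registered through the official Kubernetes Python client and scaling on a Prometheus request-rate metric; the namespace, deployment name, query and threshold are illustrative assumptions, not the values from the talk:

    from kubernetes import client, config

    config.load_kube_config()
    api = client.CustomObjectsApi()

    # Hypothetical KEDA ScaledObject: scale the MapServer deployment on the
    # rate of incoming HTTP requests reported by Prometheus.
    scaled_object = {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": "mapserver-scaler", "namespace": "wms"},
        "spec": {
            "scaleTargetRef": {"name": "mapserver"},
            "minReplicaCount": 1,
            "maxReplicaCount": 20,
            "triggers": [{
                "type": "prometheus",
                "metadata": {
                    "serverAddress": "http://prometheus.monitoring:9090",
                    "query": "sum(rate(http_requests_total{service='mapserver'}[1m]))",
                    "threshold": "50",
                },
            }],
        },
    }

    api.create_namespaced_custom_object(
        group="keda.sh", version="v1alpha1", namespace="wms",
        plural="scaledobjects", body=scaled_object,
    )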
From there, from the node, where there is only one MapServer pod per node to maximize its effectiveness, the request goes on to hit the source data, which is COG or JPEG 2000, which is not so cloud optimized, but more about this in a moment. And how does it look on the MapServer side?
So, MapServer is using a super tool called the tile index, which is a spatial index for raster layers. It's a vector of footprints, storing the path to each file and its metadata, like polarization for Sentinel-1,
or cloud cover for Sentinel-2; time is also possible to use. And we are using GDAL's virtual file system to hit the S3, and from the S3 we are reading the COG files, or JPEG 2000, which is, like I said, not so optimized,
but offers a variety of options when it comes to spatial resolution. And within the mapfile, which is the configuration file for MapServer, we are able to create a pyramid from it, so the data at 10-metre resolution is only requested for really high zoom levels.
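For illustration, a minimal sketch of reading such a file straight from S3 through GDAL's /vsis3/ virtual file system; the endpoint, bucket and object key are placeholders rather than the real repository layout:

    from osgeo import gdal

    gdal.UseExceptions()

    # Point GDAL at the object storage; with a public bucket no credentials
    # are needed, otherwise AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY apply.
    gdal.SetConfigOption("AWS_S3_ENDPOINT", "s3.example.com")
    gdal.SetConfigOption("AWS_NO_SIGN_REQUEST", "YES")

    # Open a COG without downloading it; only the requested blocks are read.
    ds = gdal.Open("/vsis3/sentinel-1-grd/2023/06/28/scene.tif")
    print(ds.RasterXSize, ds.RasterYSize, ds.RasterCount)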
And yeah, I ran some tests using the JMeter load tool, and as you can see, the throughput increases with time,
the cluster is scaling, and even though the throughput is so big, the average response time remains stable and even decreases. And here, unfortunately, this is a little bit blurry, but you have to believe me that this is a screenshot of the horizontal pod autoscaler definition from Kubernetes.
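A rough Python stand-in for that JMeter scenario (100 users, each sending 100 GetMap requests in a loop); the WMS endpoint, layer and bounding box are made up for the example:

    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    WMS_URL = "https://example.com/mapserver"  # placeholder endpoint
    PARAMS = {
        "SERVICE": "WMS", "VERSION": "1.3.0", "REQUEST": "GetMap",
        "LAYERS": "sentinel1", "CRS": "EPSG:4326",
        "BBOX": "49.0,14.0,55.0,24.0", "WIDTH": "512", "HEIGHT": "512",
        "FORMAT": "image/png",
    }

    def user(requests_per_user: int = 100) -> float:
        """Send GetMap requests in a loop and return the total wall time."""
        start = time.perf_counter()
        with requests.Session() as session:
            for _ in range(requests_per_user):
                session.get(WMS_URL, params=PARAMS, timeout=60)
        return time.perf_counter() - start

    # 100 concurrent "users", as in the JMeter test described above.
    with ThreadPoolExecutor(max_workers=100) as pool:
        durations = list(pool.map(lambda _: user(), range(100)))

    print(f"mean time per simulated user: {sum(durations) / len(durations):.1f} s")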
And thank you.