Tuning Open Source GIS Tools to Support Weather Data / Rapidly Changing Rasters


Formal Metadata

Title: Tuning Open Source GIS Tools to Support Weather Data / Rapidly Changing Rasters
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Publisher: Open Source Geospatial Foundation (OSGeo)
Production Place: Portland, Oregon, United States of America

Content Metadata

Abstract: The National Weather Service is developing several geospatial forecast visualization and analysis tools. The back-end data store and WMS server is built on open source GIS tools: GDAL, PostGIS / Raster, MapServer, and MapCache. Weather forecasts are in a constant state of flux; in the case of the National Digital Forecast Database, forecasts expire or are superseded every hour. This presents several challenges when it comes to managing forecast rasters with GIS tools and delivering the most up-to-date, real-time forecasts with an acceptable level of performance. This presentation will examine the methods and practices we've used to optimize our data store performance, from data ingest to forecast analysis to image delivery:
* Using PostgreSQL inheritance (parent and child tables) to manage raster updates inside the database
* Managing an up-to-date image cache in MapCache and Memcached with rapidly changing source data
* Optimizing PostGIS raster tiles and MapServer DATA queries for faster image generation and display over Google Maps
* Future work: expanding PostgreSQL inheritance to work with raster overviews

Keywords: postgis raster, postgresql, gdal, mapcache, mapserver, memcached, weather, performance tuning
Good morning, everyone. I work at the National Weather Service Meteorological Development Laboratory, and this talk is about tuning open-source GIS tools to support weather data and rapidly changing rasters. By rapidly changing rasters I mean raster data that can change every hour, so anything you may have rendered from previous versions is instantly obsolete.

My project is the National Digital Forecast Database (NDFD): gridded forecasts prepared by 122 forecast offices across the country, which we mosaic together and deliver as GRIB files. A couple of years ago we wanted to put the NDFD maps on an interactive web interface and produce the images on demand. The result was a WMS powered by an open-source geospatial database, which led to some interesting mashups — like how many people are expecting snow before a storm — and some of the other demonstrations you may have seen yesterday.

A bit more about the NDFD. We just went operational with 2.5-kilometer-resolution data over the contiguous United States. That's about 3 million pixels per forecast, which ends up being about 12 megabytes if you store it all as floating point. We have hourly forecasts out to 36 hours, then 6-hourly out to 7 days, and 11 elements carry this hourly data — temperature, dew point, apparent temperature, relative humidity, that kind of thing. We also have additional tropical weather and severe weather groups. All told it's about a thousand different grids over the CONUS, plus smaller grids for Alaska, Hawaii, and the other regions, so we have about 3,000 different rasters in the database at any one time — and every hour, any or all of them could be refreshed.

So, the challenges. First, time constraints: if we're getting new forecasts every hour, we want to publish them in a timely fashion. We don't want a brand-new forecast to be unavailable to the public for another 20 or 30 minutes; we need to publish it as quickly as possible. Second, delivering new images in real time: we don't have time to pre-seed an entire cache of all our weather data, so if a user hits an uncached image we have to render it as quickly as possible. Third, managing the cached tiles: we don't want to keep redrawing maps every 5 minutes if the data hasn't changed, but by contrast we also don't want to deliver stale images after the forecast has changed — so we can't just set a 5-minute cache expiry and call it good enough. We have to solve all of these problems with our stack: PostgreSQL/PostGIS, GDAL, MapServer, MapCache, and memcached. So let's start with a little bit of PostgreSQL tuning.
If PostgreSQL isn't performing well, nothing downstream of it will get very far, and it will show in our results. The beauty of our weather data is that we can recreate it from the pre-existing GRIB files. That means we really don't have to care about consistency in the database: it's easy to reload, and we don't care whether anything hits the disk in a timely fashion — we can keep it all in memory. So first, we make every effort to postpone the disk I/O activity that happens in PostgreSQL. Second, we want the query planner to prefer index scans over sequential scans. And lastly, we want to be able to handle all the requests coming in from the map. These are the PostgreSQL tuning parameters that pay off.

shared_buffers: ideally we want shared buffers big enough to keep the entire database in memory, so that nothing gets pushed out after our updates. If you can't manage that, at least keep enough room in shared memory to account for all of the new rasters. And it doesn't have to be as big as you might think, because the rasters end up in a compressed page format anyway — a raster may be 12 megabytes as floating point, but inside the compressed pages it takes considerably less. The caveat: a very large shared_buffers actually decreases DROP TABLE performance. When PostgreSQL drops a table, it scans through everything in shared buffers looking for pages that belong to that table, so you can have too much of a good thing — especially if, like us, you're adding and dropping hundreds of tables and scanning gigabytes of memory each time; it ends up taking a lot of time.

fsync: this parameter applies to the write-ahead log, and it tells the database whether it should use fsync to push the transaction log out to disk. If we turn it off, we get a pretty good performance benefit on every update, but we also risk database inconsistency if there's a system crash. Since our data can be recreated from the GRIB files, that doesn't really matter to us, so we take the performance benefit in trade for the risk.

checkpoint_segments and checkpoint_timeout: a checkpoint is when the database takes everything that's dirty in shared memory — different from what exists on disk — and pushes it back out to disk. Increasing these values delays that I/O activity, which helps a heavy database load run more quickly. checkpoint_segments defaults to 3, meaning three 16-megabyte sections of the transaction log; raise it and you can write a lot of transactions and just wait for the log to fill before all the dirty buffers get pushed out to disk. checkpoint_timeout defaults to 5 minutes: if there hasn't been a checkpoint within the last 5 minutes, PostgreSQL starts pushing the written pages from memory back out to disk. Since we only expect updates once an hour, we can get away with setting this as high as 60 minutes — we don't expect new data in between. A secondary benefit of setting it high: PostgreSQL tries to write out all the dirty shared memory in half the time remaining until the next checkpoint, so with a 60-minute interval it takes half an hour to spread those writes out. Spreading the I/O over time helps the database perform better while it's rendering images.

effective_cache_size: this is the first setting that helps the query planner choose index scans. It's an estimate of how much memory the system will have available for the database, between shared buffers and the operating system's cache. The question the planner is asking is: will the data be in memory? If it assumes yes, it feels better about using an index scan — an index is more expensive to read from disk, because it's a bunch of random accesses instead of one big sequential read. We went to the effort of creating spatial indexes, so we want to make sure we're hitting them. Set this large; a big system probably has several gigabytes available anyway.

random_page_cost and seq_page_cost: these are essentially arbitrary values — they don't mean anything individually, but their ratio tells the query planner how much extra it costs to fetch index pages from disk randomly versus scanning sequentially. The default ratio is 4 to 1. We set them to 2 and 1, which can be good for most systems: a random fetch doesn't really take four times longer than a sequential one on most good disk systems.
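Collected in one place, the postgresql.conf changes discussed in this section might look like the sketch below. The values are illustrative, sized for a reloadable workload like this one — not universal advice — and note that on PostgreSQL 9.5+ checkpoint_segments was replaced by max_wal_size:

```ini
# postgresql.conf — sketch of the settings discussed above (illustrative values)
shared_buffers = 8GB          # big enough for the working set of compressed raster pages
fsync = off                   # safe ONLY because the data can be reloaded from GRIB files
checkpoint_segments = 64      # default is 3 (x 16MB WAL segments); raise to delay checkpoints
checkpoint_timeout = 60min    # default 5min; we only ingest once an hour
effective_cache_size = 24GB   # estimate of shared_buffers + OS page cache
random_page_cost = 2.0        # default 4.0; a 2:1 ratio nudges the planner toward indexes
seq_page_cost = 1.0
max_connections = 2000        # FastCGI MapServer holds connections open (see below)
```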
Even with good cost settings, that's not going to prevent every sequential scan — the query planner may still decide a sequential scan is better. When we know better, PostgreSQL provides a couple of ways to override it: you can set the planner's tuning parameters either in your session or right on a function. The highlighted part is a function where we set enable_seqscan off, so the planner now avoids a sequential scan at all costs and we're far more likely to hit the indexes when we query.

Lastly for PostgreSQL tuning: max_connections. The reason we tune this is that MapServer with FastCGI compiled in holds open connections to the database but never quite reuses them. Watching a FastCGI process, if it's allowed to serve 100 requests, it will create a hundred different connections to the database, keep them open, and never reuse them — they just stack up. We'll try to mitigate that on the FastCGI side a little later, but for now we run with 2,000 connections and that's been good for us. If you really need to run a lot more than that, you should consider pgpool or some other connection-pooling software. Now we have a database that's pretty much ready for raster chores, so let's move on to tuning the display.
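The per-function planner override can be attached with a SET clause on the function definition. A minimal sketch — the function name, table, and body here are hypothetical, not the talk's actual code:

```sql
-- Hypothetical lookup function: "SET enable_seqscan = off" applies only
-- while this function runs, steering the planner onto the spatial index.
CREATE OR REPLACE FUNCTION get_forecast_tiles(bbox geometry)
RETURNS SETOF raster AS $$
    SELECT rast
    FROM forecast_temperature        -- parent table of the raster collection
    WHERE ST_Intersects(rast, bbox); -- hits the GiST index on rast
$$ LANGUAGE sql STABLE
SET enable_seqscan = off;
```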
Most of us here, I assume, are drawing over some Google Maps base layer in spherical Mercator, EPSG:3857. We decided it was best to convert our dataset to EPSG:3857 ahead of time, before we start drawing. MapServer can of course reproject on the fly, but that costs time on every request, and if we avoid it we deliver our images to the customer more quickly.

Sure enough, though, our data is in Lambert Conformal Conic — hardly any of our data is in EPSG:3857 by default — so it doesn't line up very well. Here's an example of what we're using gdalwarp for: gdalinfo shows our native projection, so our source is Lambert Conformal and our target is EPSG:3857. We convert everything on a RAM drive to a floating-point file, just for simplicity's sake. By the way, gdalwarp has a multithreaded conversion option (-multi), but I haven't seen any real performance benefit from using it — I think because we're already running so many of these processes in parallel. It may work for you, it may not. At this point our data is in the display projection and lines up nicely with everything else we draw.
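A hedged sketch of that reprojection step — the file names and RAM-drive path are illustrative, not the talk's actual paths:

```shell
# Warp an NDFD Lambert Conformal grid to spherical Mercator on a RAM drive.
# -t_srs EPSG:3857 : target spherical Mercator, matching the base map
# -ot Float32      : keep values as floating point for simplicity
# /dev/shm         : RAM-backed filesystem, so the intermediate never hits disk
gdalwarp -t_srs EPSG:3857 -ot Float32 -of GTiff \
    ndfd_temperature.grib2 /dev/shm/ndfd_temperature_3857.tif
```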
Now we get to tiling our data. The base maps are delivered in 256×256 tiles, whether from Google or OpenStreetMap, and in some cases it makes a lot of sense to tile our dataset the same way — especially for serving a map like this. But in addition to image rendering, we also run queries against single points to plot forecast values; you may be able to see some of the temperatures plotted on that map. Those queries against individual pixels run a lot faster with smaller tiles, since the index is more specific about the point, while image rendering is better with larger tiles. So 256×256 is a nice compromise that gets decent performance in both worlds.

We load with raster2pgsql, the tool that tiles everything for us: the highlighted -t option gives us our 256×256 tiles and loads them into the database. A couple of other options matter. Capital -Y uses COPY instead of INSERTs, which can be a lot faster for bulk loading our data. -P pads the edge tiles with nodata so all tiles end up the same size; newer versions of PostGIS added that flag. With uniformly blocked tiles you can then apply constraints like regular blocking with -C, which allows GDAL's and PostGIS's raster routines to take a few shortcuts and run a little more quickly. A quick look at what gdalinfo reports with no other options: a pixel size of about 3,114 meters, and take note of our left coordinate, roughly 14 million meters west.
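Put together, the load command might look like this — the table and file names are illustrative:

```shell
# Tile the warped raster into 256x256 blocks and bulk-load it into PostGIS.
# -t 256x256 : tile size matching the web map tiles
# -Y         : use COPY instead of INSERT for faster bulk loading
# -P         : pad edge tiles so every tile has the same dimensions
# -C         : apply raster constraints (regular blocking, SRID, etc.)
# -I         : build a spatial (GiST) index on the raster column
raster2pgsql -t 256x256 -Y -P -C -I \
    /dev/shm/ndfd_temperature_3857.tif temperature_20140910 \
    | psql -d ndfd
```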
Here's what our tiled dataset looks like without any other tweaking. The red boxes represent our 256×256 data tiles, and the problem is that they don't line up at all with the resolution of the base map. If I add blue boxes to represent the image tiles from the nearby zoom levels that we'll draw for EPSG:3857, it's a mess. I knew there was a problem because we were watching the network throughput between the database and the map server: we saw upwards of 100 megabytes going across the wire to draw just one map, and even if we were sending every raster we own, that shouldn't be much more than 25 megabytes. Here's why it happens. This blue box is one of our image tiles, and it intersects four different data tiles in the database — so all four tiles have to come across the wire to draw that one little box. Now take a look around the center of the country: that red data tile intersects nine different image tiles, so all nine of those image requests pull that same tile across the wire. We're quickly approaching an order-of-magnitude problem if we don't align our data tiles to the tiles we're requesting.
So here's how we aligned everything to the base map. These are the resolutions, taken straight out of MapCache's XML — with way too much precision. The closest one to our native resolution is zoom level 6, at about 2445.98 meters per pixel. Multiply that by a 256×256 tile and each tile spans roughly 626,172 meters. If we divide our west extent by that value, we get a little more than 23 tiles west of the prime meridian, so we have to make our extent large enough to cover 24 tiles — and that 15-million number on the bottom, about -15,028,131 meters, is the new west extent we want for our dataset.
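The snap-to-grid arithmetic can be sketched in a few lines. The raw west-extent value below is an assumed figure in the neighborhood of the talk's, not the exact NDFD number:

```python
import math

# Web-Mercator resolution at zoom 6: world circumference / 256 px / 2^6
WORLD = 2 * math.pi * 6378137            # ~40,075,017 m around the equator
res_z6 = WORLD / 256 / 2**6              # ~2445.9849 m per pixel
tile_span = 256 * res_z6                 # ~626,172.14 m per 256-px tile

west = -14_800_000                       # assumed raw west extent of the data (m)
n_tiles = math.ceil(-west / tile_span)   # whole tiles needed west of the meridian
aligned_west = -n_tiles * tile_span      # snapped west extent for gdalwarp -te

print(n_tiles, round(aligned_west))      # 24 -15028131
```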
Here's the new gdalwarp call: -te sets the target extent and -ts the target size, so we know exactly how many tiles we're creating and everything lines up perfectly. gdalwarp also has an option for target resolution (-tr), but I found that just giving it the extent and the size of the output I want gives slightly more precise results than asking for a resolution of 2445.98... directly.

Now everything is nicely aligned, but doing the warping and stretching introduces a few extra nodata areas. If an entire row or column is nodata, we can just trim it off by adjusting the target extent and size accordingly. The whole northern stretch can go, and the western column too — but that little eastern speck still has data in it, so we keep that.

So, everything is aligned and everything is good, right? Not quite. It turns out we made the problem worse, because now all the boxes surrounding a given image tile technically intersect it — they share the exact same boundary lines. Fortunately, there's an easy way to fix that.
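The fix the talk describes — pulling the request's bounding box in by a couple of meters before the intersection test — looks roughly like this in PostGIS terms. I don't have the talk's exact mapfile text, so the table name, SRID, and coordinate placeholders are illustrative:

```sql
-- Shrink the requested bounding box by 2 m per side before intersecting,
-- so data tiles that merely share an edge with the request no longer match.
-- (xmin / ymin / xmax / ymax come from the WMS image request.)
SELECT rid, rast
FROM temperature_20140910
WHERE ST_Intersects(
    rast,
    ST_MakeEnvelope(xmin + 2, ymin + 2, xmax - 2, ymax - 2, 3857)
);
```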
We fix it in the map file's DATA section. The highlighted part is the raster's bounding box intersecting with a little polygon. We get the bounding box from the image request — the WMS request — and the polygon below it is just those coordinates brought in by 2 meters on each side. We can get away with doing that only because we know our data tiles line up exactly with the image tiles being requested, so now we're sure we grab only the one matching tile instead of all the tiles around it. Even though snapping the resolution and extent made the dataset slightly larger, we saw network bandwidth between the database server and the map server go down by 60 to 75 percent — just from lining up the data tiles with the image requests.

Now, the rapidly changing weather data. We keep track of two timestamps. One is the valid time — this temperature is valid at noon today, or noon tomorrow; that's the TIME dimension we pass to the WMS for the raster we're drawing. The other is the issuance time — what time the forecast was prepared; we want the most recent one at all times. The database doesn't really care much about issuance time, but it becomes important in the image cache, in MapCache.

Valid time is ideally suited to PostgreSQL's inheritance. Instead of raster bands, we use parent tables and child tables, and you can think of it almost like a map cabinet: the parent table represents our full collection of the most recent maps for one element — temperature, say — and each individual valid time lives in one of the child tables, which is what actually gets drawn. We typically keep the parent table empty; all the actual data goes in the child tables. When we want to add a new forecast to the parent, all we do is create a new table, load it, and run a simple ALTER TABLE ... INHERIT to add it to the collection. And when we have an old forecast, we do ALTER TABLE ... NO INHERIT to take it out of the collection.
Handling it this way gives us a couple of benefits. NO INHERIT is a lot faster than DROP TABLE, so we just mark the table to be dropped later; once we're no longer doing any time-critical work, a separate process starts dropping all the tables we took out of the official collection, and finding them takes only a simple SQL query.

The other benefit is constraint exclusion — I'll get to that in a moment. It's essentially a query optimization that instantly narrows down which child table has the data you want, using a CHECK constraint. Here's the workflow: the new child table, highlighted in green, carries a CHECK constraint guaranteeing that everything in it has one specific valid time. In one transaction, we INHERIT the new table and NO INHERIT the old one. Importantly, we never ran an UPDATE or a DELETE on the parent table, so we have no dead rows costing us, and we never have to rewrite any indexes — which is also a time-consuming process. The WMS query comes in with the TIME dimension — the valid time — in the request, and the map file essentially stays the same: we always query the parent table and just put the valid time in the WHERE clause, and the planner knows exactly which child table to read. That matters when a collection has something like 70 child tables.
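A sketch of that swap, with hypothetical table names — the talk's actual schema will differ in detail:

```sql
-- Empty parent: the "map cabinet" that the mapfile always queries.
CREATE TABLE temperature (rid serial, valid_time timestamptz, rast raster);

-- New child for one valid time, with a CHECK constraint so constraint
-- exclusion can prune every other child from the query plan.
CREATE TABLE temperature_2014091012 (
    LIKE temperature,
    CHECK (valid_time = '2014-09-10 12:00+00')
);
-- ... bulk-load rasters into temperature_2014091012 here ...

-- Swap the new forecast in and the superseded one out, in one transaction.
BEGIN;
ALTER TABLE temperature_2014091012 INHERIT temperature;
ALTER TABLE temperature_2014091012_old NO INHERIT temperature;
COMMIT;

-- With constraint_exclusion enabled, a query on the parent touches only
-- the matching child:
--   SELECT rast FROM temperature WHERE valid_time = '2014-09-10 12:00+00';
```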
Without constraint exclusion enabled, a query against the parent scans all of those child tables instead of zeroing in on the one we actually want.

Next, the issuance time. We expose it as a dimension in MapCache — this is exactly how we did it — and by itself that delivers only the newest maps, but it introduces two problems. First, the user has to request exactly the correct issuance time out of the cache, and a user doesn't really know what that is. Second, it opens an opportunity for cache poisoning. Imagine a scenario where a malicious user requests a map with an issuance time in the future. The image isn't in the cache yet, so MapCache forwards the request to MapServer, which draws the image — and MapServer doesn't care about the issuance time at all; it draws whatever is there now and returns it to the cache. So now the cache holds an image stamped with a date in the future, and when a forecast actually issued at that time arrives, users get served the old image. We don't want to let that happen.

Our solution is a little PHP script in the middle. It doesn't let the user request an issuance time at all; instead it finds the correct one from the database itself, and we store that in memcached with an expiration time of about one minute. The idea is that anything that hasn't changed can stay cached as long as we want, and if there is a new forecast, within one minute we'll be finding — and requesting — the correct new issuance time.

One last thing, on MapServer: the FastCGI parameters. The first one to look at is the max-requests-per-process setting. We want to limit it, because the FastCGI MapServer process is what's holding open all those database connections — ideally it would reuse them, but it doesn't — so under heavy load we can run up against PostgreSQL's max_connections. So we tell each FastCGI process to exit after handling a relatively small number of requests.
We use something like 60, and we limit how many of these processes can run at one time. The second setting to look at is the termination score. This is a respawn throttle with noble intentions: if a program is repeatedly exiting and relaunching, something is probably wrong with it. For us, though, that's no good — every exit and relaunch adds to a running score inside the FastCGI process manager, and if you exceed the limit by relaunching too many processes, you have to wait a period of time before it will launch another one. Since we're deliberately doing things like limiting how many requests each MapServer process handles, we start running into this throttle. So the idea is to set the termination score down to effectively turn it off, which allows MapServer to keep relaunching itself without any delays.

Here are a couple of URLs where you can see this in action.
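Pulling those FastCGI settings together — assuming Apache's mod_fcgid, which uses this kind of scoring scheme; the directive values below are illustrative, not the talk's exact configuration:

```ini
# Apache mod_fcgid sketch: recycle MapServer processes to cap leaked
# database connections, without tripping the respawn throttle.
FcgidMaxRequestsPerProcess 60   # exit after ~60 requests, releasing its DB connections
FcgidMaxProcesses 32            # cap simultaneous MapServer processes
FcgidTerminationScore 0         # don't penalize the deliberate, frequent exits
FcgidSpawnScoreUpLimit 100      # keep the spawn throttle out of the way
```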
The first is our new graphical user interface for NDFD, and the second is the demo that was shown yesterday. I'll take any questions. Thank you.

Q: With all the writes you're doing, and doing them in bulk, does autovacuum keep your database size under control, or do you have to do manual vacuuming?
A: We never really run into vacuum problems, because we're not updating rows and we're not deleting rows from tables. What we're doing is creating new tables and dropping old tables, so vacuum never really comes into play — everything is handled by the checkpoints.

Q: Have you considered storing the files just on disk, as opposed to in the database? Is there a particular reason the data needs to be in the database — did you research the alternative?
A: We looked at that. One of the benefits I like about having the data right in the database is being able to run geospatial queries against the NDFD — draw a line, draw a polygon, and see what the forecast is doing inside it. I think we're better off doing that inside the database.

Q: Have you done any performance comparisons between serving from files and from the database?
A: No, we haven't done any comparisons.

Q: So the thing serving the maps is the WMS on top of the DBMS, through that interface?
A: Right — underneath the first, graphical one there's a WMS.php, and that's what the user actually talks to.

Q: Since you're getting data pretty rapidly, how are you keeping up with the processing so that it's ready in time, and so that stale tiles in the cache get replaced?
A: It's all automated. What happens in WMS.php is that we set the new issuance time — the one now in the database — so the MapCache dimension carries the new issuance time and we never actually request an old image sitting in the cache.

Q: Have you thought about seeding the cache?
A: We could seed it, but at the moment we don't have a good idea of which tiles we should be seeding. They're drawn on the fly as users request them, then they go into the cache and drop out pretty quickly. The whole database ends up being only about 3 to 4 gigabytes with everything compressed.

Q: Would you ever serve old forecasts — say, what the forecast was 12 hours ago?
A: I guess, if people are interested. One of the things I'm interested in is how a forecast changes over time: we issue a forecast for Thursday's high temperature — how has it progressed? What we could do is, as we apply the NO INHERIT and take tables out of what's official, start storing those old forecasts. We could do that, and I'm interested in doing it.

Q: For your own internal consumption?
A: For internal consumption we could. We don't do it right now, but it's one of the things I'd definitely like to put in. Thank you.