Serving high-resolution spatiotemporal climate data is hard, let's go shopping

Video in TIB AV-Portal: Serving high-resolution spatiotemporal climate data is hard, let's go shopping

Formal Metadata

Serving high-resolution spatiotemporal climate data is hard, let's go shopping
Title of Series
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Open Source Geospatial Foundation (OSGeo)
Production Year
Production Place
Portland, Oregon, United States of America

Content Metadata

Subject Area
The world is a big place and time is infinite. Scientists who study any aspect of the Earth's climate are immediately faced with the exponentially growing amount of data that are required to represent properties of the climate in both time and space. The bulk of these data is a substantial barrier to extracting meaningful information from their contents. This barrier can be prohibitive to smaller-scale researchers and communities that want to study and understand the impact of the climate on their localities. Fortunately, a substantial amount of free and open source software (FOSS) exists upon which one can build a great geospatial data application.
The Pacific Climate Impacts Consortium (PCIC), a regional climate services provider in British Columbia, Canada, has been making a concerted effort to use geospatial FOSS in order to expand the availability, comprehensibility and transparency of big climate data sets from the Coupled Model Intercomparison Project (CMIP5) experiment. With a full stack of geospatial FOSS and open protocols we have built and deployed a web platform capable of visualizing and distributing high-resolution spatiotemporal raster climate data.
Our web application consists of:
+ back-end storage with raw NetCDF4/HDF5 files
+ a PostgreSQL/PostGIS database for indexed metadata
+ ncWMS for maps and visualization
+ the PyDAP OPeNDAP server for data requests
+ a web user interface to tie it all together
This presentation will provide a case study for enabling scientific collaboration using FOSS and open standards. We will describe our application architecture, present praise for and critique of the components we used, and provide a detailed discussion of the components that we had to improve or write ourselves. Finally, though our use case is specific to climate model output, we will provide some commentary as to how this use case relates to other applications of spatiotemporal data.
Keywords spatiotemporal opendap big data netcdf hdf postgis climate
It looks like we're at zero here, so I'll get started. My name's James Hiebert and I'm going to be talking today about serving high-resolution spatiotemporal data on the web. Apologies for the subtitle: there's not actually going to be any shopping involved in this talk. I guess I wanted to perpetuate the stereotype that Canadians come down to the US to go shopping.
I'm here from the Pacific Climate Impacts Consortium, or PCIC. We're a nonprofit based out of the University of Victoria in Victoria, British Columbia, and our mission is to bridge the gap between academic climate science and applied policy regarding the impacts of climate change. The study of global climate change requires large spatiotemporal climate simulations, simulations that provide coverages over both time and space. What I'll discuss today is how we built a system to serve these large datasets to our stakeholders. But first I'll tell you a little about myself. We've got a little map here; once I zoom in you'll actually see something, hopefully, if the projector comes through. PCIC and our stakeholders work to address vulnerability to climate change, but what does that really mean?
I live on an island in an archipelago called Haida Gwaii, in a small coastal community that you can see on the map. Our grocery trucks come in once a week by ferry, and they're easily disrupted by poor weather. Much of my community's population is First Nations, and everyone relies very heavily on the ocean for subsistence, so climatological disruptions very directly cause socioeconomic disruption. My house, along with many others, is about 100 meters from the high tide line. You can't see it on the map, but there's a coastal marsh next to us, and only about a 1 to 2 meter tide separates low tide from high tide across a couple of kilometers of the marsh. So you can imagine what the effects of another meter of sea level rise might be in the community. In communities like mine, the potential effects of climate change are very first-order and very directly experienced, or at least they're easy to imagine.
But larger communities also have vulnerabilities. Victoria, where PCIC is based, and Vancouver are both susceptible to the effects of sea level rise. This photo here is from my plane on the way down here, landing in Vancouver, and you can see both the edge of the runway and tidewater in the same picture frame. Likewise, you'll see this in other larger, seemingly invincible cities like San Francisco and New York, or like the house in this photo in the Chesapeake Bay, not far from Washington, DC. That house isn't there anymore. So in 2005 the province of BC recognized this vulnerability to climate change by creating PCIC via an endowment to the University of Victoria, and charged us with the mission of bridging the gap between the academic climate science research community and regional users of climate information.
We conduct credible, peer-reviewed research on the effects of climate change in BC and work with stakeholders such as provincial, regional and local governments to make use of that research. Our stakeholders use climate projections to answer a wide array of pertinent questions, and to make policy decisions and engineering decisions for incredibly expensive infrastructure based on the results of our impacts models. Examples include whether future river flows can support hydropower; whether future storm intensity will necessitate larger culverts, storm drains or bridges; to what degree sea level rise might inundate our homes and farmland (this photo is a few minutes' walk from my office); how loss of glacier mass might affect our streams and rivers, for both tidewater and alpine glaciers; and whether forests might become more susceptible to fire or outbreaks of disease.
To answer these kinds of questions requires us to have information at a very small scale, or high resolution. As applied to climate data, high resolution is kind of difficult to define. For impacts models, users typically require landscape-level information or better, so "high" is a relative term that usually means as high as you can get. But allow me a brief interlude to first explain where climate data comes from, and how one gets from global physics to landscape-level information and local effects.
It's a long and data-intensive process that requires the expertise of climate scientists, statisticians, domain experts such as foresters and hydrologists, and computer scientists such as myself. Just for the record, I personally don't have expertise in climate science, and this talk isn't about that in particular, but I'm discussing it to motivate our use case and to help explain the semantics of our data. A snapshot of the process looks like this. A large coalition of international climate scientists run global climate models, or GCMs, that represent climate at a large scale. These models are derived from fundamental physical laws and include a variety of features that affect the climate system. Sometimes there is lower-scale information: the same models are run but with smaller grid boxes over a particular region, and that's referred to as a regional climate model, or RCM. Then climate statisticians like those at PCIC downscale the GCMs or RCMs to a regional or local scale, and then domain experts may run local impacts models at a local scale.
So if you think about that whole process as a data pipeline, PCIC has expertise in the areas on the right: downscaling and some impacts modeling. But we rely on open data and open protocols from the organizations upstream, which are mostly federal governments, and we want to provide our downscaling products as open data over open protocols downstream. That's one reason why we want to release the data. Assessing climate impacts doesn't end with downscaling, and there are innumerable other domain experts who can use this downscaled data to assess climate impacts in their areas of expertise, for example hydrologists, highway engineers and hydropower engineers. In fact, recently the province of BC began requiring proponents of natural resource projects to include the impacts of climate change as part of the environmental impact assessment process. It's still in its infancy, but there are a lot of consultants out there that really need data on climate change. And finally, we hold ourselves to peer-reviewed scientific rigor, so we must provide as much transparency as possible, and having the data public to accompany any journal articles is a great way to do that.
Now, nearly all of our climate data is spatiotemporal raster data, so it's a 3- or 4-dimensional coverage with x, y and time dimensions, sometimes including a z dimension. What we wanted to do is create a general-purpose framework for serving spatiotemporal raster data and then put a bit of a web UI on top.
It turns out that's hard to do. If you consider that climate scientists are modeling some number of future scenarios for human emissions, then multiply that by all the different global climate models, by all the different regional climate models, by all the different types of downscaling, by some number of measured quantities, by time and by space, all of a sudden you've got a lot of data. And unlike observations, we're not limited by the sensors or satellites that we can afford to deploy. We can just create data out of thin air, or at least from a model that's churning away at simulating the Earth and the ocean. We're not even limited by what's happened in the past: we're projecting the future, and not just one future but many, many realizations of the future as modeled by many people at many modeling centers.
Now, we could have just dropped all of our data, all the masses of NetCDF files, on an FTP site or a static web server and called it good. But most of the researchers with whom we collaborate don't have the ability to download and store hundreds of terabytes of data like we do. Plus, it turns out that most impacts modeling is very location-specific, so people shouldn't have to download data for all of Canada just to select out Vancouver's data and throw the rest away. We want to give people the data they want, and only the data that they want.
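The multiplication described above can be made concrete with a small calculation. This is a minimal sketch, not PCIC's actual inventory: every dimension count below is a hypothetical value chosen only to show how quickly the product grows.

```python
from math import prod


def dataset_bytes(dims: dict[str, int], bytes_per_value: int = 4) -> int:
    """Uncompressed size of a dense grid: product of all dimension lengths times value size."""
    return prod(dims.values()) * bytes_per_value


# Hypothetical dimension counts (assumptions for illustration, not figures from the talk).
example = {
    "emissions_scenarios": 3,
    "global_models": 12,
    "downscaling_methods": 2,
    "variables": 3,
    "time_steps": 55_000,  # roughly 150 years of daily values
    "lat": 500,
    "lon": 1_000,
}

total = dataset_bytes(example)
print(f"{total / 1e12:.1f} TB uncompressed")
```

Doubling any single factor doubles the total, which is why adding one more model, scenario or variable is never a small change.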
So this is the architecture we designed for the system, represented by this diagram. You'll notice the data itself is down at the bottom, the foundation of everything, and then there are several streams of services that are built on top of that. The NetCDF box at the lower left is the only thing that's just data sitting on disk. Then we've got PostgreSQL/PostGIS, ncWMS, PyDAP, and PDP, which is the package that we wrote, and they're all different services that we have running which respond to incoming web requests. All the metadata regarding our available data is organized in Postgres, ncWMS provides the climate visualization layers, PyDAP responds to all the requests for the actual data itself, and PDP responds to all the requests that build up the user interface. I'm also going to go through a brief demo that I prepared ahead of time.
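The division of labour among the services can be summarized as a lookup table. This is a minimal sketch: the request-kind labels (`metadata`, `map_layer`, `data`, `ui`) and the `route` function are hypothetical names introduced for illustration; only the component names and their roles come from the description above.

```python
# Which component answers which kind of incoming web request, as described in
# the talk. The dictionary keys are hypothetical labels, not real URL paths.
SERVICES = {
    "metadata": "PostgreSQL/PostGIS",  # index of the available data
    "map_layer": "ncWMS",              # climate visualization layers
    "data": "PyDAP",                   # requests for the actual data
    "ui": "PDP",                       # requests that build up the user interface
}


def route(request_kind: str) -> str:
    """Return the name of the component responsible for a kind of request."""
    try:
        return SERVICES[request_kind]
    except KeyError:
        raise ValueError(f"unknown request kind: {request_kind!r}") from None


print(route("data"))
```

The point of the layout is that only the NetCDF files hold data; every other box is a service that reads from them or from the metadata index.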
So this is a hypothetical user of our climate services. This is Alice. She's an engineer with BC's Ministry of Transportation, and she's working on a remote coastal community, Bella Coola. It's attached to the outside world by ferry and a single road, and she's assessing its vulnerability to extreme precipitation and its effects on roads, culverts, bridges and other critical infrastructure. Alice wants plausible future climate scenarios for the watersheds around her highways. We have a map to select the area of interest, and she can overlay the climate rasters to see where information is available. On the right-hand side there's a tree of all the different scenarios of greenhouse gas concentration pathways, different GCMs and different downscaling methods. There's a time selector, and she's only interested in a subset of the future, so she chooses to analyze the past plus the next 40 years, to correspond to the projected life of a bridge. Then she selects an output format. We offer NetCDF and a couple of others for convenience: Arc ASCII grid format and a plain text representation. Then she just selects download, and the data starts streaming right away. There's the download.
Now, the datasets that a user could download are potentially very large. Each scenario over the full spatiotemporal domain is around 150 gigabytes, which is one thing to serve up as a static file over HTTP. But as soon as you want to write a web app around it and allow dynamic responses, subset requests, etc., all of a sudden you're talking about a lot of I/O-bound computation that you have to do before the HTTP response is sent, which ideally should happen in less than a second. So while we're waiting for that to download, I'll go through the software components that we used off the shelf and the ones we had to modify for this purpose.
We've written a full web application back end in Python, which does all the file format translation and all the database communication, and passes all the metadata on to the web UI to be interpreted for the user. That consists of about 2,800 lines of Python code, plus about 1,500 lines of just testing code, and then there's another 3,000 lines which make up PyDAP, which we've modified a lot. I'll show you to what extent here.
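The reason streaming matters can be sketched with a generator: if the server yields the requested subset one time step at a time, the HTTP response can begin before the whole request has been read from disk. This is a simplified illustration, not PCIC's code. `stream_subset` and the nested-list cube are hypothetical stand-ins for a lazily read NetCDF variable and a plain-text output format.

```python
from typing import Iterator

Grid = list[list[float]]  # one time step: rows (y) by columns (x)


def stream_subset(cube: list[Grid], t0: int, t1: int,
                  rows: slice, cols: slice) -> Iterator[str]:
    """Yield one text line per time step, covering only the requested window."""
    for t in range(t0, t1):
        # Only the requested rows and columns of this time step are touched.
        window = [row[cols] for row in cube[t][rows]]
        yield " ".join(f"{v:g}" for row in window for v in row) + "\n"


# Tiny hypothetical cube: 5 time steps, 3 rows, 4 columns.
cube = [[[t * 100 + r * 10 + c for c in range(4)] for r in range(3)]
        for t in range(5)]

for line in stream_subset(cube, 1, 3, slice(0, 2), slice(1, 3)):
    print(line, end="")
```

A WSGI application can return such a generator directly as its response body, so the first bytes leave the server after one time step has been read rather than after the full subset.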
So PyDAP is the component of the data portal that actually provides the data download services. It's an implementation of the OPeNDAP protocol, which is designed to be a discipline-neutral means of transferring data across the web. The protocol is open, and it's designed to be OS- and application-independent, such that you can get data into whatever software you want to use to do your data analysis. It's supported mostly by US scientific agencies such as NOAA and NASA. There are a number of other OPeNDAP servers out there, but PyDAP is the one that we use. Its architecture is quite a bit more flexible than some of the other OPeNDAP servers. This is a rough layout: at the bottom it has a number of different handlers, which are written to interpret different data formats and translate them into the DAP structure in the middle, and then on top there are numerous responders, which translate the DAP structure into the output format the user wants. So there's quite a bit of flexibility, which is important for us, especially with the wide range of abilities that our users have.
To give you a bit of an idea of to what degree PyDAP is off the shelf, I ran hg churn on all of our PyDAP repositories, which measures the changes in a repository by lines of code. The fractions shown there are the churn of the staff from my group divided by the total churn of all committers. You can see that we wrote one handler by ourselves, and the HDF and NetCDF work that we're using is mostly ours. That's primarily work that needed to be done to make the server able to stream the response. For the rest, we only had to make minimal changes.
We're using a modified version of ncWMS to provide visualization of the climate rasters. I really like it; it gives us a lot of stuff for free. It's a full-featured Web Map Service server that converts NetCDF files into tiled images usable on the web, and it has support for a time dimension, which is very important to us given that most of our rasters have thousands of time steps available. Unfortunately, it has a few limitations that make it non-ideal for use with big data. To configure a layer you have to go through the files one by one and add them to a list, and then configure 5 to 10 different attributes. Additionally, whenever you start or restart the server, it goes through every single file in order and scans the whole file to determine the ranges so that it can assign a color bar. That can take many minutes, possibly hours, and it only gets slower the more layers you add. So we've made some modifications to ncWMS to run it off of a Postgres database, from which it gets its whole list of layers, variable ranges and everything. That decoupled image generation from the I/O-intensive part of configuring layers, and it's made it possible to scale up our deployment.
Finally, the last piece of the software stack is the JavaScript front end that ties everything together for the user. Ultimately it doesn't really provide any functionality in and of itself, but it's key for providing a good user experience. It's aware of all of the various services that are provided; it asynchronously makes requests, processes them, and then displays things to the user. All the mapping is provided by OpenLayers, and we use jQuery for a bit of convenience.
This is the free and open source software conference, so I'll just mention to what extent our components are free and open source software. For the most part on this project, unfortunately, we're mostly just open source users, not huge contributors. But all of our code for PyDAP, PDP and the web UI is on our GitHub page, which is linked there, and is released under various free software licenses. So it's technically free software, but not so much in the sense that it has a vibrant community effort driving it. We're the only contributors and users for all the parts that we wrote outright, to our knowledge. Much of the code is still pretty specific to our needs, mostly because we just haven't had the need to generalize it. So if anyone's interested in using any of the components, just get in touch. But anyway, let's go back to the demo and see if her data is done downloading.
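The ncWMS change described above, scanning each file once and keeping the variable ranges in a database instead of rescanning on every restart, can be sketched as follows. This is an illustration of the idea only, not the actual ncWMS modification: `sqlite3` stands in for PostgreSQL so the example runs on its own, and the table name, function names and layer identifier are all hypothetical.

```python
import sqlite3


def index_layer(db: sqlite3.Connection, layer_id: str, values: list[float]) -> None:
    """Do the expensive full scan once and record the layer's value range."""
    db.execute(
        "INSERT OR REPLACE INTO layer_range VALUES (?, ?, ?)",
        (layer_id, min(values), max(values)),
    )


def layer_range(db: sqlite3.Connection, layer_id: str):
    """Cheap lookup at map-rendering time; returns (vmin, vmax) or None if unknown."""
    return db.execute(
        "SELECT vmin, vmax FROM layer_range WHERE layer_id = ?", (layer_id,)
    ).fetchone()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE layer_range (layer_id TEXT PRIMARY KEY, vmin REAL, vmax REAL)")

# Hypothetical layer; in practice the values would come from scanning a NetCDF file.
index_layer(db, "pr_example", [0.0, 3.5, 12.25])
print(layer_range(db, "pr_example"))
```

With the ranges stored this way, server start-up cost no longer depends on how many files are configured, because no file is opened until a map is actually requested.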
And what do you know, it looks like it has. Our engineer has now quickly downloaded gigabytes of custom-selected climate scenario output with just a few clicks. The data is fully attributed with metadata, units, references and citations to the methods used to perform the downscaling.
In the spirit of open science, having all the metadata directly attached to the data is a pretty big deal, because it ensures that the data provenance is trackable even if further operations are performed on the data later on, which is the ultimate goal from our perspective. From here it's relatively easy for Alice to plug the numbers into whatever impacts model she's running.
So that's all. I'll leave you with my simple conclusions. Governments use downscaled climate model output to plan for the effects of climate change on their infrastructure. There's so much model output that data delivery is a nontrivial problem. We've tried to make it as easy as possible for users to narrow the data down to what they actually need, and we stream it to them right away. Our work is available, and we're happy to work with others to make it more broadly useful. I think we've got 5 minutes for questions.
I'm just curious what you do for hosting your large amounts of data. Do you run your own servers, or do you use cloud services?
We do run our own servers. We're fortunate enough to be at a university where we're subsidized to some extent, so we build a lot of our own stuff and it's all local.
You mentioned that the downloaded data cubes are fully attributed. Are those all attributes in the NetCDF file?
Yes, that's in the NetCDF. I should probably add that that's not the case for some of the other formats.
I've done some work with climate model data as well, and as far as tiling it out for serving as, say, map tiles, the data explodes really, really quickly. So how large is the cluster that you're working with to process this data?
We don't actually use any cluster. It's a single server. One of the tricks is organizing the data cubes to be able to read them quickly. If you lay out the data in time-major order (I may have major and minor mixed up), then you can make a map really quickly, but you can't drill down in one dimension because you've got to read the whole way through. So if you lay it out one way you can easily make a map, and if you lay it out the other way you can easily drill down through time.
Do you have any plans for adding analytic capabilities, such as statistical derivations of the data, as services?
Yes, we want to, and we're working on it.
You kind of almost answered my question already. You're storing large files and you want to serve them up quickly, so what sort of clustering or compression do you use?
I don't think we use any clustering. Oh, compression, yes. We use file-system-level compression, and then the decompression gets spread across all of our cores. So the files are stored uncompressed at the NetCDF level but compressed at the file system level. And because the data is spatiotemporal, there are a lot of no-data values over the oceans and that sort of thing, so you get really good compression; all the no-data values essentially go away.
Thanks. What do you store in the database?
Off the top of my head, the important things from a performance perspective are the ranges, so that you can generate the color bar. But we also store things like the GCM or RCM that the data came from, and the full time dimensions, so if you want to seek in the file to a specific time step you can look that up in the database first. The ranges are probably the most important thing that I can think of off the top of my head.
That's a good question. No, I think it goes straight to the files themselves.
What's the spatial domain?
It depends. We mostly work in the BC, Pacific and Yukon region, but we have global climate models, so that's global. And we do some projects for Environment Canada, and that's at a national level. So it just depends on the data. But typically that's one of the big things that we do that helps connect a global climate model to the local scale: doing that downscaling. And we have a number of people who have dedicated years of expertise to downscaling.
Great, thank you very much.
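The layout trade-off mentioned in the questions, where one ordering makes maps cheap and the other makes time series cheap, can be checked with index arithmetic. This is a minimal sketch on a tiny hypothetical cube; the function names and sizes are assumptions, and real NetCDF files add chunking on top of this simple picture. It counts how many separate contiguous reads each request needs under each ordering.

```python
def runs(offsets) -> int:
    """Count contiguous runs in a non-empty collection of element offsets."""
    s = sorted(offsets)
    return 1 + sum(1 for a, b in zip(s, s[1:]) if b != a + 1)


NT, NY, NX = 4, 3, 5  # tiny hypothetical cube: time steps, rows, columns


def time_slowest(t: int, y: int, x: int) -> int:
    """Offset when each time step's full grid is stored together."""
    return (t * NY + y) * NX + x


def time_fastest(t: int, y: int, x: int) -> int:
    """Offset when each grid cell's full time series is stored together."""
    return (y * NX + x) * NT + t


one_map = [(2, y, x) for y in range(NY) for x in range(NX)]  # all cells, one time step
one_series = [(t, 1, 3) for t in range(NT)]                  # one cell, all time steps

for name, layout in [("time slowest", time_slowest), ("time fastest", time_fastest)]:
    map_reads = runs(layout(*i) for i in one_map)
    series_reads = runs(layout(*i) for i in one_series)
    print(f"{name}: map needs {map_reads} read(s), series needs {series_reads} read(s)")
```

With time varying slowest, a map is a single contiguous read while a time series needs one read per time step; with time varying fastest, the situation is reversed.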

