
Geo-spatial queries on multi-petabyte weather data archives


Formal Metadata

Title
Geo-spatial queries on multi-petabyte weather data archives
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Geo-spatial queries on multi-petabyte weather data archives
John Hanley, Nicolau Manubens, Tiago Quintino, James Hawkes, Emanuele Danovaro

Weather forecasts produced by ECMWF and environment services by the Copernicus programme act as a vital input for many downstream simulations and applications. A variety of products, such as ECMWF reanalyses and archived forecasts, are additionally available to users via the MARS archive and the Copernicus data portal. Transferring, storing and locally modifying large volumes of such data prior to integration currently presents a significant challenge to users.

The key aim of ECMWF's effort in the H2020 Lexis project is to provide tools for data query and pre-processing close to the data archives, facilitating fast and seamless application integration by enabling precise and efficient data delivery to the end-user. ECMWF aims to implement a set of services to efficiently select, retrieve and pre-process meteorological multi-dimensional data by allowing multi-dimensional queries, including spatio-temporal and domain-specific constraints. These services are exploited by Lexis partners to design complex workflows to mitigate the effects of natural hazards and to investigate the water-food-energy nexus.

This talk will give a general overview of the Lexis project and its main aims and objectives. It will present the pilot applications exploiting ECMWF data as the main driver of complex workflows on HPC and cloud computing resources. In particular, it will focus on how ECMWF's data services will provide geospatial queries on multi-dimensional peta-scale datasets, how this will improve overall workflow performance, and how it will enable access to new data for the pilot users.

This work is supported by the Lexis project and has been partly funded by the European Commission's ICT activity of the H2020 Programme under grant agreement number 825532.
Transcript: English (auto-generated)
Okay, Emanuele, the floor is yours. Thank you. So, I come from the European Centre for Medium-Range Weather Forecasts, and we have a pretty large archive of our forecasts.
In fact, it's the largest in the world. And so, extracting data from there is quite tricky. What do we do? We are an international organization financed by 34 member states.
And we perform operational services, namely weather forecasts. And we support the national weather services in the exploitation of our data. We do research in this field and we also provide some Copernicus services.
So, free access to some of our data and some data that we compute, especially for that. Obviously, this is not just computing. We also have to acquire quite a lot of data.
In fact, weather data for us fall into two main categories. One is observations: we collect data from satellites, from radar, from weather stations, and whatever else we can fetch, and we archive those data in our system. We then use those observations to create an interpolation of the state of the atmosphere.
Then, from that interpolation, we start our numerical models that are coupled models. So, we have an ocean model, a sea ice model, a land model, and an atmosphere one.
Obviously, we are more interested in the atmosphere. This model produces a simulated set of variables for each cell of our grid. The variables are temperature, pressure, wind speed and direction, humidity, and so on.
We are interested in both 2D fields, mainly on the land surface, and 3D fields, like the whole atmosphere. The computational cost is linear in the number of cells, so we have to somehow optimize the grid that we use to reduce the computational cost.
Okay, now I will forget about the observations for a while and focus on the model output. This is the part that we are really required to archive and access quickly for our users.
Again, about the grid: instead of keeping a regular lat-lon grid, we optimize the computational cost a little bit by using a grid that decreases the number of points as we approach the poles.
We call it the octahedral Gaussian grid. Why octahedral? Because it is essentially based on an octahedron that sits just outside the Earth, and with it we can roughly halve the number of points that we use to describe each layer of the atmosphere.
Obviously this is really helpful from the computational point of view, while all the geometric computations are somewhat complicated by the fact that we have to interpolate from this grid to the one that our users want.
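As a concrete illustration, here is a minimal sketch (not ECMWF code) of how the octahedral reduced Gaussian grid O(N) is laid out: there are N latitude rings per hemisphere, the ring nearest each pole carries 20 points, and each ring towards the equator gains 4 more. For the operational O1280 grid this reproduces the roughly 6.6 million points per layer quoted later in the talk.

# Minimal sketch of the octahedral reduced Gaussian grid layout:
# ring i (counted from the pole) carries 4*i + 16 points.

def octahedral_points_per_ring(n: int) -> list[int]:
    """Points on each of the 2*n latitude rings, listed pole to pole."""
    north = [4 * i + 16 for i in range(1, n + 1)]  # pole -> equator
    return north + north[::-1]                     # mirror for the southern hemisphere

def octahedral_total_points(n: int) -> int:
    return sum(octahedral_points_per_ring(n))

# O1280 is the 9 km operational grid mentioned in the talk:
print(octahedral_total_points(1280))  # 6599680, i.e. ~6.6 million points per layer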
Another aspect is the vertical levels. We cannot use constant levels when we have a mountain; obviously, we are not interested in the atmosphere under the ground. So the lower layers follow the shape of the Earth,
that is, the digital elevation model, up to a certain pressure, and then we move to constant layers. So even in the vertical we have to interpolate in a somewhat custom way. Given those constraints, we designed a system that is able to scale with the resolution.
The idea is that we want to provide higher-resolution simulations for our users, but obviously increasing the resolution is quite costly: doubling the resolution usually costs us eight times the computational power.
A factor of two for each horizontal dimension, and another because we have to reduce the time step to keep the calculation accurate. So the impact on the computation and on the output files is pretty heavy.
Right now we are running our system at nine kilometers per cell, so essentially every cell is 81 square kilometers in the global simulation. That gives layers of roughly 6.6 million points each.
Each field is stored in about 50 megabytes. The thing is that we store 137 layers for each variable,
and we perform 51 different simulations, twice a day in fact. So essentially we generate millions of fields every day, and our archive grows by nearly 200 terabytes per day; in five days, we add a petabyte.
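To see how those figures combine, here is a rough back-of-the-envelope sketch. The ensemble size, runs per day and level count come from the talk; the number of archived parameters, the number of output steps and the average field size are illustrative assumptions, so the result is only meant to land in the same order of magnitude as the numbers quoted above.

# Back-of-the-envelope archive growth, using the talk's headline numbers
# plus some assumed per-run settings (parameters, steps, field size).

members_per_run  = 51       # ensemble members mentioned in the talk
runs_per_day     = 2        # 00 UTC and 12 UTC
model_levels     = 137      # layers per 3D variable
params_3d        = 10       # assumed number of archived 3D parameters
output_steps     = 50       # assumed number of archived forecast steps
field_size_bytes = 15e6     # assumed average encoded field size (9 km fields
                            # are ~50 MB; the 18 km ensemble fields are smaller)

fields_per_day = members_per_run * runs_per_day * model_levels * params_3d * output_steps
volume_per_day = fields_per_day * field_size_bytes

print(f"{fields_per_day / 1e6:.1f} million fields per day")  # ~7.0 million
print(f"{volume_per_day / 1e12:.0f} TB per day")             # ~105 TB, same ballpark as the ~200 TB quoted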
In a little more detail: every day, twice a day, at midnight and at noon, we perform a simulation at nine kilometers. Then we perform an ensemble of 50 simulations at 18 kilometer resolution,
again twice a day. Then we have lower-resolution but extended-range simulations, and we also have quite a lot of research activity that generates new data sets.
This is the amount of data that we distribute, and in the last few years it has been increasing almost exponentially; and this is our forecast of how it will keep growing. Right now, a simulation that runs in one hour generates roughly 70 terabytes of data.
That means we are writing to disk roughly 19 gigabytes per second, and this is our major constraint. Our model can scale way better; we theoretically could already produce this amount of data, but we simply cannot store it.
The IO that we are using today, a parallel file system with Lustre and so on, is not able to cope with that, and we are working to improve on that side. In any case, we are committed to moving to five kilometer resolution by 2025.
So we are still working on that. As for our computing facilities: we have a redundant system with two supercomputers, and we have already signed for better ones that are going to be deployed this year.
But right now we are using machines that are still in positions 42 and 43 of the Top500, so not too bad. But again, the bottleneck there is the Lustre parallel file system.
Then we also have some cloud resources for disseminating our data. One is under the umbrella of the Copernicus services, and the other one, still experimental, is the European Weather Cloud, which lets our member states exploit those data close to the data source.
So, instead of fetching the data from us, moving them to their own facilities and performing the simulation there, they can move the computation close to the data and hopefully reduce the overall latency. And now the most interesting part: our archive.
Right now we have 300 petabytes. Obviously we cannot keep everything on disk, so we have a large tape archive. But we also have some nice caching policies, so essentially only 4% of the requests hit the tapes.
The remaining 96% are served either from an object store that we designed or from a disk-based cache. Again, we are adding nearly 250 terabytes per day,
so every four or five days is an additional petabyte for us. Our archive will also hit the capacity of our Oracle tape libraries, because the four tape libraries that we have can store up to 370 petabytes.
So we have to extend that somehow, and in any case we are going to move the whole archive to another computing center during this year. So we have to manage all those data carefully. Okay, quite a lot of data.
How do our users request the bits they need? They give us a request in a query language designed for that. They can specify the levels they want, some of the parameters, temperature, humidity, or whatever they need,
a range of dates, because they may be interested in the evolution over time, and also the time sampling: we store a description of the atmosphere each hour, so they can even decide to subsample.
In this example, they are requesting ten days of forecast with a field every three hours. They can also specify a regional domain. Usually our users are interested in downscaling over a specific area, so we provide the simulation worldwide and then they focus on their country, their area.
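For readers who have not seen one, a request of the kind just described can be written with MARS keywords along the following lines. The values are illustrative, and the Python client shown (ECMWFService from the ecmwf-api-client package) is just one possible way to submit it; the talk itself does not prescribe a specific client.

# A hedged sketch of a MARS-style request matching the example in the talk:
# temperature and humidity on a few pressure levels, ten days of forecast
# with one field every three hours, interpolated to a regular grid and
# cropped to a regional domain. All values are illustrative.
from ecmwfapi import ECMWFService

request = {
    "class":    "od",             # operational data
    "stream":   "oper",
    "type":     "fc",             # forecast
    "date":     "2020-02-01",
    "time":     "00",
    "step":     "0/to/240/by/3",  # ten days, one field every three hours
    "levtype":  "pl",             # pressure levels
    "levelist": "500/700/850",    # the levels the user wants
    "param":    "t/q",            # temperature and specific humidity
    "grid":     "0.25/0.25",      # interpolate to a regular lat-lon grid
    "area":     "60/-10/35/30",   # regional domain: North/West/South/East
}

ECMWFService("mars").execute(request, "regional_forecast.grib")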
Okay, we can easily split those requests into two parts. One is related to the hypercube part of the data, in the sense that we consider our data as filling a hypercube whose dimensions are the date,
the level, the variables, and so on, and we have to index the data accordingly. And then there is a geometric query, up to now a bounding box, but probably something more interesting will be arriving pretty soon.
Okay, how can we cope with the hypercube data access? We have a domain-specific object store that we call the Fields DataBase, FDB. Each data item in our FDB is a layer, that is, a field describing the atmosphere.
The model writes directly to the FDB, and this is also required to support the throughput that we need from our model, in the sense that the parallel file system cannot guarantee the 19 gigabytes per second that we need.
So we have this cache supporting the IO operations. We are also adding several different kinds of backends to our object store. Right now, in operations, we have a POSIX file system backend.
We are adding something really fancy using NVRAM, and in that case we can reach hundreds of gigabytes per second. But we are also considering a cloud-friendly layer like Ceph.
Ceph is still not performing at the level we get from POSIX, it is roughly four times slower, so we are still working on that. In any case, the object store supports our hypercube queries.
All those parameters are resolved through an index, and then we just hit the disk to load the field at a given offset and length.
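The indexing idea can be summarised with a small, purely illustrative sketch (this is not the real FDB code, and the key fields are assumptions): every field is identified by its metadata, and the index maps that key to a file, an offset and a length, so a retrieval is a single seek and read.

# Toy sketch of a field-oriented object store index: metadata key -> (file, offset, length).
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldKey:
    date: str
    time: str
    step: int
    param: str
    level: int

@dataclass
class Location:
    path: str
    offset: int
    length: int

class ToyFieldStore:
    def __init__(self) -> None:
        self.index: dict[FieldKey, Location] = {}

    def archive(self, key: FieldKey, path: str, data: bytes) -> None:
        # Append-only write; remember where the encoded field landed.
        with open(path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
        self.index[key] = Location(path, offset, len(data))

    def retrieve(self, key: FieldKey) -> bytes:
        # One index lookup, then a single seek + read of exactly one field.
        loc = self.index[key]
        with open(loc.path, "rb") as f:
            f.seek(loc.offset)
            return f.read(loc.length)

store = ToyFieldStore()
key = FieldKey(date="2020-02-01", time="00", step=3, param="t", level=850)
store.archive(key, "fields.dat", b"GRIB...encoded field bytes...")
print(len(store.retrieve(key)))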
We get some nice properties: the system is fully ACID. All the data are fully flushed, so we get guarantees that the data are stable even if we have a crash in the computing system.
Moreover, in a large computing environment we may have trouble, like a node dropping out, so we may need to rerun a computation, and we still want to guarantee that the data are fully accessible and reliable.
So we have a write-once policy: in case of a rerun, data are not overwritten, otherwise we risk having two instances of the data that are not fully consistent. Instead, we write to a new location,
and we only purge the space that is no longer required once the new write has completed.
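The same idea in a small, hypothetical sketch (names and structure are mine, not FDB's): a rerun never overwrites existing bytes; it writes to a fresh location, the index entry is switched to the new copy, and only then is the superseded copy queued for purging.

# Toy write-once store: reruns write to a new location and the old copy is
# purged only after the index points at the new one.
import itertools

class WriteOnceStore:
    def __init__(self) -> None:
        self.index = {}        # key -> path of the current copy
        self.to_purge = []     # superseded copies, space reclaimed later
        self._seq = itertools.count()

    def write(self, key: str, data: bytes) -> None:
        path = f"{key}.{next(self._seq)}.dat"   # always a fresh location
        with open(path, "wb") as f:
            f.write(data)
            f.flush()                           # data fully flushed before publishing

        old = self.index.get(key)
        self.index[key] = path                  # readers now see only the new copy
        if old is not None:
            self.to_purge.append(old)           # free this space later

store = WriteOnceStore()
store.write("t.850.step3", b"first run")
store.write("t.850.step3", b"rerun after a node failure")
print(store.index["t.850.step3"], store.to_purge)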
Traditionally the idea was that we produce the data, store them on the parallel file system, and then archive them to tape. With FDB, the whole process has been modified: the object store takes care of all the IO and then forwards the data to the parallel file system and to the tape archive, and eventually we will also provide a cloud consumer, which is really interesting for accessing the data
and performing computation close to them. For the geometric part: again, our grid is not user-friendly, in the sense that it is not the regular lat-lon grid that our users need,
so we have to interpolate to a regular one. Up to now we interpolate the whole layer, the whole globe, and only then do we crop the user-selected domain.
We do that because, using Lustre, we cannot byte-address the data, so we have to read the whole field and compute on that. We are now working on nicer data storage with byte addressability,
so that we can read just the subset of the data required for the interpolation, essentially the portion of the Gaussian grid needed to interpolate just the subdomain. This is still a work in progress, but hopefully it will see daylight in a few months.
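A minimal numpy sketch of the current interpolate-then-crop approach (the grid shape and bounding box are illustrative): the whole global layer is already on a regular lat-lon grid, and the user's region is only cut out at the very end, which is exactly the wasted work that byte-addressable storage is meant to avoid.

# Interpolate-the-globe-then-crop, in miniature: the whole regular lat-lon
# layer is in memory, and only then is the user's bounding box extracted.
import numpy as np

lats = np.arange(90, -90.25, -0.25)           # 721 rows, north to south
lons = np.arange(0, 360, 0.25)                # 1440 columns, 0 .. 359.75 east
field = np.random.rand(lats.size, lons.size)  # stand-in for an interpolated global layer

def crop(field, lats, lons, north, west, south, east):
    """Cut a sub-domain out of a regular lat-lon field that is already fully in memory."""
    rows = (lats <= north) & (lats >= south)
    cols = (lons >= west) & (lons <= east)
    return field[np.ix_(rows, cols)]

# e.g. a European domain (no dateline wrap-around handled in this sketch)
europe = crop(field, lats, lons, north=72, west=0, south=35, east=40)
print(europe.shape)   # (149, 161)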
Now, what can we do to give all users access to our archive? We are developing a cloud data service
that lets users run our query language remotely and access the data. The system is under development and is financed by a couple of European projects; I am paid by one of them, Lexis.
The idea is that the data are accessible even externally by simply hitting a REST API, either directly or from the European Weather Cloud that we are hosting. Everything is a nice RESTful API, with both a command-line interface
and a Python client that connects to it and supports all the queries that we offer. Through this service, Polytope, we give access to the archive and to the real-time data.
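As a purely illustrative sketch of what talking to such a REST service could look like: the endpoint path, payload layout and authentication below are hypothetical placeholders, not the documented Polytope API; the talk only says that a RESTful API, a command-line interface and a Python client exist.

# Hypothetical REST call to a Polytope-like data service; the URL and the
# payload keys are placeholders for illustration only.
import requests

BASE_URL = "https://polytope.example.int/api/v1"   # hypothetical address
TOKEN = "..."                                      # credentials obtained from the service

request = {
    "param":    "t/q",
    "levelist": "500/850",
    "date":     "2020-02-01",
    "step":     "0/to/240/by/3",
    "area":     "60/-10/35/30",
}

resp = requests.post(
    f"{BASE_URL}/requests/ecmwf-mars",             # hypothetical collection endpoint
    json=request,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())                                 # e.g. a handle to poll for the result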
There is a licensing issue at the moment, so essentially the data are not freely available; we sell the data for a living. But we are committed, thanks to an effort of our member states, to release our data, although it will take a few years.
More details on the exploitation of those data from the cloud will be given in a talk by my colleague John. We are available for questions if we have time. Thank you very much. Any questions in the back?
[Audience question, inaudible.] Okay. We have an open-source version of the model, that is OpenIFS, and we have open access to the archive for research.
The only thing that is not accessible for free is the real-time forecast; for research, having the data with a few days of delay is usually not an issue, and you can easily ask for research access and play with the high resolution
and whatever you want. And the project that I'm working on will have an open call for exploiting those data, so you can apply to the Lexis open call and ask for access even to the real-time data.
It will be limited to the lifetime of the project, but it is still something relevant and useful for playing with the largest weather archive available in the world. Okay.
Another one? Yes. Okay, how is the interface? Sure, sure.
The possibility to get the data on a regular lat-lon grid is already there, in the sense that the interpolation is a parameter of the query language. So you can select the desired resolution,
one degree, 0.25, whatever you want, down to 0.1 degrees, and essentially the interpolation is performed on the fly for you. So you can already get the data on the grid that you like without having to implement it yourself.
I'm going to cut it off here. Oh, sorry. You can carry on afterwards if you want. Thank you very much once again.