Open Meteorological and Climate Data

Formal Metadata

Title
Open Meteorological and Climate Data
Subtitle
Building Bridges between user communities!
Alternative Title
Free and Open Meteorological and Climate data - what is missing?
Title of Series
Number of Parts
295
Author
Contributors
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Copernicus offers, besides the well-known Sentinel satellite data, a wealth of domain-specific open environmental data sets, e.g. data on climate, wildfires, air quality and floods. One of the most popular data sets, useful for many environmental applications, is the climate reanalysis ERA5 produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). Improvements in the spatial and temporal resolutions lead to an increase of the entire data volume to up to 5 PB. In addition to the sheer amount of data, meteorological and climate data have a certain complexity, especially for “non-expert” users, as data can have up to five dimensions and two time dimensions. The current situation shows that a full, free and open data policy is one important prerequisite, but the key to fully unleashing the potential is making the data ‘accessible’. If open data is not accessible, it becomes open data that is locked away in large data silos. However, making meteorological and climate data “accessible” means more than just improving data access. It requires improvements and developments along the entire data processing chain, including the development of example workflows and reproducible training materials as well as developing and enhancing mainstream open-source software tools. In this context, the FOSS4G spirit is vital. This talk puts the spotlight on open meteorological and climate data. Current ‘accessibility’ challenges and future needs will be discussed in order to make open meteorological and climate data more accessible to everyone.
Transcript: English (auto-generated)
So let me introduce the first keynote speaker this morning. Julia Wagemann comes from the European Centre for Medium-Range Weather Forecasts. She's been working there for a few years and is currently doing a PhD on these same topics.
Julia, welcome. You have the floor. OK, good morning, everyone. I'm quite surprised to see so many of you after seven hours of open bar. So really good.
So yeah, my name is Julia. I am a visiting scientist at the moment at ECMWF. I'm also doing a PhD at Marburg University. And for the next 20 minutes, I would like to draw your attention to meteorological and climate data, because they're openly available, and they're ready for you to use. And I also would like to use the opportunity
to talk about some aspects of open data. But before I start, who actually knows what ECMWF is, the European Centre for Medium-Range Weather Forecasts? Please raise your hands. It's really hard to see, but OK.
OK, so a few. OK, that's already good. So ECMWF is primarily a numerical weather prediction center. We provide weather forecasts to national meteorological services. But we also operate two Copernicus services, the Copernicus Climate Change Service
and the Copernicus Atmosphere Monitoring Service. And because of these two services, we now also have a full range of open environmental data available. We have, for example, data on climate. The most popular data set is the ERA5 reanalysis, which was published just at the beginning of this year.
It is hourly data on a 30-kilometer spatial grid. So far it goes back to 1979, but it will soon go back even to 1950. And we also have seasonal forecast data, monthly forecast data, up to six months ahead. But we not only have climate data,
we also have air quality data. And this is quite recent, because of the wildfires in the Amazon area. So this is a biomass fuel index, which we see here. And so it's really helpful to monitor the impact of what's actually happening there.
And we have different parameters on air quality: on ozone, on carbon dioxide, on nitrogen dioxide. And we have reanalysis data. They go back to 2003. But we also have forecast data, three-hourly forecasts, up to five days in advance. But that's not everything.
We also have data on fire danger. We provide a series of fire danger indices, for example, the fire weather index or the fine fuel moisture code. And they help you to better assess the possibility of fire occurrence somewhere. And last but not least, we also
have very good data on flooding, specifically river discharge information. We have daily forecast data for that on a 10-kilometer spatial grid. And we also have a reanalysis going back to 1981. So we have these.
And the good news is that these data are all fully, freely, and openly available under Copernicus. That's exciting, isn't it? But in preparing the talk, I was thinking about how some people had already asked me, OK, what does it actually mean, full, free, and open?
And I looked up a definition of open data. And I found a definition from the European Data Portal. And what I found very interesting is that they said that aspects like format, structure, and machine readability make the data more usable.
But they also said that these do not make the data more open. And I actually want to turn this question around today and ask: do aspects of non-interoperability, so that data are not easily exchangeable between systems, or data complexity,
so the structure of the data is complex, we have complex metadata, it's hard to understand, and community-specific formats, do these aspects make the data less open? So when I joined ECMWF, I was involved
in a project where we knew that the open data is growing in volume. And it's getting harder and harder to actually download the data. And so we investigated how we can actually provide better on-demand access to the data
based on data standards. So we implemented a web coverage service for climate and meteorological data. And this is what my day-to-day work looked like, because I tried to fit meteorological and climate data into standards and also into a technology which
was primarily developed for Earth observation data and satellite images. But it worked. So we also see the potential of web coverage service standards. But it also showed me that when working with different partners in the project,
despite the fact that we speak the same language and also use the same terms, it often doesn't mean that we actually have the same meaning for these terms. And so I would like to share with you today five facts about the meteorological and climate community to better understand us.
So the first fact is big data is not a new term. So if we define big data just by the sheer amount of data, ECMWF is quite experienced in handling, storing, and archiving large volumes of data. We have the Meteorological Archival and Retrieval System.
It's called the MARS archive. And in the MARS archive, we have more than 250 petabytes of data stored. And it's the largest archive of meteorological data worldwide. The second fact is operational means really operational.
There are a lot of services and projects out there that claim to have an operational service by offering data support within 24 hours, only on workdays. For ECMWF and the meteorological community, operational means it has to be up and running 24 hours
and seven days a week. Because forecasts can save lives. So it's very vital to disseminate this data and information in time. The third fact is we talk about fields and grid points, not bands and pixels. So the forecast data are produced on an octahedral grid.
And so forecast data are basically just valid for one specific grid point. And if you retrieve data on a regular latitude-longitude grid, it's important to know that the data between two grid points have been interpolated. And the real forecast value is just
valid for this specific grid point. The fourth fact is standards and interoperability are not new terms. So we have a common data format the meteorological community likes. It's the GRIB format. And this is a very efficient data format to exchange data, meteorological and forecast data,
between meteorological organizations. But it can mean that people who are not familiar with the GRIB format have to invest some time to actually better understand it and how to handle it.
But for expert users and for national meteorological services, it's not a big deal, and this brings us to fact five. We like custom-built software developed in-house, because this helps us to tailor the software specifically to our needs and also to make the handling of GRIB data very efficient.
And for ECMWF core users and expert users like trained meteorologists, employees from national meteorological organizations, it's not a big deal. And it's very good because many work on Linux
workstations. But we also see a trend now with Copernicus. There is a large user community that is actually interested in the data. And this can lead to a problem. And this is also what the European Commission found out.
They published a report, the Copernicus Market Report, at the beginning of this year. And they state that the number one trend is that they see a diversification of users and their demands. But the definition of users is quite arbitrary, I would say. Probably if I ask 10 people here who is a data user,
I will get 10 different responses. There are different terms out there for what users can be. So there can be an end user. There can be decision makers, intermediate users. But the problem with users, if we
try to actually match them to the level of the geospatial processing chain they probably belong to, so the pyramid going from raw data at the bottom up to pre-processed data and then to generating more information, is that it's very difficult to actually match them.
And it also probably depends on the level of experience where you would put specific users. Compared with someone who has a lot of experience using geospatial data and developing software, someone who just develops workflows in Python would maybe be put as an intermediate user.
But compared with someone who has just heard about geospatial data and uses GIS systems, someone who already develops workflows in Python is already much more advanced.
So the point I want to make is that it's very vital to actually understand who the users are. Because then systems and also the data can be better opened up. But at the same time, it's also very challenging. So there are a few challenges. If we go back to meteorological and climate data,
new data users might face some challenges. And one of the most important challenges, I would say, is data complexity. So we all know about three-dimensional data, but with meteorological data, we can even go up to four dimensions
and then even five dimensions. If we talk about ensemble data, we don't trust one forecast at one specific time. We actually run the model 51 times and then generate the mean out of it. So this is a much more reliable forecast.
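The five dimensions and the ensemble mean described here can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the dimension order (member, forecast step, level, latitude, longitude), the tiny dimension sizes other than the 51 members, and the synthetic temperature values are all made up for the example and are not taken from any ECMWF product.

```python
import random
from statistics import fmean

# Hypothetical dimension sizes; only the 51 ensemble members come from the talk.
N_MEMBERS, N_STEPS, N_LEVELS, N_LAT, N_LON = 51, 2, 3, 4, 5

rng = random.Random(0)  # fixed seed so the synthetic values are repeatable


def synthetic_member_field():
    """One ensemble member, indexed as [step][level][lat][lon] (made-up values in K)."""
    return [[[[280.0 + rng.gauss(0.0, 1.5) for _ in range(N_LON)]
              for _ in range(N_LAT)]
             for _ in range(N_LEVELS)]
            for _ in range(N_STEPS)]


# Fifth dimension: the ensemble member index is the outermost list.
ensemble = [synthetic_member_field() for _ in range(N_MEMBERS)]


def ensemble_mean(ens):
    """Average over the member dimension, keeping step, level, lat and lon."""
    return [[[[fmean([member[s][l][y][x] for member in ens])
               for x in range(N_LON)]
              for y in range(N_LAT)]
             for l in range(N_LEVELS)]
            for s in range(N_STEPS)]


mean_field = ensemble_mean(ensemble)
print(f"members: {len(ensemble)}, mean at first grid point: {mean_field[0][0][0][0]:.2f} K")
```

The mean collapses the member dimension, so the result is a four-dimensional field again. In practice an array library would do this far more efficiently than nested lists; the nesting is kept here only to make each dimension visible.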
So data complexity is a challenge for new data users. I conducted a user requirement survey at the beginning of this year, because I'm interested in who the users are, what tools they use at the moment, but also what challenges they face. And the four biggest challenges they identified
are limited processing capacity, growing data volume, that data are disseminated in a non-standardized way, and that there are too many platforms and portals, so that users are just confused: they don't know where to find data or how to retrieve the data.
So growing data volume is probably not a big surprise. But these challenges are very important to recognize and to overcome, because if we don't specifically address them, then we're just continuously aggregating more and more data. And we have open data available in large data silos,
but it's actually open, locked data, because no one can find them, no one can access them, and no one can use them. So there has been kind of a competition, I feel, at conferences, where people say, oh yeah, we generate this many terabytes
of data every day, and oh no, we generate even more terabytes of data every day. And I would like to turn it around and encourage everyone, rather than thinking, okay, we generate so much data, to also ask, okay, how much of this open data that is actually produced is used?
And to make the data usable, one specific prerequisite is to give access to the data. And there is a very important shift we have to make, from pure download services to more on-demand data services.
And so we have a range of different types of data access for the data I introduced at the beginning, but most are still download services. But now with Copernicus and also with the setup of the Climate Data Store,
there is a path towards more on-demand data services. For example, the Climate Data Store Toolbox will, similar to Google Earth Engine, have an online code editor. You can directly access the data. You can generate your workflow in Python, and then you can build a web application
or a visualization, and you can just save your plot. But it's also important to keep in mind that we have different user communities, and different user communities often use different systems.
So for example, Google Earth Engine is used a lot by the Earth observation community. There's an example from the World Food Programme. They are developing at the moment a flood prediction model on Google Earth Engine. But they are also actually interested in using ERA5 data to make the model better.
But the problem is, okay, how do you actually bring it together? You can try to ingest the data, but if the systems are not interoperable, it can be challenging. And that's why we decided, and I also believe,
that as long as the data systems are not interoperable, mirror archives of data can also be bridges between different user communities. And so I've been working on making a small subset of seven ERA5 parameters available on Google Earth Engine. But the process of making them available is a very good example of how systems are not interoperable.
Because in order to make data from one system, the Climate Data Store, available on Google Earth Engine, we have to download the data in GRIB or NetCDF. We have to convert the data to GeoTIFF. We have to upload them to Google Cloud Platform, and then we have to ingest them into Google Earth Engine.
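The four steps just listed can be sketched as a chain of stages, each consuming the output of the previous one. This is a hypothetical outline, not the workflow actually used for the mirror: every function, path, bucket and collection name below is invented for illustration, and the stage bodies only track where the data would live instead of calling any real download, conversion or upload tool.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Artifact:
    """Where one piece of data currently lives, and in which format."""
    location: str
    fmt: str


def download(parameter: str) -> Artifact:
    # Stage 1 (placeholder): fetch the parameter from the source archive as GRIB.
    return Artifact(f"local/{parameter}.grib", "GRIB")


def convert_to_geotiff(src: Artifact) -> Artifact:
    # Stage 2 (placeholder): format conversion, needed only because the two
    # systems do not share a common data format.
    stem = PurePosixPath(src.location).stem
    return Artifact(f"local/{stem}.tif", "GeoTIFF")


def upload_to_bucket(src: Artifact, bucket: str) -> Artifact:
    # Stage 3 (placeholder): copy the converted file to cloud storage.
    name = PurePosixPath(src.location).name
    return Artifact(f"gs://{bucket}/{name}", src.fmt)


def ingest(src: Artifact, collection: str) -> Artifact:
    # Stage 4 (placeholder): register the uploaded file in the target platform.
    stem = PurePosixPath(src.location).stem
    return Artifact(f"{collection}/{stem}", "ingested asset")


def mirror(parameter: str, bucket: str, collection: str) -> list[Artifact]:
    """Run all four stages and return the artifact produced by each one."""
    downloaded = download(parameter)
    converted = convert_to_geotiff(downloaded)
    uploaded = upload_to_bucket(converted, bucket)
    ingested = ingest(uploaded, collection)
    return [downloaded, converted, uploaded, ingested]


# Hypothetical names, used only to show the hand-offs between stages.
for step in mirror("2m_temperature", "example-bucket", "example/era5"):
    print(f"{step.fmt:>15}  {step.location}")
```

Each hand-off in the chain is a point where formats, storage locations or metadata have to be translated by hand, which is the interoperability gap the talk describes.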
And so the entire process of making seven ERA5 parameters available, around four or five terabytes of data, took around nine months. And this brings me to the needs. And one important need is, yes, systems have to interoperate with each other
and data have to be easily exchangeable. And this can be achieved with standards. But it has to go beyond web mapping services, because we are data users, and with large volumes of data we want to have real access. So we need more web coverage services
and web processing services. The second need is, yeah, we have to make it easier to handle and process the data with the tools users use. And Python and R are at the moment the languages data scientists use. And so, yeah, we need good and handy wrappers
and drivers to work with the data. And ECMWF has put a strong focus on making this easier and also on bringing the custom-built software to the Python world, specifically in the last year. And the third need is,
if we have these tools and the packages and the data, we also have to show how we can actually efficiently put everything together. So we have to generate reproducible workflows and we have to train data users. If you would like to discuss and talk about reproducibility,
please have a look at the reproducibility workshop at ECMWF. We will dedicate three days to this topic in mid-October. And so now, yeah, the question is, okay, where are we going? And it's not a surprise. Yeah, cloud is the future.
So ECMWF is involved in quite a few cloud projects at the moment. But this also shows that there are still a lot of question marks about how data services can be set up, also from the perspective of data providers. But the good thing is, in the user survey I also asked users if they would be motivated
to migrate to cloud services, and 68% were interested or even very interested in migrating to cloud services to do their processing there. But we have to keep in mind that just because we have a new system,
we have cloud services, and it might be beneficial for users, it doesn't necessarily mean that users also use the systems. And so cloud is a paradigm change, and we have to keep in mind that change always takes time. So to conclude this talk, I would just like to give you three take-home messages.
And the first one is a quote from Albert Einstein saying, problems cannot be solved by the same level of thinking that created them. So I just wanted to say that, yeah, we have to keep in mind in our day-to-day work, rather than working on what is possible now, to think about what we want to have in 10 and 20 years
and work towards this path. The second take-home message is, and I think it's less of a problem here in this community, make reproducibility and sharing a part of your personal code of conduct. So see it as a chance rather than a hassle.
And the third one is, train others, share your skills and also your knowledge. So what we discuss during these days here and also at other conferences, we have to bring to universities, we have to bring to companies and public authorities, to really train the next generation of geospatial practitioners.
And I liked the resemblance from Vasily yesterday. So yeah, and I want to conclude with: just continue building bridges, and I wish everyone an inspiring FOSS4G. Thank you.