The Copernicus Data Store (CopDS) - a reimagining of the Copernicus Climate Change Service (C3S) Climate Data Store (CDS)

Formal Metadata

Title
The Copernicus Data Store (CopDS) - a reimagining of the Copernicus Climate Change Service (C3S) Climate Data Store (CDS)
Title of Series
Number of Parts
351
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2022

Content Metadata

Subject Area
Genre
Abstract
The Copernicus Climate Change Service (C3S) Climate Data Store (CDS) is a single point of access to a wide range of free, quality-assured climate data, along with a suite of tools for performing cloud-based analysis and visualisation of very large datasets. Launched in 2018, the CDS provides over 100 datasets and 30 interactive applications for a global, interdisciplinary and intersectoral audience of over 100,000 users. The Copernicus Data Store (CopDS) project aims to reimagine the CDS, making use of modern technologies and knowledge gained during the development of the existing system to expand and streamline its functionalities and improve its performance and scalability. We present a high-level blueprint of the in-development CopDS, with an emphasis on how we plan to overcome the limitations of the original CDS. We explore our plans for the development of a new suite of open-source Python tools for performing retrieval, analysis and visualisation of climate and atmospheric data under the CopDS project, along with our plans for offering free cloud-based infrastructure for processing and visualising very large datasets through an easy-to-use Python web interface. We also discuss the development of tools for transforming simple Python code into high-quality web applications for exploring CopDS climate and atmospheric datasets, providing tools for interactive mapping, graphical user interfaces and a results cache for responsiveness.
Keywords
Transcript: English (auto-generated)
Thank you very much. Hi, everyone. My name's James Varndell. I work at ECMWF. And I'm going to talk about the Climate and Atmosphere Datastore. Apologies for the very wordy title with three very similar acronyms. So hopefully, if the talk is a success, you will understand what all these things are by the end. But before we get onto that, I'm
here from ECMWF, the European Centre for Medium-Range Weather Forecasts. For those of you who don't know us, we're an intergovernmental organization. We have 23 member states and 12 cooperating states. And I think we now have over 400 staff across sites in Reading, Bonn, and Bologna. We are a 24/7 operational numerical weather prediction
center supporting national weather services and businesses. And we are a research institution researching improvements to our weather and climate models along with climate reanalysis and reforecasts. And we currently operate two EU Copernicus climate services. So the Climate Change Service, C3S, the Atmosphere Monitoring
Service, CAMS, and we also support the Emergency Management Service, CEMS. So Copernicus, if you don't know, is the European Union's Earth Observation Program. So it's all about taking data from a variety of sources, including satellite data, in-situ observations, but also model data from climate and seasonal forecast
models. Ingesting these into the six core Copernicus services, including the Climate Change Service and CAMS. And then producing downstream services, such as applications, bulletins, reports, for our very wide range of interdisciplinary users, which include scientists, software developers,
but also policymakers, journalists, and the general public. So I'm going to focus on the Climate Change Service today. So C3S provides information about the past, present, and future climate, along with tools to support adaptation and mitigation policies. So we support a wide range of sectors,
from water management and agriculture, down to tourism, biodiversity, and many more. And we also issue monthly climate bulletins, which present the current state of the global climate every month, based on key climate change indicators. And we also operate the Climate Data Store, the CDS, which is what I'm going to focus on today.
So the CDS is an online open data catalog offering a wide range of climate change data sets. And the core principle is that it is a simple and relevant catalog that makes it as easy as possible to discover and access climate change data. We also offer, on top of the data, a wide range of online tools for analysis, visualization,
and also some interactive web applications as well for exploring graphically the data that we have available in the CDS. And the hope is that this will enable reproducible research and empower our users to spend less time handling the data and help them get to their results much, much quicker.
So there are, I think, nearly 140 data sets in the CDS now. But there are four main kinds of data sets. These include observations, including satellite observations and in-situ observations, climate reanalyses, which combine observations with models to generate consistent time series.
So if any of you are familiar with ERA5, that's the flagship reanalysis data set generated at ECMWF, which has hourly reanalysis back to 1950. And that's the most popular data set in the CDS. We also broker seasonal forecasts from seasonal prediction systems from around the world.
And as these forecasts are updated, the latest data is updated in the CDS. So you can always get the latest information. And also, climate projections data. So these are models which project future climate change for different greenhouse gas concentration scenarios. And if any of you are familiar with the CMIP projects, CMIP5, CMIP6, and CORDEX are all data sets
that you can access through the CDS. So the core design principles of the CDS, it's a distributed system made up of sort of six core components. The most important thing is, obviously, the data. So we have lots and lots of different data suppliers. And they all provide us data in many different formats through many different interfaces
and different endpoints. So some of them are accessible over HTTP, some of them using OPeNDAP, some of them using web processing services, all sorts of things. And also, they're all in different file formats. We broker data in CSV, in GRIB, NetCDF, TIFF, shapefiles, all sorts of things. But crucially, we want to provide our users
a unified web interface that takes away any of the hassle of understanding all the different data suppliers and having just a single place they can go to access their data. And we also want the data, wherever possible, to be interoperable. So we have what we call a Common Data Model, or CDM, which we use to standardize or harmonize
the data from our various suppliers so that they can be understood by our compute tools, which we make available to users to analyze and visualize the data sets. So the CDS has been operational since 2018, and we have over 140,000 registered users. We currently have 139 data sets in our catalog,
and we serve over 100 terabytes of data daily, and sometimes up to a petabyte a week. And we've recently migrated the CDS to our new data center in Bologna, which we were lucky enough to visit on Monday this week. That's a photo of my colleague, Eddie, inspecting the system on Monday. And we also have 34 interactive web applications,
which are nice GUI interfaces to explore data that's available in the CDS. This is usually where I do a quick demo. So apologies for the slightly wonky PowerPoint animations. So when you go to the CDS, you can search for data using a faceted search, so you can ask specifically for different spatial domains,
different variable domains, temporal coverage. Or you can search for something specific, if you know what you want. So in this case, I've searched for the ERA5 reanalysis data set that I mentioned earlier. And when you click on a data set, you're presented with this overview. So this explains what the data set is, gives you a description of its temporal resolution,
coverage, spatial coverage, all of the important information you need to know. And then you can click on this Download Data tab at the top, which produces a web form. And this is where you can specifically decide exactly which data you're interested in downloading. So you can subset each data set to pick the variables, the time steps, the spatial coverage that you're
interested in looking at. So in this case, for the ERA5 reanalysis data set, we have 2 meter temperature from the reanalysis. And if I could scroll down, you would see that you can also select the temporal coverage you're interested in, the years, months, and days. And then click on a Submit button at the bottom, which will fire up your request.
And when it's done, give you a nice URL to the data that you've requested. We also have a quality assessment of our data sets. So these are assessed by an independent evaluation and quality control team. And essentially, they assess the technical and scientific
quality of each data set. So you can read about all of that under this tab. And then we also have documentation if you are interested in the finer detail, more information about any of the data sets and how they were generated. So that was just for the ERA5 data set. But crucially, for all of our data sets, we have the same simple and consistent interface.
So here we have a seasonal forecast data set, satellite observations, and climate projections data sets. And in each case, we have a nice, clean, easy to use web form hiding away all of the behind the scenes stuff about all of our different data providers and different endpoints. And I've also put here, we also have a Python API, which you can use to access our data.
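As a rough illustration of the Python API mentioned above: the client is distributed as the `cdsapi` package, and a request is a dataset name plus a dictionary of selections like those made in the web form. The sketch below only builds such a dictionary. The helper name `build_era5_request` is hypothetical, and the exact request keys are an assumption here; they should be taken from the request that the web form generates, since they can differ between data sets and versions of the service.

```python
# Sketch only: builds a request dictionary of the kind the CDS web form
# can translate into a Python API call. Nothing is downloaded here.

DATASET = "reanalysis-era5-single-levels"  # ERA5 single-level data set name


def build_era5_request(year, month, day, hour="12:00"):
    """Return a request dict for 2 m temperature at one time step.

    Hypothetical helper for illustration; the keys mirror a typical
    web-form request and should be checked against the form output.
    """
    return {
        "product_type": "reanalysis",
        "variable": "2m_temperature",
        "year": f"{year:04d}",
        "month": f"{month:02d}",
        "day": f"{day:02d}",
        "time": hour,
    }


request = build_era5_request(2022, 8, 25)
```

With `cdsapi` installed and credentials configured, a request like this could then be submitted with `cdsapi.Client().retrieve(DATASET, request, "download.grib")`, where the target file name is arbitrary.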
And if you fill in one of our web forms to select the data you're interested in, we always have a button at the bottom, which lets you translate that into a Python API request, which you can then copy into your Python session and make that request from there. So then on top of all this data, we also have the CDS toolbox. So the toolbox is essentially a simple Python editor
that lives in your web browser, which you can use to retrieve and analyze and visualize all of the data in the CDS. So we have a wide range of tools available, which you can see down the left-hand panel here for doing anything from statistical analysis to producing interactive web maps.
And the full catalog of data sets is available directly in the Python API. So you can do all of this in the browser and no downloads are needed. So I wanted to give a quick example here. So I don't know if you can actually read that very well. But in this particular case, I'm retrieving 60 years of data for Florence from the ERA5 reanalysis
data set and producing climate stripes, which show how much hotter or colder each year was than the long-term average. And we can also produce configurable GUI elements. So you can choose which month of the year you're interested in looking at, in this case, August. And you can see that the past few Augusts
have been a little bit hotter than usual in Florence. And we have a wide range of interactive visualizations you can produce, including climate stripes, but also interactive maps and all sorts of other things which I'll show in a bit. And this is also supported by back-end caching. So I mentioned this is using 60 years' worth of ERA5 data. That kind of request is quite big
and takes a long time to run. But if you run it once and then run the same request again, the result should be cached so that you get it almost instantly. And once you develop these sorts of applications in the toolbox, you can share them with other CDS users. And we also have a help and support team who are always available to help with any issues you have
along with comprehensive documentation and examples and an application gallery. Just one other very brief example, similar sort of thing, but this time producing an interactive web map. This one is looking at 30 years of ERA5 reanalysis over the entire globe for the 25th of August, i.e.
today. So for the last 30 years, look at the average temperature on this date every year over the past 30 years. And I just wanted to demonstrate how easy it is to go from our data straight into an interactive web map result using a really simple Python API. And this is all supported by skinnyWMS, which you would have heard about if you attended my colleague
Eddie Rosert's talk yesterday. And if you didn't, you can scan that QR code. Hopefully it will take you to his talk page and you can view it once it becomes available. But the key message here is that we have data discovery, analysis, and visualization all through one single online platform. So here's just a few more quick examples of the sorts of features we have available.
So we can produce interactive maps with additional features like sliders and click events. So clicking on, say, a region in Europe, you can then produce a time series or do a data extraction at a specific point. We also provide feature-rich and configurable graphical elements.
So here's some more complex layouts of applications. And all of these layouts are fully configurable in the Python API in the toolbox. And we also have many, many flexible and customizable visualization tools, including things like climate stripes that I showed earlier, but also bar charts, line plots, all sorts of static plots and maps.
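The idea behind climate stripes, as described in the talk, is to show how much hotter or colder each year was than the long-term average. A minimal sketch of that calculation follows, using only the Python standard library. It is not the CDS toolbox implementation: the function names and the simple red/blue colour ramp are assumptions made for illustration, and the temperatures are made-up numbers, not ERA5 values.

```python
from statistics import mean


def stripe_anomalies(yearly_means):
    """Return {year: value minus the mean over all years}."""
    baseline = mean(yearly_means.values())
    return {year: value - baseline for year, value in yearly_means.items()}


def stripe_colour(anomaly, scale):
    """Map an anomaly to a hex colour: red if warmer, blue if colder.

    `scale` is the largest absolute anomaly; larger anomalies give a
    more saturated colour, and a zero anomaly gives white.
    """
    strength = 0.0 if scale == 0 else min(abs(anomaly) / scale, 1.0)
    fade = round(255 * (1 - strength))
    if anomaly >= 0:
        return f"#ff{fade:02x}{fade:02x}"
    return f"#{fade:02x}{fade:02x}ff"


# Made-up August mean temperatures in degrees C, for illustration only.
august_means = {2018: 24.0, 2019: 25.0, 2020: 26.0, 2021: 27.0}

anomalies = stripe_anomalies(august_means)
scale = max(abs(a) for a in anomalies.values())
stripes = {year: stripe_colour(a, scale) for year, a in anomalies.items()}
```

Each entry in `stripes` is one vertical bar of the plot; drawing them side by side in year order gives the familiar striped image.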
So we have a catalog of applications on the CDS website, which are all produced in the CDS toolbox. So if you go to the CDS website, you can see this Applications tab and have a browse. There are currently, I think, 35 published applications. And they all have a source code tab available in each one.
So you can have a look at how they were developed. And you can even copy that source code into your own toolbox space and run them yourself, play with them, have a go. So I just wanted to show a couple of examples of applications. If you scan these QR codes, in theory, it should take you straight to the application. But they're not all fully designed for mobile.
So you might need to play around with the landscape and portrait to get it working. But this is an example of an application looking at 40 years of ERA5 reanalysis over the globe, where you can click on a point. And it will show you the typical climate at that point around the world.
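One common way to summarise the "typical climate" at a point is a monthly climatology: the average of each calendar month over all available years. The sketch below shows that calculation under stated assumptions. It is not taken from the application's source code, the function name `monthly_climatology` is hypothetical, and the input records are made-up values standing in for a time series extracted at the clicked point.

```python
from collections import defaultdict
from statistics import mean


def monthly_climatology(records):
    """Average values per calendar month across all years.

    `records` is an iterable of (year, month, value) tuples.
    Returns {month: mean value}, ordered by month.
    """
    by_month = defaultdict(list)
    for _year, month, value in records:
        by_month[month].append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Made-up monthly mean temperatures in degrees C, for illustration only.
records = [
    (2020, 1, 4.0),
    (2021, 1, 6.0),
    (2020, 7, 24.0),
    (2021, 7, 26.0),
]

climatology = monthly_climatology(records)
```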
We have another application looking at how close we are to reaching the global warming limit of 1.5 degrees C agreed under the Paris Agreement and how the rate at which we're approaching that limit has changed over the past months and years. So this one's a very different kind of application with a very focused message.
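To make the idea of a "rate at which we're approaching that limit" concrete, one simple approach is to fit a straight line to recent warming values and extrapolate to the year in which the line reaches 1.5 degrees C. The sketch below does this with an ordinary least-squares fit. It is an illustration only and is not claimed to be the method used by the application; the function names are hypothetical and the warming values are made up.

```python
def linear_trend(years, values):
    """Ordinary least-squares fit; returns (slope per year, intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def year_reaching(limit, slope, intercept):
    """Year at which the fitted line reaches `limit`, or None if not warming."""
    if slope <= 0:
        return None
    return (limit - intercept) / slope


# Made-up global warming values in degrees C above a pre-industrial baseline.
years = [2000, 2010, 2020]
warming = [0.8, 1.0, 1.2]

slope, intercept = linear_trend(years, warming)
crossing = year_reaching(1.5, slope, intercept)
```

Refitting the line as new months of data arrive changes `slope`, and with it the estimated crossing year, which is the kind of change over time that the application is described as showing.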
And then we also have applications which are aimed at specific sectors, the sectors that I mentioned earlier. So this one's for the health sector, looking at heatwave days and heat-related mortality for nine European cities. So for this particular application, we can look at Rome and how the number of heatwave days is expected to change under climate change
and the effect this could have on the number of heatwave-related deaths per year. And I should stress again that all of these applications are fully developed in the CDS toolbox. And all of the data sets behind them are fully available in the CDS catalog. So finally, I just wanted to mention a few successes and challenges that we've had and what
our plans are for the future. So all of these applications I've shown you are provided using bespoke services, which provide powerful cloud processing. This all happens online. But it requires quite a steep learning curve to fully utilize. So although we are using standard tools like xarray, pandas, and friends behind the scenes,
you can't actually make use of those tools directly in the workflows we develop in the toolbox because you have to use our service-based infrastructure. We have a very powerful broker, which allows us to process requests for data sets, toolbox workflows, and applications. But it does mean that they all compete for the same resources, which means if, for example, we publish a new application that becomes very popular,
that could have an impact on the performance of standard data retrieval requests. And also, we have our always online platform. So you have to develop toolbox workflows in our online editor, which means you don't need to download any big data. And you can work anywhere in the world on anyone's computer as long as you have your login details.
But it's not then possible to work offline. So you can't then copy your toolbox workflow into your local Python session and then run it alongside your other code. So the Climate and Atmosphere Datastore, CADS project, will modernize the CDS. So we've added an A in there, crucially. So the Climate and Atmosphere Datastore
will incorporate the C3S Climate Datastore and the CAMS Atmosphere Datastore. So we're incorporating atmospheric data into this new version as well. And there are a few things that we want to do. So we want to move towards a more modular and open source code base that's fully compatible with scientific Python. So moving away from our fully service-based infrastructure
and offering users the ability to run things offline using the tools they're familiar with, like xarray, pandas, and the like. We want to introduce improved interfaces to our ever-growing catalog, including the ability to interrogate the availability of data and the status of the service from the Python API. We want to move towards more infrastructure independence,
by which I mean we want to have applications hosted using different infrastructure to everything else. So if we have, for example, a popular application, it doesn't have any impact on normal users downloading data through the normal CDS catalog. We want to have more platform flexibility, which sort of ties in with the modular open source code base
I was just talking about. So by this, I mean the ability to run online in the cloud as we do now, but also on your local laptop. And we want to move our online text editor interface more towards Jupyter so that our cloud resources and web interface can be underpinned by a more familiar interface.
But crucially, we want to keep our wide range of analysis, visualization, and GUI tools in line with the original CDS so that everything that we can do at the moment, we can still do, just hopefully in a more flexible and open way. So thank you very much, and we should have time for questions. Thanks.