
xcube as a platform for spatiotemporal data analysis and visualization (Python tutorials)


Formal Metadata

Title
xcube as a platform for spatiotemporal data analysis and visualization (Python tutorials)
Title of Series
Number of Parts
17
Author
License
CC Attribution - NonCommercial - NoDerivatives 3.0 Germany:
You are free to use, copy, distribute and transmit the work or content in unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Producer
Production Place
Wageningen

Content Metadata

Subject Area
Genre
Abstract
xcube is an open-source xarray-based Python package and toolkit that has been developed to provide Earth observation (EO) data to users in an analysis-ready form. xcube achieves this by carefully converting EO data sources into self-contained data cubes that can be published in the cloud. In this session you will learn about the ecosystem around xcube, which allows you to access different data sources and turn the inputs into data cubes. These data cubes can then be easily used for spatiotemporal data analysis and visualization. After a brief introduction to the software components, we will go step by step through some example Jupyter notebooks, and finally we will dive into a hands-on session with a little challenge. For the session you will need a laptop with an internet connection, some basic knowledge of Python, and an existing installation of Miniconda (https://docs.conda.io/en/latest/miniconda.html), which is used to download the necessary Python packages for the session. Prior experience with Jupyter notebooks will be helpful, but not mandatory.
Transcript: English (auto-generated)
Hi everyone, thanks for joining today. I'm Alicia, from Brockmann Consult, a company in Germany. I'll give you a short overview of the company in a moment. My background is geoinformatics, and today we're going to have a look into xcube, which is an open-source ecosystem for Earth observation data, data cubes actually, and it's based on Python. And I would like to ask you: how many of you are familiar with Python, at least briefly?
Okay, perfect. Looks very nice. And are you familiar with Jupyter notebooks? Have you used them before? Cool. Very nice. Thanks a lot. So Brockmann Consult is a small company. It's located in Hamburg, and we develop open-source software for the exploitation of
environmental data. So not just earth observation data, but also environmental data in general. And we provide consultancy and geo information services to the public and private customers. And we have a diverse customer base. So among them, we have large space agencies
and European institutions. And as you can see in the image, we have different sectors a bit. So we have the software development part. So we are not a pure software development
company, but we also provide geo information services, as well as we have in-house expert knowledge, who then give advice to environmental managers or to the public. So yeah, just to give you an idea about who we are. And actually the company evolved from remote sensing research. So
the start of it was remote sensing researchers who then evolved into this more software company in the end. Then maybe some of you have heard about the software SNAP. So it's Sentinel
Application Platform, which we are lead developers since 2014. And it's one of the standard software for processing and analysis of earth observation data. And the current version is 9.0. And it's it's made for visualization analysis. Then you can do graph processing with it,
reprojecting, autorectification, just to name a few. And we have a large user community. So in case you're using it and you want to or you want to use it, there's a good forum, you might find the answer to your question. But today is not about SNAP. I just wanted to say that we are also part
of this. Today is going to be about xcube. Here's what you can expect from today's session. We will have two parts. The first part will be input, a little bit about the software; I'll show you the components. And later on, in the second part, depending on when we're through with the input, I'll have a little task for you, and we'll see how far you get using xcube yourself. In the first part, you can expect to hear a bit about the beginnings of xcube. We will look at why we developed xcube and still develop it. I'll be talking about data cubes a lot, and I already heard about data cubes in another session, so I'll give you a data cube definition as we understand it. Later on, we'll see a typical workflow. Once you've heard a bit about the components, I'll show you an overview of where they're located, and then we will look in more detail at three different components of the ecosystem. And in the end,
a short demo with Jupyter notebooks. So in the beginning, there were scientists who had the desire to tackle environmental questions with conveniently accessible data and to make use of the vast number of data sources, which are growing every day. They also wanted to visualize and share the processed data to communicate the information derived from these observations. And if you are a data scientist or working with data, most of your time is spent cleaning up the data; that's also what we do a lot, and what xcube tries to help with. So the purpose and ambition is to provide convenient access to as many gridded data sources as possible, which then allows you to generate analysis-ready data cubes from these different sources. xcube provides high-level tools and functions to exploit and manage these data cubes, and also to serve the data. We serve the data directly from Zarr, the data format which I'll briefly introduce in case you don't know it yet. And then the next step is the visualization part. xcube is a Python package, and it leverages the popular Python data science stack, xarray and Dask. So they are the base
of xcube. So what's our data cube? Here you see a very small data set: this is our data cube. It's a temperature data cube, and it has three dimensions, time, y, and x, and each of the dimensions has a shape. This data cube contains two variables, sea surface temperature and land surface temperature. Both are data arrays, both have the three dimensions time, y, and x, and they share the 1D coordinate variables. So both of them share the same time, the same x and y; there is no data duplication or anything. And if you have a data cube, you can be sure that the data is for the same location, no matter which variable. Then, about two or three years ago, we started to use the Zarr format, which is a cloud-optimized format. It's a file storage format for chunked and compressed n-dimensional arrays, based on an open-source specification. Have you seen this image before, maybe if you're used to xarray? They have it in their logo. And what Zarr does, you see here these little boxes. Each big box is, say, all the latitude values, and then you have the little boxes, which are chunks, and each of these chunks is saved in one file. So here we have longitude; it's roughly five boxes down and five boxes wide, so you will have 25 files storing just that information, and the same for latitude. And precipitation is also chunked in time, so there, each of these boxes will be stored in one file. It might sound a bit crazy to store the data in so many files, because it sums up quite quickly, but it's very convenient, for example, for cloud processing. When you have a data cube saved as Zarr in the cloud, you just request the files you need, not the whole data set, just the chunks that you need. And this reduces transmission time and also allows you to make fewer requests on the data.
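The chunk-to-file bookkeeping described above can be sketched in a few lines of Python; the shapes and chunk sizes here are illustrative, not taken from the slide:

```python
import math

def n_chunk_files(shape, chunks):
    """Number of files Zarr writes for one array: one file per chunk."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

# The precipitation example above: a (time, lat, lon) array whose spatial
# plane is split into 5 x 5 chunks, one chunk per time step (illustrative
# sizes, not taken from the slide).
print(n_chunk_files(shape=(10, 500, 500), chunks=(1, 100, 100)))  # → 250
```

A reader of the cube then fetches only the chunk files that intersect the requested region and time range.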
And it's quite convenient if you save it in the cloud. Oh, have you heard of Zarr before? Nice. Perfect. So the typical workflow with xcube would be to generate an xcube data set from some EO data source using the data store framework; we'll look at the data store framework in a moment. And while you're doing this, or before you generate the data cube, you should already think about the optimal way to choose your chunking, because it might differ depending on whether you want to do a lot of time series analysis or you want to visualize the data. For fast visualization, it's also possible to pre-calculate image pyramids, which are then displayed more quickly.
Then the next step: once you have the data cube, you persist it into some storage location. It might be in the cloud, say AWS S3, but it can also be local, wherever you need it. After you have the persisted data set, internally we usually configure an xcube server, which serves the specific data, and then configure an xcube viewer to visualize the data. In the end, it's of course also possible to combine it with feature data, for example from xcube geoDB. So I've thrown all these xcube components at you; let's have a look at how they fit into the ecosystem. On the bottom, that's basically the base, we see many data sources: the ESA CCI Open Data Portal, the Copernicus APIs, the Sentinel Hub API, EO data on S3, and the CMEMS API. And it's open, so new data sources can be added. I'll talk about this later as well, but it's open source and you can contribute: you can create your own plugin and add a new data source if it's not there yet. Then we have the geoDB store, which is also among the xcube data sources. So the data sources are the base.
And then we either persist them, putting them into Zarr cubes on S3 or whatever location you wish, or you can do some processing directly on the data stores. So you don't have to persist them, that is, duplicate the data from somewhere else, before you can use it; you can use it on the fly and do your analysis on it, making use of these APIs directly. This xcube processing, or whatever processing you do, can be done in Jupyter notebooks using the Python data stack. And actually, if you have the data persisted in Zarr, you are not bound to Python, right? There are other languages that also work with Zarr, so that's not a problem.
And so, yeah, you can make use of third-party applications. Then we have the xcube viewer here in the top right corner, which visualizes the data from the xcubes. The data stores that we just looked at are at the very bottom; they make use of data APIs. I already mentioned the CCI Open Data Portal. We have two store IDs there: CCI ODP collects the data directly from the API, and for some data sets we've realized we use them so much, and they're huge, that it takes a really long time to access them on the fly, so we've created a CCI Zarr store, mirroring the data in S3 buckets. That's why you have these two store IDs here. Then we have the Copernicus Climate Data Store, which has the store ID CDS; Sentinel Hub, which you can make use of if you have a Sentinel Hub account; and also CMEMS. Sentinel Hub is the only one where you have to pay, or rather you have to get some processing units. For the others, you just have to create a login and it's free to use, but you still need to pass the credentials to the xcube data stores. And then, just recently, and that's why it's still under development and not released yet, we also have a SMOS data store. These are data APIs, but you can also make use of file-system-like data stores: s3, abfs (Azure Blob File System, I think, is the long name), file, and memory. And of course, like I said, you can add your own extension. Here on the right side, you see a screenshot from a Jupyter notebook,
just to showcase briefly how you would access these APIs. So on the top, you just... Is it large enough for the back? I hope. Okay. Is it too small? I'll try. I'm not sure if I can enlarge the PowerPoint, but we'll have a look at the Jupyter notebook in a moment as well, and I can make it bigger there. So I can talk you through it. You just do the import of new_data_store from xcube. Then you specify which data store you actually want to make use of; in this case, we have Sentinel Hub, and then you can list which Sentinel Hub data sets you can access. Here, we just have the Sentinel data sets and the DEM, but it's also possible to access, I believe, Landsat 8; you just have to add a different endpoint to the data store. Then you specify your cube. With store.open_data, you say which Sentinel data you wish to use. Here we have Sentinel-2, and we specify a bounding box, a resolution, which coordinate system we want to use, and also a time subset. And then it will fetch the metadata, so you will get the cube that you can see here on the bottom. And the cool thing is that you don't fetch any data yet; it's just metadata, so you get the structure of the data set without actually requesting data. Yes. Just a small question, but do I really have to use degrees for the resolution, as it seems here, or is it also possible to work with metric units? You can also work with metric units. That's possible. And I don't know off the top of my head right now if you need to add something... Well, I think based on the CRS that you're giving, it will understand. So if you give a UTM CRS, then it will convert it into meters. Okay. But then I already have to know which UTM zone makes sense; it cannot infer that. Good question. Yeah, I'll check and get back to you.
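A minimal sketch of that open_data call, assuming the xcube data-store API with a Sentinel Hub store plugin installed; the data-set name, bands, bounding box, and dates below are illustrative, not taken from the slide:

```python
# Illustrative open parameters for a Sentinel-2 cube (all values hypothetical).
open_params = {
    "variable_names": ["B04", "B08"],            # red and near-infrared bands
    "bbox": (9.0, 53.0, 10.0, 54.0),             # lon/lat in WGS84
    "spatial_res": 0.0001,                       # degrees; with a UTM CRS you'd use metres
    "crs": "EPSG:4326",
    "time_range": ("2023-06-01", "2023-06-30"),  # the time subset
}

# With xcube, the Sentinel Hub plugin, and credentials in place, this would
# be roughly (shown as comments, since it needs an account and network):
#   from xcube.core.store import new_data_store
#   store = new_data_store("sentinelhub")
#   cube = store.open_data("S2L2A", **open_params)
# The call is lazy: only metadata is fetched until you actually compute.
print(sorted(open_params))
```

As noted in the talk, the resolution unit follows the CRS: degrees for EPSG:4326, metres for a UTM CRS.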
Okay. I think I have an example somewhere where meters are used; I just need to double-check. All right. So in the hands-on, we will use the xcube CCI data store, because it's one of the data stores you can use with xcube without registering for anything, so it's the easiest to get started with. And I just wanted to briefly tell you about the CCI. CCI stands for Climate Change Initiative; it's a European Space Agency program aiming to provide accurate long-term climate data records using satellite observations. It was launched in 2010, and it brings together scientific expertise as well as satellite technology to monitor and analyze key climate variables. So it's there to understand the Earth's changing climate. CCI aims to ensure consistency across satellite data records: it calibrates and validates, and it has a quality control process, and this consistency allows for accurate comparison and analysis over time. And I've said in the beginning that we usually aim for visualization of the data sets.
What we use for this is the xcube Viewer, which is a web-based viewer made for all data cubes. It works, for example, directly on Zarr in S3, so you don't have to have the data in the same place as your server; it can be elsewhere. And the xcube Server part is required because it provides the images, that is, image tiles processed on the fly from the data, as you can see here in the screenshot. The viewer has image functionality, but also time series functionality, as you can see here on the side. The time series actually goes into the data directly: it fetches the chunks it needs to create the time series. You can compare different variables with each other and just get a quick overview of the data cubes. And we have a quite new development there: xcube Server and Viewer can be used directly from Jupyter notebooks.
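A sketch of what that in-notebook usage looks like, assuming the Viewer API of recent xcube releases; the exact import path and method names are assumptions and may differ by version, so the xcube calls are shown as comments only:

```python
# Hypothetical in-notebook viewer session (requires xcube and Jupyter):
#   from xcube.webapi.viewer import Viewer
#   viewer = Viewer()          # starts an xcube server in-process
#   viewer.add_dataset(cube)   # cube: an xarray Dataset / Zarr cube
#   viewer.show()              # embeds the viewer in the notebook output cell
#
# A permanent deployment instead runs the server from a config file and
# hosts the viewer separately, e.g. like viewer.earthsystemdatalab.net:
server_cmd = ["xcube", "serve", "-c", "config.yaml"]  # illustrative CLI call
print(" ".join(server_cmd))  # → xcube serve -c config.yaml
```

The in-notebook route skips the external deployment entirely, which is exactly the convenience described above.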
We will have a look at this in the demo. So you don't have to go through the pain of setting up the xcube Server somewhere externally, making a deployment of the viewer, and then hosting it on a web page; you can use it interactively in your Jupyter notebooks. But if you wish to have a look at a static, permanent web page, you can head over to viewer.earthsystemdatalab.net. There we have several data sets displayed, and you can play around with the data that is there. And then we also have xcube geoDB. Today I'll just give you this brief introduction to it, because we will not have enough time to cover it as well, but just so you have heard of it: xcube geoDB is a geospatial database developed within the xcube ecosystem. Its purpose is to offer an API to store, fetch, manipulate, share, and view vector data. The data is stored in a georeferenced database, and simple usage of the basic CRUD operations is possible: create, read, update, and delete. It has some user management and is accessible via a Python API. So just so you know that it exists, for whenever you might need it.
Let's have a look at where xcube is in use. This is just a selection, but we have a project called CyanoAlert, a cyanobacteria information and notification service for several public customers. You see here in the little screenshot that it makes use of the xcube viewer. It displays water quality data within the viewer, and the customers can have a look into it and explore the data and see how the situation is. It's near real time, so we publish the data on the next day: we have data of today, and you will be able to access it tomorrow. Then we have Euro Data Cube, which is a commercial service for accessing, processing, and selling Earth observation data; xcube is included there in the Jupyter workspaces. Jupyter is quite a common way to use xcube, and we also have this agriculture virtual lab, which is also a JupyterHub, and it provides numerous data sources for the agriculture research community. And then DeepESDL; we'll look at that platform in a moment in the demo. It's an open platform for research and collaboration in Earth sciences, and maybe that's an interesting one for you because it's made for researchers, and it would like to onboard researchers to solve their questions within the platform. There will be a so-called NoR, Network of Resources, offered soon, where you can get funding in order to use the platform. The platform is quite neat because it allows you to configure computational resources according to your needs. So yes, let's have a look at the demo. It's a JupyterLab; I have my workspace, and I hope I have not been kicked out.
Let's see. Yeah, looks good. And let's see. Tell me, is that fine in the back, size-wise?
Okay. How's that? The download, so I'll post a link in Mattermost in a moment. The example notebooks are specific to the DeepESDL project, but they are still freely accessible and you can make use of them. Just be aware that there might be some project-specific content in them, but they will give you some overview.
Yeah, I can do that. Let's see. How's it in the very back? Are you good? Perfect. Okay. So let's have a brief look at the CCI data store. So
I'll show you how it can be used. In the screenshot before, we already talked a bit about how you make use of it in Python code. Here we have some mandatory imports. We have find_data_store_extensions, because you might not know which extensions are available; and you may not know off the top of your head which data store parameters you can use, so we will have a look at where to find these. Then, to connect to the data store itself, you need new_data_store. And here we just have some utility imports; they're not xcube-specific. I'll share the link just after the demo, but for now it's not important to have it on your laptops; I'll take you step by step through it. There's just a matplotlib setting here. And as I've already said, we want to know which extensions we have installed. Okay. So some of you might have already installed
xcube, and maybe tried it already, or you have it on your machine. When you run find_data_store_extensions, you might not see this whole list, because in the Mattermost instructions I just asked you to install xcube-cci. So you might not see the whole list then; you will just see the CCI ODP and CCI Zarr stores, but not, for example, SMOS or CDS. You only see those once you have installed all the plugins. Then you can get some basic information about them, like the title of the data store, just so you know what it is about.
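That discovery step can be sketched as follows, assuming the xcube data-store API; the store names listed are the ones mentioned in the talk and may differ from the exact extension identifiers in your installed version:

```python
# Store extensions mentioned in the talk (illustrative names, which ones
# appear depends on the xcube plugins installed):
mentioned_stores = ["cci-odp", "cci-zarr", "smos", "cds", "sentinelhub",
                    "cmems", "s3", "abfs", "file", "memory"]

# With xcube installed, the actual discovery would be roughly:
#   from xcube.core.store import find_data_store_extensions
#   for ext in find_data_store_extensions():
#       print(ext.name)
print(len(mentioned_stores))  # → 10
```

With only xcube-cci installed, only the two CCI stores (plus the built-in file-system stores) would show up, exactly as described above.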
Then we want to use cci-odp today. So we go on and get the data store parameters. You can have a look there at some specific information, like the endpoint URL of that data store, or how many retries are done before it gives up — in case of a network error or whatever. So this is just where to find some basic information. And then, to actually use the data API, we have to connect with new_data_store to cci-odp.
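In code, the two steps just described — inspecting the store parameters and connecting — might look like this sketch (assuming xcube with the xcube-cci plugin; the connection itself needs network access):

```python
def store_id():
    # The store identifier used throughout this session.
    return "cci-odp"


def describe_store_params(data_store_id="cci-odp"):
    """Return the parameter schema of a store (endpoint URL, retries, ...)."""
    from xcube.core.store import get_data_store_params_schema  # deferred import
    return get_data_store_params_schema(data_store_id)


def open_cci_store():
    """Connect to the CCI Open Data Portal store (needs xcube-cci and network)."""
    from xcube.core.store import new_data_store  # deferred import
    return new_data_store("cci-odp")
```

In the notebook the returned store object is simply bound to a variable called store, as mentioned below.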
And you can name it whatever — I've just named it store here. So we're connected, and maybe you already know which CCI identifier you want to use, and you can check if it's there. For example, this one — I have to check why, but it's not available anymore. It could be that they have removed that version and a new version is available. But actually it's more likely that you want to query for some specific data; it's unusual that you know your ID already.
So with store.get_search_params_schema() you can get the schema for how to search for data. And then we have properties, for example the CCI attributes. What I do quite often is to search by the ECVs, and then I check how they are abbreviated. So we have a list of ECVs here: we have land surface temperature, fire — yeah, all the ones that are available.
Not everything that's on the CCI ODP is available via xcube-cci, because they also have shapefiles, for example, and this focuses on raster data. You can also search by sensor, and you can combine multiple queries, so you can search for the ECV at the same time as the sensor — and we do this in the next cell. So here you see search_data, and we search the CCI attributes for the ECV sea surface temperature, at a frequency of day, and we also want processing level L4. So it will now check the data store for the datasets that match this request.
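The search just described might be sketched like this (the attribute names ecv, frequency, and processing_level are assumed from the demo, not verified against the store's schema; the actual search is a network call):

```python
def sst_search_attrs():
    """CCI attribute query from the demo: daily, L4 sea surface temperature."""
    return {"ecv": "SST", "frequency": "day", "processing_level": "L4"}


def search_daily_l4_sst(store):
    """Return the data IDs of datasets matching the query (network call).

    `store` is the object returned by new_data_store("cci-odp").
    """
    return [descriptor.data_id
            for descriptor in store.search_data(cci_attrs=sst_search_attrs())]
```

In the session, this query matched four datasets.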
Sometimes we get some warnings from the ODP because something has not matched or there's a network problem, but now you see four items have matched our request. Then you can have a look at what they're about. For example, you want to check the time range — the time period for which they are available — and also the data variables; you can have a look at all of them and then decide which dataset suits you most. Once you've decided, you take the data ID — this part here — and you can then request it. But first, if you don't know — for example, if you want a subset — you can ask which parameters you can give to the data store for this particular dataset, and it will tell you that you
may use the bounding box or the time range, so you can include these in the actual request of the data. So here we take the ID, we subset by variable name — so we just have analysed SST — and a time range, which I think covers at least almost the whole period. Just to show you: if we were to load this whole time range for global data, it would never work, right? It would just break a laptop or whatever. But here we're just fetching the metadata, so we get an idea of what the shape of the data cube is. We have here 5,000 timestamps in it.
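A sketch of this lazy, metadata-only opening, and of why loading everything would break a laptop (the variable name "analysed_sst", the time range, and the grid dimensions below are illustrative assumptions, not verified against the store):

```python
def open_sst_subset(store, data_id):
    """Lazily open a subset: only metadata is fetched until values are accessed."""
    return store.open_data(
        data_id,
        variable_names=["analysed_sst"],          # assumed variable name
        time_range=["1981-09-01", "2016-12-31"],  # illustrative range
    )


def array_size_bytes(n_time, n_lat, n_lon, dtype_bytes=8):
    """Rough in-memory size of one variable of the cube."""
    return n_time * n_lat * n_lon * dtype_bytes
```

For instance, 5,000 timestamps on an assumed 3600 x 7200 global grid at 8 bytes per value give array_size_bytes(5000, 3600, 7200), which is just over a terabyte — in line with the cube representation shown in the notebook.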
We have the spatial extents, and we have some information about the variables. Here you see a representation of the data cube, so you can see how it's initially chunked and how many timestamps there are — it's quite a nice way. So one array is over a terabyte —
also quite good to know before you actually fetch the data. So this is a cool way to get some information about your dataset. But once you actually need to make use of the data — for example, here for plotting — it will fetch the data for timestamp zero, so for the very first timestamp, and now we're actually making calls to the CCI ODP to get the data. This takes a moment — it's global data. Right, now it makes the request, and we'll plot it. So it's cool to get an idea of what kind of data you need. Once you're sure and you know you want to do your analysis on a bigger part, you can then persist it to a
space where you can access it. Cool — any questions on this part? Like I said, I'll share the link to the notebook in a moment. As there are no questions for now, I'll just show you how to make use of the xcube Viewer in the Jupyter notebook. So we just saw a matplotlib plot, but we can also visualize the data with the xcube Viewer. The first part is just like the one we did, so I'll just go through it: we connect to the data store and we make a dataset selection. Here I've selected only a few days, because when you visualize it in the viewer it will then fetch the data. So we don't want the one-point-something terabytes; we have seven timestamps
with the metadata we saw already. Okay, let's skip this bit — it's specific to CTP-SDR — but now comes the part for the xcube Viewer. To make use of it we need to import it. So we import the Viewer, and we then connect the dataset — let me just check what it was called. So it's dataset here; let me just adjust — okay, we have dataset, and then we tell the notebook that we want to use the viewer, and we add the dataset. Now it's starting up the server in your kernel, or in your workspace, and we add the dataset which we have on the fly — it's still in memory, at least the metadata. You can call viewer.info — it will give you the server address; if you run it locally, you will get a localhost address there, and for the viewer as well. But you don't have to go to the URL; you can also call viewer.show — and I'll make it a bit smaller now, just so we see a bit more of it. Now, let me head over to the — so the functionality is the same,
I'll just head over to the link so you get a full display. Let's hope it's still working. Okay, it seems like it needs some loading, so let's see if we can work with this side here. All right — at the top you get the dataset. We've just added one dataset, right? So we have the CCI SST dataset; if you add multiple datasets, you can compare them, and you would have a longer drop-down list here. We have the analysed sea surface temperature here, and I think it's not done fetching the data yet, so it's hanging a bit. And also here on the side — it's not large enough — there's the color map and the labels, and these labels are fetched directly from the metadata, so in this case we know it's kelvin. This value scale is nonsense: if you don't specify it yourself, it will look into the metadata, see if there's a value min/max entry, and just use that. So always make sure to check that your metadata is correct, because this here is, yeah, nonsense. But let me just break off here, because I have the dataset also stored in S3 — yeah, it's complaining about something.
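The viewer workflow just demonstrated boils down to a few calls; this is a sketch assuming a recent xcube where the Viewer class lives in xcube.webapi.viewer, as used in the session:

```python
def show_in_viewer(dataset, title="CCI SST demo"):
    """Start an in-kernel xcube Viewer and register an xarray dataset with it.

    After this, viewer.info gives the server address (a localhost URL when run
    locally), and viewer.show() embeds the viewer UI directly in the notebook.
    """
    from xcube.webapi.viewer import Viewer  # deferred import; needs xcube
    viewer = Viewer()
    viewer.add_dataset(dataset, title=title)  # title kwarg assumed from the demo
    return viewer
```

The dataset only needs its metadata in memory at this point; pixel data is fetched on demand as you pan and step through time, which is why small time subsets behave better than a full terabyte-scale cube.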
So sometimes it makes sense to persist the data first and then visualize it with the viewer — here, with the on-the-fly approach, it seemed not to be happy, but I'm sure the CDS store will be a bit kinder to us. Here it works the same way as for CCI: we request a data ID — we want ERA5 2 m temperature — and then it fetches the data. We can set a title, and here the metadata is already set to some sensible values: we have the 2 m temperature, also in kelvin. So let's just use this instead. Okay, I'll check if the link works.
All right, so we now fetch the data again from the API — it's not persisted in my workspace yet. You can see the title that I've set, we have the 2 m temperature, we can see the timestamp up here, and then we can get some information that comes with the data in this info panel. And if you are interested in a certain location, you can just point at it and you get the time series for that location. And when you click into the time series, you jump to the corresponding timestamp in the imagery here. So it's quite nice for playing around a bit more interactively with data and getting a first idea
of it. Yes? Yeah, I'll repeat the question for the audience: the question was whether you can import a shapefile. Yeah, so you can configure the server — it's a bit more work, but it's possible — and you can then include the place that you wish to add here, so you will see it in the drop-down of My Places. We can switch quickly to the viewer of the Earth System Data Lab, because we have one there. So yeah, you can either put points into it, or you can also
draw areas — that's possible too. Oh no, I'm stuck here. And then it will plot it in the same graph, so you can compare them, and if you have many variables you can compare those among each other as well. And for the Earth System Data Lab we have, somewhere — yeah, we have included the shapefile with the country information. So it's configured with the server, which then fetches it, and if you have such a list you can select, let's say, Belgium, and it will zoom to it, and you can explore. They are not on the same plot, because the plot depends on the unit: if a variable has a different unit, it gets a separate plot. But you can compare different places for the same variable, and for a different variable with a different unit, it will switch the plot.
Yes — here we have a Zarr, but the generic one is just an xarray dataset, right? So ours are xarray datasets, and here it's not persisted; it's just put into the form of an xarray dataset, and now we can make use of it even in the viewer. Yes? The question was whether the datasets we access are stored as arrays in memory or downloaded. They are in memory currently, yes. Well, there is data transfer of course, so you have them in memory, but they are not downloaded to your workspace. So when I now close the notebook, I will not be able to go back and just say dataset —
it will not be there anymore; I'll have to go through the steps again. Okay, so let's have a look back at the slides. We're almost at the end of the input part, but I just wanted to let you know: well, it's data science and not data magic. You can use and contribute to xcube — it's all open source. You can head over to GitHub, where you find the source code; you can check what we're doing there and you can contribute, or if you've got issues, we're also happy about those. If you don't want to install it from source, you can use the conda install from conda-forge. And here in the list you see — we've talked about xcube and xcube-cci, but there are many more plugins that go along with xcube, most recently also the xcube 4D viewer — I'll come back to that in a moment — which is developed by a partner company, so it's not our own, but you can contribute from outside.
So if you have a Zarr, you have to store it somewhere. If you persist it as Zarr — say you have a server locally but your data is in the cloud — then you don't have to put the data on your server first; the server can talk to the S3 bucket, well, if you have access to it, right? If it's a public bucket you can just read it; if it's not, you have to have the right credentials. But if the data is local, then it needs to be in the same space as the server. Yes? So the question was whether we write our own algorithms — no, we use what's there with xarray,
and, actually, you might need it in a moment for the hands-on session, for the regridding part: xcube provides a resampling function which makes it maybe a bit easier to handle regridding. So this is just in the slide — if you want to catch me later, you can just write me an email. Also my colleague Gunnar is happy to take any questions.
And then — yeah, can you also add data from a local computer? Yes — so the question was whether you can include your own local data. xcube is software, so you can use it either locally or in the cloud; it's a software component. If you have your data locally and you use xcube locally, you can make use of it, of course. And you can also upload your data to the cloud and then make use of it somewhere else. And xcube doesn't just work with Zarr: it works with NetCDF, it works with GeoTIFF and cloud-optimized GeoTIFF — so it's not limited. Basically anything that xarray can read, you can use with xcube.
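Reading a cube persisted in a (public) S3 bucket, as described above, might look like this sketch (the bucket name and data ID are placeholders; the storage options follow the fsspec/s3fs conventions that xcube's "s3" store passes through):

```python
def anonymous_access():
    """Storage options for a public bucket; for private ones pass credentials."""
    return {"anon": True}


def open_from_public_bucket(bucket, data_id):
    """Open a persisted Zarr cube straight from a public S3 bucket.

    No download to the workspace happens; chunks are read on demand.
    `bucket` and `data_id` are hypothetical placeholders.
    """
    from xcube.core.store import new_data_store  # deferred import; needs xcube
    store = new_data_store("s3", root=bucket, storage_options=anonymous_access())
    return store.open_data(data_id)
```

The same store can also list what is available via its data IDs, which is how the land cover dataset is found in the hands-on notebook below.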
All right — any more questions on the input? I hope it was not too quick, but we can have a look in the hands-on session if you have questions. So I would like you to try to visualize fires which occurred in 2020 in Europe, but only the ones that happened in tree-covered areas. Here are some things you can use: the xcube-cci store, which we just used — I'll give you the link to the notebook in a moment so you can have a look at what the code was and go through it. And then we have a persisted land cover dataset which you might find handy; it's in a public AWS bucket — they're public also. Oops, sorry — I'll point you to the code in a moment as well. And then there are many example notebooks in the xcube GitHub repository, as well as in the DeepESDL doc repository, which is a bit more specific, but you will find anything you need, I think, for this task in these notebooks. So let me just — do you have
all access to the xcube channel in Mattermost? Then I'll just post the links there. So that's the regridding one that you might find useful. Maybe I'll just show you quickly: in the regridding notebook you will see how to access the public data cube for the land cover, if you want to make use of it. In the very first section you will find that we create an S3 store — it's a bit tiny. So here we specify the bucket; we say that this time we don't want a data store API like cci-odp, but S3, and then we list all the datasets that are available, and it takes you step by step through how to make use of that dataset. That code snippet is included in the notebook that I just dropped the link to. And let me just give you the other links as well: I'll give you the Read the Docs for xcube and the xcube repo, as well as the notebooks.
How many of you have already installed xcube with Miniconda? Are you okay? Cool, then you're set — perfect. Yeah, one question: I had to reinstall it, and I just saw that it's also installing torch. Is there a specific reason for that? I mean, that's machine learning, and we actually talked about accessing this sort of data. Yeah, true — when you install xcube it fetches torch. But is there a specific reason? It's such a huge package, and I think it's not really required just to access raster data. Yeah — well, we use xcube also for machine learning; it might be that there's a function we have not touched on today that uses torch, because xcube is much bigger than just what we looked at. But I can check why, and whether we actually need it. Yeah, because it's downloading, I think, 600-something megabytes — the xcube package is quite large, I have to admit. All right — in case you have questions... Are you happy with trying on your own and just exploring a bit? Cool, sounds good. The question — yes, I'll post it also here in the chat so you can have
it. Or just take a screenshot maybe, and I'll put it on the slides as well. Okay — and feel free to just ask if you're stuck or something; I'm here to help. Maybe just to add: now the second part of the session has actually started. I was not sure how many questions we would have, so we will carry this over into the second part, after the break, but at the end we will wrap up and have a look at solutions — we can have a look at the notebook that we have prepared for answering the question. So yeah, just feel free to work now and take a coffee break, and at the beginning of the second part we'll still work on this question — but maybe you're ready by then.
I'm not sure — we'll see how it goes. Now, I've adapted a notebook from the one you have seen in the examples. So here we do all the imports, get the CCI ODP connected, and fetch the land cover data from the AWS bucket. And as I'm in an environment where we have credentials, I don't need the anonymous option. These steps I think all of you have managed — to fetch the data — and so we have the land cover. One thing that I wanted to tell you, because there was a question about this before: the land cover dataset is actually a pyramid dataset, so we save different levels, and level zero is the highest resolution. If you take a higher level number, you get an already pre-resampled, coarser resolution. That's because we visualize this very dataset in the xcube Viewer for DeepESDL, and it's faster for visualizing the data. That's why you have to select the base level here. So we get this, and then I search for fire — I think I've seen this work for everybody as well. So we have seven datasets; I selected the fire monthly dataset from MODIS, which is on a grid, version 5.1.
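The pyramid handling just described might be sketched like this (assuming the store returns an xcube multi-level dataset with a get_dataset(level) accessor for pyramid inputs; the 300 m base resolution in the usage note is purely illustrative):

```python
def base_resolution_dataset(ml_dataset):
    """Return level 0, the full-resolution base, of a pyramid dataset.

    Assumes an xcube MultiLevelDataset-like object as returned for
    multi-level (pyramid) inputs.
    """
    return ml_dataset.get_dataset(0)


def level_ground_resolution(base_resolution_m, level):
    """Each pyramid level coarsens the grid by a factor of two."""
    return base_resolution_m * (2 ** level)
```

For an assumed 300 m base grid, level 2 would correspond to roughly 1200 m pixels — which is why the higher levels render so much faster in the viewer.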
And we fetch it from the notebook, from the ODP. And yeah, as you pointed out, it has a vegetation class in it, so actually it would be possible to use that — but for the sake of resampling we still go through the land cover, comparing it with the land cover. Because we want to do a subset for 2020, I just select start and stop dates, and then I have a bounding box over southern central Europe. I subset the land cover a bit; I've plotted this before, so you see where it's located. It's quite nice to see that there are different classes. Here in the legend you see the numbers — each number is coded to a class, which is in the metadata. So when we go to the lc_class — that's the variable we plot — we see the flag colors and the meaning each flag has: no data, cropland rainfed, and so on, and down here in the list is the classification number for each class. So that's the variable we want to use. Then we also subset the fire data to that area. Here I've just plotted it to be sure that there is any fire data, so I've selected January, and I see that there is fire in this area. Actually, burned area cannot be minus
one, but for plotting I just wanted to make sure that zero is not the darkest color of the range — that's just to explain my range selection. Okay, and then we resample. The land cover is the higher-resolution dataset, so I decided to use the fire as the source and the land cover as the target. So I define a source grid mapping, which is fire. I check it out, so you can get some information about it: you see which CRS it is, the lat/lon bounding box — things like this. Then I set the target grid mapping, which is the one from land cover, and we can check it out too. And then the next part is to resample in space. So I say which dataset I would like to resample — fire — I set the source grid mapping, which is the one from fire, and then the target grid mapping from land cover. And this takes a moment — I think it should take about one minute, maybe. We'll see. I hope I don't get the same problem as you did just now, that it takes forever.
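The regridding step just described might be sketched like this (assuming xcube's GridMapping and resample_in_space helpers as used in the session; the work is largely lazy, so the heavy lifting happens later on compute or plot):

```python
def regrid_like(source_ds, target_ds):
    """Resample source_ds onto the grid of target_ds.

    Derives a grid mapping from each xarray dataset and hands both to
    xcube's spatial resampling helper — here, fire as source and the
    finer land cover grid as target.
    """
    from xcube.core.gridmapping import GridMapping     # deferred imports;
    from xcube.core.resampling import resample_in_space  # need xcube installed
    source_gm = GridMapping.from_dataset(source_ds)
    target_gm = GridMapping.from_dataset(target_ds)
    return resample_in_space(source_ds, source_gm=source_gm, target_gm=target_gm)
```

Choosing the finer grid as the target, as done here, keeps the land cover classes intact and brings the coarser fire data up to that grid.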
But I must also say that usually I went away and did something else while it was resampling, so I'm not sure. Well, yeah. But nobody ran into memory issues so far, right? So you were all fine with your computational resources — that's quite nice. Down here, we can also see that my memory is still fine: I have 15 gigabytes on this machine, where the Jupyter notebook runs, and it's at three gigabytes currently because of the plotting. In fact, it takes a bit. Okay, while we wait, we can have a look at where I executed this yesterday.
Oh no, now it's ready — so yours too. Nice. Okay, so now we just check: that's the source CRS, so it's still from the original fire dataset. And now the target's here. So you see — where do we see the difference? Here we don't see a difference, because this time we resampled from lat/lon to lat/lon, so there isn't much to see; it's all WGS 84. So we can skip this. But if you had your source data in a different CRS, you would then see that it changed.
Okay, and now that I actually have the fire dataset on the same grid as the land cover — or at least almost — I can have a look at the land cover classes. So I take the flag values and the names, and then I want just the ones that have "tree" in them. So I have this for loop — how's it called when it's in one line? Inline, somehow. Yeah, list comprehension. But is it still a for loop, or...? Well, you know what I mean. So we have it all in here: I don't want just the index, I also want the category — so the int value together with the name — and then I just filter by "tree". So we get the tree categories: everything which contains "tree", for example tree mixed, sparse tree, and I have the classification number in it as well.
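The list comprehension in question might look like this (the flag codes and names below are illustrative stand-ins; the real CCI land cover variable carries many more classes in its flag metadata):

```python
# Illustrative flag metadata in the style of the land cover variable.
flag_values = [0, 10, 50, 70, 100]
flag_meanings = ["no_data", "cropland_rainfed", "tree_broadleaved",
                 "tree_needleleaved", "mosaic_tree_and_shrub"]

# Pair each classification number with its name and keep only the
# classes whose name mentions "tree".
tree_classes = [(value, name)
                for value, name in zip(flag_values, flag_meanings)
                if "tree" in name]

print(tree_classes)
# -> [(50, 'tree_broadleaved'), (70, 'tree_needleleaved'), (100, 'mosaic_tree_and_shrub')]
```

The resulting (code, name) pairs are exactly what the masking step needs next: the integer codes select pixels, the names document what was selected.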
And the next part is to mask the land cover classes, to filter out the tree ones. So I create a mask from the land cover classes, and then I just request the ones that have these categories. And I call compute, because most things are lazy with xarray and xcube, and then sometimes when you want to plot them it takes forever, because all the steps beforehand now have to be done. So I just added a compute here to save time. Now I want to make sure that the land cover tree class has exactly the same grid values — the lat/lon — as the resampled fire one. Because although they are on the same grid, there are sometimes still some floating-point differences, and then they won't snap onto each other. We know they should be on the same grid, so we just assign the resampled fire lat and lon to get rid of these tiny differences. Here I've plotted the tree mask, so you see: all the yellow points or yellow areas
are the ones that contain something with "tree", and the zero — the purple one — is non-tree. It looks quite all right, just from looking at it. And then the next step: I've decided to create a new variable in the resampled fire dataset. So I say I want a new variable, which is called burned tree area, and I mask the burned area with where — so I say, okay, where we have the land cover tree mask, please put this into the new variable. I select time zero, because the land cover one has more than one timestamp, and I just needed one. So now we have the resampled fire dataset, and we see a new variable has appeared in there, called burned tree area.
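The coordinate-snapping trick mentioned above might be sketched like this (assuming xarray objects with dimensions named lat and lon — the names are an assumption; the final print shows why the step is needed at all):

```python
def snap_coords(mask, reference, dims=("lat", "lon")):
    """Copy the reference grid's coordinates onto the mask.

    Both inputs are expected to be xarray objects on nominally identical
    grids. Reassigning the coordinates removes last-bit floating-point
    offsets that would otherwise stop the arrays from aligning in a
    later `where(...)` masking step.
    """
    return mask.assign_coords({dim: reference[dim] for dim in dims})


# Why this matters: coordinates computed along two different code paths
# can differ in the last bits even when they "should" be identical.
print(0.1 + 0.2 == 0.3)  # -> False
```

Without the snap, xarray's alignment would treat the two grids as different and the masked result could silently come out empty.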
Okay, so here is the burned area that I plotted before at the top. And now if we have a look at the burned tree area — can you see it a bit? It's more coarse. You see the little pixels where the land cover is not trees: the strongly red parts are where the land cover was tree, but these pixels — the boxes — are not as filled in as here anymore. So we see that it has actually taken the classes into account. And for visualizing it with xcube, I've put this dataset locally — I've persisted it — so we can have a look at it now together.
So, all right, here we go. Yes — I've done nothing to the metadata, so it's maybe not great. Here in the drop-down menu of variables, you see total burned area twice. That's because I've masked the total burned area without giving the result a new title in the metadata — the title has been taken over from the other dataset. But here in the legend you see that this is the unmasked burned area currently selected, and when I select the other total burned area, it's actually the burned tree area. Can you see it here? It has the correct variable name; it's just the title in the metadata that's off. And then we also have the default color map and the default value range, which is zero to one, but I had a look yesterday —
it's in square meters. So let's go high, and let's select Magma. Oh, and we can hide the small values: all the areas which are trees but have not been burned are zero currently, and with hide small values we get rid of those. Maybe I went too high. Yeah, now it's showing up, without the zeros. Okay, so now you see on the map we have little hotspots here, and it's in Portugal, two or three years ago. You see that in the center the burned area was larger — that's January currently. Now it's December, and the further out it goes, there's a bit less burned area within the tree area. And we can check with the time series whether there were other fires in that very region. You can see the one that we're looking at currently, but there was also another one — it might be that it was still the same fire, which was not distinguished — but in October there was no fire in that region. And we have something going on in October in Spain, for example.
So that was my solution for subsetting, and of course checking where there was burned area and the land cover class was tree or something connected to trees. So, do you have any questions on that? [Inaudible question.] Yeah, you can visualize it, of course, if you're just interested in the burned areas. Can you say that again? Sorry. Yeah, you don't have to convert it somehow — you can of course use your own data. And can you also create a cube with several variables, whatever, and have the same interactivity? Yeah. Also, if you have bands that create a true-color image — so if you have red, green and blue bands — you could configure it so that you assign them in the server configuration. There is a notebook — I'll search for it in a moment — which demonstrates how you make use of the server configuration, and there you can configure the RGB layer. So here you see I cannot switch it on, because no layers have been assigned.
Yeah, so that's possible. Cool — well, I'll post this notebook in the Mattermost channel as well, so you can just have a look at it. I hope I did not make any huge mistakes that you will notice when looking at it later. But yeah, from my side, thanks a lot for participating and for trying out this hands-on session — I'm really happy how keen you were to do this. And in case you're soon finished with your studies or your PhDs or whatever you do: we are somewhat in hiring mode. So in case you'd like to, check out the company — even if there might not be a job announcement that suits you, you can still drop a CV and say who you are and why you're interested in working for us, or do an internship; that's possible too. Sometimes management is a bit slow in replying, but just ask again, maybe. Yeah — can we cut this out of the video, actually? Oops. No, it's just that we're not a huge company and there's lots of work. So yeah, thanks a lot. Thanks.