
openEO: Open Science for Earth Observation Research


Formal Metadata

Title: openEO: Open Science for Earth Observation Research
Number of Parts: 351
License: CC Attribution 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year: 2022

Content Metadata

Abstract
The open standards, open source geospatial and open science communities still have a very limited answer to the question how researchers active in applied domains such as agriculture, ecology, hydrology, oceanography or land use planning can benefit from the large amounts of open Earth Observation (EO) data currently available. Solutions are very much tied to platforms operated and controlled by big tech (Earth Engine, Planetary Computer), particular programming languages, software stacks and/or file formats (xarray, Pangeo, ODC, GeoPySpark/GeoTrellis). The openEO initiative provides an API and a set of processes that separate the “what” from the “how”: users specify what they want to compute, and back-end processing engines decide how to do this. The openEO API is OpenAPI compliant, and has client interfaces for Python, R, and JavaScript, and in addition graphical user interfaces running in the browser or in QGIS. The underlying data model is that of a data cube view: image collections or vector data may be stored as they are, but are analysed as if they were laid out as a raster or vector data cube, e.g. for raster with dimensions x, y, band and time, or for vector with dimensions geometry, band and time. Because openEO assumes that imagery is described as STAC collections and the implementation is composed of open source components, it is relatively easy to set it up and compute on infrastructure where imagery is available through a STAC interface. Having a single interface to carry out computations on back-ends with different architecture makes it possible to compare results across implementations, to verify that EO processing is reproducible. So far, over 100 processes have been defined, and user-defined functions written in Python or R extend this ad infinitum. openEO was initially developed during a H2020 project (2017-2020). 
It is currently continued with ESA funding that has resulted in the “openEO Platform”, an implementation run by VITO and EODC where the general public can use the openEO interface for large scale computations. Several upcoming Horizon Europe projects will further support continued development of the API and the openEO software ecosystem of clients and back-ends. Since the initiative is designed as open science, all users and developers are invited to engage. We will present the current state of the openEO ecosystem and give an outlook on forthcoming developments.
Transcript: English (auto-generated)
Thank you. Thank you so much for the introduction. As you hear, I have this air-conditioning disease that caught my voice. Matthias Mohr, by the way, of STAC fame, is also here in the room. So if I fall apart, then he will definitely take over.
So this is a talk about OpenEO, Open Science for Earth Observation Research. And, let me see how this works. Yeah, so this is where I start talking about scientists. And Matthias asked what I am doing here on this track and not on the academic track, when I talk about what scientists do.
But anyway, I think it's of interest. So scientists, broadly: anyone who did a PhD or a master's knows how this works. They spend a lot of time, but as little time as possible, on data wrangling and data analysis, to get answers to questions, and figures and tables. And then they write a paper and, with the co-authors, revise and resubmit.
And then, next question, right? So you do that for like three years, you have a handful of papers, you submit them, and your PhD thesis is done. And then you go to the postdoc, and then all hell breaks loose. But it's essentially the same thing happening there. So how do open scientists work? Everyone is talking about open science, which is the new thing.
They do essentially the same thing, but in addition they take time to share. They share new or generated data. And they also take the time to share all the details on how results were obtained. And of course they spend as little time as possible on points two and three,
because they didn't get fewer papers to write, right? They have to do basically the same thing to get their PhD. And so the catch here is the sharing of all the details on how results were obtained. You're not going to just dump your hard drive somewhere; that is sort of a nightmare scenario.
So one thing, of course, is that you share the data that you generated, obtained or collected in a FAIR way: findable, accessible, interoperable and reusable. But also that you use open source software for the analysis you did, so that everyone understands what you did and can scrutinize the details.
Yeah, this is an important point, because otherwise you use black boxes, and who can say what comes out of them? So why is it so hard? A long time ago already I got very frustrated about why it is so hard for domain scientists. By domain scientists I don't mean Earth observation specialists,
satellite image specialists; I mean hydrologists, biologists, ecologists, biodiversity people, agronomists, meteorologists and so on, to use Earth observation data and practice open science. Why is this so incredibly hard? And then, after running around frustrated
for a number of years, at an AGU conference I ran into a workshop where Google demonstrated Earth Engine. That grew for like 10 years, and it is a fantastic product really. It solves nearly all your problems. It is unrivaled in its ease of sharing an analysis with others.
What you do is you solve your problem, you write a script, you click share, you copy a URL, send it to somebody else; somebody else opens it, sees it, says run, and runs the same thing. So it's literally 10 seconds to share an analysis. So why can they do it and the rest of the world can't? The problems it does not solve, however:
the platform is closed source. It is hard to extend; it is sort of a monolith, a silo. You can't extend it with Python or R code that you wrote yourself, or do your custom time series analysis or something like that, and it has hard limitations in its use.
You can discuss long about that. So you're not in control, essentially. Open alternatives usually require that you rent machines, install software, learn Docker, manage resources, organize your parallelization, learn Kubernetes if you're ambitious, and so on. All these kinds of things domain scientists really don't want to do.
Find your files, go through them, and so on. So right now there's a lot of focus on resource and data management rather than on data analysis: on how you do things instead of what you do. And there is Google Earth Engine, which now has 200,000 users or so.
Whatever the number is. So there's a gap to be filled here, I felt, and we set out to start talking about it and making a lot of noise, starting in 2016 with a blog post on OpenEO that said: we need a GDAL for Earth observation analytics. GDAL solved the problem of file formats,
so nobody talks about file formats any more; as long as it's supported by GDAL, anyone can read it. There's no problem in that sense. We had this silo idea in the GIS of the 80s, the time when I grew up, and essentially we have the same silo idea with Earth observation analysis platforms right now.
You either do things with Earth Engine or with the Microsoft Planetary Computer, but not with both. You're not going to compare, because that takes three months, and who has three months for that? And you have to understand all the differences, how things are called, and so on. So that's not going to happen. That's why we set out with OpenEO. Essentially everyone does
a lot of the same things on all this imagery. So what we started out with is basically describing what raster data cubes are. This is the data model we have for Earth observation data: we have raster data in the x and y directions, we have a number of bands for a particular sensor,
and then we have time: minimally four-dimensional data cubes. And then we do things like filtering, like selections: we select a particular band, or a particular moment in time, or a particular area that might be irregular. Other things we can do is reduce dimensions.
We run a time series model on a set of pixels distributed over time, or we combine bands into an index, for instance, or we reduce space and compute averages over areas, things like that. We can do filter operations: spatial filters, temporal filters.
These are the kinds of operations that everyone typically does, and you want an environment that does that. Or you can do things like querying regions, querying for particular geometries: ask what is going on here, what are my bands and my time steps for these particular geometries, leading to vector data cubes.
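These cube operations can be sketched in plain numpy. The dimension order, cube sizes and band names below are illustrative assumptions for the sketch, not OpenEO's actual API:

```python
import numpy as np

# A toy raster data cube with dimensions (x, y, band, time):
# 10x10 pixels, 4 bands, 6 time steps, filled with random reflectances.
rng = np.random.default_rng(0)
cube = rng.uniform(0.0, 1.0, size=(10, 10, 4, 6))
bands = ["blue", "green", "red", "nir"]

# "Filter": select a single band -> an (x, y, time) sub-cube.
red = cube[:, :, bands.index("red"), :]
nir = cube[:, :, bands.index("nir"), :]

# "Reduce" the band dimension: combine bands into an index (NDVI).
ndvi = (nir - red) / (nir + red)      # still (x, y, time)

# "Reduce" the time dimension: a per-pixel temporal mean.
ndvi_mean = ndvi.mean(axis=-1)        # (x, y)

print(cube.shape, ndvi.shape, ndvi_mean.shape)
```

In OpenEO the same steps would be expressed as processes (filter_bands, reduce_dimension) and executed by the back end, not in local memory.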
So what is OpenEO? OpenEO is a solution to this problem in the sense that it is an API, an application programming interface, for cloud-based processing of image collections, or data cubes. The API, of course, is a thing that you need but that end users don't see.
So it's not like an agronomist needs to learn the API. No, the API is used by applications that people are familiar with. And this API not only specifies how things are called and done, but also what is being done, in the set of processes: all the things I just showed
on the previous slide, the typical things you do with Earth observation data. It is also a set of software repositories for back-end connectors. We're not reinventing the wheel; we're basically trying to abstract away from all kinds of existing systems
like Open Data Cube, GeoTrellis, WCPS, GRASS GIS, Sentinel Hub. These are existing and operational, or might-be-operational, systems that do these kinds of activities with Earth observation imagery, and we write clients for them
so that everyone can use their familiar data science interface, whatever it is: Python or R or JavaScript, or visual editors in web browsers, or QGIS, or something like that. All these things are in different stages of development, obviously, but they are happening, and they're being used.
And then, interestingly, there is also a set of implementations running: deployments that include the publicly accessible OpenEO Cloud, an ESA-funded activity, a service that users can actually go to and use now.
OpenEO is also a community of users and developers, and it's an open source project with a formal governance structure modeled along the lines of many OSGeo projects. So here's a little view of how things can look. This is the web editor that Matthias programmed.
Here we have a graphical model builder, essentially. You connect to an OpenEO endpoint, so to a processor, which in this case is OpenEO Platform, and then you can start looking at which data collections are there: Sentinel-2, Landsat 8, MODIS, whatever.
You pick one of them, you drag it to the screen, and you say: okay, this is my starting point, and then I'm going to do all kinds of steps on it: select a region, select a time period, compute an NDVI, and so on. And in the end I want to have this processed and shown in some kind of web service,
like a WMS, and it is then shown here. And you can do this in a synchronous or an asynchronous way. Synchronously, this reacts to zooming and panning and so on, and recomputes pixels. Asynchronously, you do that for larger jobs, and you get this job manager where you can see which jobs are in which state,
whether things have finished, and look at them. Here you can also see some time series: how things develop over time or are distributed over different bands. There is an API. That means that back-end
and client developers know where to start. The API is really an OpenAPI, so a modern, developer-oriented way of writing these things. This is not something that domain scientists or end users will really see. It defines how clients communicate with back ends, and defines discovery, processes, batch job handling,
and publishing via web services. It uses STAC. Unsurprisingly, actually: the OpenEO H2020 project, 2017 to 2020, contributed substantially to the STAC specification. Substantially, that means that is what Matthias did. He said: you guys do things for images,
but we want to do things for image collections. So he sat together with the people from Google Earth Engine and essentially wrote the STAC collection specification. This was one of those things where people were saying: oh, you have to look at OGC standards for solving this problem,
and then we looked at OGC standards and said: well, yes. Deep sigh. It defines a workflow language, the process graph. The process graph is nothing but a set of expressions that are being evaluated, of arbitrary complexity. Expressions in computer programs are also DAGs, directed acyclic graphs.
And that means there are no limits to the things you can do in that sense. Processes run synchronously or asynchronously, as I said, and evaluate lazily if possible, if it makes sense. So here is a little excerpt of the API documentation,
which is the way that web developers expect it, but not something you would confront end users with. Here is how the /jobs endpoint is described and how it could look. So clearly this is not something for people who want to use Earth observation data, but for people who want to program against this kind of thing.
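A process graph of the kind described here can be sketched as a plain JSON structure. The node names, collection id, extents and bands below are illustrative assumptions; the process names (load_collection, reduce_dimension, mean, save_result) follow the published OpenEO processes:

```python
import json

# A minimal OpenEO-style process graph: load a collection, reduce the
# temporal dimension with a mean, and save the result. Nodes reference
# each other via {"from_node": ...}; the reducer is itself a small
# process graph whose input arrives via {"from_parameter": "data"}.
process_graph = {
    "load": {
        "process_id": "load_collection",
        "arguments": {
            "id": "SENTINEL2_L2A",  # illustrative collection id
            "spatial_extent": {"west": 7.0, "south": 51.0,
                               "east": 7.1, "north": 51.1},
            "temporal_extent": ["2022-06-01", "2022-09-01"],
            "bands": ["B04", "B08"],
        },
    },
    "reduce": {
        "process_id": "reduce_dimension",
        "arguments": {
            "data": {"from_node": "load"},
            "dimension": "t",
            "reducer": {
                "process_graph": {
                    "mean": {
                        "process_id": "mean",
                        "arguments": {"data": {"from_parameter": "data"}},
                        "result": True,
                    }
                }
            },
        },
    },
    "save": {
        "process_id": "save_result",
        "arguments": {"data": {"from_node": "reduce"}, "format": "GTiff"},
        "result": True,  # exactly one top-level node marks the result
    },
}

print(json.dumps(process_graph, indent=2)[:80])
```

Clients like the Python or R library, or the web editor, generate such graphs for the user; they are rarely written by hand.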
And then the real substance is in the processes that are defined. I think there are like 120 or so of them, defining what can be done with Earth observation data through these data cubes. For instance: select image collections, define extents of data cube views, define mathematical operations:
arithmetic, logical operations, and so on, familiar from programming languages. And you can export to various formats. And there are a number of particular things I mentioned, like reduce_dimension, where you work on dimensions of a data cube. And a nice thing that you cannot do with Earth Engine is user-defined functions. Of course you can do that here. So you can have your little Python code chunk
or your R code chunk that does some kind of specialized time series method that nobody else implemented, but that you would like to run on a large amount of imagery, and you actually can, because you're not dependent on some kind of big tech implementing this for you. So, aggregate_temporal_period.
Here is its documentation. It says: I have data for every five days, and I want to aggregate it to, for instance, monthly data and then work on the monthly data. The data cube model we have is basically a multidimensional array with typically spatial, temporal and thematic dimensions, but it can be arbitrarily complicated.
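The aggregate_temporal_period idea, five-daily values reduced to monthly means, can be sketched with pandas for a single pixel's time series. The dates and values are made up for the sketch; the real process runs per pixel on the back end:

```python
import numpy as np
import pandas as pd

# A toy per-pixel time series at a 5-day interval over half a year.
times = pd.date_range("2022-01-01", "2022-06-30", freq="5D")
values = pd.Series(np.linspace(0.1, 0.9, len(times)), index=times)

# aggregate_temporal_period(period="month", reducer="mean"), in spirit:
# group the 5-daily observations by calendar month and take the mean.
monthly = values.resample("MS").mean()

print(len(values), "->", len(monthly), "monthly values")
```

The same pattern covers other periods (day, week, season, year) by changing the grouping frequency and the reducer.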
And we define a pixel or a cell as a scalar, not a record with variables or fields. The canonical case, pictured here, has the dimensions x, y, time, and band: each combination of x, y, time, and band maps to a single scalar value, which is a very simple thing. Vector data cubes then arise naturally
when you resolve your x and y dimensions to a set of geometries and end up with one dimension less, and an array dimension that has geometries. Vector data cubes are a relatively new thing; I think a lot of people here associate data cubes with raster data cubes
as being the identical thing. Vector data cubes are basically multidimensional arrays with at least one spatial dimension that maps to a set of 2D vector geometries. A problem there, and also an OSGeo problem that we have, is that we are not very good at representing vector data cubes: we don't have very useful
exchange formats for them. The best I could find is NetCDF or CoverageJSON, and both are not easy to handle, so to speak. We have, of course, OGR simple feature tables, which are essentially a vector data cube
of one dimension: it has just a set of observations, and then it has attributes in columns. We have NetCDF and Zarr with the CF conventions, the climate and forecast conventions, which can tell you how to handle point, line, and polygon geometries,
but there's very little tooling for that. And then there is CoverageJSON, which is now an OGC community standard in a very mature state of discussion, I think, which is very good. But what we need is basically support and tooling for data cubes,
in particular for vector data cubes. And what we really would need, I think, is a CoverageJSON read and write driver for GDAL's multidimensional array API, which is an excellent API for handling data cubes in GDAL and exchanging them with, for instance,
NetCDF and Zarr and so on. I think those are things that are now missing. Vector data cubes are very natural things: if you query a data cube at a set of points, you get a vector data cube. But how do you store it? And as long as that is a nightmare, nothing is going to move.
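How a vector data cube arises from querying a raster cube at points can be sketched with numpy. The cube sizes and cell indices below are illustrative assumptions, and the points are reduced to integer cell indices rather than real geometries:

```python
import numpy as np

# A raster data cube (x, y, band, time) and a set of query "geometries",
# here simplified to (x, y) cell indices of three points.
rng = np.random.default_rng(1)
cube = rng.uniform(size=(100, 100, 4, 12))
points = [(10, 20), (55, 7), (80, 80)]

# Querying the cube at these points drops the x and y dimensions and
# introduces a "geometry" dimension: the result is a vector data cube
# with dimensions (geometry, band, time).
xs, ys = zip(*points)
vector_cube = cube[list(xs), list(ys), :, :]

print(vector_cube.shape)
```

The open question raised in the talk is not this computation but the exchange format: how to serialize the (geometry, band, time) array together with its real vector geometries.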
We have OpenEO Platform. As I said, that is a running platform, a running instance of OpenEO with public access, funded by ESA, one of the big funders of this conference. It was created by a project funded by ESA, and that is very good in the sense
that we were forced to really make something that is operational, and you get the nightmare of user authentication and management and everything. That was an enormous job to get realized. There are different back ends that actually run OpenEO Platform,
the publicly running instance of OpenEO. There is the VITO back end, which runs on GeoTrellis; GeoTrellis is a layer on top of Spark. There's the CREODIAS back end. There's the EODC back end, which mostly runs on the Open Data Cube, xarray, Dask stack. And then there's the Sentinel Hub back end,
which runs on Sinergise's dedicated software, as far as I know in the Amazon cloud; they can also run it in different cloud environments. So there are already four back ends that are essentially in production, so to speak. Then there is the interesting question: what is this? Is this a standard or not?
Or: what are you working on? And then, of course, the question is: what is a standard? Is it the de facto standard that people actually use, or the de jure standard, the formal standard? And I always say that a good de jure standard evolves from a de facto standard, and not the other way around. People are now really going into the discussion: should we use OpenEO?
No, we shouldn't, because it's not an OGC standard. Well, you know, we are creating it, right? Okay, go ahead and standardize it. But anyway, it is a process, and a very complicated process, and it takes time. Nobody here is against standardizing this; it is a useful thing, and there are really no alternatives in the room.
So go ahead, then. In any case, OpenEO has been set up entirely as an open, community-based process. The ideas and the software have been adopted by industry without any involvement of OpenEO team members. So that is a demonstration that we use
modern technology that can be adopted. We align well with the OGC APIs. In some sense we complement OGC API - Processes, which defines processes at an abstract level, where we define concrete processes: the things you want to do with Earth observation data.
Nobody in the OpenEO project or consortium is against adopting things as a formal standard. So the slowness is not with us; the slowness currently is actually with OGC not answering our emails. That is the current state of affairs. Anyway, there are a lot of de facto standards around that everyone uses and that work, without being formally standardized.
So what's coming up? There are a lot of projects coming up. We're looking at classification problems, artificial intelligence, machine learning. That also calls for handling of training data, which is an additional problem. Handling of vector data cubes and the tooling around that, looking at them, visualizing them and so on,
is something that is in its infancy. And there's a number of projects coming up where we also look at applications in other domains, like meteorology and climate, hydrology, agronomy and so on. And we're also trying to put OpenEO everywhere, in the sense of deployment in other clouds,
with federated computing, on a local computer, running an OpenEO back end in your browser: these kinds of experiments with small data make total sense, so we're trying it. Wrapping up: OpenEO is an open, community-based initiative to make it easier for non-developers to use open science principles and Earth observation data archives.
So everyone is invited to participate, to contribute and to help; even just starting to use it and asking questions would be a great help already. From doing reductions on large image data cubes, it is moving towards machine learning and wider domains; that is currently happening.
And as I said, vector data cubes naturally arise when you have dynamic data, and some work is needed to get some tooling going, probably also in other OSGeo projects and around CoverageJSON and so on. I think that's all. Thank you for your attention.