openEO: Interoperable geoprocessing in the cloud
Formal Metadata
Number of Parts: 490
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/46969 (DOI)
Transcript: English (auto-generated)
00:05
openEO is the main thing I'm working on; STAC was just the side project, basically, that I started working on because I needed it for openEO. So openEO is a project funded by the European Commission at the moment, and the idea behind it is to get an interoperable geoprocessing API
00:23
for the cloud, for cloud services. So why do we do that? Basically, at the moment, if you want to geoprocess data, you typically use R, Python, JavaScript, or whatever language, and then you need to connect to one of these cloud services.
00:41
They all have their own APIs and specifications for how to process data: whether it's, for example, data-cube-based or tile-based, whether data can be downloaded as GeoTIFF or some other format. They all have different things for billing, for how to store data, and stuff like that.
01:00
So basically, for each of these services, you need a different client, and you need to learn how to process the data there. If you start working with one of these services, you're locked in, because you have learned it, and of course you want to keep working with it. And it's always a tough time to get
01:21
used to something different. So what we think is better is something like this: you have one client per language (R, Python, and JavaScript is what we support at the moment), and then you have a streamlined API in between, which translates everything to a single API,
01:47
basically, where you have a single way to process: for us that's the data cube, how you build things, how you download things, and so on. It's basically like GDAL, which most of you probably know.
02:03
That's the tool that translates between GIS programs and data formats. And this is basically some kind of GDAL for the cloud, I guess. So it helps you do reproducible research, as you can basically take the application that you wrote,
02:22
or your algorithm, from one provider to the other. So if you run things on Google Earth Engine first and want to know whether what they computed for you is really true, you can take the code that you wrote in R, just change the URL and some other minor things for pre-processing,
02:41
and transform it to code that runs on, for example, VITO's PROBA-V MEP or any of the other cloud processing providers that you are aware of. In that sense, it's portable to some extent. And that's how we think it should be in the future:
03:01
that you just have very simple access to data in the end and don't need to write proprietary workflows for each cloud provider. So as I said, it's a language for geospatial processing. We have on one side the API, which is basically the translation layer
03:22
between the clients and the server, and a set of predefined processes, which basically tries to make processes interoperable. For example, if you compute things in Python with xarray and in R with stars or another package, processes may slightly differ
03:42
in regard to how they compute things. And we try to define them on a higher level, so that you can use the same processes for all the different kinds of computation software in the background. In contrast to STAC, openEO is focused on processing,
04:01
while STAC is focused on search and discovery. It's open source: all software we develop here is open source, and the specification as well. We are focusing on data cubes, so that's maybe a bit of a change from the traditional GIS workflow where you downloaded individual tiles
04:21
and processed based on them. Here it's all basically wrapped into a data cube which you can process on. And we support UDFs (user-defined functions), which is a very interesting thing because it allows you to send your own R or Python code. The processes we have at the moment are narrow in the sense that, for example,
04:45
you cannot use custom libraries, for example ones that compute some very advanced algorithm that we don't support at the moment. So if you need any specific libraries, for example ones that are fast for certain computations,
05:01
then you can actually run them as a UDF, where you basically just write script code in Python or R and send it to the server, and it's executed in the cloud for you. So what is openEO not? Well, it's not another cloud provider; we just specify the API and translation layer. It's not another geoprocessing software,
05:20
so we're not writing the new ArcGIS or something like that. It's really just the translation layer. And it's not like the traditional GIS workflow where you download the data, have tiles and need to process them, and so on. It's all cloud-based, so your algorithm goes to the data, which is stored in large amounts in the cloud,
05:42
and then you get the result back, not the other way around. Of course, what we are doing here is basically defining a new standard, and in that sense, we could run into the issue that afterwards there are 15 competing standards,
06:00
but I hope not. So the API, the translation layer in between, offers the following functionality. Of course, first it needs to give you the basic information, so it gives access to discovery: for example, how the API works, what it supports, the EO data that you can use
06:24
in these workflows, which is exposed via STAC, STAC collections and the STAC API, and then the processes, which is basically just the list of processes that is supported by the backend. Then it supports, of course, authentication
06:43
with OpenID Connect. Then you have workflow management, where you can basically store your own user-defined processes. So if you, for example, want to make a new algorithm based on the predefined processes we have, you can store it as a user-defined process
07:01
and use it as if it were predefined by the backend. So it's really integrated, and you can pass around your algorithms and run them on other backends, or pass them to other users to be reused. Then there's file management,
07:20
where you can basically upload assets, if there's a GeoJSON file that you need to pass or something like that, or download things; it's all handled by a central file management API. Then, of course, there is the processing service. You can either process synchronously, where you basically send things to the server and immediately, or in a matter of seconds hopefully,
07:41
get a response back with the result; that of course only works for limited extents and amounts of data. For bigger things, you can use batch jobs, where you basically also send the job to the server, then wait for whatever time it needs to process, and then get back the results
08:00
to download, again as a STAC catalog with the appropriate files in it. And the third thing is web services: there is an API to host WMS through openEO, or WCS or other services that you want to expose.
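Concretely, the synchronous and batch modes just described share the same request body in openEO API v1.0: a JSON document wrapping a process graph. A minimal sketch as a Python dict; the collection id, node names, and extents are illustrative, not from the talk:

```python
# Hypothetical request body shared by synchronous processing
# (POST {backend}/result) and batch jobs (POST {backend}/jobs).
# Collection id, node names and extents are made up for illustration.
payload = {
    "process": {
        "process_graph": {
            "load": {
                "process_id": "load_collection",
                "arguments": {
                    "id": "SENTINEL2_L2A",
                    "spatial_extent": {"west": 7.0, "south": 51.0,
                                       "east": 7.1, "north": 51.1},
                    "temporal_extent": ["2019-06-01", "2019-06-30"],
                },
            },
            "save": {
                "process_id": "save_result",
                "arguments": {"data": {"from_node": "load"}, "format": "GTiff"},
                "result": True,  # marks the node whose output is returned
            },
        }
    }
}
```

For a batch job, the backend answers POST /jobs with a job id; the job is then queued via POST /jobs/{id}/results, its status polled with GET /jobs/{id}, and the finished results are listed as a STAC catalog under GET /jobs/{id}/results.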
08:20
So we don't redefine things for viewing and stuff like that, but rely on the standards that are already there, defined mostly by OGC. But you can also expose non-standardized things like XYZ tiles, as used by OpenStreetMap, for example. Processes I already mentioned.
08:41
There is a set of predefined processes, at the moment I think 130 or something like that, for band math, for loading data into data cubes, working on data cubes, renaming dimensions, adding new values and stuff like that. You can visit processes.openeo.org to see the list. Then, of course, based on the predefined processes,
09:03
you can define your own user-defined processes, which internally are just dependency graphs with instructions on how to work on the data. And then there are UDFs again, which is basically the thing I talked about before, where you can write your R and Python scripts
09:21
and send them to the server as part of the other processes. So basically you can say: I use a predefined process to load data, then this data gets passed to the UDF process, and then you can further process it with other predefined processes, to save the data, for example, and then you are ready to download it.
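The chaining described here (a predefined process loads data, a UDF transforms it, another predefined process saves it) hinges on the UDF being a script with a well-known entry point that the backend invokes per chunk of the cube. A simplified sketch of that idea, using a plain NumPy array as a stand-in for the real data-cube wrapper; the scale factor and context key are made up:

```python
import numpy as np

def apply_datacube(cube: np.ndarray, context: dict) -> np.ndarray:
    """UDF-style entry point: receives one chunk of the data cube and
    returns a transformed chunk of the same shape. A real UDF could
    call any custom library the predefined processes don't cover."""
    scale = context.get("scale", 0.0001)  # hypothetical parameter
    return cube * scale

# The backend would call this for every chunk; run locally:
chunk = np.array([[1000, 2000], [3000, 4000]])
out = apply_datacube(chunk, {"scale": 0.0001})
```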
09:44
We have several clients implemented at the moment. We are tackling JavaScript, Python, and R, which should cover most of the geospatial community, I guess. Maybe there's Julia in the future as well, but we'll see. We have a browser-based application as well
10:02
for users that are not so much into programming; it pretty much works like the model builder in ArcGIS or QGIS. Then we have a QGIS plugin, so you can basically start jobs from QGIS, download the results and show them in QGIS directly,
10:22
and there is a mobile app that you can use as well. This is a screenshot, for example, from the web editor. You see the workflows at the top in the middle, and the management stuff is at the bottom. You see a list of processes and collections you can use.
10:40
You can drag and drop them into the model builder, and on the right there is a map that you can use to view the data. I think there is some NO2 visualization on the map at the moment. This is how, for example, an EVI computation would look like; that is R, yes, that is R.
11:06
It's pretty easy: you just connect to the web service with a URL, and of course you will be prompted for username and password. Then you basically create a data cube. You can load data; in this example it's Sentinel-2 again.
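The EVI band math this example applies to the loaded bands can be sketched in plain NumPy; the reflectance values below are made up, and the standard EVI formula 2.5 * (NIR - RED) / (NIR + 6 * RED - 7.5 * BLUE + 1) is assumed:

```python
import numpy as np

# Illustrative Sentinel-2 surface reflectance values (two pixels)
blue = np.array([0.05, 0.06])
red = np.array([0.10, 0.12])
nir = np.array([0.40, 0.45])

# Per-pixel EVI; in the openEO clients this same arithmetic is written
# with overloaded operators and shipped to the server as a process graph.
evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

# A minimum temporal composite would then reduce a time stack of such
# rasters, e.g. np.min(stack, axis=0).
```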
11:20
You can specify the spatial extent, the temporal extent and the bands to be loaded, and that will be loaded into a data cube. Then, for example, in this case it will reduce the band dimension and do some band math on the bands, in this case the EVI computation, and then reduce the temporal dimension to just give you the minimum composite,
11:41
and save the result as GeoTIFF. And you can do the same with Python; it looks very similar. In Python, the operators are overloaded so they can be used directly, and then they are translated
12:02
into our internal representation and sent to the server. Yeah, we have several server implementations already that you can reuse or extend if you want. There's a GeoPySpark/GeoTrellis implementation. There is a Google Earth Engine implementation, so you can basically run our scripts already
12:22
on Google Earth Engine as well, for free. There's a GRASS GIS Actinia implementation; you can go to Marco's talk at 2 p.m. to hear more about that. There's the JRC Earth Observation Data and Processing Platform from the European Commission. There is an OpenStack implementation.
12:41
There is access to Sentinel Hub as well. And there is a server implementation for WCPS, which is in the end rasdaman. There is a bit of an ecosystem, too. We also developed, for example, openEO Hub, which you can go to and get basically an overview of which servers are out there
13:00
where you can process. You can, for example, also just pass in the algorithm that you implemented, and it tells you on which servers you can run it, gives you information about which data is available, what it costs and stuff like that. You can also share your user-defined processes, your UDFs and stuff like that there. And then there is a validator, of course,
13:21
to check whether the API implementations are valid. It checks the structure of the API, whether the responses are valid, and also whether the processed results are valid. So there is also a way to check whether there are differences between the backends
13:40
that are coming from processing. Then, when you visit processes.openeo.org, you see a rendered list of processes, which is generated through our doc generator for processes. And of course, at least for the data discovery part, you can reuse the STAC and OGC API - Features ecosystem,
14:03
because the API is completely compliant with those standards, and as such, you can use that ecosystem. And if you expose a WMS, of course, you can just use the WMS clients that you are aware of. The state of openEO at the moment is that
14:22
all these partners are working on it, and maybe you too in the future. We have currently released version 1.0, release candidate 1, so we are pretty much going into stable mode now, after experimenting for two years with what works best and what doesn't.
14:42
And the project ends in the third quarter of the year, so then we can expect a stable version that you can really rely on. Yeah, and that's it for my two talks. Now I need some water. Thank you for listening, and I'll take your questions.
15:04
We have some time for questions. Anyone? Can be STAC and openEO. Yes.
15:23
Yeah, so regarding maintenance, there are a couple of companies that are basing their current and future work on openEO. EODC, for example, and VITO are already pushing things internally so that their internal and external users are using it, so in that sense, they need to continue with it,
15:44
of course, because they have clients that rely on it. And there are also further projects that we want to establish based on openEO, so I hope that will make it future-proof. Regarding the user base, we have some use cases
16:01
that are running at the moment to really check whether all that we did is working. That is a broad range of things: snow cover analysis, agriculture, and stuff like that. But there could be more, of course. The thing is, if you start something new, it's hard to find people that really want to hop on a thing that is not stable yet,
16:22
but we are working on that, and it evolves over time. We also have some meteorological things planned for the future with ECMWF, so yeah, that's the future hope.
16:49
So for openEO, everything is licensed under the Apache 2.0 license, so it's all open source, and you can reuse it to whatever extent you want,
17:01
so feel free to implement something or submit pull requests. It's all on GitHub, so that's good. What was the other thing? Yeah.
17:26
So. As far as I know, for most of these implementations, there are Docker containers which you can run.
17:43
That's a start. We are working on making it easier to adopt at the moment. Most of the implementers are still setting up their own infrastructures to get things running, of course, so in the future, there should be more things like Vagrant scripts and stuff like that, yeah.
18:10
Thank you very much. Thank you. And Jody.