EO Platforms and open science in support of Green Deal ambitions

Formal Metadata

Title
EO Platforms and open science in support of Green Deal ambitions
Title of Series
Number of Parts
9
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Producer
Production Place
Wageningen

Content Metadata

Subject Area
Genre
Abstract
Patrick Griffiths is an Earth Observation Data Engineer at the European Space Agency (ESA), based in Italy. At the Open-Earth-Monitor kick-off, Patrick introduced the data management burden affecting scientists, particularly in the EO space, and explored possible solutions and simplification paths, as shown by the Euro Data Cube and OpenEO Platform initiatives.
Keywords
Lecture/Conference
Computer animation
Program flowchart
Satellite imagery
Transcript: English(auto-generated)
For those who don't know me, I'm Patrick Griffiths. I'm with the European Space Agency in the EOP-S department. So just to set that straight from the beginning, we're in the Science, Applications and Climate Department. So we don't have too much say
in the Copernicus missions, which are of course owned by the European Commission, but operated and designed by ESA. We also have relatively little say in the Copernicus DIAS, but we're doing all the science work in EOP-S, so just to set that straight, okay? So I'd like to thank Tom and OpenGeoHub
and the responsible people here in the Open-Earth-Monitor project for inviting me, because I think the work that ESA is doing on R&D projects should be coordinated to a certain extent with the work that is funded and supported by the European Commission. And too often, the fragmentation
of the European platform landscape that Evan also pointed out earlier comes down to a lack of coordination and a lack of interoperability or reuse of existing capabilities. And I think that's what I wanna point out in my talk here in the next minutes. So I'm gonna talk about EO platforms and open science in support of the Green Deal ambitions.
But let me start setting the scene here a little bit, just with a couple of general slides. I mean, the background against which we are all here today and the background against which we have all worked together in the last years is just that we have this wealth of Earth observation data coming in
and that's available to us now. And just looking here at the ESA-designed and operated missions, you have the Earth Explorers, the science missions, which are the key flagship science missions like FLEX or Aeolus. Then you have the operational Copernicus missions and the meteorology missions, right?
And the numbers up there are a little bit outdated, I think, but there's something like, well, no, there should be, no, okay. Exactly up there. So you have something like 18 missions in operation, 38 or something in development, you know all of that.
And the most important ones are, of course, to all of us, the Copernicus Sentinel missions. And here, this is also slightly outdated because the failure of Sentinel-1B is not listed here. Also the launch of Sentinel-6 has not been listed here, but we are in this situation where we're generating 25 terabytes of new data every day,
disseminating more than 250 terabytes of data. And this brings us to a lot of challenges that all of you have been working on in the last years. How do we actually handle this data volume? How do we ensure continuity across missions? How do we share this data?
How do we track a consistent archive? How do we exploit synergies between missions? And, you know, how can we foster innovations with these rich and challenging data archives? And, you know, 10 or 15 years ago, the simple term that was coined to meet these challenges
was basically, you know, the paradigm of moving the algorithm to the data, which is, you know, not necessarily the solution to all the challenges that scientists are facing with these data volumes. So in response to this, of course, the DIAS were initiated, and we have five DIAS rather than one huge one
with a spinning-disk archive for all of the data. It's a little unfortunate. So I think we all came to the conclusion that it's maybe not that simple. It's not just bringing the algorithm or the user to the data. We need a little bit more than that. And I think many of you have experienced that in your own work, that, you know,
if you work with these high data volumes, you face this issue of the data management burden. And this is just one graph. There are many similar graphs around that simply say, you know, scientists spend 80% of their time on managing and cleaning data. That's the data management burden. And in Earth observation, it's especially pronounced,
I would say, you know, researchers have to search for file lists, file paths, download them, unzip them, bring them into a storage system, then run a processor over a file list and then clip data and pre-process and so forth. And then, you know, when you,
then you also have to, of course, explicitly spin up your processing VMs or whatever. And once you've done that, then you suddenly notice that your, you know, your storage is full and you have to move data over here. And it's, you know, it's basically preventing scientists
from working on what they should actually be working on, which is the scientific insights that we want to gain from this. So, and the promise that the Earth observation cloud-based platforms hold for science and applications is large. You know, there are many great things that we can get out of this. So one aspect is a simplification and democratization
of working with Earth observation data. And I think things like Google Earth Engine have shown that very well. You know, that this is a democratization, right? A lot of the users in Google Earth Engine, they have no idea how to handle a SAFE file format for Sentinel-1, but in Google Earth Engine, it's been made very easy.
Also the aspects like dynamically allocating compute resources, depending on your processing task and what you're currently doing, intuitive front-end syntax and stuff like that. Also the points that go towards collaborating and sharing of code, of processes, you know,
transparency and innovation concepts like deferred or lazy evaluation and Jupyter Notebooks. All of these are the promises that the cloud-based Earth observation platforms hold. And these are all great aspects. But if we look at the current situation in Europe here,
and this is not a complete map of the platform ecosystem in Europe, it's just, you know, it's an incomplete map. But I think someone from OGC has recently said that there's something like 78 platforms in Europe, you know, and if we saw the presentation from EuroGEO earlier, I mean, there's so many great projects,
but they're all, you know, developing new portals, new front-ends and new visualization interfaces. And it's really an issue. So also here, I just want to quote the study from Julia Wagemann that was already brought up earlier, you know, where she looked at the current status of the adoption of cloud-based technologies
in the Earth observation domain. And well, it is quite remarkable, you know, that more than 50% of the users have not used any cloud-based API services or a code editor in the cloud, you know. This is really remarkable.
And this is from 2021, I believe. So something is kind of limiting this paradigm shift. And if we look at the situation compared between Europe and the United States, you also see that there is a difference there. So you have a higher adoption of cloud-based technologies in Earth observation in the US
compared to in Europe. So this is significant and interesting. And the question is, you know, why is there such a hesitation to adopt cloud-based EO platform services in Europe more widely? And I think what this basically is,
is this capability gap that we have in Europe, right? So we have this situation that we have the most advanced Earth observation system with the Copernicus Sentinels and the Earth Explorers. But the, you know, this European leadership is not necessarily matched by the analytical capabilities that we're offering.
And we're suffering from this, you know, fragmentation, redundancy, and a lack of coordination among platform providers, but also funding bodies like ESA and the European Commission. There's still, in many parts, a prevalence of the old VM model, you know, where you have to spin up virtual machines instead of having dynamic resource allocation and scaling,
which is very well solved in AWS also. And there's still, you know, also a prevalence of this file-centric thinking and file-based storage and data access rather than pixel-level perspectives when working with the data. There are also unappealing or unrealistic business models,
and business models in general are challenging as we had just discussed in the coffee break. And also a long-term perspective, you know, I mean, we're thinking too much in this project-centric thinking. That's the funding bodies, but it's also the companies that are doing the work.
Okay, and I wanna just focus here in the next couple of slides on two initiatives that we have been pushing in the last couple of years, and we're intending to continue to support over the next years, and that's Euro Data Cube and OpenEO platform. And I'm just gonna quickly give you an overview of what Euro Data Cube contains.
So it's basically a collection of integrated services here, which includes Sentinel Hub, xcube, geoDB, EOxHub. And well, it provides you basically an API, a data cube API with a global data cube service. So you can access the data very conveniently with different interfaces,
such as xcube in very familiar Pythonic kind of syntax. And they provide some really nice features now on algorithm plug-in or the batch processing that's exposed through the Sentinel Hub. And I'm gonna show a few examples of what we did with Euro Data Cube a little bit later,
but I wanna focus a little bit more here on OpenEO platform, not only because it's a project that I believe in and that's very much at my heart, but I think also it's a very nice example. So here you have it in a nutshell, you see the consortium down there. I'm gonna go a little bit more in detail. So of course, on the client side, you have these three client libraries
in JavaScript, Python, and R, and then you have the backend and the OpenEO API. We'll take a little look at this in the next slide, but what's really the nice thing to highlight here is that we of course took what was achieved under the Horizon 2020 project OpenEO, and many of you were involved there.
So we really took this when the project ended, the Horizon 2020 project, and then evolved it under an ESA tender and took it to the next level, developing an operational service based on the outcome of the Horizon 2020 project. And for those of you who are not so familiar with OpenEO and OpenEO API,
the situation that they addressed basically four years ago, and this is the nice graph that Edzer and colleagues created here, the situation was that users using Google Earth Engine or the JEODPP, they would all have to learn
a new programming dialect and a new syntax if they wanted to work on those specific infrastructures. And OpenEO API basically created this language-agnostic API interface that could connect all of these different cloud backends with the clients and the different users.
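A minimal sketch of that idea, not the actual OpenEO API: the client describes the work once in a neutral structure, and each backend adapter translates it into its own dialect. The class names, the `submit` method, the collection name and both "dialects" are hypothetical and chosen only for illustration.

```python
from typing import Protocol


class Backend(Protocol):
    """Hypothetical common interface that every backend adapter implements."""

    def submit(self, request: dict) -> str: ...


class ScriptBackend:
    """Pretend backend whose native dialect is a chained script call."""

    def submit(self, request: dict) -> str:
        return f"collection('{request['collection']}').{request['operation']}()"


class QueryBackend:
    """Pretend backend whose native dialect looks like a query language."""

    def submit(self, request: dict) -> str:
        return f"SELECT {request['operation']}(*) FROM {request['collection']}"


def run_everywhere(request: dict, backends: list[Backend]) -> list[str]:
    # The client code is identical for every backend; only the adapter differs.
    return [backend.submit(request) for backend in backends]


# One neutral request, translated by two different (invented) backends.
request = {"collection": "SENTINEL2_L2A", "operation": "mean"}
translated = run_everywhere(request, [ScriptBackend(), QueryBackend()])
```

The point of the design is that adding a new backend means writing one more adapter, while the users keep the syntax they already know.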
So conceptually very strong, and I think the outcome of the Horizon 2020 project was great, and it's a really nice situation that we're now, under ESA funding, basically taking this to the next level, trying to establish this and evolve it into an operational service. So we have a couple of driving concepts
that we're trying to follow in the main project activity, and that is abstracting complexity and providing these intuitive analytics and syntax in a federated cloud environment. So we have these four deployments in EODC, Terrascope, CREODIAS, and now also Sentinel Hub.
Providing transparency, of course, all of this is fully open source, and there are no concerns regarding IPR, something that's sometimes of concern. And then this big, complicated ambition to provide pixel level, but also continental scalability. So the users should be able to manipulate data
at the per-pixel level, but at the same time have a straightforward pipeline that they can use for continental scale processing tasks, which is really not easy, but I think we're getting there. A cornerstone of OpenEO and OpenEO API and OpenEO platform are OpenEO processing graphs.
And you can think of these basically as instructions of processing steps and operators. And you can think of this as a graphical view here on the left-hand side. In the background, it is basically defined as JSON syntax,
providing all of these processing steps in a kind of a processing recipe fashion. And this, being JSON, is of course completely programming-language agnostic. And this has many nice features and also some nice education features. So for example, this is the OpenEO platform editor view,
where you have a processing graph here in a graphical representation. And then you can simply automatically translate this into the different client languages, so JavaScript, Python, or R, which is nice for understanding how the API and processing graphs work together,
but also for education it is fantastic, because you can basically allow people to learn how to work in these client languages more efficiently. So, and this is kind of the syntax simplicity that we are targeting. And I think for all of you that have worked
in the Python data ecosystem, this is very familiar. So in this example, we basically connect to the OpenEO.cloud central backend or the aggregator proxy. Then we define a virtual data cube, which you are all very familiar with. I mean, this is similar to solutions in JEODPP or the Google Earth Engine or other solutions.
And then we run the ARD surface reflectance process here on the virtual data cube using FORCE and Fmask, basically as the radiometric and cloud processors. And we can then also add some user-defined parametrization down here, for example,
to turn on or off the optional BRDF or topographic normalization. Yes, and this is, I think for many of us, this is the kind of abstraction level that we want to have when working on these data archives. You know, okay, OpenEO platform is running for another one and a half years now
with the main contract. And we're working along a set of use cases that are all resulting in new processes that are available to users in the platform. We also are working or beginning to work on these advanced federation concepts. So there is the OpenEO aggregator instance,
which is basically an endpoint here that can now redistribute processing tasks to the different backends that are making up the OpenEO platform. So you have the EODC, Terrascope, CREODIAS and Sentinel Hub backends. And with this aggregator instance, it's quite interesting
because we can basically redirect client requests to backends that host a certain data set or that host a certain processor that is maybe not available in other backends. But we can also distribute large processing graphs on the different backends. So you're actually not parallelizing your process only within one infrastructure, but you're actually executing the processing distributed
in different cloud backends, which is really nice. And this is a little bit more of a detailed view here where a processing graph coming from a client library, from a user, basically executes the feature engineering, the transformation of the Earth observation data into predictive features for machine learning, in one data center,
using the data-center-specific implementation of the OpenEO API. And the other part of the processing graph is executed in a different data center where you basically run the inference. So these are, I think, some really interesting advanced federation concepts that we are now exploring and making part of OpenEO platform.
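The routing part of this federation idea can be sketched in a few lines. This is only an illustration under invented assumptions, not the aggregator's actual logic: the backend names, the catalogue of which backend hosts which collection or process, and the `route` and `split_plan` helpers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """One federated backend with the data and processes it offers."""
    name: str
    collections: set = field(default_factory=set)
    processes: set = field(default_factory=set)


def route(backends, collection=None, process=None):
    """Return names of backends that can serve the requested collection and process."""
    return [
        b.name
        for b in backends
        if (collection is None or collection in b.collections)
        and (process is None or process in b.processes)
    ]


def split_plan(backends, steps):
    """Assign each (step_name, collection, process) to the first capable backend."""
    plan = {}
    for step_name, collection, process in steps:
        candidates = route(backends, collection, process)
        if not candidates:
            raise LookupError(f"no backend can run step {step_name!r}")
        plan[step_name] = candidates[0]
    return plan


# Made-up catalogue: which backend hosts what is invented for this sketch.
backends = [
    Backend("backend_a", {"SENTINEL1_GRD"}, {"feature_engineering"}),
    Backend("backend_b", {"SENTINEL2_L2A"}, {"feature_engineering", "inference"}),
]

# Feature engineering runs where the data lives; inference runs wherever
# the model process is available, which may be a different data center.
plan = split_plan(
    backends,
    [
        ("features", "SENTINEL1_GRD", "feature_engineering"),
        ("predict", None, "inference"),
    ],
)
```

A real aggregator also has to move intermediate results between data centers and handle authentication and accounting, which this sketch leaves out.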
So here's all the resources if you would like to follow up on more information on OpenEO platform, OpenEO, just leave that here. And just wanted to point out that of course, for all of the platforms in Europe, we don't have any revenue based business models. So all of these platforms need to assume
some kind of business model where they request some revenue, but OpenEO platform is available free of any cost for any scientific or pre-commercial activities. And we do that basically by sponsoring any licenses you require or your research group requires through the ESA network of resources,
where you basically find a sponsoring wizard where you can request different license packages and they will be sponsored. So we have, for example, also here, quite attractive, new license package for research groups. So where you get a license package for the whole research group. And this will be sponsored by the ESA network of resources.
And also we have these new features here for the support packages. So you can also request some developer support if you have specific tasks where you know you're gonna need some in-depth developer support to implement new processes or make new data sets available. So I think these are some very essential elements now
to build on top of what OpenEO platform offers at this point. For those of you who are not aware of what the ESA network of resources is, and I'm kind of afraid that it's probably the majority of you. So the ESA network of resources is an initiative to provide a portfolio of European platform services
and sponsor these services for any scientific or pre-commercial use. And it's basically the ambition is to increase the uptake of usage of European Earth Observation Cloud services. So you have the links here. You can take a look at this.
And there is the network of resources discovery portal. And there you find basically the sponsoring wizard and it is now a simplified process where you can find, for example, the OpenEO platform IDE. And you can request the licenses there and it's really not a lot of work to do that.
I think the network of resources had a little bit of a difficult start, but by now it's really, I think it has worked quite well. So we've supported more than 530 projects in 79 countries for sponsoring of European cloud resources. And you see here, the stats there on the left-hand side,
there are now 17 data processing as a service services and 11 IDEs. And yeah, I think this is an element that we're gonna continue providing over the next years. And you see here a slightly incomplete list of the different providers that are in the network of resources.
So feel free to follow up on those links. So I wanted to take the remainder of the time here to talk a little bit about the Green Deal ambitions and how Earth Observation can support this here. And I think we're all well aware that this is the next cornerstone policy piece in Europe that will be really central
to the Earth Observation and geospatial sectors. And we had the example of the Common Agricultural Policy in the last years and starting in 2023, Sentinel-based monitoring of the CAP is mandatory for all European countries. I mean, it's amazing that we have gotten there and have such a central role
for Earth Observation-based monitoring embedded in policy. So it's fantastic. But the Green Deal, of course, is a much broader and far-reaching policy framework, if you like, because it touches so many different areas, not only clean and affordable energy, but also the trade-offs of biodiversity and food production
and sustainable mobility and so forth. So it's a really complex policy piece. So the question is, how can Earth Observation support the Green Deal and what should be the role of cloud-based EO platforms and open science in supporting this? And I think the obvious point at hand,
how Earth Observation can support the Green Deal ambitions is, of course, that we have new and upcoming observational capabilities. This is just an example you've seen before here with the Sentinel-5P and nitrogen dioxide emissions averaged over one month in 2018,
where nicely at global monitoring scale, you see the cities that emit these large amounts. So this is fantastic. But we're also seeing new and really promising and nice approaches that use multi-scale data and monitoring. So in this case, you have the TROPOMI Sentinel-5P measurements
here on the left-hand side. And then you have GHGSat, which is this Canadian private mission with high-resolution methane monitoring capability, where you can use, well, Sentinel-5P TROPOMI with the coarse 7.5 by 3.5-kilometer resolution. But then you basically zoom in with GHGSat
to do the emission attribution to see where the emissions come from. And it gets even better because now we have shown that you can use the Sentinel-2 SWIR bands to zoom in even further, and you also get the temporal variability by mapping the methane emissions from Sentinel-2 with potentially a five-day temporal repeat.
So I think these are really nice examples. I really like these multi-scale monitoring approaches. So it's fantastic. And of course, I mean, in the media, everyone's complaining about the heat wave and urban heat island effects during these increasingly hot summer temperatures
is one thing, which was shown quite nicely here with the ECOSTRESS sensor on the ISS. But there is the upcoming LSTM Copernicus expansion mission, which will provide 30-meter thermal measurements and also these private company constellations coming up.
So another important role towards addressing the Green Deal ambitions is, of course, advanced analytics. And there are a lot of things that we can already do on energy potential, modern mapping, bringing that together with social mobility data or the downscaling of Sentinel-5P TROPOMI to basically map emission sources and emission hotspots.
ESA has, of course, defined this Space for a Green Future Accelerator, where we will be working on these elements, such as the green transition information factories and connecting this effectively with the digital twins. We also had a stakeholder workshop at the beginning of this year. And well, just a little bit of an insight
into what the recommendations of this stakeholder workshop were: one of them was, for example, to address the energy transition, but also the related trade-offs in the intertwined socio-economic dynamics. I mean, if you optimize renewable energy production in one country, what are the impacts on biodiversity or food production?
So we have to look at these trade-offs. One other recommendation was to leverage and build upon existing and planned activities, proactively engage all relevant stakeholders, and also think about sustainable business growth that can emerge from this. So at ESA-EOP, we now have a defined
long-term open science strategy, and I'm not gonna go into detail in this, but of course, this involves, you know, this comprises basically open source developments that we're making more and more, incentivize more and more in our activities, but also, of course, all of the data environments
that we're developing with industry. Just one example here of what we did with Euro Data Cube was the rapid action on COVID-19 and EO, but one really nice element of this was that we had this hackathon where we basically asked people to come up with new ideas
of how, you know, the economic impacts of COVID could be measured, and you know, this one was not bad. I mean, there was a team that came up with this idea of using this parallax effect in the Sentinel-2 image when moving objects are imaged, you know, that you can exploit this basically to quantify the number of trucks,
and this is something that basically, by providing the tools to the community, they came up with this idea, and in Euro Data Cube, this has now been transitioned to an on-demand service that anyone can execute at continental scale. So it's a nice example of open science, but also EO platforms supporting the upscaling
and provisioning of service. Science communication is another important aspect for supporting the Green Deal ambitions. Okay, let me conclude here. So cloud-based EO platforms can provide various abstraction levels that help scientists to cope and work efficiently with EO data archives, but the current European Earth Observation platform landscape
really suffers from redundancy and fragmentation, as we have heard before, and I think what's really needed here is a little bit of a change of mindset away from project-centric thinking towards thinking of interoperable building blocks to stop constantly reinventing the wheel and rather reuse existing achievements
and build upon those. Of course, we're gonna continue working and funding projects, but we will incentivize very strongly to reuse existing capabilities such as OpenEO API, just one example, but also STAC or other things. And not every project needs to reinvent data cataloging or user authentication, there's just no need.
It's inefficient. So key technology elements are emerging slowly from the bottom-up approach, which we heard earlier also. STAC is one example. OpenEO API is another example. And advanced federation concepts can help
the consolidation towards an interoperable ecosystem, which we will incentivize over the next years. Then finally, a couple of points on the role of Earth Observation in the Green Deal. So the upcoming and future observation capabilities together with state-of-the-art analytics will be very important, powered by hopefully a federated platform ecosystem and open science,
pushing for interdisciplinary data and advanced analytics, exploring these what-if scenarios, investigating trade-offs together with the emerging digital twins, and facilitating innovation, co-creation, and effective science communication. Great, thank you.
Thanks a lot, Patrick. Patrick had some polls that we were not showing in the background, but we'll put them up so that you guys can interact with the presentation a little bit afterwards. Should we put them up now? I mean, we have five minutes left. We do, but I think you've got several questions.
So I don't know if you would prefer to answer questions or we can put the polls. Can we put the poll up so that we have it there? Okay, so we can answer the question while the poll is up, no? Yes, okay. So the first question was on the, what's your desired level of abstraction when working on the data archives? Because there are those scientists that simply want a VM,
but there are other scientists that want a higher level of abstraction. So that's the first question that we have for you. And it looks like an even split. Oh boy, that's not good. Okay, but yeah, in the meantime, we can maybe address one of the questions that came up.
In the meantime, Tom, maybe you could just ask a question off of the list while the polls are active in the background over here. Just keep on rating questions, please. What is the relation of OpenEO Cloud and OpenEO.org?
Oh, that's exactly that. I mean, OpenEO.org is the open source community that came out of the Horizon 2020 project, developing mainly the OpenEO API and OpenEO.cloud is the operational service that we're building with ESA funding. So like the backend basically, OpenEO Cloud is the backend for OpenEO? Well, OpenEO.org is sort of independent of backends,
whereas OpenEO.cloud federates across CREODIAS, Terrascope, Sentinel Hub, and EODC. Okay, but it's one OpenEO, right? It's not, like because OpenEO is open source, you could have taken it, modified, make your own version? It's all open source, but OpenEO.cloud
is the operational service that ESA is supporting. Maybe I can take the first one there on the, OpenEO probably does not serve any vector data. Will this change soon? Yeah, I think indeed the current, currently working with vector data in OpenEO, based on what the API provides is not yet ideal,
but there is a lot of conceptual discussion now on vector data cubes with Edzer, and this is recognized that we need this, and it is in progress. And I think there are now already ways of working with vector data. Also, we have a geoDB endpoint that's gonna become available
as part of Euro Data Cube, which is a database system for vector data, and that will be exposed through OpenEO. Valentina, do you wanna put up the next poll, and then we can answer one more question after? Okay, Patrick's next poll. Yeah, exactly. So this is just a question on, how should the European platform landscape
evolve in the near future? Because some people say like, hey, we just need a single monolithic infrastructure like AWS with the whole entire Copernicus archives on spinning disk, but it would be interesting to see what people think, if we should really go this way or for an effectively federated system.
No one's voting for the jungle, I see. Okay, we can leave this up for another couple minutes, and then I think we have time for one more user-submitted question after. The line goes, I think, though. Yeah, I can read it.
Can you show me? What is your opinion on moving the OpenEO as a community standard in the OGC? Yeah, I think OGC should consider OpenEO API as part of a standard, and those discussions are ongoing. So we have a Tiger team initiated that is investigating together with OGC people,
the complementarity between the OGC API - Processes and the OpenEO API, and the recommendation that they're making is basically for OGC to adopt OpenEO as a standard eventually or part of a standard. So that's ongoing.
Okay, do you wanna do the last poll and then close out? I think we have two minutes left. Yeah, sure. Great, so. I believe all of it, all of the data that's exposed in OpenEO platform
is indexed with STAC. And of course, the data collections I didn't show, but currently it's 78 data collections that are exposed, and of course, through the federation with Euro Data Cube and Sentinel Hub, we are reaching into a lot of cloud environments to make these data sets available. How much vector data do you have? Currently, there are no published vector data collections.
So that's something I think that will happen now over the next year or so. I mean, formats. I mean, no one has to worry about formats anymore to a certain extent. I mean, for raster data, it's important that it's available with pyramids. So cloud optimized formats are good,
but I think the user should not worry about the formats that are used in the background. If you have a data cube API, then the data access is given and you don't have to worry about formats anymore. So maybe solution, I mean. For the vector data? Oh, I'm not sure. I think that's a discussion we need to have, but certainly no longer shapefiles, no? No, no, no.
But there are a few solutions for cloud vector data. Yeah, no. Everybody's looking for GeoPackage ones. GeoPackage is nice, but let's see. Oh yeah, this was the last poll question. So which of these aspects do you think will be the most essential to ensure a meaningful contribution of EO to the Green Deal ambitions?
And I didn't talk about that, but you know, this, ah, okay, I don't wanna bias the poll right now, so I'm gonna stop talking. Open source and open science. Well, that's the community here, okay. And digital twins, yeah. I mean, honestly, I think this feedback loop with citizens is gonna be important because there are many things
that we cannot observe from Earth observation where a handheld device, where a citizen provides some feedback on, you know, here there was an improvement of the insulation efficiency of buildings or so. Those are things we cannot see, so having this feedback loop from citizens through handheld devices will be really important,
but I'm biasing the poll. Maybe they would have gotten it anyway. Okay, thank you so much, Patrick. I'll take the mic.