
EO Open Science Catalogue initiative by ESA

Formal Metadata

Title
EO Open Science Catalogue initiative by ESA
Title of Series
Number of Parts
351
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2022

Content Metadata

Subject Area
Genre
Abstract
To enable sustainable and impactful Open Science in the long-term, ESA Earth Observation looks to design and implement a comprehensive Open Science framework, which includes a dedicated set of integrated tools and common practices for effective scientific data management, seeking to support Open Innovation, advance Science and increase community participation. The framework will build on and advance existing Open Science elements and will develop new capabilities to achieve the ambitions and vision set forth in the 2025 Agenda, supporting the European Green Deal. The four main pillars of the initiative are: i) EO Digital Platforms, interoperability and standardisation, ii) Accessible and Reproducible EO Science, iii) Inclusive and collaborative research and iv) Strategic Partnerships. Contributing to the second pillar, ESA is developing an EO Open Science Catalogue tool to enhance the discoverability and use of products, data and knowledge resulting from Scientific Earth Observation exploitation studies. Adhering by design to the "FAIR" (findable, accessible, interoperable, reproducible/reusable) principles, the Open Science Catalogue aims to support better knowledge discovery and innovation, and facilitate data and knowledge integration and reuse by the scientific community. 
The Open Science Catalogue is based upon the EO Exploitation Platform Common Architecture (EOEPCA, https://eo4society.esa.int/2022/01/26/interoperability-sharing-your-application-where-the-data-sit/) and shares its basic Open Source components, but extends them with additional functionalities:
- The Static Catalogue is a hosted STAC Catalogue, comprising static Catalogue, Collection, and Item documents that represent the Themes, Variables, Projects, and Products.
- The Open Science Catalogue Frontend is a Vue.js-based client application that allows efficient browsing of the Open Science Catalogue.
- The Backend API allows users to make submissions to create, update, and delete Themes, Variables, Projects, and Products. These submissions are then handled as GitHub Pull Requests, where they can be further reviewed, discussed, and finally accepted or denied.
The Open Science Catalogue makes use of various geospatial Open Source technologies such as pycsw, PySTAC, and OpenLayers. In this presentation we will review the EO Open Science Catalogue architecture, technology stack, and how this tool can be used to discover and publish Earth System Science products from ESA activities. We'll also look at future evolutions of the product and how it contributes to the overall ESA EO Open Science Framework.
Keywords
Elasticity (physics); Library catalog; Open set; Computer font; Architecture; Element (mathematics); Front and back ends; Resource allocation; Term (mathematics); Data management; Software framework; Machine vision; Green's function; Open innovation; Operations support system; BDF-Verfahren; Peer-to-peer; Digital signal; Computing platform; Virtual reality; Cube; Integrated development environment; Level (video gaming); Collaborationism; Mathematical analysis; Product (business); Metadata; Open source; Data dictionary; Code; Oscillation; Surface; Variable (mathematics); Extension (kinesiology); Latent heat; Web browser; Instance (computer science); Electronic mailing list; Metric system; Debugger; Focus (optics); Landing page; User interface; Order (biology); Medical imaging; Flux; Content (media); Connectivity (graph theory); Information; Computer architecture; Statistics; Building; Row (database); Set (mathematics); Client (computing); Interactive television; Repository (publishing); Data storage device; Digitizing; Configuration management; Collaborative software; Group action; Interface (computing); Stack (abstract data type); Sheaf (mathematics); Bit; Observational study; Context awareness; Projective plane; Sound effect; Twin prime; Software developer; 1 (number); Satellite; Process (computing); Exploit (computer security); Mereology; Service (economics); Goodness of fit; Network topology; Block (periodic table); Session Initiation Protocol; Computer file; Query language; Arithmetic mean; State observer; Mathematics; View (database); Addition; Point (geometry); Default (computer science); Web page; Resultant; Different (Kate Ryan album); Software; Number; Multiplication sign; Expert system; Auditory masking; Core dump; Branch (computer science); Field (computer science); Touchscreen; Direction (geometry); Standard deviation; Independence (probability theory); Distribution (mathematics); Virtual machine; Function (mathematics); Area; Trail; File format; Presentation of a group; Speech synthesis; Library (computing); Quicksort; Closed set; Diallyl disulfide; Loop (music); Slide rule; Pressure; Dynamical system; Scattering; Website; CASE <Informatik>; Digital object identifier; Ocean current; Fluid statics; Domain name; Gravitation; Physical system; Point cloud; Meeting/Interview; Computer animation
Transcript: English (auto-generated)
Hi, good afternoon. Actually, Fabian will start this presentation, so I'll hand over to him. So hi, everyone. Thanks for showing up. Today, we're going to introduce you to the ESA initiative, the EO Open Science Catalog. First, I'll give you a quick outline.
So Anca will present ESA EO Open Science as a whole. Then she will talk about the elements of ESA EO Open Science. And then she will get into the minutiae of the Open Science Catalog. Then we'll talk about the details of the Open Science Catalog, how it is implemented, key technologies, the architecture.
And then I'll give a brief overview of the specific components that make up the Open Science Catalog. So please, Anca, introduce us. Thanks, Fabian. So my name is Anca Anghelea. I'm an Open Science Platform Engineer at ESA.
I was just introduced. But actually, what that means is that my role sits between the digital platform section and the Earth science section. So this means that my work is focused on building digital platforms or digital tools that help the work of our Earth scientists.
And actually, I'm very lucky that Edzer was speaking before me, because he introduced this topic of Open Science quite well. And just to give a bit of context on where this project is coming from, it's actually quite a new project. We started working on this just at the beginning of the year, so just a few months ago.
But the problem goes way back. So if you were in the audience and you listened to my colleague from the ground segment, Nicolaus Hanowski, he presented a lot of details on the digital twins and all the satellite missions that we have available out there. And imagine that we have all this mission data that
has been delivered by all our satellites, more than 40 satellites, that are either Earth Explorers, which are science missions addressing very particular science domains, so looking at the cryosphere or gravity, things like this, but also the Copernicus Sentinels and all the other third-party
missions, contributing missions, and the meteorological ones. So I'm not going to focus on the meteorological ones, but from all these Earth Explorers and the Sentinels, there are huge amounts of products that are being output. So I'm not talking about satellite imagery, where the access and the processing are solved by solutions such as openEO,
and you've just heard about that, but about all the products that result from the scientific studies that we do in ESA and that we've funded across the years. And how this process works is that we typically have independent science teams that get all this data and use their own tools and their own methods
on their own machines or more recently in the cloud, and they output some very interesting global products. So imagine all the climate variables or products like albedo or the ocean topography or ocean currents, ozone, stuff like this. So all these products, of course,
they're documented in some publications and sometimes they're made available openly. Usually they're made available openly. But there are no specific guidelines coming from ESA on how you should keep these products and make them available in the long term. So to ensure this sustainable access
and reuse of the scientific products that are output by all of these projects, the first question that we asked was, how many are there? Where are they? And we realized that many of those are sitting either on some scattered websites or in the lucky case,
they have a DOI and they're accessible through some catalog service. But in many cases, sadly, they're just lost because the team that was working on them is just not working anymore. So open science: we started looking into open science many years ago, basically building solutions that are open
or that promote open science. But with this background that I've just laid out, it's clear that we need to do more. So what ESA is now currently looking to do is to enable sustainable and impactful open science in the long term. So this means take care of all these very precious scientific products that we have
from all the teams that are working on that. So this is part of a bigger effort that is building this open science framework. And this includes a dedicated set of tools and common practices for effective scientific data management.
This looks to support open innovation, advance science, Earth science in particular, and increase community participation. This of course means that we would have to provide guidelines to the Earth observation community on how to properly do open source development for Earth science. We're building of course on a lot of elements
that exist already. And some of these elements are on the screen. One of the main pillars that support our efforts in this direction is the development of the digital platforms. Of course, we're keeping a close eye
on making sure things are interoperable, standardization is on the slide as well, and there are several solutions that we're currently working to develop, and probably more are coming. But just to name a few, there's the Euro Data Cube effort, the openEO platform is one of them,
and the Earth Observation Exploitation Platform Common Architecture, which is looking to develop open source building blocks that would enable easier development of exploitation platforms that are by default or by design interoperable. So the EOEPCA,
or the Common Architecture in short, is what is contributing to the development of this catalog that we're presenting today. Reproducibility is key, and for this, we're now basically laying the foundation
for reproducible science, for reproducible workflows, with elements such as the Earth System Data Lab, so building collaborative development environments, or making sure that these results are properly documented, accessible in the long term, discoverable through programmatic means and so forth.
Community is very important, so we have a number of initiatives. I'm just going to mention the ESRIN Science Hub, an initiative that is bringing external experts to ESRIN to do science with our scientists, using all these collaborative tools that we're making available, and the EO Science Clusters, which are, again, bringing different projects together. And of course, a lot of strategic partnerships
with NASA, with JAXA, with OGC and so forth. So this Open Science Catalog that we're presenting today is a new project. It tries to make all these science products discoverable and accessible in an easy way
and programmatically for the community, and it contributes to this framework. So what it is, it's a catalog of geoscience products, data sets and resources that are developed in the frame of projects that we fund. It provides a means for discovery, for access to these products, using unified metadata across all sorts of heterogeneous sources,
using common dictionaries, and also it's a tool for us to understand what the gaps are. For example, if we're developing ozone products, for how many years have we done this? Do we have a lack of observations, I don't know, for three years in a row?
Do we have a lack of observation spatially and so forth? So it also offers us a synoptic view and helps us better identify where we need to invest more. So I'm handing over to Fabian to go through the technology stack and all the technical details. Thank you, Anka. Okay, so first, I want to say that we want to build a service
with existing technology, with existing building blocks. For example, EOEPCA, which was mentioned already, the EO Exploitation Platform Common Architecture, which already provides us with a good set of tools and a good set of components that we can easily reuse
in order to build the Open Science Catalog. Because of that, and not only because of that but also because it is a convenient and very stable tool, we are building on a Kubernetes cluster, and we're using Flux as configuration management to simply roll out and configure our cluster components.
We're building a user interface using the Vue component library, which is really a great tool for building rich client platforms. In order to keep the history and to build the static catalog, we're reusing Git and GitHub facilities,
keeping a Git repository of all the records in order to track the changes to the projects, products, and variables over time, which is then exported using GitHub Actions. So we always have a clear picture of what is currently on the main branch of the catalog store. We export everything,
so the whole contents of the catalog are exported in STAC format, as a static STAC catalog. I'll be talking about the details of that soon. And then we're using pycsw as a dynamic front end for this whole catalog, which gives a rich set of services and interfaces that we can then use in the client or that can be used by other clients if they are so inclined.
And then there's a couple of supporting technologies that we use. Good. First off, I'd like to talk about the whole architecture. So there is, of course, the front end, which is the main focus point of the user interaction.
This communicates with various other components. One, inherited from EOEPCA, we call the resource management, which incorporates the resource catalog. This is the catalog component that we use for dynamic queries. But it also pulls data from the static catalog, which is the main source of truth
of the actual catalog contents. In order to make contributions or to allow users to provide changes to the catalog, we provide a backend API, which handles the interactions with GitHub and with the Git repository. We also have user management, which allows us to manage the users and also allows our users to self-register.
And we can also promote users to data owners, which allows them to make contributions. The catalog store is a simple Git repository. As I already said, it allows us to keep the history of the catalog. And we're using GitHub Actions in order to produce the static catalog.
I'd like to show you some images of the front end. The front end disseminates the contents of the Open Science Catalog to users. So it gives some overview information. We call them metrics and statistics. And it also allows users to search for what we have: themes, variables, projects and products.
So those are the main components, the main records of the Open Science Catalog. And also through this user interface, it allows users to propose changes. It's not that they take effect immediately; they need to be discussed and approved, or they can also be disapproved,
and then they will not be merged into the contents of the Open Science Catalog. What you see here is the landing page of the Open Science Catalog front end. And you can already see that we have some very distinct theme handling. And this is basically a hierarchical catalog that you can simply click through. So you can go through the themes
and then to the variables, then to the projects and products that you're interested in. It works this way. It is also possible to get some statistics: common metadata are combined and exposed as metrics. So you can see the temporal
and also the spatial distribution of your datasets, and also which missions and which projects are involved, and also the areas of the Earth where you can find these products and projects. We also have an extensive search capability for searching variables, themes, projects, and products.
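The metrics mentioned here could be derived roughly as in the following sketch. It only illustrates the idea of combining common metadata into counts per mission, per project, and per year. The record layout, the field names, and the sample records are assumptions made for this example and are not taken from the Open Science Catalog itself.

```python
from collections import Counter
from datetime import date

# Made-up product records; field names and values are assumptions for illustration.
PRODUCTS = [
    {"id": "p1", "project": "proj-a", "missions": ["Sentinel-3"],
     "start": date(2016, 1, 1), "end": date(2019, 12, 31)},
    {"id": "p2", "project": "proj-a", "missions": ["CryoSat-2", "Sentinel-3"],
     "start": date(2010, 7, 1), "end": date(2018, 6, 30)},
    {"id": "p3", "project": "proj-b", "missions": ["SMOS"],
     "start": date(2012, 1, 1), "end": date(2012, 12, 31)},
]


def compute_metrics(products):
    """Combine per-product metadata into simple catalog-wide counts."""
    missions = Counter(m for p in products for m in p["missions"])
    projects = Counter(p["project"] for p in products)
    per_year = Counter()
    for p in products:
        # Count the product once for every calendar year it covers.
        for year in range(p["start"].year, p["end"].year + 1):
            per_year[year] += 1
    return {
        "missions": dict(missions),
        "projects": dict(projects),
        "products_per_year": dict(sorted(per_year.items())),
    }


metrics = compute_metrics(PRODUCTS)
print(metrics["products_per_year"])
```

A year with a count of zero (or a missing year) in `products_per_year` would point to a temporal gap of the kind described earlier in the talk.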
And we have a user interface mask to allow you to interact with the catalog. Also, as I said, you can submit changes and then you can see the contributions that you have made to the Open Science Catalog.
The front end talks to the backend API in order to submit those change requests whenever you want to change a record of whatever kind. It's basically a facade to make it easier to interact with the GitHub API. And this also ties in with the user management,
where we can store extra credentials and can then enrich the pull request made on GitHub with the user credentials. So we can then relate each pull request to the user that actually made it. The catalog store, as mentioned already, is a Git repository which stores the themes, variables, projects, and products as JSON files.
So these are text files. So it's easy to get file differences and it's easy to discuss the changes that are proposed, but it's also just easy to see at one glance what has changed over time for a particular product. Submissions are managed as pull requests, which is an easy way to discuss changes and also to finally approve or disapprove them.
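The submission flow described above might look roughly like the following sketch: a textual diff of a JSON record, wrapped into the request body that GitHub's "create a pull request" REST endpoint expects (`title`, `head`, `base`, `body`). The record fields, the branch name, the user name, and the helper functions are hypothetical; nothing is sent to GitHub, and this is not the backend API's actual code.

```python
import difflib
import json


def record_diff(old: dict, new: dict, path: str) -> str:
    """Return a unified diff between two versions of a JSON record."""
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return "\n".join(
        difflib.unified_diff(
            old_lines, new_lines,
            fromfile=f"a/{path}", tofile=f"b/{path}", lineterm="",
        )
    )


def pull_request_payload(title: str, branch: str, username: str,
                         diff: str, base: str = "main") -> dict:
    """Build the JSON body for POST /repos/{owner}/{repo}/pulls.

    The submitting user is recorded in the body so that a reviewer can
    relate the pull request to the person who proposed the change.
    """
    body = f"Submitted by: {username}\n\nProposed change:\n\n{diff}"
    return {"title": title, "head": branch, "base": base, "body": body}


# Hypothetical record before and after a user's edit.
old = {"id": "ozone-product", "title": "Ozone product", "end_year": 2019}
new = {**old, "end_year": 2021}

diff = record_diff(old, new, "products/ozone-product.json")
payload = pull_request_payload(
    "Update ozone-product", "update-ozone-product", "data-owner-1", diff
)
print(payload["body"])
```

Because the records are plain text, the reviewer sees exactly which lines of the record changed, which is the property the speaker highlights.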
And then also on GitHub there is a CI pipeline that exports the contents of the Open Science Catalog as a static STAC catalog on GitHub Pages. Right, the static catalog. So this is a full-blown STAC catalog
that is statically deployed on GitHub Pages. We are reusing various extensions and also the core specification. We try to map everything to fields that are already known in the ecosystem, but that's not possible for all the fields, all the metadata that we have, so we have created an Open Science
Catalog-specific extension. The main entry point is the catalog JSON, which allows us to walk through the whole catalog in one go. We also export additional metadata for easy access, especially for the statistics and metrics page on the front end. So this is the metrics JSON and the codelist XML,
so other software can reuse these files in order to enrich the information. Variables and themes are exported as STAC collections. Projects and products are exported as STAC items. And we also export ISO XML metadata,
which we need in a later step, but they can also be used by anyone else who is inclined to use them. We also export a STAC Browser instance configured for this STAC catalog, so it's easy to use and it's also a catalog interface.
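The export mapping described above (variables and themes as STAC collections, projects and products as STAC items) could be sketched with plain dictionaries as follows. Only core STAC 1.0.0 fields are used. The input record layout, the placeholder licence and extent, the link layout, and the choice to attach a product to a variable's collection are assumptions for this example; the fields of the Open Science Catalog-specific extension are not shown because the talk does not list them.

```python
STAC_VERSION = "1.0.0"


def variable_to_collection(variable: dict) -> dict:
    """Map a (hypothetical) variable record to a minimal STAC Collection."""
    return {
        "type": "Collection",
        "stac_version": STAC_VERSION,
        "id": variable["id"],
        "description": variable["description"],
        "license": "various",  # placeholder, not the catalogue's real value
        "extent": {
            # Placeholder extents: whole globe, open-ended time range.
            "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
            "temporal": {"interval": [[None, None]]},
        },
        "links": [],
    }


def product_to_item(product: dict, collection_id: str) -> dict:
    """Map a (hypothetical) product record to a minimal STAC Item."""
    west, south, east, north = product["bbox"]
    ring = [[west, south], [east, south], [east, north],
            [west, north], [west, south]]
    return {
        "type": "Feature",
        "stac_version": STAC_VERSION,
        "id": product["id"],
        "collection": collection_id,
        "bbox": [west, south, east, north],
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {
            "title": product["title"],
            # STAC allows a null datetime when a start/end range is given.
            "datetime": None,
            "start_datetime": product["start"],
            "end_datetime": product["end"],
        },
        "links": [{"rel": "collection", "href": "./collection.json"}],
        "assets": {},
    }


collection = variable_to_collection(
    {"id": "ozone", "description": "Example variable record"}
)
item = product_to_item(
    {"id": "ozone-product", "title": "Example product",
     "bbox": [-180.0, -90.0, 180.0, 90.0],
     "start": "2010-01-01T00:00:00Z", "end": "2020-12-31T23:59:59Z"},
    collection["id"],
)
print(item["collection"], item["properties"]["start_datetime"])
```

In practice a library such as PySTAC, which the abstract names, would typically build and validate these documents; plain dictionaries are used here only to keep the sketch free of dependencies.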
It's not the main product, but we simply use it because we can. Right, now to the dynamic parts. The most important dynamic part is the resource catalog, which is based on pycsw, but in order to feed it we need several components.
So one is the harvester, which just points to the catalog JSON, walks the whole tree, and then exports all the STAC items and pushes them via the registrar into pycsw, where we again have nice interfaces, OGC compliant and also STAC API compliant,
to then again search for the records in a dynamic way. So this concludes the whole loop, the whole journey of the metadata through the Open Science Catalog. I hope you found it interesting, and I would like to thank you for your attention. Maybe Anca wants to join me again,
and we are of course open to questions.
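The harvester step described in the talk, starting at the root catalog JSON, walking the whole tree, and handing every STAC item to a registrar, can be sketched as a small recursive walk. The in-memory `FILES` mapping stands in for the statically hosted files, and the `register` callback stands in for the registrar that feeds pycsw. Both, along with the file names, are assumptions for this example rather than the actual harvester implementation.

```python
from posixpath import dirname, join, normpath

# A tiny static catalog held in memory; paths and ids are made up.
FILES = {
    "catalog.json": {
        "type": "Catalog", "id": "root",
        "links": [{"rel": "child", "href": "./ozone/collection.json"}],
    },
    "ozone/collection.json": {
        "type": "Collection", "id": "ozone",
        "links": [
            {"rel": "root", "href": "../catalog.json"},
            {"rel": "item", "href": "./product-1.json"},
            {"rel": "item", "href": "./product-2.json"},
        ],
    },
    "ozone/product-1.json": {"type": "Feature", "id": "product-1", "links": []},
    "ozone/product-2.json": {"type": "Feature", "id": "product-2", "links": []},
}


def harvest(href, fetch, register, seen=None):
    """Walk a static STAC tree from href and register every item found."""
    seen = set() if seen is None else seen
    if href in seen:  # guard against link cycles
        return
    seen.add(href)
    document = fetch(href)
    if document.get("type") == "Feature":
        register(document)
        return
    for link in document.get("links", []):
        # Only descend into children and items; skip root/parent/self links.
        if link["rel"] in ("child", "item"):
            target = normpath(join(dirname(href), link["href"]))
            harvest(target, fetch, register, seen)


registered = []
harvest("catalog.json", FILES.__getitem__, registered.append)
print([doc["id"] for doc in registered])
```

Passing `fetch` and `register` as callables keeps the walk independent of how the files are retrieved and of where the harvested records end up.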