Semantic querying in earth observation data cubes

Formal Metadata

Title
Semantic querying in earth observation data cubes
Title of Series
Number of Parts
351
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2022

Content Metadata

Subject Area
Genre
Abstract
Earth observation (EO) imagery has become an essential source of information to better monitor and understand the impact of major social and environmental issues. In recent years we have seen significant improvements in the availability and accessibility of these data. Programs like Landsat and Copernicus release new images every day, freely and openly available to everyone. Technological improvements such as data cubes (e.g. OpenDataCube), scalable cloud-based analysis platforms (e.g. Google Earth Engine) and standardized data access APIs (e.g. OpenEO) are easing the retrieval of the data and enabling higher processing speeds. All these developments have lowered the barriers to utilizing the value of EO imagery, yet translating EO imagery directly into information using automated and repeatable methods remains a major challenge. Imagery lacks inherent semantic meaning and thus requires interpretation. For example, consider someone who uses EO imagery to monitor vegetation loss. A multi-spectral satellite image of a location may consist of an array of digital numbers representing the intensity of reflected radiation at different wavelengths. The user, however, is not interested in digital numbers; they are interested in a semantic categorical value stating whether vegetation was observed. Inferring this semantic variable from the reflectance values is an inherently ill-posed problem, since it requires bridging a gap between the two-dimensional image domain and the four-dimensional spatio-temporal real-world domain. Advanced technical expertise in the field of EO analytics is needed for this task, making it a remaining barrier on the way to a broad utilization of EO imagery across a wide range of application domains. We propose a semantic querying framework for extracting information from EO imagery as a tool to help bridge the gap between imagery and semantic concepts.
The novelty of this framework is that it makes a clear separation between the image domain and the real-world domain. There are three main components in the framework. The first component forms the real-world domain. This is where EO data users interact with the system. They can express their queries in the real-world domain, meaning that they directly reference semantic concepts that exist in the real world (e.g. forest, fire). For simplicity, we currently work on a higher level of abstraction, and focus on concepts that correspond to land-cover classes (e.g. vegetation). For example, a user can query how often vegetation was observed at a certain location during a certain timespan. These queries do not contain any information on how the semantic concepts are represented by the underlying data. The second component forms the image domain. This is where the EO imagery is stored in a data cube, a multi-dimensional array organizing the data in a way that simplifies storage, access and analysis. Besides the imagery itself, the data cube may be enriched with automatically generated layers that already offer a first degree of interpretation for each pixel (i.e. a semantically-enabled data cube [1]), as well as with additional data sources that can be utilized to better represent certain properties of real-world semantic concepts (e.g. digital elevation models). The third component serves as the mapping between the real-world domain and the image domain. This is where EO data experts bring their expertise into the system, by formalizing relationships between the observed data values and the presence of a real-world semantic concept. In our current work these relationships are always binary, meaning that the concept is marked either as present or not present. However, the structure also allows for non-binary relationships, e.g. probabilities that a concept is present given the observed data values.
We implemented a proof-of-concept of our proposed framework as an open-source Python library (see GitHub: ZGIS/semantique). The library contains functions and classes that allow users to formulate their queries and call a query processor to execute them with respect to a specific mapping. Queries are formulated by chaining together semantic concept references and analytical processes. The query processor will translate each referenced semantic concept into a multi-dimensional array covering the spatio-temporal extent of the query. It does so by retrieving the relevant data values from the data storage, and subsequently applying the rules that are specified in the mapping. If the relationships are binary, the resulting array will be boolean, with “true” values for those pixels that are identified as being an observation of the referenced concept, and “false” values for all other pixels. Analytical processes can then be applied to this array. Each process is a well-defined array operation performing a single task, for example applying a function to each pixel or reducing a particular dimension. The workflow of chaining together different building blocks can easily be supported by a visual programming interface, thus lowering the technical barrier for information extraction even more. This is demonstrated already in an operational setting by Sen2Cube.at, a nation-wide semantic data cube infrastructure for Austria, which uses our proposed semantic querying framework [2]. We believe our proposed framework is an important contribution to more widely accessible EO imagery. It lowers the barrier to extracting valuable information from EO imagery for users who lack advanced technical knowledge of EO data, but can benefit from its applications in their specific domain. They can now formulate queries by directly referencing real-world semantic concepts, without having to formalize how they are represented by the EO data.
To execute the queries, they can use pre-defined mappings, which are application-independent and shareable. The framework eases interoperability of EO data analysis workflows also for expert users. Mappings can easily be shared and updated, and the queries themselves are robust against changes in the image domain.
Transcript: English (auto-generated)
Good, I think I can start right away. Thank you all for coming. My name is Lucas. I'm from the University of Salzburg, and I'm going to talk about semantic querying in Earth observation data cubes. First I want to ask a small question: who in this room works a lot with Earth observation data? Can you raise your hands?
Who thinks they could get value out of Earth observation data? Good, that's a good mix. So let me start. First I will give a bit of background, because we're talking about Earth observation data. Those of you who attended yesterday's keynote by the European Space Agency already heard that there is a lot of Earth observation data. Terabytes of new Earth observation data arrive every day. The metaphor was that if you stacked all of it on HD discs, you would get the height of the Eiffel Tower in new data every day.
And much of this data is actually freely and openly available to everyone, for example through the Copernicus program of the European Union. This is just open data; everyone can use it if they want. And because there is so much data, and it is often free and open, it is used more and more in application domains that maybe did not use this type of data earlier. For example hydrology, ecology or urban planning: all these different application domains start to see that they may be able to get value out of Earth observation data.
They use it to analyze entities, events and processes that happen in the real world. Think about floods, forest fires, soil sealing, green space in cities, all this kind of stuff. So they are interested in such real-world entities and real-world processes, and want to analyze them with Earth observation data.
Data storage, data management and data access have been greatly simplified in recent years by a new, let's call it technique, called Earth observation data cubes. Very simply put, this means that you store all the different satellite images for an area in a single cube, which has two spatial dimensions and a temporal dimension. As an end user you don't have to worry anymore about resampling the data because it came in a different resolution, or about missing timestamps. It's all stored in a single hypercube, which makes data access, data storage and data management much easier. But what we have to remember is that the EO data themselves are numbers. For example, they are reflectance values of specific radiation that the satellite captures. These numbers in themselves are not yet knowledge about what we want to analyze in the real world. So a step is needed: we need to get from the data to the knowledge that we want to obtain, and this is not always easy. It is a hard task that requires technical expert knowledge. OK, so how did this use to work? We have the EO data on one side and knowledge on the other side, and we want to go from there to here.
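To make the "cube of numbers" idea concrete, here is a minimal sketch that is not part of the talk. It assumes NumPy, a (time, y, x) axis order and made-up reflectance values; a real data cube would also carry coordinates, bands and metadata.

```python
import numpy as np

# Toy cube: 4 acquisition dates, 3 x 3 pixels, values are made-up reflectances.
rng = np.random.default_rng(seed=0)
cube = rng.uniform(0.0, 1.0, size=(4, 3, 3))  # assumed axes: (time, y, x)

# All observations of one location through time.
pixel_series = cube[:, 1, 2]

# One "image": every location at a single timestamp.
first_image = cube[0]

print(cube.shape)          # (4, 3, 3)
print(pixel_series.shape)  # (4,)
print(first_image.shape)   # (3, 3)
```

Every value here is still just a number; nothing in the array says what was observed at that location.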
I constantly have to bend to get to the microphone. Dutch people problems. So in the previous situation there is one person there, and they have a toolbox in their hand. This toolbox needs to contain all the tools, all the skills, to get from EO data to EO knowledge.
So this is data access, data storage, merging of data, resampling of data, knowing about processing power, interpreting the data, analyzing the data. The whole road from EO data to knowledge needs to be in the toolbox of that single person. Now, with Earth observation data cubes this becomes easier.
Because we get a data cube in the middle, and, as I said, data storage, management and access are greatly simplified. As the analyst you don't have to worry about that anymore. For example openEO, a project that does great work in this regard, creates a standardized API that you can use to access a lot of different backends of satellite data.
Usually this also works in the cloud, so that you don't have to download the data to your local machine and make sure that there is enough processing power. Access to the data is greatly simplified.
So the skill set that you need, your toolbox, does not need to contain all these tools anymore. But there is still a gap that needs to be filled, because in the end you query this data cube and what you get are still these numbers, these reflectance values for example. They don't tell you anything yet about the real-world entity or process that you as an analyst actually want to analyze. You have to give meaning to this data and interpret it before you can move on and do the analysis that you want to do.
And this interpretation, as I said, is hard. It is a hard task that requires expert knowledge in EO data analytics, and that is hard to obtain. So can we not move to a future situation where there are three persons on this road?
Where the cube on the left still contains numbers, the reflectance values, but the cube on the right contains what we call symbolic, categorical data. It tells you directly something about the entity, event or process, the concept that you want to analyze. For example, if I am an urban planner, which I actually am, and I want to analyze green space in cities, every location in space-time will tell me: here green space was observed, and here green space was not observed.
This is a direct relation to the concept. And the interpretation part, how do we represent this concept green space in terms of the EO data, in terms of the numbers, is done by someone who has the advanced technical Earth observation expertise in their toolbox.
This is the Earth observation expert. They define how we actually represent this real-world concept in terms of the data. This means that you, as a domain expert, finally don't need to have these skills in your toolbox anymore. You can focus on actually analyzing the concepts that you are interested in, by directly querying them from the cube, because there is a step in between. That is what we call semantic querying: instead of querying the raw data values in the data cube, you query meaningful concepts that you are interested in and that have a meaning in the real world, for example green space, forest, water, lakes. So how does it work? What does our framework look like? It looks like this. I will go through all the steps, so don't get overwhelmed at first sight.
As I said, we have three different roles. We have the application expert, the person who in the end wants to analyze something in the real world. We have the Earth observation expert, who knows very well how to interpret Earth observation data.
And we have the software expert, who knows very well how to set up a cloud infrastructure with the data: data cubes, resampling, all this more computer-science-oriented work. One point to make is that these roles can of course all be taken on by the same person. That is completely fine, but it doesn't have to be that way anymore. You don't need to have the skill sets of all three roles.
Then we have two abstract domains, let's call them that, in our framework. Here on the right is the image domain. This is the abstract domain that contains the numbers, the data.
While on the left we have what we call the semantic domain, which contains the real-world concepts. It is a conceptualization of the things we see in the real world when we look outside. So here we have the real world. That is in the end what we are all interested in, what we want to analyze.
The real world is captured by Earth observation data. We have the satellites going around the Earth and they capture the real world. This data, these numbers, are stored in the Earth observation data cube. You can also have extra data in there, for example DEMs, everything that can help the EO expert to accurately interpret this data and to represent concepts with it. The software expert is then the one who constructs this data cube and the whole technical infrastructure, the cloud infrastructure. On the other side, the real world is abstracted by semantic concepts. We have to define what actually exists out there in the real world.
We cannot really look outside from here, but you see built-up areas, cities, forests, mountains, lakes. These are concepts that abstract what exists in the real world, and we formalize these concepts in a so-called ontology. So in the ontology we have a formalization of concepts that exist in the real world. That can mean, for example, that we say a lake contains water, green space is made up of vegetation, green space contains trees, all these kinds of formalizations. We define for ourselves what these concepts actually mean, and we define that in an ontology. But because this is in the semantic domain, the ontology does not contain any data. We don't say at this point that green space has a red-band value higher than 60.09, and all that kind of stuff. This is really an ontology in which we formalize what these concepts actually are. The ontology is then agreed upon by the community, and this doesn't have to be everyone; it can be only your working group or your project.
We formalize only those concepts that we are interested in in our project. In my urban planning project I am going to formalize: what is green space, what is built-up area, and what is blue space? It doesn't have to be a huge ontology that describes the whole world. That is too big. Keep it small, keep it simple and keep it local. The community that agrees upon this can of course contain application experts, and it can also contain Earth observation experts. But together they formulate: this is what these concepts mean, this is what they are.
Then the core role here is for the Earth observation expert, because their task is to actually map the concepts that are formalized in the ontology to the data values that are stored in the Earth observation data cube. They formulate rules that say: how is this concept, green space, represented by the data that is stored in the EO data cube? So they bring their expert knowledge into the system. That way the application expert can say: I'm interested in green space. They can write a query recipe, which we will come to soon, that says "I'm interested in green space", and they don't have to know how to actually represent green space in terms of the data values. So that is how the three roles make up the whole system, and everyone does what their expertise is. And as I said, it is perfectly possible that one person takes on all three roles.
This actually happens quite a lot. But the key point is that you don't have to. OK, now I will go through the different components again. Maybe I repeat myself a bit, but let's see. As I said, the ontology formalizes the conceptualizations of real-world entities, events and processes. It uses real-world terminology, contains no data values, is agreed upon by the community, and should be kept small. It doesn't have to contain everything.
That was just a summary of what I just said. The EO data cube stores the Earth observation data, and it can also store other data sources like a DEM, or anything that the Earth observation expert thinks is useful to define these concepts. It can be accessed with a standardized API, for which openEO, for example, is very suitable. And it's not limited to a single software. In our system you can use Open Data Cube, you can use a file-based system, you can use any kind of software to actually store your data cube in. But as I said, it is not the task of the application expert to set this up.
Then we have the mapping, which is of course a core part of the system. It's a knowledge-based expert system in which the Earth observation expert brings in their knowledge about how to represent real-world concepts in terms of the data. So they formulate rules that quantify a direct relationship between the data and the concepts. These rules can be binary, as I said: for each pixel, each observation in space-time, you just label it. This is green space, yes; this is green space, no. It's either true or false. But it could also be probabilistic, for example, where I say there is a high probability that this is green space, or a low probability.
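The difference between a binary and a probabilistic rule can be sketched as follows. This is an illustration only: the "greenness" layer, the 0.5 threshold and the logistic curve are assumptions made for the example, not rules from the talk, and the graded score is not a calibrated probability.

```python
import numpy as np

# Hypothetical "greenness" value per observation, assumed axes (time, y, x).
greenness = np.array([[[0.1, 0.7], [0.5, 0.9]]])

def green_space_binary(index, threshold=0.5):
    """Binary rule: True where the concept counts as observed, else False."""
    return index > threshold

def green_space_graded(index, midpoint=0.5, steepness=10.0):
    """Graded rule: a score between 0 and 1 that rises with the index."""
    return 1.0 / (1.0 + np.exp(-steepness * (index - midpoint)))

binary = green_space_binary(greenness)  # boolean array, same shape as the input
graded = green_space_graded(greenness)  # float array, same shape as the input
```

Both rules return an array with the same shape as the data, so the result still has one value per location in space-time.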
This is just whatever the Earth observation expert thinks is suitable; they are the expert in this part. And your rules can be very simple. You can for example say: we have the concept green space in the ontology.
We say green space has a high photosynthetic activity, which means it's green vegetation. The Earth observation expert then says: OK, we calculate an NDVI index, and if it's higher than 0.6 it's green space, yes, otherwise no. This can be super simple, but to make it more accurate they can also make it more complex. Again, this is the expert knowledge of the EO expert that is brought in here. You can look through time series of different images: how did the numbers change over time, and can we learn from that? You can look at spatial neighborhoods, at shapes. So the rules can range from very simple to more complex, depending on what the EO expert finds suitable. They can also be hybrid, which basically combines a knowledge-driven with a data-driven approach. For example, you first run an automated algorithm like Google Dynamic World, or something else, on the data. We then have a set of classes for each image, and in the knowledge-based part we further customize these classes, for example by merging them, to really represent the concepts that you're interested in. So a lot of different approaches are possible here. And then we have the query recipe.
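The simple NDVI rule mentioned above can be sketched like this. NDVI is computed with its standard definition, (NIR - red) / (NIR + red); the reflectance values are made up, and only the 0.6 threshold comes from the talk.

```python
import numpy as np

# Made-up red and near-infrared reflectances for a 2 x 2 image.
red = np.array([[0.05, 0.30], [0.10, 0.20]])
nir = np.array([[0.50, 0.35], [0.45, 0.20]])

# Standard NDVI definition. Real data would also need a guard for
# pixels where nir + red is zero; these toy values avoid that case.
ndvi = (nir - red) / (nir + red)

# The rule: green space is observed where NDVI is higher than 0.6.
green_space = ndvi > 0.6

print(green_space)
# [[ True False]
#  [ True False]]
```

A more complex rule would replace only the last step, for instance by also looking at the time series or the spatial neighborhood of each pixel.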
The application expert references the concept that they are interested in, in my example green space, and asks the cube: give me green space for the area in space-time that I'm interested in. They get a cube like this, where each pixel, each observation, has a direct relationship to the concept they're interested in. Then they can use array-specific processes to further customize this symbolic, categorical cube. They can for example say: I want to count the green space observations over time, so that for each location in space I know that in this year we found green space six times here, five times there.
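Counting observations over time amounts to reducing the boolean cube along its time axis. A minimal sketch, assuming NumPy and a (time, y, x) axis order, with made-up values:

```python
import numpy as np

# Boolean "green space observed" cube: 3 dates, 2 x 2 pixels.
green_space = np.array([
    [[True, False], [True, True]],
    [[True, False], [False, True]],
    [[False, False], [True, True]],
])

# Reduce over time: how often was green space observed at each location?
count_over_time = green_space.sum(axis=0)
print(count_over_time)
# [[2 0]
#  [2 3]]

# Reduce over space instead: how many pixels were green on each date?
count_per_date = green_space.sum(axis=(1, 2))
print(count_per_date)
# [3 2 2]
```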
So you can reduce it over time or over space, you can filter, you can merge different cubes. There are a lot of array-specific processes that you can apply to this queried categorical subset of the EO data cube. We named each of these processes with a single action word, a verb, which hopefully makes it very clear for the application expert what is happening. So don't get confused by the next slide. It's just to show that we have a lot of different verbs that all do a specific thing on an array. The one I showed was reduce, which for example says: we reduce over time.
We count all observations through time. You can filter, you can evaluate expressions, you can group, trim; all different kinds of operations are possible. For details, please look at the documentation, which I will share soon, or at the paper. And the final part I just want to show briefly. I think time is OK.
It's just to summarize the benefits. As I said, in our view this lowers the technical barriers for people to get value out of EO data, because they don't need the skill set to actually interpret the data and can focus on their application. But I think it also improves the structure of existing EO analysis workflows, also those of expert users, because this interpretation, how a concept is represented by the data, is defined only once, in the mapping, and not everywhere in each script, knitted into other code somewhere. It is defined clearly in one place.
You define it once, and the whole research group, the whole project, can use it, and you can easily share it. And the recipes also remain constant, because they reference relatively stable concepts like forest or green space. So, for example, when the data changes, or when the techniques to interpret the data change, or when you apply it in a different area, your mapping will be different. You have to update your mapping, but your recipe, "count green space", remains the same, because green space is still green space. That didn't change; the concept is still the concept. So these query recipes remain fairly constant, and you don't always have to update them when the data or the techniques get updated. Final part: we did a proof-of-concept implementation of this in a Python library. I will very quickly show some demo code, but there is extensive documentation, which I will also give the link to, that explains it in much more detail.
The idea is, for example: you're an application expert. You load a mapping which is predefined by an EO expert, so you don't create a mapping yourself; you load one that is predefined. You basically link to an EO data cube, which is set up by the software expert. You don't have to set it up yourself; you only have to link to it.
Then you set your spatio-temporal extent and some additional context, like in what CRS and time zone you want to work, et cetera. With this, your recipe just looks like: OK, I'm interested in the concept, the entity, water, and I want to apply the reduce process and use the count reducer over the dimension time. You see that here you don't reference any data.
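The separation between recipe, mapping and cube can be illustrated with a toy stand-in. This sketch does not use the actual library API: the dictionaries, the `execute` function, the `water_index` layer and both thresholds are hypothetical names and values invented for the example.

```python
import numpy as np

# Stand-in for the EO data cube (software expert): one named layer,
# assumed axes (time, y, x), made-up values.
cube = {"water_index": np.array([
    [[0.8, 0.1], [0.7, 0.2]],
    [[0.9, 0.2], [0.1, 0.3]],
])}

# Stand-in for the mapping (EO expert): concept name -> rule on the data.
mapping = {"water": lambda layers: layers["water_index"] > 0.5}

# Stand-in for the recipe (application expert): it names a concept,
# a reducer and a dimension, and never mentions data values.
recipe = {"concept": "water", "reducer": "count", "dimension": "time"}

AXES = {"time": 0, "y": 1, "x": 2}
REDUCERS = {"count": np.count_nonzero}

def execute(recipe, mapping, cube):
    """Translate the concept into a boolean array, then reduce one dimension."""
    observed = mapping[recipe["concept"]](cube)
    reducer = REDUCERS[recipe["reducer"]]
    return reducer(observed, axis=AXES[recipe["dimension"]])

water_count = execute(recipe, mapping, cube)
print(water_count)
# [[2 0]
#  [1 0]]

# A different mapping, for example a stricter rule, reuses the same recipe.
stricter_mapping = {"water": lambda layers: layers["water_index"] > 0.85}
stricter_count = execute(recipe, stricter_mapping, cube)
```

Only the mapping changed between the two runs, which is the point made earlier about recipes remaining fairly constant when the data or the interpretation technique changes.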
You reference a concept by its name. This concept is defined in the mapping, and the mapping can translate the concept into the data values in the cube. Then you execute this and you get a map of how often water was observed over time in your spatio-temporal subset. So this is a package. It's called semantique, for semantic querying, and you can find it at this GitHub link. As I said, there's quite extensive documentation, I think, so if you want to know more, please take a look here. It's open source, so the code is out there and everybody can use it. And of course we have a paper, because this is the academic track, where you can also find more details about our ideas and what we did. So thanks a lot. And now, since I'm also the chair, I will check for questions.