
Computing Global Harmonic Parameters for Flood Mapping using TU Wien’s SAR Datacube Software Stack.


Formal Metadata

Title: Computing Global Harmonic Parameters for Flood Mapping using TU Wien's SAR Datacube Software Stack
Number of Parts: 351
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year: 2022

Content Metadata

Abstract
Synthetic Aperture Radar (SAR) backscatter is well suited to differentiating standing water, whose return signal is low compared to most non-water land cover types. However, the temporal transition from non-water to water is critical to identifying floods, so objects with permanently or seasonally low backscatter become ambiguous and difficult to classify. TU Wien's flood mapping algorithm uses a pixel-wise harmonic model derived from a SAR datacube (DC) (Bauer-Marschallinger et al., in review) to account for these patterns. Designed to be applied globally in near real-time, our method applies Bayes inference to SAR data in VV polarization: the harmonic model generates the non-flooded reference distribution, which we then compare against the flooded distribution to delineate floods within incoming Sentinel-1 IW GRDH scenes.

In the harmonic modeling, we estimate each location's expected temporal backscatter variation, explained by a set of Fourier coefficients. Following recommendations in the literature, a seven-coefficient formulation was adopted (Schlaffer et al., 2015), hereafter referred to as our harmonic parameters (HPARs). The HPARs comprise the backscatter mean and three harmonics, each with a sine and a cosine coefficient. This model acts as a smoothed proxy for the measurements in the time series, allowing a seasonally varying backscatter reference to be estimated for any given day of year.

However, generating the harmonic model at a global scale and at high resolution presents significant logistical and technical challenges. Harmonic modeling of remotely sensed time series is therefore often performed on specialized infrastructures (Liu et al., 2020), such as Google Earth Engine (GEE) (Gorelick et al., 2017) or other highly customized setups (Zhou et al., 2021), where the pixel-wise analysis of multi-year data requires well-defined I/O, data chunking, and parallelization strategies to generate the HPARs at reasonable time and cost. While harmonic analysis is not new, to our knowledge its production and application at a global scale using dense SAR time series have yet to be implemented, let alone operationally utilized.

To prepare for the global near real-time flood mapping effort, the HPARs were systematically computed using a global DC organizing the Sentinel-1 IW GRDH datasets. In the DC structure, individual images are stacked, allowing for data abstraction in the spatial and temporal dimensions and making it ideal for time-series analysis. For this abstraction to be realized, however, a rich set of software solutions is needed to implement the 3-dimensional data model. In this contribution, we present our SAR DC software stack and its utilization to compute the aforementioned global harmonic parameters. We show a set of portable and loosely coupled Python packages developed by the TU Wien GEO Microwave Remote Sensing (MRS) group, capable of forming a global data cube with minimal overhead from individual satellite images. The stack includes, among others, open-source packages for:

```
high-level data cube abstraction - yeoda
spatial reference and hierarchical tiling system - Equi7Grid
lower-level data access and I/O - veranda
spatial file- and folder-based naming and handling - geopathfinder
product tagging and metadata management - medali
```

A detailed description of the preprocessing and storage infrastructure used for this global DC is given in Wagner et al. (2021).
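For concreteness, a plausible form of the seven-coefficient model described above is given below; this is a hedged reconstruction consistent with Schlaffer et al. (2015), and the exact notation in Bauer-Marschallinger et al. may differ:

```latex
\hat{\sigma}^0(t) = M_0 + \sum_{k=1}^{3}\left[ c_k \cos\!\left(\frac{2\pi k t}{365}\right) + s_k \sin\!\left(\frac{2\pi k t}{365}\right) \right]
```

where t is the day of year, M_0 is the mean backscatter, and (c_k, s_k) are the cosine and sine coefficients of the k-th harmonic.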
Here, we focus on the software interfaces. Given the preprocessed datasets, the logical entry point is yeoda, which abstracts well-structured Earth observation data collections as a DC, making high-level operations such as filtering and data loading possible. This level of abstraction is supported by the other components of the software stack, which address the organization of, and lower-level access to, the individual files. In a nutshell, the DC is simply a collection of raster datasets in GeoTIFF format, co-registered on the same reference grid. For deployment in large-scale operations, a well-defined grid system is required to deal with high-resolution raster data. A tiling system fulfilling this requirement is the Equi7Grid, based on seven equidistant continental projections chosen to minimize raster image oversampling. Interacting with this tiling system at an abstract level is possible via our in-house Equi7Grid package. The tiling system follows a hierarchy of directories to manage the datasets on disk, and for individual files a predefined naming convention encodes spatial, temporal, and ancillary information from the product metadata, which becomes transparent to yeoda. This setup of customizable file naming schemes is easily managed through the geopathfinder package.
Transcript: English (auto-generated)
I'm originally from the University of the Philippines, and I'm an open source and open data enthusiast, I would say, back home. But for now, I'm wearing my other hat as a Ph.D. student at TU Wien. So basically we're presenting this work that we did for this global flood mapping
initiative that we have worked on for the past couple of years. I'm doing this presentation on behalf of my colleagues from the Microwave Remote Sensing group (we're rebranding now to just the Remote Sensing group). These are their names, and if you have some questions later on, my email is there.
My colleague Claudio is also here; he is the main architect and developer of the software, so he can support any other questions regarding that. With that, I'm going to go through my presentation. I have basically three objectives: to show these harmonic parameters that we generated, to show the software that
we used, and to show how we used our software to compute those parameters. So this is the outline of my presentation. I'm going to give an overview of where these products were generated, which project
they come from, and why we use harmonic parameters. I'm going to briefly discuss the data cube that we use, then the software, then the processing, some sample outputs, and then some conclusions and outlook for the software and the data sets that we have.
So basically these parameters were generated as an intermediate product under the Copernicus EMS Global Flood Monitoring (GFM) program. This program aims to generate flood maps in near real time. Every time Sentinel-1 data comes in, it aims to check if there's a flood there, and
then generate the flood maps within maybe eight hours, or even less. So we would have a flood mapping product without the need for an activation request. If you want to view the results of the global product and the global processing,
they're on the links over there, and if you want to see the WMS endpoint, it's also linked there. These are just the general product specifications: the main output is the flood maps and their accompanying products,
basically a 20-meter product that we produce globally, and we aim for a thematic accuracy of about 70% to 80%. If you want to know more about the products in the program, you can check out the resources and the papers about this.
But essentially the flood mapping we are doing is not just one algorithm. We're using an ensemble built from independent algorithms from DLR, LIST, and TU Wien. Unfortunately, I won't be able to give you the details of how the DLR and LIST algorithms work, but if you have questions about how the TU Wien algorithm works,
I can discuss it with you. Basically there's a majority voting, not just on the pixels being flooded or non-flooded; it also takes into account the probability of a pixel being flooded. The TU Wien flood mapping algorithm itself is based on Bayes inference
between a flooded and a non-flooded probability distribution. We get our non-flooded probability distribution from a harmonic model that I'm going to discuss for the rest of the presentation, and we compare it with the probability distribution of water based on Sentinel-1 backscatter that we sampled from different scenes.
If you look at this plot here, you can see that the backscatter time series of a particular pixel actually varies, and that there are seasonal variations to it. And if you have a very big drop away from that expected backscatter,
you would assume that this is probably flood, because we know that in SAR data, low backscatter probably means water. But we didn't want to decide just based on the decrease; we use, as I mentioned, Bayes inference.
What you see in that slice there is our mean, or the expected backscatter for that particular day, and the standard deviation of the harmonic model, which we use to generate the non-flooded probability distribution. And here you see the water mean
and the standard deviation of water for that particular area. Then you basically just compare the two probability distributions, so you know which pixels are water and which are not. If it's water, it's probably flooded, provided it wasn't water before. That's the basic idea.
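As a hedged illustration of that comparison (the names, the equal priors, and the Gaussian assumption are mine, not the production GFM code), the per-pixel decision could look like this:

```python
# Minimal sketch of the pixel-wise Bayes decision between the non-flooded
# distribution (from the harmonic model) and the water distribution.
from scipy.stats import norm

def flood_posterior(sig0, mu_dry, sd_dry, mu_water, sd_water, prior_flood=0.5):
    """Posterior probability that backscatter sig0 (dB) belongs to flood."""
    like_flood = norm.pdf(sig0, mu_water, sd_water)  # flooded likelihood
    like_dry = norm.pdf(sig0, mu_dry, sd_dry)        # non-flooded likelihood
    evidence = like_flood * prior_flood + like_dry * (1.0 - prior_flood)
    return like_flood * prior_flood / evidence

# Illustrative numbers: expected dry backscatter -8 dB (std 1.5 dB),
# water -18 dB (std 2 dB); an observation of -17 dB comes out flood-like.
print(flood_posterior(-17.0, mu_dry=-8.0, sd_dry=1.5,
                      mu_water=-18.0, sd_water=2.0))
```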
But the cool thing about this algorithm is that we don't assume the backscatter reference is static throughout the year. We know that it changes over time. In temperate regions you have the four seasons, winter, spring, summer, and fall, and in other areas you have other seasons where the backscatter actually changes. So a decrease in backscatter might not be because of flooding; it might just be because of seasonal variations.
So our model takes into account the seasonal variations of the backscatter and does not just label floods because there was a decrease in backscatter. That's the general premise. The model that we use to capture this seasonal variation is based on the harmonic model.
We compute the Fourier coefficients from it. We don't have a general trend term; we assume a static seasonal model. So we generate M0 and basically the cosine coefficients
and the sine coefficients of the harmonic model. As I showed earlier, we also compute the standard deviation, so that we get the probability distribution I mentioned, and we also store the number of sample points used when computing this particular model for our workflow.
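A minimal sketch of how such a fit can be done for one pixel's time series, assuming the seven-parameter form shown after the abstract (my own NumPy illustration, not the production code):

```python
import numpy as np

def fit_hpar(doy, sig0, k_max=3):
    """Least-squares fit of M0 plus k_max sine/cosine pairs.
    doy: day-of-year per observation; sig0: backscatter in dB.
    Returns (coefficients, residual std, number of samples)."""
    doy = np.asarray(doy, dtype=float)
    sig0 = np.asarray(sig0, dtype=float)
    t = 2.0 * np.pi * doy / 365.0
    cols = [np.ones_like(t)]                    # M0 (mean term)
    for k in range(1, k_max + 1):
        cols += [np.cos(k * t), np.sin(k * t)]  # c_k, s_k per harmonic
    design = np.column_stack(cols)
    coeffs, *_ = np.linalg.lstsq(design, sig0, rcond=None)
    std = np.std(sig0 - design @ coeffs)        # spread around the model
    return coeffs, std, sig0.size

def eval_hpar(coeffs, doy, k_max=3):
    """Expected backscatter for an arbitrary day of year."""
    t = 2.0 * np.pi * doy / 365.0
    out = coeffs[0]
    for k in range(1, k_max + 1):
        out += coeffs[2 * k - 1] * np.cos(k * t) + coeffs[2 * k] * np.sin(k * t)
    return out
```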
In general, this is how it works. We have this SAR data cube, which is at EODC and TU Wien, and from there we are able to compute these seven coefficients and the other parameters as well. Using these images, you are able to compute
the expected backscatter value at any day of the year. These are just samples from January, April, July, and October, and you can see the seasonal variations in how the backscatter changes. So why do we want to share this product with you? Because we think the harmonic parameters can also be used for other purposes.
For example, in some literature it is used for vegetation analysis and seasonal changes in vegetation; it can also be used for change detection in vegetation. A colleague of ours used a similar analysis to map different types of wetlands,
whether they're seasonally flooded or just partially flooded. So these are things you can do with this data aside from the flood mapping we're currently doing. But generating this product for the whole globe is not really that easy.
If you're familiar with Google Earth Engine, it's easy because you can just click and use the code already there. But if you don't have that platform, or you have your own data sets that you want to analyze, it's not that easy to do. For us, since we have our own Sentinel-1 data cube,
we were faced with these challenges. The GFM products have to be generated in near real time, so the parameters I showed you have to be processed beforehand. That means bulk processing of all the Sentinel-1 scenes of the whole globe so that all of these parameters are generated in advance.
Another tricky part about SAR data is its geometric effects. If you're familiar with SAR, you know that the backscatter changes with the incidence angle. We had to take this into account, so we have to do some filtering on the data cube
and match these data sets with the incoming scenes by relative orbit, or basically by the incidence angles. So this is a technically and logistically challenging task, but we do have the technology to do it,
because we have this software that I'm going to present to you. Basically, we did a pixel-wise analysis of all of the pixels in our data cube and generated this particular product for the whole globe. To give you a brief snapshot of the data cube we're using: we have our own data cube that was pre-processed using this workflow over here,
but I don't think I have time to get into detail. I just want to point out the main difference compared to other SAR data cubes out there: our data cube is sampled in the Equi7Grid resampling and tiling system that I will show in a little bit.
Also, we have it in 20-meter sampling, not the native 10-meter sampling. This is the overview of everything I want to impart: we have this software stack, which is topped by yeoda, and it includes Equi7Grid, geopathfinder, veranda, and also medali.
We use this software to compute the harmonic parameters all over the globe. And we didn't just run it in operations; we started out with experiments on our own local machines, then on a test bed, and eventually we ran it in a high-performance computing environment to do the computations for the whole globe.
This is the data cube software stack that we're using and developing; these are the GitHub links you can see over there, with a little summary of what each package is for.
On top of it we have yeoda; we have the Equi7Grid package; geopathfinder for looking up spatial files and folder-based naming and handling; veranda, which you can think of as rasterio in a data cube setting; and medali for tagging metadata for all of our results
and ingesting metadata from other sources. So, to briefly show you what these packages are all about: we have the Equi7Grid package, which we use to tile the whole globe, with seven continental subgrids covering the globe.
Why do we have this particular tiling system? There's a paper on it, so you can look it up, but basically we want to reduce the oversampling of pixels; we don't want to waste the samples you get for each pixel.
So we have these projections with different tilings and different tile sizes, and on the software side we have a package that lets you work with all of that programmatically, based on Python as well. geopathfinder, next, is a package that allows us to,
for example, search folders and search file names. In other data cube implementations you would have an index or a database; for us, we do it via the folder structure and file naming. geopathfinder allows us to generate the file names,
look up the tree, and go up and down the tree to search and, later on, to know the path where we would write our results. It's basically using file system logic, where we have different grids and different tiles in different folders.
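A rough sketch of these two layers follows; the class and function names are recalled from the equi7grid and geopathfinder repositories and may not match the current releases exactly, so treat them as assumptions:

```python
from equi7grid.equi7grid import Equi7Grid
from geopathfinder.folder_naming import build_smarttree

# Equi7Grid at 20 m sampling, matching the data cube described in this talk.
e7grid = Equi7Grid(20)

# Register every file under the hierarchical grid/tile folder structure;
# the root path and hierarchy levels here are illustrative.
tree = build_smarttree("/data/sentinel1/sig0", hierarchy=["grid", "tile", "var"])
filepaths = tree.file_register  # flat list of all GeoTIFFs found in the tree
```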
As I mentioned, veranda is just our I/O software. From the data you read out of the different GeoTIFFs, it creates NumPy arrays or xarrays that you can, of course, analyse later on.
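A hedged I/O sketch; the veranda import path and class name below are assumptions from memory, so a plain rasterio fallback is shown as well:

```python
try:
    from veranda.raster.native.geotiff import GeoTiffFile  # assumed module path
    with GeoTiffFile("SIG0_tile.tif") as gt:               # illustrative file name
        data = gt.read()                                   # -> NumPy array
except ImportError:
    import rasterio                                        # generic fallback
    with rasterio.open("SIG0_tile.tif") as src:
        data = src.read(1)
```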
And lastly, we have medali, which writes the metadata for all the things you need that are not part of the general metadata of your processing. For example, we have to record the redundancy of the harmonic parameters, and some other metadata information that's not in the usual metadata schemes,
so it basically takes the details from the processing and puts them into the files. And again, yeoda tops it all off and puts it all together; it allows you to do the filtering and the higher-level data cube access.
I'm going to show a few samples of how yeoda and the whole stack work. We start out with geopathfinder to point at and search for where the files are. In our case, we have this hierarchical tiling system where files sit in particular folders: you just point to where the root directory is
and it will search those directories and get the tree and all the files within that folder structure. Once you know where the files are, you can instantiate your data cube, using yeoda. But first, you need to define which grid system you're using.
So we define that it's on this particular grid system, in this case the EU subgrid, and then you have your data cube object. From there, you can do your usual data cube operations: filtering, extracting data, and getting time series for individual pixels.
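Putting the walkthrough together, a hedged sketch (yeoda's class and method names are assumptions from memory and vary between versions; `filepaths` is the list from the geopathfinder sketch above):

```python
from equi7grid.equi7grid import Equi7Grid
from yeoda.datacube import EODataCube  # assumed import path

# Data cube over the files found with geopathfinder, on the Equi7 EU subgrid.
dc = EODataCube(filepaths=filepaths, grid=Equi7Grid(20).EU,
                dimensions=["time", "tile_name", "orbit"])

# Usual data cube operations: fix one relative orbit so the incidence-angle
# geometry stays consistent, then pull a single-pixel time series.
dc_d095 = dc.filter_by_dimension("D095", name="orbit")  # orbit name: example
ts = dc_d095.load_by_coords(4612000, 1464000)           # Equi7 EU x, y in metres
```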
It's very similar to the other data cube products you have seen in other presentations so far. All right, so what's the benefit of this software stack? It's platform independent; we only need files in the file system.
You don't need a database; you don't need GeoJSON or similar overhead to index your files. You only have the file system and the file naming convention that we define. So basically, you just need the software dependencies and the files
in your file system, and then you're good; you can run data cube analysis. In the harmonic parameter case, we did our experiments on our local PCs, and then we had a test bed in the cloud where we could test on a larger area. But in the end, it's the same software and the same setup
that we use from your local PC up to the HPC system. So it makes it easier for researchers like me to take the analysis from a small scale up to the global scale. Now, a rundown on how we do the global processing: we do the high-performance computing jobs per continent,
and each continent is subdivided into several tiles, with one HPC node per tile. Then, essentially, we do the filtering for each of those tiles and perform our least-squares regression to compute the harmonic coefficients; a rough sketch of such a per-tile job follows.
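Condensed into pseudocode-like Python, every helper here is hypothetical and only illustrates the per-tile flow just described, reusing the fit_hpar sketch from earlier:

```python
import numpy as np

def process_tile(dc, tile_name, orbit):
    """One HPC job: fit HPARs for every pixel of one tile/orbit combination."""
    tile_dc = (dc.filter_by_dimension(tile_name, name="tile_name")
                 .filter_by_dimension(orbit, name="orbit"))
    stack, doys = load_stack(tile_dc)        # hypothetical: (time, y, x) + DOYs
    ny, nx = stack.shape[1:]
    hpar = np.full((9, ny, nx), np.nan)      # 7 coefficients + std + n samples
    for i in range(ny):
        for j in range(nx):
            ts = stack[:, i, j]
            ok = np.isfinite(ts)
            if ok.sum() < 7:                 # need at least 7 obs for 7 unknowns
                continue
            coeffs, std, n = fit_hpar(doys[ok], ts[ok])
            hpar[:7, i, j], hpar[7, i, j], hpar[8, i, j] = coeffs, std, n
    write_tile_with_medali_tags(hpar, tile_name, orbit)  # hypothetical writer
```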
In the end, of course, we have to write all the metadata using medali, and at each and every step it is this software stack that we use to generate these harmonic parameters. Because I don't have enough time, I just want to show that these are the software interfaces and how they work,
if you are interested in how to use them. Essentially, you need geopathfinder to get the paths to your GeoTIFFs, and that's basically all you need aside from Equi7Grid to define your data cube. Now, just a quick sneak peek at the data set that we have.
These are just example tiles that we have generated. This particular example tile shows that we capture the seasonal variation in water based on the coefficients we computed. On the next one, we have the parameters again,
and, I don't know if you can see it, but there's backscatter change in the vegetation in this particular area in the UK. This is the coverage of the parameters we have generated. It covers most of the globe, except maybe some areas in Greenland
where we don't have enough samples, and in some areas with a low number of samples the parameters are not too good. But in general, we covered most of the globe. In conclusion, we were able to generate Sentinel-1 harmonic parameter data sets using our software,
based on yeoda and the other packages. For now, we did it for 2019 to 2020, but it's practically possible to do it over a longer time span. The thing I would also like to announce now is that, aside from our software being open,
we are also opening up the harmonic parameters to the public. Unfortunately, we didn't make it in time to put them in the repository, but maybe in late September you will be able to access the data set; it will be somewhere in this repository. There's about 5 terabytes of data for the harmonic data set
for six continents, in the tile-orbit sets that I mentioned. And basically, that's it. These are my references, and I want to acknowledge our partners in the project:
EODC and the Vienna Scientific Cluster, where we did this analysis, and all of our colleagues there. Personally, I want to thank my PhD scholarship and also the travel grant from the FOSS4G Organizing Committee. And that's it.
Thank you very much.