
FOSS4G Europe 2024 Tartu - Serving earth observation data with GeoServer: COG, STAC, OpenSearch and more...


Formal Metadata

Title
FOSS4G Europe 2024 Tartu - Serving earth observation data with GeoServer: COG, STAC, OpenSearch and more...
Title of Series
Number of Parts
156
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Never before have we had such a rich collection of satellite imagery available to both companies and the general public. Between missions such as Landsat 8 and the Sentinels, the explosion of cubesats, and the free availability of worldwide data from the European Copernicus program and from drones, a veritable flood of data is available for everyday usage. Managing, locating and displaying such a large volume of satellite images can be challenging. Join this presentation to learn how GeoServer can help with that job, with real world examples, including:
* Indexing and locating images using the OpenSearch for EO and STAC protocols
* Managing large volumes of satellite images, in an efficient and cost-effective way, using Cloud Optimized GeoTIFFs
* Visualizing mosaics of images, creating composites with the right set of views (filtering), in the desired stacking order (color on top, most recent on top, less cloudy on top, your choice)
* Performing both small and large extractions of imagery using the WCS and WPS protocols
* Generating and viewing time-based animations of the above mosaics, in a period of interest
* Performing band algebra operations using Jiffle
Attend this talk to get a good update on the latest GeoServer capabilities in the Earth Observation field.
Keywords
Transcript: English (auto-generated)
So yeah, welcome to this talk. I would like to thank my employer, GeoSolutions, for offering me the possibility to be here today. GeoSolutions is a company based in Italy with offices in the United States and soon in Dubai. We support open source projects and we are actually core developers of each one of them.
So you can contact us anytime you need something fixed or a new feature developed that you really need to get into the project. We strongly believe in open source, we are part of OSGeo, we participate in OGC and so on. So let's talk about publishing raster data.
So let's say you have a ton of it and, as you saw from the previous presentation, there are a lot of sources of raster data right now, with a high frequency of collection. How do you, first of all, locate that? Well, in terms of protocols, right now the most common protocol to catalog and search for
satellite data in general, or meteorological predictions, or whatever has a space and time component, is STAC. STAC stands for SpatioTemporal Asset Catalog. It's a RESTful API, reminiscent of, actually compatible with, OGC API Features, which collects all your products and organizes them in collections.
So the idea is that you have a collection, which is a uniform set of satellite data products. You have the products inside, the items, and each item has a bunch of, typically, files, which are called assets.
So take the classic Sentinel-2 dataset: it's a set of 13 files, one per band, plus a bunch of metadata files. Each one of them would be an asset. And in GeoServer you can search for the collections, you can search for the products and you can even use the assets as they are to build an image mosaic.
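The collection / item / asset structure described above can be illustrated with a small Python sketch that prepares a STAC API item search and reads the assets out of a returned item. It is only a sketch under assumptions: the base URL, the collection name and the asset names are hypothetical, and the cloud cover filter presumes that the server supports the STAC Filter extension with `cql2-text`.

```python
import json
from urllib.parse import urljoin

# Hypothetical base URL: the real path depends on the GeoServer deployment.
STAC_ROOT = "https://example.com/geoserver/ogc/stac/v1/"


def build_stac_search(collection, bbox, start, end, max_cloud=None, limit=10):
    """Return (url, json_body) for a STAC API item search (POST /search)."""
    body = {
        "collections": [collection],
        "bbox": list(bbox),              # [west, south, east, north]
        "datetime": f"{start}/{end}",    # closed RFC 3339 interval
        "limit": limit,
    }
    if max_cloud is not None:
        # Assumption: the server implements the STAC Filter extension.
        body["filter-lang"] = "cql2-text"
        body["filter"] = f"eo:cloud_cover < {max_cloud}"
    return urljoin(STAC_ROOT, "search"), json.dumps(body)


def asset_hrefs(item):
    """Collect the asset URLs of one STAC item (a GeoJSON feature)."""
    return {name: asset["href"] for name, asset in item["assets"].items()}
```

The request is only built, not sent, so the sketch has no network dependency; posting the body with any HTTP client would return a GeoJSON feature collection whose features are the items.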
So this is a screenshot from the German Space Agency EOC service, which is based on the GeoServer STAC implementation. The STAC representation of resources in HTML in GeoServer is customizable through templates, and this is what they did for theirs.
So this is the landing page, where you can find all the entry points for the API. This is a set of collections; they customized the representation of each collection with a browse image, with a bunch of properties and keywords and so on. This is the representation of one single collection and here there is a list
of items, so products inside the collection; each one of them is a satellite take. And when you look into one particular product you can see all the assets and a browse image and so on. And this is, as I said, all customized.
I've shown you the customization of HTML, but each of these resources is also represented as a JSON resource for machine-to-machine communication. And the JSON resource can be customized just the same through template mechanisms. In addition to that, in parallel, we also have OpenSearch for EO, which is an older specification to do exactly the same, again from OGC.
It was based on RSS rather than GeoJSON and, well, it provides more or less the same principles and functionality. So that's one example of a search. The search is for anything in Sentinel-2 that has a cloud cover of less than 30%, and the output is typically RSS.
Backing those two projects, sorry, those two services, we have an administration REST API that can be used to ingest more data, allocate layers out of collections, create image mosaics and the like.
So full automation of the data management. Okay, say that you decided what you would like to use or see: how do you access the raster data? Well, the single assets can be located anywhere on the file system, on the network, but also on S3 through the COG plugin.
If you are using Cloud Optimized GeoTIFFs, you can store them in S3, Azure, Google or whatever HTTP server, but all the same you can put them on a local file system or a network file system. The COG plugin takes full advantage of the optimized structure of the Cloud Optimized GeoTIFF to reduce
the amount of data transfer and only pick the data that it needs out of the data source. Okay, so GeoServer is thus able to access the assets. How do we have a look at them, mosaicking them typically? Okay, so in GeoServer we have the concept of the mosaic.
The mosaic is based on an index, which can be imagined as a table having a footprint of the file, a location of where the file is and then a set of alphanumeric information like time, cloud cover, snow cover, whatever information might be attached to it.
And all this information can be used for filtering, sorting and then eventually locating the data and rendering it. The elements of the mosaic, the single files, are called granules, and we have complete freedom over them. They can overlap as they please, they don't have to be nicely aligned. They can be in different file formats, they can be in different projections and, within reason, they can
be in different color models, like I can mix together in the same image grayscale and RGB, for example. As I said, the table, the mosaic index, is our filtering and lookup mechanism. It can be located in a number of actual data sources, like relational databases, but it can
also be another STAC API, so a remote STAC API; I'll have a slide about it later. And if you don't like what we have, there's a plug-in mechanism: you can write an image mosaic index data source of your own, maybe to connect to an in-house catalog that you already have without having to duplicate the data.
As I said, many of the attributes are used for search, and in particular time and elevation are well-known search parameters in the WMS and WMTS protocols. You can use them to quickly locate the data at a particular point in time, and any other attribute can be used as a custom dimension.
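To make the idea of the mosaic index as a filterable, sortable table concrete, here is a minimal Python sketch. It is an illustration only, not GeoServer's implementation: the `Granule` fields are hypothetical, and footprints are simplified to bounding boxes.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Granule:
    location: str        # where the file is
    bbox: tuple          # footprint simplified to (minx, miny, maxx, maxy)
    time: datetime       # acquisition time
    cloud_cover: float   # percent


def intersects(a, b):
    """True when two (minx, miny, maxx, maxy) boxes overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_granules(index, bbox, start, end, max_cloud=100.0):
    """Filter the index by area, time window and cloud cover, then order it
    least cloudy first, breaking ties with the most recent granule first."""
    hits = [g for g in index
            if intersects(g.bbox, bbox)
            and start <= g.time <= end
            and g.cloud_cover < max_cloud]
    hits.sort(key=lambda g: g.time, reverse=True)   # most recent first
    hits.sort(key=lambda g: g.cloud_cover)          # stable: keeps the tie order
    return hits
```

Only the index rows are touched here; the raster files themselves would be opened later, and only for the granules that survive the filter.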
So you want to filter by wavelength, for example, or by run time if you are talking about weather forecasts, like when you ran the prediction: that's another possibility. We can use the internal STAC API as an index for the mosaic, so
think about your collection that has many products inside; they are typically all uniform. We can take that search index and use it as the index for an image mosaic, so I can literally ask GeoServer: make me an image mosaic out of that collection, or make me an image mosaic out of a portion of that collection. I could choose sub-parts of the collection to build a layer that I can then publish through WMS and WMTS.
If I don't like the internal one, well, I can go to an external one. Let's say that I have a STAC API which is remote and it's providing me access to COG files which are accessible to the server. Well, then I can use the STAC store to connect to that external STAC API and use it to power a local image mosaic.
So my GeoServer will talk to the STAC API, figure out the assets that it needs and then use the COG plugin to read them and generate images, composites.
A bunch of documentation links for you to investigate this topic more after the presentation. Okay, so those were the basics of the image mosaic. What can I do in terms of fun stuff? Well, as I said, image mosaic indexes can be used for filtering, but not just for filtering, also for sorting.
So I can say something like: make me a composite of all these assets and sort them by time, most recent on top. Or no, wait, let's have the most cloudless image on top instead. So I could say something like: take all the images in this area from last week and composite them, putting the ones with the least clouds on top.
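A request along the lines of "last week, this area, least cloudy on top" could be expressed as a WMS GetMap call using GeoServer's `CQL_FILTER` and `sortBy` vendor parameters plus the standard `time` dimension. The Python sketch below only builds the URL; the host, the layer name and the `cloudCover` attribute are hypothetical, and which sort direction ends up on top should be checked against the GeoServer documentation.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, time_range, cql_filter, sort_by,
                     size=(1024, 1024)):
    """Build a WMS 1.1.1 GetMap URL with time, filter and sort parameters."""
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": "",
        "srs": "EPSG:4326",
        "bbox": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "width": size[0],
        "height": size[1],
        "format": "image/png",
        "time": time_range,        # ISO 8601 start/end interval
        "cql_filter": cql_filter,  # vendor parameter: filter on index attributes
        "sortBy": sort_by,         # vendor parameter: "<attribute> A" or "<attribute> D"
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical layer and attribute names, for illustration only.
url = build_getmap_url(
    "https://example.com/geoserver/wms",
    "eo:sentinel2",
    (10, 45, 11, 46),
    "2024-06-01/2024-06-08",
    "cloudCover < 30",
    "cloudCover A",
)
```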
Another interesting thing is coverage views. Many data sources come with bands which are stored separately.
Like Sentinel-2, which typically has 13 files because not all files share the same resolution. We have 10, 20 and 60 meters of resolution, so we have a set of 13 different files. But often you want to do composites or map algebra over a bunch of them, maybe at different resolutions. Coverage views allow you to take different bands coming from different files, or from within the same file.
Think about a NetCDF with the U and V components of the wind. You can say: okay, now make me a virtual raster that has bands coming from the various different sources. And then I can use it for RGB composites or for map algebra. This is another example, exactly from the Sentinel-2 case. Those are a bunch of TIFF files coming from Sentinel-2.
I choose them all and put them together as bands of a new virtual raster that is resampled on the fly to whatever resolution I choose, one of the three: the 10, the 20 or the 60. I have to decide which
one to standardize on, and the other two will be resampled on the fly to allow building composites. We have hyperspectral imagery support. At the beginning it was kind of hard, because hyperspectral images can have hundreds if not thousands of bands, and we had to optimize GeoServer a little bit to read efficiently from them.
Well, it's not about file formats, because typically they are still GeoTIFFs, but they are organized per band rather than per pixel. And, well, at the beginning we had to struggle a little bit, but now it's working fine. Again, a bunch of documentation links for you to follow and drill down into this topic more.
Okay, so now how do I see these image mosaics? Well, WMS. I can just point to the layer that is compositing all these images through the mosaic and have a look. We have WMS-T, as in time support, so the time associated with all these products is exposed through the capabilities and
can be used to filter the images and choose a particular time, either in WMS or in WMTS for the tile services. We have support for multiple dimensions at the same time if you need it, so this is one example
of filtering over multiple dimensions, on the last update of the file and another dimension at the same time. So it all turns into filters against the index table. We have the ability to perform filtering through a vendor option, adding CQL_FILTER equals whatever:
platform equals Sentinel-2, for example, or cloud cover greater, sorry, less than 30%, or stuff like that. And it all turns into filters against the index table, which we use to look up the right images. And we also have another vendor option in WMS, which is sortBy, which allows us to sort on
whatever field we want, to have the least cloudy on top or the most recent on top and so on. We have rendering transformations, which are pretty useful. Rendering transformations are this idea that I can run a quick process on top of my data
to turn it into something else, like on-the-fly isoline extraction in this example. Or, in this case, taking a multiband raster, calculating the NDVI index on the fly and then rendering the result as a map.
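For reference, the NDVI mentioned here is the standard band ratio (NIR - Red) / (NIR + Red). The Python sketch below shows only the per-pixel arithmetic on plain lists; it is not how GeoServer computes it, and returning 0.0 for pixels where both bands are zero is a convention chosen for this example.

```python
def ndvi(red, nir):
    """NDVI for one pixel: (NIR - Red) / (NIR + Red), in the range [-1, 1]."""
    total = nir + red
    if total == 0:
        return 0.0  # convention chosen here for empty pixels
    return (nir - red) / total


def ndvi_raster(red_band, nir_band):
    """Apply ndvi() pixel by pixel to two bands given as lists of rows."""
    return [
        [ndvi(r, n) for r, n in zip(red_row, nir_row)]
        for red_row, nir_row in zip(red_band, nir_band)
    ]
```

Because the values are computed per request from the source bands, nothing derived needs to be written back to storage, which is the point made in the talk.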
All of that without ever storing the NDVI anywhere on disk. Again, a bunch of documentation links for you. Okay, let's say that I found the data that I want and I prepared the composite that I like: what do I do about getting the raw data?
Well, WCS is the OGC protocol for doing that. It's the Web Coverage Service; it's designed to download a subset of a large image mosaic in a particular bounding box, at a particular resolution, with a particular band selection, possibly with a particular coordinate reference system, with on-the-fly reprojection.
The protocol can describe all the bits and pieces that make up a particular coverage. And again, to match what you can do with WMS, we added vendor parameters to our WCS so that you can add filtering into the mosaic and sorting based on attributes on top of it.
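As an illustration of the subsetting, band selection and scaling just described, the following Python sketch assembles a WCS 2.0.1 GetCoverage request in KVP form, using only parameters from the WCS 2.0 core and its range subsetting, scaling and CRS extensions. The host, coverage identifier, band names and the `Long`/`Lat` axis labels are assumptions; the real axis labels are reported by DescribeCoverage, and the GeoServer-specific filter and sort vendor parameters are left out because their exact names are not given in the talk.

```python
from urllib.parse import urlencode


def build_getcoverage_url(base_url, coverage_id, lon_range, lat_range,
                          bands=None, scale_factor=None, output_crs=None):
    """Build a WCS 2.0.1 GetCoverage URL for a spatial subset of a coverage."""
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        # One subset parameter per axis; the axis labels are assumptions.
        ("subset", f"Long({lon_range[0]},{lon_range[1]})"),
        ("subset", f"Lat({lat_range[0]},{lat_range[1]})"),
        ("format", "image/tiff"),
    ]
    if bands:
        params.append(("rangeSubset", ",".join(bands)))  # band selection
    if scale_factor is not None:
        params.append(("scaleFactor", scale_factor))     # resolution change
    if output_crs:
        params.append(("outputCrs", output_crs))         # on-the-fly reprojection
    return f"{base_url}?{urlencode(params)}"
```

A list of pairs is used instead of a dictionary because `subset` legitimately appears once per axis.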
The idea is that you prepare the image that you would like to see on screen through the client, and then you can invoke WCS and download exactly what you are seeing as raw data rather than as a pretty image. However, WCS is not always the right answer.
Why? Because WCS is a synchronous protocol. You make a request and you sit on that HTTP connection waiting for the data to come back, which is fine if you are trying to download 100 megabytes of data. But if you are trying to download 20 gigabytes of data, that might take a long time and the HTTP connection can go bye-bye and leave you with nothing.
So we need an asynchronous protocol to support a large download capability. What is the protocol in OGC that does asynchronous? WPS, the Web Processing Service, which can be used for many things, not just analytics.
So in GeoServer we created a process called WPS Download that can be used to perform large raw downloads of data, both vector and raster. So you can point at your complex, large data source, indicate a large polygon that you would like to download at native resolution,
send a request in an asynchronous way and poll the server to verify when the result is ready, and when it's ready you get it. And that solves the issue of timeouts.
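The submit, poll, fetch pattern described above can be sketched generically in Python. The three callables stand in for the actual HTTP calls to the server and are hypothetical, as are the status strings (a real WPS reports its own status values); the point is only that no single connection has to stay open for the whole job.

```python
import time


def wait_for_result(submit, get_status, fetch, poll_seconds=5.0,
                    timeout_seconds=3600.0, sleep=time.sleep,
                    clock=time.monotonic):
    """Submit an asynchronous job, poll until it finishes, return its result.

    submit()           -> job id
    get_status(job_id) -> "accepted", "running", "succeeded" or "failed"
    fetch(job_id)      -> the finished result
    """
    job_id = submit()
    deadline = clock() + timeout_seconds
    while True:
        status = get_status(job_id)
        if status == "succeeded":
            return fetch(job_id)
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} not ready after {timeout_seconds}s")
        sleep(poll_seconds)  # each poll is a short request, so nothing times out
```

Passing `sleep` and `clock` as parameters is a design choice that keeps the loop easy to exercise without real waiting.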
Sometimes what you want is not to download the raw data, but to download the processing or the rendering result, like the NDVI output for example. We extended this download process so that you can also download large rendered maps.
Same principle. You can do a GetMap, and it's normally fine, but if you are trying to download a map which is very large, you can stumble into time limits, size limits and so on, and the download might not complete successfully. And so we have this asynchronous process that allows you to download large rendered maps.
So you ask for your NDVI output on a very large area at maybe 2 meter resolution, and the map might be 10 gigabytes worth of GeoTIFF rendered as NDVI. If you do it asynchronously, that's not a problem; you just have to wait a little bit.
Finally, since most of this data is time-based, it's interesting to perform animations on top of it and to see a time evolution of your data. So this is, I think, Meteosat over various moments in time.
Again, you can use the download animation process. It will execute all the GetMaps that it needs to create the frames of the animation and then composite them into an MP4 that you can download sometime later.
If you want to see all of this that I've been talking about in action, you can go to the EUMETSAT product viewer at that link. It's open. You can get into it. It will allow you to do time navigation.
The time navigation is powered by another extension that we have in GeoServer, which we call WMTS Multidimensional. It's a protocol that allows you to say: can you tell me, in this area, what are all the single time steps that I have? And then we display them on the time slider. You can drill in or zoom out, choose a particular date, and you can see that there are multiple layers, with one clock in yellow.
That means that layer is actually driving the time slider and the others are adapting to it with nearest neighbor matching, which is also something that we implemented recently in GeoServer: nearest neighbor time matching, or dimension matching in general.
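Nearest neighbor time matching, as described above, amounts to snapping a requested time to the closest time step a layer actually has. A minimal Python sketch of that lookup follows; it is not GeoServer's implementation, and resolving ties towards the earlier time step is a choice made for this example.

```python
from bisect import bisect_left
from datetime import datetime


def nearest_time(available, requested):
    """Return the entry of the sorted list `available` closest to `requested`."""
    if not available:
        raise ValueError("no time steps available")
    pos = bisect_left(available, requested)
    if pos == 0:
        return available[0]       # requested is before the first step
    if pos == len(available):
        return available[-1]      # requested is after the last step
    before, after = available[pos - 1], available[pos]
    # On a tie, prefer the earlier step (a choice made for this sketch).
    return before if requested - before <= after - requested else after
```

With this kind of lookup, a layer with a six-hourly time series can follow a slider driven by a layer with a different time step and still show its closest image.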
And if you take the time to create your own login for it, you can register for this site. It's also free. Then you can use the large raw download and the large animation download functionality as well.
They are those two icons here, the camera and the video camera. They allow you to do raw downloads, or the download of what you're seeing, and the download of an animation. And with this, I'm done.
All right, thank you for this interesting presentation. Now we have the chance to ask some questions.
Yeah, thanks for the talk. I have a question regarding the composite image. So when you are compositing multiple products into one larger mosaic, does the process read all of the data, or does it stop when the coverage is full?
So do you check on that? Good question. So by default, it would load and composite all the images that match the current bounding box and filter. But we have one flag that can be enabled to perform footprint removal. And in that case, once the rendering area is full, that is, already covered by the first images that we are rendering, it stops.
We call it excess granule removal. And for that to work, we need each image to have a footprint.
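The early-stop behaviour described in this answer can be sketched in Python by treating each footprint as a set of grid cells. This is a simplified illustration under that assumption, not GeoServer's algorithm: granules arrive already sorted top-first, and reading stops as soon as the requested area is fully covered.

```python
def granules_to_read(granules, target_cells):
    """Return the names of the granules that actually need to be read.

    granules:     list of (name, footprint_cells) pairs, sorted top-first
    target_cells: set of grid cells making up the requested area
    """
    remaining = set(target_cells)
    needed = []
    for name, footprint in granules:
        if not remaining:
            break                      # area already full: stop early
        visible = remaining & footprint
        if visible:                    # skip granules hidden by the ones above
            needed.append(name)
            remaining -= visible
    return needed
```

The decision uses footprints only, which is why each image needs one: the raster data of a granule is never opened unless part of it would be visible.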
That is, we need to know in advance, without having to open the whole raster, what part of the raster is actually covered. And that could be an ROI, a binary mask in the file, or it could be a sidecar polygon as a vector. Yep, that's really important when you have deep time availability,
because you need to stop after probably two, three times instead of trying to render the 200 that you have in that area. Thanks, Andrea. You were explaining about the STAC API remote stuff.
Can I compare that to a cascading service, or is it connecting on the fly to a remote STAC catalog? Yes. Let me see if I can find the slide again. It was somewhere here.
Yeah. So yeah. To some extent, it's like connecting to a remote WFS. Except that when we connect to a remote WFS, what do we do then? We either transform it to WMS, because we are maybe rendering a map, or we expose another WFS. In this case, the STAC store is like any other data store.
So we can render the footprints as a map through WMS if we want, and do whatever we normally do with WMS, because the store is really just like a shapefile or a database and so on. Or we can use it to power an image mosaic. So those are the two intended use cases.
Then if you want to expose that as OGC API Features again, or WFS, you can do it as well. Do you have any architecture recommendations in terms of orchestration or deployment,
so if you get a massive amount of data and you need multiple GeoServers to slice and dice and combine that later on and deliver that to the client?
Okay. So I don't have a good diagram for this, but in terms of architecture, you probably need to have some sort of hierarchical storage for the images, especially if you have a very deep time sequence, maybe with local fast storage for the images that you think are going to be used the most,
and then progressively go towards cheaper and colder storage for the older images, and possibly make that transparent to GeoServer. Well, I mean, if you are using the STAC API and so on, it's just a matter of changing the links sometimes. Then you need an efficient database to back the STAC.
We normally use PostgreSQL, but it could be done with something else. We are just storing an index, so it's not a massive amount of data. Once you have a few tens of millions of entries, you already have a very large raster catalogue, but the database is still relatively small. And then for the services, and mostly for the raster compositing,
you need a relatively powerful machine in terms of CPUs. That's kind of the one key mistake that people make when they deploy GeoServer. They go for the memory-optimized machines. Don't do that. Go for the CPU-optimized ones.
You don't need that much memory. You need that much processing power. And then, well, it's a matter of scaling it up on an as-needed basis, so whatever elastic scaling mechanism is offered by your platform. Give or take, those are the basic indications.
You're welcome. So then I would have one question. Do you see the new OGC APIs as a candidate for also sharing this EO data? Because currently we see WMS, WCS; do you see some candidate popping up on the horizon
that is also capable of handling this EO data? Of course. Well, OGC APIs are basically doing the same job as the classic OGC services. So it's the same job done in a different way. GeoServer already has community modules for OGC API Features, Maps, Styles, Coverages.
They are at various stages of implementation. Features is already CITE compliant. The others are sometimes incomplete, but they are there. You can go and use them. So, for example, the DLR example that I was showing with the STAC API, this one,
it's actually based on the same machinery as all the other OGC APIs, and it's sharing quite a bit of code with OGC API Features. So, yeah, they are already there. You can use them now. DLR is exposing a full set of OGC APIs through GeoServer this way. All right. That's interesting to hear.
Okay, then thanks again, Andrea. You're welcome.