
Update on Modular OGC API Workflows specifications


Formal Metadata

Title
Update on Modular OGC API Workflows specifications
Series title
Number of parts
351
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose, as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release year
Language
Production year
2022

Content Metadata

Subject area
Genre
Abstract
Update on the status of OGC API standards and draft specifications enabling client-driven execution of processing workflows, supporting on-demand and ad hoc selection of data and algorithms. Overview of the capabilities enabled by OGC API - Tiles, OGC API - Coverages and OGC API - Processes - Part 3: Workflows and Chaining. Demonstration of both a server and a client implementing these specifications.

The Workflows and Chaining draft extension specification to OGC API - Processes enables ad hoc execution of workflows integrating processes and data available from one or more OGC API instances. The specification allows processing to be triggered as a result of requesting results for a specific area and resolution of interest, which provides a simple mechanism to chain geospatial data inputs and outputs. By referring to a collection of geospatial data irrespective of a particular area, resolution or date/time of interest, workflows can be defined in a generic, re-usable manner, and processing can be performed on demand rather than (or in addition to) as a batch execution. Such on-demand processing has the advantage of optimizing the use of computing resources and speeding up the availability of the latest data, such as continuously captured Earth Observation satellite imagery. The initial version of the Workflows and Chaining specification was the result of a GeoConnections 2020-2021 project funded by Natural Resources Canada, which also supported the development of a unified OGC API driver in GDAL that allows the results of such workflows to be visualized directly in QGIS.

OGC API - Tiles is the specification succeeding WMTS in the OGC API family, leveraging the concept of 2D Tile Matrix Sets. In addition to providing tiles of maps or imagery, Tiles can also be used to distribute raw data tiles, including coverage and vector tiles. Using tiles to deliver results and trigger execution of processing workflows can facilitate caching while allowing an area and resolution of interest to be selected efficiently.

OGC API - Coverages is the specification succeeding WCS in the OGC API family, and provides a simple mechanism to request an optionally down-sampled subset of a coverage. Specific fields (e.g. imagery bands) can be selected as needed. The Coverages specification can also be used to request results while triggering execution of a workflow.
Keywords
Transcript: English (automatically generated)
Hi. Thank you everyone for coming to my talk. So today I'd like to talk to you about modular OGC API workflows and the OGC API specifications that make these possible. So first, basically the vision is to try to instantly integrate geospatial data and processes
that are available from anywhere for both visualization and analysis. That's the goal. And basically the hurdles with geoprocessing workflows are with batch processing specifically. So it takes a very long time, right, to wait for a whole batch process to complete.
So you have to wait before you can do anything else. The other challenge is it's difficult to bring together different data sets and processing capabilities that are served from different places. So those are the main challenges that we're trying to solve with this OGC API Processes
Part 3, Workflows and Chaining. So the drawbacks that you have with batch processing are that by the time the whole workflow is complete, there might be new data that has arrived
that would actually be better, more useful. For example, Earth observation that keeps being collected by satellites every day. So that's one of the challenges. Another problem is the long feedback loop. So data has to be downloaded to the processing server or for local processing.
Then you have to run the processing and then you have to visualize the output and often this is for like a specific area and resolution of interest. And then if something is wrong, you want to tweak the settings.
For example, if you have some algorithm with parameters, then you have to run through the whole thing again and restart the whole processing chain. So that might take quite a while. There are also challenges from the server side and the processing capability side: it might be difficult to prioritize more important users or use cases
versus other users that are lower priority. For example, disaster or emergency response is one scenario. And all of this basically makes for inefficient use of the resources in terms of bandwidth, time, and processing power,
and this ends up wasting money as well. In terms of the challenges with the integration of the data, first you have to find the processes and the data that are compatible with each other, but also that basically help you answer the questions
that you're trying to answer. Then if you discover processing capabilities, there might be challenges to be able to use them in terms of authentication requirements, and it might require you to first define your workflow, then deploy it, to be able to run it, to execute it as a process.
Often you have to deploy it first and that usually requires authentication. That's one other integration challenge. And then in terms of interoperability, so specific formats, for example, or specific APIs,
like, for example, OGC API Coverages versus OGC API Tiles versus WCS. So there might be expectations of a very specific thing, and if you don't have exactly that combination that is defined in the workflow, your workflow won't run at all. After you've defined that option, there's no other way to change it.
It has to be like this. So this in general makes the workflows less interoperable. And all of this makes it harder to reuse the workflows, even with very similar data sets where the actual logic, the business logic in the workflow, is really the same, but you can't use it
because of the way the workflow is defined. And also often it's hard coded for a specific region and area of interest and changing that means changing the whole workflow again. So you can't reuse that workflow directly. So those are the challenges that we're trying to solve. So with the OGC API, it's a family of standards.
Some are approved, some are still draft specifications. And it's trying to be a consistent framework that's better integrated. So there are processing capabilities and there are data access mechanisms. And what Workflows and Chaining, the Part 3 of Processes, is trying to do
is to better connect the data access with the processing capabilities. So for the current status of OGC API, the consistent framework is provided by OGC API Common Part 1: Core and Part 2: Geospatial Data. That's currently not published,
but it's the foundation for several of the other OGC APIs. Then we have processing capabilities. So OGC API Processes Part 1: Core is the approved, published standard. And then there is Part 2, which is Deploy, Replace, Update, and is still a draft specification.
This will allow you to upload an application package to create a new process. And then Processes Part 3, Workflows and Chaining, is the main focus of my talk, about connecting processes with the data access mechanisms. And by the data access mechanisms, I mean most of the other OGC APIs,
like OGC API Features. Part 1: Core is an approved standard. Part 2, CRS by Reference, lets you request features in a different CRS. And Part 3, Filtering, will let you filter the features, for example using the Common Query Language (CQL). So Features is a way to access the data as vector features.
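As a small, hedged illustration of that filtering capability (the server URL, collection name and property names below are made up; the filter and filter-lang parameters are from the Features Part 3 draft):

    # Hypothetical request: fetch features matching a CQL2 text filter.
    import requests

    r = requests.get(
        "https://example.org/ogcapi/collections/buildings/items",
        params={
            "filter": "height > 10 AND function = 'residential'",
            "filter-lang": "cql2-text",
            "limit": 100,
        },
    )
    features = r.json()["features"]  # GeoJSON features matching the filter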
OGC API Tiles Part 1: Core is in the final stages of approval and publication. And what it allows is an access mechanism where you can request the data, not only map tiles, but also the raw data, so vector tiles and coverage tiles.
OGC API Coverages is another way to request data, using, for example, a subsetting mechanism, so you can get only the part that you're interested in. And then OGC API Environmental Data Retrieval, or EDR, is also a published and approved standard,
which allows you to request the data with different query mechanisms. For example, you can specify a trajectory and retrieve all the data along the trajectory. And then OGC API Maps allows you to request either imagery or a rendered map from the server,
whether it's rendered on the fly or pre-rendered, or just imagery. And then OGC API Discrete Global Grid Systems, or DGGS, is a way to use different discrete global grid systems, which is a little bit like tiles, but also has a way to not only ask, give me the data for these tiles,
but also, where is the data that I'm interested in? Where is the result? So you specify a query, and you get back a list of zones, as they're called. And it's not limited to square tiles, like Tiles, but you could have hexagons, for example, for your grid.
So I'll quickly go over some of these. So for OGC API Processes Part 1: Core, basically you have /processes/{processId}/execution as your execution endpoint. You POST an execution request,
and then if you're in the synchronous execution mode, you will get back the result right away with an HTTP 200 response. But in the execution request, you specify the values for the inputs, and often this is for a specific fixed area and resolution of interest.
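As a rough sketch of what such a Part 1 synchronous execution request could look like from a client (the server URL, process identifier and input names are made up for illustration; the bounding-box input shows the fixed area baked into the request):

    # Hypothetical Part 1 synchronous execution: the area of interest and the
    # output format are fixed inside the execution request itself.
    import requests

    execute_request = {
        "inputs": {
            "data": {"href": "https://example.org/data/scene.tif",
                     "type": "image/tiff; application=geotiff"},
            "bbox": {"bbox": [-75.0, 45.0, -73.0, 46.0]},  # fixed area of interest
        },
        "outputs": {
            "result": {"format": {"mediaType": "image/tiff; application=geotiff"}}
        },
    }

    r = requests.post(
        "https://example.org/ogcapi/processes/some-process/execution",
        json=execute_request,
    )
    assert r.status_code == 200  # synchronous mode returns the result directly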
So you see, you have a bounding box in there, and you even have fixed formats in there, for example. So it's quite rigid; it's for this bounding box only. So that's one of the challenges that I was pointing to earlier. Then there's also an asynchronous execution mode,
where if you specify a Prefer: respond-async header, instead of getting back the data right away, you'll get back a 201 that tells you that the job has started, basically, and you have a way to poll the status of the job, and once it's complete, the results will become available.
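Continuing the same sketch, the asynchronous flow could look roughly like this (the job URL and polling interval are illustrative; the header and status values follow Processes Part 1):

    # Hypothetical asynchronous execution of the same request: the Prefer header
    # asks for async processing, and the job status is polled until completion.
    import time
    import requests

    r = requests.post(
        "https://example.org/ogcapi/processes/some-process/execution",
        json=execute_request,  # same execution request body as in the synchronous sketch
        headers={"Prefer": "respond-async"},
    )
    assert r.status_code == 201        # job created
    job_url = r.headers["Location"]    # e.g. .../jobs/{jobId}

    while True:
        status = requests.get(job_url).json()["status"]
        if status in ("successful", "failed", "dismissed"):
            break
        time.sleep(5)

    results = requests.get(job_url + "/results").json()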
But it's still the same thing in terms of being a fixed area and resolution of interest, and so it still has that limitation. Then with Processes Part 2, which I quickly mentioned earlier, you have an application package that contains everything that the server needs to create the process.
There are various ways to do this. It could be, for example, a JupyterLab notebook. It could be a Docker container. It could be a CWL workflow, or it could be the execution request that I'll talk about in terms of Part 3 to define your workflow. And after you've done this, your process is available,
and then you can execute it with Part 1, and there's a way to update it with a PUT and to delete it with a DELETE operation. So that's Part 2. That's still a draft specification. So the main thing I want to talk about is what Part 3 allows you to do. So with Part 3, the first thing it has is the concept of an ad hoc workflow.
So from the client, you discover processes and data sources that are available, and right away you can execute them without having to deploy anything first. That's the main thing that this Part 3 does. So how it does that is that in the execution request, you have a new process input type, and basically a process can be an input to another process.
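As a hedged sketch of the idea (the process identifiers, server URLs and input names are made up, and the exact JSON members follow the Part 3 draft, which may still evolve), an ad hoc workflow with a nested process could look something like this:

    # Hypothetical ad hoc workflow: one input of a process is itself another
    # process, possibly hosted on a different server. The nested "process"
    # member with its own "inputs" is how the Part 3 draft expresses chaining.
    workflow = {
        "process": "https://server-a.example.org/ogcapi/processes/render-map",
        "inputs": {
            "layers": [
                {
                    "process": "https://server-b.example.org/ogcapi/processes/contours",
                    "inputs": {"interval": 100},
                }
            ]
        },
    }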
So you can chain your processes this way, and the process could either be a local process, or it could be a process on another server. So this is the way the connection works, and the execution is exactly the same as the regular part one sync and async,
but there are also other execution mechanisms that I will talk about, which are the other things in Part 3. So the other thing that we have is the collection input. So instead of the input pointing to a URL, for example to a GeoTIFF with a very fixed area and format and everything,
instead of pointing to a file, you point to an OGC API resource called a collection, which is more like an abstract representation of geospatial data. But at that point, you haven't yet said what area, what resolution, what format, or even what API you want to use.
You leave that open from the workflow perspective. You just say, I want to use that data that is available as the OGC API from here. So your input, instead of being an HTTP URL to a file, it's an HTTP URL to a collection, which leaves open all of that.
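For illustration, such a collection input could look roughly like this in the workflow (again with made-up URLs and input names; the "collection" member is from the Part 3 draft):

    # Hypothetical workflow using a collection input: instead of a link to a
    # specific file, the input references an OGC API collection, leaving the
    # area, resolution, format and access API open until data is requested.
    workflow = {
        "process": "https://server-a.example.org/ogcapi/processes/ndvi",
        "inputs": {
            "data": {
                "collection": "https://server-b.example.org/ogcapi/collections/sentinel2-l2a"
            }
        },
    }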
So that's the main thing here. So that's for the input. So the input to your process can be an OGC API collection, and the other thing that part three does is the output of your process can also be a collection. So by doing this, it's a new execution mode,
so instead of sync or async, you could ask for a collection as an output. So when you submit your workflow for execution, you say response=collection, and what you get back is exactly the same as what you get back when you access OGC API Features or Coverages or Tiles.
It's a collection description document that has links to one or more access mechanisms. So if the response is accessible as tiles, you will get a link to tile sets. If your response is available as a coverage, you will get a link to the coverage. If your response is available as features, you'll get a link to items. So it's exactly the same,
and the nice thing about that is that collection is readily usable in any client that understands those APIs. So for example, you can just load it in GDAL by pointing to that document that's the response. So that's the nice thing about that, is basically a visualization client doesn't need to have processing code to execute processes.
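Continuing the earlier sketch, requesting collection output and following the returned links could look roughly like this (the response=collection parameter is from the Part 3 draft; URLs and link relations are illustrative):

    # Hypothetical collection output: submitting the workflow with
    # response=collection returns a collection description instead of data;
    # the client then follows whichever access link it understands.
    import requests

    r = requests.post(
        "https://server-a.example.org/ogcapi/processes/ndvi/execution",
        params={"response": "collection"},
        json=workflow,  # the workflow defined in the earlier sketch
    )
    collection = r.json()  # same structure as a regular OGC API collection description
    for link in collection["links"]:
        print(link.get("rel"), link["href"])  # e.g. links to tilesets, coverage or items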
That's one advantage. The other advantage is that, with the ad hoc workflow execution, you don't need to first deploy your process to be able to execute it. And because the output and the input of these OGC API processes are collections,
you can chain these processing endpoints together and you can use all of them like it's just one big OGC API that has all these processing and all these data capabilities. And also, the other thing about this collection output is how do you actually get the data?
Well, you get the data by doing OGC API features or coverage or tiles requests, and that's actually when the processing happens. So the processing doesn't have to happen when you submit your execution request. All the server has to say is, yes, I'll be able to execute the process this way, but it doesn't yet do the actual processing.
It's just getting ready to do it. So it can validate the whole chain of processes all the way up, but the actual processing happens when the client asks for the data. So as the client zooms in, for example, it can request a smaller area at a higher resolution. As you pan, it can do the requests for the adjacent areas to the side.
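As a rough example of what that data access looks like, a single tile request against the collection returned by the workflow might be (the collection name, tile matrix set and tile coordinates are made up; the coverage-tiles path follows OGC API Tiles):

    # Hypothetical tile request against the collection returned by the workflow:
    # each tile fetched while zooming or panning is what actually triggers the
    # processing for that small area and resolution.
    import requests

    tile_url = (
        "https://server-a.example.org/ogcapi/collections/temp-ndvi-result"
        "/coverage/tiles/WebMercatorQuad/12/1478/1204"
    )
    tile = requests.get(tile_url, headers={"Accept": "image/tiff; application=geotiff"})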
So that's the main idea with this. So quickly, what OGC API Tiles allows you to do is basically, like I mentioned earlier, access data not only as maps, but also as raw tiles, whether it's vector or coverage.
The main conformance classes that we have: Core allows you to request a tile with a tile matrix, row, and column. Tileset describes the set of tiles with standard metadata; that's also defined in the 2D Tile Matrix Set and Tileset Metadata standard. And the tilesets list is just that your API will provide a list of tilesets.
And the geodata and dataset tilesets are tilesets lists that are connected with the OGC API Common specification. Then we have conformance classes for different formats like PNG, JPEG, NetCDF, GeoJSON, Mapbox Vector Tiles, but you could add additional formats on top of those in your implementation.
Coverages gives you a more detailed description of the domain and fields of your data, and it supports conformance classes like subsetting, to get only a specific area, and also range subsetting, to only get, for example, a particular band or a few bands that you're interested in. And it also supports downsampling, so you can say, I want this data at this downsampled resolution. It also works with coverage tiles, with the Tiles specification. And the same thing, just like Tiles, there are conformance classes for different formats: GeoTIFF, CoverageJSON, NetCDF, CIS JSON, LAS, and Zarr, but it's not an exhaustive list. More can always be added.
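A minimal sketch of such a coverage request, assuming the draft Coverages parameter names (the server URL, collection, axis and band names are made up for illustration):

    # Hypothetical coverage request: spatial subset, a couple of bands, and a
    # downsampling factor.
    import requests

    r = requests.get(
        "https://example.org/ogcapi/collections/sentinel2-l2a/coverage",
        params={
            "subset": "Lat(45.0:46.0),Lon(-75.0:-73.0)",  # area of interest
            "properties": "B04,B08",                      # range subsetting (bands)
            "scale-factor": 4,                            # downsampled resolution
        },
        headers={"Accept": "image/tiff; application=geotiff"},
    )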
So this is a demonstration that we did with a workflow with Sentinel-2 imagery from ESA through the Euro Data Cube for different seasons.
And then we trained a random forest model with a parcels database where we know which crops are where. And then basically we created this ad hoc workflow that points to a random forest classification process, pointing to a collection from the data cube; we submit this POST request, you get back the collection description,
and you see there it's available as coverage and tiles. Then we do a tile request, and that's actually what triggers the prediction that executes the whole chain. And this is another scenario with a workflow that runs our MRO adapter, which basically connects with OGC API Processes Part 1.
If a process doesn't support Part 3, this is kind of an adapter to make it work with that. So the first process is the adapter. The second process was a routing process from 52°North javaPS. And then we loaded the result in QGIS with GDAL. So GDAL basically was submitting the process execution,
and then you see the routes on top of OpenStreetMap. And it's all done with tile requests. So the tile request asks, and you get the vector tiles for the routes. And so these are some of the implementations of Part 3: our GNOSIS Map Server, and pygeoapi in the UX branch,
and there's a link in the presentation for the branch that supports this. And in terms of clients, our GNOSIS Cartographer and SDK, and also Web WorldWind in the Testbed-17 Geo Data Cube task supported this as well. And in GDAL and QGIS, the OGC API driver for GDAL supports Part 3 already,
even though it's a draft at this point. And so in summary, workflows allow you to easily connect data and processes from anywhere in a reusable and interoperable manner. And these are just the advantages here that I don't really have to go through. But yeah, so trigger on-demand processing
with OGC API data access requests. You minimize the data exchange by requesting subsets and downsampled data. And with workflows and chaining, basically, you also, yeah, those are the three things, right? So easily connect, trigger on-demand processing, and minimize the data exchange. And so the last thing I probably won't have time to talk much about
is that you can bring algorithms to the data also with simple CQL2 expressions, on top of having workflows that you can do ad hoc and deploy. This is like a third way to do processing: you just give a CQL2 expression that defines how you derive properties. For example, properties=NDVI.
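Purely as an illustration of the idea (the parameter encoding here is not authoritative; the exact syntax for derived fields is still being worked out in the Part 3 draft, and the band names and URL are made up), such a request could look roughly like this:

    # Hypothetical sketch: deriving an NDVI field from two bands with a CQL2
    # arithmetic expression when requesting a collection's coverage.
    import requests

    r = requests.get(
        "https://example.org/ogcapi/collections/sentinel2-l2a/coverage",
        params={"properties": "ndvi:(B08 - B04) / (B08 + B04)"},  # illustrative encoding only
    )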
Then you put your NDVI expression right there. So you have derived properties, filtering, sort-by, and joining collections to do spatial joins. For example, only give me the buildings in this low-elevation area for, like, flood scenarios. And thank you very much. So there's more information on the OGC API Processes GitHub.
There's a video here if you want to check it out. It's a pretty long video from the project that we did for this. And there's also a draft discussion paper, which I'm hoping OGC will publish soon. Thank you. Thank you, everyone.