
Open Standards for Big Geo Data


Formal Metadata

Title: Open Standards for Big Geo Data
Alternative Title: Geospatial - Open standards Big geo data
Number of Parts: 150
Author: Peter Baumann
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Language: English
Production Year: 2015

Transcript (English, auto-generated)
Let's jump into it. I would like to tell you a little bit about standardization in the big geo data area. Myself, I am Peter Baumann, from Jacobs University in Germany. I am active in OGC, ISO, and other bodies in that field.
So I want to give some technical background on that, hopefully stimulate questions and discussion afterwards, exchange ideas, get the message out, and maybe also learn about new requirements.
Coverage is the keyword defined in the geo standards, and that is what I want to bring to you: the coverage data model of OGC, and after that the service model, which are two distinct, separate things. Meanwhile this has ramifications beyond OGC, also in ISO, in INSPIRE, and elsewhere,
and I want to cast a brief glance at that and also get a little into implementation details. Good; so we all know the feature, the geographic object, and we know that a coverage is a special kind of feature. One could loosely classify it as a space-time-
varying, multi-dimensional phenomenon. In practice that means we talk about regular grids like orthoimages, about irregular grids, about point clouds, and about meshes. That is what typically contributes the big data, in terms of volume at least, and so that is the kind of thing
I want to talk about. So, coverages: let's get a little technical. On the UML level it looks like this. We have a specialization of feature, the coverage, and the coverage is defined by the domain set (where do the values, the pixels, sit?), the range set (the pixel payload),
and, what we have added over the GML definition, the range type, that is, the pixel type, so that we know what we are talking about: so that RGB is not just 8-bit integers but radiance in certain spectral ranges, with certain null values, and all of those things are contained in here.
There are also metadata hooks so that you can plug in any other material you want to transport. I would like to contrast this with the definition of ISO 19123. Some of you may know that one; it is the mother of the coverage definitions in ISO, but it is an abstract model. So if somebody says to you,
"I'm compliant with ISO 19123 and therefore I'm interoperable": wrong. It is so abstract that you can have many implementations, and there are many implementations out there that are definitely not interoperable. The OGC coverage definition, by contrast, is an interoperable one, and we have conformance tests
that can check a coverage down to the pixel level for consistency. By the way, you will often find this named GML-Cov. That is a convenient shorthand we found, because OGC had the idea of naming it "GML 3.2.1 Application Schema for Coverages", which is nothing anybody wants to pronounce.
Okay, so GML-Cov. It can look like this; just to throw a few angle brackets, a few XML tags, at you: you have a grid coverage up there, and the envelope gives you an idea of where it sits,
in WGS 84. We see that it is two-dimensional, with lat/long axes measured in degrees, and then we have the bounding box. Then all the gory details follow, but from this envelope you can already see roughly where we sit. The range type definition, as I told you, carries a little more than just "8-bit integer":
it tells us, for example, that this is a panchromatic channel measured in watt per square centimeter. This is fixed syntax, so it can be evaluated by machines. Okay, that is what it can look like.
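To give a flavour, a minimal sketch of such a document might look like this; the element names follow the GML 3.2.1 / GMLCOV 1.0 schemas, while the identifier, coordinates, and values are purely illustrative:

    <gmlcov:GridCoverage gml:id="C0001"
        xmlns:gml="http://www.opengis.net/gml/3.2"
        xmlns:gmlcov="http://www.opengis.net/gmlcov/1.0"
        xmlns:swe="http://www.opengis.net/swe/2.0">
      <gml:boundedBy>
        <gml:Envelope srsName="http://www.opengis.net/def/crs/EPSG/0/4326"
                      axisLabels="Lat Long" uomLabels="deg deg" srsDimension="2">
          <gml:lowerCorner>50.7 4.3</gml:lowerCorner>
          <gml:upperCorner>50.9 4.5</gml:upperCorner>
        </gml:Envelope>
      </gml:boundedBy>
      <gml:domainSet> <!-- the grid geometry: where the pixels sit --> </gml:domainSet>
      <gml:rangeSet>  <!-- the pixel payload --> </gml:rangeSet>
      <gmlcov:rangeType>
        <swe:DataRecord>
          <swe:field name="panchromatic">
            <swe:Quantity definition="...">  <!-- radiance, not just "8-bit integer" -->
              <swe:uom code="W/cm2"/>
            </swe:Quantity>
          </swe:field>
        </swe:DataRecord>
      </gmlcov:rangeType>
    </gmlcov:GridCoverage>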
And I said GML-Cov, but please don't assume that it can only transport GML, that is, XML data. That would not make sense: you don't want to do that with satellite images, and you don't want to do it with weather forecasts. This is just a model, formalized enough (we have validators) to give us a sound basis; but of course we can encode coverages in any other suitable format.
And that is the next part. We could do the whole thing in GML, of course, and sometimes we like to do that; a 1D time series, for example, is perfectly fine in GML. Sometimes, however, we want some other format, what we call a special format, like NetCDF or TIFF. Those can carry more or less of the metadata,
so you may lose something when retrieving that format; but hey, you wanted it like that, you know what you are doing. Sometimes you want both: the canonical metadata in XML, plus the payload encoded efficiently in a binary format. This is where we use multipart MIME, the mechanism known from email attachments: we have the whole thing in XML as defined, except for the range set, which consists of an XLink to the data file coming later in the message. So we are actually quite flexible in transporting coverages in different formats.
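Schematically, such a multipart message might look like this; the boundary string, content ID, and file details are illustrative:

    Content-Type: multipart/related; boundary=wcs

    --wcs
    Content-Type: application/gml+xml

    <gmlcov:GridCoverage gml:id="C0002" ...>
      ... canonical metadata in XML, as before ...
      <gml:rangeSet>
        <gml:File>
          <gml:rangeParameters xlink:href="cid:payload.tif"/>  <!-- link to the binary part -->
          <gml:fileReference>cid:payload.tif</gml:fileReference>
          <gml:mimeType>image/tiff</gml:mimeType>
        </gml:File>
      </gml:rangeSet>
    </gmlcov:GridCoverage>
    --wcs
    Content-Type: image/tiff
    Content-ID: payload.tif

    ... binary TIFF payload ...
    --wcs--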
And we need to be, because the coverage types we have are quite diverse. Typically we think of the right-hand side: pixels, quadratic pixels, something like orthoimages. But first of all, this can be spatiotemporal. We might have irregular grids. We might have very strange grids, like the ones they like in climate modeling. And the world doesn't stop there:
we have point clouds, called multipoint coverages, trajectories, surfaces, solid bundles, and all of that. This is where we close the gap to geometric modeling, like CityGML and things like that. So this is not about reinventing the wheel, but about making a connection, let us say, into those worlds.
Okay, so that was a very brief glance at the data model. Now let's look at the services we have on top of it. Again, let me start with a UML picture, because it describes nicely what the server has in mind. The Web Coverage Service (WCS) is the service
most focused on the coverage structure. You can serve out coverages by anything else as well: by Web Feature Service, by Web Processing Service, by Sensor Observation Service, whatever. But this one has the most functionality, as we will see in a minute. So what does the server have in mind? We have a coverage offering
that has some service metadata (what coordinate systems do I support? what data formats?) and suchlike. Then we have a bunch of coverages, which are, number one, the coverages as we have seen them; but also, again, we have foreseen slots to hold any other descriptive data which may be service-related.
So a service provider or some WCS extension may want to store additional information here. Good; once again, this can be any encoding. So we actually have something like one virtual document, and now we can take our request types and fire them against this conceptual model.
The GetCapabilities request, the standard canonical one, reports which service extensions, formats, and coordinate systems the server supports, plus a list of all coverages. We can easily spot that in our diagram: we get the top-level box, and from each coverage just the headline, the identifier.
With the DescribeCoverage request we drill down into a particular coverage and get its metadata: you get the coverage part, but without the pixels, and you get the service metadata. And finally GetCoverage, the workhorse, is the one where you get the coverage or a subset of it; now we drill down and get exactly this one.
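In the KVP encoding, the three request types might look like this; the server address and coverage identifier are illustrative, the parameter names are those of the WCS 2.0 KVP binding:

    http://example.org/wcs?SERVICE=WCS&REQUEST=GetCapabilities&ACCEPTVERSIONS=2.0.1
    http://example.org/wcs?SERVICE=WCS&VERSION=2.0.1&REQUEST=DescribeCoverage&COVERAGEID=MyCoverage
    http://example.org/wcs?SERVICE=WCS&VERSION=2.0.1&REQUEST=GetCoverage&COVERAGEID=MyCoverage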
With that conceptual model we can say very clearly what we want and what the service means, and we get a clear definition of the semantics. Okay, so what does it do in the end? Before we go into the functionality, I must mention that OGC at some point
decided to establish a core/extension model, as it is called, or modular specification model. That means we have a clear way of distinguishing implementation alternatives. This used to be a real pain in the neck for everybody: for us specification writers, for the implementers, and for the users.
In the old WCS 1.x specification I counted and, incidentally, found 63 if-statements there, normative if-statements. So two to the power of 63 is the number of implementation alternatives you have. Good luck. What we do now is have no ifs in a single specification;
instead we have a bundle of specifications that you can plug together, which is much easier to keep track of. We start with a core, the part that does nothing but give you the coverage or a subset of it. That is already quite useful. If you think of this as a time series, with X, Y, and T axes,
you could get everything about one place over a long time. This is called trimming; it keeps the dimension: a three-dimensional subset of a three-dimensional object, a two-dimensional subset of a 2D object. Slicing, on the other hand, reduces dimensionality. So you may want the temperature curve
of Brussels for one year, which gives you a one-dimensional time series; or you may want a particular time slice: what is the weather like in Europe on this particular day? You can combine trimming and slicing arbitrarily in one request, and so you can retrieve any subset you want.
And you can do that in any format that matches. Okay, a 3D object you cannot transport in PNG, for example; so there are some restrictions, but they are practically motivated and we know them anyway. So this is the only thing the core does: give me a coverage, or a subset of it, in a particular format.
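For example, a core GetCoverage request that trims in longitude and latitude and slices in time might look like this; the coverage name and axis labels are illustrative and must match what DescribeCoverage reports (line breaks and annotations only for readability):

    http://example.org/wcs?SERVICE=WCS&VERSION=2.0.1&REQUEST=GetCoverage
        &COVERAGEID=TemperatureCube
        &SUBSET=Long(4.3,4.5)                    trimming: the axis is kept
        &SUBSET=Lat(50.7,50.9)                   trimming: the axis is kept
        &SUBSET=ansi("2015-07-14T00:00:00Z")     slicing: the time axis is dropped
        &FORMAT=image/tiff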
The extensions now add further functionality, further facets; that is what I will show on the next slide. But first let me mention that we have a third level, application profiles. Application profiles are a bundle, let us say, for particular application domains. The implementer's question is:
"I want to do something for remote sensing data; which extensions should I reasonably support?" So we have an EO-WCS profile that says: you should support scaling, you should support CRS transformation, and, by the way, here is some extra functionality
that lets you search in remote sensing imagery. The same thing is under way for meteorology and oceanography, and work has now started for sensors. So that is just some packaging, some bundling.
A small aside: other standards are doing this as well, and one in particular I want to mention is WaterML 2.0. They are also going into time series; however, that will not be efficient. WaterML is perfect for one-dimensional time series:
a series of temperature, of pressure, of whatever. If you plug in images here, you can do that, and you get a three-dimensional data cube. But what happens then? Slicing is perfectly fine; but if you want to cut out a time series, you are pretty much lost, because you have to touch every file
and extract from it, which is extremely inefficient. The coverage concept, by contrast, is more abstract: we say nothing about the internal data organization, so an implementation can do it in a clever way, for example with a partitioning suited to the kinds of operations we want to perform.
We can be flexible and adaptive, which is what we actually do in our implementation, for example; and still the whole thing is a little simpler to model. So time series is not always equal to time series. Now let me come back to the big picture, the big picture
of coverages in the Web Coverage Service. It may seem complicated, but actually it isn't; let's walk through it. We have the data part, with the GML-Cov piece I introduced before and several format encodings that describe how to map, let us say, a coverage to GeoTIFF.
That depends on some other specifications, and the coverages are used by the core. So we are entering the service part, and there we have several columns again.
Some of them are open, waiting for ideas about what we could additionally specify. The most important column is functionality. We have a transaction extension, WCS-T, that allows updating a coverage; a processing extension for analytics; range subsetting for band selection; scaling; CRS transformation; interpolation.
These are the functional facets where an implementer can decide whether to plug them in or not. In its GetCapabilities response a service announces which extensions it supports, so you know exactly what the service you contact can offer you. Then we have the encodings, the usual suspects: KVP GET and POST, the old ones;
SOAP; REST; and possibly JSON in the future. That is just the transport protocol, nothing special. And then, floating down here, you find the application profiles. Okay, so looking at the request for a moment,
just to peek under the hood: this is what a GetCoverage looks like. As simple as that: request=GetCoverage, then coverageId=..., and you get it. You get it in the native format as stored on the server, and in the native projection system as stored on the server. It is as simple as that.
If you want to do subsetting, you say so for every axis: longitude, latitude, and maybe time. Notice that we allow calendar dates here; some people say we should use seconds since the epoch, which is not exactly what I find convenient. If you want another format,
you specify it by way of the MIME type identifier; so that is settled as well. And that is the GetCoverage request as per the core. By the way, a core mission of WCS is not to deliver images, although we can do that too; the main mission is to deliver data unchanged. So if I get bathymetry,
I want to know what the depth really is. Therefore the specification mandates that values be delivered unchanged, except of course if you ask for something like JPEG; okay, then you wanted it. If we now add an extension, it simply adds new parameter combinations.
Here, for example, with range subsetting you can request the red band, or a false-color image (near-infrared, red, green), or you can reorder bands, or request intervals, or all of that combined. So that is the syntax for extracting from climate variables or from hyperspectral bands.
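In KVP this might look as follows; the three RANGESUBSET lines show alternatives for one and the same parameter, and the band names are illustrative:

    ...&REQUEST=GetCoverage&COVERAGEID=HyperspectralScene
        &RANGESUBSET=red                  a single band
        &RANGESUBSET=nir,red,green        a false-colour band combination
        &RANGESUBSET=band03:band12        an interval of bands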
And the same scheme works with all of those extensions. Transaction, for example: I want to insert a new coverage; here it is, take that NetCDF file, generate a new identifier for me, and that's it.
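A sketch of such an insert request; the parameter names follow my reading of the WCS transaction extension drafts, and the file URL is illustrative:

    http://example.org/wcs?SERVICE=WCS&VERSION=2.0.1&REQUEST=InsertCoverage
        &COVERAGEREF=http://example.org/uploads/new_scene.nc
        &USEID=new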
So much for the scheme of how the extensions plug themselves into the core. One extension is very dear to me, because it is the analytics part, where it gets to processing. We actually have a specification for that, called the Web Coverage Processing Service (WCPS), which is nothing but a query language
for raster data with geo semantics. It may remind you of XQuery, and actually we can combine it with XQuery. So there you have metadata and data search: you can filter with predicates, do processing, and return the whole thing.
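A query in the spirit of WCPS, with the coverage name and axis labels purely illustrative, might read:

    for c in ( TemperatureCube )
    return encode(
        c[ Long(4.3:4.5), Lat(50.7:50.9), ansi("2015-07-14") ],
        "image/tiff" )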
Good. That fits nicely into the big picture: we have SOS for upstream sensor capture; grab anything you want, transform it into the standard canonical OGC data formats, and then serve it out downstream via the "W-something" services
that allow for all those things users want to do. Okay, now, one implementation, ours, which happens to be the core reference implementation, is the rasdaman system. It is like a database system,
but it doesn't need a database underneath; it can work on files as well. It spices up SQL with arrays; the syntax is a little like what you saw before, but now it is real SQL.
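Something like this rasql-flavoured sketch; collection name, subset bounds, and threshold are illustrative:

    -- cut a 1000 x 1000 window out of each image and deliver it as TIFF,
    -- but only for images whose window is bright on average
    select encode( c[0:999, 0:999], "tiff" )
    from   SatelliteImages as c
    where  avg_cells( c[0:999, 0:999] ) > 100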
This has been implemented and works in operational installations on data cubes of up to 130 terabytes with ESA, and queries have been distributed over 1,000 cloud nodes. Let me skip that one. We can also generate visualizations, like 3D terrain draping, with queries where we put elevation into the alpha channel; the result is something the GPU can understand directly, namely WebGL,
and you get a 3D presentation in your web browser window. Parallelization: I mentioned already that we split queries. I am sorry, I have to be fast now; as always, I have too many slides. But I promised a little bit about ISO.
One thing going on now, started a few months ago, is that the OGC coverage standards are being transposed into ISO standards. That is a common thing; it happened with Web Map Service, with GML, and with others. In the course of that, ISO discovered that 19123 needs revamping, so the whole thing is being shaped up now.
And there is an additional, loosely related activity, ISO 19163 on satellite imagery, where we have tried to convince people that it makes sense to streamline it with the coverage model; meanwhile they have agreed. The one thing I would like to mention in particular,
however, because it is so dear to me, is that the SQL working group, after two and a half years of discussion, has agreed to add array support to the SQL query language. You can declare your large arrays in a table, whatever size, whatever number of dimensions, and then use them in queries, for example to compute a vegetation index and the like.
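What this will look like is sketched below, in the flavour of the published SQL array proposals; the table, column, and syntax details are illustrative, not final standard text:

    -- declare a table holding 2D arrays alongside ordinary columns
    CREATE TABLE LandsatScenes (
        id   INTEGER,
        red  MDARRAY [ x, y ] OF SMALLINT,
        nir  MDARRAY [ x, y ] OF SMALLINT
    );

    -- a vegetation-index style expression evaluated cell-wise over the arrays
    SELECT id, ( nir - red ) / ( nir + red )
    FROM   LandsatScenes;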
So, how many minutes do I have left? Minus two? Yeah, okay. Then let's skip this one and just put up a summary slide
summarizing what I have said already, so I will not talk any longer. I will stop here; sorry for the speed talk, and I am looking forward to your questions.