Big Spatio-Temporal Datacubes on Steroids ...and Standards
Formal Metadata
Number of Parts | 611
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/41902 (DOI)
Production Year | 2017
Transcript: English (auto-generated)
01:13
Okay. So from my side, unmuted. Good. Let's give it a try.
01:25
I don't want to yell at you, so I'm starting cautiously. I don't know whether the microphone is on. Ah, it's just for recording. Okay, good. So, thank you for having me here. I'm Peter, Peter Baumann.
01:46
And what I'm going to present to you is actually joint work of several people in my group. One of them in particular is Dimitar Misev, whose PhD thesis is part of that work as well. But, to be honest, I would need a long list to give proper credit to everybody who has contributed to that.
02:09
And not least, of course, also the standardization bodies, where we had lots of discussion. That gave me a lot of opportunity to learn things, as I'm just a plain computer scientist who has stumbled into the geo world.
02:20
And that's what I want to talk about: multidimensional data that appear as spatio-temporal data cubes. So, we have this funny term, coverage. This is age-old. Actually, only a few remember that it originally was invented by Esri for a particular data structure.
02:42
But in the meantime, it's decoupled from that. And they themselves were surprised to discover what coverages have become in the meantime. Coverage is a catch-all term that actually relates to features. You know, in the OGC standardization world, a feature is a geographic object in the end.
03:00
Something that has some location, space and time attached. Then we have a special kind of feature that is a coverage. A coverage is defined as something that may sound really strange. A space-time varying phenomenon. That wants to express something specific. If you look at, say, a highway here, and you take the A1.
03:23
If you model that as a vector, then obviously the attribute A1 will be invariant. Over the full size, over the full length of this highway, it will be A1. So it's sort of static attributes that are attached. Here, however, if you take some image and walk from one location to another, the value changes.
03:42
And that is what this expression wants to say. It changes as you go from one element to another. It's pretty clear that this kind of thing requires more storage space. And therefore, the big data that we encounter, at least big in terms of volume, typically are coverages.
04:01
Not so much vectors. And actually, it's not raster data only, if you think about that one. But you could generalize that as regular and irregular grids, as point clouds and meshes. We can sort of make this a little bit more graphic, more spicy. So we have the feature, and we have some abstract coverage, which actually has several subtypes, if you will.
04:29
Which also reflects history. We have the grid coverages, the first naive attempt, around about 2004. That is not very much used today. In particular, not with geo-coordinates, because it has some difficulties.
04:42
I avoid the word flaw here. We had to improve on that. So we got the rectified grid coverage, which is an ortho image. So rectilinear grids. And then, the rest of it all was termed reference- You see, I cannot even pronounce it. Referenceable grid coverage. Nobody could explain to me why the term was chosen.
05:03
It's a little bit difficult, but it means irregular grids in the end, so everything else. That is historical development, like rings in a tree. Actually, we are currently superseding this with the Coverage Implementation Schema 1.1,
05:20
where we introduce a general grid coverage, which brings all these historical developments together into something that is easier to handle, easier to understand. And in particular, one single concept. And that is what typically is known as data cubes. That is some arrays, some raster sets that have two dimensions, three dimensions, four dimensions, five, whatever.
05:46
And as you can see, they can have straight lines like ortho images. They can be curvilinear, like meshes following a coastline. Or the strange things that people do in climate modeling. All of that falls into that category. But it doesn't stop there, because in the end, it's points in space.
06:03
They just happen to have a particular condition, constraint, that they all have neighbors. If you don't have that, if you give up that, then you come to multipoint coverages, that is point clouds. If you add curves, so bundles of trajectories could be a multicurve coverage, and so on, multisurface coverage, multisolid coverage.
06:24
This is actually where we close the gap to city modeling, for example, CityGML. Also to CAD, computer-aided design. So we do not reinvent wheels, we just want to close gaps. This is an abstract concept. And that has been defined in OGC Abstract Topic 6, which is identical to ISO 19123.
06:45
Abstract in the sense, it does not prescribe a particular implementation. Which means, you can have divergent implementations, and you find that out there. So if somebody says, I'm interoperable because I'm implementing 19123, wrong.
07:01
You find divergent implementations out there. And you have a very simple criterion. They all come with their own client. Because crisscross coupling does not work. Therefore, OGC has added a concrete coverage implementation schema, which is a concretization that makes a few assumptions so that it becomes interoperable.
07:23
And actually, we can do conformance testing down to the level of single pixels, saying whether an implementation is correct or not. In this sense, it's concrete. And then you can use client A to access server B. That is possible indeed, and that has been done.
07:40
So that is the general thing. Stepping back a little bit, we can bring that into a very simple schema, actually, using UML. So it's a specialization of feature that has a domain set, telling us where do our values sit. The range set, which are the values. You see domain range, it's like a function, mapping locations to values.
08:02
And what we have added then over GML, for example, is the data type. What is the data type of our values? Is it temperature, is it radiance, or whatever else? And this actually we took from SWE Common. So actually now we have a connection to sensor world. So sensor observations can be transformed into coverages without loss of semantics here.
08:26
That was important to us. So there is quite a lot of metadata that people want to attach, different ones depending on their domain. And so there is an optional metadata package where you can put anything else you want. The coverage doesn't understand it, but it does transport it for you.
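The structure just described (domain set, range set, range type, optional metadata) can be sketched as a small Python data class. This is only an illustration of the idea of a coverage as a function from locations to values; the class and field names are hypothetical and are not taken from the OGC schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class RangeField:
    """One component of the range type: what a stored value means."""
    name: str  # hypothetical field name, e.g. "temperature"
    uom: str   # unit of measure, e.g. "K"


@dataclass
class Coverage:
    """A coverage seen as a function mapping locations to values."""
    domain_set: List[Tuple[float, ...]]  # where the values sit
    range_set: List[float]               # the values, in the same order
    range_type: List[RangeField]         # the meaning of the values
    # Opaque to the coverage itself: it is only carried along.
    metadata: Dict[str, Any] = field(default_factory=dict)

    def value_at(self, location: Tuple[float, ...]) -> float:
        """Look up the value stored for one location of the domain set."""
        return self.range_set[self.domain_set.index(location)]


cov = Coverage(
    domain_set=[(50.0, 8.0), (50.0, 9.0)],
    range_set=[281.5, 283.0],
    range_type=[RangeField("temperature", "K")],
    metadata={"project": "demo"},
)
```

Here `cov.value_at((50.0, 9.0))` looks up the second value, while the `metadata` dictionary is never interpreted by the class.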
08:42
So this is a way to carry along all sorts of specific metadata that you want to remember. If you want to see that in GML, okay, here is an example for it, where you see the domain set with a grid definition. This happens to be four-dimensional. Lat, long, height, and date.
09:02
So you have a regular axis for lat, long. You have an irregular axis for height, with two levels in this case. And you have an irregular axis for time, also two levels. I didn't want to write more. I was lazy. This is mapped to a grid axis, and then you know where the points sit
09:20
and the point values you have here in the range set. And down there in the range type, you see what that means. For example, it's panchromatic. We have radiance referenced there, and we have the units of measure and that kind of thing. That is the historical way, because GML just was trendy at the time when it originated.
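The four-dimensional grid from the slide, with regular axes for lat and long and irregular axes for height and date, can be sketched as follows. This is a minimal illustration under assumptions: the class names, axis labels, and all numbers are made up, not copied from the GML example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass
class RegularAxis:
    """Evenly spaced axis: coordinate = origin + index * resolution."""
    label: str
    origin: float
    resolution: float
    size: int

    def coordinate(self, i: int) -> float:
        if not 0 <= i < self.size:
            raise IndexError(f"{self.label}: index {i} out of range")
        return self.origin + i * self.resolution


@dataclass
class IrregularAxis:
    """Axis whose positions are listed explicitly, one per grid index."""
    label: str
    coefficients: List[Union[float, str]]

    def coordinate(self, i: int) -> Union[float, str]:
        if not 0 <= i < len(self.coefficients):
            raise IndexError(f"{self.label}: index {i} out of range")
        return self.coefficients[i]


# Illustrative grid: two regular axes, two irregular axes with two levels each.
grid = [
    RegularAxis("Lat", origin=50.0, resolution=0.5, size=4),
    RegularAxis("Long", origin=8.0, resolution=0.5, size=4),
    IrregularAxis("h", [0.0, 100.0]),
    IrregularAxis("date", ["2015-12-01", "2015-12-02"]),
]


def locate(axes: Sequence, index: Tuple[int, ...]) -> tuple:
    """Map a grid index to the coordinates where that point sits."""
    return tuple(axis.coordinate(i) for axis, i in zip(axes, index))
```

With this grid, `locate(grid, (1, 2, 1, 0))` gives the position of one grid point on all four axes.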
09:42
But with CIS 1.1, if you favor curly braces, then here is the same thing in JSON. Domain set, range set, range type. And the good thing is that it has the same semantics. We can map it. So you can do a one-to-one mapping. You are not bound to a particular world of standardization.
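To give a feeling for the curly-brace variant, here is a rough Python sketch that builds a small coverage as a dictionary and serializes it to JSON. The three top-level parts follow the talk (domain set, range set, range type), but the nested key names and all values are assumptions for illustration and have not been checked against the normative CIS 1.1 JSON schema.

```python
import json
from math import prod

# Illustrative 2 x 2 coverage; key names below the top level are hypothetical.
coverage = {
    "domainSet": {
        "axes": [
            {"label": "Lat", "coordinates": [50.0, 50.5]},
            {"label": "Long", "coordinates": [8.0, 8.5]},
        ]
    },
    "rangeSet": {"values": [1, 2, 3, 4]},
    "rangeType": {"fields": [{"name": "panchromatic", "uom": "W/cm2"}]},
}

# Serialize to text and read it back: the structure survives unchanged.
text = json.dumps(coverage, indent=2)
restored = json.loads(text)

# One value per grid point: the product of the axis sizes.
expected_points = prod(len(a["coordinates"]) for a in coverage["domainSet"]["axes"])
```

The same dictionary could just as well be written out in another text encoding, which is the point of the one-to-one mapping mentioned above.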
10:04
And if you are into ontologies and into reasoning, then you may want to enjoy the RDF representation. Please look at it closely. I will rehearse it afterwards. So we have different formats available. If you step back a little bit, that means we can encode our coverage
10:23
into a single file where we have domain set, range set, range type, and the metadata. And we are informationally complete in the sense that it contains all of the definition if you use some format like GML, JSON, or RDF. Fine so far, but inefficient, obviously.
10:41
You don't want to transport 10 terabytes encoded in ASCII. Not really. So we need binary formats. Okay. That is defined as well with the coverage standard. You can use any of those formats, and this is a growing list. We have more and more mappings defined. Obviously this is incomplete sometimes. GeoTIFF is not able to handle all of the range type information, for example.
11:03
But okay, you want it, you get it. Sometimes there are reasons for that. Maybe you just want to display in a browser so you pick a PNG knowing that you will not get the full information. So that is not an inconsistency. That is meant this way. However, sometimes we want to have both.
11:21
I want to be informationally complete and I want to encode efficiently. Therefore, we have a multipart encoding with a container concept, where we have some header, which is in some informationally complete format, GML or JSON or RDF. And then you have links from particular elements into other files.
11:41
And these then contain, say, the pixel payload, so they can be stored efficiently. Historically, it was multipart MIME in CIS 1.0. But in the future that can be zip, GMLJP2, the SAFE format, GeoPackage, whatever we want.
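The container idea, an informationally complete header plus a linked binary payload, can be sketched with multipart MIME from the Python standard library. This is an assumption-laden illustration: the header content, the `fileReference` key, and the Content-ID values are hypothetical and do not reproduce the exact CIS multipart layout.

```python
import json
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def build_container(header: dict, payload: bytes) -> MIMEMultipart:
    """Pack a text header and a binary payload into one multipart/related message."""
    container = MIMEMultipart("related")

    # Part 1: the informationally complete description (JSON in this sketch).
    head = MIMEApplication(json.dumps(header).encode("utf-8"), _subtype="json")
    head.add_header("Content-ID", "<header>")
    container.attach(head)

    # Part 2: the pixel payload, kept in an efficient binary encoding.
    body = MIMEApplication(payload, _subtype="octet-stream")
    body.add_header("Content-ID", "<payload>")
    container.attach(body)
    return container


# The header links to the payload part instead of embedding the values.
header = {"rangeSet": {"fileReference": "cid:payload"}}
payload = bytes([10, 20, 30, 40])
container = build_container(header, payload)
parts = container.get_payload()
```

A receiver reads the first part to learn the structure and follows the link to the second part for the values.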
12:00
And this concept actually also allows us to introduce collections of coverages to be transported, and it also allows us to introduce tiling, multidimensional tiling, partitioning of objects. Some formats support that already, like GeoTIFF for example, others don't. Now we have a way to model that, to represent that, regardless of the format.
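Multidimensional tiling can be pictured as cutting an n-dimensional index space into boxes. The following is a minimal sketch of regular tiling only, assuming a fixed tile shape; the function name and box representation are hypothetical and say nothing about how any particular format or server stores its tiles.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple

Box = List[Tuple[int, int]]  # per axis: (low, high), with high exclusive


def tiles(shape: Sequence[int], tile_shape: Sequence[int]) -> Iterator[Box]:
    """Yield the boxes of a regular tiling of an n-D index space."""
    # Lower corners of the tiles along each axis.
    starts = [range(0, n, t) for n, t in zip(shape, tile_shape)]
    for corner in product(*starts):
        # Tiles at the upper border are clipped to the object extent.
        yield [(lo, min(lo + t, n)) for lo, t, n in zip(corner, tile_shape, shape)]


# A 5 x 4 object cut into strips of at most 2 rows each.
strips = list(tiles((5, 4), (2, 4)))
```

Each box can then be stored or transported as a separate part, and the union of all boxes covers the object exactly once.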
12:24
So these are the coverages, the data structure. This data structure actually can be served by many different services. We intentionally decoupled that from the web coverage service. So a WFS, a web feature service, can serve a coverage as well, because a coverage is a feature in the end.
12:42
Just depends on the implementation. So actually the coverages can float between different services or in other words can be passed on from one to the next. For example, from a sensor observation service into a web coverage service and so on. Okay, I will focus on the web coverage service because that offers the dedicated functionality specifically for the coverages.
13:06
This WCS standard is organized into a suite of standards, not to make life more complicated but actually to make it easier, because the WCS core is very, very simple. It's about the level of a master's thesis for a good student to implement that
13:26
as part of a semester's work, and then you get something that delivers a coverage or a subset of it. Subsetting means I can trim, that is, I do a cutout but I retain the number of dimensions. So a 2D cutout from a 2D coverage.
13:41
Or it can be slicing, which reduces the number of dimensions. So from a three dimensional image time series stack, for example, I take out a time slice, which then is 2D. Or I do a time series analysis, then I get out a one dimensional coverage. So I can walk myself through the dimensions and can extract whatever I want to transport the minimum amount to the client.
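The difference between trimming and slicing can be made concrete with a toy data cube. This is a sketch under assumptions: the `Cube` class, its storage as a dictionary, and the axis names are invented for illustration and are unrelated to how a WCS server implements subsetting.

```python
from typing import Dict, List, Tuple


class Cube:
    """Toy n-D cube: ordered axes plus a value for every coordinate tuple."""

    def __init__(self, axes: List[Tuple[str, list]], values: Dict[tuple, float]):
        self.axes = axes      # ordered (name, coordinates) pairs
        self.values = values  # coordinate tuple -> value

    @property
    def dimensions(self) -> int:
        return len(self.axes)

    def _position(self, name: str) -> int:
        return [n for n, _ in self.axes].index(name)

    def trim(self, name: str, low, high) -> "Cube":
        """Cut out an interval along one axis; the dimension count stays the same."""
        k = self._position(name)
        axes = []
        for n, coords in self.axes:
            kept = [c for c in coords if low <= c <= high] if n == name else coords
            axes.append((n, kept))
        values = {key: v for key, v in self.values.items() if low <= key[k] <= high}
        return Cube(axes, values)

    def slice(self, name: str, point) -> "Cube":
        """Fix one axis at a single position; that axis disappears."""
        k = self._position(name)
        axes = [a for a in self.axes if a[0] != name]
        values = {key[:k] + key[k + 1:]: v
                  for key, v in self.values.items() if key[k] == point}
        return Cube(axes, values)


# A small x/y/t image time series stack with made-up values.
xs, ys, ts = [0, 1, 2], [0, 1], ["t0", "t1"]
stack = Cube(
    [("x", xs), ("y", ys), ("t", ts)],
    {(x, y, t): x + 10 * y + 100 * ts.index(t) for x in xs for y in ys for t in ts},
)

cutout = stack.trim("x", 1, 2)                     # still 3-D
time_slice = stack.slice("t", "t1")                # 2-D image
time_series = stack.slice("x", 1).slice("y", 0)    # 1-D series over t
```

Trimming keeps all three axes, one slice yields a 2D image, and two slices leave the one-dimensional time series at a single location.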
14:04
This you get back in the stored format if you don't say anything but you also can request a format conversion. Say, into GeoTIFF or you want a GML representation. And the service can decide which format it wants to support.
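A request that combines subsetting with a format conversion might be assembled as follows, using the key-value-pair style of WCS 2.0 GetCoverage requests. The endpoint, the coverage identifier, and the axis labels are hypothetical placeholders; real axis labels depend on the coverage being served.

```python
from typing import List, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit


def get_coverage_url(endpoint: str, coverage_id: str,
                     subsets: List[str], fmt: Optional[str] = None) -> str:
    """Build a WCS 2.0 GetCoverage URL in key-value-pair encoding."""
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    # One "subset" parameter per axis: axis(low,high) trims, axis(point) slices.
    params += [("subset", s) for s in subsets]
    if fmt is not None:
        # Without "format" the server answers in the stored format.
        params.append(("format", fmt))
    return endpoint + "?" + urlencode(params)


url = get_coverage_url(
    "https://example.org/ows",          # hypothetical endpoint
    "image_timeseries",                 # hypothetical coverage name
    ["Lat(40,50)", "Long(5,15)", 'time("2017-01-15")'],
    fmt="image/tiff",
)
query = parse_qsl(urlsplit(url).query)
```

In this sketch the two trims keep the lat and long axes, the time slice removes the time axis, and the result is requested as TIFF.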
14:21
This is the mandatory stuff. Then there are different facets in the extensions, functionality facets which a service may or may not implement. That makes it simpler for implementers because you don't have to code a lot until you have something ready but you can say I offer core and I'm working on extensions.
14:43
It's good for those who want to buy an implementation or select an open source implementation because they can set up a list saying I need WCS core and I need the range subsetting extension, I need the CRS transformation extension and then everybody knows exactly what is necessary.
15:02
And so you can negotiate and you have a clear understanding of the functionality that is to be delivered. Finally we have application profiles which is just some bespoke packaging of functionality for different purposes. Currently we have satellite imagery
15:20
and MetOcean data is in the making. So I'm saying "we" because we actually have done that. I'm the editor of those things here. And so naturally we use our rasdaman system as the vehicle for implementing them.
15:42
And so we also have the implementation available, but many others do as well. So actually we get a common information space where coverage services interoperate. Okay, five minutes left. The funny word has fallen. Rasdaman, raster data manager.
16:01
So that is our vehicle that we are building, a so-called array database system that enhances standard SQL with queries on multi-dimensional arrays. And behind that is a tile streaming architecture which is peer-to-peer. So fully parallelized, without a single point of failure. We are on OSGeo-Live.
16:23
OSGeo incubation was not something both parties felt happy with, so we have abandoned that. And what did I want to say? Yes, a couple of words. Just to give you an overview of the architecture,
16:41
we do a partitioning of course which can be any sort of partitioning to do optimization like saying this particular region must be really fast and for the rest just do something meaningful to your system. We have a parallel architecture where we have split queries over more than 1,000 cloud nodes. And this is our latest thing.
17:01
This year we are going to be on board a satellite. Big fun for everybody in the team. And I hasten to say these interfaces are not for end users. We can discuss a lot about GET versus REST versus POST or so, but actually I don't care. These are all just internal interfaces.
17:21
People want to stay in their comfort zone and use their tools. And so actually it's important that we just have this as a client-server interface, and instead we allow plugging in all the different things, like Python, which I presented yesterday, that allow people to work on data analysis,
17:41
on just browsing maps, doing web GIS or whatever. And the standards are our friend. So we can attach client A to server B. Okay, now I've talked too much. So this one I have to shorten drastically. I guess I will just do this one.
18:04
How many minus minutes do I have? Three, okay. So just this one. It's embedded into the EarthServer initiative, which is financed by the European Commission, and we are very grateful for that. There we build up databases
18:20
on three-dimensional image time series and four-dimensional weather data cubes. So we have the Sentinel data of ESA. We have ocean color analysis at Plymouth Marine Laboratory. National Computational Infrastructure Australia. The European Centre for Medium-Range Weather Forecasts
18:41
gives us four-dimensional data cubes. And then we do a join, a combination of data cubes between Australia and London. Reading, to be exact. And we combine them and visualize them in NASA Web World Wind. So in the end, what we are aiming at is a global federation where you can do anything on the data cubes
19:02
that are stored here and there. If you want, offline I can show you the demo. We don't have time for that now. But I can show you that data fusion already in a first version, let us say. Okay, I had wanted to talk more about OSGeo as well
19:21
because this I thought might be a good forum for discussion but we don't have time for that. So I will skip all that and just stop here, I guess. One minute. One minute. Should I talk even faster? We can pitch up by frequency.
19:44
I guess it doesn't make so much sense, but just to stimulate discussion. I cannot resist totally. So what I show you is just the summary thing here. All those things I wanted to discuss. I mentioned our incubation in OSGeo, and actually we were good friends except for one thing.
20:04
The maintenance of the code where they say all software should be free and I say, why? We have, that doesn't want to come. Okay, come on, come on, come on. Hey, don't try to outsmart me.
20:27
Got it, okay. Yes, I admit we have a mixed license model, a dual license model. Is that bad? I mean, we have a full spectrum. We have the full commercial stuff, Esri as a placeholder, and then on the other hand, this is what OSGeo wants.
20:43
What about these ones? They should be accommodated as well. How much more powerful would OSGeo be if it were inclusive and got all that in? Dear Karine is one of the proponents of such thinking, but somehow it doesn't get through, and in the end, the Esris don't care about our war
21:01
between here and there but the small companies, they suffer. Is that really what we want? Question mark. And with that one, I guess I really need to stop. Thank you, Anne. And thank you for bearing with me.
21:21
So, yes, suddenly I have five minutes again. That is, you have five minutes. So, please, flames on or comments or whatever. Okay, good.
21:42
So, thanks again and bye. Now, how do I get out without this thing hanging up? That should work. It doesn't like to unplug when Windows is in full screen mode
22:00
so I need to be careful but it worked now. Thank you. Okay, thank you. I definitely want to discuss with you about licensing and stuff. Yes, why don't you do it? I'm here now. We just turn off the microphone. The rest of the records. Okay.
25:03
Okay.