
Big Spatio-Temporal Datacubes on Steroids ...and Standards


Formal Metadata

Title
Big Spatio-Temporal Datacubes on Steroids ...and Standards
Series title
Number of parts
611
Author
License
CC Attribution 2.0 Belgium:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Release year
Language
Production year: 2017

Content Metadata

Subject area
Genre
Abstract
With the advent of the massive deluge in Earth data, serving them to diverse communities is increasingly promising and challenging alike. A useful abstraction for spatio-temporal raster data (and beyond) is the coverage data model, as standardized by ISO, OGC, and INSPIRE. Rather than zillions of individual image files it provides spatio-temporal "datacubes" for simple, efficient handling through the corresponding service model, the Web Coverage Service (WCS) with its Web Coverage Processing Service (WCPS) geo analytics language - "one cube tells more than a million images". Open-source rasdaman ("raster data manager") is the official reference implementation of both OGC and INSPIRE WCS. It supports easy incremental construction and maintenance of spatio-temporal datacubes, based on the OGC WCS-T standard. Retrieval may use WMS for visual navigation, WCS for data extraction and download, and WCPS for massive server-side processing. On the server side, adaptive data partitioning and "tile streaming" processing enable fast query responses. In July 2016, US magazine CIO Review included rasdaman in its top 100 Big Data technologies list. In this talk we present coverages in terms of concepts, implementation, and large-scale application. Live demos underpin the talk, using publicly accessible sites where the audience can replay and modify the examples. Being editor of the OGC and ISO coverage standard, the presenter can give first-hand insights and answers, such as about the new generalized grid model for coverages (CIS 1.1), which OGC adopted in Fall 2016, as well as the newly adopted INSPIRE-WCS. This is an excellent opportunity to learn about the state of the art and standards in an open, free-of-cost setup.
Transcript: English (automatically generated)
Okay. So from my side, unmuted. Good. Let's give it a try.
I don't want to yell at you, so I'm starting cautiously. I don't know whether the microphone is on. Ah, it's just for recording. Okay, good. So, I'll start here. Thank you for having me here. I'm Peter, Peter Baumann.
And what I'm going to present to you is actually joint work of several people in my group. One of them in particular is Dimitar Misev, whose PhD thesis is engaged in that work as well. But I would have to take a long list, actually, to be honest, to give proper credit to everybody who has contributed to that.
And not the least, of course, also the standardization bodies where we had lots of discussion. Gave me a lot of opportunity to learn things as I'm just a plain computer scientist that has stumbled into geo world.
And that's what I want to talk about: multidimensional data that appear as spatio-temporal datacubes. So, we have this funny term, coverage. This is age-old. Actually, only a few remember that it originally was invented by Esri for a particular data structure.
But in the meantime, it has been decoupled from that. And they themselves were surprised to discover what coverages have become in the meantime. Coverage is a catch-all term that actually relates to feature. You know, in the OGC standardization world, a feature is a geographic object in the end.
Something that has some location, space and time attached. Then we have a special kind of feature that is a coverage. A coverage is defined as something that may sound really strange. A space-time varying phenomenon. That wants to express something specific. If you look at, say, a highway here, and you take the A1.
If you model that as a vector, then obviously the attribute A1 will be invariant. Over the full size, over the full length of this highway, it will be A1. So it's sort of static attributes that are attached. Here, however, if you take some image and walk from one location to another, the value changes.
And that is what this expression wants to say. It changes as you go from one element to another. It's pretty clear that this kind of thing requires more storage space. And therefore, the big data that we encounter, at least big in terms of volume, typically are coverages.
Not so much vectors. And actually, it's not raster data only, if you think about that one. But you could generalize that as regular and irregular grids, as point clouds and meshes. We can sort of make this a little bit more graphic, more spicy. So we have the feature, and we have some abstract coverage, which actually has several subtypes, if you will.
Which also reflects history. We have the grid coverages, the first naive attempt, around about 2004. That is not very much used today. In particular, not with geo-coordinates, because it has some difficulties.
I avoid the word flaw here. We had to improve on that. Then we got the rectified coverage, which is an ortho image. So rectilinear grids. And then, the rest of it all was termed reference- You see, I cannot even pronounce it. Referenceable grid coverage. Nobody could explain to me why the term was chosen.
It's a little bit difficult, but it means irregular grids in the end, so everything else. That is historical development, like rings in a tree. Actually, we supersede this currently with the coverage implementation schema 1.1,
where we introduce a general grid coverage, which brings all these historical developments together into something that is easier to handle, easier to understand. And in particular, one single concept. And that is what typically is known as data cubes. That is some arrays, some raster sets that have two dimensions, three dimensions, four dimensions, five, whatever.
And as you can see, they can have straight lines like ortho images. They can be curvilinear, like meshes following a coastline. Or the strange things that people do in climate modeling. All of that falls into that category. But it doesn't stop there, because in the end, these are points in space.
They just happen to have a particular condition, constraint, that they all have neighbors. If you don't have that, if you give up that, then you come to multipoint coverages, that is point clouds. If you add curves, so bundles of trajectories could be a multicurve coverage, and so on, multisurface coverage, multisolid coverage.
This is actually where we close the gap to city modeling, for example, CityGML. Also to CAD, computer-aided design. So we do not reinvent wheels, we just want to close gaps. This is an abstract concept. And that has been defined in OGC Abstract Specification Topic 6, which is identical to ISO 19123.
Abstract in the sense, it does not prescribe a particular implementation. Which means, you can have divergent implementations, and you find that out there. So if somebody says, I'm interoperable because I'm implementing 19123, wrong.
You find divergent implementations out there. And you have a very simple criterion. They all come with their own client. Because crisscross coupling does not work. Therefore, OGC has added a concrete coverage implementation schema, which is a concretization that makes a few assumptions so that it becomes interoperable.
And actually, we can do conformance testing down to the level of single pixels, saying whether an implementation is correct or not. In this sense, it's concrete. And then you can use client A to access server B. That is possible indeed, and that has been done.
So that is the general thing. Stepping back a little bit, we can bring that into a very simple schema, actually, using UML. So it's a specialization of feature that has a domain set, telling us where do our values sit. The range set, which are the values. You see domain range, it's like a function, mapping locations to values.
And what we have added then over GML, for example, is the data type. What is the data type of our values? Is it temperature, is it radiance, or whatever else? And this actually we took from SWE Common. So actually now we have a connection to sensor world. So sensor observations can be transformed into coverages without loss of semantics here.
That was important to us. So there are quite a few data that people want to attach, different ones depending on their domain. And so there is an optional metadata package where I can put anything else you want. The coverage doesn't understand it, but it does transport it for you.
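The four components just described (domain set, range set, range type, optional metadata) can be sketched as a small data structure. This is a minimal illustration only: the class name `Coverage`, its field names, and the sample temperature values are assumptions made for this sketch, not the normative CIS schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Coverage:
    """Simplified coverage: a function mapping grid locations to values."""
    axis_names: List[str]                    # domain set: axis labels
    axis_coords: List[List[float]]           # domain set: coordinates per axis
    range_set: Dict[Tuple[int, ...], float]  # values, keyed by grid index
    range_type: Dict[str, str]               # what the values mean
    metadata: Dict[str, Any] = field(default_factory=dict)  # opaque extras

    def value_at(self, *coords: float) -> float:
        # Translate each coordinate into its position along the axis,
        # then look up the value stored for that grid point.
        index = tuple(axis.index(c) for axis, c in zip(self.axis_coords, coords))
        return self.range_set[index]


cov = Coverage(
    axis_names=["Lat", "Long"],
    axis_coords=[[50.0, 51.0], [8.0, 9.0]],
    range_set={(0, 0): 11.5, (0, 1): 12.0, (1, 0): 10.5, (1, 1): 11.0},
    range_type={"name": "temperature", "uom": "degC"},
    metadata={"note": "not interpreted by the coverage, just carried along"},
)
print(cov.value_at(51.0, 8.0))  # 10.5
```

The `value_at` method makes the "domain maps to range" reading explicit: the domain set says where values sit, the range set holds them, and the range type says how to interpret them.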
So this is a way to carry along all sort of specific metadata that you want to remember. If you want to see that in GML, okay, here is an example for it, where you see the domain set with a grid definition. This happens to be four-dimensional. Lat, long, height, and date.
So you have a regular axis for lat, long. You have an irregular axis for height, with two levels in this case. And you have an irregular axis for time, also two levels. I didn't want to write more. I was lazy. This is mapped to a grid axis, and then you know where the points sit
and the point values you have here in the range set. And down there in the range type, you see what that means. For example, it's panchromatic. We have radiance referenced to it, and we have the units of measure and that kind of thing. That is the historical way because GML just was trendy at that time when it originated.
But with CIS 1.1, if you favor curly braces, then here is the same thing in JSON. Domain set, range set, range type. And the good thing is that it has the same semantics. We can map it. So you can do a one-to-one mapping. You are not bound to a particular world of standardization.
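As a rough illustration of the JSON form, the sketch below encodes a tiny coverage and reads it back unchanged. Only the three top-level member names (`domainSet`, `rangeSet`, `rangeType`) follow the terms used in the talk; everything nested beneath them is simplified for this example and is not the normative CIS 1.1 JSON schema.

```python
import json

# Simplified coverage document; nested keys are illustrative assumptions.
coverage = {
    "domainSet": {
        "axes": [
            {"label": "Lat", "coordinates": [50.0, 51.0]},
            {"label": "Long", "coordinates": [8.0, 9.0]},
        ]
    },
    "rangeSet": {"values": [11.5, 12.0, 10.5, 11.0]},
    "rangeType": {"field": "temperature", "uom": "degC"},
}

encoded = json.dumps(coverage, indent=2)  # text form, informationally complete
decoded = json.loads(encoded)             # round trip back into a structure

print(decoded == coverage)  # True
print(sorted(decoded))      # ['domainSet', 'rangeSet', 'rangeType']
```

Because the same three components appear in every encoding, a document like this could in principle be mapped one-to-one onto a GML or RDF representation.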
And if you are into ontologies and into reasoning, then you may want to enjoy the RDF representation. Please look at it closely. I will rehearse it afterwards. So we have different formats available. If you step back a little bit, that means we can encode our coverage
into a single file where we have domain set, range set, range type, and the metadata. And we are informationally complete in the sense that it contains all of the definition if you use some format like GML, JSON, or RDF. Fine so far, but inefficient, obviously.
You don't want to transport 10 terabyte encoded in ASCII. Not really. So we need binary formats. Okay. That is defined as well with a coverage standard. You can use any of those formats and this is a growing list. We have more and more mappings defined. Obviously this is incomplete sometimes. GeoTIFF is not able to handle all of the range type information, for example.
But okay, you want it, you get it. Sometimes there are reasons for that. Maybe you just want to display in a browser so you pick a PNG knowing that you will not get the full information. So that is not an inconsistency. That is meant this way. However, sometimes we want to have both.
I want to be informationally complete and I want to encode efficiently. Therefore, we have a multipart encoding where we have a container concept that we have some header, which is some informationally complete format, GML or JSON or RDF. And then you have links from particular elements into other files.
And these then contain, say, the pixel payload, so they can be stored efficiently. Historically, it was multipart MIME in CIS 1.0. But in future that can be zip, GMLJP2, the SAFE format, GeoPackage, whatever we want.
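The container idea can be sketched with a zip archive: a small, informationally complete header that links to a separate binary file holding the pixel payload. The file names `header.json` and `payload.bin` and the `fileReference` key are hypothetical choices for this sketch, not names taken from the standard.

```python
import io
import json
import zipfile

# Header: describes the coverage and points at the binary payload file.
header = {
    "domainSet": {"axes": [{"label": "Lat", "size": 2}, {"label": "Long", "size": 2}]},
    "rangeType": {"field": "panchromatic", "dataType": "uint8"},
    "rangeSet": {"fileReference": "payload.bin"},  # link instead of inline values
}
payload = bytes([10, 20, 30, 40])  # four 8-bit pixels, stored compactly

# Write the container into memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as container:
    container.writestr("header.json", json.dumps(header))
    container.writestr(header["rangeSet"]["fileReference"], payload)

# Read it back: parse the header first, then follow its link to the pixels.
with zipfile.ZipFile(buffer) as container:
    read_header = json.loads(container.read("header.json"))
    read_pixels = list(container.read(read_header["rangeSet"]["fileReference"]))

print(read_pixels)  # [10, 20, 30, 40]
```

The header stays human-readable and complete, while the bulk of the data travels in an efficient binary part.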
And this concept actually allows us also to introduce collections of coverages to be transported and it allows also to introduce tiling, multidimensional tiling, partitioning of objects. Some formats support that already, like duty for example, others don't. Now we have a way to model that, to represent that, regardless of the format.
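To illustrate what multidimensional tiling buys, the sketch below partitions a cube into regular tiles and works out which tiles a query box touches, so only those would need to be read. The function name `tiles_for_query`, the regular tiling scheme, and the sizes are assumptions for this example; real systems may use other partitioning strategies.

```python
from itertools import product


def tiles_for_query(shape, tile_shape, lo, hi):
    """Return the indices of all tiles that intersect the inclusive box lo..hi."""
    per_axis = []
    for size, tile, a, b in zip(shape, tile_shape, lo, hi):
        if not 0 <= a <= b < size:
            raise ValueError("query box lies outside the array")
        # Tiles along this axis that overlap the interval [a, b].
        per_axis.append(range(a // tile, b // tile + 1))
    return list(product(*per_axis))


# A 100 x 100 x 10 cube (x, y, time) cut into 50 x 50 x 5 tiles: 8 tiles total.
hits = tiles_for_query(
    shape=(100, 100, 10),
    tile_shape=(50, 50, 5),
    lo=(40, 0, 7),   # x from 40, y from 0, time step 7
    hi=(60, 10, 7),  # x to 60,   y to 10,  time step 7
)
print(hits)  # [(0, 0, 1), (1, 0, 1)]
```

Here the query touches two of the eight tiles, so three quarters of the cube never has to be loaded.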
So this is the coverages. The data structure. This data structure actually can be served by many different services. We tentatively decoupled that from the web coverage service. So a WFS, a web feature service, can serve a coverage as well because a coverage is a feature in the end.
Just depends on the implementation. So actually the coverages can float between different services or in other words can be passed on from one to the next. For example, from a sensor observation service into a web coverage service and so on. Okay, I will focus on the web coverage service because that offers the dedicated functionality specifically for the coverages.
This WCS standard is organized into, set into a suite of standards not to make life more complicated but actually to make it easier because the WCS core is very, very simple. It's about the level of master thesis for a good student to implement that
as part of a semester work and then you get something that delivers a coverage or a subset of it. Subsetting means I can trim, that is, I do a cutout but I retain the amount of dimensions. So a 2D cutout from a 2D coverage.
Or it can be slicing, which reduces the number of dimensions. So from a three dimensional image time series stack, for example, I take out a time slice, which then is 2D. Or I do a time series analysis, then I get out a one dimensional coverage. So I can walk myself through the dimensions and can extract whatever I want to transport the minimum amount to the client.
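The difference between trimming and slicing can be shown on a tiny three-dimensional image time series. This is a plain-Python sketch with nested lists; the helper names and the sample values are made up for the illustration.

```python
# cube[t][y][x]: 3 time steps of a 2 x 4 image; values encode their position.
cube = [[[t * 100 + y * 10 + x for x in range(4)] for y in range(2)]
        for t in range(3)]


def ndim(a):
    """Number of nesting levels, i.e. the dimensionality of the result."""
    return 1 + ndim(a[0]) if isinstance(a, list) else 0


def trim(cube, t_range, y_range, x_range):
    """Cut out a box (inclusive bounds); the number of dimensions is retained."""
    return [[row[x_range[0]:x_range[1] + 1]
             for row in image[y_range[0]:y_range[1] + 1]]
            for image in cube[t_range[0]:t_range[1] + 1]]


def slice_time(cube, t):
    """Fix the time axis at one position: 3D becomes 2D."""
    return cube[t]


def time_series(cube, y, x):
    """Fix both spatial axes: 3D becomes a 1D series over time."""
    return [image[y][x] for image in cube]


trimmed = trim(cube, (0, 1), (0, 0), (1, 2))
print(trimmed, ndim(trimmed))            # [[[1, 2]], [[101, 102]]] 3
print(ndim(slice_time(cube, 2)))         # 2
print(time_series(cube, 1, 3))           # [13, 113, 213]
```

Trimming keeps all three axes, just shorter; each slice removes one axis from the result.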
This you get back in the stored format if you don't say anything but you also can request a format conversion. Say, into GeoTIFF or you want a GML representation. And the service can decide which format it wants to support.
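A request combining subsetting and format conversion might be assembled as follows, using the key-value-pair style of WCS 2.0 GetCoverage. The endpoint `https://example.org/wcs`, the coverage name `AvgLandTemp`, and the axis labels are placeholders assumed for this sketch; a real service advertises its own coverages, axes, and supported formats.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def get_coverage_url(endpoint, coverage_id, subsets=(), fmt=None):
    """Build a GetCoverage URL; omitting fmt asks for the stored format."""
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    # One "subset" parameter per axis: Axis(lo,hi) trims, Axis(point) slices.
    params += [("subset", s) for s in subsets]
    if fmt is not None:
        params.append(("format", fmt))
    return endpoint + "?" + urlencode(params)


url = get_coverage_url(
    "https://example.org/wcs",      # hypothetical endpoint
    "AvgLandTemp",                  # hypothetical coverage name
    subsets=["Lat(40,50)", "Long(0,10)", 'ansi("2014-07")'],
    fmt="image/tiff",
)
print(url)

# Decoding the query string recovers the original parameters.
query = parse_qs(urlsplit(url).query)
print(query["subset"])
```

In this request the latitude and longitude subsets trim, while the single time value slices, so a 3D time series would come back as a 2D image in the requested format.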
This is the mandatory stuff. Then there are different facets in the extensions, functionality facets which a service may or may not implement. That makes it simpler for implementers because you don't have to code a lot until you have something ready but you can say I offer core and I'm working on extensions.
It's good for those who want to buy an implementation or select an open source implementation because they can set up a list saying I need WCS core and I need the range subsetting extension, I need the CRS transformation extension and then everybody knows exactly what is necessary.
And so you can negotiate and you have a clear understanding of the functionality that is to be delivered. Finally we have application profiles which is just some bespoke packaging of functionality for different purposes. Currently we have satellite imagery
and MetOcean data is in the making. So that is something where I'm saying "we" because we actually have done that. I'm the editor of those things here. And so naturally we use our rasdaman system for implementing that.
And so we also have the implementation available but many others do as well. So actually we get a common information space where coverage services interoperate. Okay, five minutes left. The funny word has fallen. rasdaman, raster data manager.
So that is our vehicle that we are building, a so-called array database system that enhances standard SQL with queries on multi-dimensional arrays. And behind that, a tile streaming architecture which is peer-to-peer. So fully parallelized without a single point of failure. We are on OSGeo-Live.
OSGeo incubation was not something both parties felt happy with, so that we have abandoned. And what did I want to say? Yes, a couple of words. Just to give you an overview of the architecture,
we do a partitioning of course which can be any sort of partitioning to do optimization like saying this particular region must be really fast and for the rest just do something meaningful to your system. We have a parallel architecture where we have split queries over more than 1,000 cloud nodes. And this is our latest thing.
This year we are going to be on board a satellite. Big fun for everybody in the team. And I hasten to say these interfaces are not for end users. We can discuss a lot about GET versus REST versus POST or so, but actually I don't care. This is all just internal interfaces.
People want to use their comfort zone, their tools. And so actually it's important that we just have this as a client-server interface and instead we allow to plug in all the different things like Python. I presented that yesterday that allow people to work on data analysis,
on just browsing maps, doing web GIS or whatever. And the standards are our friend. So we can attach client A to server B. Okay, now I've talked too much. So this one I have to shorten drastically. I guess I will just do this one.
How many minus minutes do I have? Three, okay. So just this one. It's embedded into the EarthServer initiative, which finances us via the European Commission, and we are very grateful for that. Where we build up databases
on three-dimensional image time series and four-dimensional weather data cubes. So we have the Sentinel data of ESA. We have ocean color analysis at Plymouth Marine Laboratory. National Computational Infrastructure Australia. European Centre for Medium-Range Weather Forecasts
gives us four-dimensional data cubes. And then we do a join, a combination of data cubes between Australia and London. Reading to be exact. And combine them, visualize them in NASA Web WorldWind. So in the end, what we are aiming at is a global federation where you can do anything on the data cubes
that are stored here and there. If you want, offline I can show you the demo. We don't have time for that now. But I can show you that data fusion already in a first version, let us say. Okay, I had wanted to talk more about OSGeo as well
because this I thought might be a good forum for discussion but we don't have time for that. So I will skip all that and just stop here, I guess. One minute. One minute. Should I talk even faster? We can pitch up by frequency.
I guess it doesn't make so much sense but just to stimulate discussion. I cannot resist totally. So what I show you is just the summary thing here. All those things I wanted to discuss. I mentioned our incubation in OSGeo and actually we were good friends except for one thing.
The maintenance of the code where they say all software should be free and I say, why? We have, that doesn't want to come. Okay, come on, come on, come on. Hey, don't try to outsmart me.
Got it, okay. Yes, I admit we have a mixed license model, dual license model. Is that bad? I mean, we have a full spectrum. We have the full commercial stuff, Esri as a placeholder, and then on the other hand, this is what OSGeo wants.
What about these ones? They should be accommodated as well. How much more powerful would OSGeo be if it were inclusive and got all that in? Dear Karine is one of the proponents of such thinking but it doesn't somehow get through and in the end, the Esris don't care about our war
between here and there but the small companies, they suffer. Is that really what we want? Question mark. And with that one, I guess I really need to stop. Thank you, Anne. And thank you for bearing with me.
So, yes, suddenly I have five minutes again. That is, you have five minutes. So, please, flames on or comments or whatever. Okay, good.
So, thanks again and bye. Now, how do I get out without this thing hanging up? That should work. It doesn't like to unplug when Windows is in full screen mode
so I need to be careful but it worked now. Thank you. Okay, thank you. I definitely want to discuss with you about licensing and stuff. Yes, why don't you do it? I'm here now. We just turn off the microphone. The rest of the records. Okay.
Okay.