actinia: geoprocessing in the cloud
Formal Metadata
Number of Parts | 490
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/47043 (DOI)
FOSDEM 2020, 13 / 490
Transcript: English (auto-generated)
00:05
So, a little bit more extensive than normal. Yeah. All right, okay, thank you. So hello everyone. My name is Markus Neteler, coming from mundialis in Bonn. We are a startup that has existed for almost five years now.
00:24
I originally come from research in Italy, and I'm the project coordinator of the GRASS GIS project and also a co-founder of the OSGeo foundation. In the last years we have been working on bringing GRASS GIS to the next level.
00:43
The main author is Sören Gebbert. He spent most of the time on it initially, and we are all working on this software. This is our very outdated company picture, it's,
01:01
let's say, an approximation. We have more people now, but maybe it gives you an idea. That's the river Rhine in the background, so we are near Cologne. And interestingly, since we are in an open source context here: you can make a living from open source software, in case you didn't know. The entire company, along with our sister company terrestris,
01:23
which has existed much longer, since 2002, are fully open source companies. This is something which I still try to repeat wherever I can, because, yeah, it's an interesting
01:41
way of developing software and offering services. So, what am I talking about? We had an idea. It's a bit small, I hope you can read it. I wasn't really prepared for the low resolution, but I can tell you what's written there: bring the algorithms to the data. We heard in the previous talk that data can increase, and do increase,
02:05
non-linearly. In our case we are dealing with geospatial data, including Copernicus data, so there are also petabytes of data everywhere, and those have to be dealt with somehow. Everybody is dealing with
02:21
I/O problems and disk storage and so forth, so why not go where the data are? But this also implies kind of bringing the user to the data. You have probably heard of this paradigm several times already, and it's still valid. We wanted to check how to exploit the GRASS GIS software in particular, but not purely GRASS: all the related ecosystem with GDAL,
02:46
PROJ, ESA SNAP included as well, and whatever you want to deploy yourself: how to get this into some kind of cloud context. So the original name was GRASS as a Service,
03:02
GRaaS, which is probably not so intuitive to pronounce. For marketing reasons we then called it actinia. An actinia is a sea creature which has tentacles and filters the water. So now we consider something like a data lake, or
03:21
the flood of information, or whatever you want, it's up to you. And so with our analysis software we can go there, fish out the relevant information and process it. Of course, the core element here is GRASS GIS. And this software, in case you are not aware of it,
03:41
has been under development since 1982, so way before I left school. I joined in 1993 as, let's say, a shy user, and then moved on to more or less coordinating it. You know, it's a do-ocracy, which means whoever does the work
04:03
can move things, and I thought I would contribute to that. If you are not familiar with GRASS itself: we have something called the GRASS database. That is more or less a file-based system, with an SQL database in the background as well, but there are a few particular things.
04:24
One thing is called a location, and inside there are mapsets. That's more or less for the organization of the data. You could also consider this as a workspace, or as a project and subfolders. So nothing dramatic, but there's something related to that, because this brings the possibility to
04:41
offer user management. Especially in a cloud context, you probably do not want to share all data with everybody; you want to have a restrictive user model, and this comes kind of implicitly here. Then we have lots of algorithms. We are talking about 500-plus
05:03
methods available. The majority is in the core. It's vector analysis, raster analysis, volumes, so volumetric data analysis, and time series, which in terms of GRASS age is new, but it has already existed for seven years or so. So you have space-time cubes and you can go and analyze things
05:24
with algebra as well, and all this is already there. You have image processing, which we use for the Copernicus data processing or meteorological data interpretation and so forth. And since we are in a GIS context, what you have here is the full integration between
05:40
image processing and GIS in one shell. So it's not two distinct worlds. I'm not interested in that; I'm a geographer myself, so I like to get things together, and here you can do that and just go smoothly from one to the next. So now the question is how to get this into the cloud. And cloud means we want to have a RESTful API on top.
06:03
Maybe to start with, to list what data are there and what belongs to whom. Spatial or temporal datasets are offered as resources, so you can then go there and, naturally, do computation on top of that. Then, enable usage of the GRASS GIS modules. And I already mentioned user management,
06:24
so, define different roles. But in a cloud context, where you pay as you go for the resources as well, you want to have some control over what you offer to the user. For example, as a provider you offer the user a kind of flat rate. But flat rate doesn't mean unlimited, of course.
06:42
It means flat rate within the context of what they need. So you can go there and restrict it, like geofencing, to a particular area of the world where they can compute things, or to an amount of data volume and so on. There are different possibilities, and you can also expose the methods you have, or modules as they are called in GRASS language,
07:05
selectively to the users and say: okay, we offer you this stack of functionality, and if you want level two, then you can also access the other ones. Also, you want to avoid that one user overwrites things of the others, so you have to have some kind of data locking. That's also natural,
07:24
but you have to implement it, and this already comes with GRASS GIS itself. So if you do apt-get install grass or dnf install grass or docker pull grass or whatever, then you already have the possibility to use a network drive, and using the Unix or Windows user management
07:46
you have access or not, and all this is now exposed through actinia itself as well. We have two kinds of storage. We have the persistent read-only storage where you offer base cartography, for example the original data like,
08:04
whatever it is, an elevation model, Copernicus data, a land-use map, which you already provide to your users. It goes there because you do not want anyone to modify it. But then there are the users, with the computation running on the different workers or nodes here.
08:21
They want to write their own stuff, and so that goes into the user space. This is also connected to a kind of garbage collection. For example, in ephemeral processing you say the results are available for whatever you set there, say 24 hours, and then they are deleted automatically. That's just housekeeping in order to avoid that too much storage is used.
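The 24-hour housekeeping mentioned here can be pictured with a small sketch. It is not actinia's implementation: the function name `expired_results`, the result ids and the in-memory dictionary are hypothetical, and only the 24-hour retention figure comes from the talk.

```python
from datetime import datetime, timedelta, timezone

# Retention period; 24 hours is the example value given in the talk.
RETENTION = timedelta(hours=24)


def expired_results(created_at, now, retention=RETENTION):
    """Return the ids of results whose age has reached the retention period.

    created_at maps a (hypothetical) result id to its creation time.
    """
    return sorted(rid for rid, ts in created_at.items() if now - ts >= retention)


now = datetime(2020, 2, 2, 12, 0, tzinfo=timezone.utc)
results = {
    "job-a": now - timedelta(hours=30),  # older than the retention period
    "job-b": now - timedelta(hours=2),   # still within the retention period
}
print(expired_results(results, now))
```

A real deployment would run such a check periodically and delete the listed resources from the user storage.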
08:45
So in the end you have this GRASS database over there, which is the data storage. It can be whatever, I'll come to it later. And you have the different workers equipped with GRASS GIS, also GDAL and PDAL. PDAL as well, I forgot to mention it before.
09:00
Whatever you put there, basically. The user management is done in Redis, so we have a Redis instance there, and the systems communicate with each other and so forth. Everything can then be deployed on different cloud infrastructures, so this is all Docker-based.
09:24
We have running instances on OpenShift, Kubernetes and OpenStack, and also others. We are using Terraform in order to deploy machines. So if actinia wants to scale up, we can say: okay, you can order new machines by yourself, and
09:43
after consumption, meaning when the process has finished, the machines are destroyed in order to not generate further cost. Then we need a load balancer. The incoming requests come in here through the API, but you want the cloud resources to be optimally used, so for that
10:04
there's a load balancer sending stuff to the different workers. And ideally, well, the data are visible anyway, but in case you have heterogeneous cloud resources, like instances with different flavors,
10:22
you want to send the jobs to the right worker in order to be able to compute them. Okay, now how to control all this? We have JSON files here, and we have the REST API.
10:42
So there are requests like GET locations. You can use curl, or we have some other interfaces, a web-based system, or maybe in the future also a QGIS-based one. You can call it from the GRASS command line. So there are different ways of retrieving information. You can query the system and ask: okay, what datasets are already there? And
11:05
there could be, for example, in our system, the global SRTM elevation model. That is a 300-something-gigabyte GeoTIFF file. And in case you are working with an elevation model,
11:21
I think most of you will only be interested in a subset, but each of you in a different subset. So the idea of cloud is: we offer it once, and then you can just operate on the area of interest, which can be changed dynamically. And then, as it is REST style, you chain more stuff there. You say, kind of zooming in, into
11:43
North Carolina, that is our OSGeo sample dataset: what is inside? So we're in North Carolina here, and then you see what datasets are there, and you can go on further and look into the maps. And you see there's already a render endpoint, which means if you query the system... And by the way, this is reachable online under actinia.mundialis.de.
12:06
You can go and play, there's a demo user available. Now, you can then dive into these data and also use them for computation. Now, user-defined processing. In this case you don't retrieve, but you send something to the system and say: please do this. And that is a POST request, you see over there,
12:27
POST request. And you say: I want to compute the slope of some map, and I want the result as a GeoTIFF, please. And what's also possible, by the way: you can also give it a URL. It's not shown here because it's too long.
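A request body for the slope example could be built roughly as follows. This is a sketch, not the slide content: the key names (`list`, `module`, `inputs`, `outputs`, `export`), the step id and the map name `elevation@PERMANENT` are assumptions modelled on actinia's published process chain examples, while `r.slope.aspect` with its `elevation` and `slope` parameters is the GRASS GIS module for slope computation.

```python
import json


def slope_process_chain(elevation_map, output_name="slope"):
    """Build a one-step process chain: slope from an elevation raster,
    with the result exported as a GeoTIFF.

    The dictionary layout is an assumption, see the note above.
    """
    return {
        "version": "1",
        "list": [
            {
                "id": "slope_1",  # arbitrary label for this step
                "module": "r.slope.aspect",  # GRASS GIS module to run
                "inputs": [{"param": "elevation", "value": elevation_map}],
                "outputs": [
                    {
                        "param": "slope",
                        "value": output_name,
                        "export": {"type": "raster", "format": "GTiff"},
                    }
                ],
            }
        ],
    }


chain = slope_process_chain("elevation@PERMANENT")
print(json.dumps(chain, indent=2))
```

The printed JSON is the kind of document that would be sent as the body of the POST request.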
12:44
But you can specify a URL. In this case, this dataset will be retrieved first and the computation then done on top of that. Or you intersect with data already there, or you fetch from different data sources and compute stuff, and eventually you retrieve either a vector file or a raster file, or you dump it into a
13:06
PostGIS database, or whatever you prefer. And through this, in JSON style, you can write custom process chains. As already mentioned, the GRASS modules are there, the importer and exporter are there, and then you can also bring in your own Python scripts.
13:21
And those can be whatever. And if you say: oh, but Python, no idea, I still have my good old 1990s shell scripts, they work so nicely, no need to rewrite them. You just wrap them into a Python script and hang them in, and you are done. So it's not that you have to rewrite everything; you can just make it appear as a Python script, and the system is happy with that.
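Wrapping an old shell script so that it appears as a Python script can be as small as the sketch below. The function name `run_legacy_script` and the script name in the usage note are hypothetical; how actinia actually picks up such a wrapper is not covered in the talk.

```python
import subprocess


def run_legacy_script(script_path, *args, interpreter="sh"):
    """Run an existing script unchanged and return what it printed to stdout.

    check=True raises CalledProcessError if the script exits with an error,
    so failures of the old script are not silently ignored.
    """
    completed = subprocess.run(
        [interpreter, script_path, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

A call such as `run_legacy_script("legacy_slope.sh", "elevation")` would then execute the unchanged shell script with one argument and hand its output back to the Python side.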
13:44
We have also wrapped ESA SNAP; you find this on GitHub and Docker Hub. We made a Docker image out of that. By the way, it's a fraction of the original size. There are some funny things in the original, like a full Java and so forth. This can be heavily reduced, and
14:02
through that we build up the entire stack. So, what does a curl request look like? Of course there's curl, then the demo user; please steal the password, it's public. You send a POST there with a process chain. It's only written as a variable there; this is essentially a file, a JSON file.
14:22
Or maybe you put it into a variable, up to you. And this is then sent to this endpoint here, and in this case it will do asynchronous processing. With synchronous processing you say: okay, do that, and I wait till you are done and it comes back to me. But in case the job is something complex and it would run for several hours,
14:42
you don't want to sit there and block your terminal with that. You use the asynchronous endpoint, and then it is sent there and you get a URL with a status resource, and you just ping it from time to time. This you can automate, of course. If you have a web interface, then it would
15:03
notify you once it is done. So both options are available. In this case that means polling: you get the status, and once it is done you get the resource URL back, which is the GeoTIFF or whatever it is, and then you can retrieve the map and you are done.
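The submit-then-poll pattern could look roughly like this in Python. The base path, the `processing_async_export` endpoint name, the location name `nc_spm_08` and the status strings are assumptions modelled on actinia's public documentation rather than anything shown in the transcript, and the credentials are placeholders. The sketch only builds the request and takes the status fetcher as a parameter, so nothing is sent over the network.

```python
import base64
import json
import time
import urllib.request

BASE_URL = "https://actinia.mundialis.de/api/v1"  # assumed base path

# Assumed status values that mean the job will not change any more.
FINAL_STATES = ("finished", "error", "terminated")


def build_async_request(location, process_chain, user, password):
    """Prepare (but do not send) a POST of a process chain to the async endpoint."""
    url = f"{BASE_URL}/locations/{location}/processing_async_export"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(process_chain).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def poll_until_done(fetch_status, interval_s=5.0, max_polls=120, sleep=time.sleep):
    """Call fetch_status() repeatedly until the job reports a final state.

    fetch_status is any callable returning the decoded status document,
    for example a function that GETs the status URL returned by the POST.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("status") in FINAL_STATES:
            return status
        sleep(interval_s)
    raise TimeoutError("job did not reach a final state within the polling budget")
```

Passing the fetcher in as a callable keeps the polling loop independent of the HTTP client, which also makes it easy to exercise with canned responses.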
15:24
What else is there? We have been implementing processing chains for Sentinel-1 and Sentinel-2 data, also for Landsat, which is not written here. There are endpoints like NDVI. So for example, you're interested in a normalized difference
15:41
vegetation index, a very common index used in agriculture and elsewhere, also to find green areas in urban areas. You just say: okay, I want to analyze this area, and maybe for the year 2018, from the 1st of April to the end of June,
16:02
search for a Sentinel scene with less than 1% cloud cover and do the NDVI. So this is more or less one endpoint, and then you just send these few metadata to the system and you get stuff back. We have connectors to the ESA API Hub. That is one way the Sentinel data are retrieved.
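The scene selection described a moment ago (an area, a date window, a cloud cover limit) boils down to a filter like the one sketched here. The `Scene` record, its fields and the three sample scenes are hypothetical illustrations, not data from a real catalogue or from actinia's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Scene:
    scene_id: str       # hypothetical identifier
    sensed: date        # acquisition date
    cloud_cover: float  # percent
    lon: float          # scene centre, degrees east
    lat: float          # scene centre, degrees north


def select_scenes(scenes, bbox, start, end, max_cloud=1.0):
    """Keep scenes inside the bounding box and date window with cloud cover below max_cloud."""
    west, south, east, north = bbox
    return [
        s for s in scenes
        if start <= s.sensed <= end
        and s.cloud_cover < max_cloud
        and west <= s.lon <= east
        and south <= s.lat <= north
    ]


# Made-up sample records, for illustration only.
scenes = [
    Scene("scene-1", date(2018, 5, 10), 0.4, 7.1, 50.7),
    Scene("scene-2", date(2018, 5, 12), 35.0, 7.1, 50.7),  # too cloudy
    Scene("scene-3", date(2018, 8, 1), 0.2, 7.1, 50.7),    # outside the date window
]
hits = select_scenes(scenes, (6.0, 50.0, 8.0, 51.0), date(2018, 4, 1), date(2018, 6, 30))
print([s.scene_id for s in hits])
```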
16:24
We are also in discussion, because we are involved in the openEO project which was mentioned earlier, to connect to the DIAS platforms, that is, the Copernicus platforms for Sentinel processing. Then Amazon AWS and Google Cloud Storage; we also have some deployments there.
16:43
The advantage of those is that the Sentinel data are already unpacked there. You do not have to retrieve the entire zip file of one gigabyte in size if you are only interested in two channels. Then you can switch to that provider. You can see
17:01
it's flexible. Our idea is to not be locked into one single platform, but to have the possibility to, well, deploy it here and there and use the best. So an example here: the Sentinel-2 process is the endpoint, compute NDVI, and use this scene. The scene,
17:21
the scene name, I got from somewhere, but you can also search for it. And then, as before, you can poll for the result, and then you get the NDVI back. And you get something like a screenshot, so a preview, because you need to see what you have done, plus the GeoTIFF file as well, which is of course a bit larger.
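For reference, the NDVI itself is the standard ratio (NIR - Red) / (NIR + Red); for Sentinel-2 the near-infrared and red bands are B08 and B04. The sketch below shows the per-pixel arithmetic on plain Python lists. The function names and the choice to return 0.0 where both bands are zero are assumptions of this example, not actinia's behaviour.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # assumed convention for pixels without signal
    return (nir - red) / total


def ndvi_band(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two equally long sequences of reflectances."""
    return [ndvi(n, r) for n, r in zip(nir_band, red_band)]
```

Values close to 1 indicate dense green vegetation, while values around 0 or below are typical for bare soil, built-up surfaces or water.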
17:43
Okay, some more features. You can also write to Google Cloud Storage. If you deploy actinia yourself, which could even be on your laptop, then you can naturally also write there, or to S3 buckets. Then we have
18:00
added, for the GRASS users here, the possibility of ace, short for actinia command execution. That means you have a GRASS command and you just write ace in front of it. Of course, you need to have the credentials, and then the same command is sent to the cloud. So it is not executed locally but in the cloud. So you can play around,
18:21
prototype on your laptop, and once you know, you set the resolution (this is one of the nice GRASS features) to the original resolution and do the heavy computation in the cloud itself. As mentioned, we have openEO support there. We are one of the back-end providers. We will probably not fully implement everything, but no back end implements everything, just the relevant parts. And
18:44
you find the related information on GitHub and also on the openeo.org site. That's a European Horizon project, by the way. If you haven't been here this morning, you can see the related talks in today's video archive. And then eventually, something very
19:04
interesting, called actinia algebra. That is something to do massive computation in parallel. Since we are in the cloud, we also want to make good use of that. Imagine you want to compute something for an entire country: watersheds, a vegetation index,
19:21
runoff, whatever you can imagine. As I mentioned, it can be GIS, it can be Earth observation. Then all this is parallelized and executed in a much shorter time. Of course, you need to have some more resources for that. So, what's upcoming?
19:41
We are almost through with the implementation of process self-description. That means, almost like what you saw, where we can see what data are there, we also want to have what methods are there. A kind of catalog. And if you want, maybe you then wrap something like WPS style around it. Then you can make you