
actinia: geoprocessing in the cloud


Formal Metadata

Title
actinia: geoprocessing in the cloud
Number of Parts
490
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
With a rapidly increasing wealth of Earth Observation (EO) and geodata, the demand for scalable geoprocessing solutions is growing as well. Following the paradigm of bringing the algorithms to the data, we developed the cloud-based geoprocessing platform actinia (https://actinia.mundialis.de and https://github.com/mundialis/actinia_core). This free and open source solution is able to ingest and analyse large volumes of data in parallel. actinia provides an HTTP REST API around GRASS GIS functionality, extended by ESA SNAP and user scripts written in Python. Core functionality includes the processing of raster and vector data as well as time series of satellite images. The backend is connected to the full Landsat and Copernicus Sentinel archives. actinia has been an OSGeo Community Project since 2019 and is a backend of the openEO.org API (EU H2020 project).
Transcript: English (auto-generated)
So, a little bit more extensive than normal. All right, thank you. Hello everyone, my name is Markus Neteler, coming from mundialis in Bonn; we are a startup that has existed for almost five years now.
I originally come from research in Italy, I am the project coordinator of the GRASS GIS project and also a co-founder of the OSGeo Foundation. In the last years we thought to bring GRASS GIS to the next level, and we have been working on that.
The main author of actinia is Sören Gebbert; he spent most of the initial development time on it, and we keep working on this software. This is our very outdated company picture,
let's say an approximation: we have more people now, but it gives you an idea. The river Rhine is in the background, so we are near Cologne. And, interestingly, since we are in an open source context here: you can make a living from open source software, in case you didn't know. The entire company, along with our sister company terrestris,
which has existed much longer, since 2002, is a fully open source company. This is something I still try to repeat wherever I can, because it is an interesting way of developing software and offering services.
So, what am I talking about? The slide is a bit small, I wasn't really prepared for low resolution, but I can tell you what is written there: bring the algorithms to the data. We heard in the previous talk that data can and do increase
non-linearly. In our case we are dealing with geospatial data, including Copernicus data, so there are petabytes of data everywhere, and those have to be dealt with somehow. Everybody is dealing with
I/O problems, disk storage and so forth, so why not go where the data are? This also implies, in a way, bringing the user to the data. You have probably heard this paradigm several times already, and it is still valid. We wanted to see how to exploit the GRASS GIS software in particular, but not only pure GRASS: the whole related ecosystem with GDAL,
ESA SNAP included as well, and whatever else you want to deploy yourself, and how to get all of this into some kind of cloud context. The original name was GRaaS, "GRASS as a Service",
which is probably not so intuitive to pronounce, so for marketing reasons we then called it actinia. An actinia is a sea creature with tentacles that filters the water. Now think of something like a data lake, or the flood of information, whatever you prefer: with our analysis software we can go in there, fish out the relevant information and process it. Of course, the core element here is GRASS GIS.
In case you are not aware of it, GRASS GIS has been under development since 1982, so way before I left school. I joined, let's say as a shy user, in 1993 and then moved on to more or less coordinating it. It is a do-ocracy, meaning whoever does the work
can move things, and I thought I would contribute to that.
If you are not familiar with GRASS itself: we have something called the GRASS database, which is more or less a file-based system, with an SQL database in the background as well, but there are a few particular concepts.
One is called a location, and inside it there are mapsets. That is more or less for the organisation of the data; you could also consider it a workspace, or a project with subfolders. Nothing dramatic, but it matters here because it brings the possibility to
offer user management. Especially in a cloud context you probably do not want to share all data with everybody; you want a restrictive user model, and this comes more or less implicitly with that layout. Then we have lots of algorithms: we are talking about 500-plus
modules available. The majority is in the core: vector analysis, raster analysis, volumetric data analysis, and time series, which in terms of GRASS's age is new, but has already existed for about seven years. So you have space-time cubes and you can analyse them
with a temporal algebra as well, and all of this is already there. You have image processing, which we use for Copernicus data processing or meteorological data interpretation and so forth. And since we are in a GIS context, you have the full integration between
image processing and GIS in one shell. It is not two distinct worlds; I am not interested in that. I am a geographer myself, so I like to bring things together, and here you can do that and smoothly go from one to the next. So now the question is how to get this into the cloud, and cloud means we want to have a RESTful API on top.
Maybe to start with, to list what data are there and what belongs to whom: spatial and temporal datasets are offered as resources, so you can go there and naturally do
computation on top of them, and we enable the usage of GRASS GIS modules. And, as already mentioned, user management:
you define different roles. In a cloud context, where you also pay as you go for the resources, you want to have some control over what you offer to the user. For example, as a provider you offer the user a kind of flat rate, but flat rate does not mean unlimited, of course.
It means a flat rate in the context of what they booked, so you can restrict it: it is like geofencing to a particular area of the world where they can compute things, or a limit on the amount of data volume, and so on. There are different possibilities, and you can also expose the methods, or modules as they are called in GRASS language,
selectively to the users and say: okay, we offer you this stack of functionality, and if you want level two, then you can also access the rest.
Interestingly, you also want to avoid that one user overwrites things of another, so you need some kind of data locking as well. That is natural,
but you have to implement it, and this already comes with GRASS GIS itself. So if you do apt-get install grass, or dnf install grass, or docker pull grass, whatever you do,
you already have the possibility to use a network drive and rely on the Unix or Windows user management to decide who has access and who does not. All of this is now exposed through actinia itself as well.
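To give an idea of what that per-user control can look like, here is a small illustrative sketch in Python; the field names are assumptions meant to mirror the limits described in the talk (role, accessible datasets and modules, size and runtime caps), not the authoritative actinia user schema:

```python
# Illustrative sketch only: field names are assumptions mirroring the limits
# described in the talk, not necessarily the exact actinia user schema.
demo_user_config = {
    "user_id": "demo_user",
    "user_role": "user",                    # e.g. admin / user / guest
    "accessible_datasets": {                # which locations/mapsets may be read
        "nc_spm_08": ["PERMANENT", "landsat"],
    },
    "accessible_modules": [                 # selectively exposed GRASS modules
        "r.slope.aspect", "r.univar", "v.info", "importer", "exporter",
    ],
    "cell_limit": 100_000_000,              # cap on raster cells per job ("geofencing" by size)
    "process_num_limit": 20,                # max processes per process chain
    "process_time_limit": 3600,             # max runtime per process in seconds
}
```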
We have two kinds of storage. There is the persistent, read-only storage, where you offer base cartography, for example the original data
such as an elevation model, Copernicus data, or a land-use map, whatever it is that you already provide to your users. It goes there because you do not want anyone to modify it. The users, however, through the computations running on the different workers or nodes,
want to write their own stuff, and that goes into the user space. This is also connected to a kind of garbage collection: with ephemeral processing, for example, you say the results stay available for whatever you configure, say 24 hours, and then they are deleted automatically. Just housekeeping, in order to avoid that too much storage is used.
So in the end you have this GRASS database over there, which is the data storage (it can be whatever, I will come to that later), and you have the different workers, equipped with GRASS GIS and also with GDAL and PDAL, which I forgot to mention before, and whatever else you put there.
The user management itself is basically done in Redis: we have a Redis instance there, and the systems communicate with each other. Everything can then be deployed on different cloud infrastructures, and it is all Docker based.
We have running instances on OpenShift, Kubernetes and OpenStack, and others as well. We are using Terraform in order to deploy machines, so if actinia needs to scale up, it can order new machines by itself, and
after consumption, meaning once the process has finished, the machines are destroyed in order not to generate further cost. Then we need a load balancer:
the incoming requests arrive here through the API, but you want the cloud resources to be used optimally, so
there is a load balancer sending the jobs to the different workers. And ideally (the data are visible anyway, but in case you have heterogeneous cloud resources, instances with different flavors)
you want to send each job to the right worker, so that it is actually able to compute it. Okay, now, how to control all of this? We are working with JSON documents here, and we have the REST API.
So there are requests like "get locations". You can use curl, or some other interface, a web-based system, or maybe in the future also a QGIS-based one, and you can call it from the GRASS command line; there are different ways of retrieving information. You can query the system and ask: okay, which datasets are already there?
For example, in our system there is the global SRTM elevation model, which is a 300-something-gigabyte GeoTIFF file. In case you are working with an elevation model,
I think most of you will only be interested in a subset, but each of you in a different subset. So the idea of the cloud is that we offer it once, and then you just operate on your area of interest, which can be changed dynamically. Then, since it is REST style, you chain more into the URL: you zoom, say, into
North Carolina, which is the GRASS GIS geospatial sample dataset, and ask what is inside. You are in the North Carolina location, you see which mapsets and datasets are there, you can go further and look into the maps, and there is already a render endpoint for previews when you query the system. By the way, this is reachable online at actinia.mundialis.de; you can go and play there, a demo user is available.
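As a rough Python sketch of that read-only browsing against the public demo instance (the API version segment in the URL and the demo credentials are placeholders and assumptions, check the demo page for the current values):

```python
import requests
from requests.auth import HTTPBasicAuth

# Public demo instance mentioned in the talk; the API version segment and the
# demo credentials below are placeholders/assumptions, not verbatim values.
BASE = "https://actinia.mundialis.de/api/v1"
AUTH = HTTPBasicAuth("demouser", "demopassword")

# List the locations (GRASS "projects") available on the server.
print(requests.get(f"{BASE}/locations", auth=AUTH).json())

# Zoom into the North Carolina sample location and list its mapsets.
print(requests.get(f"{BASE}/locations/nc_spm_08/mapsets", auth=AUTH).json())

# Look at the raster maps inside one mapset; a render endpoint exists for
# quick previews of individual layers.
print(
    requests.get(
        f"{BASE}/locations/nc_spm_08/mapsets/PERMANENT/raster_layers",
        auth=AUTH,
    ).json()
)
```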
Then you can dive into these data and also use them for computation. Now, user-defined processing: in this case you do not retrieve something, you send something to the system and say "please do this", and that is a POST request. You see
both kinds of requests over there. You say: I want to compute the slope of some map, and I want the result as a GeoTIFF, please. What is also possible, by the way (it is not shown here because it is too long),
is that you can specify a URL. In this case that dataset is retrieved first and the computation is done on top of it, or you intersect it with data already there, or you fetch from different data sources and compute things, and eventually you retrieve either a vector file or a raster file, or you dump the result into a PostGIS database, whatever you prefer.
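A minimal Python sketch of what such a request can look like: the process chain uses the regular GRASS module r.slope.aspect and asks for a GeoTIFF export; the surrounding JSON structure, endpoint path and API version are assumptions to be checked against the current actinia API documentation:

```python
import requests
from requests.auth import HTTPBasicAuth

BASE = "https://actinia.mundialis.de/api/v1"        # API version is an assumption
AUTH = HTTPBasicAuth("demouser", "demopassword")    # placeholder credentials

# A small process chain: compute the slope from an elevation raster and
# export the result as a GeoTIFF.
process_chain = {
    "version": "1",
    "list": [
        {
            "id": "compute_slope",
            "module": "r.slope.aspect",
            "inputs": [{"param": "elevation", "value": "elevation"}],
            "outputs": [
                {
                    "param": "slope",
                    "value": "slope",
                    "export": {"type": "raster", "format": "GTiff"},
                }
            ],
        }
    ],
}

# Ephemeral processing with export: the job runs in a temporary workspace, the
# exported GeoTIFF becomes downloadable, and the temporary data are removed.
resp = requests.post(
    f"{BASE}/locations/nc_spm_08/processing_async_export",
    json=process_chain,
    auth=AUTH,
)
print(resp.json())
```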
Through this, in JSON style, you can write custom process chains. As already mentioned, the GRASS modules are there, an importer and an exporter are there, and then you can also bring in your own Python scripts,
and those can do whatever you like. And if you say "Python? No idea, I still have my good old shell scripts from the nineties, they work so nicely": there is no need to rewrite them. You just wrap them into a Python script, hang that in, and you are done. So it is not that you have to rewrite everything; you just make it appear as a Python script and the system is happy with that.
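That wrapping can be as thin as the following hypothetical example: the old shell script stays untouched, and a small Python entry point simply calls it (the script name is made up for illustration):

```python
#!/usr/bin/env python3
# Hypothetical thin wrapper: keep the good old shell script as-is and let
# actinia call this Python entry point instead.
import subprocess
import sys


def main() -> int:
    # "my_legacy_analysis.sh" is a placeholder name for your existing script;
    # all command line arguments are passed straight through to it.
    completed = subprocess.run(
        ["bash", "my_legacy_analysis.sh", *sys.argv[1:]],
        check=False,
    )
    return completed.returncode


if __name__ == "__main__":
    sys.exit(main())
```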
We have also wrapped ESA SNAP; you find this on GitHub and Docker Hub. We made a Docker image out of it, and by the way, it is a fraction of the original size: there are some funny things in the original, like the full Java runtime and so forth, and this can be heavily reduced.
Through that we build up the entire stack. So, how does a curl request look? It is, of course, curl with the demo user (please "steal" the password, it is public), and you send a POST with a process chain. It is only written as a variable there; essentially it is a JSON file,
or maybe you put it into a variable, up to you. This is then sent to the endpoint, and in this case it does asynchronous processing. With synchronous processing you say "okay, do that", and you wait until it is done and the result comes back to you. But in case the job is something complex and would run for several hours,
you do not want to sit there and block your terminal with it, so you use the asynchronous endpoint. The job is sent off, you get a URL with the status of the resource, and you just ping it from time to time. This you can automate, of course; if you have a web interface, it would
notify you once it is done. So both options are available. Asynchronous means polling in this case: you poll the status, and once the job is done you get the resource URL back, which points to the GeoTIFF or whatever it is, and then you can retrieve the map and you are done.
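In Python, that asynchronous pattern looks roughly like the sketch below; the endpoint path, the status values and the response field names ("urls", "status") are assumptions based on the behaviour described in the talk:

```python
import time

import requests
from requests.auth import HTTPBasicAuth

BASE = "https://actinia.mundialis.de/api/v1"        # API version is an assumption
AUTH = HTTPBasicAuth("demouser", "demopassword")    # placeholder credentials

# A tiny one-module chain just to have something to submit: univariate
# statistics of the elevation raster in the North Carolina sample location.
process_chain = {
    "version": "1",
    "list": [
        {
            "id": "stats",
            "module": "r.univar",
            "inputs": [{"param": "map", "value": "elevation"}],
            "flags": "g",
        }
    ],
}

# Submit to an asynchronous endpoint: the call returns immediately with a
# status URL instead of blocking until the job has finished.
job = requests.post(
    f"{BASE}/locations/nc_spm_08/processing_async",
    json=process_chain,
    auth=AUTH,
).json()
status_url = job["urls"]["status"]      # field name is an assumption

# Poll from time to time; a web interface would do this for you and notify
# you once the job is done.
while True:
    status = requests.get(status_url, auth=AUTH).json()
    if status["status"] in ("finished", "error", "terminated"):
        break
    time.sleep(10)

print(status["status"])
```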
What else is there? We have been implementing processing chains for Sentinel-1 and Sentinel-2 data, and also for Landsat, which is not written here. There are endpoints like NDVI: for example, you are interested in the normalised difference
vegetation index, a very common index used in agriculture and elsewhere, also to find green areas in urban environments. You just say: okay, I want to analyse this area, maybe for the year 2018, from the 1st of April to the end of June,
search for a scene with less than 1% cloud cover, and compute the NDVI. So this is more or less one endpoint: you send just these few pieces of metadata to the system and you get the result back. We have connectors to the ESA API hub, which is one way the Sentinel data are retrieved.
We are also in discussion, because we are involved in the openEO project mentioned earlier, about connecting to the DIAS platforms, the Copernicus platforms for Sentinel processing. On Amazon AWS and Google Cloud Storage we also have some deployments.
The advantage of those is that the Sentinel data are already unpacked there: you do not have to retrieve an entire zip file of one gigabyte if you are only interested in two channels, so you can switch to that provider. You can see
it is flexible. Our idea is not to be locked into one single platform, but to have the possibility to deploy it here and there and use whichever is best. So the example here: the Sentinel-2 process endpoint, compute NDVI, and use this scene.
I got the scene name from somewhere, but you can also search for it, and then, as before, you poll for the result and you get the NDVI back. You get something like a screenshot, a preview so that you can see what you have done, plus the GeoTIFF file as well, which is of course a bit larger.
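As a sketch of that scene-based NDVI call (the endpoint path, the API version and the scene identifier below are assumptions for illustration; in the talk the scene name was simply taken from a prior search):

```python
import requests
from requests.auth import HTTPBasicAuth

BASE = "https://actinia.mundialis.de/api/v1"        # API version is an assumption
AUTH = HTTPBasicAuth("demouser", "demopassword")    # placeholder credentials

# Hypothetical Sentinel-2 product identifier; in practice you search for it
# first, e.g. by area of interest, date range and cloud cover.
scene = "S2A_MSIL1C_20180420T103021_N0206_R108_T32ULC_20180420T123456"

# Ask the dedicated Sentinel-2 endpoint to compute the NDVI for that scene
# (the endpoint path is an assumption). The job is asynchronous, so the
# response again carries a status URL to poll; the finished resource includes
# a PNG preview plus the full GeoTIFF.
job = requests.post(f"{BASE}/sentinel2_process/ndvi/{scene}", auth=AUTH).json()
print(job["urls"]["status"])
```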
Okay, some more features. You can also write to Google Cloud Storage, and if you deploy actinia yourself, which could even be on your laptop, you can naturally write there as well, or to S3 buckets. Then we have
added, for the GRASS users here, the possibility of actinia command execution, ace. That means you take one GRASS command and just write "ace" in front of it; of course you need to have the credentials, and then the same command is sent to the cloud and executed there, not locally, so you can play around.
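Conceptually, what ace does is take the single GRASS command you typed and turn it into a one-module process chain that is posted to the server; the following Python sketch illustrates that idea only and is not the actual ace implementation:

```python
import requests
from requests.auth import HTTPBasicAuth

BASE = "https://actinia.mundialis.de/api/v1"        # API version is an assumption
AUTH = HTTPBasicAuth("demouser", "demopassword")    # placeholder credentials


def run_remote(location: str, module: str, **params) -> dict:
    """Send a single GRASS command to actinia instead of running it locally.

    This mimics the idea behind ace; it is not the real implementation."""
    chain = {
        "version": "1",
        "list": [
            {
                "id": "remote_cmd",
                "module": module,
                "inputs": [
                    {"param": key, "value": str(value)}
                    for key, value in params.items()
                ],
            }
        ],
    }
    return requests.post(
        f"{BASE}/locations/{location}/processing_async",
        json=chain,
        auth=AUTH,
    ).json()


# For example, compute univariate statistics of the elevation raster remotely.
job = run_remote("nc_spm_08", "r.univar", map="elevation")
print(job["urls"]["status"])
```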
So you can prototype on your laptop, and once you know it works, you set the region to the original resolution (this is one of the nice GRASS features) and do the heavy computation in the cloud itself. As mentioned, we have openEO support; we are one of the backend providers. We will probably not implement everything, but no backend implements everything, just the relevant parts.
You find the related information on GitHub and also on the openeo.org site. openEO is a Horizon 2020 European project, by the way; if you have not been here this morning, you can find the related talks in today's video archive. And then, eventually, there is something very
interesting called actinia algebra. That is something to do massive computation in parallel: since we are in the cloud, we also want to make good use of it. Imagine you want to compute something for an entire country, watersheds, a vegetation index,
runoff, whatever you can imagine; as I mentioned, it can be GIS, it can be Earth observation. All of this is parallelised and executed in a much faster time. Of course, you need some more resources for that. So, what is upcoming?
We are almost through with the implementation of process self-description. What you have seen so far was mostly about the data: we can see what data are there, and we also want to expose what methods are there, a kind of catalog. And if you want, you can maybe then wrap something like a WPS-style interface around it.