Implementing Copernicus services at the Norwegian Water Resources and Energy Directorate (NVE) with Airflow and actinia
Formal Metadata
Title of Series | FOSS4G Prizren Kosovo 2023
Part Number | 158
Number of Parts | 266
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/66441 (DOI)
Transcript: English (auto-generated)
00:08
Hello everybody, so it's my pleasure to be able to present to you how we at the Norwegian Water and Energy Directorate are currently implementing Copernicus services using Apache Airflow and Actinia.
00:25
And I'm presenting this on behalf of the whole IT team working on it, so they couldn't be here but I'm representing them. The whole work is based on a Copernicus project financed by the Norwegian Space Agency,
00:43
and we have science partners from the Norwegian Computing Centre and from the NORCE research foundation that deliver algorithms to us that we are implementing, and we have the road authorities as a kind of stakeholder in the project. So I'd like to mention those. What you're going to hear about in this talk is an introduction into the Copernicus services project at NVE,
01:08
the topics that we're working on, and the Copernicus services infrastructure with the two main tools: Actinia, the OSGeo community project, and Apache Airflow.
01:21
But I can say right away that I will only be scratching the surface of those things, and if there is anybody who wants to discuss in more detail I'm happy to do that later on in the breaks or after the conference or the like, or during the discussion session.
01:43
So the Copernicus services that we're working on are governed by NVE's responsibilities and mandate. NVE is responsible for managing water and energy resources in Norway, and also for reducing the risk of damage associated with landslides, flooding and avalanches.
02:02
And we also have a division that is working on supervision and contingency planning related to permissions to build wind power plants and the like. And the aim of this project is to further develop products that have been built in precursor projects,
02:21
and to investigate what other areas of NVE's responsibilities this approach with remote sensing can be extended to. So we have these six work packages in the project: data management, a work package on snow avalanches, snow and ice,
02:41
we have glaciers, glacier monitoring, flooding, and then future applications, which also includes the usefulness of new satellites. But as mentioned that is mostly work in progress, so we're in the middle of the project. Here's an example of one of the topics that we're working on, that is snow avalanches, where we use Sentinel-1 data.
03:02
You see the Sentinel-1 image above here, and this is in comparison a photograph of the same avalanche site, where you have in green delineated the areas where the avalanche occurred. And there we are using Sentinel-1 imagery, we have the reference image from before the landslide,
03:23
and then we have an activity image from after the landslide, and then you can look at the change and detect the landslide in those images. And we're currently using a machine learning approach to identify the avalanche areas, and our science partner NORCE is working on improving this by using neural networks and deep learning.
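To make the reference/activity comparison concrete, here is a minimal sketch in Python. It is not the machine learning method used in the project; it swaps in a plain threshold on the change in backscatter, assuming that the event shows up as increased backscatter in the activity image. The function name, the pixel values and the 3 dB cut-off are all hypothetical.

```python
def change_mask(reference, activity, threshold_db=3.0):
    """Flag pixels whose backscatter rose by more than threshold_db.

    reference, activity: equally sized 2D lists of backscatter values in dB.
    threshold_db: hypothetical cut-off, chosen only for illustration.
    """
    if len(reference) != len(activity):
        raise ValueError("images must have the same number of rows")
    mask = []
    for ref_row, act_row in zip(reference, activity):
        if len(ref_row) != len(act_row):
            raise ValueError("rows must have the same length")
        # Change image: activity minus reference, compared against the threshold.
        mask.append([(act - ref) > threshold_db for ref, act in zip(ref_row, act_row)])
    return mask


# Made-up 2x2 example: the right-hand column brightens after the event.
reference = [[-15.0, -14.5], [-16.0, -15.5]]  # image from before the event
activity = [[-15.2, -9.0], [-15.8, -10.5]]    # image from after the event
mask = change_mask(reference, activity)
print(mask)
```

A real workflow would operate on georeferenced rasters and would replace the threshold with a trained classifier, as described in the talk.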
03:49
But we're not using those delineations of landslides for disaster management or that kind of thing, but rather to evaluate the warning systems that we have in place.
04:02
So one of the important tasks of my colleagues is to put out warning messages to the general public about the risk for avalanches, and here you can see in the system how they judge the situation, like how high is the risk for avalanches to occur.
04:23
And here you see the observations from the field, like how that matched the warnings that they put out based on models that they use. And then we use the exposed, the detected avalanches from satellites in order to calibrate or adjust the models.
04:47
So the aim is to get the satellite and remote sensing information to improve the warning systems and get them into the warning systems. And you see, like for example here, that matched quite well where we had the highest actually observed risk for avalanches,
05:03
where also the largest number of avalanches were detected. We have a very similar technical approach for flooding or flooded area. Here the algorithm is developed by the Norwegian Computing Center.
05:21
The avalanche detection was developed by NORCE, and that's very similar. Again, we have the reference image and the activity image, and then the change that you can use to identify flooding.
05:43
These workflows for flood detection or flooded area, they are only sort of triggered when there is an orange level of flood risk in an area. But it turns out that's not very straightforward to use actually in Norway because of the steep terrain,
06:01
so floods that are occurring are often quite short-lived, and having a satellite that passes over exactly in that time is very difficult. So, which is also why we are heavily relying on the CEMS. Amy is here watching the talk. So, there we need to supplement with other satellites, which is why we then rely on CEMS to get better data.
06:26
And that's also used in more or less the same way. On the other topics we're working on, again, NORCE and the Computing Center are providing algorithms, with the Computing Center working on fractional snow cover. So, that's important not only for the snow melt during the spring season and the related risk of flooding, but also for the energy potential.
06:48
So, the Computing Center developed deep learning algorithms to map fractional snow cover in a more accurate way, especially also with focus on snow in the forests, like the snow on the ground.
07:06
They are also working on improving the cloud mask for Sentinel-3, which is very important for the Norwegian context, where there are a lot of clouds, and a lot of snow that they can be confused with. So, that's one of the topics they are working on.
07:22
And we also have activities on lake ice and wet snow products. But this is how then the satellite imagery actually is used by the people that put out the warning for flooding. For example, here you can see the dots that are actually the satellite observations that we collected,
07:46
and the continuous lines are the modeled situations. So, then the people that are putting out warning messages about the risk of flooding to occur, they can look at and see what's actually happening in the catchment compared to what our model says.
08:08
And as mentioned, we have a work package that I happen to lead about future applications, where we would want to look at landslides, soil moisture, and slush flows. Snow water equivalent is an important topic.
08:20
Land cover changes for the guys working in supervision, and also the opportunities that new satellites will give us. And there's ROSE-L with L-band SAR, one of the things we're looking at. But what became very apparent is that with now collecting all those initiatives,
08:41
we needed also to think through how we manage all the data. As all of you most likely know, the amount of data coming with the Copernicus program, it's like incredibly huge, and you can't collect and gather all the data. So, we had to think through a bit. And we also expect an increase in demand for the Copernicus services at NVE,
09:05
and we figured out that we need to be able to support flexible processing solutions and patterns, so we have to be a bit more scalable than we are right now. But we also have needs for operational and reliable long-term services, because the people that put out the warnings, they can't use data that is on and off.
09:25
So, it's also very important to have the solutions working stably and reliably. And as mentioned, we have these different science partners. They come with different ways to do things, but we have the burden to maintain all that.
09:41
So, we also see a need for standardizing workflows to reduce the maintenance burden, and that was the point where we then decided we want to test and evaluate if Actinia, the OSGeo community project, and Apache Airflow can help us and use that as a basis for our Copernicus infrastructure.
10:05
But there are also some framing conditions. So, as mentioned, we gather and build upon various precursor projects with existing solutions and legacy code that we have to handle, and that is developed by different consultants
10:20
with different ways to do things, including job management solutions and so on. We have security and reliability requirements that actually increased thanks to Russia. Also, we are a relatively small team with just a handful of people working on it, and not all of us are actually working on it full-time.
10:42
And there are some overarching architecture decisions that we just have to abide by. It's also that proprietary GIS solutions are the main platform for GIS stuff, and we have to use proprietary database management systems as our back end.
11:01
And that means that we have to be interoperable with those systems. And one of the also important aspects is that the users at the end, they don't want to have yet another dashboard or yet another tool for Copernicus. They want to get it delivered in the tools that they are using, and the main one is xgeo.no. If you're interested, there's a lot of stuff of what we put out.
11:24
This is like the overarching concept figure of the platform. So, we use Ansible Playbooks to set up the servers, and we use Docker Compose to deploy Apache Airflow and Docker Swarm
11:40
to have a scalable setup of Actinia, and then Airflow can tell Actinia what to do and when, which data to get from ESA, the Norwegian Ground Segment, or the Alaska Satellite Facility, and then produce these different Copernicus products that then will end up on the file system or in the databases.
12:04
So, the existing solutions for the image server that puts the data to xgeo can grab it and present it to the users where they are used to using it. So, what is then Actinia?
12:22
That is one of the main components in our infrastructure. Is there anybody here working with Actinia actively? So then, I think it's in place to explain a little bit. Actinia is an open source REST API for scalable, distributed, and high performance processing of EO data.
12:42
It's mainly built around GRASS GIS, but you can use in principle any tool like GDAL, ESA SNAP, PDAL, and also other in-house or specific tools. It has a kind of integrated data warehouse concept that helps us organize data.
13:01
It has lots of GIS and EO functionality that we can leverage, with a special focus on time series processing. But we can also use it to build and add processing chains that can be configured using JSON, and there are different ways to deploy it, on premise or in the cloud.
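As an illustration of what a JSON-configured processing chain can look like, here is a minimal Python sketch that builds one as a plain dictionary and serializes it. The overall layout (a version plus a list of steps with id, module, inputs and outputs) follows the actinia process chain format as far as I know it; the helper name and the map names are hypothetical, and the two GRASS GIS modules are chosen only as an example.

```python
import json


def build_process_chain(elevation_map, slope_map):
    """Return an actinia-style process chain as a plain dict.

    elevation_map and slope_map are hypothetical raster map names.
    """
    return {
        "version": "1",
        "list": [
            {
                # Step 1: set the computational region to the input raster.
                "id": "set_region",
                "module": "g.region",
                "inputs": [{"param": "raster", "value": elevation_map}],
            },
            {
                # Step 2: derive a slope map from the elevation raster.
                "id": "compute_slope",
                "module": "r.slope.aspect",
                "inputs": [{"param": "elevation", "value": elevation_map}],
                "outputs": [{"param": "slope", "value": slope_map}],
            },
        ],
    }


chain = build_process_chain("elevation@PERMANENT", "slope")
payload = json.dumps(chain, indent=2)
print(payload)
```

A document like this would be sent to one of Actinia's processing endpoints over HTTP; the exact URL and authentication depend on the deployment and version, so they are left out here.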
13:22
Currently, we're working with in-house solutions, but using Actinia, based on Docker, we prepared ourselves to make the step out of our organization if the need occurs. It has solutions for job management and monitoring, quota management, so we can have different users
13:42
that get access to different things and that get permissions to do different things. So this is also very helpful for on-demand processing especially. It is modular with lots of accessible plugins, which is quite cool. So you can add stuff gradually as you need it.
14:02
So there are solutions for metadata, STAC, statistics, and parallel processing. It's a work in progress. It has lots of different interfaces, like it can be accessed or combined with the openEO API, a Python client, there is a QGIS client, and there is also some work ongoing on a web UI
14:23
at the North Carolina State University. And what's somewhat specific for Norway is that it actually can stream data from the Norwegian ground segment so we don't have to download all the Sentinel-2 data. We can just read it because there it's made available in netCDF format because the Meteorological Institute
14:40
is hosting the data. So we do use Actinia mostly as the workhorse for that infrastructure. So I don't have many cool slides about how that is used because you just would see some JSON in an output. That's not very cool to show.
15:01
But I'd like to use the opportunity to highlight possible win-wins of actually actively using OSGeo software. On the one hand, we are, by actively using it, contributing to stuff. We produced some new modules that we felt that we need to use in Actinia.
15:21
That's seven modules at the moment to do different stuff. And we also have some commits to core GRASS GIS or GDAL and also Actinia. So we are contributing in one way, but we are also leveraging not only the
15:41
OSGeo tools but also the spirit, concepts and way of doing things. So in a way we're able to just copy how things are done in OSGeo communities and use that internally for how we do things with continuous integration, with code quality checks and the like.
16:03
So that's very useful also internally. Here is an example of using Actinia outside of the Copernicus stuff in the alarm project. One of my colleagues set up this web service where it's probably hard to see. There is a small polygon drawn here by one of the colleagues in the field of a possible release area for an avalanche.
16:24
And then when the colleague in the field is supposed to assess the risk of that avalanche, that polygon is sent to Actinia and that runs an AvaFrame model and returns the results. And then the colleague can see, okay, where is it, can you expect that this avalanche
16:41
to flow out and are there people at risk from that avalanche site? So that's just a side use of it. Then we have Apache Airflow, which is kind of telling Actinia what to do and when. And I don't know if other people here
17:02
that have used Apache Airflow before know about it. So that's actually a platform to programmatically author, schedule and monitor workflows. It's built around the concept of directed acyclic graphs. So you define the workflow as like do this and then that
17:22
and depending on what happened, continue here or there. It's used in addition to Actinia and supposed to replace in-house systems. I think I'll skip the why we use it because you will see that in the examples that I'm supposed to show.
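The directed acyclic graph idea can be shown without Airflow itself. The sketch below is not Airflow's API; it uses Python's standard library `graphlib` to order a small set of tasks by their dependencies, which is the "do this and then that" part of a workflow. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
dependencies = {
    "download_scene": set(),
    "run_detection": {"download_scene"},
    "publish_product": {"run_detection"},
    "update_statistics": {"run_detection"},
}


def execution_order(deps):
    """Return one valid order in which the tasks can run."""
    # TopologicalSorter raises graphlib.CycleError if the graph has a cycle,
    # which is why a workflow graph has to be acyclic.
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

In Airflow the same structure is declared as tasks inside a DAG object, and the scheduler additionally handles retries, branching and the timing of each run.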
17:41
So this is a screenshot of the start page when you have Airflow installed. You get a nice overview over the processes that are supposed to run, when they ran, if something went wrong. So the colleagues that are responsible for disaster recovery in our systems,
18:03
they can immediately see, okay, here something went wrong, and they can go in and do something about it. You can filter the different workflows. You can have hundreds of workflows and you can filter either by user or by keyword or the like. You can search it.
18:21
So that's quite cool. And then you can go in to the single workflow and see if something went wrong. Go in and look at the logs. What happened actually, you get it immediately. It's all in one place. And here you see like for every workflow, when did it run last. So each column is like a run of the workflow
18:40
and each row is a task within that workflow and you see it failed exactly here and you can go in and restart this or you can restart this and all the future workflows. So that's very helpful for running stuff. And what's also a cool feature is
19:01
the scheduling is not only possible by time and the like. So people say it's cron on steroids but it also has this nice concept of data-based scheduling. So you see workflows have this single symbol and here is the data set symbol. So you define data sets and if this data set is updated
19:21
then all the other workflows are started that are defined to be dependent on this data set. So that's also quite cool. And in addition, you get a whole overview of how data is connected in your organization for all these kinds of ETL tasks. So that's quite useful.
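The data set based triggering described here can be sketched in a few lines of plain Python. This is not Airflow's implementation or API, only an illustration of the idea that workflows register a dependency on a data set and are started when that data set is marked as updated. The class, the data set name and the workflow functions are hypothetical.

```python
from collections import defaultdict


class DatasetRegistry:
    """Toy registry: starts dependent workflows when a data set is updated."""

    def __init__(self):
        self._consumers = defaultdict(list)

    def depends_on(self, dataset, workflow):
        """Declare that workflow should run whenever dataset is updated."""
        self._consumers[dataset].append(workflow)

    def mark_updated(self, dataset):
        """Run every workflow that depends on dataset; return their names."""
        started = []
        for workflow in self._consumers.get(dataset, []):
            workflow()
            started.append(workflow.__name__)
        return started


def update_snow_cover_map():
    print("updating snow cover map")


def refresh_catchment_statistics():
    print("refreshing catchment statistics")


registry = DatasetRegistry()
registry.depends_on("sentinel3_scenes", update_snow_cover_map)
registry.depends_on("sentinel3_scenes", refresh_catchment_statistics)

# A producer workflow would call this after writing new data.
started = registry.mark_updated("sentinel3_scenes")
```

The same registry also gives the overview mentioned in the talk: listing the consumers per data set shows how data is connected across workflows.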
19:40
There's also this concept of dynamic DAGs. So if you have like one workflow that you want to apply on a hundred data sets, you don't have to like have this definition of workflow like a hundred times but you have a template and then you can use JSON files
20:00
to just insert the different parameters and then hundreds of workflows are created, inserted into the UI and monitored as I showed you before. So our conclusions and experiences from the work we set out to do here
20:22
is that we feel that Airflow and Actinia, they work quite well together also thanks to the modularity of Actinia. So modularity is in many ways an asset here. As I mentioned, like how you can split up workflows in Airflow that works quite well together with modularity.
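The template-plus-JSON approach to dynamic DAGs mentioned a moment ago splits workflows up in the same modular way. A minimal sketch in plain Python, not using Airflow's API: one template function is combined with a list of parameter sets, and one workflow is generated per entry. The JSON text stands in for the JSON files from the talk; the region names, the parameter keys and the function names are hypothetical.

```python
import json

# Stand-in for a set of JSON parameter files, one entry per workflow.
CONFIGS_JSON = """[
  {"region": "north", "satellite": "Sentinel-1"},
  {"region": "south", "satellite": "Sentinel-1"}
]"""


def make_workflow(region, satellite):
    """Template: return a workflow bound to one set of parameters."""
    def workflow():
        return f"processing {satellite} data for {region}"
    workflow.__name__ = f"detect_{region}"
    return workflow


def generate_workflows(config_text):
    """Create one named workflow per configuration entry."""
    return {
        f"detect_{config['region']}": make_workflow(**config)
        for config in json.loads(config_text)
    }


workflows = generate_workflows(CONFIGS_JSON)
for name, workflow in workflows.items():
    print(name, "->", workflow())
```

Adding a new data set then only means adding a new parameter entry; the template stays untouched.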
20:44
It's very important to have interoperable solutions. So as mentioned, we have the proprietary GIS platform, so we can't get open source tools into our systems if there are not good bridges towards that.
21:01
So that's actually a very important thing, but as mentioned, this was only scratching the surface, and there's still a lot to learn and a lot to do on our side. There are plenty of unused features both in Actinia and in Airflow that we didn't utilize or don't utilize yet
21:23
and especially the optimization and parallelization in the combination of the two tools is something that we have to learn more about. And that's it for the presentation. Thank you.