SMODERP2D Soil Erosion Model Entering Open Source Era with GPU-based Parallelization
Formal Metadata
Number of Parts | 295
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/43558 (DOI)
FOSS4G Bucharest 2019, 26 / 295
Transcript: English (auto-generated)
00:07
So, welcome. As chairman, it is a slightly schizophrenic feeling for me to give the floor to Martin Landa, so I will try my best. This is the last presentation of this session. I will try to close the door.
00:29
So, I'm going to present a project called SMODERP2D. It's one of the erosion models available on the market. I will present the first part, and the second part will be presented by my colleague, Ondřej Pešek.
00:48
But the real credit must go to our other colleagues, Jakub Jeřábek and Petr Kavka, who are the true authors of the project.
01:02
They are hydrologists. This is the important point. My colleague and I were responsible for the technical part, so the programming point of view, and we were introducing some new GIS functionality. We are from the Department of Geomatics at the Czech Technical University in Prague.
01:25
So, that's the important point. We are not hydrologists. We are programmers from the Department of Geomatics. So, if you want to have real fun, try to ask us some hydrology-related questions after the presentation.
01:44
I can promise that you will get hardly any answer from us. You can ask us about programming, about GIS, but not about hydrology. You can ask, of course. So, this was a small introduction. So, first of all, what is SMODERP2D? I will try to explain.
02:04
So, this is one of the erosion models. It's physically based, which is important to mention. It's a surface runoff and soil erosion model, which is designed to do computation in episodes.
02:23
It means that, from the erosion point of view, the big erosion events are usually caused by heavy rainfall, and these are called episodes in the terminology. The model is typically used to help design measures which could reduce or prevent big erosion events in a specified area of interest.
02:58
The project is written in Python. It's an open source project, available since 2018 on GitHub,
03:10
and it's licensed under GNU GPL version 3. So, these are the most important points about the project.
03:21
About the history: it's a long-running project, whose development is basically driven by my colleagues from the Department of Landscape Water Conservation at the Czech Technical University.
03:42
We, as people from the Department of Geomatics, joined the team about two years ago just to help with the programming part of the project. As far as I know, the development started many, many years ago.
04:03
Originally, it was developed as a surface runoff model simulated by a profile model, so it means the 1D version. It was called SMODERP 1D. Later, it was redesigned as a spatially distributed model, so that's how SMODERP2D was born.
04:25
It was originally written in Fortran, so you can guess how old the software could be. They started development about 10, maybe 15 years ago, maybe longer. Then it was rewritten in Visual Basic, and currently it's written in Python.
04:47
So, what will I present? I will present recent work. We joined the team last year, and I will present some major refactoring and some functionality improvements which have been done recently.
05:14
Before that, just one more slide about the SMODERP2D model. It's a model which belongs to the family of so-called GIS-based hydrological models.
05:29
It means that GIS software is used for geospatial data preprocessing and postprocessing.
05:40
It's based on raster cell-by-cell mass balance, which includes some key hydrological features. That's not the right word. I mean effective precipitation, surface runoff, and stream network routing.
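A raster cell-by-cell mass balance can be pictured with a minimal NumPy sketch. This is not SMODERP2D's actual formulation: the function name `mass_balance_step`, the input arrays, and the numbers are all hypothetical and only illustrate that the change of water stored in each cell equals inputs minus outputs over one time step.

```python
import numpy as np


def mass_balance_step(h, rain, infiltration, inflow, outflow, dt):
    """Advance surface water depth by one time step (illustrative only).

    All arrays share the raster shape; depths in m, rates in m/s.
    """
    # Effective precipitation: the part of rainfall that does not infiltrate.
    effective_rain = np.maximum(rain - infiltration, 0.0)
    # Mass balance per cell: storage change = inputs - outputs.
    h_new = h + dt * (effective_rain + inflow - outflow)
    # Water depth cannot become negative.
    return np.maximum(h_new, 0.0)


# Hypothetical 2 x 2 raster with uniform rainfall and no lateral flow.
h0 = np.zeros((2, 2))
rain = np.full((2, 2), 2e-5)
infiltration = np.full((2, 2), 5e-6)
inflow = np.zeros((2, 2))
outflow = np.zeros((2, 2))
h1 = mass_balance_step(h0, rain, infiltration, inflow, outflow, dt=10.0)
```

In a real model the inflow and outflow arrays would come from routing water between neighbouring cells; here they are zero to keep the sketch short.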
06:09
Stream network routing is done line by line using a user-defined polyline layer. If you really want to know something more about the project from the hydrological point
06:23
of view, there is a related paper which should be published soon. Okay, so, you see that nice ASCII logo? That's something I really like.
06:45
And how does SMODERP2D work? Basically, the workflow contains three major steps. At the beginning, the data are preprocessed, and this is the important part from
07:01
my point of view, because in this part, GIS software plays the major role. Basically, there are two important tasks. First, the soil properties are assigned to each polygon which defines the area of interest.
07:22
And then, it's important to assign an order to the watercourse network reaches, among other tasks. It's quite complex. In the second part, the model computation starts. It's important to mention that the
07:46
computation is done using arrays, or better to say, the computation is array-based. We are using the well-known NumPy library, which is not surprising. And in the last part, the resulting data are
08:06
stored in the output directory, and this part is done by the same GIS package which is used in the first part. So, that's the basic workflow, and this is maybe the most important slide. This is the overview of the key points I would like to present here.
08:34
So, ongoing development. The main goal was to do major refactoring, because the code
08:41
base was quite complicated. It was not well written, so the major goal was to do some refactoring. Maybe the most important point was to clearly separate the data preparation package from the model computation package. That was a very
09:07
important, crucial point, because thanks to the clear separation, we were able to introduce support for other GIS packages. Originally, only the proprietary Esri ArcGIS platform was supported, so our goal was to
09:27
introduce support for other GIS packages, namely two widely used open source platforms, GRASS GIS and QGIS. So, that was our major goal. And
09:47
at the end of the presentation, my colleague will present some experiments related to parallelization. Okay. So, those are the important items we are going to present. On the right side, you can see the
10:12
diagram, where you can clearly see that the data preparation package depends on the specified GIS software.
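The separation between GIS-specific data preparation and GIS-independent computation can be sketched as follows. This is only an illustration of the design idea, not the project's real class layout: `DataPreparation`, `DummyProvider` and `run_model` are hypothetical names, and the "model" here just averages a tiny elevation grid.

```python
from abc import ABC, abstractmethod


class DataPreparation(ABC):
    """Hypothetical interface every GIS back end would implement."""

    @abstractmethod
    def prepare(self, params):
        """Return plain Python data (no GIS objects) for the model."""


class DummyProvider(DataPreparation):
    """Stand-in for an ArcGIS, GRASS GIS or QGIS specific class."""

    def prepare(self, params):
        # A real provider would call its own GIS library here.
        return {
            "elevation": [[1.0, 2.0], [3.0, 4.0]],
            "cell_size": params["cell_size"],
        }


def run_model(data):
    """GIS-independent computation: it only sees the prepared data."""
    cells = [value for row in data["elevation"] for value in row]
    return {
        "mean_elevation": sum(cells) / len(cells),
        "cell_size": data["cell_size"],
    }


result = run_model(DummyProvider().prepare({"cell_size": 5.0}))
```

Because `run_model` never touches a GIS library, adding another platform under this assumption means writing one more `DataPreparation` subclass and nothing else.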
10:25
And then the second part, the model computation, is basically done by NumPy. Okay. What I'm going to present are some GIS tools we designed and
10:41
introduced into this model project. These tools all do basically the same things. There are three options. You can perform data preparation only, so the first step, or you can perform
11:01
only the second step, the model computation, or, as you usually want, perform the whole workflow. So, you start with data preprocessing and then model computation, and usually you are interested in the resulting data.
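The three options described above can be sketched as a small dispatcher. The enum `WorkflowMode`, its member names, and `planned_steps` are hypothetical and are not taken from the project's code; the sketch only shows how one entry point could cover data preparation only, computation only, or the full workflow.

```python
from enum import Enum


class WorkflowMode(Enum):
    """Hypothetical names for the three options."""

    DATA_PREPARATION = "data preparation only"
    COMPUTATION = "model computation only"
    FULL = "full workflow"


def planned_steps(mode):
    """Return the ordered steps a given mode would execute."""
    steps = []
    if mode in (WorkflowMode.DATA_PREPARATION, WorkflowMode.FULL):
        steps.append("prepare data with the GIS provider")
    if mode in (WorkflowMode.COMPUTATION, WorkflowMode.FULL):
        steps.append("run the NumPy-based model computation")
        steps.append("store results in the output directory")
    return steps


full_steps = planned_steps(WorkflowMode.FULL)
prep_steps = planned_steps(WorkflowMode.DATA_PREPARATION)
```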
11:20
Could it be? This part, I already mentioned that currently, SMODERP2D supports three different GIS platforms. Originally, only ArcGIS version 10 was supported. We introduced support for ArcGIS Pro, the new
11:42
generation of this software, and two other open source platforms, GRASS and QGIS. So, first of all, I would start with ArcGIS because it was originally supported, so it makes sense to start with that.
12:03
So, there is a standard ArcGIS toolbox which allows you to run the data preparation part, or the model computation, or the whole workflow. You need to fill in some input parameters. The important thing is that the
12:27
data preparation part is, not surprisingly, performed by native ArcGIS tools. Why not? Because the whole project is written in Python, the GIS functionality is called via the well-known Python library ArcPy.
12:49
Because we also wanted to support the funny ArcGIS Pro version, we introduced support for Python 3, which makes sense.
13:01
If you are programming in Python, you know that Python 2 will be officially almost dead in a few months. It will not be supported, but it is still used in ArcGIS 10, so it will remain for a while. All the code, including the ArcGIS toolbox, is available on GitHub.
13:27
So, now let's switch to the second supported platform, which is GRASS GIS. It means that a new tool for GRASS was developed.
13:44
In GRASS terminology, it's a GRASS module or GRASS add-on. An add-on is like a plugin, something like that. The tool is called r.smoderp2d and it does the same as the ArcGIS toolbox. It allows you to perform all the steps or specified steps.
14:04
It's possible to install it like other add-on tools in GRASS using the g.extension module. And, of course, the data preparation, the GIS part, is performed in the GRASS provider by native GRASS tools using the PyGRASS library.
14:24
The PyGRASS library was used because there are basically two Python libraries you can use in GRASS. PyGRASS allows you to run existing GRASS modules, but you can also perform some GIS computation using the Python API.
14:48
So, on the right side, you can see part of the code. The vector map is opened and you are doing some tricks using the Python API.
15:03
So, you don't need to call only existing GRASS modules or tools. The important point is that the GRASS tool assumes that GRASS is running in a Python 3 environment, which, if you are using GRASS, may surprise you, because
15:23
the latest GRASS version, 7.6, still operates in a Python 2 environment. So, basically, this tool requires version 7.8, which is the first version of GRASS running in a Python 3 environment.
15:41
It's quite fresh stuff. The first release candidate of this GRASS version will be available in just a few days. Okay, so that was GRASS. QGIS. This is the last supported platform I will be speaking about.
16:04
So, a new QGIS plugin was developed. It was developed for QGIS version 3. That's understandable because it's the current long-term release version.
16:21
It's also available from GitHub. Of course, when the final release is available, we are planning to upload it to the official QGIS repository, but it will take some time. Data preparation. Data preparation is done, surprisingly, not by QGIS but by GRASS.
16:43
So, it means that the QGIS plugin depends on GRASS software, which is not such a big problem because usually, especially for Windows users, if you install QGIS on your Windows machine, you will also get GRASS installed. So, it's not a big deal. No problem.
17:05
So, the data preparation part is performed by GRASS and not by QGIS. Because QGIS 3 is running in a Python 3 environment, we also require at least
17:23
GRASS GIS 7.8. So, currently, it's quite tricky to establish a working environment because GRASS 7.8 hasn't been released yet, but it will happen quite soon. On the right side, you can see the basic workflow, how the QGIS plugin works.
17:48
So, basically, the input parameters which were given by the user are loaded. Then, to perform computation in GRASS,
18:06
you need to have some GRASS location available. A GRASS location is something like a project; if you don't know GRASS, let's call it a project. So, GRASS creates a temporary location somewhere.
18:21
It loads the data, which are imported or linked, it depends: rasters are linked, vector data are imported because of the topology cleaning, let's say. And then the plugin can run. It can do the same thing as the ArcGIS toolbox.
18:47
It will do data preprocessing and start the model computation using NumPy. And there's the last step. Of course, as a QGIS user, you would like to see
19:02
the resulting data in a map canvas. So, that must be done somehow. And now it's time to introduce my colleague, Ondřej Pešek.
19:20
He's studying for a PhD at the Czech Technical University and he spent some time experimenting with TensorFlow, and it's quite fancy stuff. So, he will explain to you what he did.
19:41
So, my role in this project was to make all the computations faster, because the speed limits of SMODERP2D were one of the most crucial aspects of the algorithm. So, to reduce the computation time, we decided to parallelize the computations
20:08
after the data preprocessing or preparation part. And to do this, the first step was to change the loop-based computations into matrix-based ones,
20:27
because in the old code, even though it's using NumPy, there is a NumPy array for every pixel
20:41
and because the code is written in Python, it is extremely slow to loop through every pixel. So, it is much faster to use matrices for this. And for this, we decided to use TensorFlow, also because TensorFlow supports
21:04
parallelization for both CPUs and GPUs. So, when a user doesn't have access to GPUs, they can still run the parallelized code on CPUs.
21:20
And in some places there is still NumPy, because sometimes it wasn't possible to avoid the loops, and it is incomparably faster to loop through a NumPy array than through a TensorFlow tensor.
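The change from loop-based to matrix-based computation can be illustrated with plain NumPy. The formula below (depth times the square root of slope) is a made-up placeholder rather than a SMODERP2D equation, and the function names are hypothetical; the point is only that both versions give the same numbers while the second avoids the per-pixel Python loop.

```python
import numpy as np


def runoff_loop(depth, slope):
    """Per-pixel Python loop (slow for large rasters)."""
    out = np.empty_like(depth)
    rows, cols = depth.shape
    for i in range(rows):
        for j in range(cols):
            out[i, j] = depth[i, j] * np.sqrt(slope[i, j])
    return out


def runoff_matrix(depth, slope):
    """The same placeholder formula applied to whole arrays at once."""
    return depth * np.sqrt(slope)


# Random test rasters with a fixed seed so the comparison is repeatable.
rng = np.random.default_rng(0)
depth = rng.random((50, 60))
slope = rng.random((50, 60))
same = np.allclose(runoff_loop(depth, slope), runoff_matrix(depth, slope))
```

The matrix form is also what a library such as TensorFlow needs in order to distribute the work, since whole-array operations can be split across cores or GPU threads.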
21:42
The strength of TensorFlow is its use of so-called computation graphs. It looks like this: you can imagine a mathematical operation. You divide it into variables and operations, and you can see that for some operations,
22:07
you have prerequisites. You need to know all the variables coming into this operation. But sometimes, for example, when you have this multiplication and this addition, they do not depend on each other.
22:23
So, there is no reason to make them run one after another. They can be done in parallel. And the nice thing about TensorFlow is that it first creates this computation graph.
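The idea of independent operations in a computation graph can be sketched with the Python standard library alone. This is not how TensorFlow is implemented, and it does not use TensorFlow; the graph layout and the `evaluate` function are hypothetical. For the expression (a * b) - (c + d), the multiplication and the addition have no dependency on each other, so both are submitted to a thread pool in the same round, and the subtraction waits for them.

```python
from concurrent.futures import ThreadPoolExecutor
import operator

# Graph for (a * b) - (c + d): node -> (operation, names of its inputs).
graph = {
    "product": (operator.mul, ("a", "b")),
    "total": (operator.add, ("c", "d")),
    "result": (operator.sub, ("product", "total")),
}


def evaluate(graph, values):
    """Evaluate the graph round by round; ready nodes are submitted together."""
    values = dict(values)
    pending = dict(graph)
    with ThreadPoolExecutor() as pool:
        while pending:
            # A node is ready once all of its inputs are known.
            ready = [
                name
                for name, (_, deps) in pending.items()
                if all(dep in values for dep in deps)
            ]
            if not ready:
                raise ValueError("graph has a cycle or a missing input")
            futures = {
                name: pool.submit(
                    pending[name][0], *(values[dep] for dep in pending[name][1])
                )
                for name in ready
            }
            for name, future in futures.items():
                values[name] = future.result()
                del pending[name]
    return values


out = evaluate(graph, {"a": 2, "b": 3, "c": 4, "d": 5})
```

For such tiny operations Python threads give no real speed-up; the sketch only shows the scheduling principle of building the graph first and then running whatever is independent at the same time.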
22:40
And then, when something does not depend on other stuff, it directly sends these computations to different cores for CPUs or different threads for GPUs. Here are some results of this parallelized branch.
23:05
It is called experimental because not everything in the code has been rewritten yet. And also because I want to show you that it does not make sense every time. Because, for example, the word GPU is like a magic word nowadays,
23:26
and everyone who wants to look like a good nerd says that he's going to run his code on GPUs. But beware, because I don't want to sell your dreams down the river,
23:44
but it doesn't make sense every time. As you can see, this is the important table. The other one just explains the architectures used. And so, I would like to show the last column,
24:03
which shows that the parallelized branch is much faster than the single-CPU one. It's almost two times faster. And so, this is the column showing that the parallelization works.
24:24
And this is the column showing that the parallelization doesn't work. Because for extremely small data, the parallelized version is 10 times or even 20 times slower. And the crucial point for this is the graph initialization of TensorFlow
24:47
because it takes some time. So, if the computation itself is really fast, then it doesn't make any sense to spend time on the graph initialization.
25:05
And also, it is even much slower on GPUs because the transfer of data between your random access memory and the GPU's internal memory also takes some time.
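The trade-off between a fixed start-up cost and faster per-cell work can be written as a simple cost model. Everything here is an assumption for illustration: the linear model, the function names, and above all the numbers, which are invented and are not measurements from the talk. With serial time n * t and parallel time o + n * t / s, the parallel run wins only above n = o / (t * (1 - 1 / s)).

```python
def serial_time(n_cells, per_cell):
    """Run time without any start-up cost."""
    return n_cells * per_cell


def parallel_time(n_cells, per_cell, speedup, overhead):
    """Run time with a fixed overhead (graph setup, data transfer)."""
    return overhead + n_cells * per_cell / speedup


def break_even_cells(per_cell, speedup, overhead):
    """Raster size above which the parallel run becomes faster."""
    return overhead / (per_cell * (1.0 - 1.0 / speedup))


# Made-up numbers purely for illustration, not measurements from the talk.
PER_CELL = 1e-4  # seconds per cell in the serial run
SPEEDUP = 2.0    # throughput gain of the parallel run
OVERHEAD = 5.0   # seconds spent on graph setup and data transfer

threshold = break_even_cells(PER_CELL, SPEEDUP, OVERHEAD)
small_is_slower = parallel_time(1_000, PER_CELL, SPEEDUP, OVERHEAD) > serial_time(1_000, PER_CELL)
large_is_faster = parallel_time(1_000_000, PER_CELL, SPEEDUP, OVERHEAD) < serial_time(1_000_000, PER_CELL)
```

Under these invented numbers a small raster pays the overhead without earning it back, while a large one comes out ahead, which matches the qualitative behaviour described for small and big data.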
25:21
So, this is just to show the reason why it will always be just a branch and not the master version. Because sometimes it makes sense, especially if you are working with bigger data. You can save a lot of time by using the parallelized branch.
25:43
But if you are working just with a small area, just forget about it. Forget about the fact that everywhere you have seen that GPU is the right choice. Because we are running out of time, just a few words.
26:04
We are also experimenting with CPU-based parallelization, where the key idea is that you split the area into irregular tiles based on sub-catchments, but maybe we can skip it for now
26:23
because we are running out of time. So, thanks for your attention. First of all, you can follow us on GitHub. You can enjoy filing some bug reports. We are close to some release candidates.
26:40
I hope that during autumn or winter we will release some release candidates, like a beta version, and the final version will, I hope, be released around the beginning of next year. We tried to make our presentation as long as possible to avoid questions,
27:03
as you can see. It was a joke. Please, if you have any questions, you can ask. And you know lunch is waiting for you. Okay, do you have some questions? If you do, and in case you are a hydrologist,
27:25
feel free to contact my colleagues, Petr Kavka and Jakub Jeřábek. They will be happy to discuss with you. They like to discuss, let's say. So, do you have some questions?
27:45
You can ask, but you will just hear nothing. Or maybe you can try and I can ask the colleagues. Maybe we will be able to answer. What kind of rain data do you have? Sorry?
28:00
What kind of rain data do you import? The data? Rain data. Rain data. Well, there are some, yeah, maybe during the break I can show you some examples. Maybe that's better. And feel free to contact us by mail; unfortunately, I'm not able to answer, as I promised.
28:29
Okay, another question which I cannot answer. Thanks a lot for your attention and enjoy lunch.