
SMODERP2D Soil Erosion Model Entering Open Source Era with GPU-based Parallelization


Formal Metadata

Title
SMODERP2D Soil Erosion Model Entering Open Source Era with GPU-based Parallelization
Title of Series
Number of Parts
295
Author
Contributors
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
SMODERP2D is a physically based, spatially distributed, episodic runoff and soil erosion model used to calculate and predict these processes in agricultural areas and small watersheds. The core of the model is a raster-based, cell-by-cell mass balance calculation that includes the key hydrological processes: effective precipitation, surface runoff, and stream network routing. Effective precipitation, which forces the runoff and erosion processes, is reduced by surface retention and infiltration. Surface runoff consists of two components: slower sheet flow and rapid, concentrated rill flow. Stream network routing is performed line by line on a user-defined polyline layer. SMODERP is a long-running project led by the Department of Landscape Water Conservation at the Czech Technical University in Prague. SMODERP was originally developed as a profile (1D) surface runoff model. Later the model was redesigned using a spatially distributed approach; this version is named SMODERP2D. Ongoing development (https://github.com/storm-fsv-cvut/smoderp2d) focuses on obtaining parameters of the hydrological models, incorporating new infiltration and flow routing routines, and conceptualizing rill flow and rill development. The model belongs to the family of so-called GIS-based hydrological models, which use the capabilities of GIS software for geodata processing. Importantly, the SMODERP2D project is now entering the open source world. Originally the model could be run only on the proprietary Esri ArcGIS platform. The new version presented in this contribution adds support for two key open source GIS platforms, GRASS GIS and QGIS. The newly developed GRASS module and QGIS plugin significantly increase the accessibility of the SMODERP2D model for research and for engineering practice. Medium-scale distributed hydrological models often suffer from high computational costs and long runtimes. Long runtimes are caused by the high-resolution input data that are easily available nowadays. The project therefore also includes an experimental version of the SMODERP2D model that parallelizes the computations. The parallelization is done using TensorFlow, and its goal is to reduce the model runtime. It runs on both CPUs and GPUs. Parallelizing the computations is an important step towards providing SMODERP2D web processing services that allow quick and easy integration into highly specialized platforms such as Atlas Ltd.
28:17
Transcript: English (auto-generated)
So, welcome. As a chairman, it will be a schizophrenic feeling for me to give the floor to Martin Landa, so I will try my best. This is the last presentation of this session. I will try to close the door.
So, I'm going to present a project called SMODERP2D. It's one of the erosion models available on the market. I will present the first part, and the second part will be presented by my colleague, Ondřej Pešek.
But the real credit must go to our other colleagues, Jakub Jeřábek and Petr Kavka, who are the true authors of the project.
They are hydrologists. This is the important point. My colleague and I were responsible for the technical part, so from the programming point of view, and we introduced some new GIS functionality. We are from the Department of Geomatics at the Czech Technical University in Prague.
So, that's the important point. We are not hydrologists. We are programmers, or rather we are from the Department of Geomatics. So, if you want to have real fun, try to ask us some hydrology-related questions after the presentation.
I can promise that you will get hardly any answer from us. You can ask us about programming, about GIS, but not about hydrology. You can ask, of course. So, this was a small introduction. So, first of all, what is SMODERP2D? I will try to explain.
So, this is one of the erosion models. It's physically based, which is important to mention. It's a surface runoff and soil erosion model designed to do its computation in episodes.
It means that, from the erosion point of view, the big erosion events are usually caused by heavy rainfall, and in the terminology these are called episodes. The model is typically used to help design measures that could reduce or prevent big erosion events in a specified area of interest.
The project is written in Python. It's an open source project, available on GitHub since 2018,
and it's licensed under the GNU GPL version 3. So, these are the most important points about the project.
About the history: it's a long-running project, and the development is basically driven by my colleagues from the Department of Landscape Water Conservation at the Czech Technical University.
We, as people from the Department of Geomatics, joined the team about two years ago just to help with the programming part of the project. As far as I know, the development started many, many years ago.
Originally, it was developed as a surface runoff model simulated by a profile model, so it means the 1D version. It was called SMODERP 1D. Later it was redesigned as a spatially distributed model, so that's how SMODERP2D was born.
It was originally written in Fortran, so you can guess how old the software could be. They started development about 10, maybe 15 years ago, maybe longer. Then it was rewritten in Visual Basic, and currently it's written in Python.
So, what will I present? I will present recent work. We joined the team last year, and I will present some major refactoring and some functionality improvements which have been done recently.
Before that, just one more slide about the SMODERP2D model. It's a model which belongs to the family of so-called GIS-based hydrological models.
It means that GIS software is used for geospatial data processing.
It's based on a raster cell-by-cell mass balance calculation, which includes some key hydrological features. It's not the right word. I mean effective precipitation, surface runoff, and stream network routing.
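The cell-by-cell mass balance idea can be illustrated with a small Python sketch. This is not the SMODERP2D code: the function names, the units, and the simple "rainfall minus retention minus infiltration" form are assumptions made for illustration, based only on the abstract's statement that effective precipitation is reduced by surface retention and infiltration.

```python
def effective_precipitation(rain, retention, infiltration):
    """Rainfall left over for runoff after retention and infiltration.

    Hypothetical simplification: all values are depths in mm for one
    time step, and the result is never allowed to go negative.
    """
    return max(rain - retention - infiltration, 0.0)


def update_cell(storage, rain, retention, infiltration, inflow, outflow):
    """One mass-balance update of the water stored on a single raster cell.

    new storage = old storage + effective precipitation
                  + inflow from upslope cells - outflow to downslope cells
    """
    gain = effective_precipitation(rain, retention, infiltration)
    return storage + gain + inflow - outflow


# A single cell with made-up values (mm per time step).
new_storage = update_cell(
    storage=0.0, rain=5.0, retention=1.0, infiltration=1.5,
    inflow=1.0, outflow=0.5,
)
print(new_storage)
```

In a distributed model the same update is applied to every raster cell in every time step, which is why the per-cell cost dominates the runtime on high-resolution data.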
Stream network routing is done line by line using a user-defined polyline layer. If you really want to know something more about the project from the hydrological point
of view, there is a related paper which should be published soon. Okay, so, you see that nice ASCII logo? That's something I really like.
And how does SMODERP2D work? Basically, the workflow contains three major steps. At the beginning, the data are preprocessed, and this is the important point from my point of view, because in this part GIS software plays the major role.
Basically, there are two important parts. First, the soil properties are assigned to each polygon which defines the area of interest.
Then it's important to assign an order to the reaches of the watercourse network, and there are other tasks. I mean, it's quite complex. In the second part, the model computation starts.
It's important to mention that the computation is array-based. We are using the well-known NumPy library, which is not surprising.
And in the last part, the resulting data are stored in the output directory, and this part is done by the same GIS package which is used in the first part. So, that's the basic workflow, and this is maybe the most important slide. This is the overview of the key points I would like to present here.
So, ongoing development. The main goal was to do a major refactoring, because the code base was quite complicated and not well written.
Maybe the most important point was to clearly separate the data preparation package from the model computation package.
That was a crucial point, because thanks to the clear separation we were able to introduce support for other GIS packages. Originally, only the proprietary Esri ArcGIS platform was supported, so our goal was to introduce support for other GIS packages, namely two widely used open source platforms, GRASS GIS and QGIS.
At the end of the presentation, my colleague will present some experiments related to parallelization. Okay. So, those are the important items we are going to present.
On the right side, you can see the diagram, where you can clearly see that the data preparation package basically depends on the given or specified GIS software.
And then the second part, the model computation, is basically done by NumPy. Okay.
What I'm going to present are some GIS tools we designed and introduced into this project. These tools basically do similar things. There are three options. You can perform data preparation only, so the first step, or you can perform only the second step, the model computation, or, as you usually want, you can perform the whole workflow.
So, you start with data preprocessing, then the model computation, and usually you are interested in the resulting data.
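The separation described here, a GIS-specific data preparation step in front of a GIS-independent computation, together with the three run options, might look roughly like the following Python sketch. All class, function, and mode names are hypothetical and are not taken from the SMODERP2D code base; the "computation" is a stand-in.

```python
from abc import ABC, abstractmethod
from enum import Enum


class RunMode(Enum):
    """The three options mentioned in the talk (names are made up)."""
    DATA_PREPARATION = "data preparation only"
    COMPUTATION = "model computation only"
    FULL = "whole workflow"


class Provider(ABC):
    """GIS-specific part: one subclass per supported GIS platform."""

    @abstractmethod
    def prepare_data(self, inputs):
        """Turn user inputs into plain data the model can compute on."""


class DummyGrassProvider(Provider):
    """Stand-in for a real provider; it only copies the elevation values."""

    def prepare_data(self, inputs):
        return {"cells": list(inputs["elevation"])}


def compute(prepared):
    """GIS-independent part: works on prepared data only (placeholder math)."""
    return {"runoff": [2 * z for z in prepared["cells"]]}


def run(provider, mode, inputs=None, prepared=None):
    """Run one of the three options with the given provider."""
    if mode in (RunMode.DATA_PREPARATION, RunMode.FULL):
        prepared = provider.prepare_data(inputs)
    if mode is RunMode.DATA_PREPARATION:
        return prepared
    return compute(prepared)


result = run(DummyGrassProvider(), RunMode.FULL, inputs={"elevation": [1, 2]})
print(result)
```

Because `compute` never touches the provider, supporting another GIS platform in this sketch only means adding another `Provider` subclass.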
Could it be? In this part, as I already mentioned, SMODERP2D currently supports three different GIS platforms. Originally, only ArcGIS version 10 was supported. We introduced support for ArcGIS Pro, the new generation of this software, and for two open source platforms, GRASS and QGIS.
So, first of all, I will start with ArcGIS because it was supported originally, so it makes sense to start with that.
So, there is a standard ArcGIS toolbox which allows you to run the data preparation part, the model computation, or the whole workflow. You need to fill in some input parameters.
The important thing is that the data preparation part is, not surprisingly, performed by native ArcGIS tools. Why not? The whole project is written in Python, and the GIS functionality is called via the well-known Python library ArcPy.
Because we also wanted to support the funny ArcGIS Pro version, we introduced support for Python 3, which makes sense.
If you are programming in Python, you know that Python 2 will officially be almost dead in a few months. It will not be supported, but it's still used in ArcGIS 10, so it will remain for a while. All the code, including the ArcGIS toolbox, is available on GitHub.
So, now let's switch to the second supported platform, which is GRASS GIS. It means that a new tool for GRASS was developed.
In GRASS terminology, it's a GRASS module or GRASS add-on. An add-on is like a plug-in, something like that. The tool is called r.smoderp2d and it does the same as the ArcGIS toolbox. It allows you to perform all the steps or only specified steps.
It can be installed like other add-on tools in GRASS, using the g.extension module. And, of course, the data preparation, the GIS part, is performed by the GRASS provider with native GRASS tools, using the PyGRASS library.
PyGRASS was used because there are basically two Python libraries you can use in GRASS. PyGRASS allows you to run existing GRASS modules, but you can also perform some GIS computation using the Python API.
So, on the right side, you can see part of the code. The vector map is opened and you are doing some tricks using the Python API.
So, you don't need to call only existing GRASS modules or tools. The important point is that the GRASS tool assumes that GRASS is running in a Python 3 environment. If you are using GRASS, you may be surprised, because the latest GRASS version, 7.6, still runs in a Python 2 environment.
So, basically, this tool requires version 7.8, which is the first version of GRASS that runs in a Python 3 environment.
It's quite fresh stuff. The first release candidate of this GRASS version will be available in just a few days. Okay, so that was GRASS. Now QGIS. This is the last supported platform I will be speaking about.
So, a new QGIS plugin was developed. It was developed for QGIS 3, which is understandable because it's the current long-term release version.
It's also available from GitHub. Of course, when the final release is available, we are planning to upload it to the official QGIS repository, but it will take some time. Data preparation is done, surprisingly, not by QGIS but by GRASS.
So, it means that the QGIS plugin depends on GRASS, which is not such a big problem because usually, especially for Windows users, if you install QGIS on your Windows machine, you will also get GRASS installed. So, it's not such a big deal.
So, the data preparation part is performed by GRASS and not by QGIS. Because QGIS 3 runs in a Python 3 environment, we also require at least GRASS GIS 7.8.
So, currently, it's quite tricky to establish a working environment because GRASS 7.8 hasn't been released yet, but it will happen quite soon. On the right side, you can see the basic workflow of how the QGIS plugin works.
So, basically, the input parameters which were given by the user are loaded. Then, to perform a computation in GRASS, you need to have some GRASS location available.
A GRASS location is something like a project; if you don't know GRASS, let's call it a project. So, GRASS creates a temporary location somewhere.
It loads the data, imported or linked; it depends: rasters are linked, while vector data are imported because of the topology cleaning, let's say. And then the plugin can run. It can do the same thing as the ArcGIS toolbox.
It will do the data preprocessing and start the model computation using NumPy. And there's the last step. Of course, as a QGIS user, you would like to see the resulting data in a map canvas, so that must be done somehow.
And now it's time to introduce my colleague, Ondřej Pešek.
He's studying for a PhD at the Czech Technical University and he spent some time experimenting with TensorFlow, which is quite fancy stuff. So, he will explain to you what he did.
So, my role in this project was to make all the computations faster because the speed limits of SMODERP2D were one of the most crucial aspects of the algorithm. So, to reduce the computation time, we decided to parallelize the computations after the data preprocessing, or preparation, part.
To do this, the first step was to change the loop-based computations into matrix-based ones, because in the old code, even though it uses NumPy, there is a NumPy array for every pixel, and because the code is written in Python, it is extremely slow to loop through every pixel.
So, it is much faster to use matrices for this. And for this, we decided to use TensorFlow, also because TensorFlow supports parallelization for both CPUs and GPUs. So, when a user doesn't have access to GPUs, they can still run the parallelized code on CPUs.
And in some places there is still NumPy, because sometimes it wasn't possible to avoid the loops, and it is incomparably faster to loop through a NumPy array than through a TensorFlow tensor.
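The difference between looping over pixels and working on whole matrices can be shown with a minimal NumPy sketch. The per-cell operation here (subtracting an infiltration depth and clipping at zero) is a made-up example, not a routine from SMODERP2D; both functions compute the same result, but the second one leaves the loop to NumPy's compiled code.

```python
import numpy as np


def reduce_depth_loop(depth, infiltration):
    """Pixel-by-pixel version: a Python-level loop over every raster cell."""
    out = np.empty_like(depth)
    rows, cols = depth.shape
    for i in range(rows):
        for j in range(cols):
            out[i, j] = max(depth[i, j] - infiltration[i, j], 0.0)
    return out


def reduce_depth_array(depth, infiltration):
    """Matrix-based version: one expression over the whole raster."""
    return np.maximum(depth - infiltration, 0.0)


# Small made-up rasters; real inputs would have many more cells.
depth = np.array([[5.0, 1.0], [0.5, 3.0]])
infiltration = np.array([[1.5, 1.5], [1.5, 1.5]])

loop_result = reduce_depth_loop(depth, infiltration)
array_result = reduce_depth_array(depth, infiltration)
print(np.array_equal(loop_result, array_result))
```

The matrix-based form is also the one that maps naturally onto tensor libraries, since each operation is expressed over the whole array rather than cell by cell.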
The strength of TensorFlow is its use of so-called computation graphs. It looks like this: you can imagine a mathematical operation. You divide it into variables and operations, and you can see that some operations have prerequisites.
You need to know all the variables coming into such an operation. But sometimes, for example with this multiplication and this addition, the operations do not depend on each other.
So, there is no reason to run them one after another. They can be done in parallel. And the nice thing about TensorFlow is that it first creates this computation graph.
And then, when something does not depend on other stuff, it directly sends these computations to different cores for CPUs or different threads for GPUs. Here are some results of this parallelized branch.
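The scheduling idea behind a computation graph can be sketched in plain Python without TensorFlow. The graph below, the node names, and the `evaluate` helper are all hypothetical; the sketch only shows how operations whose prerequisites are known can be dispatched together, and it uses a thread pool as a stand-in for cores or GPU threads (for such tiny operations in CPython this gives no real speedup).

```python
from concurrent.futures import ThreadPoolExecutor
import operator

# Each node: (operation, names of the values it needs).
# "mul" and "add" do not depend on each other; "out" needs both.
GRAPH = {
    "mul": (operator.mul, ("a", "b")),
    "add": (operator.add, ("c", "d")),
    "out": (operator.sub, ("mul", "add")),
}


def evaluate(graph, inputs):
    """Evaluate the graph wave by wave, running independent nodes together."""
    values = dict(inputs)
    pending = dict(graph)
    waves = []
    with ThreadPoolExecutor() as pool:
        while pending:
            # A node is ready once all of its prerequisites have values.
            ready = sorted(
                name for name, (_, deps) in pending.items()
                if all(dep in values for dep in deps)
            )
            if not ready:
                raise ValueError("unresolvable dependency in graph")
            futures = {
                name: pool.submit(
                    pending[name][0], *[values[d] for d in pending[name][1]]
                )
                for name in ready
            }
            for name, future in futures.items():
                values[name] = future.result()
                del pending[name]
            waves.append(ready)
    return values, waves


values, waves = evaluate(GRAPH, {"a": 2, "b": 3, "c": 4, "d": 5})
print(waves)          # which nodes were dispatched together
print(values["out"])  # (a * b) - (c + d)
```

Here the multiplication and the addition end up in the same wave, while the subtraction has to wait for both, which mirrors the prerequisite structure described in the talk.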
It is called experimental because not everything in the code has been rewritten yet. And also because I want to show you that it doesn't make sense every time. Because, for example, the word GPU is like a magic word nowadays, and everyone who wants to look like a good nerd says that he's going to run his code on GPUs.
But beware. I don't want to sell your dreams down the river, but it doesn't make sense every time. As you can see, this is the important table. The other one just explains the architectures used.
And so, I would like to show the last column, which shows that the parallelized branch is much faster than the single-CPU one. It's almost two times faster. So, this is the column showing that the parallelization works.
And this is the column showing that the parallelization doesn't work. Because for extremely small data, the parallelized version is 10 times or even 20 times slower. And the crucial reason for this is the graph initialization of TensorFlow, because it takes some time.
So, if the computation itself is really fast, then it doesn't make any sense to spend time on the graph initialization.
And it is even much slower for GPUs, because the transfer of data between your random access memory and the GPU's internal memory also takes some time.
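The trade-off between fixed overhead and per-cell speedup can be made concrete with a toy cost model. Every number and name below is a hypothetical placeholder, not a measurement from the talk; the point is only that a fixed start-up cost (graph initialization) plus a per-cell transfer cost must be paid back by the faster per-cell computation.

```python
def serial_time(n_cells, t_cell):
    """Runtime of the plain version: every cell costs t_cell seconds."""
    return n_cells * t_cell


def parallel_time(n_cells, t_cell, speedup, t_init, t_transfer=0.0):
    """Runtime of the parallel version.

    t_init      fixed cost paid once (e.g. building the graph)
    t_transfer  extra cost per cell for moving data (e.g. RAM to GPU memory)
    speedup     factor by which the per-cell computation gets faster
    """
    return t_init + n_cells * t_transfer + n_cells * t_cell / speedup


def break_even_cells(t_cell, speedup, t_init, t_transfer=0.0):
    """Number of cells above which the parallel version starts to win."""
    gain_per_cell = t_cell - t_cell / speedup - t_transfer
    if gain_per_cell <= 0:
        return None  # the parallel version never catches up
    return t_init / gain_per_cell


# Made-up parameters: 2 ms per cell, 2x speedup, 5 s start-up cost.
T_CELL, SPEEDUP, T_INIT = 0.002, 2.0, 5.0

for n in (1_000, 100_000):
    print(n, serial_time(n, T_CELL), parallel_time(n, T_CELL, SPEEDUP, T_INIT))
print(break_even_cells(T_CELL, SPEEDUP, T_INIT))
```

With these placeholder values the small raster is slower in parallel because the start-up cost dominates, while the large raster approaches the full speedup; a non-zero `t_transfer` pushes the break-even point further out.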
So, this is just to show the reason why it will always be just a branch and not the master version. Sometimes it makes sense, especially if you are working with bigger data: you can save a lot of time by using the parallelized branch.
But if you are working with just a small area, forget about it. Forget about the fact that everywhere you have seen that the GPU is the right choice. Because we are running out of time, just a few words.
We are also experimenting with CPU-based parallelization. The key idea is that you split the area into irregular tiles based on sub-catchments, but maybe we can skip it for now because we are running out of time.
So, thanks for your attention. First of all, you can follow us on GitHub. You can enjoy filing some bug reports. We are close to some release candidates.
I hope that during autumn or in winter we will release some release candidates, like a beta version, and the final version will, I hope, be released around the beginning of next year. We tried to make our presentation as long as possible to avoid questions, as you can see. It was a joke.
Please, if you have any questions, you can ask. And you know the lunch is waiting for you. Okay, do you have some questions? In case you are a hydrologist, feel free to contact my colleagues, Petr Kavka and Jakub Jeřábek. They will be happy to discuss with you. They like to discuss, let's say. So, do you have some questions?
You can ask, but you will just hear nothing. Or maybe you can try and I can ask my colleagues. Maybe we will be able to answer. What kind of rain data do you have? Sorry?
What kind of rain data do you import? The data? Rain data. Rain data. Well, there are some, yeah, maybe during the break I can show you some examples. Maybe that's better. And feel free to contact us by mail; unfortunately, I'm not able to answer, as I promised.
Okay, another question which I cannot answer. Thanks a lot for your attention and enjoy lunch.