PALM – a story of developing and maintaining a scientific model system

Formal Metadata

Title: PALM – a story of developing and maintaining a scientific model system
Number of Parts: 60
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
PALM is an advanced and modern meteorological model system for atmospheric and oceanic boundary-layer flows. It has been developed as a turbulence-resolving large-eddy simulation (LES) model that is especially designed to run on massively parallel computer architectures, including GPUs. In our talk we give a short introduction to the model system PALM and an overview of how we manage to develop and maintain the software. This includes a look at how we do source-code management, automated testing, and code documentation. We also report on the many problems and difficulties that arise when working on an ever-growing software package with a team that does not originally come from the field of computer science.
Transcript: English (auto-generated)
Welcome to my talk. The topic of my talk is the model PALM. PALM is a large-eddy simulation model, and I would like to give you a short summary of the story of developing and maintaining a scientific code like PALM.
So first, what is PALM? PALM is a large-eddy simulation model, so it is a CFD code, and we can simulate dust devils, wind energy, aviation, urban meteorology and city planning, and cloud physics. Basically we simulate the atmosphere within the lowest few hundred meters up to one to two kilometers above the surface, and everything that is in there is simulated by PALM.
So what does PALM do at the base of the code? We solve the Navier-Stokes equations, which I think every CFD code does, and we solve them on a Cartesian grid. Within the code we have numerous different numerical schemes. For example, we use a third-order Runge-Kutta time-stepping scheme and a fifth-order differencing scheme for advection, and we have different pressure solvers, an FFT solver and a multigrid solver. We also use a lot of other numerical codes and schemes.
To run PALM we use a 2D domain decomposition realized using MPI. If you think of this block as being the atmosphere, then we divide the atmosphere into these vertical columns, and one core, one CPU, solves all the equations within its column. Then we use MPI to communicate between the different cores. But that's not all: we also use OpenMP and OpenACC for parallelization within one of these columns. So we try to use everything we can to get the code parallel and quite fast. We also achieved quite good scalability. We tested this with up to 40,000 cores, and this is the graph, so it scales quite well, I would say.
So these are some facts. Now, to give you a somewhat nicer introduction to PALM, I would like to show you a short video. There are also other videos; we have a YouTube channel where you can have a look. So what does PALM do? It is from the Institute of Meteorology and Climatology in Hannover, and for example we can simulate dust devils. Dust devils are like these little guys in the atmosphere, where we can simulate the dust whirling around in eddies. We can also simulate shallow cumulus clouds and the interaction between the clouds and the surface below the clouds. We can also simulate the cloud particles with a Lagrangian particle model. We are able to simulate wind turbines and the turbulence around them. Can I go any faster than this? I guess not.
Okay, so basically we can do a lot of things, and this is then the first challenge of maintaining the code. In this graph I tried to put everything we can simulate with PALM, and I guess I even missed some things, but to name just a few: we can of course simulate the general dynamics, so the wind through the atmosphere. We can simulate the buildings, so we can resolve the buildings, which is not the case with every CFD code. We simulate the radiation effects, so we actually have a sun in our simulation which shoots its rays through the city. We can simulate the chemistry within the atmosphere. We have a soil model, so we also simulate how temperature and humidity, or the water, go into the soil and out of the soil into the atmosphere. We have a plant canopy model, so we have actual trees in our simulation and the interactions between the trees and the atmosphere. Of course, as we do meteorology, we also have clouds and rain. Unfortunately no snow, but this will come soon. We have a Lagrangian particle model, which is a large model by itself, but it is implemented into PALM and is part of PALM. We have a multi-agent model. This means we can simulate people running around our city and the effects of wind, temperature and humidity on those people. We have a wind energy model, so we can also simulate the energy gain from wind turbines and the turbulence effect of a wind turbine on the atmosphere. We have a flight module, so we do not simulate actual planes, but we can simulate flights through the atmosphere and measure data along the flight path. We also have an ocean model, so we can switch the model, and then we no longer simulate the atmosphere but the ocean, which is basically the same with a different density.
So we have a large list of features in the code, and this is also the first challenge: the code base. We have an ever-growing list of demands for new features. Although we already have a lot of features, there are still new features which are to be implemented. This is a very big challenge. Then we have to deal with legacy code. Our code base is over 20 years old. It is still alive and still being developed further, but parts of the code are already over 20 years old and could be improved in some places. It has also been growing quite fast in recent years, which is another big challenge of the code base.
This is a graph showing the total number of files within our repository. It starts in 2008, and within the past one and a half years the number of files went up from about 2,000 to nearly 6,000, so the code base is growing very fast. The number of commits is also growing, and within the past two years it has been growing even faster. The lines of code are also quite interesting: we started with about 50,000 lines of code and now we are at nearly 250,000, so it is quite a large code base.
This brings me to the next challenge, the performance and the parallelization of this very large code base. We already have good performance, but when we implement more features we have to keep this performance. To do this, newly developed features must support the parallelization, which means they have to support MPI, OpenMP and also OpenACC. This is a big challenge, because the developers who develop all the new features have to be familiar with the parallelization techniques in order to keep the performance and to keep PALM as good as it is now.
This brings me to the next challenge. Not only the code base is a challenge; the developers themselves are also a big challenge for the sustainability of the code. PALM is usually developed by a team consisting of scientists rather than trained software engineers or software developers. This is a picture of our group. As we do not come from computer science, modern software development methods and tools are mostly unknown to us. We are meteorologists, and, well, we are now trying to learn something about unit testing and all these things, but they were unknown to us for quite a long time.
This brings me to the solutions with which we try to tackle these challenges, or at least our approaches to solutions. There are many approaches. First is the coding standard: we implemented a coding standard for our code. We now have internal performance monitoring. We use a Jenkins test server and a CI pipeline. We have user documentation and tutorials. We have a feedback system via an issue tracking system on our website. We increase our performance know-how by joining hackathons, GPU workshops, CPU workshops and computing conferences. We also have internal PALM workshops to train first our own team and then also our users in how to use the code. And a big thing we are currently doing is a design rethinking and refactoring, so we try to do some change management in our code base.
I will now go into more detail on some of these points. First, the coding standard. It is a list of rules with examples to ensure a unified code structure. We have different developers, and they all have their own way of writing code. So that our code base is somewhat homogeneous throughout, we try to teach our developers to code according to certain rules. So we have the PALM coding rules. There are quite a lot of them. It is a list of rules on how to write code, and it is all publicly available on our website, so every developer can have a look at it and should also code according to this coding standard.
Then we have internal performance monitoring. This means we have runtime measurements placed around some code parts. I think I didn't mention it, but as you might see, we use Fortran, and we use some subroutines of our own to measure how long certain parts of the code take to execute. After each simulation, or each run of the code, we write output to a so-called monitoring file, where we list the runtime of all the different parts of the code, so that we can always monitor whether some code parts are getting slower and slower as development moves on.
Then we also use a Jenkins test server.
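As an aside on the internal performance monitoring described above, the idea of wrapping code parts in runtime measurements and listing them after each run can be sketched roughly as follows. This is a minimal illustration in Python, not PALM's Fortran implementation; the class `RuntimeLog`, the function `run_demo`, the region names and the report layout are all hypothetical and chosen only for this example.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class RuntimeLog:
    """Accumulates wall-clock time per named code region (hypothetical helper)."""

    def __init__(self):
        self.seconds = defaultdict(float)  # total time spent in each region
        self.calls = defaultdict(int)      # how often each region was entered

    @contextmanager
    def measure(self, region):
        """Time the enclosed block and add the result to the region's total."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.seconds[region] += time.perf_counter() - start
            self.calls[region] += 1

    def report(self):
        """Return a plain-text table, slowest region first.

        Shares are relative to the sum of all measured regions, so the
        regions should not be nested inside each other.
        """
        total = sum(self.seconds.values())
        lines = [f"{'region':<20}{'calls':>8}{'seconds':>12}{'share %':>10}"]
        for region, secs in sorted(self.seconds.items(), key=lambda kv: -kv[1]):
            share = 100.0 * secs / total if total > 0 else 0.0
            lines.append(
                f"{region:<20}{self.calls[region]:>8}{secs:>12.6f}{share:>10.2f}"
            )
        return "\n".join(lines)


def run_demo(steps=3):
    """Stand-in for a model run: two dummy code parts per time step."""
    log = RuntimeLog()
    for _ in range(steps):
        with log.measure("advection"):
            sum(i * i for i in range(20_000))
        with log.measure("pressure_solver"):
            sum(i * i for i in range(50_000))
    return log


if __name__ == "__main__":
    # A real run would write this table to a file after the simulation.
    print(run_demo().report())
```

Comparing such tables between code revisions is one way to notice that a particular code part has become slower over time.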
I think most of you already know what a Jenkins test server is. It is automated testing with a Jenkins server, and we introduced this only in 2017, although the code development started in 1997.
So after 20 years we finally have some automated testing, and now we also publish the results of the tests on our website, so that everybody sees whether the code is broken or not. This is also quite good for the developers, so that they try to be good at developing and make sure that the code does not break. But the tests have to cover the entire code base, and this is still not achieved. We are still trying to cover most of the code base, but this takes some time. And you also have to train the developers to actually use the tests and to develop new tests. At the moment it still happens quite frequently, maybe once every two weeks, that the build does not pass. This is because our developers are still not that familiar with how to use the test server in advance, before submitting to the trunk, that is, to the official code base.
Then we have an issue tracking system. On our website, when you log in as a user, you can submit an issue or a ticket, and then we try to figure out what the bug is, or whether there is a bug at all or the user just doesn't know how to use the code, in which case we have to fix the documentation, of course. But then there is another problem. As PALM is getting more popular, the number of tickets is also rising to quite a high number, and as we are a small development team, we have to try to keep up with the number of tickets.
Then hackathons and internal workshops. We always try to get more involved in software development within our team. We attend hackathons to improve our code performance, and we also hold seminars to train our users and our development team internally. I think almost twice a year now we give a so-called PALM seminar to train our developers and also our users.
Okay, and this already brings me to the summary. What I showed is that we have a highly complex simulation model. It has many features and is highly performance-optimized. This brings a high demand for sophisticated development methods and tools, and training of the development team and, unfortunately, also of the users is necessary to correctly use this very complex code. Then we also have the problem that the development team has little to no IT background. This means that the solutions to problems which we use are often far from ideal, and modern development methods and tools take quite a long time to be adopted into our workflow. For example, as I mentioned, it took over 20 years for the automated tests, or any tests, to be used.
This brings me to my end. Thank you for listening, and if you have any questions, comments or suggestions about what we do wrong or what we can improve, I am very happy to hear your thoughts. Thank you.
Yeah, thank you, Tobias, for this clear talk. I recall my suggestion that we postpone discussion about the management and maintenance aspects to the end of the session, and now take other questions regarding the application and the technology.
I don't know, I might forget my question, so I ask it now. You talked about those tickets, and you had accepted tickets, so what is this?
The accepted tickets are just a feature of the Trac system we use on our server. You can write a ticket as a user, then it is on the website, and as long as no developer is assigned to the ticket, it is just an open ticket. For a long time we did not assign a specific developer to the tickets, so this is why this red line is quite small, but usually those two lines should be on the same level, and normally they are.
Further questions? Up there. I repeat the question for the benefit of the video stream: is that a uniform Cartesian grid, or is the grid adapted to the structures at the ground?
We use a uniform grid, or rather a rectilinear grid. It is stretched along one direction, the z direction, but in x and y we have a fixed distance between the grid points.
And I have a related question. In the beginning you showed simulations at different scales: clouds building up over a vast landscape, and then wind blowing through a street. These are different scales. Is it the same physics, the same partial differential equations and the same algorithm at these different scales?
Yes, it is always the same algorithm at the different scales. What we change is just the grid size. In this simulation, for example, where we simulated a dust devil, we have a grid size of 0.1 meter, I guess. In other simulations where we simulate clouds, if we only simulate one cloud, then we might also be at a size of one meter. In this simulation, for example, we have a grid size of 25 meters. So we use a different grid, so to say, a different grid size, and then we can cover the different scales of the simulation.
And when it then comes to urban climate, where do you take your ground model from? Do you have a database with all the houses?
The database is the big issue there. Right now we have quite a large project running, or actually it was running until the end of May, where we got all the data from all the different people who own the data. We got the building heights from the ministry, and we have some other colleagues getting all the data on the trees, for example. Oh, this is a nice example: here in this simulation we also had the data for the wall materials of the buildings, and it was a research topic in itself to get all the wall materials of the buildings. So getting the data, the starting point of the data, is always quite a challenging task, and it is case-specific.
Yeah, thank you. Any more urgent questions? Yes? Maybe we come back to that at the end and discuss that with all speakers.
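For readers who want a concrete picture of the 2D domain decomposition mentioned in the talk, where the horizontal plane is split among processes and each process keeps a full vertical column, the following is a minimal sketch in plain Python. It does not use MPI and is not PALM code; the function names `split_range`, `decompose_2d` and `neighbours`, the rank ordering, and the assumption of cyclic lateral boundaries are all illustrative choices.

```python
def split_range(n, parts, index):
    """Return (start, stop) of the index-th of `parts` near-equal chunks of range(n)."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def decompose_2d(nx, ny, nz, px, py):
    """Map each rank of a px-by-py process grid to its subdomain.

    Only x and y are split; every rank keeps all nz vertical levels,
    so each subdomain is a vertical column of the model domain.
    """
    subdomains = {}
    for rank in range(px * py):
        ix, iy = divmod(rank, py)  # assumed row-major rank ordering
        subdomains[rank] = {
            "x": split_range(nx, px, ix),
            "y": split_range(ny, py, iy),
            "z": (0, nz),
        }
    return subdomains


def neighbours(rank, px, py):
    """Ranks a column exchanges boundary data with (cyclic boundaries assumed)."""
    ix, iy = divmod(rank, py)
    return {
        "left": ((ix - 1) % px) * py + iy,
        "right": ((ix + 1) % px) * py + iy,
        "south": ix * py + (iy - 1) % py,
        "north": ix * py + (iy + 1) % py,
    }


if __name__ == "__main__":
    # 8 x 6 horizontal grid points, 10 levels, distributed over 2 x 3 processes.
    for rank, sub in decompose_2d(8, 6, 10, 2, 3).items():
        print(rank, sub, neighbours(rank, 2, 3))
```

In an MPI code, each rank would compute only on its own index ranges and exchange the boundary layers of its column with the four neighbouring ranks after each step.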