Code to Last: Bridging the Gap in Academic Software Sustainability
Formal Metadata
Number of Parts | 17
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/67779 (DOI)
Production Place | Kaiserslautern
Leibniz MMS Days 2024, 3 / 17
Transcript: English (auto-generated)
00:06
I'm really honored to be here and to present here, and I would also like to thank my hosts, because the visit was fantastic. I really liked to actually see people doing things in real life, instead of me just playing with computers for
00:21
video games for most of my life. So let me start with a consideration: for most of us, and for the people that actually work here, what we do boils down to software development, essentially. Our typical workflow, and many of us will relate to this slide in some sense,
00:43
is to start from some sort of data that comes from real life. Usually it's either a CAD model that somebody decided they want to study and simulate, or a medical image, or just a geometry definition (a square or a circle, for mathematicians, most of the time). And then from there you go to a
01:02
computational mesh, then you build a mathematical model, and after that you start with the numerical discretization: you make a simulation, you do some post-processing, improve, repeat, compare with the experiments. Essentially, independently of the type of computational model that you want to use, this is what you're going to be doing if you're a researcher in this
01:24
area, in some sense. The idea here is that typically, if you're an industrial partner, if you're one of the users of these types of models, you're interested in ready-made solutions: you want to have software to do this without you having to intervene, and you want to integrate this with existing
01:42
tools. For example, you would like to be able to match your experiments with the data that are produced by the software, and you want to make sure that the inputs and outputs correspond. That's not a trivial task, but this is what you will look for when you're a user, most of the time. And as academic developers, usually you are part of a
02:03
developer, or at least part of the frameworks: we don't really use the entire framework most of the time as academic developers. We concentrate maybe on the model, or maybe on writing the software that goes from the numerical approximation to the post-processing of the things, and so on
02:21
and so forth. And typically we have different needs in the industrial and the academic scenes. What I'm going to talk about today is something that comes from my own experiments and from my own experience. So typically, what we would like to have, if you want to compete
02:40
and to talk with industry, is ideally something that is capable of treating realistic complex geometries in 2D and 3D. You want to be able to treat multiphysics problems, you want to do linear solvers and nonlinear solvers in an efficient way, you want to do adaptive meshes if this is required by your simulations, you want to do high-order degrees, and you
03:03
want to do efficient solvers, typically multigrid solvers, either algebraic or geometric, and you want to be able to run these things on many computers. So this is what you would like to have in the ideal world. If somebody comes to you and says: I want to solve this problem, this is a difficult multiphysics problem, I would like to have these
03:23
characteristics: all of this comes with a very high cost. It's not a cost in terms of money; it's a cost in terms of knowledge, and it's a cost in terms of what we can actually do. The idea here is that if you want to do this from scratch, it requires at least 100,000 lines of code. So keep
03:45
this little number in mind, because, for those of you who have been here this morning, the bitter reality that we actually face most of the time is the following: most research software in academia is developed by
04:02
graduate students or PhD students, maintained by postdocs, who are supervised by faculty. There's nothing bad about this in general. The main issue here is that the people that actually write the software, the PhD students, don't really have a good overview of what the existing
04:20
software is, right? They don't have software-writing experience, and they also have very little incentive to write good software. If you talk to a PhD student, their primary goal is not to actually write good-quality software. That's your goal as an advisor, to have somebody that writes good-quality software, but not the other way around. And when you progress in
04:43
your career, that's even worse, because your time decreases with the career stage. You maintain software when you're a postdoc, and you have very little time, because you have to publish, you have to find a permanent position somewhere, so you're fighting for things that have nothing to do with software writing. And when
05:04
you're faculty, it's even worse: you have no time for writing software. It's very difficult for faculty and PIs to sit down and actually write code. If they do, they are very lucky, because they are doing something that they usually like very much, and it means that they are finding quality time in their research. But most of the time you have to find money, you
05:22
have to do politics, lots of things, but not writing things. And very often the faculty don't really have software-writing experience either, right? So the question here is: are we doomed? This seems like a very bad scenario, and the reality is that it is a very bad scenario. So why are we so
05:44
bad? Let me put it in the words of some good people here. This is Greg Wilson and many other authors, some time ago, ten years ago, right? It's a very nice paper, "Best Practices for Scientific Computing", and what they say here is that scientists spend an increasing amount of time
06:03
building and using software; however, most scientists are never taught how to do this efficiently. So you're never taught in practice how to do that, and many are unaware of tools and practices. Students don't know exactly
06:20
how to do things in the right way, and especially it's very difficult to write things in a reliable and maintainable way. Let me make this a little bit more straightforward, because the reality is even worse than what it looks like. If you think about this question, why are we so bad, there are at least two more things that one has to consider. The first thing
06:45
is how much we overestimate our capability of writing good software. This is a person, Tobias Kuipers, who's an expert in code maintainability; he has founded a company that actually does code maintainability, okay?
07:00
And this is a very nice blog article, "Why you need to know about code maintainability". The article analyzes several thousand GitHub repositories and tries to figure out what the average lifetime of projects is, the time that is spent in maintaining the projects, and, on average,
07:22
how much any developer is capable of writing during one year. The numbers are scary, because what he says is the following: a good developer manages about 10,000 lines per year. 10,000 lines per year is nothing, okay, considering the 100,000 lines that are needed for a software to be at least good software.
07:45
And that's okay, it's a fact: 10,000 lines per year. Then the second thing that he actually observes is that if you leave a software as is for one year, your software will stop functioning. That's also very, very bad, and it's also
08:02
very true. If you take your software, you write it today and you leave it there, and then come back after one year and try to install it on your new machine, because you just bought a new machine, the compiler will not compile exactly the same thing as before. A few things will have to be changed: maybe the Python version has changed from 2.7 to 3-something, or
08:20
rather from 3.1 to 3.8, and things don't work exactly in the same way. So on average, what they say is that you need about 15% of the code to be readjusted per year. That's just to maintain the code running, just to keep the wheel turning. If you put these two things together, this is
08:42
a very scary number, because it says that, if you write code, every 66,000 lines of code require a full-time developer just to keep it running. Now let me pause here for a few seconds. If you think about writing your own finite element solver in 3D, with the algebraic multigrid, and the
09:05
code surpasses 66,000 lines, all right, which is six years of work, roughly, then after six years of work you will have to hire somebody just to maintain the code that you wrote in six years. So what is the biggest consequence of this?
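The arithmetic behind the 66,000-line figure can be reproduced in a few lines. What follows is a minimal sketch in Python, not something shown in the talk: it uses only the two quoted figures (10,000 lines per developer per year, 15% of the code readjusted per year) and assumes, as a simplification, that readjusting a line costs as much effort as writing one. All function and constant names are hypothetical.

```python
# Back-of-the-envelope maintenance arithmetic.
# Assumption (not from the talk): readjusting one line of existing code
# costs the same effort as writing one new line.

LINES_PER_DEVELOPER_YEAR = 10_000  # figure quoted in the talk
REWORK_FRACTION_PER_YEAR = 0.15    # figure quoted in the talk


def maintainable_lines(lines_per_year=LINES_PER_DEVELOPER_YEAR,
                       rework_fraction=REWORK_FRACTION_PER_YEAR):
    """Code size at which yearly rework fills one developer's whole year."""
    return lines_per_year / rework_fraction


def years_to_write(total_lines, lines_per_year=LINES_PER_DEVELOPER_YEAR):
    """Years one developer needs to write total_lines from scratch."""
    return total_lines / lines_per_year


def maintainers_needed(total_lines,
                       lines_per_year=LINES_PER_DEVELOPER_YEAR,
                       rework_fraction=REWORK_FRACTION_PER_YEAR):
    """Full-time developers needed only to keep total_lines running."""
    return total_lines * rework_fraction / lines_per_year


if __name__ == "__main__":
    print(f"one developer can maintain about {maintainable_lines():,.0f} lines")
    print(f"writing 66,000 lines takes about {years_to_write(66_000):.1f} years")
    print(f"100,000 lines need about {maintainers_needed(100_000):.1f} maintainers")
```

Under these assumptions the break-even size is 10,000 / 0.15, roughly 66,700 lines, which is consistent with the rounded figure of 66,000 lines and the "six years of work" mentioned above.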
09:24
Well, the biggest consequence of this is that the lifespan of most research projects is on average five to seven years, as you can find on GitHub. Just go to GitHub, look at the first commit, look at the last commit of the inactive projects, and you will see: as soon as the project has become a
09:41
little bit too big, it dies. It's simply because PhD students move from one place to another, postdocs move from one place to another, and it's very difficult to keep up with things. This is exactly what you were discussing this morning, right, for those of you who were here this morning in the discussion. So we must change perspective. Of course, if your code
10:02
relies on five lines and you import the package that we were talking about before, that's a different story: you have to maintain five lines of code, you don't have to maintain 66,000 lines of code. So the idea is that we must change perspective if we want to keep our code maintainable. We want to reuse,
10:21
or use, existing software libraries as much as possible. That's the first rule for everyone. We have to keep the amount of code that you write and need to maintain at a minimum, and for this you need good software frameworks. And of course, when you write code, when you really need to write code,
10:42
you have to write code with sustainability in mind. Of course there are cases in which you cannot do anything else but write your own piece of software, because it's not there, it doesn't exist. If you write it with sustainability in mind, there is a chance that maybe not one person works full-time on it, but ten people work one month per year. So the perspective
11:06
that I will present today is a perspective that has been used in the last 25 years by a library which is called the deal.II library, and this is the one that I have experience with. So I'm just telling you what our experience is in 25 years of developing this library. There are typically two
11:23
different perspectives, which are both equally good, according to what you need to do. On one side you have the user perspective, and this is the one that actually uses frameworks rather than libraries, or libraries within frameworks, or libraries within scripting languages. So, for example, your code requires one import and then five lines of code, and you're able to
11:46
do your analysis. Fantastic: that's the user perspective, and there are many examples in the finite element community that actually do that. For example FEniCS, Firedrake, NGSolve: these are very good examples. They give you five lines of code to solve the Poisson problem on a square. So if you want to
12:02
solve the Poisson problem on a circle, well, you change two lines of code. Perfectly good. They usually have graphical user interfaces or scripting languages as a way to access those codes, and it's a very low entry barrier for new users. So these are very attractive if you want to be a user of these libraries. The problem is that if you arrive at a point
12:24
in which things don't work, or don't do exactly what you need, the barrier to go from user to developer is extremely high. It's very difficult to become a developer for TensorFlow if you use TensorFlow: it's very easy to use TensorFlow, but if you want to become a developer for TensorFlow, that requires many years of experience. The other perspective is the developer
12:45
perspective. It's the one that likes to get its hands dirty, in a certain sense, and you start from scratch with the hardcore things, the difficult things. Here you have BLAS, LAPACK, MPI, PETSc, Trilinos. These are all libraries: they don't provide you solutions, they provide you the tools. It's like giving you the Lego blocks, and then you say: I'm building my house
13:04
from scratch with the Lego blocks. I don't build each Lego block, but at least I have a box of pieces that I can use and combine together. Usually they offer command-line interfaces and are in compiled languages. They have very high entry barriers for new users, so it takes a long time to get used to
13:20
this software-developer type of perspective, but there is a very low barrier to go from user to developer. So this has advantages and disadvantages. If you're an advisor, like I am at the moment, and you want your students to start working on things that you know will become difficult at some point, choosing the left is for those students that do not have a
13:45
chance to become developers in the future; choosing the right is instead for those users that actually have a chance of becoming developers, or for those that actually show some taste for coding and for the software development part. I'm saying they are equally good on both sides. So my
14:03
experience is with the deal.II library, and that's what I'm going to be describing to you today. The interesting thing is that there are a lot of lessons that come when you start developing a library that becomes so big, and a library that becomes so used. The first lessons have been
14:23
collected by Wolfgang and Timo, two of the other main developers of the library, in a very nice article, "What makes computational open source software libraries successful?" And, incredibly, writing good software is not in the list. Writing a good-quality C++
14:45
program is not really in the list. What is there is: building a community, improving the quality and utility of the code, writing good documentation, having good project management, and, last but not
15:02
least, attracting new developers. This "improving the quality and utility of the code" could be interpreted as writing good software, but it's very easy to write good software if you have a community that actually helps you in reviewing the software that you're writing. That's why I think it's
15:22
not such a difficult thing to do if you have a community around you. Let me just give you a couple of numbers on why I picked deal.II as an example. This is the history of commits in the deal.II library in the last 25 years, and the source code of the library, the one that actually produces the results
15:43
that you see in the publications, has grown from a few hundred thousand lines of code to 500,000 lines of code. That's not too much. However, if you look at these red lines over here, this is the amount of test code that is used to test the library, and this is growing
16:03
steadily, at the same size as the library itself. What that means is that there is a lot of testing going on, and a lot of work going on, to make sure that what is inserted in the library, what is programmed by students and what is programmed by you,
16:20
is actually of good quality and utility. That's what I mean by good quality and utility. So the most difficult part is not this one, it's this one: to make sure that the code does something which is useful, and that it keeps being useful over the years. Now, these numbers are pretty large, and if you look at the real
16:43
numbers here, this is also the most important part that was in this list: writing good documentation. If you look at these numbers here, it's roughly the same amount, so this is one third, one third, one third, the last being comments, both in the tests and in the source. If you print out the
17:03
documentation of the library, which is generated automatically from the source code (that's an important part), it's about more than 10,000 pages, more than 1,000 pages, of documentation as a PDF file. This is about 10 years of work of a single person just writing documentation. So if you think about
17:23
these numbers in the perspective of the 100,000 lines, or the 10,000 lines per year, this is 10 years of one person just writing documentation every day, without doing anything else. So it's an impressive amount of investment that you have to make. And so many people look at me and say: so why do you do
17:41
it, or why do you ask your PhD students to do it? That means 25% of their time is being wasted, because they're writing the documentation. I have my answers for this, but I'll make sure that this comes just at the end. So this is the number of contributors that we have had in the library since the beginning, and today there are more than 300 people that
18:00
actually contribute to the library. This is, in a sense, what I was saying about the community: it's not just a garage project anymore, and it actually takes a lot of time to distribute this across the development. So why does it take so much? What are the actual characteristics of the library? Just to give you a very short graphical history of the library: we started with some important characteristics, so
18:26
parallelization came quite early, this was when we started interfacing with PETSc, then we had hp-adaptivity and anisotropic mesh refinement, distributed meshes, matrix-free, and many, many other things. And these are prizes that the library has won over the years. Today we are about 13 principal
18:43
developers. This is the list of the guys, scattered all over the world; none of us live in the same place, essentially. And these 13 developers, if you think about the numbers that we said before, are roughly just about right to maintain the amount of code that we have every year. Which means
19:03
that we are called the main developers because most of our time in the library is spent reviewing patches from other people, and trying to make sure that the code keeps compiling whenever new things happen, or whenever a new compiler is distributed, or whenever Apple decides to create a new processor; that was a nightmare, for example. So if you look at the
19:23
cumulative knowledge, it's about a hundred years just for the code, 50 years for documentation, and, if you look at all the linked libraries, other thousands of years. So the question is: can you compete with this? No, you cannot. No small business, no single person, no group or single
19:46
developers can ever hope to compete. So why should you be writing your own code? Why should you be writing your own solver for finite elements or for finite volumes? The answer is: you should not. Okay, so this is the advertisement,
20:01
finished. Now I want to tell you the story, which is the reality, and the reality is a bit harsh. The reality is that it is very difficult to maintain this, and the reason is the following. You say: okay, I want to contribute to a very large software library, so how do I do
20:21
it? I'll tell you my story, so that we can discuss how this actually happened in the past. So why did I choose deal.II? I was a PhD student in Pavia, working with Professor Daniele Boffi, and I started in 2003. I was given total freedom, except on the topic: the topic was
20:41
chosen by my advisor. He said: I would like you to work on fluid-structure interaction problems; do you like fluid-structure interaction? Sure, yes, why not; this seemed very cool. So he gave me the code that he had been developing, the code that he had been working on for 20 years, for his entire career, and it was a Fortran code. Parts of it were written in
21:03
Fortran IV and other parts were in Fortran 77. I looked at it, there was no sign of documentation anywhere, and so I decided: no, I cannot do that. So I started looking in all the places to figure out how I could build things in a better way. No open source tool was available to do what
21:24
he wanted me to do, so I really had to write something from scratch. The point was that I didn't want to write everything from scratch: I didn't want to write my own solvers, I didn't want to write my own conjugate gradient solver, because I knew that would have taken too much of my time. So I looked
21:41
and looked and looked, and then I stumbled upon the deal.II web page. At the time, in 2003, it wasn't so common to have a web page for software projects. Web pages for software projects started coming out in 2000, and only the big ones had web pages. So around 2000 you had the web
22:04
page for the then web browser for Linux; that was the kind of project that actually had a web page, and it was not so common. And they actually built the web
22:23
page not because they had the need to, simply because they could. The three main developers were just students that were nerdy enough to decide: well, why don't we build a web page for our project? Yeah, sure, why not, let's put up a web page. They asked the university for permission, and the university said: yes, why not, let's put this on the web. This
22:42
attracted a few developers that were not developers in the first place. I found the web page in 2003, and if you look at the history, the history starts in 1997, with the C++ standard. I tried compiling the library
23:01
and it actually worked, the first time that I tried it: that's a big step. And I looked at the first story also. Incredibly enough, the most important thing that I found was that it was full of examples that you could actually derive your knowledge from, and for the first
23:20
time I was actually seeing things that I had only been seeing in courses. If you look at the examples that are there for many of the competitor libraries, they are not nearly as complete as these ones. Now there are about 100 tutorials; at the time when I started, the number of tutorials was 14. So there were only 14 tutorials when I
23:43
started developing with this. I never went back: I just started there, and from then on I just kept working with this. I first committed a small change in a header file, I just looked it up, it's 2004. As a PhD student I decided: okay, I want to give back to the people that actually gave
24:02
to me. So the first meaningful contribution is the interface with the function parser, in 2005. And then it took me from 2004 to 2009, so I was already a postdoc at the time, to make my first major contribution, and this was the codimension-one capabilities for the library.
24:22
So if you look at the span from here to here, that's a span that is usually not too convenient for PhD students: you don't want your PhD students to spend too much time contributing to libraries. So it's very difficult for an advisor to say to the student: so,
24:43
please go and contribute back to the library that you just used. It takes time. The reason why I did this was that I was hired as a postdoc at SISSA for an industrial project, and this was an industrial project in
25:00
collaboration with Fincantieri. They wanted to see what the numerical groups of SISSA could be doing with numerical analysis, so they gave me, again, an old Fortran code and said: we want this to run faster. It was a Fortran IV code plus Fortran 77 code, a pattern that I had seen in the past, and they wanted to replace it. And I said:
25:23
okay, I did this for my PhD, I can do this again. So I tried to keep using deal.II for this project, and the idea was to extend deal.II to do boundary element methods. What was the issue with this? Well, deal.II was built by
25:40
people that were actually doing finite elements; none of them had ever done boundary elements. What does that mean? It means that usually your elements have the same dimension as the space in which they are embedded: you have triangular or quadrilateral elements in 2D, and you have hexahedral or tetrahedral elements in 3D. And I needed triangles in 3D, and
26:05
it turned out that it's actually not so easy if you do not design your library from the beginning with this in mind. So what I did was modify essentially every file of the library, together with the help of another postdoc of the group, and we made all possible mistakes. Then at the end we passed all the
26:24
code to Wolfgang. At the time GitHub was not an option, it was not in use yet: this was 2009, and it started being used in 2010. So we did that through emails: we would send the patch to Wolfgang, and Wolfgang would take
26:41
several weeks of good reviewing before we could actually make it work. We learned a few things, and the first is that coordinating a large project is not easy from the technical point of view. Today we use pull requests and GitHub hosting; the first pull request on deal.II was made in 2014, so a few years after GitHub
27:02
started being around. We have strict peer review from the principal developers, meaning that none of us can merge our own changes: we propose changes, other people review the proposals and then merge them. There is also strict feature testing at every pull request: whenever you
27:21
propose a change, there are about 12,000 tests that need to pass before we can actually merge it. This is what saves us from a lot of problems. We also learned many other things in the process, for example that coordinating a project like this is not easy from the social point of view.
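The review and testing rules described above can be written down as a small merge gate. This is a minimal sketch under stated assumptions, not the project's actual tooling: the `PullRequest` class, the `merge_blockers` function, the developer names, and the exact checks are all hypothetical, and only the rules themselves (review by principal developers, no self-merge, the whole test suite must pass) come from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical record of a proposed change and its review state."""
    author: str
    approvals: set = field(default_factory=set)  # names of approving reviewers
    tests_run: int = 0
    tests_failed: int = 0


def merge_blockers(pr, merger, principal_developers, required_tests):
    """Return the list of reasons why `merger` may not merge `pr` (empty if allowed)."""
    blockers = []
    if merger not in principal_developers:
        blockers.append("merger is not a principal developer")
    if merger == pr.author:
        blockers.append("authors cannot merge their own changes")
    # At least one approval must come from a principal developer other than the author.
    independent_reviewers = (pr.approvals & principal_developers) - {pr.author}
    if not independent_reviewers:
        blockers.append("no approval from another principal developer")
    if pr.tests_run < required_tests:
        blockers.append("the test suite did not run completely")
    if pr.tests_failed > 0:
        blockers.append("some tests failed")
    return blockers


if __name__ == "__main__":
    principals = {"alice", "bob"}  # illustrative names only
    pr = PullRequest(author="alice", approvals={"bob"},
                     tests_run=12000, tests_failed=0)
    print(merge_blockers(pr, "bob", principals, required_tests=12000))    # allowed
    print(merge_blockers(pr, "alice", principals, required_tests=12000))  # self-merge
```

Returning the list of blockers rather than a bare yes or no keeps the policy explainable to contributors, which matters when the people proposing changes are students who are new to the project.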
27:42
So how do we convince people to invest in contributing? That is the main question that I now face as a developer and as an advisor: it is very difficult to ask someone to invest one or two years in contributing back to the library. This is a question that we have to ask ourselves if we want to have maintainable software in
28:02
academia, in essence. There are many other things. From the social point of view, how do we help people? Today we have video tutorials: there are about 80 of them, of one hour each, that explain single parts of the library, like how you solve the Poisson problem on a square, how you solve the Poisson problem in parallel, how you solve the
28:23
Navier-Stokes equations in parallel, and so on and so forth. The other question, from the social point of view, is: do we accept everything, even from somebody that we have never seen? Have you heard of the xz problem lately? I don't know whether you are aware of it, but there has been a very well engineered attack on an extremely sensitive portion
28:45
of the Linux operating system, the xz package. For three years somebody pretended to be a good person and produced good quality contributions to the package. After three years they injected something that
29:03
allowed them to connect without authorization to any computer that had a particular version of xz installed, and that affected Fedora and many, many other packages. Now, you ask these questions for the Linux kernel, and there you probably have to be much more careful; for finite element
29:22
software I don't think that is going to be a big problem. Still, one has to ask questions like: who is going to maintain the thing if the PhD student stops showing up at some point? If we take things over into a library, we have to make sure that there is the capacity to maintain the
29:42
things that have been taken over. The other thing that we have learned, as educators, is that the most precious and rare thing to come by is competence: not good software, but competence and time. That is the most
30:00
difficult, the most precious currency that you have in your hands. One way to keep the competence around is to make sure that the products of your competence are open source, shared, and published, as we said this morning. But of course it is very difficult to convince
30:23
people to share this competence and to open it to others. As academic developers we also learned that working with industry is not always easy. Just to go back to the example that I gave before: the first success that we had with the BEM product that we created at the
30:44
beginning was not the actual 2009 contribution; it came way later. This was WaveBEM, a software that is still available and still used by some of my collaborators. It was developed within a
31:01
project called OpenSHIP, which was financed by the European community. This product was solving problems with boundary element methods, with adaptive grid refinement and arbitrary order, and this was in 2010, so it is actually quite old as a simulation if you think about it. The reason why
31:21
I am reporting this is the following: if you look at the timeline of how the contributions that led to that publication were introduced in the library, this timeline is also very scary. I am saying very bad things here, not good things, today; I am just saying that we have to bridge this gap, which is there, and it is clear that the
31:43
gap is there; we have to be aware of it. This publication was done in 2010, and before the publication we added the codimension one support to the library, which was the basis for the publication. Then it took us five years to get the CAD geometry description to be embedded in the
32:03
simulation part, which was in this publication, to be accepted in the library. So five years of peer review on a part of the software that is now part of the library, and a very powerful part of it, but it took a very long time before it got accepted. There is a publication related to this from 2020: so ten years' time before you can
32:25
actually do things, which by today's standards, in terms of machine learning for example, is unacceptable: this is impossible at today's speed of code generation. It took another three years to get another component inserted in the library, and, from 2009 to 2017, eight years for
32:46
another fundamental component that was actually inserted in the library. All of these things were in the publication of 2010. Before they could be inserted in the library, we had to rewrite many of them properly, because they were not written in a proper way: they
33:03
were written by a PhD student, or by a postdoc, or by me during the nights when I was waiting for projects to be reviewed, or things like that. So it takes a long time, a really long time: from 2009 to 2022 to get everything that was in here inside the library. So now the
33:21
question is: does it pay back? It is a big question, right? After ten years you would say "okay, then you just develop your own thing, and if it dies, it dies, who cares". Well, I think it pays back, and the reason is the following: I think that giving back to the community, instead of only taking from the community, is extremely important,
33:45
because you take even if you don't realize that you are taking. For example, I added support for the codimension one part of the library to do boundary element methods, and Andrea Bonito and Sebastian Pauletti used it for partial differential equations on surfaces; the framework
34:02
was the same. Rene Gassmöller added particle support to the library, and Bruno and I combined the two things for FSI: this was step-70; remember, the first tutorial was step-14. And if I run my PhD code on the same machine using deal.II 6.0, which was the library that was available for my PhD, and then using
34:28
the last version that is compatible with my PhD code, because I did not update it, it runs 40 times faster without any changes on my side to
34:40
the PhD code. What that means is that the library, in the meantime, has become 40 times faster from the software point of view, not from the hardware point of view: if I take the same machine and the same code, the library that is running in the background has become much more effective and much faster at doing a lot of things. This happens a lot if you use libraries
35:02
that are being developed constantly by other people. All of a sudden, with the code that I was using here, I can do things that were not possible before, both in terms of problem size and in terms of applications to other problems. So I think the whole is much greater than the sum of its parts,
35:27
in my opinion, if you look at the impact this has had on science in general. If you look at this, this is a graph that I really like: I think we are stable at about 200 publications per year citing the library, and this
35:40
is just the history. I think we saturated here because there isn't really much more of a market than this for finite element software; I mean, we are not selling software that runs on iPhones and things like that, so it is difficult to get larger. But this is not really what is interesting; what is interesting is this graph. This
36:03
represents what the world has been doing with our library. The red dots are the principal developers, the 13 people among the principal developers, so most of the publications are here, but there is a very large halo of people around the globe that have nothing to do with us, that have never published with us, and that are using the
36:22
library and are citing the library. This is what makes me happy about contributing; I think this is what makes it worth it. And how do we make this sustainable? This is the main question that I would like to ask you. Usually these papers are papers that actually use the
36:48
library and cite the library for its usage. This graph is maintained by our students yearly, and it is actually an underestimate: all of the papers that are here are papers
37:03
that are using the library and are citing the library, and the paper of the library, with the version of the library that has been used. So it is not just a citation by a random paper citing the deal.II library because it is part of a bigger package or whatever. And so I like this halo part here, because it
37:24
has nothing to do with us: it means that it is useful for other people, independently of the people that originated it, who are here. So how do we make it sustainable? A few important things: we need strict version control, and that is what we do today to make sure that this works. We have about
37:41
five pull requests per day, every day, on average, so that is why we need a new release every year. We have strict peer review, and the thing that we really need to make sure of is that we use it in our courses and teach it to our students, otherwise it is impossible for them to use it: it is a difficult library. So we try to use it in all the courses that
38:03
we teach. I am teaching a finite element course, and I do the examples with C++ in deal.II. Students hate me because it is difficult at the beginning, but then after a week it switches and they start liking it, because they can do things that their colleagues cannot do. We encourage
38:22
students to write sustainable, shareable, and reusable code. But the main issue here is that we need to attract new contributions, and this is an open question, because students and postdocs need recognition for their software work, otherwise there is no way that we can sustain this. We do this by allowing students to become co-authors of new release
38:44
papers if they contribute to the library in a substantial way. PIs also need recognition for their software work, and I think that today, at least in Italy, this is very difficult to obtain; here there is much more sensitivity on these topics. What I would like to conclude with is that making
39:03
good software is not a hobby: it is work in its own right. If we want to make sure that this is sustainable, with libraries that are useful for everybody, we need more funding, and we need to make sure that this is actually done in a good way. And then, last but not least, we would
39:20
like to have reproducibility, and I think using open source libraries like deal.II helps in this respect. We need to be reproducible in the sense that nobody would accept a paper with a closed source theorem, saying "oh, I have proved the theorem, but I don't want to publish the proof". In mathematics we are used to this: mathematics is open science, in a certain
39:43
sense. So why should we accept a paper without the source code that produces the graphs in the paper? Today, in many cases, it is very difficult to actually reproduce anybody's results. We are now starting to get reproducibility badges, which is very
40:02
good; these go to articles that actually share the codes used to produce the results deterministically. You have to have Docker images, capsules, or snapshots of what has happened in order for that to work, and we encourage this in all possible ways: we provide Docker images of the library with all the references and dependencies, and so on and so forth.
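As a rough illustration of what a "snapshot of what has happened" could contain, the sketch below records the inputs of a run, some information about the environment, and a hash of the results, so that a second run can be checked against the first. It is only a sketch under stated assumptions: `run_simulation` is a hypothetical stand-in for a real solver, `snapshot` and the manifest fields are invented for this example, and a real setup would also pin the library versions, for instance through a Docker image.

```python
import hashlib
import json
import platform
import random


def run_simulation(seed, n_samples):
    """Hypothetical stand-in for a solver: deterministic for a given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_samples)]


def snapshot(inputs):
    """Run the computation and return a manifest describing how to check it."""
    results = run_simulation(inputs["seed"], inputs["n_samples"])
    # Serialize with sorted keys so that the same data always hashes the same way.
    payload = json.dumps({"inputs": inputs, "results": results}, sort_keys=True)
    return {
        "inputs": inputs,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "results_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }


if __name__ == "__main__":
    first = snapshot({"seed": 42, "n_samples": 5})
    second = snapshot({"seed": 42, "n_samples": 5})
    print(json.dumps(first, indent=2))
    print("reproduced:", first["results_sha256"] == second["results_sha256"])
```

Publishing such a manifest next to the code gives a reader something concrete to compare against. The hash only matches when the run is bitwise reproducible, which is a stricter requirement than most numerical papers can meet across different machines, so in practice a tolerance-based comparison of the results may be more appropriate.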
40:21
Let me just finish with a few pictures and colors that show the projects in which what I have been developing has had an impact. These are, for example, the ASPECT solvers for mantle convection, and these are just solving the Boussinesq equations, in which you have fluid flow which is
40:43
driven by both temperature and density changes. These are simulations that you can actually reproduce on your laptop, these in particular, in a very effective and very nice way, but they scale to thousands of processors. These take a couple of weeks on a
41:02
supercomputer, using the entire supercomputer. And of course life becomes beautiful, because if you start using a library like this one you enable things that you would not even think possible at the beginning. This is a simulation of rifts, of the Earth's mantle drifting apart, and it is coupled with another software, which is FastScape.
41:24
FastScape, the software which is on the top, reproduces what happens on the landscape, the evolution of the surface of the Earth, coupled with what is happening at the bottom. This is a very nice work by these people here; it is not my work, but it has been done
41:42
with a lot of the things that have been developed and added to the library by students of our group, so I am very happy about these results. Just to mention the last ones, which are the ones that I like the most: these are beautiful simulations of flows which are extremely difficult to get right and extremely difficult to compute accurately,
42:03
and on standard computers this would not be possible. This is a high performance computing simulation, in 2D in this particular case, with all the details of the turbulent flow, and the 3D version is similar. It has an impact on today's life and social life, and I think it has an
42:23
impact also on the environment, and we know that it has an impact on the environment because it is being used for such simulations. This is a dam break, and it is not just the classical dam break that you see in 2D, in which the left part is pulled up and you see the dam break on the left part: this is a true dam break, and it takes several hours of
42:44
computing time, I would say several thousand hours of computing time. This is a result of Matthias Maier's computations. But it is not only used for this type of thing: it is also used, for example, in lifex, which is a project developed at the Politecnico di Milano for heart
43:01
simulations. What they do is full heart simulations, with the full fluid structure interaction problem on the surface of the heart and in the interior part with the Navier-Stokes equations, as well as the electrical activation and the propagation of the electrical waves.
43:22
So is it worth it? I think it is worth it. I wanted to show some of these things here because I have seen that there is also a lab here where there is a lot of mixing going on, a lot of particles being mixed and foam being generated. These are, for example, simulations that have been done by
43:40
Bruno Blais at Polytechnique Montréal with a code that has been written by me and Bruno himself. These are just simulations of mixing with different tools for mixing. I would like to simply say: it is not important what you do today, but what you do
44:02
today will enable things to work years from today. This software, which was published in 2023, is the result of about 25 years of development that I have not done by myself: I did the development for maybe one year of these 25 years, and the other 24 came for free because I
44:24
have used this type of library. I think this is what comes back if you use these types of libraries, and I think it is priceless. If you manage to get hold of this type of library, and to make sure that it is maintained for such a long time, then you have a chance to get
44:41
results which you would not be able to get in any other way. With this I would like to conclude, maybe just by asking you a question: how do you do it? How do you make your contributions to the academic world sustainable? Thank you.