
The Quest For Better Tests In Scientific Computing


Formal Metadata

Title
The Quest For Better Tests In Scientific Computing
Number of Parts
60
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Writing tests for scientific computing software is very hard. The input space of a program can be infinite, so selecting "good" or "interesting" inputs is crucial. Time and computing resources for executing tests are limited, yet developers need timely feedback on their changes. Guidelines for implementing tests under these (and likely more) constraints are not readily available in the literature. This contribution tells the story of moving our Python library pyMOR to property-based testing, simplifying the process of writing more meaningful tests with less code and balancing runtime against impact. We also include a call to action for the community to jointly develop concrete guidelines for designing and implementing unit tests.
Transcript: English (auto-generated)
Okay, yeah, thank you for the introduction. I'm a PhD candidate, I'm also the infrastructure guy at our institute, and I'm currently funded through the sustainable software call of the DFG. The software I'm co-writing and using I'm also employing in the actual research I'm supposed to be doing, especially in the localized reduced basis method branch; I'm guessing very few of you have heard of this.

So I'm going to tell you a little bit about the software library pyMOR, which I'm working on. I'm going to tell you about my personal, or our project's, problems with unit testing, and how we're now trying to use a different, hopefully better, approach. The last bit will be a call on everybody to help me.

Okay, so what is pyMOR? It's a Python library, not an application you can just run, and you can use it to build your own model order reduction applications. If you don't know what model order reduction is: the model involved is, for us, something mathematical, usually involving a parameterized PDE or an LTI system. The mathematical description is something like this: you have a function that maps from some big, unknown parameter space to a solution, and maybe from this solution you want to compute some single number, the quantity of interest. But this doesn't really matter; I'm just trying to show you that there are possibly infinite-dimensional spaces at play here where we get parameters from.
Usually, model order reduction is used in one of two settings: either you want to calculate a lot of quantities of interest for any number of parameters, this is the many-query situation, or you are in some kind of real-time setting where, no matter what parameter you get, you have a deadline of, I don't know, 15 microseconds to get a solution. I'm not going to tell you what the actual model order reduction techniques are, because, well, that's a couple of lectures.

This is a graph for the many-query situation: you have a somewhat linear explosion of runtime, the more solves you have to do for different parameters, the more time you spend. The red curve is the visualization of a model order reduction technique: you have to invest time up front, this is the very steep slope at the beginning, and then at some point you reach a crossover point where you're cheaper, because every single solve after the initial expenditure of effort gets much, much faster. Depending on the sizes of the spaces and so on, this graph can look this way, or orders of magnitude better or worse; this is very problem dependent.
Okay, if you want to know more about the actual techniques, you can read our paper, or you can come to the first-ever pyMOR school in October. But you'll have to know model order reduction beforehand; we're only teaching about pyMOR, not the underlying techniques.
Okay, so we started pyMOR seven-ish years ago in Münster with three people. We're now somewhere between 12 and 16 contributors, four of them fairly regular. We have a large code base, lots of commits, open source; we're hosted on GitHub. And of course, I say of course, but we do have continuous integration tests, which we had right from the beginning: I think we had a setup on Travis CI where every pushed commit gets built and tested in some way. Recently I migrated away from Travis CI to use infrastructure we have at our institute and in the university at large.

The method is pretty simple, I'd say: we just have a Docker image with lots and lots of dependencies, libraries, external PDE solvers and whatnot, and our pipeline gets executed in this; we use pytest to run the tests. We do some deployment of Python wheels and check whether everything installs on different platforms, and we do some very limited testing on OS X and Windows, but that's actually just running a demo.

Okay, so this is what our integration pipeline looks like at the moment. We can run a bunch of stuff in parallel in the test stage; there are different configurations in which the tests are run, and the runtime of each varies a lot, but a single build step can take up to 55 minutes, and this is on hardware at our institute which is recent and fairly large-sized. These are a mixture of traditional unit tests, where we input given data into our function, check the assertion, the oracle, as I just learned, and see what the output is, and some system tests, where we run a full model for which we have an analytic function that is the real solution, and we can check against that.

All right, so what are the problems that we, and I, have with unit testing? First of all, we found that people aren't actually very good at selecting what to test.
The developer can have a very strong bias to select test cases he knows are okay and that the software will work with; this is a problem. Test parameterization can be difficult to formulate: you have to run the same tests for a bunch of different scenarios, right? For us this means different backends for PDE solves, different measures, different implementations of our vector abstractions, and so forth. You have to write all of this, and especially in pytest it can end up looking not so nice. If you then want to compose this functionality into something more expressive, where you have to execute a bunch of tests in some order to check if your model pipeline still works, this is again something you have to write manually; it's effort. And the last bit is: I don't actually know if my tests are good; I have no real metric for it.

Then the question is, how much unit testing should I actually do? Do I aim for 100% coverage, is 70% okay, does this actually mean anything? And how much of the input space can I cover, should I cover? This is a very big problem, especially for us, where we have unknown-dimensional parameter spaces; our vectors can be 10,000-dimensional time series, and whatever. The next question then is, if I can do it, should I do it? Is it okay if my test runs for two days, does this help me? And of course, can I run it: is my local computer environment able to run 13 of these tests concurrently?
There are also practical problems, especially for our setup: in pytest, the parameterization runs through something called fixtures, and for us it's always a problem when a single failure happens somewhere in the code, because the fixture is very complex, it's composed of multiple parts, and the fixture itself has a large input space. So what happens if I want to reproduce a failure? I currently have two options: I can hack the fixture manually and change whatever I need to produce just one result, or I can set a complex breakpoint. This is very hard for something that involves very large data.

So we've looked around, and to improve the situation we came across something called Hypothesis. This is a Python library that implements property-based testing. The idea is very old in computer terms; the original QuickCheck, which this is a reimplementation of, was developed, I think, in the late 90s. It's a library that plugs into a couple of different test frameworks, pytest in our case, and, in the one case that I was able to spend time on so far, it makes for much simpler parameterization; it is much easier to compose tests with it. It has the very, very nice feature of outputting a decorator when a test fails: you put it on your test function, and it reruns exactly the test it ran before, and only this one. You can add your edge cases manually, and you can configure how much data Hypothesis produces to feed into your tests, so your everyday tests can get 300 points of data, and once a week you run with far more. And my favorite one: PyCharm can actually resolve these kinds of decorators, whereas if I run pytest with its fixtures, I get nothing; it doesn't understand pytest fixtures.
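As a sketch of what this kind of configuration could look like with Hypothesis (the profile names and the toy property here are illustrative, not pyMOR's actual setup):

```python
from hypothesis import example, given, settings, strategies as st

# Register named settings profiles: a quick one for everyday runs and a
# thorough one for scheduled (e.g. weekly) runs. Names are made up.
settings.register_profile("everyday", max_examples=300)
settings.register_profile("weekly", max_examples=30_000, deadline=None)

# Select a profile at runtime, e.g. based on a CI environment variable.
settings.load_profile("everyday")

@given(st.floats(allow_nan=False, allow_infinity=False))
@example(0.0)  # a manually pinned edge case that is always run
def test_abs_is_nonnegative(x):
    assert abs(x) >= 0.0

test_abs_is_nonnegative()  # runs the generated inputs plus the pinned example
```

The decorator mentioned in the talk is Hypothesis's `@reproduce_failure`: printed on a failure (when `print_blob` is enabled), it replays exactly the failing input once pasted onto the test function.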
Okay, so this is a non-working subset example of our fixtures for a vector array abstraction. This is relatively complex, because we need different setups: we need situations where we get two arrays of the same size with values, we need setups where we get ones you can't actually add, so that we get an error, and so forth. This one, I think, just produces one vector with prescribed elements and a list of indices; I mean, I can't read it either. The test_empty function then gets the vector array fixture that's defined two lines above, and this is the pytest fixture magic that nobody understands.
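For readers unfamiliar with pytest, a minimal sketch of the parameterized-fixture pattern being described (the names and the plain-list stand-in are illustrative, not pyMOR's actual vector array API):

```python
import pytest

# A parameterized fixture: every test that requests `vector_array` runs
# once per entry in `params`. The injection by argument name is the
# implicit "magic" the talk refers to.
@pytest.fixture(params=[0, 3, 10])
def vector_array(request):
    # Stand-in for constructing a vector array of a given length.
    return [0.0] * request.param

def test_empty(vector_array):
    # The fixture value arrives as an argument, resolved by pytest by name.
    assert len(vector_array) in (0, 3, 10)
```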
Okay, now, with Hypothesis you define strategies to generate input data for your tests. The top line is a strategy for integers; this is currently actually implemented in a work branch of our project. These strategies search their input space, I'm not going to say randomly, but it's implemented in a way that you get edge cases: if you don't disallow it, you get infinities and so forth. What's here on the slide actually produces about six or seven different configuration options, with different vector lengths and different number types, and it's still pretty dense, but you get one function that defines all your data output. If you look at the whole setup now, it's much more readable: you don't have any magic numbers in it, it's easily extensible, you can do the same setup for different backends, and composing different strategies into a new strategy is trivial. Fourth line from the bottom, you know, it's just an operator; this is how you compose strategies, and Hypothesis does the rest for you. This is a major, major improvement for us, because before we had to chain parameters next to one another, and this is much, much nicer. As I said, this is only the first bit of code refactoring that I've started; it's very little effort.
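The kind of strategy setup and composition being described might be sketched like this (the strategies and the property tested are illustrative, not the ones on the slide):

```python
from hypothesis import given, settings, strategies as st

# Basic building blocks: lengths and scalar types for test vectors.
lengths = st.integers(min_value=0, max_value=50)
finite = st.floats(allow_nan=False, allow_infinity=False, width=32)

# The `|` operator composes strategies into alternatives, as mentioned
# in the talk: draw either a bounded integer or a finite float scalar.
scalars = st.integers(min_value=-100, max_value=100) | finite

# A composite strategy: two vectors of the same (drawn) length,
# so that e.g. elementwise addition is well defined.
@st.composite
def vector_pairs(draw):
    n = draw(lengths)
    vec = st.lists(finite, min_size=n, max_size=n)
    return draw(vec), draw(vec)

@given(vector_pairs())
@settings(max_examples=100, deadline=None)
def test_addition_commutes(pair):
    v, w = pair
    # Elementwise float addition is commutative in IEEE 754 arithmetic.
    assert [a + b for a, b in zip(v, w)] == [b + a for a, b in zip(w, v)]

test_addition_commutes()
```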
As of now, I've basically only changed how our unit tests get data: it was a fixed list before, now there are strategies that generate whatever data I've told them to. And, well, fast tests for development while weekends run longer tests, I have not implemented this yet; it's not all done yet. The actual bigger part, I think, of what Hypothesis enables you to do is this: you define a state machine as a class, you define transformations between your states as rules, I think they're called, and you can then implement a check that some property of your state still holds. The library will combine the transformation functions in every which way, generate inputs according to the strategies you define, and will try to fail your test. If it finds a failure, it will try to shrink the sequence of inputs it generated to the minimal one where the test still fails.
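A runnable sketch of such a state machine, loosely modeled on the basis example that follows (all names are illustrative, and NumPy's QR factorization stands in for a Gram-Schmidt implementation):

```python
import numpy as np
from hypothesis import settings, strategies as st
from hypothesis.stateful import (
    Bundle, RuleBasedStateMachine, rule, run_state_machine_as_test,
)

class BasisMachine(RuleBasedStateMachine):
    # The bundle collects every basis produced by rules that target it.
    bases = Bundle("bases")

    @rule(target=bases, dim=st.integers(min_value=2, max_value=4))
    def standard_basis(self, dim):
        # The canonical basis of R^dim: columns of the identity matrix.
        return np.eye(dim)

    @rule(target=bases, b=bases, c=st.floats(min_value=0.5, max_value=2.0))
    def scale_first_vector(self, b, c):
        # Scaling a basis vector by a nonzero factor keeps it a basis.
        out = b.copy()
        out[:, 0] *= c
        return out

    @rule(target=bases, b=bases)
    def gram_schmidt(self, b):
        # Orthonormalizing (here via QR) must again yield a basis.
        q, _ = np.linalg.qr(b)
        return q

    @rule(b=bases)
    def check_expansion(self, b):
        # Property: any vector of matching dimension has coordinates in
        # the basis, i.e. the linear system b @ c = v is solvable.
        v = np.arange(1.0, b.shape[0] + 1.0)
        c = np.linalg.solve(b, v)
        assert np.allclose(b @ c, v)

# Hypothesis composes the rules in every which way and shrinks failures.
run_state_machine_as_test(
    BasisMachine,
    settings=settings(max_examples=20, stateful_step_count=10, deadline=None),
)
```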
I've only come up with a very simple example as of now. This is a state machine for basis testing, so a vector space basis. You define a member that is a bundle; basically, the library will then generate new bases from all functions that have this bundle as a target. I've cut out the actual basis generation because it's a little more involved, and then you define transformation functions on your state where you know a property should still hold after the transformation. So if I run a basis through a Gram-Schmidt algorithm, I know it's still a basis afterwards, and if I want to express any given vector of the same dimension in a basis, I know that I must be able to solve this linear system here. So now, if I run this, Hypothesis will generate a list of inputs for basis generation, will pick a basis from this bundle, put it through Gram-Schmidt, put it through the coefficients function, get a new basis as output, and so forth and so forth, and I don't have to tell my test harness how it should compose new tests from the state machine. This of course is a pretty trivial example, but it still works.

Okay, so to recap what I found: if I want to use Hypothesis, it's much easier for me to let my tests run on a very large input space; I don't have to manually write how many data points I get. It searches out edge cases in your input data: as an example, we've never input infinities into our arrays before, we didn't mix complex-valued arrays and integer arrays and things like that, and I think our values were capped at minus one and one, while some algorithms actually behave very differently if there are orders of magnitude of difference in your input data. All of this just pops up now, and we have to fix a lot of stuff. For me, it's a huge boost in usability when I actually find an error.
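As a small illustration, Hypothesis's default float strategy already includes exactly these kinds of edge cases, infinities and NaN among them:

```python
from hypothesis import given, strategies as st

# st.floats() with no arguments generates NaN, positive and negative
# infinity, subnormals, and values across many orders of magnitude:
# inputs that hand-picked test data (e.g. capped at [-1, 1]) would
# never exercise.
@given(st.floats())
def test_square_is_nonnegative_or_nan(x):
    assert x * x >= 0.0 or x != x  # x != x is only true for NaN

test_square_is_nonnegative_or_nan()
```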
It is, however, for me and us, somewhat hard to actually apply this to our code, because you need these properties, or transformations, and you still need something to check against. For the basis it's pretty easy: if I put in a vector, I know I must be able to solve for the coefficients. But for other algorithms and operators and discretizations it is quite unclear how this will work, or whether it will work at all. For this very data-driven vector abstraction, though, it was, I think, great.

Okay, so something I found while looking into this is that practical examples, and I mean guidelines, on how to actually write unit tests are very lacking. I found literature in software engineering research that I don't understand as a mathematician, so I don't speak the language, and I found "100% coverage is great, you're done", which is very untrue for us. So basically, I would have hoped for a repository of some kind with guides: what do other people do, what failed for other people, and so on. If anybody wants to join, or knows whether this already exists, please contact me. Yeah, that's it.

Thank you very much, René. Okay, are there questions?
Otherwise... oh, okay. I have one, maybe a very practical one, maybe I didn't perfectly get it: when you do this testing and it generates the input data for you, do you test mainly the functionality of your code, whether it breaks, or do you test the quality of the outcome, or the reproducibility, or, you know what I mean? Yeah, for the part where I've started on this path, this is real unit testing for our vector abstraction: so, like, does the scalar product of two orthogonal vectors still produce zero? We haven't used this in integration testing, no, nothing complex; this doesn't run a parameterized model or something where we then check the truthiness of that model. And this is actually one of the problems: if I want to use it that way, I'm not super sure how it will work. It would be great to join forces here; I think some problems are similar. Yeah. You two first, and then...
...do they have all the code, or do they try to maximize the distance? Okay, yes, so a greedy search in the error space, with an error measure of some kind, okay.
Yeah, thank you
If you don't change the... sorry, right, the question was: if I run a test twice, does the library input the same data? Yes: if you don't change the configuration, as far as I understand, the strategy is deterministic.
Okay, so the question is: if a check fails and I fix it, what are the chances that the same input data is reproduced at a later date, right? Well, these strategies can of course change with library versions and whatnot, but you can actually use a functionality of the library to pin exactly the input data you checked as an example. There's an extensive database behind the scenes, and, I'm not sure whether one should do this, but you can basically ship the entire database of everything that has been run so far. That would ensure the test is rerun if it failed at some point, or of course there's the explicit decorator to mark the function. I know there's one more question possible, if somebody wants to.

Okay, I have a question, just a very practical one: you said you switched to Docker, so that's the frame you have around everything so that it gets reproducible, or why did you do that? Well, I wouldn't say it gets reproducible; there are a lot of challenges in actually getting reproducible Docker builds. No, this was purely practical, because we have a lot of dependencies that are C++ libraries using very large stacks of support libraries, and if you try to install all this on one machine, that's fine, but if you try to install it on 20 machines, that's nonsense. This is basically something for us to get a well-reproducible environment to execute in, but it's still not really reproducible, because some stuff gets installed at runtime. We want that, because Python dependencies update all the time, and if a user gets a new library and our code breaks, we want to know. But yeah, this is mostly practical for us, just to distribute it. Well, you can use our images for local development just as well: I don't install all the PDE solvers on my machine, I just run my Python interpreter in the Docker image. Okay, thank you very much.