
The Quest For Better Tests In Scientific Computing


Formal Metadata

Title
The Quest For Better Tests In Scientific Computing
Number of Parts
60
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Writing tests for scientific computing software is very hard. The input space of a program can be infinite, so selecting "good" or "interesting" inputs is crucial. Time and computing resources for executing tests are limited, yet developers need timely feedback on their changes. Guidelines for implementing tests under these (and likely more) constraints are not readily available in the literature. This contribution tells the story of moving our Python library pyMOR to property-based testing, simplifying the process of writing more meaningful tests with less code and balancing runtime against impact. We also include a call to action for the community to jointly develop concrete guidelines for designing and implementing unit tests.
Transcript: English (auto-generated)
Okay, yeah, thank you for the introduction. I'm a PhD candidate, I'm also the infrastructure guy at our institute, and I'm currently funded through the sustainable software call of the DFG. The software I'm co-writing and using I'm also employing in the actual research I'm supposed to be doing, especially in the localized reduced basis method branch; I'm guessing very few of you have heard of this.

So I'm going to tell you a little bit about the software library pyMOR, which I'm working on. I'm going to tell you about my personal, or our project's, problems with unit testing, and how we're now trying to use a different, hopefully better, approach. The last bit will be a call on everybody to help me.

Okay, so what is pyMOR? It's a Python library, not an application you can just run, and you can use it to build your own model order reduction applications. If you don't know what model order reduction is: the model involved is, for us, something mathematical, usually involving a parameterized PDE or an LTI system. The mathematical description is something like this: you have a function that maps from some big, unknown parameter space to a solution, and maybe from this solution you want to compute some single number, the quantity of interest. But this doesn't really matter; I'm just trying to show you that there are possibly infinite-dimensional spaces at play here where we get parameters from.
Usually, model order reduction is used in one of two settings: either you want to calculate a lot of quantities of interest for any number of parameters, this is the many-query situation, or you are in some kind of real-time setting where, no matter what parameter you get, you have a deadline of, I don't know, 15 microseconds to get a solution. I'm not going to tell you what the actual model order reduction techniques are, because, well, that's a couple of lectures.

This is a graph for the many-query situation: you have a somewhat linear explosion of runtime, the more solves you have to do for different parameters, the more time you spend. The red curve is the visualization of a model order reduction technique: you have to invest time up front, this is the very steep slope at the beginning, and then at some point you reach a crossover point where you're cheaper, because every single solve after the initial expenditure of effort gets much, much faster. Depending on the sizes of the spaces and so on, this graph can look this way, or orders of magnitude better or worse; this is very problem dependent.
Okay, if you want to know more about the actual techniques, you can read our paper, or you can come to the first-ever pyMOR school in October. But you'll have to know model order reduction beforehand; we're only teaching about pyMOR, not the underlying techniques.
Okay, so we started pyMOR seven-ish years ago in Münster with three people. We're now somewhere between 12 and 16 contributors, four of them fairly regular. We have a large code base, lots of commits, open source; we're hosted on GitHub. And of course, I say of course, but we do have continuous integration tests, which we had right from the beginning: I think we had a setup on Travis CI where every pushed commit gets built and tested in some way. Recently I migrated away from Travis CI to use infrastructure we have at our institute and in the university at large.

The method is pretty simple, I'd say: we just have a Docker image with lots and lots of dependencies, libraries, external PDE solvers and whatnot, and our pipeline gets executed in this; we use pytest to run the tests. We do some deployment of Python wheels and check whether everything installs on different platforms, and we do some very limited testing on OS X and Windows, but that's actually just running a demo.

Okay, so this is what our integration pipeline looks like at the moment. We can run a bunch of stuff in parallel in the test stage; there are different configurations in which the tests are run, and the runtime of each varies a lot, but a single build step can take up to 55 minutes, and this is on hardware at our institute which is recent and fairly large-sized. These are a mixture of traditional unit tests, where we input given data into our function, check the assertion, the oracle, as I just learned, and see what the output is, and some system tests, where we run a full model for which we have an analytic function that is the real solution, and we can check against that.

All right, so what are the problems that we, and I, have with unit testing? First of all, we found that people aren't actually very good at selecting what to test.
The developer can have a very strong bias to select test cases he knows are okay and that the software will work with; this is a problem. Test parameterization can be difficult to formulate: you have to run the same tests for a bunch of different scenarios, right? For us this means different backends for PDE solves, different measures, different implementations of our vector abstractions, and so forth. You have to write all of this, and especially in pytest it can end up looking not so nice. If you then want to compose this functionality into something more expressive, where you have to execute a bunch of tests in some order to check if your model pipeline still works, this is again something you have to write manually; it's effort. And the last bit is: I don't actually know if my tests are good; I have no real metric for it.

Then the question is, how much unit testing should I actually do? Do I aim for 100% coverage, is 70% okay, does this actually mean anything? And how much of the input space can I cover, should I cover? This is a very big problem, especially for us, where we have unknown-dimensional parameter spaces; our vectors can be 10,000-dimensional time series, and whatever. The next question then is, if I can do it, should I do it? Is it okay if my test runs for two days, does this help me? And of course, can I run it: is my local computer environment able to run 13 of these tests concurrently?
There are also practical problems, especially for our setup: in pytest, the parameterization runs through something called fixtures, and for us it's always a problem when a single failure happens somewhere in the code, because the fixture is very complex, it's composed of multiple parts, and the fixture itself has a large input space. So what happens if I want to reproduce a failure? I currently have two options: I can hack the fixture manually and change whatever I need to produce just one result, or I can set a complex breakpoint. This is very hard for something that involves very large data.

So we've looked around, and to improve the situation we came across something called Hypothesis. This is a Python library that implements property-based testing. The idea is very old in computer terms; the original QuickCheck, which this is a reimplementation of, was developed, I think, in the late 90s. It's a library that plugs into a couple of different test frameworks, pytest in our case, and, in the one case that I was able to spend time on so far, it makes for much simpler parameterization; it is much easier to compose tests with it. It has the very, very nice feature of outputting a decorator when a test fails: you put it on your test function, and it reruns exactly the test it ran before, and only this one. You can add your edge cases manually, and you can configure how much data Hypothesis produces to feed into your tests, so your everyday tests can get 300 points of data, and once a week you run with far more. And my favorite one: PyCharm can actually resolve these kinds of decorators, whereas if I run pytest with its fixtures, I get nothing; it doesn't understand pytest fixtures.
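As a sketch of what this kind of configuration could look like with Hypothesis (the profile names and the toy property here are illustrative, not pyMOR's actual setup):

```python
from hypothesis import example, given, settings, strategies as st

# Register named settings profiles: a quick one for everyday runs and a
# thorough one for scheduled (e.g. weekly) runs. Names are made up.
settings.register_profile("everyday", max_examples=300)
settings.register_profile("weekly", max_examples=30_000, deadline=None)

# Select a profile at runtime, e.g. based on a CI environment variable.
settings.load_profile("everyday")

@given(st.floats(allow_nan=False, allow_infinity=False))
@example(0.0)  # a manually pinned edge case that is always run
def test_abs_is_nonnegative(x):
    assert abs(x) >= 0.0

test_abs_is_nonnegative()  # runs the generated inputs plus the pinned example
```

The decorator mentioned in the talk is Hypothesis's `@reproduce_failure`: printed on a failure (when `print_blob` is enabled), it replays exactly the failing input once pasted onto the test function.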
Okay, so this is a non-working subset example of our fixtures for a vector array abstraction. This is relatively complex, because we need different setups: we need situations where we get two arrays of the same size with values, we need setups where we get ones you can't actually add, so that we get an error, and so forth. This one, I think, just produces one vector with prescribed elements and a list of indices; I mean, I can't read it either. The test_empty function then gets the vector array fixture that's defined two lines above, and this is the pytest fixture magic that nobody understands.
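For readers unfamiliar with pytest, a minimal sketch of the parameterized-fixture pattern being described (the names and the plain-list stand-in are illustrative, not pyMOR's actual vector array API):

```python
import pytest

# A parameterized fixture: every test that requests `vector_array` runs
# once per entry in `params`. The injection by argument name is the
# implicit "magic" the talk refers to.
@pytest.fixture(params=[0, 3, 10])
def vector_array(request):
    # Stand-in for constructing a vector array of a given length.
    return [0.0] * request.param

def test_empty(vector_array):
    # The fixture value arrives as an argument, resolved by pytest by name.
    assert len(vector_array) in (0, 3, 10)
```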
Okay, now, with Hypothesis you define strategies to generate input data for your tests. The top line is a strategy for integers; this is currently actually implemented in a work branch of our project. These strategies search their input space, I'm not going to say randomly, but it's implemented in a way that you get edge cases: if you don't disallow it, you get infinities and so forth. What's here on the slide actually produces about six or seven different configuration options, with different vector lengths and different number types, and it's still pretty dense, but you get one function that defines all your data output. If you look at the whole setup now, it's much more readable: you don't have any magic numbers in it, it's easily extensible, you can do the same setup for different backends, and composing different strategies into a new strategy is trivial. Fourth line from the bottom, you know, it's just an operator; this is how you compose strategies, and Hypothesis does the rest for you. This is a major, major improvement for us, because before we had to chain parameters next to one another, and this is much, much nicer. As I said, this is only the first bit of code refactoring that I've started; it's very little effort.
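The kind of strategy setup and composition being described might be sketched like this (the strategies and the property tested are illustrative, not the ones on the slide):

```python
from hypothesis import given, settings, strategies as st

# Basic building blocks: lengths and scalar types for test vectors.
lengths = st.integers(min_value=0, max_value=50)
finite = st.floats(allow_nan=False, allow_infinity=False, width=32)

# The `|` operator composes strategies into alternatives, as mentioned
# in the talk: draw either a bounded integer or a finite float scalar.
scalars = st.integers(min_value=-100, max_value=100) | finite

# A composite strategy: two vectors of the same (drawn) length,
# so that e.g. elementwise addition is well defined.
@st.composite
def vector_pairs(draw):
    n = draw(lengths)
    vec = st.lists(finite, min_size=n, max_size=n)
    return draw(vec), draw(vec)

@given(vector_pairs())
@settings(max_examples=100, deadline=None)
def test_addition_commutes(pair):
    v, w = pair
    # Elementwise float addition is commutative in IEEE 754 arithmetic.
    assert [a + b for a, b in zip(v, w)] == [b + a for a, b in zip(w, v)]

test_addition_commutes()
```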
As of now, I've basically only changed how our unit tests get data: it was a fixed list before, now there are strategies that generate whatever data I've told them to. And, well, fast tests for development while weekends run longer tests, I have not implemented this yet; it's not all done yet. The actual bigger part, I think, of what Hypothesis enables you to do is this: you define a state machine as a class, you define transformations between your states as rules, I think they're called, and you can then implement a check that some property of your state still holds. The library will combine the transformation functions in every which way, generate inputs according to the strategies you define, and will try to fail your test. If it finds a failure, it will try to shrink the sequence of inputs it generated to the minimal one where the test still fails.
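A runnable sketch of such a state machine, loosely modeled on the basis example that follows (all names are illustrative, and NumPy's QR factorization stands in for a Gram-Schmidt implementation):

```python
import numpy as np
from hypothesis import settings, strategies as st
from hypothesis.stateful import (
    Bundle, RuleBasedStateMachine, rule, run_state_machine_as_test,
)

class BasisMachine(RuleBasedStateMachine):
    # The bundle collects every basis produced by rules that target it.
    bases = Bundle("bases")

    @rule(target=bases, dim=st.integers(min_value=2, max_value=4))
    def standard_basis(self, dim):
        # The canonical basis of R^dim: columns of the identity matrix.
        return np.eye(dim)

    @rule(target=bases, b=bases, c=st.floats(min_value=0.5, max_value=2.0))
    def scale_first_vector(self, b, c):
        # Scaling a basis vector by a nonzero factor keeps it a basis.
        out = b.copy()
        out[:, 0] *= c
        return out

    @rule(target=bases, b=bases)
    def gram_schmidt(self, b):
        # Orthonormalizing (here via QR) must again yield a basis.
        q, _ = np.linalg.qr(b)
        return q

    @rule(b=bases)
    def check_expansion(self, b):
        # Property: any vector of matching dimension has coordinates in
        # the basis, i.e. the linear system b @ c = v is solvable.
        v = np.arange(1.0, b.shape[0] + 1.0)
        c = np.linalg.solve(b, v)
        assert np.allclose(b @ c, v)

# Hypothesis composes the rules in every which way and shrinks failures.
run_state_machine_as_test(
    BasisMachine,
    settings=settings(max_examples=20, stateful_step_count=10, deadline=None),
)
```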
I've only come up with a very simple example as of now. This is a state machine for basis testing, so a vector space basis. You define a member that is a bundle; basically, the library will then generate new bases from all functions that have this bundle as a target. I've cut out the actual basis generation because it's a little more involved, and then you define transformation functions on your state where you know a property should still hold after the transformation. So if I run a basis through a Gram-Schmidt algorithm, I know it's still a basis afterwards, and if I want to express any given vector of the same dimension in a basis, I know that I must be able to solve this linear system here. So now, if I run this, Hypothesis will generate a list of inputs for basis generation, will pick a basis from this bundle, put it through Gram-Schmidt, put it through the coefficients function, get a new basis as output, and so forth and so forth, and I don't have to tell my test harness how it should compose new tests from the state machine. This of course is a pretty trivial example, but it still works.

Okay, so to recap what I found: if I want to use Hypothesis, it's much easier for me to let my tests run on a very large input space; I don't have to manually write how many data points I get. It searches out edge cases in your input data: as an example, we've never input infinities into our arrays before, we didn't mix complex-valued arrays and integer arrays and things like that, and I think our values were capped at minus one and one, while some algorithms actually behave very differently if there are orders of magnitude of difference in your input data. All of this just pops up now, and we have to fix a lot of stuff. For me, it's a huge boost in usability when I actually find an error.
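As a small illustration, Hypothesis's default float strategy already includes exactly these kinds of edge cases, infinities and NaN among them:

```python
from hypothesis import given, strategies as st

# st.floats() with no arguments generates NaN, positive and negative
# infinity, subnormals, and values across many orders of magnitude:
# inputs that hand-picked test data (e.g. capped at [-1, 1]) would
# never exercise.
@given(st.floats())
def test_square_is_nonnegative_or_nan(x):
    assert x * x >= 0.0 or x != x  # x != x is only true for NaN

test_square_is_nonnegative_or_nan()
```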
It is, however, for me and us, somewhat hard to actually apply this to our code, because you need these properties, or transformations, and you still need something to check against. For the basis it's pretty easy: if I put in a vector, I know I must be able to solve for the coefficients. But for other algorithms and operators and discretizations it is quite unclear how this will work, or whether it will work at all. For this very data-driven vector abstraction, though, it was, I think, great.

Okay, so something I found while looking into this is that practical examples, and I mean guidelines, on how to actually write unit tests are very lacking. I found literature in software engineering research that I don't understand as a mathematician, so I don't speak the language, and I found "100% coverage is great, you're done", which is very untrue for us. So basically, I would have hoped for a repository of some kind with guides: what do other people do, what failed for other people, and so on. If anybody wants to join, or knows whether this already exists, please contact me. Yeah, that's it.

Thank you very much, René. Okay, are there questions?
Otherwise... oh, okay. I have one, maybe a very practical one, maybe I didn't perfectly get it: when you do this testing and it generates the input data for you, do you test mainly the functionality of your code, whether it breaks, or do you test the quality of the outcome, or the reproducibility, or, you know what I mean? Yeah, for the part where I've started on this path, this is real unit testing for our vector abstraction: so, like, does the scalar product of two orthogonal vectors still produce zero? We haven't used this in integration testing, no, nothing complex; this doesn't run a parameterized model or something where we then check the truthiness of that model. And this is actually one of the problems: if I want to use it that way, I'm not super sure how it will work. It would be great to join forces here; I think some problems are similar. Yeah. You two first, and then...
...do they have all the code, or do they try to maximize the distance? Okay, yes, so a greedy search in the error space, with an error measure of some kind, okay.
Yeah, thank you
If you don't change the... sorry, right, the question was: if I run a test twice, does the library input the same data? Yes: if you don't change the configuration, as far as I understand, the strategy is deterministic.
Okay, so the question is: if a check fails and I fix it, what are the chances that the same input data is reproduced at a later date, right? Well, these strategies can of course change with library versions and whatnot, but you can actually use a functionality of the library to pin exactly the input data you checked as an example. There's an extensive database behind the scenes, and, I'm not sure whether one should do this, but you can basically ship the entire database of everything that has been run so far. That would ensure the test is rerun if it failed at some point, or of course there's the explicit decorator to mark the function. I know there's one more question possible, if somebody wants to.

Okay, I have a question, just a very practical one: you said you switched to Docker, so that's the frame you have around everything so that it gets reproducible, or why did you do that? Well, I wouldn't say it gets reproducible; there are a lot of challenges in actually getting reproducible Docker builds. No, this was purely practical, because we have a lot of dependencies that are C++ libraries using very large stacks of support libraries, and if you try to install all this on one machine, that's fine, but if you try to install it on 20 machines, that's nonsense. This is basically something for us to get a well-reproducible environment to execute in, but it's still not really reproducible, because some stuff gets installed at runtime. We want that, because Python dependencies update all the time, and if a user gets a new library and our code breaks, we want to know. But yeah, this is mostly practical for us, just to distribute it. Well, you can use our images for local development just as well: I don't install all the PDE solvers on my machine, I just run my Python interpreter in the Docker image. Okay, thank you very much.