The Quest For Better Tests In Scientific Computing
Formal Metadata
Number of Parts | 60
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/42513 (DOI)
Transcript: English(auto-generated)
00:02
Okay, yeah, thank you for the introduction. I'm a PhD candidate, I'm also the infrastructure guy in our institute, and I'm currently funded through the sustainable software call from the DFG, and
00:24
the software I'm co-writing I'm also employing in the actual research I'm supposed to be doing, especially in the localized reduced basis method branch. I'm guessing very few, if any, of you have heard of this.
00:43
So I'm going to tell you a little bit about the software library pyMOR, which I'm working on. I'm going to tell you about my personal, or our project's, problems with unit testing, and
01:04
how we're now trying to use a different, better approach, let's hope. And the last bit will be calling on everybody to help me. Okay, so what is pyMOR? It's a Python library; it's not an
01:22
application you can just run; you can use it to build your own model order reduction applications. If you don't know what model order reduction is: the model involved is, for us, something
01:40
mathematical, usually involving a parameterized PDE or an LTI system. And the mathematical description is something like this: you have a function that maps from some big, unknown parameter space
02:01
to a solution, and maybe from this solution you want to compute some single number. This is the quantity of interest, but this doesn't really matter. I'm just trying to show you that there are possibly infinite-dimensional spaces at play here, where we get parameters from.
02:21
And usually model order reduction is used in one of two settings: either you want to calculate a lot of quantities of interest for any number of parameters. This is the many
02:40
situation, many queries. Or you are in some kind of real-time setting where, no matter what parameter you get, you have a deadline of, I don't know, 15 microseconds to get a solution. And I'm not going to tell you what the actual model order reduction techniques are, because, well,
03:04
that's a couple of lectures. This is a graph for the many-query situation. You have a somewhat linear explosion of runtime: the more solves you have to do for different parameters, the more time you spend, and
03:21
this red curve is the visualization of a model order reduction technique. You have to invest time up front, this is the very steep slope at the beginning, and then at some point you have a crossover point where you're cheaper, because every single solve after the initial
03:43
expenditure of effort gets much, much faster. And depending on the sizes of the spaces and whatever, this graph can look this way, or it can look orders of magnitude better or worse. This is very problem-dependent.
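The crossover point described here can be made concrete with a small calculation. The following is a minimal sketch under stated assumptions: the function name `break_even_solves` and all timings are hypothetical and chosen only for illustration, they are not figures from the talk.

```python
import math


def break_even_solves(offline_cost, full_solve_cost, reduced_solve_cost):
    """Smallest number of solves n with offline + n * reduced < n * full."""
    saving_per_solve = full_solve_cost - reduced_solve_cost
    if saving_per_solve <= 0:
        raise ValueError("the reduced model must be cheaper per solve")
    # n must be strictly greater than offline_cost / saving_per_solve.
    return math.floor(offline_cost / saving_per_solve) + 1


# Hypothetical timings in seconds: one hour of up-front work,
# 60 s per full solve, 0.5 s per reduced solve.
n = break_even_solves(offline_cost=3600.0, full_solve_cost=60.0, reduced_solve_cost=0.5)
print(n)  # 61
```

With these made-up numbers the reduced model pays off from the 61st solve onward; a larger up-front cost or a smaller per-solve saving moves the crossover to the right.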
04:00
Okay, if you want to know more about the actual pyMOR, you can read our paper. You can come to the first ever pyMOR School in October in Magdeburg. But you'll have to know model order reduction before, and we're only teaching about pyMOR, not the underlying techniques.
04:23
Okay, so we started pyMOR seven-ish years ago now at Münster with three people. We're now somewhere between 12 and 16 contributors, four of them
04:45
fairly regular. We have a largish code base, lots of commits, open source. We're hosted on GitHub. And of course, I say of course, but we do have continuous integration tests,
05:02
which we had right from the beginning. I think we had a setup on Travis CI where every commit pushed gets built and tested in some way. Recently, I migrated away from Travis CI to use infrastructure we have at our institute and in the university at large.
05:24
The method is pretty simple, I'd say. We just have a Docker image with lots and lots of dependencies, libraries, external PDE solvers and whatnot, and our pipeline gets executed in this. We use pytest to run the tests.
05:41
We do some deployment of Python wheels and check if everything installs on different platforms. And we do some very limited testing on OS X and Windows, but that's actually just running a demo. Okay, so this is what our integration pipeline
06:02
looks like at the moment. We can run a bunch of stuff in parallel. In the test column there are different configurations in which the tests are run, and the runtime of each varies a lot, but a single build step can take up to 55 minutes, and
06:22
this is on hardware in our institute, which is recent and fairly large. And these are a mixture of traditional unit tests, where we input given data into our
06:43
function, check the assertion, the oracle, as I just learned, and see what the output is, and we have some system tests where we run a full
07:00
model and we have an analytic function that is the real solution, and we can check against that. All right, so what are the problems I mentioned that we, and I, have with unit testing? First of all, we found that people aren't actually very good at selecting what to test.
07:25
So the developer can have a very strong bias to select test cases he knows are okay, where the software will work. This is a problem. Test parameterization
07:40
can be difficult to formulate. You have to run the same tests for a bunch of different scenarios, right? For us this is different backends for PDE solves, different meshes, different whatever,
08:00
different implementations of our vector abstractions and so forth. You have to write this, and especially in pytest this can end up looking not so nice. If you then want to compose this functionality into something that is a
08:24
more expressive system, where you have to execute a bunch of tests in some order to check if your model pipeline still works, this is, again, something you have to write manually.
08:43
It's, well, it's effort. And the last bit is, I don't actually know if my tests are good. I have no real metric for it. Then the question is, how much unit testing should I actually do?
09:07
Do I aim for 100% coverage? Is 70% okay? Does this actually mean anything? And how much of the input space can I cover, should I cover?
09:22
This is a very big problem, especially for us, where we have parameter spaces of unknown dimension. Our vectors can be 10,000-dimensional, time series and whatever.
09:42
The next question then is, if I can do it, should I do it? Is it okay if my test runs for two days? Does this help me? And of course, can I run it? Is my local compute environment able to run 13 of these tests concurrently? And
10:07
there are practical problems, especially for our setup. For us, in pytest, the parameterization runs through something called fixtures, and for us it's always a problem if a single failure happens somewhere in the code and the fixture is very complex,
10:24
it's composed of multiple parts, and the fixture itself has a large input space. So what happens if I want to reproduce a failure? I currently have two options: I can hack the fixture, manually change whatever I need to just produce one result, or do a complex breakpoint.
10:44
This is very hard for something that has very large data. So we've looked around to improve the situation, and we came across something called Hypothesis. This is a Python
11:00
library that implements property-based testing. The idea is very old in computer terms: the original QuickCheck, which this is a reimplementation of, was developed, I think, in the late 90s.
11:22
This is a library that plugs into a couple of different test frameworks, and for us, in the one case that I was able to spend time on so far, it makes for much simpler parameterization. It is much easier to compose
11:42
tests with it. It has the very nice feature of outputting a decorator if a test fails: you put it on your test function and it reruns the exact test it did before, and only this. You can add your edge cases manually, and
12:03
well, you can configure how much, I think I skipped it, okay. You can configure how much data Hypothesis produces to input into your tests, so your everyday
12:20
tests can get 300 points of data and once a week you run three thousand. And my favorite one: PyCharm can actually resolve these kinds of fixture decorators, whereas if I use pytest with its fixtures I get nothing. It doesn't understand pytest fixtures.
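The split between quick everyday runs and longer scheduled runs can be expressed with Hypothesis settings profiles. This is a minimal sketch, assuming hypothetical profile names (`dev`, `weekly`), a hypothetical environment variable `HYPOTHESIS_PROFILE`, and a toy property; it is not the project's actual configuration.

```python
import os

from hypothesis import given, settings, strategies as st

# Profile names and example counts are illustrative choices.
settings.register_profile("dev", max_examples=300)
settings.register_profile("weekly", max_examples=3000, deadline=None)

# Hypothetical environment variable selecting the profile; "dev" is the fallback.
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))


@given(st.lists(st.integers(), max_size=20))
def test_reversing_twice_is_identity(values):
    # Toy property: reversing a list twice gives back the original list.
    assert list(reversed(list(reversed(values)))) == values
```

A scheduled CI job could then set the environment variable to `weekly`, while local runs keep the smaller default.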
12:43
Okay, so this is a non-working subset example of our fixtures for an array abstraction. This is relatively complex because we need different setups. We need
13:01
situations where we get two arrays of the same size with values. We need setups where we get ones that you can't actually add, so we get an error, and so forth. And this is, I think, just one where we get one vector with prescribed elements and
13:23
a list of indices. I mean, I can't read it. Okay, so the test_empty function then gets the vector array fixture that's defined two lines above, and this is the pytest fixture magic that nobody understands.
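For readers who have not seen the slide, the fixture mechanism looks roughly like the following. This is a simplified sketch, not the project's fixtures: `ARRAY_SHAPES`, `make_array` and the list-of-lists stand-in for a vector array are hypothetical.

```python
import pytest

# Hand-picked (length, dimension) pairs; every new scenario has to be added here.
ARRAY_SHAPES = [(0, 0), (0, 5), (1, 5), (3, 10)]


def make_array(length, dim):
    """Hypothetical stand-in for a vector array: a list of `length` vectors."""
    return [[float(i + j) for j in range(dim)] for i in range(length)]


@pytest.fixture(params=ARRAY_SHAPES)
def vector_array(request):
    # pytest runs each test requesting this fixture once per entry in `params`.
    return make_array(*request.param)


def test_empty(vector_array):
    # Only the argument name links this test to the fixture defined above.
    empty = vector_array[:0]
    assert len(empty) == 0
```

The link between `test_empty` and its data is the argument name alone, which is part of why such setups become hard to follow once fixtures are composed from several others.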
13:44
Okay, now with Hypothesis you define strategies to generate input data for your tests. The top line is a strategy for integers. This is currently actually implemented in a branch for our project, and
14:03
well, these strategies search their input space, I'm not going to say randomly, but it's implemented in a way that you get edge cases. If you don't disallow it, you get infinities and so forth,
14:25
and what's here on the slide now actually produces about six or seven different configuration options, with the dimension of the vectors, the length, different number types, and
14:43
it's still pretty dense, but you get one function that defines all your data output. If you look at the whole setup now, it's much more readable. You don't have any magic numbers in it.
15:04
It's easily extensible: you can do the same setup for different backends, and composing different strategies into a new strategy is trivial. It's
15:22
the fourth line from the bottom, you know, it's just an operator, right? This is how you compose strategies, and Hypothesis does the rest for you. So this is a major improvement for us, because before we had to
15:42
chain parameters next to one another, and this is much, much nicer. Yeah, as I said, this is only the first bit of code refactoring that I've started. It's very little effort.
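The strategy composition discussed above can be sketched as follows. This is an illustration under assumptions, not the code from the slide: the size bounds, the names `sizes`, `entries`, `vector_arrays` and `add`, and the use of plain nested lists in place of the project's array abstraction are all hypothetical.

```python
from hypothesis import given, strategies as st

# Strategy for array lengths and vector dimensions (bounds chosen for illustration).
sizes = st.integers(min_value=0, max_value=6)

# Two element strategies; the `|` operator combines them into one that draws from either.
real_entries = st.floats(allow_nan=False, allow_infinity=False)
int_entries = st.integers(min_value=-1000, max_value=1000)
entries = real_entries | int_entries


@st.composite
def vector_arrays(draw, count=1):
    """Draw `count` arrays of one shared shape, each a list of equally long vectors."""
    length, dim = draw(sizes), draw(sizes)
    vector = st.lists(entries, min_size=dim, max_size=dim)
    array = st.lists(vector, min_size=length, max_size=length)
    return [draw(array) for _ in range(count)]


def add(first, second):
    """Elementwise sum of two arrays of the same shape."""
    return [[a + b for a, b in zip(u, v)] for u, v in zip(first, second)]


@given(vector_arrays(count=2))
def test_addition_commutes(arrays):
    first, second = arrays
    assert add(first, second) == add(second, first)
```

One composite function defines all the generated data, and removing `allow_infinity=False` is enough to let infinities into the inputs.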
16:01
As of now, I've basically only changed how our unit tests get data. It was fixed lists before; now there are strategies that generate whatever data I've told them to. And, well, I have not implemented this
16:21
fast tests for development, weekends can run longer tests. This is not all done yet. The actual bigger part, I think, of what Hypothesis enables you to do is define a state machine. As a class, you define
16:43
transformations of your state as rules, as it's called. You can then implement a check that some property of your state still holds, and
17:02
the library will combine the transfer functions in every which way, generate inputs from the strategies you define, and will try to fail your test. And if it finds a failure, it will try to shrink the sequence of inputs it generated to the minimal one where the test still fails.
17:26
I've only come up with a very simple example as of now. This is a state machine for basis testing, so a vector space basis.
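Since the slide is not part of the transcript, here is a minimal sketch of what such a state machine could look like with Hypothesis's stateful testing API. It is an assumption-laden illustration, not the speaker's code: the fixed dimension `DIM`, the helper names, the unit upper triangular basis generator and the exact `Fraction` arithmetic are all choices made to keep the sketch small and free of rounding issues.

```python
from fractions import Fraction

from hypothesis import strategies as st
from hypothesis.stateful import Bundle, RuleBasedStateMachine, rule, run_state_machine_as_test

DIM = 3  # hypothetical fixed dimension
small = st.integers(min_value=-5, max_value=5)


def coefficients(basis, vector):
    """Solve sum_j c[j] * basis[j] == vector exactly by Gauss-Jordan elimination."""
    n = len(basis)
    # Augmented matrix: column j holds basis[j], the last column holds `vector`.
    m = [[basis[j][i] for j in range(n)] + [vector[i]] for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("vectors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


def gram_schmidt(basis):
    """Orthogonalize without normalizing, so all arithmetic stays exact."""
    result = []
    for v in basis:
        for u in result:
            factor = sum(a * b for a, b in zip(v, u)) / sum(a * a for a in u)
            v = [a - factor * b for a, b in zip(v, u)]
        result.append(v)
    return result


class BasisMachine(RuleBasedStateMachine):
    bases = Bundle("bases")

    @rule(target=bases, a=small, b=small, c=small)
    def new_basis(self, a, b, c):
        # A unit upper triangular matrix has determinant 1, so its rows form a basis.
        rows = [[1, a, b], [0, 1, c], [0, 0, 1]]
        return [[Fraction(x) for x in row] for row in rows]

    @rule(target=bases, basis=bases)
    def orthogonalize(self, basis):
        result = gram_schmidt(basis)
        coefficients(result, basis[0])  # raises if `result` is no longer a basis
        return result

    @rule(basis=bases, vector=st.lists(small, min_size=DIM, max_size=DIM))
    def represent(self, basis, vector):
        target = [Fraction(x) for x in vector]
        c = coefficients(basis, target)
        combo = [sum(c[j] * basis[j][i] for j in range(DIM)) for i in range(DIM)]
        assert combo == target


TestBasis = BasisMachine.TestCase  # collected by pytest or unittest

if __name__ == "__main__":
    run_state_machine_as_test(BasisMachine)
```

Each rule with `target=bases` feeds its return value back into the bundle, so the library is free to chain basis generation, orthogonalization and representation in any order it likes.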
17:44
You define a member that is a bundle. So basically the library will then generate new bases from all functions that have this bundle as a target.
18:01
I've cut out the actual basis generation because it's a little more involved. And then you can define transfer functions on your state where you know a property should still hold after the transformation. So if I run a basis through a Gram-Schmidt algorithm, I know it's still a basis afterwards, and
18:25
if I want to represent any given vector of the same dimension in a basis, I know that I must be able to solve this linear system here. So now if I run this,
18:44
Hypothesis will generate a list of inputs for basis generation, will pick a basis from this bundle, put it into Gram-Schmidt, put it into coefficients, outputs go in as new bases, and so forth. And I don't have to tell my test case
19:02
how it should compose new tests from the state machine. This of course is a pretty trivial example, but it still works. Okay, so to recap what I found: if I use Hypothesis, it's much easier for me to
19:24
let my tests run on a very large input space. I don't have to manually write how many data points I get. It searches edge cases for your input data. So as an example, we've never input
19:44
infinities into our arrays before. We didn't mix complex-valued arrays and integer arrays and things like that, and I think our values were capped at minus one and one, and
20:03
some algorithms actually behave very differently if there are orders of magnitude of difference in your input data. This is all stuff that just pops up now, and we have to fix a lot. For me it's a huge boost in usability if I actually find an error.
20:26
It is, however, for me and us, somewhat hard to actually apply this to our code, because you need these properties
20:43
or transformations, and you still need something to check against. So for the basis it's pretty easy: if I put in a vector, I know I must be able to solve the coefficient system. But for other algorithms and operators and discretizations,
21:05
it is quite unclear how this will work, or if it will work. But for this very data-driven vector abstraction, I think it's great. Okay, and something that, in looking into this, I found
21:25
was very lacking is practical examples, I mean guidelines on how to actually write unit tests. I found literature in software engineering research that I don't understand as a mathematician,
21:44
so I don't speak the language. I found "100% coverage is great, you're done." This is very untrue for us. So basically I would have hoped to have a repository
22:07
of some kind with guides: what do other people do, what failed for other people, and so on. So if anybody wants to join, or knows whether this exists already,
22:25
please contact me. Yeah, that's it. Thank you very much, René. Okay, are there questions?
22:43
Otherwise, okay, I have one, maybe it's a very practical one. Maybe I didn't perfectly get it: when you do this testing and it generates the input data for you, do you mainly test the functionality of your code, whether it breaks, or do you test
23:05
the quality of the outcome, or the reproducibility, you know what I mean? Yeah. For this part where I've started using Hypothesis, this is real unit testing for our vector abstraction.
23:21
So, like, does the scalar product of two orthogonal vectors still produce zero? Something like this. We haven't used this in integration testing, nothing complex. This doesn't run a parameterized model or something where we then check if the
23:42
truthiness of that model holds. And this is actually one of the problems: if I want to use this there, I'm not super sure how this will work. It would be great to join forces here, I think some problems are similar. Yeah. You two first, and then
24:29
They have all code, or they try to maximize the distance? Okay, yes, so a greedy search in the error space, an error measure of some kind, okay.
24:49
Yeah, thank you
25:00
If you don't change the... sorry, right. The question was: if I run a test twice, does the library input the same data? Yes, if you don't change the configuration, as far as I understand, the strategy is deterministic.
25:49
Okay, so the question is: if a check fails and I fix it, what are the chances that the same input data is reproduced at a later date, right?
26:03
Well, these strategies can change, of course, with library versions and whatnot. But you can actually use a functionality of the library to put in exactly that input data that you checked, as an example.
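Pinning a specific input so that it is always rerun can be done with Hypothesis's `example` decorator. A minimal sketch, assuming a hypothetical function under test (`normalize`) and hypothetical pinned inputs:

```python
from hypothesis import example, given, strategies as st


def normalize(values):
    """Hypothetical function under test: scale values so the largest magnitude is 1."""
    peak = max((abs(v) for v in values), default=0.0)
    if peak == 0.0:
        return list(values)
    return [v / peak for v in values]


finite = st.floats(allow_nan=False, allow_infinity=False)


@given(st.lists(finite, max_size=10))
@example([])          # hand-picked edge case: empty input
@example([0.0, 0.0])  # hypothetical input that once failed, now rerun on every test run
def test_normalize_is_bounded(values):
    assert all(abs(v) <= 1.0 for v in normalize(values))
```

Explicit examples are executed in addition to the generated inputs, so they keep guarding against a regression even if a newer library version generates different data.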
26:22
There's extensive databasing behind the scenes, where you can, and I'm not sure if one should do that, but you can basically ship the entire library of everything that has been run so far, and
26:40
that would ensure that the test is rerun if it failed at some point. Or, of course, the explicit decorator to mark the function. I think one more question is possible, if somebody wants to. Okay, I have a question, just very practical: you said you switched to Docker.
27:06
So is that the frame you have around it so that everything gets reproducible, or why did you do that? Well, I wouldn't say it gets reproducible. There are a lot of challenges in actually getting reproducible Docker builds. No, this was purely practical,
27:27
because we have a lot of dependencies that are C++ libraries that use very large stacks of support libraries, and if you try to install this on one machine,
27:41
that's fine. If you try to install this on 20 machines, that's nonsense. This is basically something for us to get a fairly reproducible environment to execute in, but it's still not really reproducible, because some stuff gets installed at runtime,
28:01
which we want, because Python dependencies update all the time, and if a user gets a new library and our code breaks, we want to know that. But yeah, this is mostly practical for us, just to distribute it. Well, you can use our images for local development just as well.
28:24
So I don't install all the PDE solvers on my machine, I just run my Python interpreter in the Docker image. Mm-hmm. Okay. Thank you very much.