Mutation Testing in Python with Cosmic Ray
Formal Metadata

Title: Mutation Testing in Python with Cosmic Ray
Title of Series: NDC Oslo 2016
Number of Parts: 96
Author: Austin Bingham
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/51825 (DOI)
NDC Oslo 2016, 89 / 96
Transcript: English (auto-generated)
00:06
Okay. Yeah, that sounds like it's on now. This is a much bigger crowd than I expected to get on the last day of the conference on a bit of a niche topic. My name is Austin Bingham. I work for and part-own a small company called Sixty North.
00:22
We do a bunch of software consulting, training, and things like that. The topic today, as you can read, is mutation testing in Python with a tool called Cosmic Ray that we've developed. The bulk of this talk will be about mutation testing itself, not so much about the details of how we implemented this in Python,
00:41
although we'll touch on some of that for the purposes of kind of showing you how the guts of the system work, but this isn't really a Python conference, so we'll stay away from too much nitty-gritty. I'm originally from Texas, Austin, Texas, this little orange dot here, home of the University of Texas. Go Horns. It's a very pretty town. We have cool statues. About eight years ago, I moved to Stavanger on the west coast of Norway.
01:04
I'm sure most of you know where that is. We have different kinds of statues there, the three swords, very famous, and we have different kinds of landscapes, like the Lysefjord from the top of Pulpit Rock, Preikestolen. So that's me in a very small nutshell, and Sixty North was founded there and here in Oslo, too.
01:24
What are we going to talk about? First, we'll do an introduction to the theory of mutation testing, and this is, I think, the most sort of generally interesting part to most people here. It's actually a very fascinating topic, and one that I learned about a few years ago and got interested in because it is sort of appealing to a certain mindset. I think lots of programmers will really appreciate the elegance, so to speak, of the technique.
01:45
We'll look at some of the practical difficulties of mutation testing, why you probably haven't seen it in your professional life. Even if it's a neat idea that has actual practical benefits, why are we not using it more often? There are big reasons for that that we need to figure out and solve. We'll look a bit at Cosmic Ray. This is a mutation testing tool for Python
02:02
that we wrote over the past year and a half, something along those lines. We'll look at some results, some actual practical results from the real world where we applied mutation testing to an open source library and found interesting, I wouldn't call them defects, but optimizations that we could apply based on the results.
02:21
Do a quick demo if time permits. This is a fairly constrained schedule, and we'll do any questions at the end. If you have questions that we don't get to, you can talk to me, of course, after the talk. I'll be around for a while. What is mutation testing? You can go to pitest.org. Pitest is a mutation testing tool for Java.
02:43
It's the gold standard. It's really an industrial strength, high quality tool. It's conceptually quite simple. They seed faults or mutations automatically into your code, and then you run your test suite. If the test suite kills the mutant that you've created, that's great. It means your test suite has enough strength to detect the change that you introduced.
03:00
That's what you want. If your test suite passes on the mutant, we say that the mutant lived. This is typically what you don't want. And mutation testing is really good at gauging the quality of your tests. This is primarily what it is used for: to determine if your tests are sufficiently powerful to capture changes in behavior in your system, even accidentally introduced changes like the ones we simulate with mutation testing.
03:22
Anybody here an Uncle Bob follower, likes Uncle Bob? Right, so he tweeted this the other day. This came through on Twitter two days ago, very serendipitously for me. Having fun with pitest, really easy to use, very useful, pitest.org. So it's not just me, it's also famous people, relatively famous people, who talk a lot on Twitter.
03:41
So I'm glad I put that in there. So what is mutation testing? You have some code under test, your library, your application, your program, whatever it is you're concerned about testing. And you have a test suite which, in principle, should cover 100% of it. It should be testing and verifying the functionality of every part of your library. That's unrealistic for most projects in the real world.
04:01
We don't have test suites that are that powerful. But we can still use mutation testing with that in mind. We just have to gauge our expectations a bit differently. We introduce a single change. These are typically very, very small changes to the code under test. And we run our test suite. And one of two things fundamentally will happen. It will pass or fail. Ideally, all changes that we make will result in failures of the test suite
04:23
when run against the mutated code. That's the goal. The basic algorithm is something like this. For every operator in the mutation operator set, and we'll talk about what we mean by operators, but operators are the encapsulation of the idea of a small change to a piece of code. So for every operator that you've got, and for every site in your code where that operator might be applicable,
04:43
then mutate the code. Mutate at that site and then run your tests. So it's, in some sense, a triply nested loop. You know, loop, loop, and this "run your tests" step is typically a whole bunch of tests, so I'll call that a loop as well. This is probably setting off alarms already, but this could take a very, very long time. And that's the truth. This is one of the major practical difficulties with mutation testing,
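That triply nested loop can be sketched in a few lines of Python. Everything here, the toy operator, the stand-in test suite, and the function names, is illustrative and not Cosmic Ray's actual API:

```python
class SwapGtLt:
    """Toy operator: each '>' in the source is a mutation site."""
    def sites(self, code):
        return [i for i, ch in enumerate(code) if ch == ">"]

    def mutate(self, code, site):
        return code[:site] + "<" + code[site + 1:]

def run_suite(source):
    """Stand-in test suite: passes if is_positive(5) returns True."""
    namespace = {}
    exec(source, namespace)
    return namespace["is_positive"](5) is True

def mutation_test(code, operators):
    outcomes = []
    for op in operators:                 # loop 1: every operator
        for site in op.sites(code):      # loop 2: every applicable site
            mutant = op.mutate(code, site)
            # loop 3: the whole test suite runs against each mutant
            outcomes.append("survived" if run_suite(mutant) else "killed")
    return outcomes

CODE = "def is_positive(x):\n    return x > 0\n"
print(mutation_test(CODE, [SwapGtLt()]))   # -> ['killed']
```

A real tool would find sites via the syntax tree and run the mutant's full test suite in a separate process.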
05:02
is the amount of time it takes. So what does mutation testing tell us? We go through the whole process of mutating and testing our code. What are the results? Well, for each mutation, the mutation can be killed. This is great. I'm coloring this green because that means, in a sense, that the test passed. A mutant could be incompetent.
05:21
This is a class of mutants that, for one reason or another, can't actually run. You can't test them because they segfault immediately, or they throw an exception immediately, or don't compile, or something along those lines. So these are mutants that can't even get running, so you can't test them. We still consider those to be green because the fact that they don't run at all means that any real-world defect that mimicked those
05:41
would also not be runnable. So this is still considered a success. It doesn't really gauge anything about your test suite, however. What we don't want is stuff that's red. We don't want our mutant to survive. This is where the test suite actually passes and says, your program looks fine even when it's mutated. What does a surviving mutant tell us?
06:02
What does this tell us about the quality of our test or of our code? It tells us either that our tests are inadequate for detecting defects in our code. So we have some code that we know is necessary because it implements a piece of functionality that we have to have, and our test suite can't tell us if that functionality is working correctly under a change that we introduced.
06:21
That's one thing it can tell us. The other thing it can tell us is that the code that we mutated doesn't do anything for us. It doesn't actually have any impact on functionality that we might otherwise be correctly testing, that we are correctly testing. And so when you have a surviving mutant, you have to look really hard at it and think really hard about what is actually going on here. Is it that my test suite is inadequate
06:41
or is it that I have extra code that shouldn't be there? This code is typically viewed as a liability. You don't want extra code lying around that you're apparently testing that you don't really need. So you can shove it out of your code base. Wonderful. This is the two main classes of results you get from mutation testing. And of course you want to kill all the mutants.
07:01
I've got to have a meme picture in here. That's all Dave Rieger. So kill all the mutants. That's what your goal is with mutation testing. Track them down and squish them either by improving your test or getting rid of the code. What are the goals of mutation testing? They're really just a handful of high-level goals. One, and this is probably the most common, the one most people think about and the one you'll read about
07:21
when you review the literature, is coverage analysis. But you're not doing coverage analysis in the typical way that most of you might be accustomed to, which is literally just: does my line of code get executed by a test suite? That is almost meaningless if you can't correlate that with functionality in your program.
07:42
Knowing that a line of code was run, that the instructions from that line went through the processor, doesn't tell me if that line of code is doing what I think it's supposed to be doing, or if the piece of functionality it's attached to is actually behaving the way I expect it to. This is where mutation testing can help you. It can verify that the functionality is being tested and maintained by your test suite,
08:02
and this is really, really important. So this picture is like, you know, they're happy because, yeah, they're wearing suits, great, but they haven't done anything meaningful with their time, and so it's a really beautiful picture for how I feel about typical coverage analysis. There's nothing wrong with coverage analysis per se. It's in some sense good to know that you are exercising all the parts of your code,
08:20
but if you're not sure that those lines that are being executed actually do anything and that you're verifying their functionality, you're really just wasting time. You're just spinning a processor. So mutation testing, as expensive and complex as it can be, helps you defeat this waste of time. The other thing, yeah, it's cold right there, isn't it?
08:40
The other thing that mutation testing can tell you and as we discussed, is it can help you detect unnecessary code. So most of you probably recognize this drawing. It's from Gray's Anatomy, maybe not this particular drawing, but what it's a representation of. This is a picture of the lower intestines, and this here, down here, the vermiform process is the appendix, and most of you probably know
09:01
that the appendix can be removed from the body with no ill effects, right? So this is the kind of thing you could say the mutation testing is looking for, little bits of your code that are no longer really necessary and can be yanked out. I'm not advocating that we all have our appendix removed unless it's inflamed, but you could if we ran mutation testing on ourselves. The corollary to this, and not the corollary,
09:20
the add-on, the extra moral to this story, is that, okay, I was told it's totally useless as a kid. I was told it does nothing at all, and I believed this until about, I don't know, half a year ago when I started looking into this. It turns out this actually does a little bit for you, just not that much that it matters. So it could be that when you're examining a piece of code that your mutation testing has shown you is problematic,
09:41
you need to think really, really hard about what's going on there. You need to put your engineering thinking head on and decide, can I get rid of this? Does it do anything even mildly important? Maybe I actually do need more tests, and I have to really think in a very multilateral way to decide what it is that mutation testing is telling me. So mutation testing isn't a magical oracle.
10:00
It simply shines a spotlight on something that's a bit problematic in your code or your tests. So, types of mutations. This is where we start talking about this notion of operators and things like that. Examples of the kinds of mutations. Well, before I get into that, does anybody know what this is and why I have it up on the screen? It's the peppered moth,
10:21
and I think that's what it's called. And this is a moth that was around in Birmingham. This is a city that's full of limestone. So before the coal revolution in Birmingham in the UK, all the buildings were made of limestone, and they still are, and they were very, very white. And these things could land on the limestone, and because they were primarily white, the birds couldn't see them.
10:40
We start burning coal, especially around Birmingham. Everything gets covered in soot and painted black, and these guys, unless they mutated to become black, got eaten by the birds immediately. So they mutated very quickly to become black, and then we cleaned up Birmingham, and I guess they changed back to white. I'm not entirely sure, but it's a fascinating story, actually, about mutation in our own time and anthropogenic mutation, things like that.
11:01
It's very, very cool. So check it out. A little side lesson there. So what are the kinds of things we mean when we talk about mutations in the scope of mutation testing? This is a typical one. Replace relational operators. You've got some place in your code that says X greater than one. Make that X less than one. That should obviously be testable, right?
11:21
I say obviously. It's not always actually that easy, but you probably instinctively believe this is very easy to test for, and you're generally speaking correct. If you've got a good test suite, it should be able to detect that some program has accidentally put the wrong relational operator in some important algorithm. But it's this nature. This is the nature of the kind of thing we do with mutation testing.
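The "I say obviously" caveat can be made concrete with a toy illustration (all names here are hypothetical): a suite that only probes the boundary value, where the original and the mutant happen to agree, lets the mutant survive; adding cases on either side of the boundary kills it.

```python
def original(x):
    return x > 1

def mutant(x):
    return x < 1              # the relational-operator mutation

def weak_suite(f):
    return f(1) is False      # only probes the boundary, where both agree

def strong_suite(f):
    return f(1) is False and f(2) is True and f(0) is False

print(weak_suite(original), weak_suite(mutant))       # both pass: mutant survives
print(strong_suite(original), strong_suite(mutant))   # True False: mutant killed
```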
11:41
Very, very small localized changes and then run your test suite. Another common example is break continue replacement. Another very, very small localized change that you would hope you could detect with your test suite. This is a really interesting one, though, because what would you imagine would happen very often if you replaced break with continue in your code? What would you expect to happen
12:01
in a lot of cases? We're all very tired. Infinite loops. Your program is just going to go forever, because you've reached the condition that says, oh, break out of this loop, and that condition still holds after we change it to continue, and we're just going to sit in that loop and spend forever. This actually speaks to an interesting class of mutants called, well, not equivalent mutants,
12:21
incompetent mutants, that we'll look at in some detail. But this is the kind of thing that goes on. This is a list of the operators that I want to implement for Cosmic Ray. Only a handful are so far implemented, but it gives you a sense of the kinds of things that we're talking about. You know, the logical connector replacement or super-call insertion. This is a Python kind of thing,
12:41
but that's the nature of the things that we're talking about. If you start looking into the research in mutation testing, and there's a substantial and interesting body of things to read, people have started to classify the kinds of mutations you might want to perform. There are a number that are language agnostic. You could imagine applying to almost any language.
13:01
Examples are constant replacement. Replacing one number with another number. Another example is replacing, you know, a variable with a constant here. Another simple but, you know, reasonably interesting kind of change. Arithmetic operator replacement. Replacing plus with times. I mean, these are all very simple things and not terribly exciting to think about on your own.
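A minimal sketch of arithmetic operator replacement using Python's standard ast module; constant replacement works the same way with a visit_Constant method. This shows the general transformer pattern, not Cosmic Ray's actual implementation, and for brevity it mutates every '+' at once where a real tool would emit one mutant per site:

```python
import ast

class AddToMult(ast.NodeTransformer):
    """Replace '+' with '*' -- one operator from the language-agnostic set."""
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()         # the single small change
        return node

tree = AddToMult().visit(ast.parse("def twice(x):\n    return x + x\n"))
ast.fix_missing_locations(tree)          # newly created nodes need positions
namespace = {}
exec(compile(tree, "<mutant>", "exec"), namespace)
print(namespace["twice"](3))             # original gives 6; the mutant gives 9
```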
13:20
But again, you should be able to detect with a test when these kinds of things happen. And of course, relational operator replacement. So there's a handful of things, and actually quite a broad category of things, that can be applied to almost any language that you're probably using, unless you're using Prolog or something like that, in which case, awesome. Who's doing that? And this is one we'll look at in some detail
13:41
in the demo if we get there. Unary operator insertion. There are some operators, however, that only apply to certain kinds of languages. You know, object-oriented mutations. Most of you are probably working in an OO language, just probabilistically speaking. So changing access modifier. You can't really do this in Python because there's no such thing.
14:00
But if you're working in C# or C++, you could imagine doing something like that. That really ought to be detectable, right? And if it's not, then you need to think about why you chose public versus private in that situation. Removing overloads. This is not, yeah. You could remove a particular overload. This is a bad example, actually. But you could take an overload from a subclass
14:22
and yank it out. Changing base class order. This is a really interesting one because most of the time, I would warrant that this actually has no noticeable functional effect in a program. I know in Python, if you did this, most of the time, nothing's gonna break. But sometimes, things will catastrophically break.
14:41
And so you start to have to think really hard about these kinds of situations where, okay, obviously the change could be made, but it's impossible to test for, so what do I do? This is a class of mutants called the equivalent mutants and we'll talk a little bit about those and the difficulty in dealing with them when you do mutation testing. Yes, this is one to think about.
15:01
Does this order matter in your projects? It's something I haven't thought a whole lot about in my life, but I probably should pay more attention to. And how would you test for that? Testing for this very often would be almost meaningless. It would be a test you wrote purely to satisfy a mutation testing system most of the time and that's the problem with equivalent mutants.
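In Python the base-class-order example can be made concrete. Swapping the bases is usually an equivalent mutant, but it becomes observable the moment both bases define the same method, because the method resolution order decides which one wins:

```python
# Two bases defining the same method make the base-order mutation
# observable: Python's MRO resolves the collision left to right.
class A:
    def greet(self):
        return "A"

class B:
    def greet(self):
        return "B"

class Original(A, B):   # MRO tries A first
    pass

class Mutant(B, A):     # the mutation: base order swapped
    pass

print(Original().greet())   # -> 'A'
print(Mutant().greet())     # -> 'B'
```

When the bases share no names, the swap changes nothing observable, and that is exactly the equivalent-mutant problem: no test can ever kill it.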
15:21
Functional programming languages have their own sorting mutations you can think about and a good example, it's very long-winded, but is switching around the pattern matching here. So we have a take function, which is a classic functional programming thing, and swapping around the two first clauses there. That's something you normally can't do in C++ or Java because it doesn't make a whole lot of sense.
15:41
It's a very high-level operation, but in a functional language, it's a very low-level mutation. That gives you the flavor of the kinds of changes we're talking about. They're small, they're localized, they don't cover big spans of code because what you want to be able to do is zero in on exactly why things are failing or succeeding when you make a mutation. And if you had mutations scattered all over your code
16:02
touching three or four places, you would have a much harder time triangulating on the actual problem and you'd just have to run more tests at that point. So what are the complexities of mutation testing? We're good on time here. This is not an easy thing, and as I said early in this talk, you probably haven't seen mutation testing because of the complexities of doing it,
16:20
the difficulties of using it in real-world practical environments are big. So what's the main one? It takes a long, long time on real-world code bases. Does anybody know what this is? Come on. It's the Queensland pitch drop experiment. This guy, this professor in Queensland who's got this funnel
16:41
of pitch, which is this really thick super-viscous fluid, but almost feels like a solid, but it's a fluid because it's dripping, and it drips once every 15 years or something along those lines. And he's waiting to see it drop, basically. He keeps missing it. But it takes a long time for this thing to make drips. It's a fairly famous experiment. You should look it up.
17:00
What do we do? How do we address this issue? The triply nested loop of operators and sites and test suites that I talked about before. One thing is that you can parallelize this as much as possible. Fortunately, mutation testing is an embarrassingly parallel problem. If you had an infinite number of machines, you could literally spin off an infinite number of copies
17:20
and do one mutation on each copy and get the results back and be done. We don't have that many machines, but you might have 10,000 on Amazon or something. If for five minutes you want to just buy 100,000 servers, if they give you that many at a time, you could parallelize out to all of them at once and get the results back. So this is one of the main ways that we deal with the
17:40
runtime complexity, so to speak, of mutation testing. Another option is to do what we call baselining, and this is a fairly complex thing to do. The idea is to figure out how your tests correlate with your code. Test A touches these lines of code. This is where the traditional kinds of coverage analysis
18:00
can be very useful, because you can start to say, okay, when I run this test, these lines are executed. Once I have that baseline number, that baseline graph, I can say, well, only run the tests in my mutation testing suite that touched the modified code. And this is one way to massively reduce the scope of the tests that you need to run.
18:20
And you can also say, well, look at the diff that I'm testing against. I ran mutation testing last week, and I'm running it again. What's changed? Only mutate that code. And this way you can sort of, by combining both of these bullet points, you can doubly narrow down and massively reduce the scope of the amount of testing you need to do. I haven't implemented a system that does this yet, but it's an important one.
Now, if you think hard enough about this, you realize it isn't actually perfect. If I do a baseline, I can have a perfect graph of tests versus code that's executed. But if I make modifications, which is the whole reason I'm rerunning my mutation testing, I may have modified that relationship, that graph. So baselining isn't something you can use forever and ever, but it is a way to do fairly rapid updates to your mutation testing results and only occasionally force yourself to do a full rebaseline, maybe on the weekends or once a month, depending on how long these things take. Baselining is an important technique, I think, one I would like to spend more time on, and one way we can really speed things up.
But it requires fairly sophisticated tooling to do that mapping between tests and lines of code. Finally, you can speed up your test suite. If you talk to the BDD people, Dan North and that crowd, they'll tell you you should do this anyway: your test suites should be as fast as possible for a variety of reasons, and now mutation testing is one of them, which is of course what we all want to do, because I've made such a strong sell. So that's complexity one, and it's probably the most important one from a practical point of view: it just takes a long, long time to run these suites. Another, in many ways much more interesting, complexity is incompetence detection.
Incompetent mutants, again, are mutants that fall over and can't execute for one reason or another. Those are actually very simple to deal with: they just fall over, and you can see that immediately. What's more difficult are the ones that run forever. This is, of course, Alan Turing right there, and he apocryphally said, good luck with that, because he proved the halting problem undecidable: we can't look at a body of code and tell if it's ever going to end. This is one of the most famous results in computer science, so you should all know it. He never actually said good luck with that; he said something much more long-winded. But you cannot look at a mutant and tell from the outset if it's ever going to terminate. It's mathematically impossible. The only way is to execute it, and if it terminates, then you know it terminates. We looked earlier at break-continue replacement as an example of something that could put you into an infinite loop, and that's effectively undetectable. You can apply heuristics to try, but from a purely mathematical standpoint it's impossible. What we do about this is essentially create a baseline time, and only let the tests run for so long before terminating them and calling them incompetent. That's the strategy we use to deal with that. So, look up the halting problem. It's fascinating. Everything Turing did is fascinating.
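The timeout strategy can be sketched with a subprocess per mutant. Here `while True: pass` stands in for a mutant that loops forever after, say, a break-to-continue replacement; the classification names are illustrative.

```python
# Sketch: treating a mutant as "incompetent" when its test run exceeds a
# time budget, since we cannot decide in advance whether it halts.
import subprocess
import sys

def run_with_timeout(code, timeout_seconds):
    """Run `code` in a fresh interpreter and classify the outcome."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return "incompetent"     # ran too long: assume it never terminates
    return "survived" if result.returncode == 0 else "killed"

print(run_with_timeout("while True: pass", 0.5))    # incompetent
print(run_with_timeout("pass", 10))                 # survived
print(run_with_timeout("raise SystemExit(1)", 10))  # killed
```

`subprocess.run` kills the child when the timeout expires, so a runaway mutant cannot hold up the whole run.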
The third complexity is what we call equivalent mutants. Equivalent mutants are mutants whose code has legitimately been changed, but you cannot detect the change meaningfully, in any practical sense. Now, there may be no such thing as a completely undetectable mutant, but from a practical point of view there are plenty of mutants you could never actually detect. So, how many here program Python on a regular basis? Not everybody, but a good number. It's not really important what this does. This is a bit of code from the Python standard library documentation telling you how to take an iterable object and just plow through it, consume it, which is why it's called consume, without doing anything else. You're plowing through an iterable thing to get its side effects, and this is a way to do it super fast. This is idiomatic, suggested Python code, which is why I was surprised when it had problems. The important line is the one where we take the iterable and pump it into a double-ended queue with a max length of zero, which means: as fast as possible, using the C code that deque is implemented in, plow through this iterable. It's an optimization. What happens when mutation testing runs over this code is that it sees the zero and says, that's a number, and one of the things I know how to do is replace numbers with other numbers. So it changes it to 42 or negative 6 or something, and then it runs the tests. And lo and behold, this number has no real observable effect outside of this function. It's such an implementation detail that nobody will ever know you've replaced it with something else, unless perhaps you replace it with a really, really big number. Your testing suite is not going to detect that this number has done anything. All that really happens when you change it to, say, 10 is that Python's internal memory allocation does something slightly different, in a way you probably can't even detect from the outside unless you're using a debugger to watch Python at the C level. Nobody has tests that do that. I hope you don't.
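The code in question is the `consume` recipe from the itertools documentation; the `maxlen=0` literal is the mutation target. A quick demonstration of why a mutated `maxlen` is equivalent: the iterable is fully drained either way.

```python
import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    if n is None:
        # Feed the entire iterator into a zero-length deque: fast C-level drain.
        collections.deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice starting at position n.
        next(islice(iterator, n, n), None)

# The side effects happen; the yielded values are discarded.
effects = []
consume(effects.append(i) for i in range(3))
print(effects)  # [0, 1, 2]

# The "equivalent mutant": replacing maxlen=0 with, say, 5 is observably
# identical from the outside -- the iterable is still fully consumed.
effects2 = []
collections.deque((effects2.append(i) for i in range(3)), maxlen=5)
print(effects2)  # [0, 1, 2]
```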
That's crazy. So this is an equivalent mutant, and it's a bit insidious, because this is very natural code to use. It gives you a sense of the kinds of things that can jump up and bite you when you're doing mutation testing, and it's a tough problem to solve. Another one that I ran into after I started doing some mutation testing in Python is this. Every Python program in the world has seen something like this: if __name__ == '__main__', then run. This is how you make a main function, an executable script so to speak, in Python. The problem is that when you're running this code in a test suite, __name__ is never going to be '__main__', and nothing inside that block is ever going to get executed. So if code in that block gets mutated, the mutation is completely undetectable. This is another instance of a broad class of equivalent mutants, things you can never easily detect, and we have to account for that in any system that does practical mutation testing. That's why it's one of the three complexities.
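Here is the idiom, plus a small simulation of why mutations under the guard always survive. The module source is a stand-in, executed with `exec` to mimic an import.

```python
# A stand-in module source; CALLED records whether main() ever ran.
source = '''
CALLED = []

def main():
    CALLED.append("main ran")

if __name__ == "__main__":
    main()
'''

# What a test suite's import effectively does: __name__ is the module's
# own name, so the guarded block is skipped and any mutation in it survives.
as_import = {"__name__": "mymodule"}
exec(source, as_import)
print(as_import["CALLED"])   # []

# What running it as a script does: the guard fires.
as_script = {"__name__": "__main__"}
exec(source, as_script)
print(as_script["CALLED"])   # ['main ran']
```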
So that's about it for the theory, so to speak, of mutation testing. Does anybody have any questions before I move on? Okay, good. We'll take a quick breeze through Cosmic Ray itself. This is a mutation testing tool specifically for Python; by their nature, mutation testing tools are not cross-language, none that I know of, and there's no theory in that direction that I'm aware of either. If you're interested in this kind of stuff, you can get Cosmic Ray from GitHub. It's in our corporate GitHub repository, called cosmic-ray, and it's something I'd be interested to get feedback on, if nothing else, or ideas for how to make it better, or actual patches, which would be even cooler.
What are the implementation challenges for this tool? We have to determine which mutations we want to make: somehow scan a body of code and figure out what changes we're going to implement before we run the test suite. We then have to make those mutations one at a time; we can't just randomly seed them throughout the code. We make one change, run the test suite against that mutant, undo the change in some sense, and then make another change. And we have to do all that while dealing with the complexities I just talked about: massive runtimes, incompetence, equivalence, and things along those lines. So, how do we make that work, in a nutshell? Well, we have this idea of operators. I talked about how operators are these small, encapsulated changes. Say you have one plus two. The first job of an operator is to identify the places where it can make a change. I have some operator, a class called replace-plus or something like that, and the first thing it has to know how to do, when looking at the code, is say: yes, that's the thing I know how to mutate. The next thing it has to be able to do is actually perform that mutation. Critically, though, it's not the job of the operators to decide when to perform the mutation. They're orchestrated by a higher-level executive piece of code that says: you found a place to mutate, why don't you do that now, or please don't do that now. So the operators encapsulate this idea of detecting and creating the changes in the code that we need
to do before we run the tests. Operators are implemented in a two-part way. There's the operator itself, which knows how to detect, report that it's found a place it can mutate, and perform the mutations. And then there's a thing I call the core. It's actually kind of a bad name, one that evolved out of late-night coding. The core is the thing the operator tells, hey, I found a place to mutate, and the core says, okay, make that mutation now, or don't make that mutation now. But cores can do other things. Two kinds of things cores do right now are counting and mutating. One of the important things we have to do is count how many mutations will be made, so that we can build up a work order at the beginning of the mutation testing run. The other is actually making the mutations. So inside Cosmic Ray we have multiple modes of running: one mode runs with the counting core, the other with the mutating core. Implementation details, but it gives you a sense of how things are bolted together inside. We make a lot of use of the ast module, which is part of the
Python standard library, and it lets you work with, as you might imagine, ASTs, abstract syntax trees. What's beautiful about ast is that I can hand it a bunch of Python code in text form and have it spit back, for those who don't know what ASTs are, a programmatic representation of the program. It makes a tree out of the code, something I can work with inside a program, inside Python code. So I've got Python code working on parsed Python code, so to speak. What do we use ast to do? We generate abstract syntax trees from Python code: we literally take the Python source files you're working on and pump them into ast to get back the abstract syntax tree, which we can then work against. We walk the ASTs using the built-in facilities in ast and make modifications: we change nodes, we replace nodes, we pull nodes out. That's how we actually implement the mutations at runtime. We have to manipulate the ASTs very cleanly, and that's not the easiest thing in the world to do; sometimes you can make mistakes that mess up the whole tree. The ast module gives you some nice utility features for this. The broader message from this slide is that if you decide to try mutation testing in your language of choice, find the equivalent of the ast module. Don't try to write a compiler or a parser for your language; that's the path to madness. Any reasonably built language should have something to help you do this. Added to this, there's the compile function, and this is something I was really overjoyed to find. I can take an AST that I may have mutated, pass it to compile,
and get back what's called a code object, which I can then load into the Python runtime and actually use, have the rest of the program use. This is the magic sauce at the core of how Cosmic Ray does its work: it can create abstract syntax trees, modify them, and then turn them at runtime into code objects that can actually be executed. If this didn't exist, if I hadn't found this ready-made for me, I would not have done this project, because this is the hard part, and I don't want to do the hard part. I'm lazy, like all good programmers. So how do the operators work in a functional sense? We have this thing called the NodeTransformer. It's part of the ast module, built into Python.
It knows how to walk a syntax tree and give me opportunities to make modifications. It's a standard visitor pattern. The NodeTransformer calls visit functions; these are actual subclass arrows, and this is UML, so you know it's hardcore engineering, right? So visit_Num: the operator sees a number and says, I know how to do number replacement. It calls up to the base class and says, hey, I found a mutation site, by calling the visit_mutation_site function. This is maybe not the best relationship, but again, it evolved out of late-night coding; I should consider refactoring it. visit_mutation_site says: core, we found a mutation site, what would you like me to do? And the core might say, I need you to do the mutation, or it might just increment a counter. If it's the counting core, it increments a counter; otherwise it asks the replace-constant operator to actually do its work. Once that's called, the abstract syntax tree gets modified, injected up into the module list, and used by the test suite. So that's the basic flow of bouncing through the code to make things happen.
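The flow above can be sketched with `ast.NodeTransformer` and `compile`. The class and method names here (`visit_mutation_site`, the two cores) are illustrative of the design, not Cosmic Ray's actual API, and `visit_Constant` is the modern (Python 3.8+) spelling of the `visit_Num` hook mentioned in the talk.

```python
import ast

class NumberReplacer(ast.NodeTransformer):
    """Operator: finds numeric constants and asks its core what to do."""
    def __init__(self, core):
        self.core = core

    def visit_Constant(self, node):
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            if self.core.visit_mutation_site():
                # Perform the mutation: replace n with n + 1.
                return ast.copy_location(ast.Constant(node.value + 1), node)
        return node

class CountingCore:
    """Counts mutation sites so a work order can be built."""
    def __init__(self):
        self.count = 0
    def visit_mutation_site(self):
        self.count += 1
        return False              # never mutate, just count

class MutatingCore:
    """Activates exactly one mutation, selected by index."""
    def __init__(self, target):
        self.target = target
        self.seen = 0
    def visit_mutation_site(self):
        activate = self.seen == self.target
        self.seen += 1
        return activate

source = "def add():\n    return 1 + 2\n"

# First pass: count the sites (two numeric constants).
counter = CountingCore()
NumberReplacer(counter).visit(ast.parse(source))
print(counter.count)  # 2

# Second pass: mutate site 0, compile the tree, and execute it.
tree = ast.fix_missing_locations(
    NumberReplacer(MutatingCore(0)).visit(ast.parse(source)))
namespace = {}
exec(compile(tree, "<mutant>", "exec"), namespace)
print(namespace["add"]())  # 4: the 1 became 2, so 2 + 2
```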
So the summary of operators is that we use ASTs, we implement operators which can detect and perform mutations, and we use different cores to control how and when they do what they do. It's a quick breeze through, but I hope you're getting the point. The next trick: we now have the technology to take Python source code, create an AST, modify it, and get a code object. How do I actually make it available to the test suite? It's one thing to have the code object; it's another to make the rest of my program use it. So here's a very fast overview of how Python does module loading. There are essentially three main moving parts. There's a thing called a finder, and a finder is given a module name. So if somebody says import foo,
all the finders are asked: hey, do you know how to import foo? And one of them may report back, yes, I know how to import foo. If it does, it hands back what's called a loader, and the loader is requested later to actually do the job of populating a module. It's given an empty module shell and asked to fill it in; the loader's job is, in some sense, to do all the name bindings inside of a module. To make these two available to the rest of Python, there's a thing called sys.meta_path. sys is part of the standard library, and meta_path is nothing more than a list, a literal list of finders. When you say import foo in your Python code, Python goes behind the scenes to meta_path and asks each finder in order: do you know how to load foo? So you can see where this is going. And, I should say, you're allowed to make your own. So we wrote our own. We have a custom finder, and all this finder has is a name and a modified AST. We stick it at the front of sys.meta_path, so that if anybody asks to load that module, our finder says, yes, I know how to do that. It hands back a loader, which also has that AST and knows how to execute it, using the compile function, to populate the namespace. So now we've got all the mechanisms we need: mutate an AST, and install it, that is, import it, through a custom finder and loader. This is another bit of magic; if it didn't exist in the language, I wouldn't have been able to do this project and would have given up a long time ago. Critically, once these pieces have done their job and made the module available, the tests don't have to know this is happening. The tests still naturally say import XYZ, and from their point of view nothing has changed; we've just snuck in underneath them and replaced the modules they're going to get with the ones we want to test. So finally, how do we figure out what to mutate? Somebody's told me, I want to test this package.
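The finder/loader trick just described can be sketched with the standard `importlib` hooks. The names here are illustrative, and Cosmic Ray's real implementation differs in detail.

```python
# Sketch: installing a mutated AST via the import machinery.
import ast
import importlib.abc
import importlib.util
import sys

class ASTLoader(importlib.abc.Loader):
    """Loader that populates the module by executing a (possibly mutated) AST."""
    def __init__(self, name, tree):
        self.name = name
        self.tree = tree
    def create_module(self, spec):
        return None              # use the default empty module shell
    def exec_module(self, module):
        code = compile(self.tree, "<mutant:{}>".format(self.name), "exec")
        exec(code, module.__dict__)   # do the name bindings

class ASTFinder(importlib.abc.MetaPathFinder):
    """Finder that claims exactly one module name."""
    def __init__(self, name, tree):
        self.name = name
        self.tree = tree
    def find_spec(self, fullname, path, target=None):
        if fullname == self.name:
            return importlib.util.spec_from_loader(
                fullname, ASTLoader(fullname, self.tree))
        return None              # let the other finders handle everything else

# Pretend we mutated some module's source into this AST.
tree = ast.parse("ANSWER = 42\n")

# Front of sys.meta_path, so we win before the normal file-based finders.
sys.meta_path.insert(0, ASTFinder("mutant_demo", tree))

import mutant_demo               # the test suite would do this unknowingly
print(mutant_demo.ANSWER)        # 42
```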
At a very high level, what Cosmic Ray does is ask for a package, which is the Python term for a collection of modules. Cosmic Ray scans it, using a technique whose slide I removed because of time constraints; there are facilities in the language for dynamically scanning through a package to find out what it depends on and what it contains. Using that, we find all of the sub-packages, and thus all of the files we need to pull source code out of to do the AST parsing. There are currently some pretty severe limitations on the kinds of modules we can mutate. Basically, right now we can only work with standard .py files, source text files. We can't work with modules coming from zip files, or modules coming from HTTP imports; there's an infinite supply of exotic kinds of modules we can't work with, and we just fall on our face right now. That's a big area of work we need to look into, given time and infinite money. You can also tell Cosmic Ray that there are certain parts of the package you don't want it to work with, either because we don't have tests for them and know they're bad, or because they're very hard to test, or something along those lines. But broadly speaking, what we do is walk the tree of packages and modules to find out what needs to be mutated. So how, finally, do we run the tests? This is where things get slightly interesting. Crash test dummy, good picture, right? The first thing we do, as I just described, is figure
out what to mutate: we go through the package and find all the things we might be able to touch. We then create a single mutant; we go into the AST and change the plus to minus. Then we install that mutant, making it available through the import system using the finders and loaders. Then we have this concept of a test runner. The test runner encapsulates the different kinds of testing systems in Python: pytest, unittest, nose, et cetera. And we tell the test runner that the user configured to run the tests. Importantly, critically really, all these steps run in a separate process. For each mutant, we actually start up a new Python process. This is primarily for sandboxing, because, as you can imagine, a mutant could cause all sorts of wacky things to happen. It could go in and fiddle with the sys.meta_path that Cosmic Ray itself relies on to execute. So Cosmic Ray says: we're not going to give mutants the chance to mess with the test execution system; every mutant is going to run in a separate process. This also allows us to do quite a bit of parallelization: you can fire up as many parallel processes as you want. And we couple this with something we'll talk about very briefly, which is Celery, a message bus, a task distribution bus so to speak, that we use to run Cosmic Ray workers on however many machines you want.
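A toy version of that run loop: each mutant's "test suite" (here just a stand-in assertion, not a real pytest run) executes in its own interpreter process, and several run in parallel. A passing suite means the mutant survived.

```python
import concurrent.futures
import subprocess
import sys

def run_mutant(test_code):
    """Run one mutant's tests in a fresh interpreter (the sandbox).
    Exit code 0 means the tests passed, i.e. the mutant SURVIVED."""
    result = subprocess.run(
        [sys.executable, "-c", test_code],
        stderr=subprocess.DEVNULL,
        timeout=30,
    )
    return "survived" if result.returncode == 0 else "killed"

mutants = [
    "assert 1 + 1 == 2",   # tests still pass: mutant survives
    "assert 1 - 1 == 2",   # tests fail: mutant killed
    "assert 2 * 2 == 4",   # tests still pass: mutant survives
]

# One process per mutant; the thread pool just launches them in parallel.
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    outcomes = list(pool.map(run_mutant, mutants))

print(outcomes)  # ['survived', 'killed', 'survived']
```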
So, very briefly, that's how we do testing. Finally, I talked earlier about incompetent mutants: mutants that can go into infinite loops because you have, for example, replaced break with continue. How do we deal with that? There are two strategies we make available to the user. One is an absolute timeout: they can say, if the test runs longer than five minutes, call it incompetent. The other, which I think is better, is that we let them run a baseline. We run their test suite over the unmutated code and time it, and the user provides a multiplier, 1.2, 10, something like that. We multiply the time we measured by that multiplier, and that becomes the timeout.
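The baseline strategy boils down to a few lines. The test suite here is a stand-in sleep, and the multiplier is the user-supplied factor described above.

```python
import time

def timed(fn):
    """Time one run of fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def test_suite():
    time.sleep(0.05)             # stand-in for the real, unmutated test run

baseline = timed(test_suite)     # "we run the baseline for them"
multiplier = 10                  # user-supplied factor
timeout = baseline * multiplier  # any mutant running longer is incompetent

print(timeout > baseline)  # True
```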
Basically, if any mutant takes longer than x times the baseline time, we call it incompetent. The high-level strategy is that there's a timeout; there's no other real way to do it that I'm going to spend time investigating. As for the rest of the stack, just briefly: we use something called Stevedore to do
plugins; most of the functionality of Cosmic Ray is provided by plugins. Stevedore is really worth looking into if you're using Python; it's a wonderful system. We use Celery, which I mentioned, and which is basically just a task queue. I say basically: it's a very complicated piece of machinery sitting on top of RabbitMQ, which is written in Erlang, and it's a very robust, widely used, very powerful system. The Cosmic Ray executive level starts pumping jobs into the task queue, and we have some number of workers, which could be on your local machine or on other machines, whose job is to receive work and fire up a Cosmic Ray worker. That worker is the separate process I was talking about: the sandbox that does the mutation, installs it, runs the test suite, and dies, reporting results back up the chain.
You can read all about Celery at celeryproject.org. It's a fascinating and really excellent piece of software. We have a little database that we use to keep track, initially, of the work that needs to get done. One of the first things you do in a mutation testing run is build up the work order, and we pump all that information into this thing called TinyDB. We use the counting core, which we talked about, to determine what work needs to get done. Then, when somebody says, okay, now do the testing, we only schedule things to run that don't already have results. As results arrive back at the executive level, we drop them into the database. So you can have an interrupted test run and pick it back up again, which is very, very handy: we can resume. It's also a natural place to put the results. At the end of a full run, you've got this database full of not only what you did, but the results that came out of it, some timing information, and things like that. You can get TinyDB at its GitHub repository.
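The resumable work-order idea can be sketched with a plain JSON file (TinyDB stores JSON too, though this is not Cosmic Ray's actual schema).

```python
import json
import os
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "session.json")

def save(work):
    with open(db_path, "w") as f:
        json.dump(work, f)

def load():
    with open(db_path) as f:
        return json.load(f)

# Build the work order up front: one record per planned mutation.
save({str(i): {"operator": "number_replacer", "result": None} for i in range(4)})

# Two results arrive before the run is interrupted.
work = load()
work["0"]["result"] = "killed"
work["1"]["result"] = "survived"
save(work)

# On resume, schedule only the jobs that have no result yet.
pending = [job for job, rec in load().items() if rec["result"] is None]
print(sorted(pending))  # ['2', '3']
```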
It's open source. It's a very simple database, and we'll probably have to replace it at some point because of performance constraints; I'm a bit worried that as we start running over larger and larger code bases with more and more mutants, it will be more than it can handle. But for a small embedded JSON-based database, this thing is pretty darn excellent, so check it out as well. We use docopt, which I won't go into in great detail, except that it's like magic. You specify your help message, it parses that help message, and it produces a command line parser for you. So it's the reverse of what you've probably been doing your whole life and hated. Look into docopt; it exists for Python and about 30 other languages, I think. It's a wonderful project, and it really has changed the way I write command line programs. I enjoy doing them now. See docopt.org, check it out. There's some remaining work, probably more than I'm going to list here, but these are the well-known items from my point of view. One is properly implementing timeouts. I think I've nailed this now; it was really a question of learning how to use the Celery API correctly, but for a while this was a real problem.
I was doing it wrong. A really big area is this question of exceptions and processing instructions. What I mean is being able to tell Cosmic Ray, for example: please do not ever mutate this bit of code. We saw that bit of code where changing the zero to something else was undetectable. I want to be able to put a tag on, or somehow annotate, that bit of code to say: Cosmic Ray, don't ever make that mutation here, because I know I can't test for it and it's meaningless for you to report it. Tools like Pylint have support for this kind of annotation, so I think I can piggyback off of that, but it's a big unknown area for me right now; I haven't spent a lot of time looking at it, I just know it's an issue. I need support for more kinds of modules: the zip files and all the esoteric module forms I mentioned that I don't support right now. And I would like to integrate it with coverage testing. This goes back to my first use of the term baselining, correlating your tests with the code that gets executed in your module. I would like to pair the mutation testing suite up with the code that gets executed, so I know how to pare down the amount of work to do after a change has been made to your code, after you've committed to your repository. There are a number of open issues as well with Cosmic Ray; you're more than welcome to fix them for me. So that brings me to something I'm actually really proud of: the fact that we've got some practical real
world results on a real project that's actually being used to do real stuff. This is a picture of a reservoir modeling piece of software. All these pieces of software depend, at the base, on a file format called SEG-Y, which you don't have to know anything about except that it holds big chunks of data. And we at Sixty North have this project called Segpy, which is a Python library for reading SEG-Y data. So, pretty picture, piece of software, and it's really well tested; we think it's a pretty solid piece of software. So we thought it would be a good target to run Cosmic Ray over, and we did. One of the details of Segpy, well, of SEG-Y I should say, is that its floating point numbers are sometimes stored in IBM System/360 floating point format, because SEG-Y was invented a billion years ago, when the world was black and white. So one of the things we have to do when reading SEG-Y data is convert from IBM format to the standard IEEE 754 format that Python uses internally. So we have a bit of code that does that, and this is what the beginning of that function looks like. We take in a sequence of four bytes and do a bunch of math. One of the critical things we do is an optimization. I say critical, but it's an optimization. It says: if all four of the IBM
bytes are zero, then we know the result is zero, so spit it back. Zeros are very, very common in this kind of data, so it's a great optimization. And we have a test for this that asserts that if I pass four zero bytes into the IBM-to-IEEE conversion, the result equals zero, which it does. Great. We ran Cosmic Ray over this, and it identified something interesting. It said: I changed this comparison to less-than, I changed it to not-equal, I changed it to a number of other things, and this test continued to pass. Basically, whatever it did, the test still worked; the mutant was surviving. We scratched our heads, thought about it a bit, and then realized: wait a second, this optimization has absolutely nothing to do with a. We determined that a could be anything; the optimization was really asking whether b, c, and d were zero. If they were, the optimization was okay; a played no role whatsoever. In fact, if we changed the code to take a out, everything still worked perfectly fine. Basically, a played no role in determining whether or not we should return zero. So we changed our code and added a new test, which says: for every integer from zero to 255, which is every possible byte value for a, we expect the result to be zero. With this new test Cosmic Ray was happy, and it gave us a chance to use a really wonderful tool called Hypothesis. If you've heard of property-based testing and haven't used it, you should look into it. It's almost
magical, and Hypothesis is the tool for doing it in Python; this gave us a great opportunity to use it. Wonderful project. So, I have a few minutes left according to the schedule, so hold on, I'm going to get out of this and do a quick demo. This is from the Cosmic Ray test suite. We have a test suite that actually tests Cosmic Ray and verifies that it does the right things. Over here we have a bunch of basically meaningless functions that are easy to test, and over here a bunch of tests for those functions, and we run Cosmic Ray over this expecting a zero percent survival rate. So, if I go
to this bottom window: this is Celery running here. I'm running a Celery worker, so it's talking to the message queue, waiting for work to show up so it can execute some code. On this top panel here, is that big enough? I can try to make it a bit bigger. There. Good enough. The first thing you do with Cosmic Ray is initialize a session. I say init, and I'm telling it: run the baseline testing, then make the timeout ten times that number. The session name is ndc; a session name just identifies the work you're doing. adam is the name of the module we'll be mutating, and all the tests live in the test directory. init doesn't take very long in this case, and it's created a little file called ndc.json, which is the database, that TinyDB. The next thing we do is exec, and once I start exec you'll see Celery go nuts and start doing all sorts of work. If you read very quickly, you can see there are actually four workers here; Celery is pretty darn smart, and says, this machine has four cores, I'm going to use all of them. This takes a few minutes to run, or a few seconds in this case, and then we can run a report, and we see a zero percent survival rate right there. That's what we want. Now, I want to prove to you that Cosmic Ray is not just a bunch of smoke and mirrors. So I'm going to go in here and comment out, effectively remove,
one of the tests. This test, test_unary_sub, is checking this function here, and it's the only test that checks that function. Without it, we can make a change to unary_sub and we won't be able to detect it; Cosmic Ray will let the mutant survive. So, let's go back to Cosmic Ray. I have to reinitialize, basically start fresh, and re-exec; again, a few seconds, and we can run a report. I'm going to pipe this into less so we can track down what happened. Now, if I search for the term survived in the output, I see: something survived. Something at line 12 in adam.py survived because the reverse-unary-sub operator made a change that wasn't detected. Well, that's exactly what we expect. What's at line 12 in our file? This, right? This is the one we removed the test for. What happened is, Cosmic Ray got to that negative one, made it a positive one, ran the test suite again, and the test suite said, yeah, everything's great.
And this is exactly what Cosmic Ray is designed to detect: the fact that you don't have a test for that. So, I'm going to go back over here, undo all that, and then show you essentially the inverse: now I'm going to add some functionality that has no test. These are the two sides of the coin. In one case I had forgotten to add a test; in the other, I've added some new functionality and still forgotten to add tests. So let's go back over to reinitialize, re-exec, and re-report in just a second. Right, we still have a survivor, this one on line 16. Let's see what's on line 16. It is this, right. So, a quick, hopefully slightly convincing demo that Cosmic Ray actually does real work. And that is it. I'm glad we got through everything. Thank you very much. I guess we have some time for questions; nobody's kicking me off the stage, so if you want to ask anything, now's a good time, or we can wait till afterwards. Yeah.
[Audience question, inaudible.] I mean, theoretically, yes. You could hire somebody to do this, and they could sit and make modifications by hand, but I don't know of any tool, no. Exactly, yeah. Welcome to the world of C++. It's a practical problem. Another answer, though, is that PIT is a Java mutation testing tool for this kind of work, and Java, of course, is compiled. Java probably has better support for going in and directly modifying compiled class files; maybe that's what PIT is doing, I don't know. If I were to approach this in C++, I would look really hard at LLVM first, to see what hooks it provides to do things in a smart way. I wouldn't try to do this with GCC, or even GCC-XML. But it's a harder problem, because it has a wrinkle you don't face with Python. You're absolutely correct. Any other questions? I'll take that as a no. Thanks again.