Mutation Testing in Python with Cosmic Ray

Video in TIB AV-Portal: Mutation Testing in Python with Cosmic Ray

Formal Metadata

Mutation Testing in Python with Cosmic Ray
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
Mutation testing is a technique for systematically mutating source code in order to validate test suites. It works by making small changes to a program’s source code and then running a test suite; if the test suite passes on mutated code then a flag is raised. The goal is to check that a system’s test suite is sufficiently powerful to detect a large class of functionality-affecting changes, thereby helping ensure that the system functions as expected. Mutation testing is a fascinating topic with great potential that has valuable lessons for the broader software development community. I’ll begin this talk with a description of the theory behind mutation testing. We’ll look at how it works and the benefits it can provide. We’ll also consider some of the practical difficulties associated with the technique, including long runtimes and certain difficult classes of mutants. I’ll then move into an analysis of Cosmic Ray, a tool for mutation testing in Python. I'll demonstrate using Cosmic Ray to locate untested code not detected by traditional coverage techniques in an open source library – with surprising results.
OK. This is a much bigger crowd than I expected to get on the last day of the conference, on a bit of a niche topic. My name is Austin Bingham. I work for, and in part own, a small company called Sixty North; we do software consulting, training, and things like that.
The topic today, as you can read, is mutation testing in Python with a tool called Cosmic Ray that we've developed. The bulk of this talk will be about mutation testing itself, not so much about the details of how we implemented it in Python, although we'll touch on some of that in order to show you how the guts of the system work. This is really a Python conference, so I'll stay away from too much nitty-gritty.
I'm originally from Texas, from Austin, Texas, as you no doubt know the home of the University of Texas. It's a very pretty town, and we have cool statues. About eight years ago I moved to Stavanger on the west coast of Norway; I'm sure most of you know where that is. We have different kinds of statues there: the three swords, which are very famous. And we have a different kind of landscape; this is a view from the top of Pulpit Rock [unintelligible]. Sixty North was founded there [unintelligible].
What are we going to talk about? First we'll do an introduction to the theory of mutation testing. This is, I think, the most generally interesting part for most people here. It's actually a very fascinating topic, one that I learned about a few years ago and got interested in because it appeals to a certain mindset. I think lots of programmers will really appreciate the elegance, so to speak, of the technique.
Then we'll look at some of the practical difficulties of mutation testing, which is why you probably haven't seen it in your professional life. Even if it's a neat idea with actual practical benefits, why are we not using it more often? There are big reasons for that, which we need to figure out and solve.
We'll look a bit at Cosmic Ray, a mutation testing tool for Python that we wrote over the past year, year and a half, something along those lines. We'll look at some actual practical results from the real world, where we applied mutation testing to an open source library and found some interesting things. I won't call them defects, but optimizations that we could apply based on the results. We'll do a quick demo if time permits, since this is a fairly constrained schedule, and then questions at the end. If you have questions that we don't get to, you can of course talk to me after the talk; I'll be around for a while.
So what is mutation testing? You can get a definition from pitest.org. Pitest (I'm not sure how to pronounce it) is a mutation testing tool for Java. It's the gold standard, a really industrial-strength, high-quality tool. The definition is conceptually quite simple: you seed faults, or mutations, automatically into your code, and then you run your test suite.
If the test suite kills the mutant that you've created, that's great: it means your test suite has enough strength to detect the change you introduced. That's what you want. If your test suite passes on the mutant, we say that the mutant lives, which is typically what you don't want. Mutation testing is really good at gauging the quality of your tests. This is primarily what it is used for: to determine whether your tests are sufficiently powerful to capture changes in behavior in your system, even accidentally introduced changes like the ones we simulate with mutation testing.
Is anybody here an Uncle Bob follower? Some of you, right. He tweeted this the other day; it came through on Twitter two days ago, very serendipitously: "Having fun with pitest. Really easy to use. Very useful. pitest.org".
So it's not just me; it's also famous people, or relatively famous people who talk a lot on Twitter, so I got to put that in there.
So what is mutation testing? You have some code under test: your library, your application, your program, whatever it is you're concerned about testing. And you have a test suite which, in principle, is one hundred percent; that is, it's supposed to be testing and verifying the functionality of every part of your library. That's unrealistic for most projects in the real world; you don't have test suites that are that powerful. We can still use mutation testing with that in mind; you just have to gauge your expectations of it differently.
We introduce a single change to the code under test, and these need to be very, very small changes, and we run our test suite. One of two things will fundamentally happen: it will pass or it will fail. Ideally, all changes that we make will result in failures of the test suite when it is run against the mutated code. That's the goal.
The basic algorithm is something like this. For every operator in the mutation operator set (we'll talk about what we mean by operators, but an operator is the encapsulation of the idea of a small change to a piece of code), and for every site in your code where that operator might be applicable, mutate the code at that site and then run your tests. It's in some sense a triply nested loop: a loop, a loop, and running the tests is typically a whole bunch of tests, so we'll call that a loop as well. That is probably setting off alarms already: this could take a very, very long time. And that's the truth; one of the major practical difficulties with mutation testing is the amount of time it takes.
So what does mutation testing tell us? We go through the whole process of mutating and testing our code; what are the results? Well, for each mutation, the mutant can be killed. This is great. I'm coloring this green because it means that, in a sense, the test passed.
The mutant could be incompetent. This is a class of mutants that, for one reason or another, can't actually run. You can't test them because they segfault immediately, or they throw an exception immediately, or they don't compile, or something along those lines. These are mutants that can't even get walking, so you can't test them. We still consider those to be green, because the fact that they don't run at all means that any real-world defect that mimicked them would also not be runnable. So this is still considered a success, though it doesn't really gauge anything about your test suite.
However, what we don't want is the stuff that's red: we don't want our mutant to survive. This is where the test suite actually passes; it says your program looks fine even with the mutation.
What does a surviving mutant tell us about the quality of our tests or of our code? It says one of two things. Either our tests are inadequate for detecting defects in our code: we have some code that we know is necessary, because it implements a piece of functionality that we have to have, and our test suite can't tell us whether that functionality is working correctly under a change that we introduced. Or the code that we mutated doesn't do anything for us: it doesn't actually have any impact on functionality that we are currently testing. So when you have a surviving mutant, you have to look really hard at it and think hard about what is actually going on. Is it that my test suite is inadequate, or is it that I have extra code that shouldn't be there? Code is typically viewed as a liability; you don't want extra code lying around that you're apparently testing but don't really need. So you can shrink your code base. Wonderful. Those are the two main classes of results you get from mutation testing.
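As a rough illustration of the triply nested loop and the three outcomes described above, here is a minimal sketch. It is not Cosmic Ray's code; every name in it (`Outcome`, `run_mutation_testing`, the callables passed in, the toy mutants) is hypothetical and chosen only for this example.

```python
from enum import Enum


class Outcome(Enum):
    KILLED = "killed"            # the suite failed on the mutant: good
    INCOMPETENT = "incompetent"  # the mutant could not even run: counted as good
    SURVIVED = "survived"        # the suite passed on the mutant: needs a human look


def run_mutation_testing(operators, find_sites, apply_mutation, run_tests):
    """Hypothetical driver; every argument is a stand-in supplied by the caller.

    find_sites(op)           -> places where operator `op` could be applied
    apply_mutation(op, site) -> the mutated program
    run_tests(program)       -> True if the whole suite passes, False otherwise;
                                any exception means the mutant could not run
    """
    results = {}
    for op in operators:                    # loop 1: mutation operators
        for site in find_sites(op):         # loop 2: sites for that operator
            try:
                mutant = apply_mutation(op, site)
                passed = run_tests(mutant)  # loop 3 lives inside the test suite
            except Exception:
                results[(op, site)] = Outcome.INCOMPETENT
            else:
                results[(op, site)] = Outcome.SURVIVED if passed else Outcome.KILLED
    return results


# Toy usage: the "program" is an add function and the "suite" is one check.
MUTANTS = {
    "plus_to_minus": lambda a, b: a - b,  # changes behavior
    "swap_operands": lambda a, b: b + a,  # behaves identically for numbers
    "not_callable": None,                 # cannot be run at all
}


def suite_passes(add):
    return add(2, 3) == 5


results = run_mutation_testing(
    operators=list(MUTANTS),
    find_sites=lambda op: ["add"],
    apply_mutation=lambda op, site: MUTANTS[op],
    run_tests=suite_passes,
)
```

In this toy run the first mutant is killed, the second survives, and the third is incompetent, so each of the three result classes shows up once.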
And of course you want to kill all the mutants. I've got to have a mutant picture here; that's de rigueur. So kill all the mutants: that's your goal in mutation testing. Track them down and squish them, either by improving your tests or by getting rid of the code.
What are the goals of mutation testing? There's really just a handful of high-level goals. One, and this is probably the most common, the one most people think about and the one you read about when you review the research literature, is coverage analysis. But you're not doing coverage analysis in the typical way that most of you might be accustomed to, which is literally just: does my line of code get executed by the test suite? That is almost meaningless if you can't correlate it with functionality in your program. Knowing that a line of code was run, that the instructions that came out of that line went through the processor, doesn't tell me whether that line is doing what I think it's supposed to be doing, or whether the piece of functionality it's attached to is actually behaving the way I expect. This is where mutation testing can help: you can verify that the functionality is being tested and maintained by your test suite, and this is really, really important.
In this picture they're pleased with each other. They're happy because their results look great, but they haven't done anything meaningful with their time. It's a really beautiful picture of how I feel about typical coverage analysis. There's nothing wrong with coverage analysis per se; in some sense it's good to know that you are exercising all the parts of your code. But if you're not sure that the lines being executed actually do anything, and you aren't verifying their functionality, you're really just wasting time; you're just spinning a processor. Mutation testing, expensive and complex as it can be, helps defeat this waste of time.
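The gap between "this line was executed" and "this behavior was verified" can be shown with a small, made-up example. The function `clamp`, its hand-written mutant, and both tests below are hypothetical and are not taken from the talk.

```python
def clamp(value, low, high):
    """Limit value to the closed range [low, high]."""
    if value < low:
        return low
    if value > high:
        return high
    return value


def mutated_clamp(value, low, high):
    """clamp with one relational operator flipped ('<' became '>')."""
    if value > low:  # the mutated line
        return low
    if value > high:
        return high
    return value


def weak_test(fn):
    # Executes every line of clamp (full line coverage) but checks nothing.
    fn(-5, 0, 10)
    fn(50, 0, 10)
    fn(5, 0, 10)


def strong_test(fn):
    # Executes the same lines and verifies the results.
    assert fn(-5, 0, 10) == 0
    assert fn(50, 0, 10) == 10
    assert fn(5, 0, 10) == 5


def passes(test, fn):
    """Return True if `test` raises no AssertionError when run against `fn`."""
    try:
        test(fn)
    except AssertionError:
        return False
    return True
```

Both tests give the same line coverage on `clamp`, but only `strong_test` fails on `mutated_clamp`. With `weak_test` alone the mutant survives, which is the signal a coverage report by itself would not give.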
The other thing that mutation testing can tell you, as we discussed, is that it can help you detect unnecessary code. Most of you probably recognize this drawing; it's from Gray's Anatomy. Maybe not this particular drawing, but you know what it represents: this is a picture of the lower intestines, and this here, down here, the vermiform process, is the appendix. Most of you probably know that the appendix can be removed from the body with no ill effects, right? So this is the kind of thing you could say mutation testing is looking for: little bits of your code that are no longer really necessary and can be edited out. I'm not advocating that we all have our appendix removed unless it's inflamed, but you could, if we were mutation testing ourselves.
The corollary to this, well, not the corollary, the add-on, the extra moral to this story, is that I was told the appendix was totally useless. As a kid I was told it does nothing at all, and I believed this until about half a year ago, when I started looking into it. It turns out it actually does a little bit for you, just not so much that it matters. So it could be that when you're examining a piece of code that mutation testing has shown you is problematic, you need to think really, really hard about what's going on there. You need to put your engineering thinking hat on and decide: can I get rid of this? Does it do anything even mildly important? Maybe I actually do need more tests. You have to think in a very multilateral way to decide what it is that mutation testing is telling you. Mutation testing isn't a magical oracle; it simply shines a spotlight on something that's a bit problematic in your code or your tests.
So, types of mutations. This is where we start talking about this notion of operators and things like that.
Examples of the kinds of mutations. Well, before going into that: does anybody know what this is and why I have it up on the screen? It's the peppered moth, OK. This is a moth that was around in, I think, Birmingham. The city is full of limestone, so before the coal revolution in Birmingham in the UK all the buildings were made of limestone, and still are, and they were very, very white. These moths would land on the limestone, and because they were primarily white the birds couldn't see them. Then we started burning coal, especially around Birmingham; everything got covered in soot and painted black, and these guys, unless they mutated to become black, got eaten by the birds immediately. So they mutated rather quickly to become black. Then we cleaned up Birmingham and I guess they changed back to white; I'm not entirely sure. But it's a fascinating story about mutation in our own time, anthropogenic mutation, things like that. It's very cool, so check it out. A little side lesson there.
So what are the kinds of things we mean when we talk about mutations in the scope of mutation testing? This is a typical one: replace relational operators. You've got some place in your code that says x greater than one; make that x less than one. That should obviously be tested, right? I say obviously; it's not always actually that easy, but you probably instinctively believe this is very easy to test for, and you're generally speaking correct. If you've got a good test suite, it should be able to detect that some programmer accidentally put the wrong relational operator in some important algorithm. This is the nature of the kind of thing we do with mutation testing: very, very small, localized changes, and then you run your test suite.
Another common example is break/continue replacement, another very small, localized change that you would hope you can detect with your test suite. This is a really interesting one, because what would you imagine would happen very often if you replaced break with continue in your code? What you would expect in a lot of cases is infinite loops, right? Your program is going to go forever, because you've reached the condition that says "break out of this loop", and if that condition still holds after the change to continue, it's going to sit in that loop and spin forever. This actually speaks to a whole interesting class of mutants, the equivalent and incompetent mutants, which we'll look at in some detail.
This is a list of the operators that I want to implement for Cosmic Ray. Only a handful are implemented so far, but it gives you a sense of the kinds of things that we're talking about: logical connector replacement, or super-call insertion, which is a Python kind of thing. That's the nature of the things we're talking about.
If you start looking into the research on mutation testing, there is a substantial and interesting body of things to read. People have started to classify the kinds of mutations you might want to perform. There are a number that are language agnostic, which you could imagine applying to almost any language. One example is constant replacement: replacing one number with another number.
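To make the idea of an operator concrete, here is a minimal sketch of relational operator replacement written against Python's standard `ast` module. It is an illustration under assumptions, not Cosmic Ray's implementation: the class `FlipGreaterThan`, the helper `load`, and the sample function `is_big` are invented for this example, and unlike a real tool it mutates every matching site at once rather than one site per run.

```python
import ast

SOURCE = """
def is_big(x):
    return x > 1
"""


class FlipGreaterThan(ast.NodeTransformer):
    """Illustrative operator: replace every '>' comparison with '<'."""

    def visit_Compare(self, node):
        self.generic_visit(node)  # handle nested comparisons first
        node.ops = [ast.Lt() if isinstance(op, ast.Gt) else op for op in node.ops]
        return node


def load(tree):
    """Compile a module AST and return the is_big function it defines."""
    namespace = {}
    exec(compile(tree, "<mutation-example>", "exec"), namespace)
    return namespace["is_big"]


original = load(ast.parse(SOURCE))

mutant_tree = FlipGreaterThan().visit(ast.parse(SOURCE))
ast.fix_missing_locations(mutant_tree)
mutant = load(mutant_tree)
```

A test such as `assert is_big(5)` passes on `original` and fails on `mutant`, so it kills this mutant; a suite that never calls `is_big` with a value on either side of 1 would let it survive.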
Another example is replacing a variable with a constant; another simple but reasonably interesting kind of change. There's arithmetic operator replacement: replacing plus with times. These are all very simple things, not terribly exciting to think about on their own, but again, you should be able to detect with your tests when these kinds of things happen. And of course there's relational operator replacement. So there's a handful of things, actually quite a broad category of things, that can be applied to almost any language you're probably using, unless you're using Forth or something like that, in which case, awesome; who's doing that? And this one, operator insertion, we'll look at in some detail in the demo, if we get there.
There are some operators that only apply to certain kinds of languages: object-oriented mutations. Most of you are probably working in an object-oriented language, strictly speaking. Changing an access modifier: you can't really do this in Python, I think; there's no such thing. But if you're working in C# or C++ you could imagine doing something like that, and it really ought to be detectable, right? And if it's not, you need to think about why you chose public versus private in that situation.
Removing overloads. This isn't Python either; you couldn't really remove an overload there. This is a bad example, actually, but you could take an overload from a C++ class and yank it out.
Changing base class order. This is a really interesting one, because most of the time I would warrant that this has no noticeable functional effect on a program. I know in Python, if you did this, most of the time nothing's going to break, but sometimes things will break catastrophically. So you start to have to think really hard about these kinds of mutations.
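The point about base class order can be seen in a small, invented Python example. All class names here are hypothetical. Swapping the bases changes behavior only when both bases define the same attribute; otherwise the swap is invisible from outside.

```python
class JsonMixin:
    def describe(self):
        return "json"


class XmlMixin:
    def describe(self):
        return "xml"


class Timestamped:
    def stamp(self):
        return "stamped"


# Both bases define describe(), so the order decides which one wins.
class Original(JsonMixin, XmlMixin):
    pass


class Mutant(XmlMixin, JsonMixin):  # base order swapped: behavior changes
    pass


# The bases share no attribute names, so the order has no visible effect.
class HarmlessOriginal(JsonMixin, Timestamped):
    pass


class HarmlessMutant(Timestamped, JsonMixin):  # base order swapped: no change
    pass
```

A test that checks `describe()` kills `Mutant`, but no test of public behavior can tell `HarmlessOriginal` from `HarmlessMutant`, which is why this operator so often produces mutants that are impractical to kill.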
These are mutations where the change can obviously be made but it's nearly impossible to test for. So what do I do? This is a class of mutants called equivalent mutants, and we'll talk a little bit about those and the difficulty of dealing with them. When mutation testing flags something like this, it makes you think: does this order matter in my project? It's something I haven't thought a whole lot about in my life, but I probably should pay more attention to. And how would you test for that? Testing for this would very often be almost meaningless; it would be a test you wrote purely to satisfy a mutation testing system. And that's the problem with equivalent mutants.
Functional programming languages have their own sorts of mutations you can think about. A good example, and it's very long-winded, is switching around the pattern matching here. We have a take function, the classic functional programming thing, and we're swapping around the first two clauses. That's something you normally can't do in other languages.
In C++ or Java, something like that wouldn't make a whole lot of sense except as a high-level operation, but in a functional language it's a very low-level sort of mutation. So that gives you the flavor of the kinds of changes we're talking about: they're small and localized; they don't cover big spans of code. What you want to be able to do is zero in on exactly why things are failing or succeeding when you make a mutation. If you had a mutation scattered all over your code in three or four places, you would have a much harder time triangulating on the actual problem; you would just have to run more tests at that point.
So what are the complexities of mutation testing? How am I on time here? This is not an easy thing. As I said early in this talk, you probably haven't seen mutation testing because of the complexities of doing it: the difficulties of using it in real-world practical environments are big. So what's the main one? It takes a long, long time on real-world code bases.
Does anybody know what this is? It's the Queensland pitch drop experiment. This guy is a professor in Queensland who's got this funnel of pitch, which is a really thick, super viscous fluid. It almost feels like a solid, but it's a fluid because it's dripping, and it drips once every fifteen years or something along those lines. He's waiting to see it drop; basically he keeps missing it. It takes a long time for this thing to make drips. It's a fairly famous experiment; you should look it up.
What do we do? How do we address this issue, the triply nested loop of operators and sites and tests that I talked about? For one thing, you can parallelize.
Parallelize this as much as possible. Fortunately, mutation testing is an embarrassingly parallel problem. If you had an infinite number of machines, you could literally spin off an infinite number of copies, run one mutant on each copy, get the results back, and be done. We don't have that many machines, but you might have ten thousand on Amazon or something: if, for five minutes, you want to just buy a thousand servers (if they'd give you that many at a time), you could parallelize out to all of them at once and get the results back. So this is one of the main ways that we deal with the runtime complexity, so to speak, of mutation testing.
Another option is to do what we call baselining, and this is a fairly complex thing to do. The idea is to figure out how your tests correlate with your code: test A touches these lines of code. This is where the traditional kinds of coverage analysis can be very useful, because you can start to say, "if I run this test, these lines are executed". Once I have that baseline graph, I can say: only run the tests in my mutation testing sweep that touch the modified code, right? This is one way to massively reduce the scope of the tests you need to run. You can also say: look at the diff of the code under test; I ran mutation testing last week and I'm running it again, so what's changed? Only mutate that code. By combining both of these bullet points you can doubly narrow down and massively reduce the number of tests you need to run. I haven't implemented a system that does this yet, but it's an important one.
Now, if you think hard about this, you realize that it isn't actually perfect. If I do a baseline, I can have a perfect graph of tests versus the code they execute.
But if I make modifications, which is the very definition of running mutation testing, I may have modified that relationship; I may have modified that graph. So a baseline isn't something you can use for ever and ever, but it is a way to do fairly rapid updates to mutation testing results and only occasionally force yourself to redo the full baseline, maybe on the weekends or maybe once a month, depending on how long these things take. So baselining is an important technique, I think, one that I would like to spend more time on, and it's one way we can speed things up. But it requires very sophisticated tooling to do that mapping between tests and lines of code.
Finally, you can speed up your test suite. If you talk to the TDD and BDD guys, the Dan North crowd, they'll tell you that you should do this anyway: your test suite should be as fast as possible for a variety of reasons, and one of those reasons is so that you can do mutation testing, which of course we all want to do now because I've made such a strong sell.
So that's complexity one, and it's probably the most important one from a practical standpoint: it just takes a long, long time to run these suites.
Another complexity, in many ways a much more interesting one, is incompetence detection. Incompetent mutants, again, are mutants that fall over and can't execute for one reason or another. Those are actually very simple to deal with: they just fall over and you can see that immediately. What's more difficult are the ones that run forever. This is of course Alan Turing. The joke is that, apocryphally, he said "good luck with that", because he proved that the halting problem is undecidable, right? We can't look at a body of code and tell whether it will ever halt.
it's ever going to end. This is one of the most famous results in computer science; we all know this. He never actually said "Good luck with that"; he said something much more long-winded. But you cannot look at a mutant and tell from the outset if it's ever going to terminate. It's not just hard; it's mathematically impossible. The only way to do it is to execute it, and if it terminates then you know it terminates. We looked earlier at break/continue replacement as an example of something that could put you into an infinite loop, and that's something that's effectively undetectable. You can apply heuristics to try to do it, but from a purely mathematical standpoint it's impossible. What we do about this is essentially create a baseline time and say we only let the tests run for so long before terminating them and calling the mutant incompetent. So that's the strategy we use to deal with that. So look up the halting problem; it's fascinating. Everything Turing did is fascinating. The third complexity is what we call equivalent mutants. Equivalent mutants are mutants that legitimately have been changed, but you cannot actually detect them meaningfully or in a practical sense. Now, there may not be any such thing as a completely undetectable mutant, but from a practicality point of view there are plenty of mutants that you could never actually detect. So, I don't know how many here program Python on a regular basis. It's not everybody, but OK, good. It's not really important what this does. This is a bit of code from the Python standard library documentation telling you how to take an iterable object and just plow through it; you consume it, it's what's called consume. Without doing anything else, you're plowing through an iterable thing to get its side effects, and this is a way to do it super fast.
This is, I think, Raymond Hettinger-suggested code, so that's why I was surprised that it had problems. The important line is this one here. What we're doing is taking the iterable thing and pumping it into a deque, a double-ended queue, with a maxlen of zero, and that means that, as fast as possible, using the C code which deques are implemented with, it plows through this iterable. So it's an optimization. What happens when mutation testing runs over this code is that it sees this zero and says, "Oh, that's a number, and one of the things I know how to do is replace numbers with other numbers." So it changes that number to forty-two or negative six or something, and then it runs the tests, and lo and behold, this number actually has no real observable effect outside this function. It's such an implementation detail that nobody will ever know that you replaced it with something else, unless you replace it with a really, really big number, perhaps. Your test suite is not going to detect that this number has done anything. All that's really going to happen when you change this to, say, ten, is that Python's internal memory allocation system is going to do something slightly different, in a way that you probably can't even detect from the outside unless you're using a debugger to watch Python at the C level. Nobody has tests that do that, or I hope you don't; that's crazy. So this is an equivalent mutant, and it's one that's a bit insidious because this is very natural code to use. This gives you a sense of the kinds of things that can jump up and bite you when you do mutation testing; it's insidious and it's tough to solve. Another one that I ran into after I started doing some mutation testing is this. Every Python programmer in the world has seen something like this: if __name__ == '__main__', then run. This is how you make a main
function, or an executable script, so to speak. The problem is, when you're running this code in a test suite, __name__ is never going to be '__main__'; nothing inside here is ever going to get executed. So if you've got code in here, in this block, that gets mutated, it is completely undetectable. So this is another instance of a
broad class of equivalent mutants, things that you can never detect easily, and we have to have some accounting for that in any system that does practical mutation testing. That's why it's one of the three complexities. So that's really about it for the theory, so to speak, of mutation testing. Does anybody have any questions before I move on? Good. So we'll take a quick breeze through Cosmic Ray itself. This is a mutation testing tool specifically for Python. By their nature, mutation testing tools are not cross-language, as far as I know; there's no theory in that direction that I'm aware of either. If you're interested in this kind of stuff, you can get Cosmic Ray from GitHub; it's in our corporate GitHub repository, called cosmic-ray. It's something that I'd be interested to get feedback on, if anyone has ideas for how to make it better; actual patches would be even cooler. One of the implementation challenges for this tool is that we have to determine which mutations we want to make: somehow scan a body of code and figure out what changes we're going to implement. We have to run the test suite, and we have to make those mutations one at a time. We can't just randomly seed them throughout the code; we have to do them one at a time, and then run the test suite with that one mutant, undo the change in some sense, and then make another change. We have to do that while dealing with all the complexities I just talked about: the massive run times, incompetence, equivalence, and things along those lines. So how do we make that work in an implementation? Well, we have this idea of operators. I talked about how operators were the small little changes; they encapsulate changes. So you have, say, one plus two. The first job of an operator is to identify the places where it can make a change. So I have some operator, it's a class called, you know, "I'm going to replace
this plus operator" or something like that, and the first thing it has to know how to do is, when looking at the code, say, "Oh yeah, that's a thing I know how to mutate." The next thing it has to be able to do is actually perform that mutation. Critically, though, it's not the job of the operator to decide when to perform the mutation. That gets orchestrated by a higher-level, executive sort of code that says, "You've found a place to mutate; why don't you do that now," or "Please don't do that now." So the operators encapsulate this idea of detecting and creating the changes in the code that we need to make before we run the tests. Operators are implemented in two parts. There's the operator itself, which is the thing that knows how to detect a place to mutate, report that it has found one, and perform the mutation. And there's a thing that I call a core. It's actually kind of a bad name for it, but it's the name that evolved out of, you know, late-night coding. The core is the thing that, when the operator says, "Hey, I found a place to mutate," says, "OK, well then you should make that mutation now," or "don't make that mutation now." Cores can do other things. There are two kinds of cores we have right now. One is counting: one of the important things we have to do is count how many mutations are going to be done, so that we can build the work order at the beginning of the mutation testing run. The other is actually making mutations. So inside Cosmic Ray we have multiple modes of running: in one mode we're running with the counting core, in the other mode we're running with the mutating core. These are implementation details, but it gives you a sense of how things pull together inside. And we make a lot of use of the ast module. The ast module is part of the Python standard library, and it lets you work with, as you might imagine, ASTs, abstract syntax trees.
What's beautiful about ast is that I can just hand it a bunch of Python code in textual form and have it spit back to me an AST. Who knows what ASTs are? OK: it's a programmatic representation of a program; it makes a tree. So it can take this text here, this code, and give me back something that I can work with inside a program. Inside Python I've got Python code working on the compiled Python code, so to speak. What do we use the ast module to do? We generate abstract syntax trees from Python code: we literally take the Python source files you're working on, pump them into ast, and get out the abstract syntax tree, which we can then work against. We walk the ASTs using the built-in facilities in ast and actually make modifications: we change nodes, replace nodes, pull nodes out. That's how we actually do the mutations, how we implement the mutations at run time. And it lets you manipulate ASTs very cleanly. It's not the easiest thing in the world to manipulate these things cleanly; sometimes you can make mistakes and mess up a whole tree, so ast gives you some nice utility features to help with this. The broader message from this slide is: if you decide to try mutation testing in your language of choice, try to find the equivalent of the ast module. Don't try to write a new compiler or a new parser for your language; that's the path of madness. Any reasonably built language is going to have something to help you do this, I would hope. Add to this, as a corollary, there's the compile function. This is something that I was really overjoyed to find. With the compile function I can take an AST that I may have mutated, pass it to compile, and get back what's called a code object, which I can then load up into the Python runtime and actually have the rest of the program use. So this is
the magic sauce at the core of how Cosmic Ray does its work: it can create abstract syntax trees, modify them, and then turn them at run time into code objects that I can then actually execute. If this didn't exist, if I hadn't found this stuff ready-made for me, I would not have done this project, because this is the hard part and I don't want to do the hard part. I'm lazy, like all good programmers. So how do the operators work, in a functional sense? We have this thing called the NodeTransformer. This is part of the ast module, built into Python, and it knows how to walk a syntax tree and give me opportunities to make modifications. It's a standard visitor pattern: NodeTransformer calls visit functions on its subclasses. This is actually one of the subclasses here. You know, it's hardcore engineering, right? visit_Num: it sees a number, and all this one knows how to do is number replacement. This guy calls up to the base class and says, "Hey, I found a mutation site," and the way he says that is he calls the visit_mutation_site function. This is maybe not the best relationship, but again, it evolved out of late-night coding; I should consider refactoring at some point. visit_mutation_site says, "Oh, core, we found a mutation site; what would you like me to do?" And the core might say, "Well, I need you to do the mutation," or it might just increment a counter, if it's the counting core, and continue. Otherwise it asks the constant-replacing operator to actually do its work, and once this is called, the abstract syntax tree is modified, injected into the module system, and used by the test suite. So that's the basic flow of, you know, bouncing through the code to
make things happen. So the summary of operators is that we use ASTs to implement operators, which can detect and perform mutations, and we use different cores to control how and when they do what they do. So, a quick breeze through, but I hope you're getting the point. The next trick we have to do: now we've got the technology to take the source code, create an AST, and modify it, and we have this code object. How do we actually make it available to the test suite? It's one thing to have the code object; it's another to make the rest of my program use it. This is a very, very fast review of how Python does module loading. There are essentially three main moving parts. There's a thing called the finder. The finder is given a module name: somebody says "import foo", and all the finders are asked, "Do you know how to import foo?" One of them should report back, "Yes, I know how to import foo," and if it does know how to import foo, it hands back what's called a loader. The loader is then requested later to actually do the job of, essentially, populating a module. It's given an empty module shell and asked to fill it in, so the loader's job is, in some sense, to do all the name bindings inside the module. To make these two available to the rest of Python, there's a thing called sys.meta_path. sys is, of course, part of the standard library, and meta_path is nothing more than a list, a literal list, of finders. So when you say "import foo" in your Python code, Python behind the scenes goes to meta_path and asks each finder in order: do you know how to load foo? Do you know how to load foo? Do you know how to load foo? So you can see where this is going. You're allowed to, I guess I should say this, you're allowed to make your own. So we wrote our own. We have a custom finder. All this finder has is a name and an AST, a modified AST. We stick it at the front of sys.meta_path, so that if anybody asks to load that module, this guy goes, "Yeah, I know how to do that."
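The way Python consults sys.meta_path can be seen with a small sketch. This is only an illustration, not Cosmic Ray code: SpyFinder is a hypothetical name, and the stdlib module colorsys is used purely because it is small and safe to re-import.

```python
import importlib
import importlib.abc
import sys


class SpyFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder: records every name it is asked about, then declines."""

    def __init__(self):
        self.asked = []

    def find_spec(self, fullname, path, target=None):
        self.asked.append(fullname)
        # Returning None means "I don't know how to import that";
        # Python then moves on to the next finder in sys.meta_path.
        return None


spy = SpyFinder()
sys.meta_path.insert(0, spy)            # front of the list, so it is asked first
try:
    sys.modules.pop("colorsys", None)   # drop any cached copy to force a real import
    colorsys = importlib.import_module("colorsys")
finally:
    sys.meta_path.remove(spy)           # leave the import system as we found it

print(spy.asked)
```

Because the spy declines, the regular finders further down the list still locate and load the module as usual.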
It hands back the loader, which also has that AST and knows how to execute that AST, using the compile function, to populate the namespace. So now we've got all the mechanisms we need to mutate and to install: that is, to import through a custom finder and loader that hold the mutated AST.
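Putting the pieces together, the following is a minimal sketch of the mechanism described above: mutate an AST, then serve it through a custom finder and loader. It assumes nothing about Cosmic Ray's real classes; PlusToMinus, ASTFinder, ASTLoader and the module name toy_math are all hypothetical, and the source is an inline string rather than a real file.

```python
import ast
import importlib
import importlib.abc
import importlib.util
import sys

SOURCE = "def add(a, b):\n    return a + b\n"   # stand-in for a real source file


class PlusToMinus(ast.NodeTransformer):
    """Toy mutation: turn every binary + into a binary -."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


class ASTLoader(importlib.abc.Loader):
    """Fills in an empty module by compiling and executing a stored AST."""

    def __init__(self, tree, name):
        self._tree = tree
        self._name = name

    def create_module(self, spec):
        return None  # let Python create the empty module shell

    def exec_module(self, module):
        code = compile(self._tree, "<mutated %s>" % self._name, "exec")
        exec(code, module.__dict__)  # performs the name bindings


class ASTFinder(importlib.abc.MetaPathFinder):
    """Answers only for one module name; holds the (mutated) AST for it."""

    def __init__(self, name, tree):
        self._name = name
        self._tree = tree

    def find_spec(self, fullname, path, target=None):
        if fullname != self._name:
            return None
        loader = ASTLoader(self._tree, fullname)
        return importlib.util.spec_from_loader(fullname, loader)


tree = ast.fix_missing_locations(PlusToMinus().visit(ast.parse(SOURCE)))
finder = ASTFinder("toy_math", tree)
sys.meta_path.insert(0, finder)
try:
    toy_math = importlib.import_module("toy_math")  # served from the mutated AST
finally:
    sys.meta_path.remove(finder)
    sys.modules.pop("toy_math", None)

print(toy_math.add(2, 3))  # the mutant subtracts instead of adding
```

Code that imports toy_math while the finder is installed needs no changes; it simply receives the mutated module.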
This is another bit of magic; if this didn't exist in the language, I wouldn't have been able to do this project, I would have given up a long time ago. But critically, once these guys have done their job and made this module available, the tests don't have to know this is happening. The tests just still naturally say "import x, y, or z", and from their point of view nothing has changed. We've just kind of snuck in underneath them and replaced the modules they're going to get with the ones that we want to test. So finally, how do we figure out what to mutate? Somebody told me, "I want to test this package." At a very, very high level, what Cosmic Ray does is ask for a package, which is the Python term for a collection of modules. Cosmic Ray scans it using a technique that I removed the slide for because of time constraints, but there are facilities in the language for dynamically scanning a package to find out what it depends on and what it contains. That's how we find all of the submodules and thus all the files we need to pull the source code out of to do the AST parsing. There are currently some pretty severe limitations on the kinds of modules we can mutate. Basically, right now we can only work with standard .py source text files. We can't work with modules that are coming from zip files; we can't work with modules that are coming from HTTP imports. There's a whole, you know, infinite supply of exotic kinds of modules we can't work with. We actually just fall on our face right now. That's a big area of work that we need to look into, given time and infinite money. And you can also tell Cosmic Ray, "There are certain parts of the package I don't want to work with," either because we don't have tests for them, or we know that they're bad, or they're very hard to test, or something along those lines. But broadly speaking, what we do is walk the tree of packages and modules to find out what needs to be mutated.
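The talk does not say which facility Cosmic Ray uses for this scan. As one possible sketch, the standard library's pkgutil.walk_packages can enumerate a package's submodules, and only those backed by plain .py files are parsed. The function name find_mutable_modules is hypothetical, and the stdlib json package is used only as a convenient target.

```python
import ast
import importlib
import importlib.util
import pkgutil
import tokenize


def find_mutable_modules(package_name):
    """Map each module in a package to its parsed AST, skipping non-.py modules."""
    package = importlib.import_module(package_name)
    names = [package_name]
    # walk_packages yields an entry for every module and subpackage it finds.
    for info in pkgutil.walk_packages(package.__path__, prefix=package_name + "."):
        names.append(info.name)

    trees = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        origin = spec.origin if spec else None
        if not origin or not origin.endswith(".py"):
            continue  # extension modules and other exotic forms are not handled
        with tokenize.open(origin) as handle:  # honours the file's declared encoding
            trees[name] = ast.parse(handle.read(), filename=origin)
    return trees


trees = find_mutable_modules("json")
print(sorted(trees))
```

Each resulting tree is what an operator would then walk to look for mutation sites.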
So how, finally, do we run the tests? This is where things get slightly interesting. We have a crash test dummy; good picture, right? First thing, we do what I just described: we figure out what to mutate. We go through the package and figure out all the things we might be able to touch. We then create a single mutant: we go in, we make the AST, and we change the plus to a minus. Then we install that mutant; we make it available to the import system using the finders and loaders. Then we have this concept of a test runner. The test runner encapsulates the different kinds of testing systems in Python: py.test, unittest, nose, et cetera. We tell the test runner that the user configured to run the tests. Importantly, critically really, all of these tests are run in a separate process. For each mutant we actually start a new Python process. This is primarily for sandboxing, because, as you can imagine, making a mutant could actually cause all sorts of wacky things to happen. It could cause your mutant to go in and fiddle with sys.meta_path, for example, which you rely on for Cosmic Ray to actually execute. So Cosmic Ray says, we're not going to give mutants the chance to mess with the test execution system; every mutant is going to run in a separate process. This also allows us to do quite a bit of parallelization: we can fire up as many parallel processes as we want. We couple this with something I'll talk very briefly about, which is Celery, a message bus, or a task distribution bus, so to speak, that we use to actually run Cosmic Ray workers on however many machines you want. So that, very briefly, is how we do testing. And finally, I talked earlier about incompetent mutants: you get these mutants that can go into infinite loops because you have, for example, replaced break with continue. How do we deal with that?
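The process-per-mutant isolation, combined with the baseline time mentioned earlier for cutting off runaway mutants, can be sketched with the standard subprocess module. This is an illustration under simplifying assumptions, not Cosmic Ray's worker: run_in_fresh_process is a hypothetical helper, and each "mutant plus test suite" is reduced to a one-line script whose exit code signals pass or fail.

```python
import subprocess
import sys
import time


def run_in_fresh_process(script, timeout):
    """Run a script in a new interpreter and classify the outcome."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            timeout=timeout,       # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        return "incompetent"       # never finished: treat as an incompetent mutant
    # Exit code 0 means the tests passed on mutated code, so the mutant survived.
    return "survived" if result.returncode == 0 else "killed"


# Baseline: time the "test suite" on unmutated code.
start = time.perf_counter()
run_in_fresh_process("assert 1 + 2 == 3", timeout=30)
baseline = time.perf_counter() - start

# Timeout = multiplier x baseline (with a floor so the demo is not flaky).
timeout = max(1.0, 10 * baseline)

print(run_in_fresh_process("assert 1 - 2 == 3", timeout))  # + mutated to -
print(run_in_fresh_process("while True: pass", timeout))   # runaway mutant
```

Because every run gets its own interpreter, nothing a mutant does to sys.meta_path or other global state can leak into the process that orchestrates the run.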
There are two strategies we make available to the user. One is we let them provide an absolute timeout: they can say, "If the tests take longer than five minutes, call the mutant incompetent." The other, which I think is better, is that we let them run a baseline, or we run the baseline for them, as it were: we run your test suite over your unmutated code and time it, and then the user can provide a multiplier: 1.2, ten, something like that. We multiply the time we got from that baseline by the multiplier, and that becomes the timeout. So we're basically saying, if any mutant takes longer than x times the baseline time, then we're going to call it incompetent. But the basic high-level strategy is that there's a timeout; there's no other real way to do it that I'm going to spend time trying to investigate. The rest of the stuff is just bits that we use. We use something called stevedore to do plugins; most of the functionality of Cosmic Ray is provided by plugins. It's really worth looking into if you're using Python; it's a wonderful system. We use Celery, which I mentioned, which is basically a task queue. I say "basically just a task queue"; it's a very complicated piece of machinery sitting on top of RabbitMQ, but underlying it is a very robust, widely used, very powerful system. So the Cosmic Ray executive level starts pumping jobs into the task queue, and we have some number of workers, which could be on your local machine or on other machines, and their job is to receive work and fire up a Cosmic Ray worker, which is that separate process I was talking about. This is the sandbox that
has a mutation installed, runs the test suite, and dies, reporting back results through the chain. You can read all about Celery at celeryproject.org. It's a fascinating, really excellent piece of software. And we have a little database that we use to keep track, initially, of the work that needs to get done. One of the first things you do in a mutation testing run is build the work order, and we pump all that information into this thing called TinyDB. We use the counting core, you know, we talked about these cores; we use the counting core to determine which work needs to get done, and then, when somebody says, "OK, now do the testing," we only schedule things to run that don't already have results. As results arrive back at the executive level, we drop those results into the database, so you can have an interrupted test run and you can pick back up again. That's very, very handy. So yeah, we can resume; we use it for this, and it's a natural place to put the results. So at the end of a full run it's really great: you've got this database full of not only what you did but the results that came out of it, timing information, and things like that. It's a very handy thing to have. You can get TinyDB at this GitHub repository; it's open source. It's a very simple database, and we'll probably have to replace it at some point because of performance constraints. I'm a bit worried that as we start running this over larger and larger code bases with more and more mutants, it's going to be more than this can handle. But for small, embedded, JSON-based databases, this thing is pretty darn
excellent, so check it out as well. Then there's docopt, which I was going to go into great detail about, except that it's like magic: you specify your help message, it parses your help message and produces a command-line parser for you. So it's the reverse of what you've probably been doing your whole life and hated. Look at docopt.org; it works for Python and about, what, thirty other languages, I think. It's a wonderful project, and it really has changed the way I write command-line programs; at least I enjoy doing it now. docopt.org, check it out. Some remaining work, and it's probably more than I'm going to list here, but there's some well-known remaining work from my point of view. One is properly implementing timeouts. I think I've nailed this properly now; it was really a question of learning how to use the Celery API correctly, but for a while this was a real problem; I was doing it wrong. A really big area is this question of exceptions and instructions. What I mean is being able to embed, or somehow tell Cosmic Ray, for example, "Please do not ever mutate this bit of code." We saw that bit of code where changing zero to something else is undetectable. I want people to be able to put a tag on, or somehow annotate, that bit of code to say, "Cosmic Ray, don't ever make that mutation here, because I know I can't test for it and it's meaningless for you to tell me about it." Tools like Pylint have support for this kind of annotation; I think I can crib off of that, but it's a big unknown area for me right now. I haven't spent a lot of time looking at it; I just know it's an issue. Then there's support for other forms of modules: I talked about zip files and all sorts of esoteric module forms that I don't support right now. And I would like to integrate with coverage testing. This goes back to the
first use of the term baselining I had, which is to correlate your tests with the code that gets executed in your modules. I would like to be able to pair the mutation test suite up with the code that is executed, so I know how to pare down the amount of work to do after changes have been made to the code, after you've committed to a repository. There are a number of open issues as well for Cosmic Ray; you're more than welcome to fix them for me. So that brings me to something I'm actually really proud of: the fact that we've got some real-world results. Let me look at the time; I'm doing OK. These are practical results on a real project that's actually being used to do real stuff.
So this is a picture of a reservoir modeling piece of software, and it all depends, at the base, on a file format called SEG-Y, which you don't have to know anything about, except that it holds big chunks of data. We at Sixty North have this project called segpy, which is a Python program for reading SEG-Y data. Right, so, pretty pictures. It's really well tested, we think; this is a pretty solid piece of software, so we thought it would be a good target to run Cosmic Ray over, and so we did that. One of the details of segpy, well, of SEG-Y I should say, is that its floating-point numbers are always, or are sometimes, stored in IBM floating point, IBM System/360 floating-point format, because SEG-Y was invented a billion years ago, you know, when
the world was black and white. So one of the things we have to do when reading in SEG-Y data is convert from IBM format to standard IEEE 754, whatever format Python uses internally. So we have to have a bit of code that does that, and this is what the beginning of that function looks like. We've taken some sequence of four bytes and we do a bunch of math. One of the critical things we do
is an optimization. It's not critical, but it's an optimization that says, look, if all four bytes are equal to zero, if all four of the IBM bytes are zero, then we know that this is a zero; spit back zero. That's very, very common in this kind of data, so it's a great optimization, right? And we have a test for this that asserts that if I pass four null bytes into ibm2ieee, the result is going to equal zero, which it does. Great. We ran Cosmic Ray over this, and Cosmic Ray identified something interesting. It said, "Look, I changed this to less-than, and I changed it to not-equal, and I changed it to a number of other things, and this test continued to pass." Basically, whatever it did to this, the test still worked, so the mutants were surviving. We kind of scratched our heads and thought about it a bit, and we realized, "Oh, I see." This optimization has absolutely nothing to do with a. We in fact determined that a could be anything; this optimization really was asking whether b, c, and d were zero. If they were, the optimization applied; a played no role whatsoever. In fact, if we change the code to take the a out, everything still works perfectly fine. Basically, a played no role in determining whether or not we return zero. So we changed our code, and we added a new test here which says, for every integer from zero to 255, which is all byte values, we run this test with that value for a, and we expect the result to be zero. This is our new test. Cosmic Ray was happy, and it let us use a really wonderful tool called Hypothesis. If you've never heard of property-based testing, or never used it, you should look into it; it's almost magical, and Hypothesis is the tool for doing that in Python. So it gave us a great opportunity to use that wonderful, wonderful project. So I have a few minutes left, according to the standard timing, so I'll hold on, get out of this
and show you a quick, quick demo. So this is from the Cosmic Ray test suite. We have a test which actually tests Cosmic Ray, which verifies that it does the right things. Over here we have a bunch of basically meaningless functions that are easy to test, and over here I have a bunch of tests for those functions, and we basically run Cosmic Ray over this and expect to
see a zero percent survival rate. So if I go to here, in this bottom window I've got Celery; I'm running a Celery worker. So it's talking to the message queue, waiting for work to show up so that it can execute some code. And in this top one here... is that big enough? I can try to make it a bit bigger.
There, is that about good enough? So the first thing you do with Cosmic Ray is you initialize the session, so I say "init". This is telling it: run the baseline testing and then make the timeout ten times that number. Next comes the session name; a session identifies the work you're doing. adam is the name of the module we're mutating, and all the tests live in the tests directory. Init doesn't take very long. In this case it's created a little JSON file named after the session, which is the database; that's TinyDB. The next thing we do is exec. Once I start exec you'll see that Celery goes nuts and starts churning through all sorts of work, and if you read very quickly you can see that there are actually four workers. Celery's pretty darn smart; it says, "Oh, this machine has four cores, I can use all of them to do this work." This usually takes a few minutes to run, but only a few seconds in this case, and then we can run report, and we'll see the zero percent survival rate right there. So that's what we want. I want to prove to you that Cosmic Ray is not just a bunch of smoke and mirrors; I'll try to prove it. So I'm going here
and I'm going to comment out, effectively remove, one of the tests. This test, test_unary_sub, is checking this function here. This is the only test that checks that function. Without it, we can make a change to unary_sub and we won't be able to detect it, and Cosmic Ray will let that mutant survive. So I go back to Cosmic Ray. First init, because I have to basically start fresh, then exec; it takes a few seconds to do. And we can report, and pipe it into less so we can actually track down what happened. Now if I search for the term "survived" in the output, I see that something survived: something at line twelve in adam.py survived, because the ReverseUnarySub operator made a change that wasn't detected. Well, that's exactly what we expect. So what was at line twelve in our file? This, right: this is the one that we removed the test for. So what happened is Cosmic Ray got to that negative one, made it a positive one, ran the test suite again, and the tests were like, "Everything's great." And this is exactly what Cosmic Ray is designed to detect: the fact that you don't have
a test for that. So I'm going to go back over here, undo all that, and show you essentially the inverse. Now I'm going to add some functionality that has no test. So these are the two sides of the coin: in one case I had forgotten to add a test; in the other case I've added new functionality and still forgotten to add tests. So let's go back over to the
terminal: initialize, exec, and then we report in just a second. Right, we still have a survivor. This one's on line sixteen. Let's see what's on line sixteen: it is this, right. So, a quick, hopefully slightly convincing demo that Cosmic Ray actually does real work. That is it; I'm glad we got through everything.
So thank you very much. I guess we have some time for questions; nobody's kicking me off stage, so if you want to ask anything, this is a good time, or we can talk afterwards. I mean, theoretically, yes, you could. You could hire somebody to do this, and they could sit and they could make a modification. I don't know of any, no. Exactly. Welcome to the world of C++. Yeah, it's a practical problem. I guess another answer to that, though, is that Pitest is a Java testing tool for this kind of work, and Java of course is compiled. Now, Java probably has better support for going in and directly modifying compiled files; maybe that's what Pitest is doing, I don't know. If I were to approach this in C++, I would look really heavily at LLVM first, to see what hooks it provides me to do things the smart way. I wouldn't try to do this with GCC, or even GCC-XML, but it's a hard problem because it has another wrinkle that you don't face in Python; you're absolutely correct. Any other questions? I think that's it. Thanks again.