
Genetically Modified Tests


Formal Metadata

Title: Genetically Modified Tests
Number of parts: 52
License: CC Attribution 3.0 Unported:
You may use, change, and copy the work or content in unchanged or changed form, distribute it, and make it publicly available for any legal purpose, as long as you credit the author/rights holder in the manner they specify.

Transcript: English (automatically generated)
Thank you very much. So, yes, as you said, we're going to be talking about genetically modified tests; it's an introduction to mutation testing. Thank you for the introduction. My name is Xavier Gouché, and you can also find me on all the social networks, so if you want to connect after the talk, you're quite welcome. Anyway, let's start with a show of hands. Who here uses or knows about unit testing? Everyone. Great. Integration testing? Functional testing? Great. Test-driven development? Good. Code coverage? Mutation testing? Good. So you're in the right room. Anyway, let's start with the premise of this talk: code has bugs. We all know that. No one has ever written bugless code. And when you write your tests, they're written in code too, most of the time. So, of course, this means that tests also have bugs. And so the question of this talk is: if the tests are there to ensure the quality of your code, what's going to be there to ensure the quality of your tests? So let's start at the beginning; we're going to go briefly, for those of you who didn't raise your hands, through everything. Basically, unit testing is like a single musician passing an audition: it's just making sure that, on their own, they know how to play their instrument. Then you have the integration testing: you have a couple of musicians working together, and you make sure that they have the same rhythm, that they are in harmony. And then you have the functional testing: you're working with the production code, and it's much like a dress rehearsal. You have all the musicians on stage, you have the lighting, you have all the effects, and you're making sure that the full show works perfectly. So if we keep with this analogy, test-driven development is basically just writing the score beforehand instead of improvising the music until it actually sounds good.
And code coverage is basically the idea that you're going to make sure that every note is played. So, when you're running your tests, code coverage is just making sure that every line of code is seen by one or more tests. The problem with code coverage is that it's imperfect: you can have your tests with a hundred percent code coverage and yet not actually be testing correctly. So this is a perfect example of what I mean: to have 100% test coverage, you just need to call your code. The code coverage just says: when I run the test, this line of code is called, the statement is called, it's run through in the JVM.
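To make that concrete, here is a small sketch (my own, not from the talk) of a "test" that executes every line of a method yet asserts nothing; a coverage tool counts this as full coverage:

```kotlin
// Hypothetical example: `divide` and the assertion-free "test" are mine.
// This test runs every line and branch of divide(), so a coverage tool
// reports 100% line coverage, yet it would still pass if the logic broke.

fun divide(a: Int, b: Int): Int = if (b == 0) 0 else a / b

fun coverageOnlyTest() {
    divide(10, 2)  // line executed, result thrown away
    divide(10, 0)  // other branch executed, result thrown away
    // No assertions: this "test" can never fail.
}
```

Any mutation of `divide` would survive such a test, which is exactly the gap mutation testing is designed to expose.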
But the code coverage doesn't tell you if your tests actually verify anything. So, there are some issues with that, because some companies I know use the code coverage as a KPI: they measure it and they set objectives on the code coverage. And the problem is that when a measure becomes a target, it ceases to be a good measure, because code coverage is very easy to fake. You just need to have your tests calling your code, not testing anything; just call your code and you're going to run through all the statements, and very easily you're going to have 100% code coverage. But you're not testing anything. And this is a correlate of Goodhart's law, namely Campbell's law: the more any indicator is used for decision-making, the more subject it will be to corruption pressures. So, basically, when you say, yeah, we need to have 80% code coverage, we need to have 100% code coverage: don't worry, you're going to get 100% code coverage, but you're not going to test anything. So, anyway, code coverage just tells you that you ran through the code; it doesn't tell you if you are testing anything. So, we're just going to take a short aside
on the goal of tests. The main goal of tests is to verify that the code works, okay? That's the basics. But it's not the only reason to write tests. One of the main reasons, I think, is to make the contract explicit. That is, when you write the tests, you are actually writing in code what your feature, what your class, is expected to do. And it's a stronger contract than all the specifications that you can have on a wiki, because it's actually run. It also helps to guide the development, especially if you're working with TDD: if you know the contract, it's going to guide how you develop the feature, and you're not going to develop any unneeded features. It's also useful to prevent regressions caused by other developers, or just by yourself in the future. And one other thing is to ensure backward compatibility: when you have, let's say, an old format that you need to parse, and you change the parser or you add a new format, you need to stay compatible with the old formats. So, it's mostly the last three or four points that we're going to be interested in when we talk about mutation testing: ensuring that the tests actually catch regressions or backward compatibility issues. Because bad tests can give you a false sense of security, and you need to be as confident in the tests you code as you are in the code you test. And you can actually quote me on this one. So, this is where mutation testing comes in. We're going to take a look at the basic algorithm of mutation testing and how it works. Mutation testing 101 is: first you write your tests. You all do that, so that's very good. Then you
mutate the code; we're going to see in a moment what that means. Then you watch the tests fail. And, of course, you profit afterwards. When I say you watch the tests fail: basically, you need to see your tests as your security team. The tests are there to catch any bug or any issue in your code. And mutation testing is not going to find bugs; it's just going to run a security drill. It's going to say: hey, let's make your code bad, and let's see if the tests actually catch the bad code. So, step one, write the tests. That's pretty easy; you just make sure that all the tests are green. Step two is mutating the code. That is, you make one or more modifications to the source code, and hopefully a mutation, a modification of your source code, should make the behavior of your code change.
And if the behavior of the code changes, the tests should fail. Okay? So, when I say you mutate the code, you mutate your production code and not the test code: you mutate the source code of your app. So, let's talk more about what a mutation is. A mutation is a modification of a statement in the code. Basically, anything that is a statement in your code can be modified. Some examples: you can change a math operation, so for instance, if you have A plus B, you're going to change it into A minus B; if you have A times B, you're going to change it into A divided by B. You can change the condition boundaries, like changing a greater-or-equal into just greater. You can change equals into not equals, stuff like that. You can change the Boolean conditions. You can change the constant values: for instance, if you have an integer value, you're going to change its value. You can change the return statements: for instance, you're going to say, well, this method is always going to return nil, and will the tests catch it? And, of course, there are a lot more kinds of mutations that you can have. So, step three, you run the tests again and you watch them fail. Because you changed your source code, the tests should fail, because the behavior is not the one you expect in your tests. So, at least one test should break for each mutation that you introduce.
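As an illustrative sketch (the function names are mine, and the "mutants" are written out by hand, whereas a real tool rewrites the bytecode instead), here is what some of these mutation operators look like:

```kotlin
// Math-operator mutation: + becomes -
fun add(a: Int, b: Int) = a + b
fun addMutant(a: Int, b: Int) = a - b

// Condition-boundary mutation: >= becomes >
fun isAdult(age: Int) = age >= 18
fun isAdultMutant(age: Int) = age > 18

// Boolean-condition mutation: && becomes ||
fun bothTrue(a: Boolean, b: Boolean) = if (a && b) 42 else 0
fun bothTrueMutant(a: Boolean, b: Boolean) = if (a || b) 42 else 0

// Return-value mutation: the method always returns a default value
fun answer() = 42
fun answerMutant() = 0
```

A test asserting `add(2, 3) == 5` kills the first mutant; only a test at the exact boundary, `isAdult(18)`, kills the second; and only the mixed cases, such as `bothTrue(true, false) == 0`, kill the third.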
Or maybe not. So first, why would a mutation die? Pretty obviously, you can have the test condition fail: the mutation will change the output of the method you're testing, and so your test condition will fail. You can have an exception: the mutation will change the size of an array, and you're going to get an index out of bounds. You can have non-viable code, even though most frameworks try to avoid this: because you are changing the code, sometimes the mutated code won't compile at all. You can have a system error, that is, a memory error, because you're going to allocate much more memory than you required, or you're going to have a stack overflow. And you can, of course, have a timeout, because the end condition of a loop can never be met again. Now, more interestingly, why would a mutation survive? Because this is where the fun happens. A mutation can survive, first, because it's uncovered: if you have a line or a method that is not covered by your tests, obviously, any mutation in this method won't be detected by your tests, because no test runs through this method.
You can also have a silent mutation. A silent mutation is very tricky to detect, because it's a mutation that changes a statement in such a way that the code actually still does the work. To give you a concrete example, I had this issue myself on a program I wrote. I had a method that did some math on a couple of variables, and at some point I took A plus B modulo C. And the mutation just changed my A plus B into A minus B. The thing I discovered during my tests is that B was always a multiple of C, meaning that even changing A plus B into A minus B always gives the same result once the modulo is applied. So, basically, the mutation didn't change anything in the logic, and the code was still correct, because my two variables were actually linked. It was a specific problem, but, yes, silent mutations can still happen. And it's not a bad thing; it just sometimes happens. But mostly, when you have a mutation surviving, it usually means that your tests are either incomplete or bad. So, I'm going to take a very quick example of mutation testing in action. For instance, you have this method that takes two Booleans, A and B, and the result is 42 when both inputs are true and zero in every other case. And you have your test methods, and you say: okay, if I run check on true and true, I expect 42; and if I run check on false and false, I expect zero. So, obviously, there is something missing in the tests. When you run the mutation, you're going to have your A-and-B changed into A-or-B, and, of course, your tests are still going to be green, because true and true is still true, and false or false is still false. So, to fix your tests, in this case, you just need to add a couple more cases to cover all your ground. So, this is a basic example. You can have a lot of
examples, but we're going to get our hands dirty, right? So, basically, there is a plugin for Android projects that helps you run this. You'll find it in the slides; I'll post them on Twitter later. But basically, you have a basic plugin that is very easy to add to your Gradle project, and you can configure it pretty easily too. Basically, you just have to say which classes are going to be mutated; here I just want to mutate every class that lies in the example package. And you just define your output formats; I want both the XML and HTML formats. And the result is going to look a bit like this. For those of you who already use code coverage, it's going to be pretty familiar, but instead of saying you didn't walk through this line, you didn't cover this statement,
it's going to say that a mutation on this line was not covered by the tests. So, let's take a closer look live. I took one of the Android architecture samples that are available on GitHub, and, if you look at the diff, I only added the mutation testing plugin, like I showed earlier. So, first of all, the first part is just making sure that the tests work. Hopefully it's going to run. I'm not sure you can read everything, but basically all the tests are green, so that's good. Now we're going to run the mutation framework. Basically, what it does is start by analyzing all the tests, and then it tries to find all the source code that can be modified and that will impact the tests. It tries not to change anything that's not covered, even though sometimes it still does. And here you get a report that lists all the surviving and killed mutations. But a better way to do this is to just open the report: it's an HTML report that lies in the build folder. So, here, let's just zoom in. The coverage report is presented this way, and you get both the line coverage, which is the code coverage, and the mutation coverage. These two can give you a clue on where to look for issues, because right here the code coverage is not 100%, and comparing the code coverage with the mutation coverage can show where the issues lie. For example, here in this package, we see that we have more code coverage than mutation coverage. So we can jump in, and we can see here that we have more than 50% code coverage, but only 20% mutation coverage. And if we take a look here, you get the same view of your code: the pale green lines are code that is covered but not mutated, and the pale red lines are code that is not covered. What we're interested in are these lines, because here we see what happens when we call the delete task. This part is correctly killed: we had a mutation and it was killed, meaning that the tests detected that the code was changed. But these two lines were not detected, meaning that we could change these two statements and the tests found nothing. So, basically, this means that we don't test that, when we delete a task, the listener is warned, which is a pretty bad thing. So, using this, you can actually find the ways your tests miss cases. So, let's
go back. So, first I'm going to answer some frequently asked questions. Yes, it does work with Kotlin, because this framework works on the bytecode, so it works with Kotlin and it should work with any other JVM language: it just modifies the bytecode statements and not the source code. Well, maybe; because I only tested with Kotlin, I didn't bother trying it with other JVM languages like Scala, but in theory it should work. It's very configurable, meaning that you can choose which mutations you do or don't want applied to your code, and you can change the number of mutations you want to run every time you do a mutation coverage pass. And it's extensible, meaning that you can write your own mutations: this framework already has a lot of mutations available, but you can add your own if you have some patterns that you want to mutate. But there are a few things to keep in mind when doing this. The first one is
that mutants won't find bugs in the code. They are just able to reveal test issues, and mostly they're going to reveal the edge cases that are not covered by a test. It's not bulletproof: we saw some examples of a mutation not being detected by the tests, and that's perfectly okay sometimes; changing some statements won't actually change the logic. And, because of that, it's not a reliable metric: you can still have some surviving mutations which don't change the way your logic works. You can't just look at the output value and say, oh, I only have 50% mutation coverage, this means that my tests are all bad. No, it doesn't mean that; it just means that half the mutations are still surviving. One thing that's important to know about mutation testing is that it only simulates atomic faults. That is, it only mutates a single bytecode statement: it can remove a call to a method, it can change a single math operation, it can change a single inequality check, it can change a single return statement. But it's not going to be able to simulate a complete refactoring of something. For instance, if you have, let's say, a sort algorithm and you have your tests running through it, it's only going to introduce small atomic faults; it's not going to create a new sort algorithm and make sure that your tests still work against it. Also, it's very costly, because each mutation is only on a single statement, and if your project is large, or even just medium-sized, that could mean thousands and thousands of statements. So, you can't, of course, be exhaustive with this. Most frameworks, including the one I presented, let you run four or five mutations per class, which is still long, because for each mutation it's going to run the tests against it. So, if you have a very, very large code base, the
time spent on this mutation testing can be very, very long. So, my own recommendation is to only use it locally during the TDD process. Basically, when I want to develop a new feature or change an existing feature, I'm going to work TDD: I'm going to write my tests, write my code, refactor things a bit. And once I'm satisfied with my feature and with the tests I wrote, I'm going to run a couple of mutation testing passes, and with those I'll be able to detect if I missed some edge cases or if my tests are incomplete and I missed some verifications. But I only do it locally. It can be automatically triggered by PR: for each commit you can look at which files were changed and use that to filter which classes can be mutated; it's fairly easy to do. But I don't run it on a CI server, for a couple of reasons. The first one is that it takes a long time, and because we have a lot of developers committing all day long, the longer a run takes to validate a PR, the longer you're going to wait for the next PR. The other thing is that each mutation coverage pass only tests some mutations among all the mutations possible. This means that two consecutive mutation passes won't have the same results, so you can't keep the values and treat them as a reliable metric. And of course, the coverage value is not shared with management, because each run can have a rather different output. And it's not absolute: always take a result with a grain of salt. As I said, a surviving mutation doesn't mean that your tests are wrong; sometimes a mutation survives for good reasons. So, these are the
links to the frameworks I use. The basic one is the pitest framework, and there are two plugins. There is an Android Studio / IntelliJ plugin called Zester that allows you to run mutation testing on a single file: much like the green arrow inside Android Studio that lets you run a single test, with Zester you have a blue arrow and you can run a single test with mutation testing. And the other one, which I talked about in this presentation, is the pitest Gradle plugin that works with Android projects and allows you to run a full pass on your whole project. So, that's it for me. Thanks a lot for your attention, and if you have any questions, I think we have time for a small Q&A.
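For reference, a configuration like the one shown earlier on the slides might look roughly like this. This is only a sketch assuming the gradle-pitest-plugin; the plugin id, version, and property names are assumptions and vary between plugin versions:

```kotlin
// build.gradle.kts (sketch; id, version, and property names are assumptions)
plugins {
    id("info.solidsoft.pitest") version "1.15.0"
}

pitest {
    // Mutate only classes under the (hypothetical) example package
    targetClasses.set(listOf("com.example.*"))
    // Produce a machine-readable and a human-readable report
    outputFormats.set(listOf("XML", "HTML"))
}
```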
Yes, we have about three or four minutes for questions. If you have any, please line up in front of the microphones. In fact, I don't see a line; I don't see a single person standing in front of any microphone. I'm the only one, and I don't have a question, since I'm not technical enough. I think there's a question here. Is there? Yep. Wonderful. So, first of all, thank you for the great talk. You're welcome. I've already added this plugin, as you said, and the issue that I had is that I wanted to run the mutation tests only for the methods I had unit tests for, and the problem is that I got a very red line for the methods that had no coverage. I wonder if there is any way to run the mutations only on the methods you have unit tests for? I don't think that this framework actually does it, but you can still trick it: you can take the XML output of the mutation report and just filter out all the not-covered entries. For instance, here you're going to have the report saying "no coverage", and basically you can filter the XML report to remove all the entries with no coverage and only keep the entries that are not of this kind.
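As a rough sketch of that post-filtering idea (mine, not the speaker's; it assumes each `<mutation>` element in pitest's `mutations.xml` carries a `status` attribute such as `NO_COVERAGE`):

```kotlin
// Drop every <mutation ...>...</mutation> block whose status is NO_COVERAGE,
// keeping only entries for code that the tests actually reach.
val noCoverage =
    Regex("""<mutation\b[^>]*status=['"]NO_COVERAGE['"][\s\S]*?</mutation>\s*""")

fun keepCoveredOnly(reportXml: String): String = noCoverage.replace(reportXml, "")
```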
Thanks a lot. You're welcome. The next question from the other side. Hi, thank you for the presentation, it was very interesting. You said that you don't run it on a continuous integration server because it takes a lot of time, it's not always reliable, and it always has different results, if I understood correctly. So you recommend doing it locally, but then if I do it locally once, it will give me one result; then maybe I cover all these cases that I missed, then I run it again, and it will show me new ones. Yeah, that's what I said. Actually, when I do it, I run a couple of passes to make sure that it's going to take different mutations. Unless you clean your output folder, this framework actually keeps track of all the mutations it already tested. That way, if you have a mutation that's already covered by your tests, it's not going to retry it; it's going to try something else the next time. And one more question, a follow-up. You said that it takes a longer time; so let's say all my unit tests run in two minutes, what range of scale can I count on? Is it three times longer, or does it depend on how many mutations it does? The basic answer is it depends, because you can have only one single test with one single class, but it runs in two minutes because it does a lot of processing; or you can have... I mean, it depends. But mostly, the more code you have, the longer it's going to take. Yeah, exponentially longer, I guess. Okay, thank you. You're welcome. Hi. If I understand correctly, each line in the
code can be mutated or changed. Each statement in the code, that is. You can have a line that says, for instance, result equals A plus B minus C, the whole divided by D, et cetera; so each operation is a statement that can be modified. And each statement can probably have an infinite number of possible mutations? No, no, no. Each statement can have only one or two mutations; for instance, A plus B will always be changed to A minus B, because otherwise you can't infer what's going on. But if each statement can be mutated, what does the mutation coverage indicate? Is it a number relating the number of statements to the number of mutations applied to them? No, the mutation coverage is basically the percentage of mutations that are detected by the tests. So, if you have, say, 100 lines in a source file, and it tries a mutation on all the lines, and the result is 80% mutation coverage, it means that 80% of the lines, when mutated, are detected by the tests.
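The arithmetic behind the figures in this exchange can be sketched as follows (a trivial helper of my own, not part of any framework):

```kotlin
// Mutation coverage as described here: the percentage of generated mutants
// that at least one test detects (kills).
fun mutationScore(killedMutants: Int, totalMutants: Int): Int =
    if (totalMutants == 0) 100 else killedMutants * 100 / totalMutants
```

With 100 mutated lines of which 80 are detected, `mutationScore(80, 100)` gives 80.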