Fear the mutants. Love the mutants.
Formal Metadata
Title: Fear the mutants. Love the mutants.
Number of Parts: 542
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/61897 (DOI)
Transcript: English (auto-generated)
00:05
No problem. Hello. How's it going, everybody? There are a lot of people in this room; I didn't really expect so many. This is wonderful. Thank you for coming to see us. I just want to say that we want to talk today about mutation testing. That's what we're here for. If you like this penguin... does anyone not like this penguin?
00:23
Just you. Okay. Personal vendetta noted. So this is a penguin generated by DALL-E. So hopefully it's friendly enough, because this is going to be kind of part of our talk. We're going to see a lot of penguins in this talk. So if anyone has a personal objection to penguins, please speak now. Otherwise, if you like penguins, can I get a hand up just to see? Are we cool with that?
00:43
Awesome. I've never seen so many people kind of sort of want to put their hands up, but not really be sure. I absolutely love the energy in this room. So my name's Max. I'm this guy. As you can tell, I'm also this guy. But I'm here to talk to you about mutation testing. I work for a company called Vonage, and I'm a Python developer advocate there. Now, what that means is that I maintain our Python tooling.
01:04
And so I'm kind of here to talk about mutation testing, because I've just kind of went through this process myself of understanding all this stuff and applying it to my own work. And so I want to show you kind of how that went. But with me, not only do I have the tallest person in the room. Stand up straight. I don't know. This person's one hundred ninety six centimeters tall.
01:21
I'm like one seventy-seven, I'm not sure. I promise, I'm average in Britain. So in this place, right? So this person knows a lot more about mutation testing than me. I'm really not the expert here, but I just want to say, this is Paco, if you want to introduce yourself. Yes. So, yeah, I'm Paco. I work for OpenValue, a small consultancy company in the Netherlands. And I got into mutation testing via my thesis. So when I wrote my thesis on test effectiveness, I wanted to learn more about mutation testing.
01:46
Also, after that, I got into speaking at conferences and spreading the word about this quite awesome tool. And I hope that at the end of the talk, you have another cool tool in your toolbox to write better code. Awesome. So if we're cool with that, we do have to do the
02:01
obligatory these companies paid for us to come here and pay for our flights and stuff. So, you know, what my company does, I'll just quickly tell you. So we do communications APIs as a service, basically. So things like SMS, like voice calls, like video chats, like two-factor authentication, all via API. That's kind of what we do. That's really just what I want to say is relevant, because I will show you what I actually applied this to, which was one of our SDKs.
02:23
Yeah. So for me, we don't actually have a product to sell. Also, I definitely didn't fly here from the Netherlands; just to be clear, it's just a two-hour car drive. No. So we're just a consulting company and we really like to share knowledge. So that's mostly the reason why I'm here, to tell you more and teach you more.
02:40
So quite simple. Yeah. He doesn't have the funding crushed. I do, unfortunately. Luckily, we're all good. So there's two of us on this talk. There's two of us here. And actually, there is a third person in this talk. We've seen a hint about this person already. But this person is really the thing that's going to tie this whole talk together. And it's going to get us all feeling good about mutation testing. So this person is very important.
03:02
So say hello to Henry. This is Henry. Look at his little face. OK. Hands up if you think Henry's a cute, cute penguin. That's right. Thank you very much. Yes, I'm glad we agree. I'm glad we're on the same page. Now, this is just some quick audience participation, because if you can't tell, we're quite big on audience participation.
03:23
So quick question here. Who's heard of this stock photo? But more importantly, who's heard of testing? This is just a check to see if we're in the right room. Thank you very much. Great stuff. OK. Who's heard about code coverage? A lot of people. Maybe not everybody. And that's OK if you haven't. We're going to talk about code coverage. So please don't worry if you haven't.
03:40
But yeah, it's awesome to know that some people have. That's a good starting point, too. OK. Final one. And I'm going to say, other than via knowing about this talk, who's heard of mutation testing? Oh, quite a few. Yeah. And a quick follow-up: who actually is already using, or has already used, mutation testing? Ah, nice. There are a few of you, and hopefully you have some good experiences.
04:04
Yeah. So really nice to see that people sort of are familiar with the concept. But if you're not, it's also OK because we're going to go through this like you don't know anything at all. Because when I started doing this, you know, a few months ago, I didn't know anything at all. And so I want to take you through that journey as well. And that's what we're going to do. But before that, what I want to do first is give us some background.
04:21
And what I actually really want to do is pass to Paco, who knows a lot more about this than me. So I'm going to pass to you right now. Yes. This is going to be some improvising. Good work. Good luck. I'm going to drink water with this. Nice. Great. So, yeah, we're first going to talk a bit about testing in general. And then we're going to more specifically talk about unit testing. So just a quick check.
04:42
Does anybody know what a unit test is? That's great. I don't have to explain that part. For those who don't know, it's the smallest possible test you can write in your code base. Just one method and you write one test for it to test the outcome of that method. Now, there are many different reasons why we're writing unit tests. And I think one of them, my favorite or the most used one is for maintenance.
05:05
We write tests because we want to be confident in the changes we make to our code base. So whenever we make a small change, we add a new field to some endpoint, we want to know that we didn't completely break the database integration, because that can happen at times. So, yeah, that's very important. Maintenance, regression testing.
05:23
But there are more reasons. One I like also a lot is that tests can actually serve as documentation. You can use tests to describe certain scenarios in your code base, so that when you have a specific test for that, it already makes clear this is intended behavior.
05:42
I have an example for this. I worked for a company where we had an endpoint that returned warehouses. These warehouses, just the domain object, had a soft delete. So there was a flag in there that indicated whether it was deleted or not. At some point, this endpoint returned both deleted and non-deleted warehouses.
06:03
At some point over time, as we were working on it, a new guy came in and looked at it and said, hmm, that's strange. Why are we returning deleted warehouses? Why would you want that? It was a fair question because we also forgot. And there was only one test which tested the success flow. And you can already guess here a bit.
06:21
So the success flow in this case meant the test only returned non-deleted warehouses. So he made the change. And we all thought, oh, this makes sense, it looks broken. Of course, we didn't check with product management, the product team deployed it. And then you can guess, of course, this was broken. So we had to revert it. And the whole lesson here was that just one test which also included a negative scenario,
06:43
with warehouses that were deleted, could have already been a trigger to think, hey, this behavior is intended. And that's where tests can serve as sort of a documentation purpose. Also very useful in getting to learn a new code base. So whenever you're on a new code base and you have this very complicated method, a test can help you step through the method to sort of explain what's going on, for example, while debugging it.
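As a concrete illustration of the warehouse story, here is a minimal Python sketch; the function and names are hypothetical, not the actual code from that project.

    def list_warehouses(warehouses):
        # Intentionally returns soft-deleted warehouses as well.
        return list(warehouses)

    def test_listing_includes_soft_deleted_warehouses():
        warehouses = [
            {"name": "Main warehouse", "deleted": False},
            {"name": "Old warehouse", "deleted": True},  # soft-deleted
        ]
        result = list_warehouses(warehouses)
        # Documents the intended behavior: deleted warehouses are returned too,
        # so the next developer knows this is not an accident.
        assert len(result) == 2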
07:07
Now, another one. This one is here for the consultants. So who here works as a consultant? Oh, not that many. Wow. Because we're sort of the root of all evil always. We tend to run to the next project and we don't have to maintain our own code often.
07:23
Not always. And so I have this nice quote that's mostly also for us. Keep in mind that you're not doing this only for yourself. I had a colleague who once told me, keep in mind that you always have this point in your development process where you think,
07:42
okay, should I write a unit test for this? It's going to be a painful unit test. I know that it works. Do I really have to document it? We all know how it works. Yeah, sure, we all know how it works. But we also leave the project and then go on to another project. We, as in consultants. And I always say to myself, what would I do if I were the next person?
08:00
So what would I do if I were the next John or Jane Doe working on this project? So tests are not there just for you, but also for the next person working on it. I would actually like to jump in here because I've been that person. Thank you. I've been the person who works on a project after someone's left it. And honestly, if you have good documentation, or if you don't have that, if you have good testing (thank you, go take your water break)...
08:23
So if you have good testing, it can really help you understand what a project does. And so when I came to a certain project recently, I didn't have necessarily the kind of testing that I would have liked to really document my code that well. And so honestly, if I had someone like Paco who actually was a bit more conscientious with what they tested, that would have really helped me get on board with the project quickly.
08:42
But as it was, this was a real problem for me and it was something that we want to hopefully avoid other people having to deal with as well. Quick question, actually. Has anybody ever taken over a code base that they maybe look at and go, what the heck is this? Okay, so you know what the point of this slide is, right? You know why we're saying this. We know this is important. Now let's stop that from happening to the next generation of very pained developers, right? Let's stop that happening.
09:03
Yes, so write tests. And if all these reasons haven't convinced you, there's often maybe a team lead or a boss or somebody else who's telling you to write tests. In most cases, there's always, of course, exceptions. Ah, okay. Wow, this is annoying.
09:20
So at the end of the day, we're all writing tests. If it's not for ourselves, then it's for someone else. And even though we're now sort of happily all adding tests, we also have to sort of sketch a problem scenario here. And this problem is that as projects evolve and grow, our tests also evolve and grow. But the problem is that we do refactor a lot and we spend a lot of time on keeping our production code clean and well monitored.
09:44
We have lots of metrics for that, where on the other hand, for tests, what you can see on long-living projects is that sometimes you just get tests that are nothing more than a blank setup and teardown with some mocking going on, because the functionality already moved long ago. Which brings us to the point that test code is often not monitored. Test code is sort of the kid that didn't get all the attention it needed.
10:07
So there is still one metric for testing. What do you think is the most used metric for test code? Yes, yeah, we sort of gave it away already in the intro. But yes, yes, code coverage.
10:21
Code coverage tells you how much of the code is executed when you run the test suite. And I personally really like code coverage because it already helps you write more and better tests. And I want to go through a simple example here to show you how it can already help you. So here we have a submit method. So this is the Python guy. I'm the Java guy.
10:46
So the context is you are at the conference and you have a service where you can submit proposals. You can't have three or more open proposals, and you can't submit after the deadline.
11:00
If you do that, there will be a failure, and otherwise you will get success. So quite a simple method, with everything as a parameter just to make it easy to explain. So if we would take method coverage: method coverage is the simplest coverage metric we can get, which checks whether this method is covered, yes or no. We can add one simple test, call it test x, which submits a proposal.
11:23
There are no open proposals, which is good. And we have a deadline that's 999 seconds in the future. So great, now we can get a step further. We can get into statement coverage. And with statement coverage we check, well, if each statement was executed. And now we see, hey, we didn't cover our unhappy flow.
11:41
So we need to add another test. In this case we add another test which has five open proposals which means this check evaluates to true and we have a negative scenario. Now we can even go one step further through, for example, condition coverage. And with condition coverage we check if each boolean sub-expression has been evaluated to both true and false.
12:05
Because what we don't know now is whether our deadline check is actually working. We just know that it returns false but we haven't seen it return true yet. So we add one more test now with a deadline that is 999 seconds in the past. And now we have three tests. And this is already why I like code coverage so much.
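For reference, here is a rough Python sketch of the example being walked through; the talk's slide uses Java, and the signature, parameter names and test names below are illustrative.

    def submit_proposal(open_proposals: int, seconds_to_deadline: int) -> str:
        # You can't have three or more open proposals,
        # and you can't submit after the deadline has passed.
        if open_proposals >= 3 or seconds_to_deadline <= 0:
            return "failure"
        return "success"

    # Method coverage: one test that simply executes the method.
    def test_submit():
        assert submit_proposal(open_proposals=0, seconds_to_deadline=999) == "success"

    # Statement coverage: also exercise the unhappy flow.
    def test_submit_with_too_many_open_proposals():
        assert submit_proposal(open_proposals=5, seconds_to_deadline=999) == "failure"

    # Condition coverage: make the deadline sub-expression evaluate to true as well.
    def test_submit_after_deadline():
        assert submit_proposal(open_proposals=0, seconds_to_deadline=-999) == "failure"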
12:23
Because it really helps you write proper tests. Well, it helps you write tests. Because let me get on to the good parts here. As I said, it helps you write better and more tests. Code coverage is really easy and cheap to measure. In most languages it's just a matter of instrumenting the code. You run the test suite and you get a nice report out of it that everybody
12:43
can quickly see, and you can quickly see the pain points where you're lacking in testing. To go a bit further, as I mentioned, it shows you what you didn't test. But the only guarantee, and I'm going to get to the bad parts next, is that what you did test didn't crash.
13:02
It doesn't guarantee anything actually about the functionality. Because code coverage can actually be quite misleading. It doesn't guarantee any test quality. So if I take this method, for example, this is a valid unit test. This test generates coverage. It calls a method. But there is no assertion on the result.
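A minimal sketch of that kind of test, reusing the hypothetical submit_proposal from the earlier sketch:

    def test_submit_without_assertions():
        # Executes the method, so every line it touches counts as covered...
        submit_proposal(open_proposals=0, seconds_to_deadline=999)
        # ...but there is no assertion, so the only thing this proves
        # is that the call did not crash.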
13:21
Which makes this test, for example, generate 80% coverage. Yet the test actually only guarantees that the method doesn't crash. It doesn't tell us whether it returned true, false or anything. And this is the pain point of code coverage. Which brings us to something nice, which Max told me about, which is called Goodhart's Law. So can you maybe explain a bit about that?
13:44
Can I grab your clicker? Can I explain Goodhart's Law? No, sorry. Can't. Just kidding. Okay, so when a metric becomes a target, it ceases to be a good metric. So quick question, has anyone ever written a unit test just to get coverage up, rather than because the test was useful?
14:03
Come on, let's be honest here. This is a safe space, come on. Okay. Microphone. Okay, hello everybody, welcome to the live stream. This is our radio announcer voice. Right, so this is something, I'll be honest, I've done this. We now know a lot of people in the room have done this.
14:20
But here's what we don't want to have with code coverage: it's supposed to tell us something about our code, but if instead we turn that into a target, that can really limit the kind of useful tests that we actually create. And that leads to a few quite big questions that we do genuinely care about. So I'll wait for that photo if you... Cool. Sorry, I'm very audience participation, I'm very sorry. So the next question that we ask there is how do we know if our tests are high quality?
14:43
How do we know if these tests are actually good quality tests? We test them. We test them, great, great answer. I've got a further follow up question for you. How can we understand what our tests are really doing? Same answer if anyone, I see a hand.
15:02
I literally had a code base where I could delete half the tests and nothing changed and they all, yeah. So I may delete half the tests, hello?
15:26
Yes, so just for the live stream, I'll just repeat that because that's a really good point. I won't repeat the swearing, but I do understand and appreciate the emotion behind it. If you end up, you know, shipping some code that does not do what it's supposed to do, you end up with users getting very angry at you.
15:41
And yeah, that's a problem, right? That's going to be an issue. And that is a way of finding out, but I guess the real question we're asking here is how do we know if we can trust our tests? That's really the crux of this problem, right? And so, as it turns out, the very famous Roman poet, Juvenal, actually, in 100 AD, after he'd had a few drinks,
16:00
he was able to summarize this in such a beautiful way, and this was something that maybe wasn't appreciated at the time because, you know, obviously he was talking about mutation testing 2000 years before it was relevant, but I will mention it here: it's who watches the watchmen, right? And this is the question, who's testing our tests, who cares about that? How do we actually gain trustworthiness for our tests? And I can see there are people who've had bugs in production, people here who understand that this is a really big deal.
16:26
Luckily, we have a two-word answer for you, which is the reason we're all in this room. Mutation testing. So, spot the odd one out. You might see here, that's Henry. He's having a great time, but maybe he shouldn't be stood in a row of pigeons.
16:42
But more importantly right now, I'll just explain the basic premise and then Paco here will explain in a little bit more detail how it's actually kind of done. So first of all, mutation testing, this is a really quick summary. What you do is you introduce some faults in your code, so just a few little things that you change, and for each of those little changes, that's a mutant version of your code. Once you've got that, you run your test suite against those mutant versions of your code,
17:03
and if they fail, awesome, because that means that your tests have actually picked up that change, and that's a good thing, right? That's good, we want those tests to fail if our code changes, right? But if they don't fail, that's a bad time, because that means those tests didn't test that change.
17:22
It didn't test for that, and so that's something that could have made it to production. So what mutation testing kind of gives you is a way to evaluate that test quality. But this is very abstract, so let's look at penguins. I like penguins. So, Henry here, he's a great example, and he's going to bring all this home. So, I was kind of unfamiliar with the topic, so I kind of created some analogies with penguins that really helped me,
17:42
so I'll share those with you. So, the way I kind of imagine my software is we do lots of stuff with messaging, and so I imagine software that works properly to be like a pigeon or a dove, like a bird that can fly. I've used a dove here because Paco has a deadly fear of pigeons, he's terrified of them. Not fear, vendetta. He has a personal vendetta against pigeons, sorry. He doesn't like them, so I've used a dove here.
18:02
But ideally, we want something that I can tie a message to the bird's leg, and it can go and deal with that message for me, right? So it can go do something like that. So, one of the key features of penguins is that they're not very good at flying, right? Can we all agree that that's probably not the best? If you want to tie a message to a bird's leg and get it to deliver it,
18:21
a penguin might not be the bird you choose unless you may be delivering something underwater. So, this is the kind of example here where we've got a bird, but it's not the kind of thing that performs the way we expect it to, and this would cause some serious problems if we tried to use this kind of thing in production. If we wanted to send a message via a penguin, we're going to have a tough time, right? So, Paco, I'd like you, if possible, to explain this in a way that makes more sense than what I just did.
18:45
Good luck. We only have one mic. It's a bit specific. Yeah, so let's get into the process of mutation testing. The first step of mutation testing, so what Max just taught you, is about introducing faults. So, you can introduce faults manually, but that is a manual process,
19:01
and that means it's a lot of work, and it's usually also not that reproducible. You don't want to do it manually. We want to do this in an automated manner. This is where mutation testing comes in. In the first step of mutation testing, we're going to generate mutants, and each mutant is just a version of the production code with one very tiny change. Mutation testing works with the concept of mutators,
19:20
and mutators are the ones that are making these very small changes. So, what do we have? In this case, we have a perfectly fine dove, which is the production code, and then we have a mutator which makes a tiny change, which kind of transforms this into Henry, our penguin who can't fly, and we want our software to fly, so this would be a bad thing.
19:43
So, how does it look? Because this is still a bit abstract, I'm going to give you some examples. This would be an example here. So, for the Dutch, and I think for other countries as well, you have to be 17 years or older to apply for a driving license. This could be code that's in your code base, which will fly, which is good.
20:02
Now, the mutant would be: the entire code base stays the same, and just this little piece changed. So, here we inverted the logic. This is, of course, a bug. This is something we don't want to make it into production. And actually, just from this single line, we can already generate quite some mutants, because we can not only invert the conditional operator,
20:23
we can also change the conditional boundaries. So, this means that we now have age larger than 17, which is a very nice bug that would force us to test the edge cases, the famous off-by-one errors, whether we forgot our equal operation in our conditional check. This will help you find that one.
20:42
But it can also just always return true or false. So, we can generate quite some mutants for this, and we can do the same for, for example, mathematical operations. We can make each plus into a minus, each multiplication into a division, et cetera. And there are more. We also have the ability to remove statements. So, in this case, we have a method that adds a published date to some object.
21:04
And we can also just remove the whole setter. And now, this means that we have a bug in which we don't set this attribute anymore, which is something that, of course, we don't want to make it to production. What's important to note here is that with mutation testing, it's always important that the code actually compiles, because we're not testing the compiler.
21:22
We're testing the code. The compiler is definitely out of scope here. Now, at the end of step one, we have a lot of Henrys. We have a lot of mutants. And now, Henry is going to try to fly. So, he already got his wings ready to try to fly.
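To make step one concrete, here is a rough Python rendering of the mutants just described; the talk's slides use Java, and the function below is illustrative, not the actual slide code.

    # Original production code: you have to be 17 or older to apply.
    def may_apply_for_driving_licence(age: int) -> bool:
        return age >= 17

    # A few of the mutants a tool could generate from that single line
    # (each is a separate mutant, never applied together):
    #   return age < 17    # conditional inverted
    #   return age > 17    # conditional boundary changed (off-by-one)
    #   return True        # always true
    #   return False       # always false
    #
    # Arithmetic mutators work the same way (a + b becomes a - b, a * b becomes a / b),
    # and a statement-removal mutator can simply drop a call such as
    # article.set_published_date(now), so the attribute is never set.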
21:40
And now, for each Henry, we're going to run the test suite. And if this test suite fails, as Max already mentioned, then it's good, because then we expose Henry for what he is, which is just a penguin, something that can't fly. So, this is great. The not-so-happy scenario is where the test passed, which means that Henry made it into production. And as we know, well, assuming that it also got through the PR, of course, we have more than just tests. That is a problem, because Henry is not supposed to fly, and now we have a bug in production.
22:02
of course, we have more than just tests. Is that a problem? Because Henry is not supposed to fly, and now we have a bug into production. So, this is something that you don't want. So, this is the theory of mutation testing. And now, Max, you can tell a bit more about the frameworks.
22:24
Alrighty. So, first of all, I just want to say, I'm so proud of this prompt. I don't know why DALL-E chose this, but I'm really happy. I think I typed in penguin trying to be a pigeon, and it came up with this, and I'm very happy. Okay. So, moving on. Yeah, frameworks. So, this is going to get a little bit more specific
22:41
to actually implementing this stuff. So, is anyone here a Python developer? Heck yeah. Alright. Awesome. So, I'm going to show you what I did in Python. So, as you can see, Paco is a Java developer. He'll explain Java in a sec. But I'll just show you the kind of basic concepts, but using my code and using what I did.
23:01
So, there's two kind of main supported packages that you can use in Python. It's not like, you know, in Java, there's like an enterprise thing you can get. In Python, it's very community supported. So, you're not going to get big products, but what we do have are these kind of like nice and supported repos for mutation testing, which have just these packages. So, I am not a professional in this.
23:21
I'm not a doctor. I'm not a lawyer. I'm not a professional financial advisor. I'm just a person who has a certain opinion. And so, my opinion of those two frameworks I showed you, there's mutmut and Cosmic Ray, and personally, I prefer mutmut, because it's easier to get going. Ooh, ooh, angry, angry face, shaking heads. You don't like mutmut. We will talk later.
23:46
So, if we have time, we'll have a third presenter very shortly. So, for now, while I've still got the mic, while I'm still here, we'll talk about mutmut. And so, this framework is quite simple to use.
24:00
You know, the reason I kind of like it is because it's very much you install it and you run it. You know, there's a bit of config you can do, but really, it's quite simple just to get an idea of your code base and what's going on. So, I want to show you this slide. This is just, this is the SDK that I maintain, and I'm showing you this because it's what I've applied my mutation testing to, so it's where I'm showing my examples. But basically, what we do is,
24:21
when we go here, I had this locally first of all. So, I've installed mutmut with pip install, it's that simple, it's a Python package, it's what we do. If you went to my talk a while earlier, you know why that's a bad idea, but I did it. So, after we do that, we've got mutmut run, which just runs those tests for you. So, when we do that, I'll show you what my output was.
24:41
So, when I run this myself, I actually got a whole lot of this output, but really what's important here is that, first of all, it ran my entire test suite. And the reason it ran my entire test suite is just to check how long that's supposed to take, and just to make sure everything does work as expected, because there's various types of mutants to do with timeouts as well that we might wanna consider. After it's done that, what it'll do is,
25:01
it will generate mutants based on lines of code in my code base, right, that's what it will do. And once it's done that, it will run my tests against those. So, there's a few different types, and it can characterize them like this. So, the first type is mutants that we've caught, not killed. We never kill a penguin, we love penguins. We catch them, we've caught them, and we put them back into the zoo. In this case, we've managed to say,
25:23
yep, our test failed, that's great. But, it could be the case where the mutant's timed out, so it's taken way too long for this code to run, or it's taken enough time that we feel like we're not so feeling great about that code. Alternatively, we might end up in a situation where the mutant survived and made it through our test code. In that case, it corresponds to a bug
25:41
that might make it to production. So, when I ran this on my particular SDK, what I saw was that, we checked this stuff, I created 682 mutant versions of my code with changes in them, and of those, it managed to catch 512, but it missed 170 of them. Now, if that's a good number or a bad number,
26:01
we'll talk about later, but what's important now is, let's just look at some of those mutants. So, first of all, the ones that we actually did catch, here's a couple of examples. So, here's a line where basically we say, here are some valid message channels, so for our messages API, here are the valid ways you can send a message, right? What's important here is that this mutant basically removed the ability to send an SMS.
26:22
And so, when I tried to test that, it failed, which is what we want to see. Here's another one, again, this is Python, so if you're a Java dev, don't worry, we'll look after you soon. And here's another one, we've got a decorator here, which basically runs this method, and we can see when we remove that,
26:40
that will never happen, this is actually through pydantic, if anyone has used that before, but basically it means that we're not going to round a number anymore, and so when we test for that, the number doesn't get rounded, and we catch that. But that is not really very interesting, like, that's not, that doesn't tell us anything, that tells us about this much, right? It doesn't tell us much at all. And the reason for that is that we kind of know that our tests work,
27:01
we kind of know that our tests work for that, thank you very much, I'll do the M&M thing, so we kind of know that our tests work for that. And so, what's kind of useful is to see, if we do mutmut show, we can see the mutants that we didn't catch. We can also do mutmut html, which shows us, essentially, an HTML coverage-style output as well,
27:20
so we can see, in a list, all of the mutants that we didn't catch. So with mutmut show, on that code base that I just showed you, we can see the 170 mutants that survived, it shows you the indices of these, and then we can manually specify the ones we want to look at. So here we can see, for example, that we changed the authentication method to fail, and we can see in this case we caught that,
27:41
because we did a test for authentication, and it failed, so that's great. But more importantly, you get this HTML output, which you can then explore. You can explore every method, every sort of module that you have, you can explore all the methods inside of there, and which mutants were and weren't caught, and you do that with the html command.
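Roughly, the commands being walked through look like this (check mutmut's documentation for the exact names and options in your version):

    pip install mutmut   # install the tool
    mutmut run           # run the test suite once, then test every generated mutant
    mutmut show          # list the surviving mutants and their IDs
    mutmut show 42       # show the diff for one specific mutant (the ID is illustrative)
    mutmut html          # write an HTML report you can browse module by module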
28:00
So to do that, I'll just show you, this is a mutant that we did not catch, and I want to show you why we didn't catch it, and what it's going to do, and I'll just do that for a few, just so you get some context, if that's cool. So first of all, what this mutant did, was it renamed the logger. Now, I think logging is out of scope of my test code, so personally, I don't care too much about anything related to logging, so I don't mind if I don't get a pass here.
28:23
Here's another one. In this case, what we do, is we've slightly changed the value of a constant. This is just part of a function signature, and again, we don't care about this that much. This isn't something that I really mind about. What's more important though, is this mutant here, because this is from our client class, where we instantiate all of our different API classes,
28:43
and you can see, we actually set voice to none, so we completely remove that instantiation, and our tests are still passing. Right? So the reason that actually still works, our test code still, our code base still works, even though this isn't testing that case, is because our tests actually, they test the voice API separately, they call it manually,
29:01
but if our clients are calling it like this, maybe we should have a test for this as well, right? So this tells me, hey, maybe my test suite does need to be expanded, right? Does that make sense? I'm seeing some very, very like, yeah, yeah, that makes sense. I like it. Awesome. Okay, so if you are a Python dev, this isn't the end of the talk by the way, this is, we've got some more context, and we'll show you about CI, but if you are interested, then feel free to scan this.
29:21
You've got like four seconds before I move slides, and as I move slides, I'll very, in slow motion, I'll be passing over this microphone, because this was just Python, of course, and I think there are more non-Python devs here, just not Python. Boo, boo. See, much better. We of course have more frameworks.
29:40
I think there are more languages out there, but I think these are the most important ones that I like personally, and pretty much the only really good one for Java is Pitest, and we also have Stryker, and Stryker is one that supports quite some languages. It supports JavaScript, C#, and Scala. Of course, it doesn't do this in one tool; each one has its own dependencies, because you can't have one solution for all,
30:04
but what I particularly like about it is that it supports JavaScript. Mutation testing is usually a kind of back-end-heavy tool, and the front-end can often use some love when it comes to testing. This also brings the testing frameworks and the testing quality more to the front-end, so that's what I really like.
30:21
so that's what I really like. We wanted to discuss a bit more, Mike's already introduced it, so what is a good mutation score? We had the Goodhart's Law, where we saw that code coverage can also lead to people implementing tests just to improve coverage, not just to defeat the purpose,
30:42
you're doing it just for the metric, not for the actual purpose. How does this work with mutation score? First, here's a picture of how a pytest report looks, so not to bash on Python, but much prettier and much clearer, because now, particularly what is interesting about this one,
31:00
it shows you both the line coverage and the mutation coverage. We can ignore the test strength, and this shows us the sweet spots in a report, because at the end, we have generated a lot of mutants, we have a lot of classes, and we only have very little time. So where are we going to look when we investigate this report, to see where the hotspots are? The one that's the least interesting here is the notification service.
31:20
The notification service also doesn't have any coverage, and if there's no coverage, then the mutants are also not interesting, because you have a bigger problem here, which is you don't have tests at all for this. Then you have a choice: you have the proposal service and proposal service 2. Now, the fact that they are named similarly is because they're from another example, but proposal service 2 is the one that has 100% coverage, and yet it didn't kill a single mutant,
31:42
and this is the sweet spot, because this means that we have code that is well tested, or at least there's tests that are covering this piece of code, but there's no single bug that was caught. So this deserves some attention, because it means that we didn't fully test this. So these are the hotspots where you open a report, the ones with high line coverage and low mutation coverage,
32:01
those are the ones you really want to go through. Those are the ones that give you the findings to go through a team and say, hey, see, we need mutation testing, because here, just these two classes alone already, it showed me that we need to improve our quality. Now back to the score. So the example we had, we managed to kill 512 out of 682 mutants,
32:24
which is about a 75% score. Now the question is, is this a good score? Is this a good score? Yes, yes, the golden answer, it depends. I love that answer. We already saw that 100% doesn't make sense.
32:41
Things like logging, and more, things like generated code, et cetera, things that you don't necessarily want to test, even though there are mutant generators for them. Now there are a couple of things you can, of course, do. You can also, depending on the language and the framework you use, tweak the mutation testing framework quite a bit. For example, Pitest actually,
33:01
out of the box, already ignores and doesn't mutate any logging lines, and all the big logging frameworks are known to the tool. So anything that goes to slf4j, it doesn't mutate. So it also doesn't appear in your report, which is quite nice. And you can easily add things, like if you have a custom metrics facade somewhere, also typically something you don't want to cover
33:20
in your unit test, you can add that as well. So the thing here is that mutation testing is not really about a score you want to achieve. It's more that the report can be interesting to look at and gives you sort of the nice spots. And once you have it completely set up nicely and you're familiar with the report, you can maybe start looking at the score. But definitely it shouldn't become an 80% goal or something, like it was with code coverage.
33:42
It's just there; go through the report instead. So now we've sort of discussed all the tools you need. We have discussed the frameworks. We have discussed the technology. And now it's time, of course, for you to fly.
34:02
So we need to, how would you get started on this? And the thing I think that's important here is if you want to start, so you now think, oh, this is a great talk, I want to start with mutation testing. Depending on the size of your project, it might be wise to start with just a single package. I've done this on projects that are a couple of,
34:21
say, a thousand lines big. And even though in the earlier example we had 682 mutants, this can also, depending on the kind of code you have there, easily grow to tens of thousands of mutants, which can be quite slow. It can also be that there's something weird in your code base that doesn't really work well with mutation testing, or something that's just extremely slow. An example that I had was that we had,
34:43
so what's good to keep in mind is actually, just to take a side step, the mutation testing framework also measures in the beginning, for each individual test, which code it covers. So there's a nice mapping from production code to the tests. This helps us optimize, because if we want to run the entire test suite,
35:01
all the tests for every single mutant, it's going to take forever. Instead, because we know the coverage, we can also see, if we mutate this one line, which tests cover it, so we only need to execute those few tests. But what if you have tests that actually cover half your code base? For example, one of the things you can do in Java, if you're doing things with Spring,
35:21
is you can actually boot up the entire Spring application and start doing acceptance tests from your unit tests, which is not necessarily the worst thing to do, but you now have a very slow test that covers half your code base and that will be executed for every single mutant. So these are things you want to get rid of. You want to exclude this acceptance test, because otherwise you're going to be waiting endlessly.
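Conceptually, the per-test coverage optimization described above looks something like this; this is a sketch only, not any particular tool's implementation, and the file names and test names are made up.

    # Map from a production-code location to the tests that execute it,
    # recorded once by running the suite with per-test coverage.
    covered_by = {
        ("proposals.py", 4): {"test_submit", "test_submit_after_deadline"},
        ("warehouses.py", 12): {"test_listing_includes_soft_deleted_warehouses"},
    }

    def tests_to_run_for(mutated_file: str, mutated_line: int) -> set[str]:
        # Only the tests that actually touch the mutated line need to run;
        # a very broad acceptance-style test would show up in almost every
        # entry and be re-run for almost every mutant.
        return covered_by.get((mutated_file, mutated_line), set())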
35:43
So my point about starting locally and starting small was start just with one package. Start with the utility package to see if it works, see if the report works for you, and then from there you can expand, add more packages, and also you can see, oh, now it's taking ten times as long. Why is this? And you can find the painful packages there.
36:02
So as I mentioned, you can exclude some tests, and also there are often candidates, certain pieces of code, you might want to exclude. For example, there's no use in testing generated code, but also it might be that you have certain domain packages that contain just all your domain objects, your POJOs, which are just setters and getters, something that you also typically exclude from your coverage report.
36:21
something that you also typically want to exclude to your coverage report. You might also want to exclude this from code mutation, from mutation testing. And now that's done, so we talked about running it on your machine. We also can do this in the cloud, of course. Thank you.
36:42
So as you can see, there's a pigeon on the slide, and Paco, as we've said, has a personal vendetta, so I've taken over this section. So here we can see that we're going to run off our machine. So why would you want to run off your machine rather than on your machine? Any questions? Any ideas? Yes.
37:01
So what happens in the background is what was said there. Any other reason you might want to run non-locally? No. I've got a couple. Oh, other hand. CI. CI. Yeah, you might want to run your CI system. In fact, that's what we'll be showing you. So, foreshadowing. I like it. So yeah, it takes some time. And if you're using a CI system, you get to use those cloud resources.
37:21
And also what's important is that you can, if you've got code which is maybe dependent on different OSes, might behave differently, you can specify different versions and platforms to run on as well. So, stop talking, I hear you cry. Well, I'm afraid this is what we're here for, but unfortunately, I will be keeping talking, but what I will do is show it a bit of an example. So I applied this to my code base,
37:42
my own code base myself, into my CI system. So you can see here, this is GitHub actions. And I've got a piece of YAML, essentially. I've got this mutationtest.yaml file. And what that does is sets up an action for me to use. So this is something that I manually run. And I can do this here. So I manually run that. What it will do is do the mutationtest non-locally,
38:02
and it will produce some HTML output for me to look at. Now, that seems, you know, I'll go a little bit into what that YAML does, but it seems like something that should be able for everyone to do themselves if they want to. So GitHub actions, the reason I show that partly is because what we use, but also, you know, it's free for open source projects. So, you know, it's been useful for me because I've not had to pay for it.
38:21
So, you know, just a heads up. So yeah, I'll be showing you this with GitHub actions really quickly. And I'll show you the YAML, I'll show you what I did. Hopefully by the end of this next couple of slides, you will see how easy it is actually to do this and why actually this is all good and maybe you want to try this yourself when you get home. So here's some YAML. First of all, this is our mutationtest YAML.
38:41
It's got one job. It's pretty simple. All we're doing, we're running on Ubuntu, running one specific Python version to do this. Depending on what your test base is... (oh, they're having a great time in there, or there is thunder). So basically we have, yeah, we're testing on one version, for me, because my code doesn't vary enough between versions and OSes, so for me it's not relevant to do that.
39:02
But if we look at this next slide, I'll actually show you the workflow that goes through when I actually run this action. So first of all we check out the code. Then we set up a version of Python with it. Once we've done that, we actually install our dependencies, including now mutmut as well as our regular dependencies. So now we've got the new mutation testing framework installed here as well
39:21
on this kind of test runner. Then what we do is we run a mutation test. So we do that with mutmut run, but because we're running in a CI system, we don't want insanely long logs, and due to how it's outputted, we want a no-progress flag there, so that we're not seeing every line of output, we just see the important parts.
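A rough sketch of the kind of workflow being described; this is a hypothetical file, and the action versions and exact flag spellings may differ from the real one.

    name: Mutation test
    on: workflow_dispatch            # run manually, not on every push or PR

    jobs:
      mutation-test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          - uses: actions/setup-python@v4
            with:
              python-version: "3.11"
          - run: pip install -r requirements.txt mutmut
          - run: mutmut run --no-progress --CI   # short logs and a CI-friendly exit code
          - run: mutmut html                     # generate the HTML report
          - uses: actions/upload-artifact@v3
            with:
              name: mutation-report
              path: html/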
39:40
but I added that and I'm kind of proud of myself. So that basically means that you get a good, sensible output return code when you run in a CI system because the default for mutmut is depending on the type of mutants that were caught, it will give you a different exit code that is non-zero. So you kind of need to consider that or to suppress that with some scary, scary bash.
40:01
That's what I did at first. That's why I wrote the flag. So once we've done that, we save it as HTML and we upload it so that you can access that yourself as well. So that's it. That's the whole piece of YAML. It's 35 lines, and that set up the entire mutation test for my suite. So you can see hopefully, does this seem kind of easy? I think it seems pretty gentle to do, at least in this sort of scope. If you're a Java dev with a 20,000 line project,
40:22
you might want to be a bit more careful, but if you've got like a Python hobby thing, try it out, right? Try it out. What I would say, there are some more concerns. So first of all, I chose to run this manually when I want to run it. I chose not to run this on push or PR. I chose to run this manually. And the reason for that is that I don't expect my code base to sufficiently
40:41
change between like small commits. And what I really don't want to do is use the mutation score, that 75%, as a metric that I've just turned into a target. I want it to stay as just a good idea or an indicator of what my tests are doing and what I could be doing better. So for me, I don't want to run it every time, partly because it takes a blooming long time,
41:00
especially if I'm using multiple versions, which we also have to factor in. So you might want to do that. So I didn't. I just ran on Ubuntu and that was fine for me. But yeah, it depends on what your code is. You might want to run on different platforms, right? So do factor that in. And that will help you a lot if you're in a CI system. So the other question there is should we run on push or PR? My opinion is no. I think there'll be people in this room
41:20
who disagree with me. Maybe say on a PR you should run that, or maybe there's some kind of metric you want to associate with the score that you then want to look at in some way. For me, that's not how I use mutation testing. And I think what I want to get out of this is we don't want a situation where mutation testing becomes a new target, we've got to get a certain score, because then we're just recreating that problem of code coverage targets. We're just doing that all over again, right?
41:41
So we're trying to avoid that. So the final question here is one I'll ask of Paco to explain. Paco, do you think I should use mutation testing, asking in my role as an audience member right now? What do you reckon? Yes, well, so there we are already: it depends. There are some things you can ask yourself, because whether you need it is a real question.
42:01
So mutation testing is, of course, definitely not a silver bullet. It's something that the reports take quite some time to go through. And, of course, it's quite computationally expensive to run the process. So the couple of questions that you can ask yourself that are quite obvious are for projects
42:21
which have a really high quality goal. When people die or when a lot of money is lost or a combination of those two. So just to check, how many of you are working on a project that fits in these three? Okay, then you need this yesterday. Yes. But for the rest of the room, including me,
42:41
there are some other questions we can ask ourselves. And I think one of the important ones is are you using code coverage? Because if you're not using code coverage, let's start with that and let's first get coverage and get to see how many tests you have. Then the next question is, is how much value do you put into this, how much value do you get out of this code coverage? And what I mean with that is,
43:02
do you make decisions based on it? Is it part of your definition of done for the sprint, or will the build fail if there's less than 80% coverage? Or also in the case of due diligence, when you're selling a company, not something we do every day; but then you would also want to know how good the software I'm buying is, or how good the software I'm working on is.
43:22
So here I would say, if you're using code coverage and you're making decisions based on that code coverage, then yes, you should at least have a look at mutation testing to see what the state is. You don't have to do this always. You don't have to put it in CI. Just once a year, or when you go home, run it on your computer once, just to see what the current state of your team is. Because it can very well be
43:41
that you're on a high-performing team which already has its PRs and everything set up so well that it's maybe not worth the time. Because the mutation testing report might even confirm that, the fact that you killed all the mutants, so that would be great. And there's another question that I like, and it's: what's the cost of fixing a bug? And I have two stories for this.
44:03
My first example is, and this is the first company I worked for, this was an enterprise company that built software that was running on-premise at the customer and the customer was often governments. And then you're in the line with all these big integrators, which means you have feature freezes and moments where you can actually go to the customer
44:22
and deploy your software. Which is quite expensive, which also means that if you get a bug after this feature freeze or after this upgrade window, you have a serious issue because you need to go to the customer, you need to explain what went wrong. It's a very costly thing, a very costly issue. So here definitely, again mutation testing can be quite interesting
44:41
because a lot of money can be involved with the reputation. The other example that I had was more of a greenfield project, which had more of the startup vibes, where it was really a fail fast and fix fast mentality. So this was a project where rather than focusing on getting our quality monitoring up to speed,
45:00
we were mostly focusing on making sure that we could fix bugs very quickly. It wasn't running on-premise at a customer but in the cloud, so we controlled it ourselves. The most important goal there was to be able to click a button, be in production again in 10 minutes, and have active monitoring to see if anything goes wrong. Here the cost of fixing a bug is already a lot lower,
45:20
which means the reason to consider it might be a bit weaker, especially if, again, you're on a high-performing team where everyone works well together, you know what you're doing, and you can trust each other because you're all professionals. Then maybe it's not worth spending half a day going through a mutation testing report if you already know what the outcome
45:40
is probably going to be. Again, still do it once. Those are two things you could consider when deciding whether to use it, and that's what I want to leave you with: don't go into it blindly, just ask yourself, should I really use it? And then for the last part. For the last part, I'd just like to sum up. Hopefully, by this point,
46:01
we've shown you what mutation testing is, why you might want to consider using it, how you could get going with running it, and also why you should. So I just want to summarize. First of all, I'm sorry I used this penguin as an evil penguin earlier. It is adorable. I just like that DALL·E, when I asked it to give the penguin some fake wings, gave it three. It gave it this extra flipper here.
46:20
I'm not sure what that was for. What I'd like to do is just quickly summarize what we've talked about today. First of all, mutation testing is a way to test your tests. It helps you beat the Goodhart's Law problem with coverage: it saves you from turning coverage into a metric that then becomes a target. You don't want a rule like
46:42
"coverage has to be above this threshold or we don't merge." That's not where we want to be; what we want to do is write good tests. If you are going to do this yourself, an important part is to start small. Start locally on your machine. If you've got a big code base, then what you really need to do is run on a subset of that code base.
47:00
If you've got a smaller code base like mine, you're probably okay. Either way, start locally on your machine. And if you want asynchronous reports, or you want to use the resources available on a CI system, you can run mutation testing there too. So do consider that if your stuff is in CI. Finally, I just want to say that mutants, hopefully we've demonstrated,
47:21
that mutants are like adorable penguins. They're valuable and they are wonderful. They're really great to use. They can tell you so much about your code. They're extremely useful. So don't fear them because you should love them. Thank you very much.
47:44
If there are any questions, comments, objections, love mail, hate mail, anything, shout at me.
48:02
So the question there was just whether we can give some more examples of the range of things it's possible to mutate. Essentially, the short answer is anything that still leaves you with working code: in the Java case, code that still compiles; in my case, code that still runs. So, I'll give you some Python examples.
48:20
For example, changing a variable from one type to another, so you might typecast something. With a mathematical expression, you might add extra terms to it. You might change return types or error types. You might set things to None, remove parts of a call, or set things to zero. There's other stuff too. Paco, can you think of any mutation testing
48:40
Java examples? Yeah, so I think the examples you gave apply there too; it depends on the mutators you use. For each framework you can also go through its list of mutators to see what kinds are out there. What's good to keep in mind is that they use fairly basic, rudimentary strategies to determine whether something can be mutated. For example, if you have a stream, and in this stream
49:01
you do some operations which you could, in theory, cut out, but you're still using the return value, the mutation testing framework thinks: okay, let's keep that intact. The same goes if you're using the Spring Reactor framework. You could do lots and lots of smart mutations in there, but the tooling isn't really there yet. It's really the rudimentary things, the conditional logic
49:21
and the mathematical logic, I think, are the two main things you'll see, and those also account for the most typical programming errors, I would say. Awesome. I mean, is there anything you'd like to be able to mutate? Because I guess a lot of these things are open source, you know? Anything that would be good if it did exist? Any ideas?
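To make those typical mutators concrete, here is a small hand-written Python illustration. The exact set of mutants depends on the framework and the mutators you enable, so treat this as a sketch rather than any tool's actual output.

```python
# Hand-written illustration of typical mutants; the exact set depends on the
# framework and the mutators you enable, so treat this as a sketch only.

def can_withdraw(balance: float, amount: float) -> bool:
    """Original production code."""
    return amount > 0 and amount <= balance

def mutant_boundary(balance: float, amount: float) -> bool:
    return amount > 0 and amount < balance   # conditional boundary: <= became <

def mutant_operator(balance: float, amount: float) -> bool:
    return amount > 0 or amount <= balance   # logical operator: `and` became `or`

def mutant_return(balance: float, amount: float) -> bool:
    return True                              # return value replaced wholesale

# A happy-path-only test kills none of the three mutants above:
assert can_withdraw(100, 50)

# Boundary and negative cases are what actually kill them:
assert can_withdraw(100, 100)        # kills mutant_boundary
assert not can_withdraw(100, -5)     # kills mutant_operator
assert not can_withdraw(100, 150)    # kills mutant_return
```

The point is that the happy-path assertion alone lets all three mutants survive; only the boundary and negative-path assertions distinguish them from the original code.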
49:42
Question? Answer?
50:05
Okay. Okay. So the question there just for the livestream
50:20
was two things. One is: are there any mutation testing frameworks for C or C++? I'll say personally, I don't know. I haven't used C++ since my physics degree, so I couldn't tell you. I don't know if you know anything about that, Paco. I just did a quick Google search, that's all. So I see there are some frameworks available, for instance...
50:47
FAQAS. So based on the search, I see there is something for C and a bit more for C++. Regarding your other question, by the way: should you do it
51:01
as a git hook? And given that it's... right? That was the question? Yeah, the idea being to mutation-test only the code which was modified in that commit.
51:20
Yeah. So actually, depending on the framework, some have incremental report features. They store the last state, then you can do a diff and use the results from your last execution so you don't have to generate and execute all mutants again, because the tool knows: I only changed these production lines, so I only need to generate mutants for these, and I only changed
51:40
these tests, so I only need to rerun the tests for those mutants, which can tremendously speed it up. But still, using it as a git hook, I'm not sure. You can, by the way, use the same incremental reporting logic in CI as well to save some time, because the Python tooling also supports it. Yeah, so with the tool I use, you have caching, so you can cache those tests
52:00
that you've done already, and if those cases aren't touched, you're sort of good, as long as the changes to your code don't affect them. So that is an option. My opinion, again, is that maybe you don't want to explicitly mandate this on every run, and the reason is that it can then become a metric that you try to optimize for
52:21
or something to stare at, whereas really I think the nice way to use it is every now and then. If you've got a super-critical project where that's really important, you may want to run it like that. For me, I don't need to, but that's really up to you as an implementer, and I think there's definitely a use case to do it that way if it's important to you.
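As a rough illustration of that incremental idea, and not any framework's real implementation, the shape of it is: ask git which lines changed since a given revision and only keep the mutants that live on those lines. A toy sketch, assuming a made-up Mutant record:

```python
# Toy sketch of the diff-based idea described above; not any framework's real
# implementation, just the shape of it.
import re
import subprocess
from dataclasses import dataclass


@dataclass
class Mutant:
    path: str         # file the mutant lives in, relative to the repo root
    line: int         # 1-based line number of the mutated statement
    description: str  # e.g. "replaced <= with <"


def changed_lines(base_ref: str = "HEAD~1") -> dict[str, set[int]]:
    """Parse `git diff` hunk headers into {path: {changed line numbers}}."""
    diff = subprocess.run(
        ["git", "diff", "--unified=0", base_ref, "--", "*.py"],
        capture_output=True, text=True, check=True,
    ).stdout
    changes: dict[str, set[int]] = {}
    current_file = None
    for line in diff.splitlines():
        if line.startswith("+++ "):
            target = line[4:]
            current_file = target[2:] if target.startswith("b/") else None
        elif line.startswith("@@") and current_file:
            # Hunk header looks like "@@ -12,0 +13,4 @@": take the "+" side.
            match = re.search(r"\+(\d+)(?:,(\d+))?", line)
            start, count = int(match.group(1)), int(match.group(2) or 1)
            changes.setdefault(current_file, set()).update(range(start, start + count))
    return changes


def select_mutants(all_mutants: list[Mutant], base_ref: str = "HEAD~1") -> list[Mutant]:
    """Keep only the mutants whose line was touched since base_ref."""
    changed = changed_lines(base_ref)
    return [m for m in all_mutants if m.line in changed.get(m.path, set())]
```

Real tools combine this with a cache of previous results, so unchanged mutants keep their old killed or survived status instead of being re-run.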
52:41
Hand over here. Hello. Yes. The short answer is yes. The long answer is: depending on the actual framework, it might be that you add a comment to ignore it. Alternatively, there is a config file
53:00
set up as well, in Python, where you can say: only mutate these paths, only do these things. So what language do you use? TypeScript. Cool. That would be Stryker then. Yeah, I would say yes. I haven't looked that much into Stryker, but I think they make quite some nice stuff, and this is fairly generic across frameworks.
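For the comment-based variant mentioned above, the marker in Python is a magic comment on the line you want left alone; in mutmut's case it is, if I remember correctly, "# pragma: no mutate", but check your framework's documentation for its exact spelling. A tiny sketch:

```python
import logging

logger = logging.getLogger(__name__)

FEE_CENTS = 30

def charge(amount_cents: int) -> int:
    # Nobody cares if a mutant mangles this log line, so keep it out of the report:
    logger.info("charging %d cents plus fee", amount_cents)  # pragma: no mutate
    # This line we DO want mutated (e.g. + flipped to -, or 30 bumped to 31):
    return amount_cents + FEE_CENTS
```

The config-file route does the same thing at a coarser grain, path by path instead of line by line.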
53:20
Excluding code from mutation: definitely yes. And depending on the framework, some even have nice things like "do not mutate any calls to these classes", which is interesting for logging, for example: do not mutate any calls to this logging class. You can do the same for packages, class paths, et cetera. Yeah, I'd say the same with Stryker. One of my colleagues uses Stryker because he maintains our .NET SDK,
53:42
and he's actually also got mutation testing set up there with Stryker, and it does seem very performant. It seems like it has a lot of those features as well. So honestly, if you're interested in TypeScript, I think there is something for you there. Cool. Yeah, I think it's maybe even free on open source repos. Sorry, another question? Yeah: are specific mutators reproducible, for example if you have a run
54:02
and you see a seed or something? Well, so there's actually not a lot of... oh yeah, yeah. So the question is: how reproducible are the mutants? If you find one, is it still there on the next run? As far as I know,
54:21
there shouldn't be any randomness in the mutant generation. It just goes over the code, and any condition it finds that it can mutate, it will mutate. So the next time you run it, the same mutant should be there in the same place, and you can see whether you killed it this time. So yes, it's reproducible. A hand there. I think this person was first, sorry.
55:09
Oh, that was a good question. I'll repeat that one. So the question there was: mutation testing, we've talked a big game. We've come up here and been like, hey, look, this is important, right? That's what we've talked about.
55:20
And the question, which is a very valid one, is: hey, if it's so important, why is no one supporting this in Python? Why is this all open source stuff, right? And you know what? I agree, that's a really good question; it's one I've asked as well, to be honest. To put it properly, the question is: why aren't employers supporting this? The short answer, I think, comes down to ROI, unfortunately.
55:40
And that sucks, honestly, because I would like us to invest more time in these things, and I think it's just down to company priorities, right? I would like to spend more time on it. Honestly, I had quite a lot of fun adding the one feature I did get to add, and I'd quite like to do some more. But again, I've got this API to implement, so do I have time? Well, no one's funding me to do it. So unfortunately, it really is the case that
56:01
unless there's an obvious ROI, this just seems to be the way things go. Unfortunately, that's the way we've structured our platforms and so on. I gave a talk earlier on PyPI and malware, and actually the reason that malware is so prevalent and so possible on PyPI is that PyPI hasn't really implemented many ways to protect against
56:21
malware being uploaded. Currently, I've uploaded some "malware" to PyPI that you can get yourself; it's not real malware, to be clear, it's a rickroll, as some of you saw. But basically, what I'm trying to say is that that project didn't really get off the ground in terms of protecting users, because I think originally Facebook were funding it and then they stopped funding it,
56:40
and it just didn't continue after that. So unfortunately, yeah, this is just kind of the way things are in open source right now. I do feel your pain, I do understand, but that's all I can really say, I'm afraid. To quickly add to this, by the way: Stryker, for example, actually is funded, backed by a company that, for example, lets interns work on it as well. So some frameworks actually are backed,
57:01
and there are people already investing in it. So it's not always like that. But sorry; next, let's go to that side. About reports from the mutation test results: we all know managers and teams love their KPIs. So I'm wondering, is there any integration or guidance to export the mutation test results into
57:21
SonarCloud or other platforms? That's a really good question. I'll answer quickly for Python and then pass it over, because for Python the answer is quite short: unfortunately, no. The maintainer is not really a big fan of the CI system stuff and the report stuff, I think. I think the premise there is: I like running this locally. And you know what, that's fair, and that is really how
57:41
you can get started and get an idea. So in Python, unfortunately, the answer is no, but I think Paco might have a more positive answer for you. Yeah, and let's also ask, since you're the maintainer of the other framework: how does it go for the other Python framework? What was the question again? Okay, so I talked about not having that facility, that feature. How is that with Cosmic Ray?
58:01
Well, not really. I don't want to name names, but there is a very, very large vendor, Fortune 500 maybe, that uses it, and we asked them, can you fund development, and they said, you know, no. And yeah, they have shown this
58:20
around at large events like in front of thousands and thousands of people. But yeah, they're like, okay, you know, we keep all the dangerous stuff or whatever we find as it is. Yeah, so for the Python frameworks,
58:45
there's not really CI plugin support. I do know that, for example, for Pitest there is support for Jenkins and Sonar, and I'm not sure about Stryker, but I know it's there. And usually these things are relatively easy to build yourself as well, because all you have to do, if there is a report in some
59:01
JSON file, is parse it and turn it into a nice HTML page, because again, they're all open for contributions. (A small sketch of that parse-the-report idea follows at the end of this answer.) Do we have time for one last one? I want to just add to that a little bit. Okay. Really quickly, first of all, with your question: yeah, when I originally implemented my mutmut thing, I did run it on PRs, and in that case I got an action
59:21
that would comment my coverage in a nice, metric-y way, so you can; it's quite simple to do. So, about the Cosmic Ray story: first of all, that sucks and I'm sorry, that's blooming awful. Yeah, sadly, it does seem that a lot of what we've been discussing on this side of the room is just like: man, we all agree this is important and it's useful for a lot of things; it'd be great
59:40
if someone funded it. So I think, unfortunately with Python, that is the state of play and it does suck but yes, I get you. Any other questions? Fine, I think. Yes, hello. That's a really good question. So, sorry?
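Here is the sketch of the parse-the-report idea mentioned above. The JSON layout is invented purely for illustration; real frameworks each have their own report format, so the field names would need adapting to whatever your tool emits.

```python
# Minimal sketch of the "parse the report yourself" idea; the schema is made up.
import json
from pathlib import Path


def summarise(report_path: str = "mutation-report.json") -> str:
    report = json.loads(Path(report_path).read_text())
    mutants = report["mutants"]                      # assumed schema
    killed = sum(1 for m in mutants if m["status"] == "killed")
    score = 100.0 * killed / len(mutants) if mutants else 100.0
    survivors = [m for m in mutants if m["status"] == "survived"]
    lines = [f"Mutation score: {score:.1f}% ({killed}/{len(mutants)} killed)"]
    lines += [
        f"- survived: {m['file']}:{m['line']} {m['description']}"
        for m in survivors[:10]                      # keep the summary short
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # A CI step could post this string as a PR comment or push it to a dashboard.
    print(summarise())
```

A CI job could post the returned string as a PR comment, or push the score to whatever dashboard the team already watches.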
01:00:01
I will now repeat your really good question, the question was, if I have a certain type of mutant that I want to make, can I do that? So I would say with the stuff that I used in Python, the answer is you need to actually use the version you've downloaded, edit it yourself and add that stuff, so sadly there's not an easy customisable way, that would be an awesome enhancement though
01:00:23
that I would like to see, that would be cool. In other platforms, Paco, any others? I do know that I think Python did have some extension points, so it really depends. I know that the company I work for currently called Picnic, they're also working on extending it for example for reactive code, so there are some extension points often.
01:00:42
So in short it depends on the framework and how easy it is. Are we done? Okay, we're at time. Thank you so much, this has been a really nice discussion as well, so thank you for sharing with us.