Add to Watchlist

Testing with two failure seeking missiles: fuzzing and property based testing


Citation of segment
Embed Code
Purchasing a DVD Cite video

Formal Metadata

Title Testing with two failure seeking missiles: fuzzing and property based testing
Title of Series EuroPython 2015
Part Number 152
Number of Parts 173
Author Viner, Tom
License CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
DOI 10.5446/20214
Publisher EuroPython
Release Date 2015
Language English
Production Place Bilbao, Euskadi, Spain

Content Metadata

Subject Area Computer Science
Abstract Tom Viner - Testing with two failure seeking missiles: fuzzing and property based testing Testing with purely random data on it's own doesn't get you very far. But two approaches that have been around for a while have new libraries that help you generate random input, that homes in on failing testcases. First **[Hypothesis]**, a Python implementation and update of the Haskell library QuickCheck. Known as property based testing, you specify a property of your code that must hold, and Hypothesis does its best to find a counterexample. It then shrinks this to find the minimal input that contradicts your property. Second, **[American fuzzy lop]** (AFL), is a young fuzzing library that's already achieved an impressive trophy case of bug discoveries. Using instrumentation and genetic algorithms, it generates test input that carefully searches out as many code paths as it can find, seeking greater functional coverage and ultimately locating crashes and hangs that no other method has found. I'll be showing how with **[Python-AFL]** we can apply this tool to our Python code.
Keywords EuroPython Conference
EP 2015
EuroPython 2015
OK yesterday today in the tell you about a property-based testing fuzzing I'm calling this tongue-in-cheek failure seeking From here to challenge how you test you let's see where the audience is a starting point uh is what he had a test that had a hands up if you've only used power catered values as example data the constant you Canada for another air handling you some sort of random input get your hands up if you've used properly testing really bad hands and head of the domain of the fuzzy get so yes and there some good things to learn about
so I'm here to ask you I testing your current hauled off the stretching you ask me questions it wasn't expecting by you aggressive interview by yourself who interviewed lost the easy question the she wasn't expecting that question and maybe should ask OK some questions is not expected so what are we trying to achieve with
artists input data possible goes well to start off with you wanna cover the happy path that's the 1 that got me some money and get the job done but that's the absolute minimum you probably want a cover over the cage base you a covered uh say exception handling the validation when the user passes using data that's that you understand not to be valid but what about the unhandled exceptions by definition you know expecting them to how you can think of examples for that data if you knew what they were before recall uh and then not just covering each line is OK you really want cover the independent paths through UK-based this not something like work in so like just also take a moment to think about the data that you pass into the test the you've written this is to help you contemplate so which type of points stupid uh when you're writing your example dates and where wouldn't adversarial approach taking on on artists there would be a good image search the uh so this is not a compression of of the central parts of the input space and maybe the obvious examples uh jain @ example . com uh but maybe your on the testing more difficult examples of scene unique aid areas uh that would have been called this example data that included the united snowman something uh 0 sage passing an empty list and strings as good as a base standard for which case testing so how do we create test data uh we can write hotheaded values like most of us have done we can create purely random data with with no feedback the model money does this just gives you something conveniently random that will get the job done although I finally firing of failure seeking secondly so that I
can say Krycek is a have library the that the just yet so it's been around for a while have had look at head of so this is based testing the specify property of Ukraine that must hold and quick check will quickly check if it can prove wrong but it does best find counterexamples so this is a little bit of say the basic thing to take away here is to list of integers go into a property and a billion comes out that is whether this property holds on all those about reversing list of integers so basically imagine for integers list of 4 integers benefits to fall into fingers and is saying is gonna join them underestimate as opposed to arresting them and joining them the case it's a slightly dubious proposition but we're gonna see if they can prove wrong and the ball I can see 0 1 is the inputs office and shrinking has proved wrong but this isn't URI Haskell medical and so there's no lectern led to much has instead let's find out what the functional language developers think of our worlds this is a direct right in a particle language you have no guarantee the simple function that should just crunching numbers were going to have could special color the potatoes while crunching those numbers fair enough but we want to minimize the hypothesis is a Python version question uh is more object that's features let's delve into the kidnap don't well defined so just as a reminder this is the last version in Python there isn't a function that reverses list of made on uh and in a similar way you can see the act given decorated uh specifies 2 lists of integers which maps to the to
inputs to the test functions uh and then the bottom you can see the property is defined just with the standard set we're running it with apply test run and a little tiny hypothesis tests plugin just helps you see the output so prison and actually comes up with the same counterexample that back rejected and we don't have to think of an example data ourselves concentrated so was going on here how could this be working so maybe is doing a formal proof in the background may be from static analysis uh maybe it just passes symbol into the top of the program looks at all the manipulations ends up with a formula and then solve that formula now that even mathematicians they haven't quite got
there yet but which a really interesting article code will computers redefine the reason mass the what article uh that node then not the area mathematicians still job and the same introduce science it really Minister special of Python so that leaves us trying trancriptome examples Wilson and fuzzing that's what's really happening let's have a look the getting under the covers and if this is the 1st thing that list of integers that hypothesis is sending like the
pretty healthy and it turns out what this news falls on and so this is proved wrong in the 1st here didn't wanna say that because that's kind of ugly and the other thing that creative you should try to put that as a hotheaded example so has again making it simpler as we go down here you and see the 1st list is getting shorter and you 3 now this
worked out the big numbers a more knowing level numbers say today's numbers at the top they're getting shorter just 2 numbers 1 number he fungi and the lexicon it'll get shorter get down getting in the Aston
overshoots get something so simple that when you reverse and image right such a tree that's bad assumption that a simple simple Treisman to list and this is the simplest 1 I could come up with a and the last 1 elderly chasing so that when you copy paste into your terministic your test set as decided run again it won't through the whole business those massive list of integers because it's got like a database of successful examples say that's the kind of many for speed uh enhancement but also fits search really hard and it is a bit of luck but it's found a counterexample it wants to keep on to that because modifying the next time theoretically so what's going on here is generated random arrests in data does not purely random it's random but with a view to breaking it runs it has repeatedly sensory with bring 1 this is not like a standard unit test it just gets run once back at given decorator means that test that she gets from by default 200 times are released until falsifying example has been found the if it finds a counterexample will then try and shrink back the best of his ability is just to give you the cleanest simplest a counterexample to preview of a property rights the let's have a look at that a random test data whether the integers come from so this was a decorative uh this is strategies don't listen strategies to integers and the into the strategy is made up of 2 parts the random geometric in strategy is basically saying give me smallish numbers maybe 0 maybe minus 1 maybe 17 break your pride and along the wide range says well I'm going to give you basically any random number from like anything that you python interpreter can handle massive integers and just when you have made a lot of upset these strategies to relentlessly that's they're relentlessly devious plans to break your program and the
list strategies to say that you pass the elements you want in your list and averages of 25 elements but you can set it to maximum size minimum size is very good sensible defaults but you can override the machine need phase many passengers to that's coming at the the right lower the % does it's the remainder upon division uh and he has suggested to properties that should hold the result should have the same sign as wide and the absolute result should be less than the absolute of why the careless checked this is how we write that so this node list of integers there's just 2 integers and they relate to the x y inputs to the test
function there we go a new function here assume it is a way of giving feedback back to hypothesis and it says a uh at this assumption it proves false in other words if you give me a wide 0 I stop the test is not appropriate don't many more like the kind and say that in that way is guiding itself to be more helpful to the inputs the more likely to be relevant so we calculate the result and and and I had to create same sign function but apart from that is pretty much reads as English all copy-paste from the 28 let's see what the answer was the past because you should know better than doubting Read has Hessinger but I can and will property test trees have there so the data strategies probability distributions uh to pick an element from within the distribution the guy the back of our scene shrinking of counterexamples to be a clear to read and easy to understand why you know why the breaking of property and database of training examples the speech estimating TDD find something wrong with code you broken property he can fixing and outright straight away again until the major class internals of hypothesis a really interesting explain and they use a parameter system is worth having a reader patent documentation about Mr. Cuomo strategy so we've seen integers the flights a bit more complicated if you claim that your function except floating-point numbers it's going to be Matthew sounding things galcians bounded exponential maybe just some integers and I'm expecting that way uh food and then some gnostic closerto as well so you know 0 minus 1 over minus infinity pools infinity now and uh you can assume these away if you don't want them because if you don't matter they will probably break did this in great advanced features of hypothesis it makes it very easy to you take built-in strategies and make you and strategies said table function that accepts a comma-separated list of integers you could map uh a list of integers uh have been joined by a common and then impostor inch because you know you want your test data to be relevant to your text economist who fail at the 1st hurdle consisting random uh so you you might wanna Bojan strategy like that this plug the Django but like little money olfactory boy and the number as well it's pretty tight is illustrated with experimental but state testing uh where you give hypothesis the controls your program and it tries to find a sequence of actions which cause a test failure to the medicines are interesting the meaning of the this look at another failure seeking missiles is getting a lot of attention recently the American for the last and this is a further a 2nd biggest 2nd version the vessel is good money father so I think my cosily mislead the lights rabbits and a fuzzy so is specializes in security and binary formats low-level libraries are essential to everything we do I wouldn't be accessing database image processing encryption so will get on to the Python FL a minute but just for a moment let's think of the sea level so just remind you of fossil is something of data your program attempting to crash so we've kind of moved on from property-based testing is more about crashing UK at the than specifying properties so these are things that you wouldn't wanna leave running for a good while uh maybe on multiple calls uh and speed is very important because the more ground to cover the more likely you are to find it's an interesting and so 1st testing has been around for a couple of decades more uh this involves backwards after the people been using for a good while and have so is kind of a new style with some uh guiding going on but traditional fuzzing is not that this is the important FO might be with you know new coping economist did that Google has been running fuzzing against ffmpeg for a couple of years and found that thousands of blogs literally a thousand commits fixing those boxes not to be sniffed at as a student ffmpeg is a video processing library crime is probably in your local uh the applied that you have on your upon the desktop uh so the strategy was to take a small video examples in a small video files match them together with a mutation algorithms may be splicing together you know we take some bits or bytes here now uh and then admittedly they had 500 and then eventually 2000 call those over a period of 2 years say maybe not just the laptop but they found that they made great progress a lot of memory management buy-outs the foundations speaking to 1 of the ffmpeg developers last nite and I can confirm this not just because written all willful credit is just because this is a quite a hard thing to do uh the video specifications for the 600 pages long and they have to the write very very fast carried as what it is right in Python better look after all the that that they that look after the memory management and it's very easy to noting that now this equation price uh where thought 20 to 30 per cent of the problems found could easily be exploitable so there's 100 to 200 zero-day exploits that they found without causing so something tells me the local security services and all happens you know this would be a good approach they might be going on inside the good guys and find these bugs 1st say a Goals uh it does need to be very fast because it's got a lot of ground to cover the the reliable because if it breaks and overnight it's going to get much done uh I think in the past fuzzes took of configuration parts uh is this 1 tries to be very simple not required much so some partially in manner and it does the things traditional fosters do in terms of taking some sample inputs mutating them but also as a little secret which is it at compile-time instrumentation so this means that uh when you compile you'll see carried at each branchpoint we add a little of that 2 the path taken through the cut and so this level so earlier about code coverage and taking independent path to the it's able to get feedback when where it's test data many travels through the cable any range drop-in replacement so you might be using GCC before it is replaced with the the egg cooperation so is a toy examples that's a really actually just breeding 100 characters from standard and the barber simulating is if the input is then we can blow that is in the toy examples the list is compiled so there's no way you can do this they configure here is just which compiling it and will occur into the program opposing print statements says that 1 2 3 4 1 editable cancer works let's try fuzzing so in among the size the input directory circle oneself sample input which is literally form follows want often just to kind of say is something to get going on tell you what the answer is see if you can get to the answer and is not the a directory where the results will end up and so I think if your laptop might make you probably when he's around this because it's going to millions of rights so you might have you the start working quicker than we thought basis unified this test data into the standard in the program this is the
dashboard yeah with their fellow so drawing attention from so up here we've got take parts and many crashes so found for parts through the base which is this if statements that the strategy you want them this gives you a sample of some of the mutations operations is doing so bit-flip spikelets arithmetic the try some integers we have given a dictionary but show you 1 of those elements we right next to the that so the secret we can recover it
right there and the other thing to show uh based on almost a million rounds in was set to a half minutes so many this of the they can
so within the findings directory but yes you get a Q of interesting inputs that hit it is found take a different path through the code base so it's got to be that all I mentioned a few a sample input is found that often manipulating that it found on the start of the next so it's clearly trying thousand thousand examples but what happened to find 1 southern and it took a new code part so recorded that further re-used uh similarly if further so it's kind of using the it's kind stepping up making further through the cave base each time and in the crash directory uh it's found an example input the crashes occur so this is exactly
what we re looking for a list just have a look at that that's the crash to have a kind of long filenames but it it recalls this is words recorded what happens told you Signal 6 in single we did people that's expected it tells you that is based on the 3rd item in the queue it's done so that they arithmetic in other words which replaced that why would you now with makes motion log so you can kind of see how it's working you can see how as manipulating previous and it Table 2 stand on the shoulders of what it achieves a get 1 step further so in the last year it had very very interesting impressive trophy case here this is about the this by the way uh so this security libraries uh image libraries SQL libraries you name it almost uh but generally focusing on uh libraries that have can be random input but also the go back there yet to give its more help when you're doing uh of nonbinary input because if it just fires in a random characters at secure like is that they get free from so let's have a look at a specific example of so it's saying as pure light is a very very well respected library intensive testing it has already been faster traditional fossil so you might think there's not much you like hanging free the approach taken was to start with a dictionary of SQL keywords the did she put these kind of 1 per line in the file uh they correct out some hand written test cases from the SQL like test phase and they found 22 crushing test cases this is 1 of the simple ones ends up uh arriving in a function with ultimate not initialized or something like that was argument where is expecting list of 1 or more things so these vital fixed so you how to do this this just I've seen a review it is a trick right traditional father and you can use it without instrumentation it was searched for inputs that span different and he's a genetic algorithms to match together the examples is seen so far as well as just mutating these examples 1 time but you can imagine it's searching in the input space uh and it's got some help Wisconsin guiding by uh the instrumentation body is gonna be a slave process because the input space can be it certainly can't just go abcd and do not exhaustive list of take forever so let's have a look at forcing C Python is worth saying that obviously the units of Python are written in C uh this is a different proposition to fuzzing Python code which would see in a minute so you can download point source material that you can compile it it's uh very similarly to as the Python don't tell you but using the AFL uh lion compiler and you can start forcing it so good some sample input at some target program so they are not the types expert uh so explained that line them underlined the need that that's that's connects things out and the military's passing the standard in 2 adjacent Leonard the string is a follow and and we'll catching Python exceptions but we're looking for exceptions that happen in the sea carriage so are in this overnight and they have 8 but it finding bugs yet you that this should be so surprised to be that easy if it was I think that it did wrong 121 thousand times it is a lot slower than just uh you know running the example earlier because it's letting how Python interpreter accepted but this this tools within fell to make it faster and easier so they have a server and have various techniques she can used to make things faster it also gives you some hints before he stopped about putting your operating system into performance might consider power-saving manager said you know you could say this is more ethical than you know causing global warming by mining bitcoins but uh yeah certainly can close the laptop to get hot and then executes an intensive again want for I felt because this is not the euro C NullPointerException conference so this siphoned to connect sea lion and uh the Python line it connects the instrumentation that mentioned to the Python
interpreter by CIS thought set say every state presented will come at local level point isn't this traveling through the code base and you on handled Python exceptions get converted to 61 which a phone recognizer and you can see here a pioneer of profiles is just based on the tide height before fl follows this literally just as easy to use so here's an example of an extended it of using this to foster the cryptography library that stays pretty simple you have a little over a half notebook-style hook that uh that connects single and his passing standard input to educate a signature he said he said it was fruitful by then if you listed any particular parts and found that this is a general approach so it was an interesting questions raised by these the libraries so in the full magnitude hypothesis and fell fast will give you the new input data every time you run so this could be considered annoying to some people they wanna know if the new committee fails the test and if you got different test data every time maybe failed because it found a different test and rather than your Committee goes into the same consistent falsify out some people insist on on the other hand you my final boxes that would be handy as well so I think uh the resolution between the 2 areas to how do you the non deterministic testing may be uh not in your comment testing box to look for the counterexamples it pulled out and copy those into a deterministic suspect all just live with their non determinism and final parks that's what the author of hypothesis recommends but you can't indeterminist ignited and if you insist so we've been thinking about random inputs 1 might think about it is if things at random they would even get to the starting gates have to be relevant to your clothes well on the other hand uh you can't enumerate all the examples contain many of the 2 spaces often just to massive and if you give if you give a have because sample input and you don't need take enough industry go straight to the maiden come out of the and and you know think everything's fine so you wanna have that's sweet spot we're reaching dead ends of your code to warn you that is the maze which represent all the parts UK-based but uh but not having them all 1st look at the 2 year they wish but we should use but you insist on just using the standard unit test usually try to be more adversarial but I would say you can't expect the unexpected say should probably use hypothesis so if your input data is if the input data by construction and even with if they're not built in data structures you can build your own strategies it is quite fun as well as a means you have to think of the money that the hot-headed examples so people say that they're codebase becomes the test OK but becomes easier to read because they're not distracted by the specificity all you know all about example don't call it just says this function takes full strings or list of integers and something like if I like a fellow if you impose a binary or as we sold maybe your pausing text input like and a strong library so in conclusion we seen the styles and test data generation humans are generally bad picking random examples developers about it being adversarial to that encode bases which they understand the love computers a vast let them play with your page and find more box before your customer the Secret Service does let me end by saying don't interrogate UK-based like it's a fluffy bunny stuck up a tree fire guided-missile play the branches of the tree in the of the men not just me saying it celebrity endorsements this little reminder there and was of interest to be able to get out of the talk after this 1 which will be more informative and better presented by Morris combat you haven't told you yet I had put and cover your points as it is in this room after and there we go out into vinyl any questions in
B as I think back to the previous talks yesterday is hypothesis python 3 ready and could be made to use the type annotations if were available in the room there I think it is likely that uh identical type instead of the of the library yes sounds interesting thanks for the interesting talk and false how those people officers handle if the the cold under test exhibits some randomness itself user voluntarily may be a Monte-Carlo algorithm or involuntarily when Mr. Brady question but it will raise the fate flaky paid a warning so will tell you that you know it's what life is you can suppress the warning here but it it basically tells you that if you code is non deterministic then you know you are less likely to find helpful results you know so here you may want to have a patent hedonistic made itself will technology I just want to point out so if you will want to help making libraries that was the libraries more secure there is a project called the of course project by on which helps you with some documentation look at sort of the following forcing your favorite label library so that's already pointed it start if you want to make things more secure data should be noted that the passing approaches project only the and thanks to to the talk and you mention that type of this runs its some iterate test so many times to produce results given as well as standard your standard unit has workflow you run somewhere else in your testing with I think I might question would the and and use it in a TDD workflow um there was a told day about test uh which uses coverage of apply to any run the tests that are related to the case changing say you you could get speed improvement from tests on an and balance that with this lay down from hypothesis and just maybe end up where you have the floor not with uh finding upon and I have a quick question so you because there hypothesis is generating random input is a bit strange to use on a project we have multiple develop this because then the that each developer have different input into the test yesterday and in the interests of non deterministic start with say you even before we had like a database of examples Evans getting different test that's and that this idea of sharing the database of sound examples I think is still a work in progress I think the development of hypothesis to attract think through whether it makes sense units for example have that When CI server or whether that's a bit of a nonstop so you can add another decorator to test with 2 given specific examples to force it to give a specific example uh let's say if some developed found input that was used helpful they you don't commit qualitative that is always present already has please join me in thinking time once again which
Computer animation
Computer animation
Computer animation
Computer animation
Computer animation
Computer animation
Computer animation
Computer animation
Computer animation
Computer animation


  461 ms - page object


AV-Portal 3.8.0 (dec2fe8b0ce2e718d55d6f23ab68f0b2424a1f3f)