Finding bugs for free: The magic of static analysis.

Video in TIB AV-Portal: Finding bugs for free: The magic of static analysis.

Formal Metadata

Finding bugs for free: The magic of static analysis.
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
Finding bugs for free: The magic of static analysis. [EuroPython 2017 - Talk - 2017-07-14 - Arengo] [Rimini, Italy] Overview Static analysis is a powerful technique for improving the quality of software. It allows you to find bugs in your Python code without the need for any annotations. At we provide state-of-the-art static analysis for a number of languages including Python. Our Python analysis can find bugs without the annoyance of many false positives. We have already found bugs in the standard library, requests, numpy and many others. In this talk I will briefly describe what static analysis is, how it can be useful to you, and then give an overview of the techniques we use. Intended Audience This talk is aimed at all Python developers, although I expect that those who have used pyflakes, pep8, pylint or mypy in the past will find it the most accessible. I hope that this talk will inspire developers to start using static analysis tools (hopefully ours) and to produce better code as a result. I will assume that the audience knows Python well, but knows little or nothing about static analysis.
Good morning. Can everyone hear me at the back? Great, OK. Just to introduce myself and the company: my name's Mark Shannon. I've been playing around with Python for over a decade, and most of that has been spent working on how to optimize it or how to analyze it, or how to break it, sort of as a side effect of those two things. I'm the author of PEP 412 and was the BDFL-delegate for PEP 484; if I've got the numbers wrong, you'll have to check them for me. So before we get into the depths of the technical stuff, let me tell you a bit of a space opera.
This is the Curiosity lander. In February 2012 it was in flight to Mars, and NASA found a bug in the lander software. Now, that software is written in C, and in C you can pass arrays around, but when you pass arrays around the C language doesn't bother to pass the size around, so it is very easy to pass an array of one size to a function expecting an array of another size. NASA found one of these cases in testing, and they asked us to investigate to see if we could find any more cases of this. So we had to write a nine-line query, and we were able to find another 30 instances of this.
We don't know exactly what NASA did, because all this software is secret and closed source, military grade and so on, but we presume that they fixed them, and we do know that Curiosity landed safely. So yes, NASA takes quality seriously, but NASA is somewhat risk-averse. I wish to try and convince you that code analysis is something you can use even if you are less risk-averse than NASA.
It's still worth your effort, and I hope to convince you that in the long run it can save you time and money: the little bit of effort you put into using it will pay off. Now, obviously NASA's code was written in C, but we support a bunch of languages in lgtm, and one of those, the one I believe you're probably most interested in, is Python.
So what is code analysis? Code analysis is finding out facts about your program: basically anything that you might think is interesting. That could be metrics; you could look for complexity, for hot spots of complexity, all sorts of things. But one particularly interesting set of facts is bugs, or likely bugs.
I would like to contrast code analysis with a couple of quality assurance things you probably already do. The first one is testing. I'm going to assume everyone tests their code; I'm not going to ask, because that could be embarrassing. Testing is obviously very important for checking that your code is safe to be released, but tests are specific to a code base, and you need to write a test for everything you're interested in. If you write a new piece of code you have to write new tests for it; you can't rely on pre-existing tests to find what you want, whereas with code analysis you often can rely on pre-existing checks.
Another quality assurance thing you probably use is code review. Now, code analysis is much more like code review. A human code reviewer is going to take the diff of your code and attempt to see what issues they can find with it, hopefully in a positive, constructive fashion, but essentially looking for flaws in your code. They may be looking for design flaws, but they will also be looking for smaller-scale errors. Code analysis can take away a lot of that work: it can find those errors, and it is much more meticulous than any human could ever be. It can also work out whether your changes interact with other pieces of code and double-check that interaction, which is something a code reviewer could very easily miss.
So what makes for a good code analysis? Let's go through these one by one. First, it should be flexible. Take the NASA example: that was an error they had not anticipated. Had there been a general-purpose check, they would already have found it, but because it wasn't anticipated, we needed to create a new analysis on the fly, reasonably quickly and reasonably easily. That's an important part of analysis.
Another one is that it should be accurate. Imagine you have a watch, and when you look at it, it's right half the time. What do you do? You'd throw it in the bin. What if it's right 90% of the time? Well, it's kind of useful, but not really that useful; you'd probably keep it if you didn't have any other means of checking the time. But what if it's 99.5% or 99.8% accurate? Then, sure, you'd double-check if you had a flight to catch, but otherwise you'd pretty much rely on it. And once you can rely on something, it just makes your life easier, because you're not double-checking it or doubting it all the time. So accuracy is very important.
And finally, it needs to be useful; it needs to be insightful. PEP 8 checking is all well and good, but does it really matter if there are 81 characters on a line? Those are minor little things. But if an analysis can find things like a cross-site scripting vulnerability, then that's really valuable. There's a huge range of things we can find, and you want an analysis that can find interesting things.
So can we do this for Python? The answer is yes (spoiler there), but it can be tricky. Python is not a statically typed language, so Python code doesn't have type declarations. Of course there are type annotations, and they are used, but there are relatively few of them. Also, because Python has a history of being dynamically typed, people tend to just pass values around and then locally check for things like: is something None, does something have a particular attribute, is it callable, and so on, before they do some operation on it. We need to understand these sorts of things. People also do things that are genuinely dynamic in Python, things that an analysis tool is always going to struggle with, like creating classes for a database schema on the fly. That's pretty difficult to analyze. If you have the database schema too, then maybe you can integrate analysis of that, but generally you're not going to be able to do that. So in order to keep things accurate, we also need to know what we don't know.
So, as I said, flexibility is kind of important. What makes lgtm flexible? At its heart it contains an object-oriented query language, and the advantage of a query language is that, because it's declarative, you can just say "I'm looking for this sort of problem", and that allows you to write very brief queries that will find what you want. Going back to the NASA example, I said that was a nine-line query. I'm not going to show you that one, because it uses the C libraries and we're going to stick with Python, so I will give you a Python example.
Here's an example query. Basically, what we're looking for here is a for loop where the thing it is iterating over is not an iterable. This query is pretty short. Obviously at first glance it may not make a lot of sense, so let me explain how it works. There are three clauses, much like any SQL query: a from clause, which describes the program elements we're interested in; a where clause, which relates them; and a select, which just gives us the result.
In the from clause we're looking for a for loop, an expression, a class, and an AST node, which is just some point in the program. We're interested in that so that we have a marker to look at when we see results, so we know what we're looking at and how to fix it.
The key thing is that we're not looking for any old combination of those; we're looking for a specific relation between them. The first line in the where clause is basically saying that the expression is the thing the for loop iterates over. The next one is probably the key point: it's basically saying what the expression refers to, or the set of values it can hold. We don't care about the value itself, but we are interested in the class of those values, and in the origin, so that we know where the value came from when producing results. The last two lines say, obviously, that it's not iterable. Strictly, the first of them is not saying that it isn't iterable; it's saying that we don't know that it is iterable, which is why we have a second clause, which says that we do know something about it.
Also, the underscore is a convention meaning "ignore this value". I believe it's a SQL convention, and anyone can tell me later on if I'm wrong about this. So that means we don't care about the value. And then we select the loop and the origin, which is useful for helping to find in detail where the value came from, so we can fix the errors.
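The query itself is only visible on the slide, so what follows is not the query language from the talk. It is a minimal Python sketch of the same from/where/select shape, written as a comprehension over a toy fact table. The names `ForLoop`, `RefersTo` and `non_iterable_loops` are hypothetical, and "iterable" is approximated as "the class defines `__iter__`".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ForLoop:
    line: int        # where the loop is
    iter_name: str   # name of the expression the loop iterates over


@dataclass(frozen=True)
class RefersTo:
    name: str              # expression name
    cls: Optional[type]    # inferred class, or None if inference failed
    origin_line: int       # where the value came from


def non_iterable_loops(loops, facts):
    return [
        (loop.line, fact.origin_line)            # select: loop and origin
        for loop in loops                        # from: for loops ...
        for fact in facts                        # ... and inferred facts
        if fact.name == loop.iter_name           # where: expr is what the loop iterates over
        and fact.cls is not None                 # we do know something about it
        and not hasattr(fact.cls, "__iter__")    # and its class is not iterable
    ]


loops = [ForLoop(line=2, iter_name="numbers")]
print(non_iterable_loops(loops, [RefersTo("numbers", int, 1)]))    # [(2, 1)]
print(non_iterable_loops(loops, [RefersTo("numbers", tuple, 1)]))  # []
print(non_iterable_loops(loops, [RefersTo("numbers", None, 1)]))   # []
```

The third call is the "know what we don't know" case: when nothing could be inferred about the value, the sketch stays silent rather than reporting a possible false positive.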
So the flexibility is that you can write these brief queries, and you can write your own queries. So what makes it accurate? More precisely, what makes the "refers to" that we had on the previous slide accurate?
That is essentially where all our analysis is wrapped up in the library.
I'm not going to go through the library, as we're short of time. Instead I'm going to go through some examples, using example code. It's kind of nice to show real code, but there are a couple of reasons not to. Real code is too big to fit on a slide: what usually tends to happen is that the cause of an error and the manifestation of the error are not necessarily in the same place, which makes it awkward to show. And there's another, equally important reason, which is that I don't really want to choose some other person's code, point at it and say "look at the bugs in your code". It's much better when the finger points at myself. The code is clearly intentionally buggy, as you can probably guess from the name of the code there, so hopefully I won't upset anyone.
The first piece of analysis we can do is basically to parse the source code and produce an abstract syntax tree. An abstract syntax tree is basically a tree that describes the structure of the source code. So the very simple code on the left produces the abstract syntax tree on the right. Assume this is the whole module, not just a snippet of code. At the top level of the tree on the right is the module itself, and that contains two statements: an assignment and a for loop. The assignment is broken down into the target of the assignment, which is the left-hand side, which is numbers. Note that numbers is a name rather than a string, so it's a Name, and then "numbers". And then the value is just the value 1. Now, this is an abstract syntax tree, rather than what's otherwise called a concrete parse tree. A concrete parse tree would contain things like the parentheses around the 1, which we've omitted from the abstract syntax tree because they don't affect the meaning; they are just extra syntax. Likewise there's no actual marker for the "for" or the "in" tokens; they're just omitted. The for loop is a little bit more complicated. The target, again, is the thing that gets assigned, which is n. Then there's the iter, which, if you recall from the query, is what we got from the loop; that's the iterable. And then there's a body, which is just a list of statements, in this case one statement. That statement is an expression statement whose value is a call, and the call's func is a name, just print, and its args are the name n. And that's the whole tree, although there might be a few other bits regarding the presence or lack of star args, star-star kwargs, annotations and such. OK.
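The tree described here can be reproduced with Python's standard `ast` module. The source string below is a reconstruction of the slide code from the spoken description (an assignment of a parenthesised 1, then a for loop that prints each item), so treat it as an assumption; `summarise` is a hypothetical helper.

```python
import ast

# Reconstructed from the description of the slide, not copied from it.
SOURCE = "numbers = (1)\nfor n in numbers:\n    print(n)\n"


def summarise(source):
    """Pull out the parts of the tree that the talk walks through."""
    module = ast.parse(source)
    assign, loop = module.body       # the module holds two statements
    call = loop.body[0].value        # an expression statement wrapping a call
    return {
        "statements": [type(stmt).__name__ for stmt in module.body],
        "assign_target": assign.targets[0].id,
        "assign_value": ast.literal_eval(assign.value),
        "loop_target": loop.target.id,
        "loop_iter": loop.iter.id,
        "call_func": call.func.id,
        "call_args": [arg.id for arg in call.args],
    }


summary = summarise(SOURCE)
print(summary)
print(ast.dump(ast.parse(SOURCE)))
```

The parentheses around the 1 leave no trace in the dump: the assigned value is simply the constant 1, which is why the later loop is iterating over an integer.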
So if we take that piece of code and run it through our tools, it tells us that indeed we have an error there. Let's see if this works; I'm trusting the Wi-Fi here.
There it is: the alert for a non-iterable in a for loop. It's a simple one, but it's working. And again, this is
actually on the web. If I click on that, it highlights the origin, which is where the origin comes in. So we can see that numbers is an integer, and you shouldn't iterate over an integer.
OK, so that's the AST. That's the first thing we do in our analysis. Right, back to the presentation.
Next we use a control flow graph. If you look at the program on the left, you'll see that this one is somewhat redundant but correct, in that the numbers we're iterating over this time is a tuple, (1, 2, 3). Now, the AST would just have said that we have two assignments to numbers; it doesn't give us any information about ordering. Well, there is some ordering information, but it is not accurate enough to be generally useful, whereas the control flow graph does give us that. A control flow graph is basically a graph that emulates the way the interpreter actually executes the code. If you look closely, you'll see the octagonal elements: for the module there are entry and exit points for the whole flow graph. The one at the top is the entry, and the grey one at the bottom is the exit point, and control simply flows through the code. So exactly how it evaluates is: we evaluate the constant 1, then we assign it to numbers; then we evaluate the constants 1, 2, 3, create a tuple, and assign that to numbers; and then we go through the for loop, which basically loads numbers and then loops over its items, printing each one, one at a time.
I hope that is reasonably clear; I'm not sure the print resolution of the PNG is great. OK.
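To make the ordering point concrete, here is a small sketch that contrasts an order-blind view of the assignments with one that walks the statements in execution order. It only handles straight-line, top-level code with literal values, so it is a stand-in for a real control flow graph, not an implementation of one; the source string and the function names are assumptions.

```python
import ast

# Assumed reconstruction of the "redundant but correct" program.
SOURCE = "numbers = (1)\nnumbers = (1, 2, 3)\nfor n in numbers:\n    print(n)\n"


def assigned_type(stmt, name):
    """Type of the literal assigned to `name` by `stmt`, else None."""
    if isinstance(stmt, ast.Assign):
        target = stmt.targets[0]
        if isinstance(target, ast.Name) and target.id == name:
            return type(ast.literal_eval(stmt.value))
    return None


def types_ignoring_order(tree, name):
    """Order-blind view: every type ever assigned to `name`."""
    found = (assigned_type(node, name) for node in ast.walk(tree))
    return {t for t in found if t is not None}


def type_reaching_loop(tree, name):
    """Order-aware view: the type `name` holds when the loop is reached."""
    current = None
    for stmt in tree.body:
        if (isinstance(stmt, ast.For) and isinstance(stmt.iter, ast.Name)
                and stmt.iter.id == name):
            return current
        current = assigned_type(stmt, name) or current
    return None


tree = ast.parse(SOURCE)
print(types_ignoring_order(tree, "numbers"))  # contains both int and tuple
print(type_reaching_loop(tree, "numbers"))    # <class 'tuple'>
```

The order-blind view would have to warn about the integer, while following the statements in order shows that only the tuple reaches the loop.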
So that's great: we're no longer coming up with a false positive because of choosing the 1 when we should know it's the tuple. But things can get more complicated.
In this case, if this was all of the code, obviously it would stop at random with a NameError, but let us assume that random is defined somewhere else as something that is either genuinely random or something the analysis cannot work out to be true or false; it doesn't make a difference. Now, at the beginning of the second if statement we know that numbers is either 1 or (1, 2, 3), so it's either an int or a tuple. If all we know is that it is an int or a tuple and we track that through, we're going to see some errors: we're going to think that in either for loop numbers could be an int or a tuple, so there is an error in both loops. But of course there is only an error in the second loop, because the first loop is guarded by the check to see if it's a tuple, so the 1 cannot reach it. The control flow graph doesn't really show us that, because the control flow merges. So what we need to do is what's called data flow. With data flow we essentially track the values, or the set of values, or some approximation to the exact values (obviously, because we can't execute the code), to give us the information we need. In this case, because these are simple constants, we can just track them. I should mention the arrows, if you can see them, and apologies to those who are colour-blind: the green arrows correspond to the case where the condition is true, and the blue arrows to the case where it is false. We can track the values through: as we go through the second branch, the test will eliminate one of the values on either side, such that for the first for loop we know that the int, the value 1, will not get there. So we track the value 1 from the assignment, and then as we go through the test, where the test is true, obviously 1 is not a tuple, so at that point that value is discarded, and there is no error in the first for loop.
But in the second for loop, the test is that it isn't a tuple; 1 isn't a tuple, so it gets through, and there is an error there.
We can see that on our website as well: there is an error in the second loop but not in the first, as we are able to work out from the data flow that we're seeing an integer for the second one but not for the first one.
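The refinement step can be sketched with plain sets of possible values. This is a toy model under stated assumptions, not the analysis itself: the program shape (two assignments under an unknown condition, then an isinstance test guarding two loops) is inferred from the description, and `refine`, `non_iterables` and `is_iterable` are hypothetical names.

```python
def is_iterable(value):
    # Approximation: a value is iterable if its class defines __iter__.
    return hasattr(type(value), "__iter__")


def refine(values, cls, outcome):
    """Keep only the values for which isinstance(value, cls) equals outcome."""
    return {v for v in values if isinstance(v, cls) is outcome}


def non_iterables(values):
    """Values that would fail if a for loop tried to iterate over them."""
    return {v for v in values if not is_iterable(v)}


# After the first if/else the two branches merge, so either value may arrive.
merged = {1} | {(1, 2, 3)}

first_loop = refine(merged, tuple, True)    # guarded by the tuple check
second_loop = refine(merged, tuple, False)  # the else side of that check

print(non_iterables(merged))       # {1}: what an unrefined analysis would report for both loops
print(non_iterables(first_loop))   # set(): the 1 was discarded by the test
print(non_iterables(second_loop))  # {1}: the genuine error
```

Filtering the set on each side of the test is what removes the false positive in the first loop while keeping the real error in the second.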
But of course there are cases where that doesn't work. This is a slightly contrived case, but you do see things like this, where people do a test, set some conditional value, then do the same test later on and use the thing they set in the first test. For example, this can happen with try-imports: you might see code that tries to import a module, sets the name to None in the except ImportError branch, and later on tests whether the name is true before doing something with the module. We need to track that. So here's the program, with the control flow graph on the right. Data flow alone is insufficient to prevent us getting a false positive here, because flag and numbers are different variables. All we know about numbers at the point we hit "if flag" is that it's 1 or the tuple; flag has nothing to do with that, so both values pass through that test, and we get a false positive. What we can do is to use the control flow graph.
Basically, what we do is this transformation: we don't rejoin the flow after the first if statement, and so we effectively duplicate the "if flag" test and move it into each branch. Then it falls out fairly trivially which loop gets run and which doesn't; "if False" and "if True" are pretty straightforward. It's fairly clear that in the first branch the loop is never going to be executed, and on the right-hand side it is safe to execute, because numbers is (1, 2, 3). Now, we don't do this transformation on the source code, because that would mess up everything else; we do it on the control flow graph, and that is the transformation.
The other thing to note here is that you might think this would just blow things up horribly. It's actually not so bad. We do limit the amount of splitting we do, in order to avoid blowing up, but often, once you've split, because you split on cases where there are repeated tests, you can then prune some of those branches. On the left-hand branch, which follows the true outcome of the first test, we know that the second test can only be false, and on the other side it is the other way around. Consequently we're able to prune those extra branches, and you'll see that the right-hand side is really no larger than the left-hand side. Sometimes it does expand, in particular in cases where you have an early test, then a whole lot of code, and then another test that matches it; then there is some duplication. But generally you won't see much of an increase in the size of the control flow graph.
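The effect of splitting can be illustrated by comparing facts that are merged per variable with facts that are kept per path. This is a sketch of the idea only; it does not transform a control flow graph. It assumes a program in which one branch sets numbers to 1 and flag to False, the other sets numbers to (1, 2, 3) and flag to True, and the loop runs only if flag is true. All names are hypothetical.

```python
from itertools import product


def is_iterable(value):
    # Approximation: a value is iterable if its class defines __iter__.
    return hasattr(type(value), "__iter__")


def alerts_after_merge(flags, numbers):
    # Merged facts: the link between flag and numbers is lost,
    # so any flag value may be paired with any numbers value.
    return {n for f, n in product(flags, numbers) if f and not is_iterable(n)}


def alerts_after_split(paths):
    # Split facts: each path keeps its own (flag, numbers) pair.
    return {n for f, n in paths if f and not is_iterable(n)}


print(alerts_after_merge({False, True}, {1, (1, 2, 3)}))    # {1}: a false positive
print(alerts_after_split([(False, 1), (True, (1, 2, 3))]))  # set(): nothing to report
```

On the path where flag is False the loop is never entered, so that path contributes nothing, which mirrors the pruning of branches whose repeated test has a known outcome.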
Good. But you will notice that all of that was very localised, and data tends to flow around a program through calls and so on. So have a look at this code. Is this correct? Yes, it is, because we are either calling the function with False and an integer, in which case flag is false and we don't loop, or with True and a tuple, and if flag is true then looping is fine. So what do we need to do here? If we just track the values from the calls at the bottom of the code into the function, that's not quite sufficient, because across all the calls flag in the function could be True or False and numbers could be 1 or the tuple, and then we're unable to distinguish them. So we need something called call context, where we basically pass along the context from which we call something, and that enables us to disambiguate. So in order to be correct, we track values across calls only in a very restricted sense, always using call contexts. Again, call context is something that can blow up the size of the thing we're analyzing, so we need to carefully limit it, to try and find interesting stuff without killing performance too much. There is more to it than that.
We don't really have time for more; I could carry on like this all day. I hope I've convinced you that our analysis is reasonably accurate as a result of using these techniques. Now let me quickly go over to lgtm.com. Here is the front page for Django. We analyze a large number of open source projects, not quite as many as we would like, so if yours is not there and you think it ought to be, come and talk to me. But generally I think we analyze most of the most popular ones on GitHub and Bitbucket. Let me just quickly run through this: you can see the number of contributors, the number of alerts (some of those are errors, the others are just recommendations and warnings, which are less important), the lines of code, and a bunch of other stuff. I'm picking Django because it's really good code, so hopefully I'm not embarrassing anyone; there's pretty much nothing there.
Now another project, one slightly less famous and of lower quality. What I want to highlight is that we do pull request integration. If you think this might be valuable for your project, you'll want to look through the alerts, and if you then want to know about this every time you have a pull request, there is pull request integration. All you need to do is log into lgtm, for which you need a user ID, go to your project, and click to enable it for pull requests.
Right, now for the interesting stuff. As I said, flexibility is key. Flexibility means that if you have a problem that I haven't anticipated, and you would like to find all instances of that problem, you can write your own query. Now, the query language is a custom query language. It's declarative programming, which is somewhat different from the programming you might be used to, so I appreciate that although these queries are concise, it is sometimes not entirely straightforward to get your head around them. But I would recommend you have a go. So, just to recap:
good code analysis should be flexible accurate and give valuable results insightful in some ways is up to you if if you need to know something about your code and you can write a query stood that's pretty insightful the can and think that is it so um there is an open space all and encourage analysis and when invite other people who were interested in the corner people and so on as well as and to that but if you wanna come along a chance me I'll be around after the talk albeit sprints and if you thought that was all kind of cool we hard that is 81 and a talk without saying that but we are all areas with hand of OK the that we we are hiring uh if you think this is cool you like to work on this we're and hiring for doing pies analysis and from the web front ends for the core infrastructure and if you wanted to use each + + job analysis alike rules are hiring for those um OK I think that's it so any questions but few of you in selective in minutes for questions and during the recent questions on there and I can go much room and I just write it in parallel and try to add my approach and it's is I can build it also handles are missing so as not handled if you not apply the tendencies also did not find all requirements text because it's not in the top directly really with requirements thought the and in general all requirements files because we don't have just 1 we with multiple soul of a of of the free barrier-free discuss the offline but other good did way or other problems we do essentially just scan everything and and if it's pretty standard we build and if it's not where 1 is a bit of custom configuration um but now all the Lycans Canseco was wanting to to fix these issues and it's still a static analysis great and and especially if you don't see people cluster job on any of those studied languages if you're program and those in part using static analysis start using it yesterday because it will find the mass lots of pastas otherwise you wouldn't put in place then 
the problem is that the use of all these tools and what it always seems to be happening is that are use you enable them and for because Python is is so that means you get massive amounts of false positives and is all well you and all you valuable things had lost so if you have any sort of like numbers for instance on home false-positive positive rate is of bodies is the accurate thing so this is why I said here about knowing what you don't know um so put technically I would say we attempt to and ensure that all knowledge of a set of facts we're trying to present to you on which you can base your queries is as large a strict subset of the truth is we can manage um and within die mind should in theory be no false positives at all um obviously we're not perfect as so we do have some false positives but we generally regarded any false positive bogged um with a handful of exceptions where the to do the analysis completely perfectly accurately would give us essentially no results and we're prepared to trade a few false positives to get meaningful results on most projects the um best in numbers it's kind of hard to say because we don't know how many errors are on a program so we can't come up with a number as to what the false positive rate is uh but as a I mean would we aim to 0 uh as non achievable goal but um Stephanie something we we take pretty seriously supposedly told you of the sunlight of the operator so costs of for companies that the reasons for not get cold OK um naive Bayes the answer is I don't do say was a my my bosses told me over trying sales because I'm an engineer overeager so I don't don't ask me and but this is all free for open source and if you if you want to use a commercially then you should contact the strictly from sales as a model called and this it so when do also there also you type annotations in might 3 will we also make use of that you run the OK so um adding income a time annotations and we don't use much no uh take much information from 
them. Basically, development is an ongoing process, and we focus on what gives us the best improvement at the time. Including type hints is definitely something that's on the cards, and we particularly want to do it for analyzing stub files for the standard library and so on, because our standard library analysis is kind of just this: we did a one-off analysis of the C code and then ran queries on that to generate type information. That's relatively weak compared with what's now in the typing stubs, so we want to use those. The worry is that any error in a stub file is just going to manifest itself as a false positive, so that's a concern. But yeah, the plan is to take advantage of those, not to try to do type checking as such, but merely to use them as sources of information for the more general dataflow analysis.

Thank you. Any other questions?

Do you have some examples of bugs that were actually fixed in projects thanks to this, projects that embraced it?

Not off the top of my head. I can dig you out a list if you want some evidence later on. But going back a long time, we found a bug in the 2.7 standard library, which is a kind of funny little one. In 2.7, if you implement __eq__ but not __ne__, then the interpreter gives you different results for equality and inequality. I think it was weakref sets, in 2.7, a long time ago, maybe four or five years ago. I think it's fixed. It was both equal and not equal to itself, and we used that as a little demo. I think someone from the core developer team fixed it within an hour. And we've found bugs in a number of other projects, so various things. Interestingly, we had something that looked like a false positive in Flask, but it turned out that it was actually something wrong in the
requirements.txt. Flask was saying it needed Click 2.0 or higher, and the analysis was giving us an error that said you're calling it with an argument that doesn't exist in the function being called. But when you updated the Click requirement, the error went away. So sometimes you find rather indirect errors like that. So: the standard library and Flask.

Any other questions?

Just one. Is it possible, if you put in a project, to use the service to generate an improved version of the code?

When you say improved code, do you mean the transformations on the flow graph? That code in itself is positively not an improvement. Analysis tools, much like compilers, transform the code and do all sorts of weird things with it during the analysis, and some of the intermediate representations, written back out as code, would be truly atrocious, incomprehensible code. But they make things more explicit as far as the analysis is concerned, internally. You should leave your code as it is. I mean, if there are bugs in your code, fix them, and if things are unclear, make them clearer, but that's not something you should really do for the benefit of tools.

Can you give us an idea of how many projects your company is processing in general? And I'm kind of curious what kind of infrastructure, if you can share anything about that, you folks have to run your system on all these projects. Just some numbers.

So: 50,094 projects, apparently, if I'm reading the numbers right. And this is including Java, JavaScript and Python. I think roughly half of those, or maybe more, will be JavaScript, for one reason or another, but
in any case JavaScript is the largest number, then Python and Java. But I think that's just because the proportion of projects we analyze at the moment is lower for Java, because of the build issues and so on. But yeah, this number should go up, and if we add languages it will definitely go further.

Any other questions? Going once, going twice, gone. Let's thank the speaker again.
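
As a footnote to the answer about the Python 2.7 standard library bug: the pattern described there (a class that implements __eq__ but not __ne__) is simple enough to look for with a purely syntactic check. The sketch below is only an illustration using Python's standard ast module. It is not how the speaker's tool works, the function name classes_with_eq_but_no_ne and the sample classes are made up for this example, and it ignores inheritance, so it can both miss cases and flag harmless ones. Note also that in Python 3 a missing __ne__ is no longer a problem, because the default __ne__ inverts the result of __eq__; the check is only meaningful for Python 2 code.

```python
import ast

# Hypothetical sample source to analyze; the class names are invented.
SOURCE = '''
class OnlyEq:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value


class Both:
    def __eq__(self, other):
        return True

    def __ne__(self, other):
        return False


class Neither:
    pass
'''


def classes_with_eq_but_no_ne(source):
    """Return the sorted names of classes whose own body defines
    __eq__ but not __ne__. Inherited methods are not considered."""
    tree = ast.parse(source)
    flagged = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # Collect the names of methods defined directly in the class body.
            methods = {
                item.name
                for item in node.body
                if isinstance(item, ast.FunctionDef)
            }
            if "__eq__" in methods and "__ne__" not in methods:
                flagged.append(node.name)
    return sorted(flagged)


if __name__ == "__main__":
    print(classes_with_eq_but_no_ne(SOURCE))  # prints ['OnlyEq']
```

A check like this works on the source text alone, without running the code being analyzed, which is the sense in which the talk uses the term static analysis. A real tool would also have to follow base classes and dataflow to keep false positives down, which is the hard part discussed in the answers above.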