Linespots: Predicting Bugs in your Code

Video in TIB AV-Portal: Linespots: Predicting Bugs in your Code

Formal Metadata

Linespots: Predicting Bugs in your Code
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
Linespots: Predicting Bugs in your Code [EuroPython 2017 - Talk - 2017-07-14 - Arengo] [Rimini, Italy] In times of increased awareness of technical debts, reviewing and auditing code becomes more important. The main problem with code review is the amount of time that is being spent searching the needle in the haystack. You just don’t know what you are looking for and where to find it. One possible solution to the problem to the idea of bug prediction. If we could somehow know where bugs are in our code, focusing reviewing efforts on that area should, in theory, increase the effectiveness of our review. More bugs should be uncovered while less time is spent reviewing. This is what Linespots tries to offer. It is an algorithm developed during my thesis that analyses a project’s history and calculates a probability value for each line of code in the project, representing the likeliness of a bug existing in that line. Using the probabilities, reviewers can focus on the areas that are at a higher risk of containing bugs and spend less time on robust code. The research done so far showed, that by analyzing 0.5% lines of code with the highest risk values in a project, an average of 50% of the bugs fixed in the next 150 commits were correctly predicted by Linespots. This is an improvement by factor 10 compared to Bugspots, an algorithm developed at Google, which Linespots is based upon. Outline: Basics and functionality of Linespots Research results Pros and cons of Linespots Results of a case stud
Prediction Personal digital assistant Software Machine code
Area Mapping Plotter Projective plane Cuboid Machine code
Area Point (geometry) Trail Algorithm Trail Observational study Link (knot theory) Projective plane Price index Basis <Mathematik> Function (mathematics) Mereology Revision control Message passing Mathematics Hash function Cuboid
Area Slide rule Group action Algorithm Greatest element Mapping Multiplication sign Correspondence (mathematics) Projective plane Electronic mailing list Machine code Mathematics Average Personal digital assistant Cuboid Resultant
Point (geometry) Group action Game controller Algorithm Projective plane Interior (topology) Multilateration Machine code Disk read-and-write head Mereology Machine code Disk read-and-write head Hypothesis Number Number Population density Performance appraisal Population density Personal digital assistant Cuboid
Computer file Plotter MIDI Bit error rate Insertion loss Ordinary differential equation Average Mereology Machine code Plateau's problem Number Population density Cuboid Endliche Modelltheorie Area Pairwise comparison Algorithm Scaling (geometry) Video tracking Projective plane Feedback Graph (mathematics) Machine code Price index Arithmetic mean Data management Personal digital assistant Parametrische Erregung Ranking Table (information) Optical disc drive Resultant
Decision theory Multiplication sign Direction (geometry) Execution unit Sheaf (mathematics) 1 (number) Set (mathematics) Mereology Exponential function Formal language Software bug Fluid statics Sign (mathematics) Mathematics Machine learning Different (Kate Ryan album) Logic Cuboid Electronic visual display Area Machine learning Algorithm Constraint (mathematics) Trail Block (periodic table) Moment (mathematics) Electronic mailing list Sound effect Bit Lattice (order) Measurement Connected space Type theory Arithmetic mean Vector space Quantum Right angle Arithmetic progression Weight function Trail Functional (mathematics) Implementation Open source Real number Virtual machine Online help Graph coloring Power (physics) Hypothesis Number Goodness of fit Programmschleife Software testing Scaling (geometry) Projective plane Mathematical analysis Machine code System call Word Uniform resource locator Logic Personal digital assistant Universe (mathematics) Statement (computer science) Video game Library (computing)
Point (geometry) Greatest element Random number generation Division (mathematics) Weight function
thank you everyone so um as a
start I would like to ask you what your feelings are if you see a picture like this any positive or negative feelings here OK so this is what I am I pull requests or in this case just a differ from a pull request and get have looks like and this probably means you're reviewing which is hard for a lot of people the so what would you say if I tell you that there was something that made this confusing I have to look everywhere thing look more like this
give you a heat map of we like plot problem areas where you should look where there's probably about and and what if I told you that
there was a thing that would predict 50 per cent of your box of the box in your project by just looking at 0 . 5 % of the lines of code in your project yeah does that sound good are helpful I see some comes up so that's what lines woods In hopefully can do to death and
1st who knows what a good committees who knows what a hunk of our who doesn't know what a hunk this in the comet some hands OK so just to be able to explain the algorithm later the blue colored lines are the major data which tells you about who wrote the commit when you wrote it gives you a hash and uh hunkers than what's more with the becomes and that's just 1 part of Côte that's kind of like it and so near each other so that get decided to eat make it into a unit if and that's going to be used in the algorithm so just to explain it from so this is how line
sports works the ideas you take a project and it has to be get project at this point and generally you can use every version control system I guess and you iterate over all comets and you have to decide if it for each committee if it is about fixed or not and this is non-trivial and I'm going to talk about that later that that so let's say we somehow know if it is about 6 or not if it's not about we just keep track of all the movement of what happened and of the In the comic and that's it but if it was about fix we not only track the changes but we place score because the idea is if it was about fix somewhere that means there was about and if there was a back somewhere so that means that areas probably complicated or someone wrote it when he was tired or so it's just a place where obviously box will occur and some research has shown that if an area had a box in the past the probability is increases there will be more what's in the future so this is the whole basis of the output the and for my study I just use a simple keyword finer for the human messages to decide the committee was about fixed a not so for example at we use fixes colon the issue link or I think only something like Bach colon link stuff like that so that's individual for each project and felt like it can work worked so this is how the scoring
works and the idea is that every line in your project starts with a score of 0 every time there's a buck that uh is connected to the hunk that lines in um we increase the score ideally we wouldn't would increase the score every time we wrote about but there's no way of knowing when I wrote about it's just like implicitly known by solving it later so in this case there's a red line that is removed so we also drop the score and their to a green lines that are added and at those scores are increased the 1st 2 lines are not change in anyway so we just um give them a flat increase and the bottom lines that our and edit use something it's so like the small coarticulation in the bottom it's just the average of the out around if you're like very interested in the ensign out how the algorithm work you can come to me later but I don't think that's very interesting like for the whole group some mechanical too much into detail and the whole thing then this weighted by age so in your comments on change the score stronger than older comments that stress to make sure that and if you fix box in the past and there's no new books in that area because I decided that that area probably was fixed for now and we shouldn't do look at it even more and this
is what the result of the algorithm looks like you get a list of some slides course and corresponding lines and facts and this can then be used to for example create a heat map of your project or do whatever you think culture project yeah the it and those slot those scores are not absolute their relative to all the other scores in your project so for example you could use it to you look at the highest score 10 % of lines of code or something the the interesting
part and for me was to figure out how they evaluate if this works in so ideally I would want a lot of data using that's using this like in a group and we have a control group that doesn't use it and then have hundreds and hundreds of people reviewing stuff of and see if it works or not and that was really feasible for my Bachelor thesis so what I did I created something like hopes to the future and so this is what the history poverty project kind of looks like you have been in a comet where you started your project and at some later point there's your head command which is your announced it on and you can use this to go back in time and decide whom I'm just gonna go back like let's say 100 comets and I call this now and everything after those 100 commits is now if you truly don't know anything about because that in this case at the uh the middle point I have no idea what's coming after that so I can use the following comments and the future that have in my project to make that can that I can check against but I can't of influence my algorithm by the I am and what it then was so my used all the lines that were proposed by my by my algorithm and see if in the In those lines there would be but fix this in the future and
um the what it is for this and so I wasn't really sure if this is the best thing to use but it kind of sound sane um it's called density and adjusts um shows the the ratio between number of parts that are not cured in the lines I checked and the lines of code that used to find this box so I'm if this number is higher this just means that I found more parts per line of code I looked at and I think that's what's like efficient review looks like wanna find as many box by looking at this little lines as possible
so when you plot this kind of looks like this the blue line is the proportion of them back suffered so it goes between 0 and 1 which is then 100 % as in this case I arrived somewhere around that's a 90 per cent of all of all the boxes that happen in the future and the the red line is the air density which is the real interesting things so we can see that the density of spikes very all the meaning that people around 0 . 0 0 0 1 0 or something like that that we have the best ratio of parts we found 2 how many lines we looked at them and this was very consistent over a lot of projects and wanna show you a table with some numbers later uh and um it means that we can use this algorithm and by looking only at very small numbers of lines of code we don't have to look at like after project to find everything but we only have to look at very very small numbers of lines of code which is good because that's what CodeView usually consists of on so this is the comparison with feedbacks with algorithm with this what this whole thing and I did was based on Google's boxplots algorithm which kind of does the same thing I did but on a file-based so you don't get a score for each alignment for each file and if you compare both graphs of the that's what I rhythm finds more box but that's just because it proposes hold files and it seems that there is knowledge about parts of a cult where there's actually no knowledge about and the line sports results on the right side a spike uh much earlier and reaching a plateau much earlier so for using this also you can see the scale on the right side of her but suppose we have a ratio of 16 per cent of of box found per person line of code and on 4 lines but we have 2 thousand per cent the facts from Peru line of code which is pretty awesome and there's another picture which is completely send out a and you can't really see how steep it rises so this is just to show that overall box plots looks to find more box but often not for the cost of you having to look at like 20 per cent of your projects code which probably not something you wanna do and so these are some projects
tested with it and the green colour just there is an indicator of uh which um ranking algorithm of work best so what this shows is that I looked at 0 . 5 % lines of code because that was kind of a sweet spot um under that there's just the parametres I around the area them with and then there's some some projects you probably all know at you probably all know given to decay model loss is the file manager from be known from and I ran a few algorithms with them over them home and try to find out which 1 works best so um what the numbers show you on for example for a bird codes the best case that I encountered in my home the was that there was looking at 0 . 5 % lines of code I found 50 well almost 50 7 per cent of parts that would cure in the next hundred 150 commits so by just looking at a very small portion of the project I could find almost all the boxes and 3 quarters of the box that would secure in the near future the the it sounds pretty cool the in that so what can be
proved about the project and this is where this becomes really interesting because although it can sounds awesome to find box in your code so far we haven't really or I haven't really figure out how this can be used in real life because everyone i've talked about uh talked to you about this was like this is we also in the sound supercool but no 1 had an idea how this could help someone although everyone thinks it's us and so 1st some academic stuff what can be in a improved so 1st the decision the common is a fixed amount not is a very vague thing and there's the idea to use machine learning to increase the accuracy of that that ideally you would reach an accuracy of 100 % for deciding if something is a park fix or something isn't about fixed because that's what the whole of things based on if we these decisions are not made precisely then everything after that becomes on precise and the the next thing is the which a lot of people suggest a strictly logical truncal something like some functions or a you a logical blocks like if statements loops and stuff because if there's a box somewhere in the function that probably is assigned for that function being complicated so you better take a look at the whole function on this was just too much for the 9 weeks I had in my thesis but only I really wanna try this whole gets gets very complicated and has to be implemented for every language itself but so far earlier rhythm runs completely language agnostic and runs on everything you have that it takes place and finally we can apply some machine learning voodoo because at the moment you can solve every problem by throwing machine learning and big data but 1 part with would be the is it a fixed not a fixed the decision but generally I think I have received suggesting that suggest seems to use machine learning in every area of the sovereign power and so if you have like Claudius on how you could use machine learning to make the spirit tell them to me and there's like there is in the summer where it starts with machine learning variant and there's like a list of all ideas of and so finally I would really like to test this with people I would really like to gets projects to work for me to find a way to get this thing is useful in real life because of I think it's could be helpful with code review which is the thing we it what a struggle with because we just have so much to do and if you would be interested somehow collaborating with me or like just me running the algorithm over your project and then seeing if it has somehow helps or not on come forward to talk to me here at the colistin thank you thank you very much more luminous for in question the the talk I this is the only you can run as well the mall is open-source tools and then there is um my complete these is open source so you have you can use my the 1st implementation I wrote and currently rewriting it in a very much nicer way but you that the algorithm is open source and with the thesis on all the data I together is also open source can you talk in a little more about why you're looking only at fixed sections rather than like features commits might but fixed sections will be prone to new bonds yes so um that is just based on the research I based my thesis on because they found a connection between the back fixes occurring and they're being more box in the future but I'm pretty sure there is a lot of more signs telling you that there might be problems and yes new features probably are prone to have box because they're just newly implemented in the test and so that would be another thing to look for the but maybe display I only had 9 weeks for my thesis so there was just time constraints the that the the the thanks very much and it was really interesting so 1st the boat deciding whether the use of external will probably use all the means are linked to such a vector and you should labeled as possible so and write this and the other regional here's to you she in general you will want go all parts of the value in the purely logical 1 the also will once they misunderstood my stuff like that so that we may be different than what so I guess of the of the algorithm doesn't care for what type of box you feel some and I'm the research this is based on didn't go into detail on like if there is a difference between it's a title typos and some people argue that titles are box and some people don't I guess that it the probably doesn't matter because having the title somewhere in from what the research i'm basis on and told and was that if there's about there's some reason for about to be there and that also leads to the probability rising that there's just more box there so we you for example of try to label or issues with boxing on box that based on what kind of box we wanna find was because we can use this and so for example for we
don't label the type spots because we're not interested in having the typo parts in this out the answer but if that would be something you're trying to look for then I guess and labeling that is but also works that that answer the question the I thank you that I'm just wondering if you can look at locate high risk lines of code can also suggest and now there's that would be something that maybe I don't know if you could then feed that into some static code analysis the analysis thing but this is also not likely and this is just a probability if there's about fixing up this is not so saying there is a back this is just saying the probability of and there's a back in this line is higher than the next 1 for example so no fixed set the thank you for your attention was monitoring and making available and I think that we should request and gives you have a nice tool to highlight with the demands of the units in the case the and there's 2 guys it get lap running around summer they're thinking about using this and so far I didn't build anything in that direction but I think it could be cool although all the people I've talked to so far our on and weren't sure if they would use so like the idea sounds awesome and I agree that them all the like people with money that I tried to somehow get information out of if this could be useful somehow weren't really into the put this in to get up and make Europe over colorful things more questions yeah so thank you and so on there is something you do when you think about it on sometimes we write this as I understand always looking most of the other good added in the test with the university's commit the between just increase the school molarity of having a very here on the right the maybe you you right that that really it's not about to ensure that the value of fixed-income happening anymore so is the ways you avoid some kind of possibly code because this is the most obvious 1 but maybe there is all that kind of thing that this I'm not the so like the implementation that is list from this too simple for stuff like that but I guess it would be a rather easy to have like an aegon is that just looks for test maybe documentation stuff or something and just doesn't track them and doesn't offer them in the list of stuff you should look at we can say about and when you say the book thinks that you go schools around goes up the is there any way for the school to carry them afterwards all if you fix that but a few times that probably to go to the effects of what school now would stay all and so the that's what the a weighting functions for I didn't show it because it's just an exponential function but what it does is on you run this this is run incrementally but this is just run once and you get 1 set of course and then for the next call which would have to have to do it again so for example if you analyze last 500 commits then the 500 committed in the past and weighs less than the newest ones so if there was a box bark 5 and commits ago and there wasn't are about fixed and there are no bug fixes in the future the scale score is and produced by the weighting function the so in the that was the case with time and how much do you need to go up like the most active meeting the most active areas of decoder compared to the most error-prone areas of the code they if you analyze this about what it would you just have other than I think I will not light up and all the public libraries because your if there is a lot of back fixes in that area probably because that means there's a lot of parts and that at any of it's time and call recent changes on hello I use yeah that that it's kind of a problem like generally this awaiting by times not ideal because times that a very good measure on progress because there's projects where there's a lot of of these comets happening in a short duration there's project worst let weeks or a month of no commits so even though you could have 1 commits 2 weeks ago in the next it today and they are like in the sense of the project and they are not very far apart but there would be weighted very different due to the time of at the time used to weigh the stuff so I thought about using like member numbers of commits to waste of but then again numbers of commits that very measure because some project use small in some project uses large commits so that's also thing where no ideal solution that is there so far this the quantum questions the the so we have your assignment of each like some kind of way score right exactly how I hold someone's will score higher than the others some words in 3 so you won't get a list of assuming the example scoring this 1 the OK so and to go about it in a little bit more detail of the
um yeah immediately without my this here are of k so you're the old stories which you put 0 . 3 for the 1st lines and 0 . 5 for the next for the following 2 and here in the school and the way is the news causal calculated that at the top it says score increases your point 2 5 which is just a random number I choose this would be decided by where we are on the weight function to hope far I can't I we have and the 1st 2 lines are not changed so they just get the 0 point 2 5 increased by and the line is removed so this course also removed and the bottom 1 supplier and it as the alliance so so what is that it is I calculate the average score of the complete 100 which is done at the bottom and this year 0 . 4 3 which is then used as the old score and then increased by 0 . 2 5 the new score isn't question and the mood of the individual thank you