Automatic code reviews
Formal Metadata
Part Number: 56
Number of Parts: 119
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/19951 (DOI)
Production Place: Berlin
Transcript (English, auto-generated)
00:15
In this session, we're gonna have Carl telling us about automatic code reviews, which I'm very interested in. Carl is a
00:21
refugee from Java land and works as a system administrator. So if you could all please make Carl feel very welcome. Hi, so welcome to my talk, which is about automatic code reviews.
00:42
This is the long title, which they wouldn't let me put on the program. The general gist of this is that there's really really cool tools out there. And if people are willing to put in a little bit of effort, you can really get a lot of benefit from static analysis and warnings before they become problems in production. So it's traditional to start with a slide about me. I am a system administrator cum software janitor at Akvo.org, which means I do
01:05
apart from panicking about OpenSSL, I also do things like continuous integration and all the continuous deployment, that kind of stuff. And as a side project, I run landscape.io, which I'll come to in a second. And then email, Twitter, GitHub, blah. So landscape is
01:21
code reviews and automatic static analysis as a service kind of thing. The idea is that if it's really super easy, like you can click with your GitHub account and just log in and then kind of get all that stuff out straight away, then a lot of people will use it. And this has been going, it's sort of officially launched in November. I've been coding it for more than a year. And there's about
01:41
1,500 open source code repositories on there right now, and they're being checked all the time. And it's free for open source. It looks kind of like this. Maybe you've seen this sort of health badge in the top corner there on some GitHub repos occasionally. So the majority of this talk is going to be about lessons I've learned while building this, specifically how it pertains to static analysis, code reviews, that kind of thing.
02:03
So, first of all, I just want to talk about what's the point of code reviews in general, just in case you're sort of not really into it. Code reviews are really cool because you can catch errors before they're problems, especially if you have new developers, people who are new to the language, for example, maybe they come from a different background. For example, as I said, I come from a Java background, and so my Python at the beginning was all like,
02:25
you know, sort of classes everywhere, interfaces everywhere, until a colleague of mine looked at it and went, bleh. So, and as well, like, people new to the company who maybe don't understand how you do things in this company. So that's kind of cool. But it's also, it's a really good learning experience. You can see how other people do things.
02:41
You can kind of see the things that they are interested in doing, and the way that they do things, and the techniques they've learned that you don't know about, or the other way around. You can teach them something that maybe they don't know. And then business logic errors are pretty important. Like, you don't want someone to put a 15% discount where a 5% discount is kind of the right thing. You don't want someone to accidentally, financially cripple your company, that kind of stuff.
03:00
So, like, the whole idea in general is just to kind of keep an eye on the code base, I think. And automatic code reviews can give you almost all of this stuff. They won't give you the business logic side of things. All these tools, all these command-line tools, as far as they're concerned, an int is an int. They don't care if it's 15% or 5%. But everything else they can pretty much do, and the great thing is that it happens automatically after every code push.
03:23
If you put it into your, you know, continuous integration, you use Travis or Jenkins or whatever, then this happens all the time, and after every push you get a little bit of feedback to see what you just did, and whether it was better or worse. Which is kind of cool. And you can get, you know, PEP8, compliance, that kind of stuff. And for me, I really learned, by doing Pylint and running Pylint especially on all of my code, I really learned a few things.
03:48
And, yeah, so I'm going to come back to the sort of number to track in the end, because, yeah, I'll talk about that later. So, hands up who knows roughly what's going to happen based on this slide.
04:01
Okay, that's a fair amount. So this is a fairly common Python gotcha. You have a keyword argument with a default value, which is a list or a dictionary. You do something to the argument you get, and then you return it. So let's say you call append one with no arguments, it uses this list, it appends one to an empty list, you get a list of one. Great. You do it again, now you've got two things.
04:21
Okay, that sucks, and this is really annoying to debug. And like I say, this is a fairly common Python gotcha, but 20% of repositories that landscape checks have this error in them somewhere. So out of 1,500, yeah, 20% of those actually have this error. This is a really, you know, this is something you shouldn't do, and yeah, I'm not going to go into why, because you can
04:40
you can find that on the internet. But Pylint will find this for you. And interestingly, even PyCharm, which I think is a fantastic IDE and has the best static analysis of any IDE I've tried, will not warn you about this. So here's an example of how automatic code reviews can help you find bugs before they become problems. This is another one. This is something that I really learned.
05:01
If you call logging.debug with a string, and you want to see the values of foo and bar, you will have the string with formatting parameters, and you'll kind of squash them in there, and then you will call logging.debug. The thing is, this is apparently, turns out, this is the wrong way to do it. What will happen here is that you will interpolate the string, and then call logging.debug. And if your logging.debug is, you know, if your logging level is info, warning, above, whatever, then
05:25
that logging.debug call is basically a no-op, but you've already wasted time doing the string interpolation. And it turns out that you can pass in foo and bar as arguments to logging.debug, and it will interpolate the string for you. That's pretty cool. And I certainly didn't know that. And again, 10% of the repositories that Landscape checks do this.
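The two gotchas just described can be sketched as a small runnable example (the function names are illustrative, not from the talk):

```python
import logging

# Gotcha 1: mutable default argument. The list is created once, when the
# function is defined, and then shared across every call.
def append_one(items=[]):  # buggy: shared default
    items.append(1)
    return items

# The usual fix: default to None and create a fresh list inside.
def append_one_fixed(items=None):
    if items is None:
        items = []
    items.append(1)
    return items

# Gotcha 2: eager string interpolation in logging calls.
foo, bar = "a", "b"
logging.debug("eager: %s %s" % (foo, bar))  # interpolates even if DEBUG is off
logging.debug("lazy: %s %s", foo, bar)      # logging interpolates only when needed
```

Calling `append_one()` twice returns `[1]` and then `[1, 1]`, which is exactly the surprise Pylint's `dangerous-default-value` warning exists to catch.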
05:44
Then, of course, the style guide. So, you know, maybe you work with a monster who'll write code like this. Like, this is horrible. This person probably uses tabs instead of spaces, OK? So, you know, it's nice to have this conformity in style. And again, this talk really isn't about why, but if you want that, you can run
06:01
pep8.py to enforce conformity with the pep8 style guide, style advisory, I guess you could say. So, next slide. So, up until now, what I've talked about are things that you can find by checking your code for errors. But a project isn't just the code. Especially for web applications, having old versions of requirements, for example,
06:27
it can be a vulnerability. Ask anyone in the Rails community how that can go horribly wrong. Fortunately, Django hasn't had so many problems, but there are consistently security additions to libraries and, you know, security bug fixes that come out.
06:42
So, a way you can check this is if you have a virtual environment with all of your dependencies installed, you can run pip list -o; on a fairly new pip (I think it's pip version 1.4, I can't remember exactly) it'll tell you what requirements you have that are old. And this is kind of useful. If you want something slightly more involved, there's requires.io and gemnasium.com,
07:01
which will take your code (I think they only work with GitHub, certainly requires.io does), it will take your code out of GitHub, it will look at your requirements.txt or setup.py, and it will tell you which libraries you're using that are out of date. And it will tell you, okay, this one's a security problem, this one's a, you know, just regular minor update, that kind of stuff. So, this is pretty important, especially for web apps.
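As a quick sketch, the command-line check mentioned above, run inside the project's virtualenv:

```shell
# List installed packages that have a newer release available on PyPI.
# This is the long form of the "pip list -o" flag from the talk.
pip list --outdated
```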
07:22
And there are so many tools out there, like, I really don't have time to go into all of them. Pyflakes is kind of the big one that I'm not going to have time to talk about, and chances are, if you use sort of vim or emacs, you're probably using a plugin that uses flake8 under the hood, which uses pyflakes again. Frosted is a new fork of pyflakes, which is a little more maintained.
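As an aside, flake8 can read its settings from a [flake8] section in setup.cfg or tox.ini, so this kind of tuning can be committed alongside the project; a small illustrative fragment (the values are just examples, not recommendations):

```ini
[flake8]
max-line-length = 100
exclude = .git,build,migrations
```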
07:42
McCabe is really cool, it will, it takes your code and it kind of looks for too many branches. Oh, yeah, sorry, I forgot to mention, you see this little tiny link at the bottom that no one can see because it's too small? This is karlkrauter.com slash ep14. All of the things I'm talking about right now, there's links to them on this page, so you don't have to worry about finding these afterwards, you can just go to that page and look at all the links and everything.
08:05
So, yeah, there's all these other tools. Jedi is a pretty interesting new thing, and the guy who does it is doing a talk on Jedi straight after this talk; I think it's B07, it's one of the B rooms, David Halter's going to be there doing a talk on that. Pyroma is pretty fun, it checks your setup.py to make sure you're doing the...
08:23
you're being a good citizen if you want to have an open source library. So there's so many tools out there. But these tools exist, why don't people use them, or how can you get the most out of it? Obviously, if you have a brand new project, then setting this kind of stuff up at the beginning is really cool. You can, as your project evolves, you can evolve how you're doing all of this checking and that kind of stuff,
08:43
but most of us don't have the luxury of starting a new project. You have an existing code base, you have a mature code base, and you run Pylint for the first time, and you get like a thousand errors, and you look at it and you think, oh my god, I can't deal with this. So, the way to get the most out of it is, well, step number one is understanding what these tools give you.
09:03
It's not a list of errors, it's a list of suggestions. It is like a code review. Sometimes you're going to have errors, you're going to find problems and you're going to say, okay, yeah, that's a good point, yeah, that's a bug. But sometimes you're going to find things that are more, is this right? It's like someone looking over your shoulder and going, are you sure?
09:20
Which is, you know, that's the attitude to have when looking at the output of these tools. Like, don't be intimidated by a thousand errors. That doesn't mean you have a thousand errors. The second thing is spend time tuning the tools. All of them are configurable and all of them have variable levels of, you know, how much do you care about this? Maybe you don't care about a specific area.
09:40
You can just turn it off. It's worth doing this. It really is. And I'll come back to that in a second. And then, again, measure and track over time is my next slide. This, for me, this is the most important slide of, or most important concept of this whole thing. You really need to be concentrating on how you've improved or made things worse than last time.
10:02
You can't just look at the absolute value. So, if today is better than yesterday, then you've improved and you're doing well. And that's all that matters. So, in order to do that, you need to measure and track the output of these things. So, as an example, some CI servers like Jenkins have a Pylint plugin. And you get a little kind of graph over time of how many errors Pylint finds, that kind of stuff.
10:23
If you have Graphite or InfluxDB set up, which you may already have (maybe your ops team have this kind of setup and you can beg them to use it or whatever), it's a really great, simple way to pass in numbers and then track these numbers over time. And if you have a nice dashboard in front of this, like Tessera or Grafana, you have this beautiful, pretty page that everyone enjoys looking at,
10:41
which gives you a little graph that goes up or a little graph that goes down. And if it goes down, you can think, okay, maybe we need to spend time refactoring our code. Maybe we need to concentrate on this. And if it goes up, yeah, you can feel good about yourself. At the very least, just do something like pipe the output into word count and see how many errors you had this time compared to last time.
11:02
If all you have is a number that goes up or down, you're still going to be doing pretty well. And then tuning. So, again, all these tools have a way to configure things. I think that the majority of people, when they try Pylint for the first time on a mature code base or all these other tools, but especially Pylint because it's the most involved one,
11:22
it's kind of the most mature one and it finds the most bugs because it actually does a whole bunch more stuff than, say, PyFlakes does. You can sit there and you can tweak it and you can spend a long time doing this. And it's, again, it's really worth it. If you spend four hours digging through the config, digging through all of the manual pages and all that kind of stuff,
11:42
trying to figure out what's going on and then apply that to your Pylint RC so that the output is actually relevant to your project. You know, maybe it's four hours, maybe it's a day, but it almost certainly will save you that four hours or that day in the future because it'll find bugs or it'll make your project more maintainable. And that means that future problems, future features will be much easier to actually do.
12:03
So, it's worth spending the time. And, yeah, again, I don't really have time to kind of go into exactly what you can do, but, for example, if you Google around, you will find Pylint RC files that people have already created. Typically... So, the second one here, I created this a while ago because it annoyed me that Pylint would warn about things in Django
12:20
that weren't really warnings. So, if you have a Django model, you have a .objects model manager; that happens at runtime, it's part of the meta-programming of Django. So, it doesn't exist when Pylint looks at it, and therefore, Pylint will warn. It will say, MyModel.objects does not exist, and you're like, yeah, it does. So, you can write your own custom plugins or you can tune this with Pylint RC.
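A couple of illustrative .pylintrc fragments for this kind of tuning; the exact members and message names to allow or disable are project decisions, not recommendations:

```ini
[TYPECHECK]
# Attributes created at runtime (e.g. Django's model manager) that
# Pylint should not flag as missing.
generated-members=objects,DoesNotExist

[MESSAGES CONTROL]
# Checks this hypothetical team has decided not to care about.
disable=missing-docstring,too-few-public-methods
```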
12:41
So, the benefit of the plugin approach, which takes a little more time and it's a little more involved, but you can control exactly what kind of things you can... Pylint will turn off or turn on, you can kind of say, okay, I'm going to override this specific error in this specific instance rather than having to turn off all examples of this specific error. So, if you're interested in writing a plugin, the documentation on pylint.org will help you.
13:04
Use this as kind of a reference, it's a little hacky, but it's better than nothing. And, yeah, again, spending the time up front is... It'll make things more useful, I believe. So, by this point, I hope you're all kind of thinking, wow, this is really super cool.
13:21
But if you're still not willing to spend a little time, then I suggest you try out Prospector, which is a tool that came out of Landscape. So, the original point of Landscape was, like I say, to try and make things super easy for people to get started. But there was a part of that which was bringing together all these tools, configuring them in a kind of best guess kind of way,
13:42
and adapting them slightly if you were running Django and so on. And I pulled that bit out and moved it into a separate open source project called Prospector, which is a command line tool. And it's really a layer on top of Pylint and PyFlakes and pep8 and all this kind of stuff. It does its best to kind of guess what you want.
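Prospector can also be steered with a profile file checked into the repository; a minimal, illustrative .prospector.yaml might look like this (the keys shown are a small subset, and the values are just examples):

```yaml
# .prospector.yaml (illustrative profile, not a complete reference)
strictness: medium     # how picky to be overall
doc-warnings: false    # skip docstring checks
ignore-paths:
  - migrations         # generated code, not worth linting
```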
14:00
So, for example, it turns off a bunch of really picky errors by default, which you can turn on, but by default it tries to make it so that the first time you run it on a big code base, you will get something out of it which is actually useful to you, and you're not sort of swamped by little picky errors. You kind of really see the core of the problem. And, yeah, it tries to, from your requirements.txt, if you're running Django,
14:25
it will try and figure that out and try and use the plugin. So, hopefully I'm going to create some more plugins for things like Twisted and other frameworks and try and improve this further and further. So, if you really, this is sort of, to steal a phrase from Python requests, it's static analysis for humans. I'm trying to make it super easy to start with,
14:41
and if you need the power, you can have it by just doing it yourself under the hood. But going from zero to now I have static analysis, hopefully this tool will help out. So, I kind of sped through this a little bit quicker than I expected. So, here is my conclusion, which is essentially, I've already said this several times, but spending time on this will really, really benefit you,
15:05
and it's about developer cost and that kind of thing. If you spend the time doing it up front, it may seem like it's a little bit like you're not working on features, but it will come back to you in the future. You will benefit from this at some point. And, yeah, most important thing is keep track of the metrics.
15:21
Keep track of what's happening from these tools. Keep track of what they're telling you, because even if you don't agree with them, if they're telling you less than last time, it's probably because you've got less errors, it's probably because you're doing better. Yeah, so, I've kind of sped through this a whole lot quicker than I expected, but anyway. So, this is the big list of links. The top one has links to all the stuff that I just talked about.
15:43
And, yeah, that's pretty much it. Yeah. Oops.
16:02
Sorry.
16:38
Yeah, so, I suppose that wasn't a question exactly, but, yeah, so, the first point was
16:58
the default argument might be intentional and how do you deal with intentional things
17:02
that the tools find that are problems that are actually on purpose. And so, yeah, adding a comment for the next developer is kind of useful, but you're still going to get that error out of the tool. And each of them have kind of ways to turn these things on or off. It's a little more involved. And one thing I'd really like to do with landscape and prospector is have a sort of interactive mode that will help you do that, because you have to dig through the configuration
17:23
and find out, okay, what is the error code and how do you turn this off. Yeah, of course, yeah, so, not just deactivating the error, but also explaining why it was deactivated.
17:49
Not exactly. So, sorry, yeah, the question is landscape doesn't actually deal with these, with the configuration files, especially for Pylint. And it's kind of on the roadmap.
18:01
One of the problems with Pylint is that you can sort of load arbitrary plugins and there's no real way to sort of do that, so I need to spend a little time figuring out which part of the config I can apply. Prospector will do its best to deal with tox.ini and setup.cfg for pep8, PyFlakes, that kind of stuff. But you're right.
18:21
Landscape, I mean, it's still a work in progress and there's still a lot to be done. So it doesn't do that for now. And it's really for people who don't do any analysis right now and they want to start rather than for people who already do it and want a way to graph it. But it's coming. How, as a relative newbie to the Python world, how discoverable did you find the testing tools?
18:50
So I wouldn't say I'm a relative newbie. I've been doing it for like three years or so. I accumulated this knowledge over the course of doing all of this stuff. So I forgot to repeat the question, which I'm supposed to do, which is how discoverable
19:03
are the tools? And I think, again, this is maybe something that needs a little work. I found out about it from word of mouth. There's no centralized thing. I think Matt, who does full stack Python, mentioned yesterday that he was thinking about sort of adding a separate section on how do you test, how do you do static
19:20
analysis, that kind of stuff, which I think would be a real benefit. So, your tools are about Python testing; we use C++ for Python bindings, and I'm a maintainer of a conversion scanner for that kind of Python development, so if you have any interest
19:44
in knowing how you can check for these kinds of errors, you can contact me. Just write me, or grab me here, whatever you prefer. I will certainly do that, yeah. Cool.
20:04
There's one here behind; he's hiding from you. Just a comment: there is a package that's called piprot. Piprot? Okay. Oh, that's pretty cool. So piprot is a tool for, apparently, figuring out how old your requirements are.
20:22
Yeah, that sounds pretty useful. There's a few. Yeah, I imagine. I listed a bunch of tools that I've heard of, but I'm sure there are many, many more. No more questions, then let's say thank you, Carl.