
Managing the code quality of your project. Leave the past behind: Focus on new code


Formal Metadata

Title: Managing the code quality of your project. Leave the past behind: Focus on new code
Number of Parts: 112
License: CC Attribution - NonCommercial - ShareAlike 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
[Wicklow Hall 1 on 2022-07-13] As developers we often have to deal with legacy projects and, at the same time, we want to keep the quality and security of our deliverables under control. As soon as we start running a linter (like Pylint or Flake8) on such a legacy project, it reports a huge number of violations. To handle those issues, we might want to start by looking only at the files changed in a pull request instead of the entire project, for example by piping _git diff_ into Pylint: `pylint $(git diff --name-only --diff-filter=d)`. During this talk I'd like to push this concept a bit further and outline an approach and philosophy that can be helpful in dealing with code quality: Clean as you code.

1. What is "Clean as you code"?
   - Not only about violations: it can be extended to code coverage and all code metrics in general.
   - The quality you want to measure should be based only on recent changes.
2. Why does "Clean as you code" matter?
   - It helps your team stay focused on delivering new features.
   - It helps you deal with technical debt incrementally: sometimes you need to modify old code, and at that point you can fix existing violations.
3. How to apply "Clean as you code"?
   - Shape a *quality gate* in order to define code quality standards for the software delivered by your team today.
   - Use appropriate tools (like SonarQube).
Transcript: English (auto-generated)
So today, we're going to talk about clean code and about code quality. So the first thing that I would like to ask you, can you please raise your hand if you use some code quality tools in your daily job? OK, so that's good. So we have quite a few people.
OK, first, let me introduce myself. My name is Andrea. I work as a software developer for a company called Sonar. At Sonar, we develop products that help developers write clean code. And I'm one of the maintainers of the static analyzer for Python.
So what we're going to see today, so we're going to start by seeing what is static analysis and why it's useful for developers. Then we're going to see a few problems, a few challenges that we might face when we use code quality tools, especially on big projects.
And in the second part of the presentation, we're going to see an alternative methodology called clean as you code that really helps you keep the focus on the new code. I'm going to explain what I actually mean by new code in this context, and we're going to see together a few techniques to put this into practice. So what is static analysis? Static analysis is basically the ability to analyze code, to find problems or in any way to understand your code, without actually executing it. This is a technique that is often used by code quality tools, or even by tools
that you probably use daily, like Black or Mypy. And this can be really useful to help developers write clean code. Clean code is code that is, first of all, maintainable: we want code to be readable, and we want it to be easy to change for our colleagues and for ourselves, of course. Then we want code to be correct, so we don't want any bugs in it. We also want to make sure that we write tests, so that our code is testable and tested. And finally, we want the code to be secure,
so we don't want any vulnerabilities in it. So why is static analysis useful? In this code snippet, you might not realize that at line 203 we actually have a small problem. In this case, we are using the extend method of a list. However, extend mutates the list in place; it does not return a new list. So in this case, the developer probably forgot about this property of the API. I think this kind of problem can be really difficult to spot in a manual review, so static analysis tools are really helpful to find this kind of problem. So today, for our use case, I'm going to use a project that I like a lot. It's an open source project called FlaskBB. FlaskBB is already a good size, about 20k lines of code.
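Before going further with FlaskBB, the extend pitfall just described can be reproduced in a few lines. This is a hypothetical illustration, not the actual snippet from the talk:

```python
items = [1, 2, 3]
# list.extend() mutates the list in place and returns None, so assigning
# its result throws the combined list away; easy to miss in a manual review.
result = items.extend([4, 5])   # result is None; items is now [1, 2, 3, 4, 5]

# If a new list is wanted, concatenation is the right tool.
combined = [1, 2, 3] + [4, 5]   # a new list; the originals are untouched
```

This is exactly the kind of subtle but well-defined pattern a static analyzer can flag mechanically.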
It's written in Python and it uses Flask, and you've probably already encountered it online, because it's often used to build forums. So why FlaskBB? First, it already has a good size, and it already has some static analysis in place. For our presentation, we're going to assume that we would like to add PyLint. PyLint, for those of you who don't know, is a very popular linter in the Python ecosystem. We're also using PyLint because it's known to have a lot of rules and to produce a lot of messages, so it can be quite a challenge to use it on an existing, big project. So let's suppose that we want to do that: we want to add PyLint to the CI configuration of FlaskBB. For this presentation, we're going to use GitHub Actions.
We have some boilerplate code here, but it's not really important. What actually matters is the last line, and it's as simple as that: we run PyLint on the flaskbb module. Then, let's suppose we do that and we open the first pull request on FlaskBB.
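A workflow along these lines could be sketched as follows. This is a hypothetical minimal sketch, not FlaskBB's actual configuration; the file name, action versions, and job layout are assumptions:

```yaml
# .github/workflows/lint.yml (hypothetical sketch)
name: lint
on: [pull_request]
jobs:
  pylint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
      - run: pip install pylint
      # The line that matters: run PyLint on the whole flaskbb package.
      - run: pylint flaskbb
```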
And when we do the first analysis, we're actually going to see a lot of messages. You see that even on a well-maintained project like FlaskBB, we can still get a lot of findings from static analysis. In this case, I think we might be a bit confused and might not know where to start, because we have 1,800 messages. And if you compare that to the overall size of the project, 20k lines of code, that's already quite a high ratio.
Here we have a small recap of all the messages that we saw in the previous slides. You see that we have a lot of conventions that are not respected, some refactoring suggestions, some warnings, and a lot of errors as well. The natural reflex that we developers might have would be to go there and fix as many problems as possible. However, there are a few challenges if you want to do that. The first challenge could be that, if we work in a company, it might be difficult to find the budget to spend days or weeks fixing old code, fixing old problems. And also, if the application that we are maintaining is already in production,
we have to be pragmatic and say that the application is working and is already in production. Maybe it has some shortcomings, but those are probably already handled: if in your company there is, for example, a support team or a quality assurance team, they probably already know some workarounds for the existing shortcomings of the application. This is all to say that if you go there and change old code, you actually run quite a high risk of functional regression. So you really have to be cautious if you want to do that.
And finally, suppose that you are a new maintainer on this project: it's not code that you wrote yourself, maybe it was written by someone who already left the company, so you probably don't understand the code very well. I think that we as developers might have a bit of pushback towards old code, and it can be a bit boring, in the end, to go there and fix all the existing issues. By the way, what usually happens with PyLint is that, to reduce all the messages, people disable all the existing rules and just enable a few of them. This solution can of course work, but I don't think it's the optimal one, because in this case we are actually limiting the power of PyLint itself, of static analysis. I think we should aim for better tooling that keeps the focus on the new code, that changes the focus of those messages.
And we talked mainly about messages, but let's not forget about unit tests and coverage. On FlaskBB, it's already very good that they monitor the unit test coverage of the entire project. However, in this case, we have 44%. So let's suppose that at some point we decide that we would like a threshold of 80%. Does that mean that we have to write tests for days and weeks to reach 80% on all the code? We probably don't want to do that.
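As a concrete illustration: with the pytest-cov plugin, such a threshold can be enforced directly in CI via its --cov-fail-under option. This is a hypothetical sketch; FlaskBB's actual test command may differ:

```yaml
# Hypothetical CI step: fail the build if overall coverage drops below 80%.
- run: pytest --cov=flaskbb --cov-fail-under=80
```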
So you see that if we look at the entire project, we have a lot of messages, a lot of errors, and the coverage is probably not at the right level. This can be a bit confusing and a bit difficult to handle. What I would like to suggest here today is an alternative approach to handle this kind of problem: to keep calm and focus on new code. But what is new code, actually? Let me clarify that. The easiest, probably the most intuitive, way to think of new code is a new commit or a new pull request targeting the main branch. However, we can extend the concept a bit and think of it also in terms of a number of days. For example, if you're working in an agile environment,
you might decide that the new code period is the duration of the sprint, maybe three or four weeks. And if we push this concept a bit further, if the product uses semantic versioning, we can say that if we already deployed version 1.0 and we are working on features for 1.1, the new code that we want to focus on is everything written between 1.0 and 1.1. However, for my presentation, I'm going to pick the simplest approach and think of new code as being just a new pull request. For example, let's suppose that I want to add a new functionality to FlaskBB. In this case, I'm going to add a quicksort, because it's an algorithm that I really like and, for whatever reason, I might need it in FlaskBB. I'm going to add it to the cli-utils module. And of course, I'm also going to add some tests, because I want to make sure that my algorithm is working. So let's suppose that we do that, with the CI configuration in place.
And again, I'm going to see all those messages that come from PyLint. Here, if you look really carefully, you're going to see the flaskbb cli-utils module, and at line 1,300-something we actually spot the three new messages that are relevant to the new code that I just wrote. In this case, it's about some conventions that are not respected: I didn't write the docstring for the quicksort method, the variable naming is not respecting the naming convention, and there is one unnecessary else.
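As an illustration, here is a hypothetical sketch of what such a quicksort might look like, annotated with the kinds of messages PyLint would raise. The real code from the talk is not shown, so the names and details here are assumptions:

```python
def quicksort(arr):  # PyLint C0116: missing function docstring
    if arr is None:
        ValueError("arr must not be None")  # bug: missing `raise` keyword
    if len(arr) <= 1:
        return arr
    else:  # PyLint R1705: unnecessary `else` after `return`
        p = arr[0]  # a short name like this can trip the naming convention check
        smaller = [x for x in arr[1:] if x < p]
        larger = [x for x in arr[1:] if x >= p]
        return quicksort(smaller) + [p] + quicksort(larger)
```

The exception line without `raise` is a no-op, which will come up again later in the talk when SonarCloud flags it as a bug.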
You might imagine that this approach is not really scalable: digging through the log to find the new messages for each pull request that I open on my project does not scale. I think we should strive for something better. The first approach we might try is to say: OK, I just want to analyze the files that have been changed in my pull request. How can I do that? The easiest way, here we are again in the CI configuration, is to change the last line and run PyLint on the output of git diff. git diff is a utility command that tells you basically what changed between two commits; in this case, we're going to see what changed between the main branch and the feature branch. We also use the --name-only flag to get only the names of the files that have changed. If we do that, at the next analysis we're going to have this, which I think is already better: we only see the flaskbb cli-utils module.
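Concretely, the last CI line could be sketched like this. It is hypothetical: `origin/main...` diffs against the merge base, and `--diff-filter=d` excludes deleted files so PyLint is not pointed at paths that no longer exist:

```yaml
# Hypothetical CI step: lint only the Python files changed vs. main.
- run: |
    git fetch origin main
    pylint $(git diff --name-only --diff-filter=d origin/main... -- '*.py')
```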
In this case, it's already much easier to spot the new messages that are relevant to my code, and they are the same three that you saw before. However, I'm still kind of bothered by the fact that I still see old messages on the old code, because I'm anyway analyzing the whole cli-utils module. In this case the file was pretty small, but if you imagine that the file is longer, and this actually happens quite often in projects, we may end up in a situation quite similar to what we had before. What we would really like is to focus on the changed lines. But first, let me pause for a moment and formalize a bit better what I'm trying to do here. This is the clean as you code methodology.
The clean as you code methodology has at its core the idea that quality should be measured on recent changes. Why is that? Because recent changes are the riskiest part of your code: you don't know anything yet about the new code that you're adding to production, so you really want to make sure that its quality is high and, of course, that it is tested. It also matters because the new code is something that you actually own: it's something that you probably understand and have a strong sense of ownership over. And I will repeat that we want to focus on new lines, not on new files.
We also want the metrics to be computed not on the entire project, but on the change: on the changed files and the changed lines of the product. So let's look again at the recap of the PyLint reporting. With the clean as you code approach, we would like to see only the two conventions that we saw before on the quicksort, only one refactoring suggestion, and zero warnings and zero errors. And if we look again at the problems,
the challenges that we mentioned before: finding budget is not a problem anymore, because in this case I'm working on a new feature, which means I already have budget; fixing this kind of problem is part of the job, so you don't need a dedicated budget for it. Functional regression: OK, in this case you always have some risk of functional regression, but if you focus only on the new code, it's quite limited, and anyway the code quality tools will help you make sure that the quality level is high. And finally, we talked about developer pushback. That's not the case anymore, because it's the code I just wrote: I feel a sense of ownership over this code, and I actually understand it. So let's see how to put this into practice.
We saw before that it was possible to analyze the changed files; now we would like to analyze the changed lines. To do that, I'm going to suggest two alternative solutions. The first one is based on an open source tool called Darker. Again, have you heard of Darker before? Can you raise your hand if you did? OK, quite a few of you, that's interesting. Darker is a project that is often used in conjunction with Black, the code formatter. However, it can also be used in conjunction with PyLint, Flake8, or any static analysis tool.
The concept here is that we're going to introduce Darker in our CI configuration and run it on top of the PyLint analysis, so we're going to run PyLint on flaskbb again. In this case, we're going to tell Darker to take a diff, implicitly between the current branch, the code where Darker is run, and the main branch of the application. And if we do that, you see that at the next analysis we only see the three messages
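Based on Darker's documented command-line options, the step could be sketched roughly like this. The exact flags are assumptions to verify against the Darker version you use: `--lint` runs the given linter and keeps only the messages on modified lines, and `--revision` sets the diff base:

```yaml
# Hypothetical CI step: report PyLint messages only for changed lines.
- run: |
    pip install darker pylint
    darker --revision origin/main... --lint pylint flaskbb
```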
that are actually relevant to our code. I think this is really the fundamental step in the clean as you code methodology: now we have the focus at the right level, and we are not bothered by all the old messages. The second solution that I would like to outline is based on Sonar, the company I work for. At Sonar, we strongly believe in the clean as you code methodology, and we have it in our products. Here we're going to use the cloud version of the Sonar solution, called SonarCloud. And again, we're going to modify slightly
the CI configuration. SonarCloud has its own GitHub Action. In this case, we're going to modify the PyLint step to produce a report, and this report has to be parseable. SonarCloud will then be fed with the report coming from PyLint. If we do that, you see that in the pull request, here in the GitHub UI, we're going to see a message from the SonarCloud bot that tells us it found three code smells and one bug
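A rough sketch of that wiring is below. It is hypothetical: the action name and the `sonar.python.pylint.reportPaths` property follow Sonar's documentation for importing external linter reports, but should be verified against the current docs:

```yaml
# Hypothetical CI steps: save PyLint output in a parseable format, then run
# the SonarCloud scan; sonar-project.properties would point at the report via
#   sonar.python.pylint.reportPaths=pylint-report.txt
- run: pylint flaskbb --output-format=parseable > pylint-report.txt || true
- uses: SonarSource/sonarcloud-github-action@master
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
```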
that are relevant only to the new pull request. If we click on this link, we end up in the SonarCloud UI. In this case, it's not a CLI interface but a graphical one. And we see that here,
you might recognize again the same three issues that come from PyLint, about the conventions that are not respected and the unnecessary else. On SonarCloud, you also get this dashboard that is relative to the pull request: it's not about the entire project, but just about the pull request that we just opened. You see that we have a failed condition, which means the quality is not good enough. I would also like to draw your attention to the coverage number. You might remember that the overall coverage was 44%.
However, in this case we see a much better number, 75.9%, because it's relative only to the 17 new lines that are part of the quicksort implementation, not to the entire code base. We also get the information that if we merge this code, we're going to have 44.7% on the entire product. I think this information is more actionable: it means I just have to write a few more unit tests to actually reach the 80% threshold on the new code.
And we see that one more bug has been detected by SonarCloud. Why is that? Because if you use SonarCloud, you also get the benefit of running its own rules on top of PyLint. In this case, SonarCloud is able to find that there is actually a bug, because I forgot to add the raise keyword on this exception. It means that if the array is None, we're going to continue executing the code, and at some point we're going to access the first element of the array to get the pivot and get a type error. So I think it's very useful information to have. Let me recap for a moment. You might remember that the title of this presentation was about forgetting the old code. However, this was kind of a lie,
because when we add new features, we actually have to touch old code, to modify existing code. And I think that that moment is the right opportunity to go there and clean up your existing issues. If we assume that in one year you will need to touch 20% of the existing code base, it means that 20% of the code base will be clean, even the old code. Why? Because you put this CI check in place to make sure that the quality is good. And if we assume that in five years you're going to modify half of your existing code base, it means that you incrementally clean half of the code base. So with the clean as you code methodology, you also get this benefit of incrementally cleaning even the old code.
So let's wrap it up and recap quickly what we saw today. We saw that handling quality reports, especially on big projects, can be quite hard and challenging. We saw the clean as you code methodology. We saw a first approach based on git diff, which only looks at the changed files, and then a more advanced approach based on Darker and on Sonar, which looks only at the changed lines. And finally, we saw that thanks to the clean as you code methodology, you also get a way to incrementally clean up the technical debt of your project. That's all I wanted to say, and now I'm ready to take questions.
Okay, thank you very much for a very nice talk. My question is: you said there is old code and I want to make it very good, but what if there is technical debt? One example: I'm upgrading the Python version. Or let me take a worse example: I'm moving from 2.7 to 3, right? Can it work, or can you suggest how we can do it in a better way?
Well, I think that what we should strive to do, and maybe in this case it's a bit complex, is to make the smallest increments possible. So even if we want to change the Python version, we might think of doing it only on a small part of the application. I would try anyway to use the clean as you code methodology only on a small part. Even the change to the new version, I think, has to be incremental in nature, maybe to make it easier. Okay, okay, thank you. No problem.
I have a question, because when we apply changes to the new code, like a kind of new style, wouldn't it be clumsy to have some code in the old conventions and new code in the new conventions? That could be confusing if you read the code. So how do we approach that? Yeah, it might be confusing. I think it really depends on the needs of your team. However, I think that trying to change
the entire product at once is probably quite challenging. So I think it really depends on the use case. If we're talking about style, maybe it makes sense to do it for the entire file or the entire product. However, if we are thinking more about potential issues, about bugs, I would instead go for the incremental approach.
Have you spotted any disadvantages of using Sonar instead of Darker, for example, or advantages that you can see?
Okay, one advantage of Sonar that I can think of is that Sonar anyway keeps the concept of the entire product. So even if you only change new code, it is somehow able to know that, if you change a function definition, say you add one parameter to a function, then even the old code that is calling this function now has an issue, because in this case you're going to have a type error. It will report that even if it's not technically new code, because it's related to something that you actually modified in the new PR. And I think that with the combination of PyLint plus Darker, it's not possible to do that. And then maybe, I think it also depends
on whether you prefer the CLI interface or the graphical interface; that can also be a pro or a con of using Darker versus Sonar. Okay, and thank you a lot for your attention.
There's one more, sorry for that. My question is: of course, the reason we want to do it incrementally on new code is that changing everything would be too much. The downside I see is that you now incrementally have two camps in your code base, one being the old style of doing things, which is a bit more lacking in conventions and idioms. Then, when someone new comes, looks at this code base, and sees different modules written differently, do you still see that as a good price to pay in exchange for avoiding the major hurdle of fixing all those hundreds of PyLint issues in one go? Like, to onboard new people and have someone new exposed to the code base, seeing that some files, even some functions in the same file, are written following modern PyLint conventions while everything else is a bit weird. Yeah, I know, I know. I think that, I mean, it's not a silver bullet, right? I think we have to be pragmatic somehow and say that one approachable way
to handle all these kinds of messages is really to focus on new code and live with the fact that we might have some inconsistency in our existing code base. But again, I think it really depends on the needs of the team; in some cases another approach can be better. If we think, for example, of a code formatter, maybe it makes sense to run it on the entire project so you have the same style everywhere. But yeah, I think it really depends on the effort that is needed to handle this technical debt. So thanks.
No problem. Thank you. Thank you.