
Deadcode - a tool to find and fix unused (dead) Python code


Formal Metadata

Title
Deadcode - a tool to find and fix unused (dead) Python code
Number of Parts
131
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
No longer needed code creates technical debt if it is not removed from the code base. Unused code has to be maintained, it complicates the code base, and it increases cognitive load. It might even depend on no longer necessary dependencies with vulnerabilities and thus increase the attack surface. Therefore, removing dead code saves time and money and reduces security risks. Recently, Ruff has become a de facto standard linter, providing almost all existing linting rules from other linters. However, it is only capable of detecting locally unused Python code, which is only a tiny portion of unused code. Vulture is the best-known tool for detecting globally unused Python code. However, its configuration is not very flexible, and disabling false positives in a larger code base might require a lot of effort. Also, its unused code detection is sometimes inaccurate, because scopes are not taken into account when detecting unused code. This presentation introduces a new Python package called `deadcode`, which tries to move globally unused Python code detection to the next level. First, it provides a large set of options to flexibly disable various types of false positives. Second, deadcode implements more rules for detecting unused code than Vulture. Third, it uses an improved strategy that takes scopes and namespaces into account to identify unused code items more accurately. Fourth, a --fix option is provided, which automatically removes detected unused code items. In addition, the idea of pruning Python code to reduce its size will be considered, which might be relevant when serving Python code in a browser. Let's make the Python ecosystem even more awesome!
Transcript: English (auto-generated)
Thank you. Today I'm going to present a tool which I'm still developing. It's called deadcode, and it is used to find and fix unused Python code. My presentation consists of four parts.
Firstly, I will present the current state of dead code detection using existing tools. Secondly, I will present the features of the deadcode package. Thirdly, I will show how to use it. And finally, I will compare two different strategies of detecting unused code. Let's begin with a definition of dead code, because there might be different definitions. For me, dead code is code that is no longer used or no longer needed, which should have been removed, but was not, so it is still in the code base. This type of unused code creates technical debt.
It consumes cognitive capacity, because you sometimes have to understand it. It consumes time, because unused code has to be maintained: for example, when the Python version gets upgraded, you still have to update your whole code base, including the unused code. In addition, unused code might increase security risks, for example, if it brings in outdated dependencies which have security vulnerabilities. So dead code could be compared with the evil agents from the movie The Matrix, which are waiting to cause you trouble. And we definitely want to get rid of them.
I want to share a short story of mine. I once came across a pull request which contained a new feature, but it also contained an unused class.
Probably the author wanted to use this class at first, but decided not to and forgot to remove it. I was surprised that this unused class had slipped into a pull request undetected, because we were using bleeding-edge linters like Ruff with all the rules enabled. This made me curious and led to an investigation. I reviewed all the rules implemented in Ruff, and there are more than 700 linting rules, and found out that only a few of them check for unused code: rules for unused imports, unused local variables, and unused arguments. They cover only a tiny fraction of all possible unused code cases in a code base, so Ruff is not capable of detecting globally unused code at the moment.
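To make the local versus global distinction concrete, here is a minimal sketch of a function-scoped check in the spirit of Ruff's unused-local-variable rule. It is not Ruff's implementation, and `unused_locals` is a hypothetical helper. Because it only looks inside each function body, it cannot notice that a module-level class, like the one in the pull request above, is never used anywhere.

```python
import ast


def unused_locals(source: str) -> dict[str, set[str]]:
    """Map each function name to the names it assigns but never reads.

    Simplification: nested functions are not separated from their parent.
    """
    tree = ast.parse(source)
    result: dict[str, set[str]] = {}
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        stored: set[str] = set()
        loaded: set[str] = set()
        for node in ast.walk(func):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    stored.add(node.id)
                elif isinstance(node.ctx, ast.Load):
                    loaded.add(node.id)
        if stored - loaded:
            result[func.name] = stored - loaded
    return result


SOURCE = """
class Helper:          # never used anywhere, but a local check cannot tell
    pass

def handler(request):
    result = compute(request)
    temp = 42          # assigned, never read: reported
    return result
"""

print(unused_locals(SOURCE))  # {'handler': {'temp'}}
```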
I searched for other tools and found Vulture, the most popular and capable tool for finding dead code. I tried using it, but faced several issues.
The main issue was that Vulture reported a lot of false positives, and there was no flexible way to tune this behavior. For example, if an attribute is added to a Django ORM model, or even to a nested class called Meta, this attribute also has to be added to the Vulture configuration.
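Such attributes look unused to a static tool because a framework typically reads them by introspection rather than by name. The self-contained sketch below imitates this with a hypothetical `read_options` function (it is not Django code): nothing in the source ever spells `Article.Meta.ordering`, so a purely name-based checker sees `ordering` as defined but never used.

```python
def read_options(model_cls: type) -> dict[str, object]:
    """Collect public attributes of a nested Meta class, the way a framework might."""
    meta = getattr(model_cls, "Meta", None)
    if meta is None:
        return {}
    # vars() exposes the class namespace; the attribute names never appear in code
    return {name: value for name, value in vars(meta).items() if not name.startswith("_")}


class Article:
    class Meta:
        ordering = ["-published"]
        verbose_name = "article"


print(read_options(Article))  # {'ordering': ['-published'], 'verbose_name': 'article'}
```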
This adds a huge overhead to using Vulture. In addition, Vulture is only capable of finding unused code items; you have to remove them manually yourself.
I looked into ways to improve Vulture, but concluded that this would require too significant modifications, so I decided to create a new tool called deadcode.
The deadcode tool is implemented in Python, but I think it might eventually be re-implemented in Rust and integrated into Ruff, if it gets recognized. So my tool has three main advantages over Vulture.
These advantages are: firstly, it provides a lot of different options to tune its behavior. Secondly, it provides slightly more unused code detection rules.
Thirdly, it provides an option to automatically remove unused code. Let's look at these three features in more detail. Regarding tuning out false positives, I have identified three terms, which are highlighted in this image: name, body, and definition. So now it is possible to ignore not only a name itself, but also part of a definition (its body) or the whole definition, based on its name.
Another way to ignore code items is by inheritance or by the decorators applied: you can list a decorator name or a base class, and all decorated definitions or subclasses can be ignored. Here we have a short configuration example which tunes out most false positives for Django projects. First, it ignores all Meta definitions. It also ignores the bodies of data classes, TypedDicts, and models. In my case, this removed most of the false positives.
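The matching logic behind such options might look roughly like the sketch below. The option names (`ignore_names`, `ignore_if_inherits_from`, `ignore_if_decorated_with`) are hypothetical and are not deadcode's actual configuration keys; the sketch only shows how a definition can be ignored by a glob pattern on its name, by its base classes, or by its decorators, using `fnmatch`.

```python
import ast
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class IgnoreRules:
    # Hypothetical option names, for illustration only
    ignore_names: list[str] = field(default_factory=list)
    ignore_if_inherits_from: list[str] = field(default_factory=list)
    ignore_if_decorated_with: list[str] = field(default_factory=list)


def _dotted(node: ast.expr) -> str:
    """Render a Name/Attribute/Call node as a dotted name, e.g. models.Model."""
    if isinstance(node, ast.Call):
        return _dotted(node.func)
    if isinstance(node, ast.Attribute):
        return f"{_dotted(node.value)}.{node.attr}"
    if isinstance(node, ast.Name):
        return node.id
    return ""


def _matches(name: str, patterns: list[str]) -> bool:
    # Match the full dotted name or just its last part (Model in models.Model)
    return any(fnmatch(name, p) or fnmatch(name.rsplit(".", 1)[-1], p) for p in patterns)


def is_ignored(defn: ast.AST, rules: IgnoreRules) -> bool:
    if not isinstance(defn, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
        return False
    if _matches(defn.name, rules.ignore_names):
        return True
    if isinstance(defn, ast.ClassDef) and any(
        _matches(_dotted(base), rules.ignore_if_inherits_from) for base in defn.bases
    ):
        return True
    return any(_matches(_dotted(d), rules.ignore_if_decorated_with) for d in defn.decorator_list)


SOURCE = """
class Meta: ...
class Article(models.Model): ...
@dataclass(frozen=True)
class Point: ...
class Orphan: ...
"""

rules = IgnoreRules(
    ignore_names=["Meta"],
    ignore_if_inherits_from=["Model"],
    ignore_if_decorated_with=["dataclass"],
)
tree = ast.parse(SOURCE)
print([n.name for n in tree.body if not is_ignored(n, rules)])  # ['Orphan']
```

This sketch ignores whole definitions only; ignoring just a body would apply the same tests to the enclosing definition and then skip reporting its children.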
Deadcode supports 14 types of checks, and it is possible to disable these checks either with a noqa comment on a line or by specifying them per file. There is one check for empty Python files, and it applies only to files that are not named using dunder notation (such as __init__.py). The final feature is the --fix option, which removes the findings from the code base. It is also possible to use the --dry flag, which shows the diff that would be applied to the files without actually changing them. Sometimes a file becomes empty after a cleanup, and in that case deadcode removes the file completely, again only if it is not named using dunder notation.
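The suppression and empty-file rules described above can be sketched as follows. The check codes (`DC02`, `DC03`) and the exact `# noqa` syntax accepted here are assumptions made for illustration, not taken from deadcode's documentation. A bare `# noqa` suppresses every check on its line, while `# noqa: CODE1, CODE2` suppresses only the listed ones; a file counts as empty when it has no statements, and dunder files are exempt.

```python
import ast
import re
from pathlib import Path

# "# noqa" optionally followed by ": CODE1, CODE2"; the code format (letters + digits) is assumed
NOQA_RE = re.compile(
    r"#\s*noqa(?::\s*(?P<codes>[A-Z]+\d+(?:\s*,\s*[A-Z]+\d+)*))?", re.IGNORECASE
)


def is_suppressed(line: str, code: str) -> bool:
    """True if a noqa comment on this source line disables the given check code."""
    match = NOQA_RE.search(line)
    if match is None:
        return False
    codes = match.group("codes")
    if codes is None:  # a bare "# noqa" disables every check on the line
        return True
    return code.upper() in {c.strip().upper() for c in codes.split(",")}


def is_dunder_file(path: str) -> bool:
    """True for files such as __init__.py or __main__.py."""
    stem = Path(path).stem
    return stem.startswith("__") and stem.endswith("__")


def is_reportable_empty_file(path: str, source: str) -> bool:
    """No statements at all (comment-only files count as empty), and not a dunder file."""
    return not ast.parse(source).body and not is_dunder_file(path)


# Hypothetical check codes DC02/DC03, for illustration only
print(is_suppressed("def old(): ...  # noqa: DC02", "DC02"))  # True
print(is_suppressed("def old(): ...  # noqa: DC02", "DC03"))  # False
print(is_reportable_empty_file("utils.py", "# nothing left\n"))  # True
print(is_reportable_empty_file("__init__.py", ""))  # False
```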
Using deadcode is pretty straightforward. You install the package by running pip install deadcode, and then you run the deadcode CLI command with a list of Python files to check.
Here we have a code example which contains a lot of unused code items, highlighted in purple, and all of these unused code items are reported in the command line. If we add the --fix option, the output is almost the same, but we get one additional line at the end, which reports that seven unused code items were removed.
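The removal and preview steps can be sketched with the standard library, assuming the names of the unused top-level definitions are already known. The hypothetical `remove_definitions` deletes the source lines of those definitions (from the first decorator down to `end_lineno`), and `dry_run_diff` renders the change with `difflib.unified_diff` instead of writing the file, roughly what a --dry style preview shows. This is not deadcode's implementation: nested definitions and leftover blank lines are not handled.

```python
import ast
import difflib


def remove_definitions(source: str, names: set[str]) -> str:
    """Drop top-level functions/classes whose names are in `names`."""
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    drop: set[int] = set()  # 0-based indices of lines to delete
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) and node.name in names:
            # Decorators start above the `def`/`class` line
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            drop.update(range(start - 1, node.end_lineno))
    return "".join(line for i, line in enumerate(lines) if i not in drop)


def dry_run_diff(path: str, old: str, new: str) -> str:
    """Render the change as a unified diff without touching the file."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True), new.splitlines(keepends=True),
        fromfile=f"a/{path}", tofile=f"b/{path}",
    ))


SOURCE = """\
import functools

@functools.cache
def unused_helper():
    return 1

def main():
    print("hello")
"""

fixed = remove_definitions(SOURCE, {"unused_helper"})
print(dry_run_diff("app.py", SOURCE, fixed))
```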
The file itself is updated too. I want to highlight once more that it is important to fine-tune false positives before applying the --fix option, because useful code might be impacted as well. You can use the --dry option to double-check whether all findings are being removed correctly. It is also possible to show the diff of a single file by passing its file name to the --dry option, so even if a lot of files would have to be updated, you can inspect the diff of a specific file. Let's move on to the last part of my presentation, which compares two different strategies of detecting unused code.
In general, detecting unused code is a complex task, because the abstract syntax tree doesn't provide a relation between code usages and code definitions. Both Vulture and deadcode use the abstract syntax tree, but they have to create this association, this metadata between defined and used code, themselves. Let's see how the strategy of the Vulture package works. It is implemented in three steps.
The first step finds all defined names, the second step finds all usages, and the third step reports the findings. In the first step, we find two names, Foo and spam.
In this simple code example, these findings are correct. On the fifth line, we can see that the Foo class and its spam method are used, so the used items are also found correctly.
So in this simple case, there won't be any findings. But let's complicate this example a little by adding another class, Bar, which has a single method, spam.
This method has the same name as the method of the Foo class. Let's execute those three steps once again. In the first step, we find the Foo, spam, and Bar definitions.
In the second step, we find the Foo, spam, and Bar usages. As a result, nothing is reported, even though the Bar.spam method was never used.
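The three steps of this name-based strategy can be sketched with the standard `ast` module. This is a simplified illustration, not Vulture's actual code, and the example source is a reconstruction of the slide, assuming `Bar` is instantiated but its `spam` method is never called. Because definitions and usages are plain strings, the two `spam` methods collapse into one name and the unused `Bar.spam` goes unreported, as described above.

```python
import ast

SOURCE = """
class Foo:
    def spam(self): ...

class Bar:
    def spam(self): ...

Foo().spam()
Bar()
"""


def name_based_unused(source: str) -> set[str]:
    tree = ast.parse(source)
    # Step 1: collect every defined class/function name, without namespaces
    defined = {
        node.name for node in ast.walk(tree)
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
    }
    # Step 2: collect every used name, bare names and attribute names alike
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            used.add(node.id)
        elif isinstance(node, ast.Attribute):
            used.add(node.attr)
    # Step 3: report names that were defined but never used
    return defined - used


print(name_based_unused(SOURCE))  # set(): Bar.spam is missed
```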
This is because Vulture doesn't take namespaces into account. Let's look at another strategy which does. Analyzed with this strategy, the same code example yields the Foo.spam and Bar.spam definitions as separate entries, and in the second step, only the usage of Foo.spam is detected.
As a result, the Bar.spam method is reported correctly. It's important to point out that there are many cases in which constructing a namespace for a usage is hard or even impossible. In those cases, deadcode falls back to the Vulture strategy and compares names without taking namespaces into account.
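A namespace-aware variant can record definitions as qualified names (`Foo.spam`, `Bar.spam`) and resolve a usage to a qualified name when the receiver's class is evident. The sketch below resolves only the pattern `ClassName(...).method` and otherwise falls back to the bare attribute name, marking every definition with that name as used, mirroring the fallback described above. It is an illustration under these simplifying assumptions (top-level classes only), not deadcode's actual resolver.

```python
import ast

SOURCE = """
class Foo:
    def spam(self): ...

class Bar:
    def spam(self): ...

Foo().spam()
Bar()
"""


def namespace_aware_unused(source: str) -> set[str]:
    tree = ast.parse(source)
    defined: set[str] = set()
    classes: set[str] = set()
    # Step 1: record definitions with their namespace (top level only, for brevity)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            defined.add(node.name)
        elif isinstance(node, ast.ClassDef):
            classes.add(node.name)
            defined.add(node.name)
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    defined.add(f"{node.name}.{item.name}")

    # Step 2: resolve usages to qualified names where possible
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            used.add(node.id)
        elif isinstance(node, ast.Attribute):
            receiver = node.value
            if (isinstance(receiver, ast.Call) and isinstance(receiver.func, ast.Name)
                    and receiver.func.id in classes):
                used.add(f"{receiver.func.id}.{node.attr}")  # Foo().spam -> Foo.spam
            else:
                # Fallback: namespace unknown, so every definition named <attr> counts as used
                used.update(d for d in defined if d.rsplit(".", 1)[-1] == node.attr)

    # Step 3: report qualified names that were never used
    return defined - used


print(namespace_aware_unused(SOURCE))  # {'Bar.spam'}
```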
This more precise way of detecting unused code has a nice side effect: it allows pruning the Python code tree more precisely and hence reducing the size of a Python code bundle. This might be useful when code size is important, for example, when the code has to be served in a browser.
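One way to turn such information into a smaller bundle is reachability-based pruning ("tree shaking"): start from the entry points, follow references between top-level definitions, and drop everything that is never reached. The sketch below is a swapped-in illustration of that idea, not deadcode's implementation. It assumes a single module, treats any bare-name reference as an edge, expects the caller to list every definition that module-level code calls as an entry point, and uses `ast.unparse` (Python 3.9+), which also discards comments.

```python
import ast

DEFINITIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def prune(source: str, entry_points: set[str]) -> str:
    tree = ast.parse(source)
    defs = {node.name: node for node in tree.body if isinstance(node, DEFINITIONS)}
    # Edges: bare names each top-level definition refers to
    refs = {
        name: {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
        for name, node in defs.items()
    }
    # Depth-first traversal from the entry points
    reachable: set[str] = set()
    stack = [name for name in entry_points if name in defs]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(ref for ref in refs[name] if ref in defs)
    # Keep other statements (imports, constants) and only reachable definitions
    tree.body = [
        node for node in tree.body
        if not isinstance(node, DEFINITIONS) or node.name in reachable
    ]
    return ast.unparse(tree)


SOURCE = """
def helper():
    return 2

def unused():
    return helper() * 10

def main():
    return helper() + 1
"""

pruned = prune(SOURCE, {"main"})
print(pruned)
print(len(SOURCE), "->", len(pruned), "characters")
```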
To conclude, the deadcode package prevents technical debt from arising out of unused code. I hope this package will be useful to you. Thank you very much for your attention.
Thanks for the talk, Albertas. And thanks, everyone, for listening. We have time for questions, so please line up behind the microphone.
So the problem you're solving here seems to be extremely similar, or maybe identical, to a rename refactoring. If I've got a code base and a name that I want to rename, the algorithm has to find all the places that need to be modified. Have you considered leveraging an existing rename refactoring implementation and using it as your detector? Or are they different?
Maybe I'm wrong and they're actually different problems, but they seem very similar to me, which is why I ask.
No, I have not considered it, but I will.
OK, a totally different question: can you detect code that is unused because of logic? You're detecting unused names, but if I have a chunk of code that can never be reached under any circumstance, do you detect that as well, or is this purely a name-based dead code detector?
Yes, there is a rule for unreachable code, for example, code that comes after a return statement or any other terminal statement, so yes.
Oh, so only if it can be statically identified as unused, but you don't do anything dynamic?
Yeah, it is a static code checker.
OK, very cool. Thank you.
Hi. If you have a library, most of the API functions will never be called, so they might be detected as dead code, but you can list them in the __all__ variable. Is deadcode able to detect these functions and not report them as dead code if you list them in __all__?
I think so. I can't answer for sure now, but it might become a new feature to implement.
OK, thanks.
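The two static checks raised in these questions can be sketched with `ast`. The hypothetical `unreachable_lines` reports the first statement that follows a terminal statement (`return`, `raise`, `break`, `continue`) in the same block, and the hypothetical `exported_names` reads string literals from a module-level `__all__` list or tuple, which a checker could then treat as used. As the answer above says, it is not confirmed that deadcode handles `__all__` this way; this is only one possible approach.

```python
import ast

TERMINAL = (ast.Return, ast.Raise, ast.Break, ast.Continue)


def unreachable_lines(source: str) -> list[int]:
    """Line numbers of the first statement after a terminal statement in each block."""
    found = []
    for node in ast.walk(ast.parse(source)):
        for attr in ("body", "orelse", "finalbody"):
            block = getattr(node, attr, None)
            if not isinstance(block, list):  # e.g. Lambda.body is an expression
                continue
            for prev, stmt in zip(block, block[1:]):
                if isinstance(prev, TERMINAL):
                    found.append(stmt.lineno)
                    break
    return sorted(found)


def exported_names(source: str) -> set[str]:
    """Names listed in a module-level __all__ = [...] or (...) assignment."""
    names: set[str] = set()
    for node in ast.parse(source).body:
        if (isinstance(node, ast.Assign)
                and any(isinstance(t, ast.Name) and t.id == "__all__" for t in node.targets)
                and isinstance(node.value, (ast.List, ast.Tuple))):
            names.update(
                elt.value for elt in node.value.elts
                if isinstance(elt, ast.Constant) and isinstance(elt.value, str)
            )
    return names


SOURCE = """
__all__ = ["public_api"]

def public_api(x):
    if x:
        return 1
        print("never runs")
    raise ValueError(x)
"""

print(unreachable_lines(SOURCE))  # [7]
print(exported_names(SOURCE))  # {'public_api'}
```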
Hi, thanks very much for your talk. This is also half a feature request, but if you're seeking to close the gaps that Ruff leaves, it may be worth thinking about. This is just a toy example, but it's more about the general idea. Say I've got the code a = b = 1, a chained assignment. I then write code that does something with a, and some code that does something with b. But then I decide to get rid of the code that does stuff with b. So b will still technically be used, because it is part of the chained assignment, which is really just two assignments one after another. But clearly, I'm not doing anything productive with it. I'm fairly confident this would slip through the cracks with Ruff, but it might be worth thinking about these kinds of cases as well, because to me that seems like dead code, even though it is technically used.
Yeah, for sure. It's a good insight.
It will probably be hard to implement those checks, but it could definitely be added to the requested features.
Thanks for the talk. I have a small question. Sometimes we have unused variables that we don't need: we assign the return value of a function to a variable, but never check it. If we delete the whole line, we also delete an important function call, but we only need to delete the variable. Is there any way to solve that?
Yes, I have thought about this case. Currently, the whole line would be removed. But I am thinking about an option which would adjust this behavior, so that only the assignment to the variable is removed and the function call is left as it was.
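Both cases, a chained assignment with a dead target and an unused variable that holds a call's result, come down to removing an assignment target rather than the whole statement. A minimal sketch with `ast.NodeTransformer`, assuming the set of unused names has already been computed (here it is passed in by hand): unused targets are dropped from an `Assign`, and if no target remains, a call on the right-hand side is kept as a bare expression so it still runs. `DropUnusedTargets` is a hypothetical name, not a deadcode option; the sketch assumes only calls have side effects and does not handle a function body that becomes empty.

```python
from __future__ import annotations

import ast


class DropUnusedTargets(ast.NodeTransformer):
    """Remove unused names from assignment targets instead of deleting whole lines."""

    def __init__(self, unused: set[str]) -> None:
        self.unused = unused

    def visit_Assign(self, node: ast.Assign) -> ast.stmt | None:
        kept = [t for t in node.targets if not (isinstance(t, ast.Name) and t.id in self.unused)]
        if kept:
            node.targets = kept  # a = b = 1 with b unused  ->  a = 1
            return node
        if isinstance(node.value, ast.Call):
            # result = save() with result unused  ->  save(), so the call still runs
            return ast.copy_location(ast.Expr(value=node.value), node)
        return None  # x = 1 with x unused: nothing worth keeping


def fix_assignments(source: str, unused: set[str]) -> str:
    tree = DropUnusedTargets(unused).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


SOURCE = """
a = b = 1
print(a)
result = save_to_database()
"""

print(fix_assignments(SOURCE, {"b", "result"}))
```

Because `ast.unparse` drops comments and normalizes formatting, a production tool would more likely edit the source text at the recorded positions instead.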
Thanks.
Hello, these next three questions are from our remote attendees. The first one is: what are your thoughts on classifying code that is only temporarily dead? A real-world example: a customer asks us to remove a third-party API, but asks to enable it again after one year. During this time, we have done refactors around the surviving third parties, potentially simplifying them because there are fewer moving parts. If we remove the feature, we have to do a bit more work to re-enable it (one more refactor, abstraction, etc.). If we keep it alive, it costs more to do those refactors, but less time to re-enable the feature. Where is the borderline?
Well, probably the answer is: it depends. It would be possible not to apply deadcode fully to the places where the code is expected to be used again. But as a good practice, since we try to keep only working code in our code base, it would be possible to use Git to restore the removed code later.
OK, here's the second question.
Why did you decide to write your own package to solve the problem instead of adding the missing functionality to Ruff, Pylint, et cetera? Might deadcode someday be integrated into existing linters?
The main reason was that this way I am able to prototype and experiment much more flexibly and quickly. I tried to imagine what a modification of the Vulture package would look like, and it seemed that I would have to add twice as much code as there was in the Vulture package, and some of those changes would be breaking. So I decided that it is much easier to create a new tool instead of trying to fight with the owners of the Vulture package. I'm currently prototyping and trying to standardize this procedure, and when the deadcode package is stable, I'll probably try to migrate the logic into an existing linter, most likely Ruff. I saw that there is a ticket on Ruff's GitHub page which discusses an interface for external developers to write their own plugins, so I hope this interface will be created and we will be able to create plugins for Ruff.
OK, thank you.
Thank you very much.