The joy of deleting code
Formal Metadata
Number of Parts | 130 | |
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this | |
Identifiers | 10.5446/49973 (DOI) | |
EuroPython 2020 (30 / 130)
Transcript: English (auto-generated)
00:07
Let's start with the next speaker. Please welcome our next speaker, Radosław, who is joining us from Poland. He is an experienced Python developer and works mostly with Django. He will be talking about the joy of deleting code. Let's start with the talk. Over to you, Rad.
00:26
Hello. I am Radosław, and I will tell you about the joys of deleting code. In this talk, I will first share some different thoughts I have about
00:43
deleting code. I will tell a bit about my story, and then I will focus on the concrete methods that I used or that I recommend. Let's start. Why do you want to delete code? The code that
01:04
lives in our repositories, whether it is legacy or freshly developed, might contain in some places code that is not used. We don't want this code to bloat our software, our repo.
01:20
We just want to get rid of it. You may ask yourself why. Why do you want to delete code? Because less code means fewer bugs. It's as simple as that. Hard to argue with. However, when you delete code, you can also introduce more bugs. When you, for example,
01:40
have multiple places where the same code is used and you make a function out of it, you might miss some minor differences, and that will introduce bugs. We need to be extra careful. Also, especially in Python, this will bite us many times, in many places. Sometimes our
02:01
code is referenced only in YAML files, like classes named in an OpenAPI definition, or it is used by another module in a way we don't see. There is also code that is used only in a particular environment. For example,
02:23
our code coverage will show us that it is not used, but it is used, just somewhere else. Also, it is very easy to fall into this intimidating mindset that real programmers delete
02:41
code. Additional, unneeded code is not something that we want to have, and part of our professionalism is to detect and delete what we don't need. However, this statement is as true as any other statement we can coin.
03:04
It only makes us a bit sad, a bit guilty, but it doesn't change anything. Every case is different, and for example, if we have the same code in different layers of the
03:20
program, we might need to keep it because the copies have different purposes. Even if they look the same, maybe they will evolve into something different. Also, there is the WET principle, write everything twice, which opposes the DRY principle. It is also commonly said, in other words,
03:40
that duplicates start from three. If you have two duplicates, it's okay. If it's three, okay, let's factor it out. So, yeah, there are many problems, but sometimes you just need to speak up, because there are projects, especially with legacy code,
04:04
where you won't understand what the code does. You won't understand how the code is used. The number of WTFs per hour will be very high. The rotten code, the code where dragons live, maybe it would be good to delete it instead of spending months on figuring out how it really works.
04:28
So, sometimes you just need to do it. Just get rid of unneeded code, and whatever happens, happens. Of course, it's very careless, but for example, if you have a legacy project,
04:42
no CI, no test coverage, nothing that could help you detect the possibly unused places, you can sometimes just remove something, but test it on a test instance, not in production, of course. Sometimes it's the only way to see what happens.
05:07
Also, many people don't like deleting code. They say, oh, we are going to need it, or we are going to need it again, which is the worst, because this code was used some time ago, so maybe we will use it in the future. It's a trap. Don't fall for it. There's a principle: you ain't gonna
05:26
need it. So, right now you ain't gonna need it. Maybe in the future you will, but maybe you will need it in a different form. And if you are adding code that might be used in the future, you are just bloating your work. You will not be able to finish the task in the sprint,
05:45
because you added something that you think would be useful. On the other hand, if something is really going to be needed and you really know this, maybe it would be worth extracting it to a separate library or module so that you will
06:04
see that, okay, here is some stuff we are going to need, but it lies on the side, so we know that it won't break our test coverage or something like this. It just lies there and waits for its turn. But I don't really recommend this. Maybe another package or
06:25
something like this, I don't know. Sometimes there is a situation where you have a divine revelation about how to solve something that you don't need to solve now. Maybe just put it in a Gist or a file in your file system, not in the repo itself.
06:48
Also, yes, I mentioned legacy code a couple of times. Legacy code is a very difficult case, because if you feel like commenting out legacy code, it means you need to delete it. In most
07:02
modern projects, we use Git or another version control system, so nothing is really lost. The code is somewhere there, and if we need it, we'll get it back. The problem is that commented-out code rots a lot faster than the regular code that surrounds it.
07:30
So, at the end of all these random thoughts about code: you need to just do it, but don't do it carelessly, and don't fear the outcomes too much.
07:46
In modern projects, we use continuous integration and other tools, and they can help you spot the possible problems, the possible bugs, as soon as possible. So, you just go ahead, do it, and
08:01
see how good it feels to delete something and to get your program a bit lighter and a bit more understandable. So, a second about my story. I'm a developer. I have worked with Python for ten years, and I have changed jobs and projects quite often. Counting along with some
08:30
side projects, half of all my projects contained legacy code, and by legacy code, I mean not something written a month ago, but a year ago or five years ago. And half of that half
08:46
was in such bad shape that it deserved to be rewritten, but we didn't have the time to do this, or we didn't have the courage to do this, and that led us to constant refactoring and making
09:03
the code better. There was a project that we refactored for a year, and then even our manager said: guys, this is leading nowhere. Let's rewrite it all and do a smooth transition from the old code to the new. So, sometimes all the people around you see that you should
09:24
give up and rewrite it, but you don't see it, so it's good to have perspective. But suppose we are given a legacy project, and we need to refactor it. It is working. It needs to be
09:40
working. We don't have time, so the first thing you think of doing in such a case, at least I do, is deleting code, because less code means less refactoring. What's the purpose of writing unit tests for legacy code if you know that some part of it
10:04
will be deleted? Of course, it is extremely dangerous, because you don't know the code. Part of the code looks unused, and if you have any way of finding out, of checking it in an environment, it might be good to just drop the code that looks dead and see what happens.
10:27
A friend of mine once told me that in his project, there was a thousand-line function with big chunks of code under if statements with contradicting conditions, so
10:48
it was just unreachable code. When he cut out the unreachable code, the result was a hundred-line function, not a thousand. A hundred-line function is still big, but it's better than a thousand.
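The situation in that anecdote can be sketched in a few lines. This is a made-up miniature, not the friend's code: the function names and the numbers are hypothetical, and the only point is that a contradictory condition makes the whole block under it dead.

```python
def shipping_cost(weight_kg: float) -> float:
    """Hypothetical legacy function with a branch that can never run."""
    if weight_kg > 20 and weight_kg < 5:
        # Contradictory condition: no number is both above 20 and below 5,
        # so everything in this block is unreachable.
        return weight_kg * 99.0
    if weight_kg > 20:
        return 15.0
    return 5.0


def shipping_cost_trimmed(weight_kg: float) -> float:
    """The same behaviour after cutting out the unreachable block."""
    if weight_kg > 20:
        return 15.0
    return 5.0
```

Both versions return the same value for every input, which is the property worth checking on a test instance before the dead block is removed for good.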
11:06
How do we prevent having unused code, or how do we find it? Let's grab the first and easiest case: unused imports. It is basically a fully automatic approach. There are tools that can help
11:23
you: isort. isort will not find the unused imports, but it will get rid of duplicated imports and sort everything neatly, and Pyflakes or Pylint will show the unused imports. This is not a big deal, not a big gain, but it will make your code clearer,
11:43
and it will be useful for the next steps. So, the next step is unused packages. Here we take advantage of the previous step. For example, there's a simple, or maybe not so simple, bash command that extracts all imports from your
12:06
project, and you can compare them with your requirements file or setup file dependencies. Of course, some of the imports will be from the standard library, so you need to recognize those,
12:23
but if you can't find one of your requirements among the imports, it might mean that it is unused. Of course, there are some packages, like development tools, which are not imported, like Pylint or pytest. Well, pytest is imported. Or Gunicorn, which is not
12:42
imported in your code, but is still used. So, you need to have some knowledge of what you have in your code base, but it will help you to get the obvious candidates. Okay. So, unused modules. Modules are big animals,
13:06
so it is easy to track them, and we do this just like with packages. We see what's imported. We can use the same grep as before, but find only things imported from our
13:24
own project. We can also look at code coverage, and not only test coverage, because the coverage module can also track, for example, what is used when you use the site. If you have a web app, you can run your server under coverage, do some things,
13:42
and see what code was actually run. So, if you don't have tests, you can use coverage this way. But also here, as I said in the beginning, modules can be imported in an untrackable way, especially in Django, where middleware entries are strings,
14:04
strings with the module path, or installed apps. So, yeah, you need to think about it. Okay. So, let's talk about the advanced level. There's a tool called Vulture that finds
14:26
unused code, but it gives us lots of false positives. As you can see here, the variable fieldsets is used by the Django admin, not explicitly in the code. It is automatically found and used by Django, or imported in the settings. So, there are many problems with it.
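The earlier step of extracting every import and comparing the result with the requirements file can also be sketched in Python rather than bash. This is a minimal sketch, not the command from the talk: the function names are hypothetical, and it assumes that a requirement's name equals its import name, which is not always true (PyYAML, for example, is imported as `yaml`).

```python
import ast
import sys


def top_level_imports(source: str) -> set[str]:
    """Return the top-level package names imported by a piece of source code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports, which point at our own code.
            names.add(node.module.split(".")[0])
    return names


def unused_requirements(sources, requirements):
    """List requirements that no source file imports (candidates only)."""
    imported = set()
    for source in sources:
        imported |= top_level_imports(source)
    # Standard-library modules never appear in a requirements file.
    # sys.stdlib_module_names exists on Python 3.10+, hence the fallback.
    imported -= set(getattr(sys, "stdlib_module_names", ()))
    imported_lower = {name.lower() for name in imported}
    return sorted(r for r in requirements if r.lower() not in imported_lower)


sources = ["import os\nimport requests\nfrom django.db import models\n"]
requirements = ["requests", "django", "gunicorn", "pylint"]
print(unused_requirements(sources, requirements))  # -> ['gunicorn', 'pylint']
```

The two names it reports are exactly the kind of packages mentioned above: tools that are never imported but are still in use, so the list is a set of candidates to check by hand, not a list of things to delete.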
14:45
But for some things, like classes or functions, the findings of Vulture can be helpful, but you always need to remember that Vulture doesn't know everything. It just looks at the code. Also, if you run Vulture on your code, it is good to exclude, of
15:08
course, the tests, because even if your code is unused, it might still have tests that import it. Finding and removing dead code is complicated. It's better to just keep your code clean.
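Why a tool that only looks at the code reports false positives can be shown with a toy checker. This is a deliberately naive sketch and not Vulture's actual algorithm; `ArticleAdmin` is a hypothetical stand-in for a Django admin class and does not import Django.

```python
import ast

SOURCE = '''
class ArticleAdmin:
    fieldsets = [("Main", {"fields": ["title"]})]

def helper():
    return 1

def entry_point():
    return helper()
'''


def defined_but_never_referenced(source: str) -> list[str]:
    """Report names that are defined in the source but never read in it."""
    defined, referenced = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                referenced.add(node.id)
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)
    return sorted(defined - referenced)


print(defined_but_never_referenced(SOURCE))
# -> ['ArticleAdmin', 'entry_point', 'fieldsets']
```

Only `helper` is recognized as used. The class and its `fieldsets` attribute would be picked up by a framework at runtime, and `entry_point` might be called from another file, a settings string, or a test, so every reported name still needs a human check.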
15:25
For example, with classes, clean architecture or hexagonal architecture can help you, or using mixins, because mixins won't be referenced in config files. If you see an unused mixin, it's probably really unused. Or if you have registered classes, or just group classes somehow,
15:48
it is easier to spot the unused ones. About methods, it's pretty much the same. For methods, it's good to keep interfaces or protocols. For functions,
16:05
you can split them as small as you can, so that you will see on the coverage report which functions are unused. But here, and with attributes,
16:25
there is dynamic access with getattr in Python, which can prevent us from finding this out. And about class attributes, it's good to group them into data classes or named tuples, so that if we have to remove something, we only have to remove it in one place.
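The dynamic access mentioned above might look like the following. It is a minimal sketch with a hypothetical `EventHandler` class, meant only to show why a method can look unused to a text search or a static tool while it is in fact called.

```python
class EventHandler:
    """Hypothetical handler whose methods are looked up by name at runtime."""

    def handle_created(self, payload):
        return f"created {payload}"

    def handle_deleted(self, payload):
        return f"deleted {payload}"

    def dispatch(self, event, payload):
        # The method name is assembled from a string, so the identifier
        # "handle_created" never appears at any call site in the code base.
        method = getattr(self, f"handle_{event}", None)
        if method is None:
            raise ValueError(f"unknown event: {event}")
        return method(payload)


print(EventHandler().dispatch("created", "article 7"))  # -> created article 7
```

Searching the project for `handle_created(` finds only the definition, so deleting the method on that evidence alone would break `dispatch("created", ...)` at runtime.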
16:48
And for variables, Pylint is pretty good at finding out which variables are used and which are not. So, to sum up, Pyflakes finds unused imports. Mypy keeps class interfaces in check. Pylint
17:02
finds lots of unused stuff, and you can use code coverage, clean architecture, and good practices to help you here. Okay, a quick one about duplicated code. As I said previously, there is DRY and there is WET, write everything twice. And sometimes it's good to
17:22
assume that duplicates start from three, because you don't always need to track down the duplicates. About tools, there are no modern tools that could help you. There is the old CloneDigger, which is pretty awesome at finding duplicated code. Pylint also finds
17:46
duplicated code, but Pylint's findings are exact matches. CloneDigger analyzed the syntax tree and did find, for example, duplicated code with different variable names.
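The difference between exact matching and syntax-tree matching can be illustrated with the standard `ast` module. This is a rough sketch of the general idea, not CloneDigger's actual algorithm; the helper names and the sample functions are hypothetical.

```python
import ast


class _RenameLocals(ast.NodeTransformer):
    """Replace every variable and argument name with a positional placeholder."""

    def __init__(self):
        self.mapping = {}

    def _placeholder(self, name):
        # The first distinct name becomes v0, the second v1, and so on.
        return self.mapping.setdefault(name, f"v{len(self.mapping)}")

    def visit_Name(self, node):
        node.id = self._placeholder(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)
        return node


def fingerprint(function_source: str) -> str:
    """Dump a function's syntax tree with all names normalized."""
    func = ast.parse(function_source).body[0]
    func.name = "_"  # ignore the function's own name
    return ast.dump(_RenameLocals().visit(func))


a = "def total(prices):\n    result = 0\n    for p in prices:\n        result += p\n    return result\n"
b = "def add_up(items):\n    acc = 0\n    for x in items:\n        acc += x\n    return acc\n"
c = "def first(items):\n    return items[0]\n"

print(fingerprint(a) == fingerprint(b))  # -> True
print(fingerprint(a) == fingerprint(c))  # -> False
```

The texts of `a` and `b` differ on every line, so an exact-match comparison would not pair them, while the normalized trees are identical.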
18:01
So, we are pretty much left alone on this road. We need to write good code and keep good practices in mind, so that we won't find out after some time that we have lots of duplicated code. Yes, so we just need to find out what is used where and do a lot of
18:26
reviews and good practices so that they will help us detect any problems. Also, IDEs have many extensions that can track duplicates or unused code, so just check if your IDE has anything like that. And that's all from me. Thank you. I had to
18:56
– thank you, and do you have any questions?
19:10
Yeah, it's time for questions now. Thanks a lot for your talk. It was really good. We have one question. Is Vulture a software? Yes, Vulture is a Python package,
19:23
but it's also a CLI tool. You can install it with pip or with your OS package manager. All right. All right. Any other questions? You can pop in the questions in the Q&A window. Let's see. Where's the Q&A window?
19:53
All right. We'll wait for a couple of minutes, and if no one has any questions,
20:01
you can follow up with Rad in the Microsoft Talk breakout channel. I'm posting the link to the same in the Microsoft Drive channel. Okay, thank you. Yep, thanks a lot, Rad. It was a really nice session. It doesn't seem like there was any
20:28
questions coming. Okay, I now see the Q&A questions. Okay, I'm seeing – okay,
20:48
so that's it, yes? Thank you for keeping the room. Yeah, so there's a question. Do you have a process to detect dead code?
21:07
Um, I usually start with Vulture, and as I said, I start with looking for classes and functions in the Vulture output, because it's most probable that Vulture will correctly detect an
21:27
unused class or function. Also, if the project has some tests, I run the test coverage to see
21:43
if the parts with missing coverage are really used, because sometimes it's easy to spot, especially if you are measuring branch coverage, if you have branch coverage enabled in the .coveragerc file. If you can see a big branch which is uncovered, that means that
22:07
either the tests never touched it or maybe it is really unused and you can find it. So, in most cases, it is just finding possible places where there is unused code and checking it manually
22:26
later, having it verified by a human. Thanks for answering that question. I think that was a really good explanation. Okay, so if no one has any questions,
22:47
maybe we can conclude this session and you can follow up with Rad in the breakout channel named Talk Decoding Joy of Deleting Code. I posted the same in the Microsoft channel.
23:04
You can follow on that. Thanks Rad for joining.