Clean code in Python
Formal Metadata
Part Number | 135
Number of Parts | 169
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/21182 (DOI)
EuroPython 2016, 135 / 169
Transcript: English (auto-generated)
00:00
Luke, this is Mariano Anaya, he's going to talk to us about clean code in Python. Let's hear a big applause. Hello everyone, thank you for coming. Let's talk about clean code in Python,
00:23
and software quality in our favorite programming language. First, a bit about me. My name is Mariano, I work at Onapsis as a software developer. I'm interested in open-source technology, software architecture, high-level design, Linux, and Python of course. Feel free to contact me or reach me by any of these means if you're interested in talking about some of these concepts after the talk.
00:44
So, before we start with definitions, I'd like to make a few comments first. The code in the slides is written in Python 3, but there shouldn't be any problem at all if you're using another version of Python, so don't worry about that. Second, what I'm about to tell you is by no means something strict or rigid,
01:06
or something you must implement; instead these are some ideas or guidelines. Some of them are opinions, so you might think otherwise, and that's perfectly okay. So, in line with that, we can say that there's no single definition of clean code,
01:20
and instead you will find as many definitions as developers and authors available out there. So, let's try this one which says that clean code is one in which every function does pretty much what you would expect, and that you can call it beautiful code when it also makes it look like the language was made for the problem. And the reason why I picked this quote is precisely because of that last statement,
01:41
because that's what we call Pythonic code, or code that is idiomatic in Python. And we'll see examples of that and how we can achieve that. So, to have a common ground of understanding, we can say that clean code is focused, which means it does one thing well, and that thing that it's doing
02:00
should be pretty much what you would expect, so the code should not be misleading, or error prone, or confusing, instead it should be clear. And this is important for many reasons, because arguably the quality of the code will determine the quality of the software. We all know there is a strong correlation between a poor code base and software that has a lot of errors and is hard to maintain.
02:22
Whereas the opposite is also true: if you can keep your code clean, readable, and understandable, it is much less likely that there are errors and problems in the code, and if there are, they will be easier to spot. So, readability counts, as we know from the Zen of Python, and it makes total sense, because if you think about it,
02:42
as developers we spend much more time reading code than actually writing code. Whenever we want to make a change, or add a new feature, we first have to read all the surroundings of the code we're going to modify or extend. And the extent to which you can read the code and actually tell what it's doing is what ultimately will determine or define how fast we can ship
03:02
new changes in the code or new features, so it's related to agile development. We all know that a previous mess slows you down and keeps you from shipping new functionality faster. Last, we can mention that the code is like a blueprint, it's like another model that you have where you should represent the business logic and the requirements of what you try to do,
03:22
so it has to be readable to be useful. On the other hand, we have some scenarios where we have exactly the opposite of what we want. For example, complex obfuscated code, code that is misleading, or that has misdirections. Duplicated code is like the worst thing we can have in the project, and code that is not intention revealing,
03:40
and that instead of revealing the business logic or the business requirements, it's exposing implementation details, which should be encapsulated or abstracted. And this is all part of technical debt. There are many kinds of technical debt, but having a poor code base is arguably one of the worst, and to make things worse, technical debt is also invisible.
04:02
It's not only negative for the project, but it's something that is sometimes hard to spot or identify. So, let's try to see some examples of this with Python code. The very first example is something really simple, which is speaking about meaning in the code. Let's consider this function that, given a year,
04:22
it should print one line per day of the year. You think that it works, and yes, it does work, it does what it tries to do, but now suppose, I don't know, six months or a year has elapsed since you first wrote it, and you find yourself trying to figure out what it's trying to achieve or trying to do, and you see that it's trying to do some calculations
04:42
and see if the year is divisible by some numbers, and you cannot actually spot what it's trying to achieve, but you find that if that condition is met, it's adding an extra day. So you say, okay, maybe that's trying to figure out if the year is a leap year or not. But that's the problem. The fact that you have to guess is the problem. You shouldn't be guessing. The code should actually be telling you what it tries to achieve.
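The slide code is not part of the transcript, so the following is only a sketch of the idea under assumed names (`is_leap`, `days_in_year`, `print_days`): the divisibility check gets a function of its own, and the loop that prints the days reads in terms of that name.

```python
def is_leap(year: int) -> bool:
    """Return True if `year` is a leap year in the Gregorian calendar."""
    # Divisible by 4, except century years, unless also divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_year(year: int) -> int:
    """Number of days in `year`; the extra day now has an explicit reason."""
    return 366 if is_leap(year) else 365


def print_days(year: int) -> None:
    """Print one line per day of the given year."""
    for day in range(1, days_in_year(year) + 1):
        print(f"Day {day} of {year}")
```

A reader coming back to `print_days` months later no longer has to reverse-engineer the arithmetic; the name carries the meaning.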
05:02
So if I want to know if a year is leap or not, I'd rather have a function, let's call it is_leap, given a year. So I can read the code, and I can actually replace that anonymous statement with something meaningful in the code. This is a very simple yet powerful thing you can do in order to increase the readability of your code,
05:21
because it's not about reusing code. It's about separating concerns, differentiating different problems into different layers, and having an organization in the project. Remember that functions are the first line of organization in any project. Functions do one thing, one thing only, and do it well. Starting from this very simple example,
05:42
we can say that it's actually related to code duplication, because if you think about it, most of the times, code is duplicated because it didn't have a proper abstraction or a name for it. So you might say, okay, let's say we have a validation in some part of the code, and we say, okay, I need to add a similar validation. So someone might say, okay, let's copy this line from here to here,
06:03
paste it here, let's change this number, the one by two, and I'm all set. But that's actually not quite right, because maybe you introduced some duplication in the code inadvertently. And the reason why that happened is because it didn't have a name, it didn't have a proper abstraction. We all know that we want to avoid code duplication at all costs,
06:23
because duplicated code forces you to do parallel changes. You have to change things in many, many places in the software at the same time. And if you forget one of those, you have a bug or there is a problem. So instead, we don't want duplication, and you can remember the DRY principle,
06:41
the DRY acronym for this, which stands for Don't Repeat Yourself. Things must be defined once and only once in the project, in order to be efficient in the work. So this has been a sought-after principle in software development over the past few years. And in that regard, there have been many enhancements and progress.
07:01
For example, libraries, frameworks, tools, design patterns. Those are great, and it's usually a good idea to have those in mind. On top of that, we can say we have a so-called extra tool when it comes to Python, which are decorators. I will not explain all the details about decorators, because it's too extensive a topic,
07:21
and it's worth a talk of its own. I will only mention what's relevant for the purpose of this talk, which is addressing code duplication. The general idea is that you can have some functionality abstracted in one place, but repeated or reused many, many times. Let's see this with an example. So let's say you have a maintenance database task
07:42
that is part of a framework, called like this. And the high-level idea is that it is made out of a sequence of commands, and these are going to be executed. So the first task is to update the index of a database in Postgres. So I have a sequence, which is a constant of one command,
08:02
and the logic follows like this. I execute every command in the sequence with the cursor which is provided. If there is an error or something went wrong, I log the exception and return minus one. Otherwise, I log that it worked fine and return zero. So far, so good. Let's assume that these are the valid codes that the framework requires,
08:21
but then another requirement arises, and says that I need to move some records from a table to an archive table to only leave the most recent rows in the table. So I say, okay, I can do that in two commands, two SQL statements, one for inserting the rows into the archive table,
08:41
and then a second one for deleting the affected rows from the main table. But then, as part of the framework and the logic, I have to preserve the exact same logic. So I can do something like, okay, for every command in the sequence, I execute it with a cursor which is provided. If an exception occurs, I log the exception and return minus one.
09:01
Otherwise, I log that it worked fine and return zero. This actually also works, but now we see the problem. The problem is that I have to preserve the logic, but this is exactly the same code as before. Starting from the try except block, it's the exact same lines as before. The reason why that might happen is because those lines
09:20
that were in charge of doing the error handling and logging didn't have a proper abstraction, location, or name for it. So it's kind of similar to the very first example with the leap year. There was some code that didn't have a name, didn't have an abstraction, so it was actually very error prone to duplicate the code. Let's see if we can address that with a decorator.
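For reference, the duplicated version described so far might look roughly like the sketch below. The function names, the SQL strings, and the table names are assumptions made for illustration; the only thing taken from the talk is the shape of the logic (run each command with the cursor, return -1 on error, 0 on success).

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical commands; the real statements are not in the transcript.
INDEX_COMMANDS = ("REINDEX DATABASE transactional",)
ARCHIVE_COMMANDS = (
    "INSERT INTO archive_orders SELECT * FROM orders WHERE order_date < '2016-01-01'",
    "DELETE FROM orders WHERE order_date < '2016-01-01'",
)


def db_update_index(cursor):
    """First task: one command, plus error handling and logging."""
    try:
        for command in INDEX_COMMANDS:
            cursor.execute(command)
    except Exception:
        logger.exception("Error in db_update_index")
        return -1
    else:
        logger.info("db_update_index worked fine")
        return 0


def db_move_data(cursor):
    """Second task: different commands, but the same try/except copied over."""
    try:
        for command in ARCHIVE_COMMANDS:
            cursor.execute(command)
    except Exception:
        logger.exception("Error in db_move_data")
        return -1
    else:
        logger.info("db_move_data worked fine")
        return 0
```

Everything from the `try` onward is the same in both functions; only the sequence of commands differs.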
09:41
So the idea is that I create a new function, which I'm going to call dbStatusHandler, which is going to be a decorator. So it assumes it receives a function, which is the one that is going to be decorated. Inside it defines a new function, and there I can put the logic for doing the error handling. I execute every command in the sequence.
10:00
Again, if an exception occurs, I log the exception and return minus one. Otherwise, I log that the task completed well and return zero. This assumes that the function that's being decorated provides the sequence of commands. So there's an interesting thing here. First, we have a name for that. It's no longer anonymous code.
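A minimal sketch of such a decorator follows, spelled `db_status_handler` in snake case. The task names and SQL strings are again assumptions; the sketch also assumes, as the talk states, that each decorated function simply returns its sequence of commands.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def db_status_handler(task):
    """Run the commands returned by `task`, with shared logging and status codes."""

    @functools.wraps(task)
    def wrapper(cursor):
        commands = task(cursor)  # the decorated function only provides commands
        try:
            for command in commands:
                cursor.execute(command)
        except Exception:
            logger.exception("Error in %s", task.__name__)
            return -1
        else:
            logger.info("%s completed", task.__name__)
            return 0

    return wrapper


@db_status_handler
def db_update_index(cursor):
    # Hypothetical command, for illustration only.
    return ("REINDEX DATABASE transactional",)


@db_status_handler
def db_move_data(cursor):
    # Hypothetical archive-then-delete statements.
    return (
        "INSERT INTO archive_orders SELECT * FROM orders WHERE order_date < '2016-01-01'",
        "DELETE FROM orders WHERE order_date < '2016-01-01'",
    )
```

With this shape the error handling lives in one place, and adding a third task means writing only its commands.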
10:20
Now it's called dbStatusHandler, and it's in charge of one thing, which is the error handling and executing the commands. So now the previous two functions can be changed to use the decorator. So I can remove all that duplicated logic and instead only return the sequence of commands, which is the relevant part. But they're still preserving the same logic
10:40
because they're being decorated by the dbStatusHandler on top of the function definition. So we actually did three things with this change. First, we assigned a name to the previous anonymous code, which now is called dbStatusHandler. Now there is a separation of concerns in the logic. Remember that the decorator now only handles running the commands and the logging and knows nothing about the commands themselves,
11:02
whereas these functions have the opposite behavior. These functions only return the sequence of relevant commands and know nothing about the error handling. And third, we removed the duplicated code and we have it defined only once. So this was a cool way to use decorators
11:22
to address duplication. This is somewhat related to another topic, which is managing implementation details. The idea is that you have to run a task as part of your core functionality, but you also have to do some other things that are related to the technology, which you cannot avoid or escape from.
11:40
And the idea is similar to the one before. We do not want to mix up different things into the same problem. We still want the same logic separated into smaller pieces. So let's see what Python has to offer for each scenario. Let's consider a very simple example. Let's say you have a web application,
12:02
an online game with players playing online. And I have a requirement that says that when a player finishes the game, I need to update the score with the new points that player has just earned for the match it just finished. So the idea is that given a player status object,
12:20
I call accumulate points with the new points that I have to set for that player. And you might think this works, but if you take a closer look, you'll see that this is not very good because it's mixing up implementation details with business logic. All I wanted to do is to add new points for a given player, but instead
12:40
I'm having to deal with a Redis connection, with a key, a zero, which seems to be a default value, and an integer. Then follows what I actually want to do, which is to add a new score for the player, and then again another implementation detail, another technical detail with the set value. So instead, what I would like to have is not the previous accumulate points method,
13:02
but instead just points, as if it were any other regular Python variable, because it's easier to follow and understand. So if I want to get the points of a player at any given time, I just type points. The same for the set. So the previous accumulate points, there's nothing particular about it, it's just plus equal 20 on any number
13:23
as I do with any other regular Python variable or attribute. So in order to achieve that, we're going to use the property decorator, which is a built-in decorator in Python. And the idea is that you define points to be a method of the class, and there we can move and encapsulate and abstract the implementation details. So from where it's called,
13:41
it doesn't know what's behind it, and that's a good thing to have. It only knows about the points, and we have two methods or two properties that are smaller in size and easier to understand. So you can use the property decorator to run some computations based on other object attributes. You should prefer this approach instead of writing
14:01
custom getter and setter methods, because the code will be easier to read, to follow, and also to test, because now you can test your code with anything that has a points attribute. You don't have to have a Redis connection for running your tests, or mock the connection, or patch it, et cetera. So it's easier to understand and follow.
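A sketch of the property-based version is shown below. To keep it runnable without a server, it uses a small in-memory stand-in (`InMemoryStore`, a hypothetical class with `get` and `set`) in place of the Redis connection mentioned in the talk; the class name `PlayerStatus` and the key are also assumptions.

```python
class InMemoryStore:
    """Hypothetical stand-in for a key-value client such as a Redis connection."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None when the key is missing

    def set(self, key, value):
        self._data[key] = value


class PlayerStatus:
    """Exposes a player's score as a plain attribute, hiding the storage details."""

    def __init__(self, store, key="points"):
        self._store = store
        self._key = key

    @property
    def points(self):
        # Default to 0 for a new player and normalise the stored value to int.
        return int(self._store.get(self._key) or 0)

    @points.setter
    def points(self, new_points):
        self._store.set(self._key, new_points)


player_status = PlayerStatus(InMemoryStore())
player_status.points += 20  # reads via the getter, writes via the setter
```

The caller only sees `points`; swapping the storage backend would not change any of the calling code.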
14:22
It's much more Pythonic. Now let's suppose I have a web application, in this case, that is an online store that has a stock representing all the products that are in the store. The products are divided into categories, and I have a view like this one
14:40
that says, like, request product for customer, that is going to handle the scenario when some customer is trying to make a purchase of a product online. If you care about code quality, you will find that this is a bit hard to read. In particular, these lines are not very expressive, but you still go to the trouble of reading those lines, and you find a clue near the if statement when it says
15:00
if product available in stock. So you say, okay, maybe the previous lines were actually trying to figure that out. Again, I'm guessing at the anonymous code. So you say, if I want to figure out if a product is available in stock, I might rather just simply write that, like if product in current stock. That actually makes perfect sense: it's speaking in terms of the problem domain and it's self-documenting.
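As a sketch of how `product in current_stock` can be made to work: Python evaluates the `in` operator by calling the container's `__contains__` method. The `Stock` class, its category mapping, and the view function below are assumptions for illustration, not the code from the slides.

```python
class Stock:
    """Products grouped by category; supports `product in stock`."""

    def __init__(self, categories):
        # Assumed shape: mapping of category name -> set of product names.
        self._categories = categories

    def __contains__(self, product):
        # The search over categories is encapsulated here.
        return any(product in products for products in self._categories.values())


def request_product_for_customer(customer, product, current_stock):
    """Hypothetical view logic expressed in terms of the problem domain."""
    if product in current_stock:
        return f"{product} reserved for {customer}"
    return f"{product} is not available"


current_stock = Stock({"books": {"clean code"}, "music": {"vinyl"}})
```

How the lookup is implemented (sets, a database query, a cache) can change later without touching the view.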
15:21
It doesn't need a comment, it doesn't need an explanation. And the way this works is because whenever you write something like that, if product in current stock, Python silently translates that by calling the __contains__ method, the so-called magic method because of the double underscore, and passes the product as a parameter. So the idea is, like, okay, now we know that I can implement the __contains__
15:42
method in the stock class and have an interface like the one I had before, which actually also made sense. So the search algorithm can be encapsulated away in the class, which actually makes sense to me. Another case for managing state or handling scenarios is when we have code that, again,
16:01
runs a core functionality in your project, but you're also required to do certain tasks. For example, you might have code that has preconditions or post-conditions or both. For example, you're connecting to a server, and you want to make sure that after processing the data you need, it's closing the connection or releasing
16:22
the resources it allocated. So the problem here is that we might, again, run the risk of mixing up those things, when instead they should be separated into different layers. So let's see if we can do that with a context manager,
16:40
for which the idea is fairly simple. Let's say I have to run an offline database backup, so my backup requires that the database service is stopped before running, then run the offline backup, and then, of course, I want to make sure I'm leaving the database service up and running again. So instead of trying to put the stop database service and start database service
17:02
inside the run offline backup, where they don't belong and make the code more coupled, we can separate that into a handler, which is going to be called with a context manager like this. So when I write with and an object that implements the context manager protocol,
17:21
Python silently translates that and calls the __enter__ method automatically, which in this case will stop the database service, then follows the with statement, the block, where I can run the core functionality that I actually want, which in this case is running the backup, and then after that statement completes, it automatically calls the __exit__ method, even if an exception
17:41
occurs. So it's making things easier because I don't have to do the error handling myself or manage edge cases or scenarios. I can be sure that even if this fails or there's something wrong, the database service will be left up and running no matter what. You can slightly improve this by using ContextDecorator,
18:01
which is an interface provided in contextlib, and once you inherit and extend that interface, you implement the __enter__ and __exit__ methods, and once you have that, you can use that as a decorator for the function. So in this case, if I call it like this, whenever I call the decorated function, it is going to be run automatically
18:21
inside a context manager calling the __enter__ and __exit__ methods. So from all of this, we can draw the conclusion that there's always a much more Pythonic way to write things. The best way to write Pythonic code is to actually take advantage of the features of the language. I would conclude that you're achieving that if your variables or your objects
18:41
are playing well with Python's reserved words, like two pieces of a puzzle that match together in a way that makes sense, like in the stock example. Sometimes, in order to achieve that, the most common answer is a magic method, but note there are many other tools, many other magic methods or features of the language such as descriptors,
19:01
etc. So if you're just beginning in Python, I really encourage you to try to find these tricks in Python in order to transform your code into a much more expressive one. And if you're an experienced developer, you might use these examples as ideas in order to provide feedback in a code review or in a pull request. Try to see
19:22
if the code is Pythonic. So to wrap up, we can say that the best way to write Pythonic code is to actually take advantage of the features of the language. Sometimes that means using a decorator, removing duplication, using a context manager, etc. Aside from that, you also have some
19:41
standards that can help you write better code that can be understood by the team. For example, PEP 8 or coding guidelines. But note that you can be 100% compliant with them and still not have Pythonic code, because they're actually looking for different things. Although it's a good idea to keep them in mind. Try to put docstrings or function annotations if you want to give your readers a hint
20:01
about what to expect from any function in the code. And everything that applies here for the so-called production code also applies to unit tests. You should also keep your unit tests clean and well structured so they are useful. And also, it's a good idea to use tests and TDD because you will naturally follow this logic of trying to define
20:21
smaller pieces because you will want to make your code testable. And in addition to that, we have finally some tools that we can use to provide metrics for the code. We have pycodestyle, Pylint, and Radon. You can run these in your project and they will give you metrics such as cyclomatic complexity, maintainability index, etc.
20:41
that you can use as a head start to know where the code most urgently needs improvement. And finally, coala, which works with the previous tools but also lets you define your own standards to be run or checked automatically as part of your continuous integration environment. If you are particularly interested in some of the topics, you can find more information in some of these sources that I used as support for the
21:02
talk. That will be all. And if you have any questions, we will answer them. Thank you very much. I think we have time for questions. We have time for a couple of questions.
21:20
Anyone? No questions? Thanks to the speaker again.