Writing Beautiful Code
Formal Metadata
Number of Parts | 160
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/33673 (DOI)
EuroPython 2017 (149 / 160)
Transcript: English (auto-generated)
00:04
Good afternoon. I'm going to talk about something that's dear to my heart: writing beautiful code. Before I start, let me introduce myself. My name is Anand. I teach Python professionally.
00:22
I conduct advanced programming courses at Pipal Academy. I'm also a co-founder of a startup called rorodata. We're building a data science platform there. I use Python heavily at work and also teach what I learn to students. So let me start with a quote from Christopher Alexander,
00:42
who coined the term "pattern language". It's really hard to say what is beautiful. Have you ever looked at some code and actually felt, wow, this really looks awesome? If you have, raise your hands. Yeah, yeah. So it's really hard to say
01:01
what makes code beautiful. Christopher Alexander calls it the quality without a name. You look at it and you kind of feel it, but it's very hard to put into words. He talks in the context of architecture, but it applies to many other art forms as well, if you consider programming an art too.
01:22
So let me quote this from what's called the Wizard Book, Structure and Interpretation of Computer Programs. It says programs must be written for people to read, and only incidentally for machines to execute. It's a profound statement, because usually we think programs are written just to get some job done, okay.
01:42
So we write for a computer to execute something, but if you look at it deeply, programs must be written for people. The reason is, if you look at the life cycle of any typical computer program, you write it once, the first time, for the computer. For the remaining lifetime of the program,
02:01
someone else will look at the program after some time, and they have to understand the code. A lot of times what happens is we write the code, and after a week or two we just can't make sense of our own program. I'm sure all of us have gone through that phase, okay. So it's very important to write programs keeping in mind the people who are going to read the code at a later point.
02:21
So you should always try to improve the readability of your programs. Let me start with a very simple, low-hanging fruit that everyone knows they should do, but not many people actually do, okay: choosing meaningful variable names. It's so important, yet even people with a lot of experience fail to pay attention to it.
02:43
I teach Python professionally. I run advanced Python courses for working professionals. So I find it really frustrating when people with three or four years of programming experience can't even pick the right variable name. It's not that they can't; it's just that we don't pay attention to those details.
03:01
I'm going to show you with a couple of examples how important it is to pick the right variable name. Let me quote Phil Karlton on that: the two hard things in computer science are cache invalidation and naming things. Believe me, naming things is not easy. It's hard. What's the longest time you've spent banging your head
03:22
to figure out what name to give to a class, a file or a variable? I remember spending literally two full days figuring out what I should name the thing I was trying to implement, okay. So naming things takes time, and you should give it that time. It's very, very important.
03:42
So the first tip is to avoid generic names. They really don't make sense. What we tend to do is call a temporary variable temp, or call something data. Those names are too generic. You really can't understand what they mean. You should use slightly more specific names, for example.
04:03
Yeah, so you should use names that are a little more specific to the context. The other thing, and I hope you can see the colors, yeah: the red indicates not-so-nice code and the green is what I'm suggesting. So, avoid using abbreviations.
04:21
People write UCF for uppercase formatter, but readers can't understand what UCF is. It's okay if it's HTTP or SMTP, because that's a well-understood acronym. But using something like BA for a bank account doesn't make sense, okay. It's very hard to understand. You can probably say formatter, or account, or something.
04:41
So that's, I think, a good thumb rule: don't use abbreviations unless it's a very common one that everyone knows. Another common mistake that I keep finding is people using a data type as the name of a variable. They call it list, or they call it string.
05:01
Okay, but a list of what? A string holding what, right? It's better to say what it actually holds. Sum of numbers takes numbers. Count words takes a string, but it's actually a sentence. It's not a paragraph, it's not a file name. So it's better to say specifically what it actually means, not just the type of it.
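[Editor's sketch] The slides are not part of this transcript, so the following is only a rough illustration of naming a parameter for what it holds rather than for its type. The function names `sum_of_numbers` and `count_words` follow the spoken examples; their bodies are assumptions.

```python
# Parameters are named after their meaning (numbers, sentence),
# not after their type (list, string).

def sum_of_numbers(numbers):
    """Return the total of an iterable of numbers."""
    total = 0
    for number in numbers:
        total += number
    return total


def count_words(sentence):
    """Return how many whitespace-separated words the sentence has."""
    return len(sentence.split())


print(sum_of_numbers([1, 2, 3]))
print(count_words("naming things is hard"))
```

Compare this with `def count_words(string)`, which tells the reader the type but not whether a sentence, a paragraph or a file name is expected.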
05:21
So that makes it a lot more readable. The other thumb rule, about nouns and verbs, is to use nouns for variables and classes, since they name concepts, and verbs for functions, since they name actions, the tasks that you want to do. So size, price, task, scheduler, et cetera.
05:42
They go well for variables and classes. For functions, use actions: get file name, instead of just naming the function file name, so that it indicates you're doing an action; or make account, or deposit. These are the kinds of examples where using verbs makes sense for functions.
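[Editor's sketch] A minimal illustration of the noun and verb rule, assuming a toy `Account` class. Only the names `get_filename`, `make_account` and `deposit` come from the talk; everything else here is hypothetical.

```python
class Account:
    """A noun: the concept being modelled."""

    def __init__(self, owner):
        self.owner = owner
        self.balance = 0

    def deposit(self, amount):
        """A verb: an action performed on the account."""
        self.balance += amount


def make_account(owner):
    """A verb phrase: creates and returns a new Account."""
    return Account(owner)


def get_filename(path):
    """Return the last component of a slash-separated path."""
    return path.rsplit("/", 1)[-1]


account = make_account("alice")
account.deposit(50)
print(account.balance, get_filename("/tmp/report.txt"))
```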
06:01
And, okay, these are very simple rules. It's not something that I invented. This is age-old wisdom that has been talked about by so many people. You'll find it in The Practice of Programming, and there are so many other books people have written, but I'm just trying to take that wisdom and put it in the context of Python. So, if you have a list of values,
06:23
it's good to use a plural for that. So you say largest line of lines, or you list a directory and call the result files. But look at the other examples: file equal to list of directory. It's not a file, it's a list of files. So it's better to use a plural
06:41
to say that it's actually a list of values. And these are real-world examples that I've found from people while teaching. People write for lines in open(filename).readlines(). Read lines gives you a list of lines.
07:00
But each element is a single line, not lines. And saying int of lines doesn't make sense at all. So it's better to use a plural for a list. And when it comes to loop indices, use i and j for loop indexes only, not for the values, okay?
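[Editor's sketch] The plural and loop-variable conventions can be sketched as follows. `io.StringIO` stands in for `open(filename)` so the example runs without a real file; the data and the names `file_contents`, `numbers` and `total` are assumptions for illustration.

```python
import io

# Stand-in for open(filename); lets the sketch run without a file on disk.
file_contents = io.StringIO("10\n20\n30\n")

lines = file_contents.readlines()        # plural: a list of lines
numbers = [int(line) for line in lines]  # singular: one line at a time

# i is an index, so it pairs naturally with range().
for i in range(len(numbers)):
    print(i, numbers[i])

# n is a value taken from the list, so it is not called i.
total = 0
for n in numbers:
    total += n
print(total)
```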
07:20
For example, for i in range(10) is fine. i is an index, going over the values zero to nine. But if you're going over a list of values, using i doesn't make sense, because i is usually used for an index. And if you're using i for numbers, it feels like it's an integer index, but n could be a number,
07:41
or it could be a string or anything, right? So use something which better conveys what it actually holds. For n in numbers probably makes a lot more sense. You might be thinking, why am I talking about silly things, right? These are things that everyone knows, but it's really, really hard to get this right in practice. Okay, let me show you an example. This is a small five-line function I've written, okay?
08:02
Can you try and understand what this is doing? Can someone, let's take a minute and then see what this does. Can someone explain to me what this function does? Yeah?
08:22
Wow, awesome, yeah. That's actually what it does, okay. But let's look at this, okay? Here's the same function written like that; all I've done is change the names. I've not changed the structure of the program at all. It's just a five-line program, and it's really hard to understand
08:41
what this is doing, with x and y, and adding things to z, okay? Changing the names made so much of a difference, okay? Now, by looking at it, you immediately know that it's a dataset, it's an index, and you're taking a row, and then from the row you're taking... yes. I know, I know. But the thing I'm talking about... I agree.
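[Editor's sketch] The function shown on the slide is not in the transcript. Going only by the spoken description (x, y and z in one version; dataset, index and row in the other), a hypothetical reconstruction might look like this. Both functions have the same structure and differ only in their names; `f` and `get_column` are invented names.

```python
def f(x, y):
    z = []
    for a in x:
        z.append(a[y])
    return z


def get_column(dataset, index):
    column = []
    for row in dataset:
        column.append(row[index])
    return column


dataset = [[1, "alice"], [2, "bob"]]
print(f(dataset, 1))
print(get_column(dataset, 1))
```

As the audience member points out, a list comprehension would shorten either version; the point of the comparison is the naming alone.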
09:02
I picked this example to show how much difference names make. I agree, you can use a list comprehension, but I'm just trying to show how much names matter. So if, in five lines of code, it can make so much of a difference to pick the right variable names,
09:21
imagine what would happen if you're working with a far larger program, say a 10,000-line program, right? So it's really, really important to pick the right names. The other thing is, when you're using similar names
09:41
for completely different data types, it's very confusing. When you're writing code, you should also keep in mind what people think unconsciously, okay? When we see names which sound similar, we unconsciously expect that they hold similar types of values. So with a1 and a2, you think they must hold the same kind of values,
10:00
but if you put a list in one and an integer in the other, that's very, very confusing. It's very hard to make sense of that. So probably say values and n instead of a1 and a2. And that's one of the issues that we had in the previous example as well, where we said x and y. It kind of feels like x and y must be the same type. Maybe both are lists, or both are integers or something, but they're not.
10:21
Okay, now let's look at comments. We all say writing comments is good, but is it really? There's really no need to state the obvious, okay? Increment x by two: I mean, Python programmers will look at the Python code and figure out that it's incrementing x by two.
10:41
That's just the obvious thing. There's no need to write a comment for it. But you can actually explain why you're doing it: compensate for the border on both sides. You're adding one pixel on each side, so you're adding two. That makes more sense. You're conveying why you're doing it. So don't state the obvious. Say why you're doing it. And a lot of times, it makes a lot of sense,
11:00
when you're commenting, to explain why you made a choice. The following is an optimization, it saves a lot of memcache calls: here I'm giving the reason why the following code is done this way. It's also good to document special cases. For example, you figured out
11:21
there's a Unicode error happening, you don't know why it's happening, and this magically fixes the issue. Okay, so put a timestamp and say this is a special case, and this is how I'm fixing it. Then in future, people will be careful when they're touching that part of the code. And it's actually good
11:41
if you can make comments unnecessary, isn't it? You can write code in such a way that you don't really need comments. For example, look at the first case: find the length of the longest line, with a list comprehension and the max of that. Instead, if you name the function length of longest line, you don't really have to comment it. It's very clear from the program itself.
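[Editor's sketch] A small illustration of the two points about comments: explain why rather than what, and prefer a descriptive name over a comment. The constant `BORDER_WIDTH` and the function `padded_width` are assumptions made up for this sketch; `length_of_longest_line` follows the spoken example.

```python
BORDER_WIDTH = 1  # hypothetical: one pixel of border on each side


def padded_width(width):
    # Not helpful: "increment width by two".
    # Helpful: compensate for the border on both sides.
    return width + 2 * BORDER_WIDTH


def length_of_longest_line(lines):
    # The name says what the expression computes, so no comment is needed
    # at the call site.
    return max(len(line) for line in lines)


print(padded_width(10))
print(length_of_longest_line(["a", "abc", "ab"]))
```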
12:02
So you can write self-documenting code: the code is simple enough for people to understand from the code itself, and you don't have to write the documentation. That's really awesome if you can do it. Yeah, the other thing is, if you have longer functions,
12:20
what we typically do is have stage one, stage two, each with a block comment saying what it does, say, process documents and upload them to the search engine, followed by a chunk of code. But it makes more sense to split that into smaller functions, and then say docs equal to process documents, and then submit to search engine. Then you don't really need comments to explain that part of the code.
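[Editor's sketch] One way the split into stages might look. The names `process_documents` and `submit_to_search_engine` follow the talk; what each stage does here (lowercasing text, appending to an in-memory list) is a stand-in, since the real processing and search engine are not described.

```python
def process_documents(raw_documents):
    """Stage one: normalise each document (here: strip and lowercase)."""
    return [text.strip().lower() for text in raw_documents]


def submit_to_search_engine(docs, index):
    """Stage two: 'upload' by adding each document to an in-memory index."""
    for doc in docs:
        index.append(doc)


def main(raw_documents, index):
    # The function names replace the block comments
    # "process documents" and "upload them to the search engine".
    docs = process_documents(raw_documents)
    submit_to_search_engine(docs, index)


search_index = []
main(["  Hello ", "World"], search_index)
print(search_index)
```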
12:44
Now that we have looked at the simple, more accessible things like variable names and comments, let's look at program organization. How do you structure your program? Divide and conquer: split your program into smaller,
13:01
independent modules and functions. That way it's a lot easier for people to make sense of it. Let me quote Miller's Law. It says the number of objects an average human can hold in working memory is seven plus or minus two. It has profound implications for
13:22
how we should write our code. The reason is, seven plus or minus two is the number of things that you can hold in your head. That probably means that when you're writing a function, it should not have more lines than that. Or if you have a class, the number of methods in its public API should not be more than that. So when someone looks at your class and starts using it,
13:43
if it has seven plus or minus two functions, they are able to keep all of it in their working memory and work with it. But if it has a lot more, it's pretty difficult. So that's a ballpark figure to keep in mind when you're writing classes or functions. I think that's a good number for the size of a function as well.
14:04
And the other common thing people don't pay attention to is duplication. Duplication is bad. For example, here's a function that takes input data, tries to convert it to an integer
14:20
and then sums them up and gives the value back. Okay, since the data is coming from users, we want to do some validation and convert to integers before we actually do the summing. So there's a try and except, but we're doing it twice. If you look closely, I've deliberately put a mistake in,
14:40
because people write one thing, then copy, paste, and modify it. You see, I deliberately left x here, because that's the kind of mistake that usually creeps in. So instead of duplicating, it's a lot better if you can generalize. Say you write get int, which takes the input data,
15:01
and you call it for x and y, and you return x plus y. If you look at the add function here, it's much easier to understand now than what we had before. Now, we should also avoid too many nested levels in a function. You may end up writing code like this.
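[Editor's sketch] The slide is not in the transcript. As a rough, hypothetical illustration of the nesting problem, `update_post_nested` below buries its work several levels deep, while `update_post` delegates each case to a small helper. The dictionary-based post and every function name here are assumptions.

```python
def update_post_nested(post, changes):
    # Everything happens inside several levels of conditions.
    if changes:
        if "title" in changes:
            if changes["title"].strip():
                post["title"] = changes["title"].strip()
        if "tags" in changes:
            if changes["tags"]:
                post["tags"] = sorted(set(changes["tags"]))
    return post


def update_title(post, title):
    if title.strip():
        post["title"] = title.strip()


def update_tags(post, tags):
    if tags:
        post["tags"] = sorted(set(tags))


def update_post(post, changes):
    # One level of nesting; details live in the helpers.
    if "title" in changes:
        update_title(post, changes["title"])
    if "tags" in changes:
        update_tags(post, changes["tags"])
    return post


print(update_post({"title": "Old", "tags": []},
                  {"title": " New ", "tags": ["b", "a", "b"]}))
```

For dictionary inputs the two versions behave the same; the second simply reads top-down.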
15:23
It's updating a blog post, and there are many cases: you may want to update the title or a tag. So there's a big function that goes over the steps, but as you can see, there are too many nested levels here, and that's hard to understand. What can be done is, you can take each part of that
15:42
and make it a separate function. That really makes it a lot more readable. This function is now just delegating to other sub-functions. So this function is easy to understand, and if you want further details, you can look at each of the other functions. The other useful thing is to make sure
16:01
you handle errors separately. For example, this function is trying to get a user, given the email address. So I check whether it's a valid user or not; if it's a valid user, I make sure the account is not blocked, and then I query the database and give the user back.
16:21
But if you look at the code, the main job of this function is hidden two levels deep. Okay, so the database query is the core of the function, but it's hidden deep inside so many conditions.
16:43
Wouldn't it be better for that to be the most prominent part of the function? What you can do is keep the error handling separate, for example, like this. You do the error validation first, and then at the top level you have the core of the functionality.
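[Editor's sketch] A minimal version of "errors first, core at the top level", assuming a dictionary in place of the database and a deliberately crude email check. The names `is_valid_email`, `users` and `blocked_emails`, and the choice of exceptions, are hypothetical.

```python
def is_valid_email(email):
    """Very rough check, only for illustration."""
    return "@" in email


def get_user(email, users, blocked_emails):
    # Error handling first, each case leaving the function early.
    if not is_valid_email(email):
        raise ValueError("invalid email address: %r" % (email,))
    if email in blocked_emails:
        raise PermissionError("account is blocked: %r" % (email,))

    # The core of the function, at the top level and easy to spot.
    # `users` is a dict standing in for the database query.
    return users.get(email)


users = {"anand@example.com": {"name": "Anand"}}
print(get_user("anand@example.com", users, blocked_emails=set()))
```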
17:01
So you build a query, query the database, and get the results. And if you're really trying to understand this function, you can just skip the error validation and jump directly to the main part of the function, which is not so straightforward if you're doing it like this. Now, the other very important thing is
17:21
we should try to suppress implementation details as much as possible. The reason is, when someone is trying to understand a program, what the program is doing, its intent, and how the program is doing it are two different things. Someone who comes to your program is more interested in understanding what the program is doing. So we should suppress implementation details
17:43
as much as possible, so that the intent of the program can be understood quickly. Then, if required, you can go and see what each of the functions is doing. For example, this program takes one command-line argument, which is a file name, reads all the words in the file, computes the frequency of the words,
18:00
and prints the frequency. This is what the program is doing. It has four lines, and this is all the program is doing. So the intent, what the program is doing, is very clear from looking at it. But if you write a longer function, and put how the words are read and how the word frequency is computed right there in the same function, it becomes too difficult to understand what it is doing. Now, for reading words,
18:22
for computing word frequency, I can go to the word frequency function and figure out how it's actually done. The implementation and the intent can be separated. The how and the what: it's good to separate those two things. So now, that's, I think,
18:42
all the things that I wanted to mention about writing beautiful code. Here is a quick summary of what I've covered so far. Choose meaningful variable names. I can't stress enough how important it is. It sometimes takes a lot of time, but it's worth spending the time,
19:01
because the amount of time we spend later trying to figure out why the program was written like that, or how it actually works, is a lot more if you don't spend enough time early on picking the right variable names. Use comments when required, but don't add comments just because you have to. Split the program into small,
19:20
independent modules and functions. Avoid duplication at all costs. Suppress implementation details, and always optimize for readability. Let me stop with a quote from The Tao of Programming. A program should be light and agile, its subroutines connected like a string of pearls. So it's talking about the elegance of the program.
19:42
The spirit and intent of the program should be retained throughout. It should have that clarity throughout the program. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity. So you need the right balance in the program
20:01
to make it beautiful. Happy coding, and I'm open for questions. The slides of the talk are on my Twitter feed, if you want to look them up. I'm open for questions, if you have any.
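[Editor's sketch] The word-frequency program described earlier in the talk (four lines that state the intent, with the details pushed into functions) is not reproduced in the transcript. A hypothetical reconstruction, assuming whitespace-separated words and using `collections.Counter` for the counting, could look like this. The helper names are assumptions.

```python
import sys
from collections import Counter


def read_words(filename):
    """How the words are read: one implementation detail."""
    with open(filename) as f:
        return f.read().split()


def word_frequency(words):
    """How the frequency is computed: another implementation detail."""
    return Counter(words)


def print_frequency(frequency):
    for word, count in frequency.most_common():
        print(word, count)


def main():
    # The "what": four lines that state the intent of the program.
    filename = sys.argv[1]
    words = read_words(filename)
    frequency = word_frequency(words)
    print_frequency(frequency)


if __name__ == "__main__" and len(sys.argv) == 2:
    main()
```

A reader who only wants the intent can stop at `main`; a reader who wants the "how" can open the individual functions.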
20:22
Please, for the mic, yep. Hi, so when you talked about the number of objects that the human brain can track, what's your opinion on the sort of recommended maximum, minimum length of a code block? Sorry?
20:40
The sort of recommended length of a code block. There's a school of thought that says if a code block is larger than N lines, it becomes hard to read, where N is said to be any number between 40 and... Yeah, so I think the thumb rule people give is that it shouldn't be more than half a page, but my usual recommendation is not more than 10 lines; that's the maximum
21:03
I would suggest for a function. I would say the seven plus or minus two rule works even for the number of lines in a function. What are your thoughts on code formatters like YAPF? Sorry, could you please repeat, I couldn't...
21:21
What are your thoughts on code formatting tools like YAPF? So, I think it's a... let me repeat. You're talking about the Go format tool, right? Yeah, I think it's a really good tool. I think these are called bikeshed discussions, right?
21:40
Should we use one space or two spaces, or put a space before or after? I think we just get used to whatever it is, and I think Go has done a very good thing with its format tool: you don't have options, you just follow the one formatting style. But I don't think we have that luxury in Python. Though there's PEP 8, it's up to people to decide whether to follow it or not.
22:01
I think I would really love to have something like that for Python. So, just a quick question on your thoughts on something more specific about intelligible variable names.
22:23
I agree with most of what you said, I think it's great. I just wondered about this habit of deleting all the vowels in variable names, like trying to, like picking a name and then abbreviating it by deleting all the vowels. I hate this, what do you think about it? This practice?
22:41
Oh, sorry, I didn't get, could you please repeat the question? Sorry, so this kind of habit of deleting all the vowels in variable names once you've chosen a name to try and make it shorter, like some, I guess there's a trade-off between variable length and intelligibility. Yes, so I think it's an important point. So, I would say the variable,
23:01
the length of the variable name can be proportional to its scope. For example, if it's a local variable in a function which is five or six lines, it's probably okay to have a single-letter variable. Whereas if you have a global variable, whose scope is much larger, it's better to be verbose and use the full name.
23:21
So if it's just a small function, or inside a loop, I would say for w in words; w indicates it's an element of words, so it's a single word. So it's probably okay to have a single letter there. But I don't really like to take out vowels or make names short just to,
23:41
I think readability is more important than the length of a variable name, that's what I would say. Thank you, Anand. In your opinion, what is a good time to clean up your code when programming? I would say it's a habit that you have to build over time.
24:08
Once we start keeping these things in mind, then when we're writing code or looking at someone else's code, we feel that something is violating a principle, okay. We should develop that sense of identifying the bad smells, so that when you're writing your code,
24:21
you don't write that kind of thing in the first place. For beginners, it probably takes a while to build that habit, so they'll write the code and then clean it up, but once you've built the habit, you'll start getting that sense. It just becomes second nature, I believe.
24:43
Any more questions? Okay, so thanks a lot, Anand. Thanks.