Keeping your projects nice and clean
Formal Metadata
Number of Parts | 131
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/69468 (DOI)
Transcript: English(auto-generated)
00:04
What does it mean, keeping a project nice and clean? Well, first things first. You already saw the Zen of Python today at the keynote talk.
00:22
This is a collection of some Zen koans that describe the general philosophy of Python. And I picked out just a few lines from it that are related to my talk. So, beautiful is better than ugly. Having nice and tidy code means it is beautiful to look upon.
00:43
Also, readability counts. I have to emphasize this. This is really important, and I will come back to it later. And there should be one, and preferably only one, obvious way to do things. So, why should you care whether your code is readable,
01:01
consistent, and all these things? Readability counts. Before I started working in Python, I did my fair share of programming in Perl. I don't know whether any of you ever programmed in Perl, but it's very easy to write incomprehensible code in it. You can write tidy and nice code in Perl,
01:22
but it's not that easy, and not as easy as in Python. But as I mentored junior programmers, I learned that you can also write incomprehensible code in Python. It's not that hard. Consistency makes cooperation easier.
01:40
If you have a consistent style throughout your code base, throughout many projects, and you have a team that works on all of these projects, then being consistent will help cooperation a lot. Because when you have different projects with different styles, you will have to adapt
02:00
to the style of each project. And if you switch fast between individual projects, it can be quite demanding. Also, it can help avoid bugs and vulnerabilities. This is mostly about linters, which I will get to during my talk. And also, it can save time, if done right.
02:26
First things first, what tools can we use to make our project look nice and clean? Look at these two examples of an __init__ function. They are functionally identical. One is aligned with the opening bracket.
02:43
The other is aligned like this. So, which one looks better? Who thinks the first one looks better? Great, thank you. Who thinks the second one looks better? Okay, great, a lot of hands. Who doesn't care?
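The slide itself is not part of the transcript. As a stand-in, here is a hypothetical pair of __init__ signatures (class and parameter names are invented for illustration), one aligned with the opening bracket and one using a hanging indent:

```python
class AlignedWithBracket:
    # Continuation lines line up with the opening bracket.
    def __init__(self, host,
                 port,
                 timeout=30):
        self.host = host
        self.port = port
        self.timeout = timeout


class HangingIndent:
    # Each parameter sits on its own indented line, with a trailing comma.
    def __init__(
        self,
        host,
        port,
        timeout=30,
    ):
        self.host = host
        self.port = port
        self.timeout = timeout


# Both spellings behave identically; only the layout differs.
a = AlignedWithBracket("localhost", 8080)
b = HangingIndent("localhost", 8080)
print(vars(a) == vars(b))
```

Since the two classes end up with the same attributes, the choice between them is purely a matter of taste, which is the point of the show of hands.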
03:00
Yeah? Thank you, okay. Because whatever your answer, we can argue about that in the pull request. I've seen my share of pull requests where people argued about which alignment is better, which code style is better. And I think this is a waste of time.
03:23
You shouldn't waste time arguing about the specific style. Here comes the autoformatter. So, keeping consistent formatting is a manual task. And manual tasks should be automated.
03:40
Well, here come our heroes. Black: well, not the original autoformatter, but the first autoformatter that got really popular. I think it's about seven years old. What it did is it took your code, parsed it into an abstract syntax tree,
04:01
and output it in another way, a way that was pleasant to look upon and easily readable. Yeah, a great revolution. What is great about this is that you don't actually need to think about formatting anymore, because when you don't use an autoformatter,
04:22
you still do some kind of formatting, but you do it manually. And you have to spend time, spend a portion of your cognitive capacity thinking about this. So, when I didn't use an autoformatter, I was already formatting the code while writing it.
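The parse-and-reprint idea mentioned a moment ago can be tried with the standard library alone. This is only a minimal sketch of the principle, assuming Python 3.9+ for ast.unparse; it is not how Black or Ruff are implemented, and unlike a real formatter it drops comments and does not wrap long lines:

```python
import ast

messy = "def add( a,b ):return a+b"

# Parse the source into an abstract syntax tree, then print it back out.
# ast.unparse ignores the original layout entirely.
tidy = ast.unparse(ast.parse(messy))
print(tidy)
```

However the input was laid out, the same tree produces the same text, which is why nobody has to think about the layout while typing.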
04:42
That took some part of my cognitive abilities. And then, when I finished, I had to go through it again and reformat it even more. But when you have an autoformatter, you can write your code any way you like. It can be really dirty, and then, in the best case, you just map the autoformatter
05:01
to save in your editor, and after each save you have nicely formatted code that's really readable, really comprehensible. Ruff is a much younger brother of Black, and it actually does a lot more than autoformatting. I will get to it later. But it's only about one and a half, two years old,
05:24
and it's still evolving rapidly. But as far as autoformatting goes, it's on par with Black. The code they produce is almost the same. There are some differences that are explained in the Ruff documentation, like this is where we chose to diverge from Black,
05:41
and this is why. But I don't think any of these decisions are really significant. There's also YAPF. I put it in italics because I haven't used it, unlike Black and Ruff. It's made by Google, it's also an autoformatter, and it's configurable, much more than Black and Ruff, which have only a few knobs you can turn.
06:04
And that leads to my next point. Fewer options mean less to argue about. So what I would recommend is just pick an autoformatter and stick with it. Just use the defaults. If you have very strong opinions in your team, sure, turn the knobs.
06:21
But if you don't, don't argue about them, it's a waste of time. The main difference between Black and Ruff is that Black is written in Python and Ruff is written in Rust, so it's fast, as everyone who writes anything in Rust tells you, but that's only significant in big projects.
06:42
So it's up to you. Linters. Linters are small programs that go through your code and make suggestions about how you might improve it. They can detect bugs, they can detect password leaks,
07:03
typos, bad practices, and other things. They also shorten the feedback loop. If you have them integrated into your IDE or your editor, you immediately see as you write the code that there's something fishy about some parts of it, and you can react to that.
07:21
And it's much better if your editor tells you, maybe you should improve this, than if you write the code, send it to the reviewer, and the reviewer says, okay, I don't like this bit, can you rewrite it, and then it comes back to you and you have to work on it again. So it shortens this feedback loop.
07:41
There are tons of these tools. Okay, so there's Flake8, which is actually an integration of several other tools, like pyflakes and pycodestyle and others. And they make general suggestions about code, how it should be formatted and such.
08:01
There's isort, which takes care of sorting your imports and organizing them into neat groups. There's pydocstyle, which takes care of formatting your docstrings. There are two widely accepted docstring conventions, NumPy and Google, and pydocstyle can check both of these conventions and tell you if anything's wrong.
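For readers who have not seen the two docstring conventions side by side, here is a small illustration. The function and its parameters are made up for this example; only the section layout follows the Google and NumPy styles:

```python
def scale_google(values, factor=2.0):
    """Multiply every value by a constant factor.

    Args:
        values: Iterable of numbers to scale.
        factor: Multiplier applied to each value.

    Returns:
        A list with the scaled values.
    """
    return [v * factor for v in values]


def scale_numpy(values, factor=2.0):
    """Multiply every value by a constant factor.

    Parameters
    ----------
    values : iterable of float
        Numbers to scale.
    factor : float, optional
        Multiplier applied to each value.

    Returns
    -------
    list of float
        The scaled values.
    """
    return [v * factor for v in values]
```

A docstring checker can verify that one of these layouts is followed consistently; whether the text inside is actually helpful is a separate question.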
08:24
Bandit is really interesting, because it looks for security flaws, or potential security problems. There's also pyupgrade, which is a bit different, because it helps you migrate your code to the newer Python standard. Because when you write your code,
08:41
you probably support only some subset of Python versions, and as old Python versions reach end of life, you want to upgrade and keep your code current. So pyupgrade introduces new Python features
09:00
into your code. And there's doc8, that's a bit different, because it cares about the formatting of reStructuredText, which is the main documentation format for Python. Just a few examples. Bandit: if you have code that calls yaml.load,
09:20
then if you run Bandit on this code, it will tell you it's not safe to call yaml.load, because it can instantiate arbitrary objects. This means you can, for example, create infinitely deep recursive structures, and that can lead to a denial-of-service attack
09:41
or something like that. So use yaml.safe_load. It even gives you a suggestion about what you should do. It's great. Another example, from flake8-bugbear: you have a string, I hope everyone sees this, and you call strip(".txt") on it,
10:00
and that should strip ".txt" from the end of the string, right? Except it doesn't, because if you give multiple characters to strip, it actually strips all sequences composed of these characters from the end and the beginning of the string.
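This behaviour is easy to check in an interpreter. A short sketch, assuming Python 3.9 or newer so that str.removesuffix is available (the variable names are just for illustration):

```python
name = "text.txt"

# str.strip treats its argument as a set of characters, not as a suffix,
# and removes them from both ends of the string.
stripped = name.strip(".txt")

# str.removesuffix removes the exact trailing substring, and only once.
without_suffix = name.removesuffix(".txt")

print(repr(stripped))        # 'e'
print(repr(without_suffix))  # 'text'
```

The leading "t" and the trailing "xt.txt" all consist of characters from the set, so only the "e" survives the call to strip.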
10:21
So "text.txt".strip(".txt") would become "e". There is a removesuffix method for this, but it only came in Python 3.9, so you can still bump into this, especially in older code. And there's bugbear that tells you, hey, are you really sure
10:41
you wanna use strip with multi-character strings? If you are, you can tell it with a simple comment, yes, I am sure, this is what I intend to do. So it doesn't forbid you to do it. It just asks, are you really sure? Okay, and there's one linter I didn't talk about,
11:02
and it's Ruff. Okay, I already mentioned Ruff as an autoformatter, but Ruff actually started as a linter. And it took all these great tools that we already have, and it took the rules that these tools enforce and integrated them into itself.
11:21
So now Ruff implements around 800 rules, and it integrates all of these except for doc8, because it doesn't care about reStructuredText, but everything that concerns Python code, Ruff can already do. And it's very fast because it's written in Rust. So is there any reason why you should not use Ruff?
11:44
There may be, because you can still have very specific needs, and you need to write your own plugin, and as of now, Ruff doesn't support writing plugins. Flake8 does, but other than that, I would argue that using Ruff is simpler
12:03
at this time because you only have one tool. There's also one more advantage to it, and that's autofix, which is just great. So, an autoformatter is great because you don't have to spend your cognitive capacity thinking about formatting the code.
12:20
And autofix is great because you don't have to spend your time fixing these things. Autofix means you just run Ruff with the --fix option, and it fixes these issues for you. Not all of them, not all of them are automatically fixable, but a lot of them are.
12:41
And my final thought about linters is, don't lint formatting, okay. Some of these tools like Flake8 do have rules that tell you, like, this line is too long. This operator should have spaces around it. There should be a line break.
13:01
And these rules, you have to correct by hand, unless you use an autoformatter. But if you do use an autoformatter, why would you want your linters to tell you to do these things? That's just visual clutter. You don't need that. All right, how do we enforce these things?
13:25
How do we enforce using autoformatters, linters, and so on? There are a few things. First, your editor, or IDE, it doesn't matter whether you use Vim, VS Code, PyCharm, or anything else.
13:41
If you use any decent editor, I mean, anything better than nano, it should be able to run these linters and to run the autoformatter on save, and I highly recommend it. You just set it up and forget about it forever. Unless, like me, you work in a company that has different projects,
14:01
and some of them are autoformatted, and some of them are not, because of their maintainers. And then you have to have a special hook in the editor that actually recognizes whether this particular project should be autoformatted or not, but it's still doable. I still recommend it. You can use pre-commit. It's in italics because I have my objections to it,
14:22
but I will cover it because lots of open-source projects do use it. Continuous integration is a great way to enforce anything because it runs on the server. And finally, there's code review, which means someone else looks at your code and gives you comments. So let's go over these.
14:42
Sorry. So pre-commit. I actually ordered these from closest to the programmer to furthest from the programmer. Yeah, so that's. All right, pre-commit. Pre-commit, not the pre-commit hook,
15:02
but the pre-commit project, is a project that creates hooks in Git, the pre-commit hook mainly, but also other hooks. And it runs different checks and changes in your project. The good thing about this is that it can create virtual environments for these tools,
15:22
and not only virtual environments in Python, but also their counterparts in other languages. The bad thing is it's not enforced on the server, per se. It's something that you have on your own computer. And if you install the hooks,
15:40
then before each commit or other action that you map the pre-commit to, it runs these checks. And if those checks fail, it won't allow you to do a commit, which is my main objection to it. I am happy to discuss it after the talk, but it's not the main thing. But it's not enforced on the server.
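To make the mechanism concrete: Git aborts a commit when its pre-commit hook exits with a non-zero status. The following is a hand-rolled sketch of that idea, not the pre-commit project's implementation. The run_checks helper and the stand-in command list are hypothetical; a real hook would list the project's own linter and formatter commands instead:

```python
import subprocess
import sys


def run_checks(commands):
    """Run every command; return 0 if all succeed, 1 otherwise."""
    failed = False
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"check failed: {' '.join(command)}", file=sys.stderr)
            failed = True
    return 1 if failed else 0


# Stand-in check so the sketch runs anywhere (assumption: replace with the
# project's real check commands).
EXAMPLE_CHECKS = [
    [sys.executable, "-c", "pass"],
]

# A hook script would finish with sys.exit(exit_code) so that a failing
# check blocks the commit.
exit_code = run_checks(EXAMPLE_CHECKS)
print(exit_code)
```

All checks are run even after the first failure, so the programmer sees every problem in one go rather than one per commit attempt.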
16:02
So if the programmer ignores it, they will still be able to push the code to the server. It can be called as a part of continuous integration, and then it does serve its purpose properly. What's good about this is that you will get
16:21
immediate response. You don't have to push the code to the CI only for it to fail. You can immediately see there's something wrong with it. There's just a simple configuration file for this.
16:41
Looks like this. The exact form doesn't matter. You can find it in the documentation, but this is roughly how the configuration looks. Continuous integration is great. You can run linter and formatter checks in it. And if you use GitHub, you have GitHub Actions.
17:01
If you use GitLab, you have GitLab pipelines. And I think Gitea/Forgejo also has something like that. And I guess most of you do use some kind of environment like this. So you can run continuous integration, and I would recommend it.
17:21
This is simple because if the pipeline fails, well, then there's something wrong with the code, and you should fix it until the pipeline passes one way or another. Only then, well, yeah, sorry. If you don't use tox, I highly recommend it. It's a way to run your checks,
17:42
and most importantly, your tests in different environments, like with different versions of Python, different versions of dependencies, different Python implementations, and so on. So you just say, I wanna do, sorry, I wanna do ruff check, I wanna do ruff format --check,
18:03
because it doesn't make sense to call ruff format in the CI, because it would change your code, but why would you want to change your code in the CI? This is just a check. I also have mypy here.
18:21
Static type checkers are great, and I wanted to talk about them, but the talk was too long. So do pay attention to typing as well. All right, this is getting better. Code review. How many of you do code review in your company or in your projects?
18:41
Great, that's a lot. I'm happy. Okay, for those of you who don't, please start. Okay, so what is code review, really short. Code review means that you create a code, you write a test for it, hopefully,
19:02
and then you submit it and create a pull request on GitHub or a merge request on GitLab, it doesn't really matter, and then someone comes and looks at your code, and tells you this might be better this way, or are you sure about this,
19:21
or I don't understand this part, can you explain it? This is great from the cleanliness and tidiness perspective. This is great for things that can't be automated. As an example, useless docstrings. I chose some, these are from our projects,
19:41
and yeah, I mean, everyone sees that if you have a function that's called load list, it probably returns a list, but what list? "Return object list" is a useless docstring. It shouldn't be in the code. There should be a useful docstring instead, and that's something that you probably won't be able to discover using automated tools,
20:02
because a tool can check whether the docstring is there. It can check whether the docstring has a particular format, but it won't be able to detect whether it conveys useful information. Well, not yet anyway. We'll see what AI does with that. Another great thing about code review is that
20:21
it helps with mentorship. If you have junior programmers, or if you are a junior programmer and you work with some more experienced programmers, code review is great for mentorship, because you can learn both ways. If a junior programmer does code review, they learn some advanced patterns. If the junior programmer is writing the code,
20:42
and the senior is doing the code review, then they can point out some beginner mistakes that could be improved. And also, it improves the bus factor. Bus factor is an imaginary metric that says how many people on your project can be run over by a bus before the project gets into problems.
21:01
And if you have a bus factor of one, then if one person left, you would have a problem, a serious problem. So if every piece of code is seen by at least two people, it at least improves the bus factor to two. And that's 200%.
21:22
All right. Real quick, some follow-up notes to all the things I said here. How to sync the project configuration. Let's say that you agree that you want to use Ruff, and you want to use it with some set of rules, and now you want to synchronize that set across many projects.
21:40
And let's say you are not using a monorepo, to make things more entertaining. You can use Cookiecutter, which is a thing for creating projects from templates. But that only works for new projects. You can also use Ansible. Okay, why would I say that?
22:02
Because it's what we do. Okay, so Ansible is an orchestration tool written in Python that can make changes on servers, and that's what you should use it for. But you can also point it at your own computer, like if you use localhost as the host, and then it can do all kinds of changes,
22:21
like synchronizing the configuration between projects. What I would recommend actually is writing a custom script. It's not that hard. There's a great library called tomlkit, and it can both read and write TOML. Don't use templates, if you can,
22:40
because you usually have to edit pyproject.toml, and pyproject.toml contains other information than just configuration for these tools, and you don't want to affect that part. So if you do use your custom script, just write some wrapper functions that say like, ensure that this setting has this value in this list,
23:02
or doesn't have this setting in this list, but otherwise don't change the config. That's the way to go, I think. I'm happy to discuss it further outside. Another thing, preserving git blame. If you introduce an autoformatter to your code
23:20
for the first time, it can create a lot of changes. And if you use git blame, it will become mostly useless after that, because all the history will start with this big refactoring. But you can avoid that, because you can tell git blame to ignore some specific revision. And even better than that, you can create a file and put it in Git, version it,
23:42
that says these revisions should be ignored for git blame, and then just set it in the repository. Caveats. Okay, I said that this all can save time, and that's true, but it also takes some time.
24:01
And it can be hard to sell it to your team leaders and management, if they are not on board already. But the main argument should be, eventually it will save more time than it costs. Agreeing on a common style can also be very tricky.
24:21
One person can decide what it will be, presumably the team leader; that's autocracy. Or everyone could vote about it. That's horrible, it's called democracy. And what we do, we have a kind of meritocracy system, which means only a few developers need to agree,
24:42
we can take votes, but it's not all the developers. And you can also have style meetings. Like, if you are setting up big new changes, big changes in your style conventions, then just meet either on Zoom or physically, and discuss it.
25:00
Keeping checks up to date can be tricky, because the tools evolve. New versions of Ruff and other tools are coming out, and they can break your CI, because they make changes. You can pin those tools, but then you have to take care of updating them. Or you can just not pin them,
25:21
you can let their version grow, and then you have to fix the problems from time to time. That's what we do. And you can go over the top. If you introduce too many rules, especially rules that don't have autofix, you can find yourself spending too much time on it. So don't overdo it.
25:40
If you are new to this, just select a reasonable number of rules, maybe even the defaults, and start with that. And you can build up from that. This is my recommended basic setup for Ruff that's stored in pyproject.toml. You don't have to use all of these,
26:00
it's just for an inspiration. The slides will be online or already online, so you can find it in there. All right? Now we have time for some questions.
26:21
Thank you so much, Jan. If you have any questions, you can move to the microphone and the links, and ask. Why shouldn't I trust the formatters to do the reformat in the CI?
26:41
Because what we are currently doing is exactly what you propose. We just check it, and we are not able to merge into the main branch if it fails. What developers usually do is just say, ah, it's Black again, go back, run black .,
27:00
and commit it anyway without checking anything. So why shouldn't I do it in the CI? Okay, that's a good question. I won't say you shouldn't do it, it's just not that trivial, because to change the code, you have to make a commit. I assume that you work with Git. So to change the code, you have to make a commit,
27:21
and it's not very customary to create commits in the CI. But if you are willing to do that, you can do that. Okay, great, thank you. But, and then you also have to push it to the server. So you have to set up some credentials and so on. Not that trivial, but it could be done, I think.
27:40
Oh yes, hello, thank you, great talk. Quick question, you mentioned that we shouldn't do reformatting in the linter, which makes sense. But do you have an easy way to do that for flake8, or do I still have to eliminate all the formatting rules one by one? Well, you usually have formatting groups,
28:01
rules groups, sorry. So you can disable all the groups of rules that concern formatting. If you use Ruff, its defaults actually exclude all these rules, because they assume that you would use an autoformatter as well,
28:20
and don't want to interfere with that. If you use Flake8, you will have to disable them, but it's quite easy. It's like disabling three groups of rules. Okay, thanks. We don't have time for any more questions,
28:41
but you can always ask Jan outside. Our next session will be in five minutes. Let's give Jan a round of applause.