The Art of the Pull Request
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 131 | |
Author | ||
Contributors | ||
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this | |
Identifiers | 10.5446/69508 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
00:00
Personal area networkNetwork topologyGradientComputer iconData structureBit rateSoftware engineering2 (number)Order (biology)Multiplication signBitCode refactoringFunctional (mathematics)Shape (magazine)Flow separationDatabaseEndliche ModelltheorieBit rateCodeFood energyTask (computing)Decision theorySoftware bugProduct (business)Row (database)Complex (psychology)MultiplicationSoftware developerGroup actionDiagramDisk read-and-write headFundamental theorem of algebraMechanism designNumberComputer configurationIntegerUser interfaceTotal S.A.MalwareSlide ruleRevision controlFront and back endsSystem callGreen's functionPrandtl numberData structureFormal languageBasis <Mathematik>Computer architectureFigurate numberDifferent (Kate Ryan album)WebsiteMaxima and minimaAuthorizationPhysical systemMereologyAbstractionMathematicsProcess (computing)LogicKey (cryptography)Connectivity (graph theory)Entire functionCore dumpInterface (computing)CodeOcean currentCache (computing)Computer animationLecture/Conference
08:52
Object-oriented analysis and designCache (computing)DatabaseEndliche ModelltheorieConservation of energyCodeToken ringFigurate numberDependent and independent variablesSoftware testingConnected spaceSelf-organizationRule of inferenceCode refactoringEndliche ModelltheorieMixed realityDatabaseThumbnailLine (geometry)Functional (mathematics)MultiplicationCASE <Informatik>MathematicsNumberSheaf (mathematics)Decision theoryContext awarenessMessage passing1 (number)Computer fileConnectivity (graph theory)Key (cryptography)Authorization2 (number)Term (mathematics)Uniform resource locatorSystem callCache (computing)AreaError messageDiagramFlow separationLogicControl flowFile viewerVariable (mathematics)Multiplication signDisk read-and-write headSoftware bugWave packetPairwise comparisonTask (computing)MereologyLevel (video gaming)Food energySemiconductor memoryTotal S.A.Different (Kate Ryan album)Arithmetic meanMultilaterationSet (mathematics)Process (computing)Heegaard splittingGame controllerAtomic numberCommitment scheme3 (number)Interactive televisionComputer animation
17:37
Context awarenessCodePresentation of a groupMereologyTerm (mathematics)Connected spaceCodeMultiplication signTheory of relativityDefault (computer science)NumberCASE <Informatik>String (computer science)AuthorizationAxiom of choiceData conversionMathematicsFinitismusMultiplicationType theoryMathematical optimizationRoundness (object)WritingTelecommunicationPerfect groupCommitment schemeInformation2 (number)Functional (mathematics)Inclusion mapAddress spaceMessage passingContext awarenessEmailSoftware developerKey (cryptography)Branch (computer science)Revision controlPower (physics)Food energyDifferent (Kate Ryan album)Process (computing)Group actionDecision theoryControl flowContent (media)Logical constantComputer filePresentation of a groupFile viewerPrandtl numberDescriptive statisticsPoint (geometry)Level (video gaming)WhiteboardComputer animation
26:21
2 (number)Exact sequenceRSA (algorithm)Multiplication signPoint (geometry)Sheaf (mathematics)Computer animationLecture/ConferenceMeeting/Interview
26:45
Computer architectureNumberRoundness (object)Control flowSheaf (mathematics)Software developerIterationMultiplication signComplex (psychology)Different (Kate Ryan album)MathematicsLine (geometry)Branch (computer science)BitMereologyProper mapPrandtl numberDisk read-and-write headLecture/ConferenceComputer animation
Transcript: English(auto-generated)
00:04
My name is Ben Lomax, my pronouns are he, him, and I am a lead software engineer at Kraken Technologies. We are an energy company trying to drive forward the green energy transition. We do have a booth here at the conference in the back corner, so come over for a chat, or if you want a chance to win yourself one of these cute little plushies.
00:26
So, before diving into how you can craft better quality PRs or pull requests, I want to spend a bit of time explaining why it's worth your time to practice structuring your PRs better.
00:45
So, to start with, back to fundamentals. Why even bother with PRs? Why not just let everybody on your team push code straight to production? It will certainly be faster. So, I think it's fair to say there are a number of pretty clear benefits. Firstly, code quality. PRs are good mechanisms for finding bugs before they hit production,
01:08
making good architectural decisions, keeping the code quality genuinely high, and if we're being a little bit more cynical, stopping malicious code from getting into your code base.
01:21
And finally, PRs make for excellent learning and teaching opportunities for a whole number of reasons. The problems are real with nuance and complexity. You hopefully care enough about the problem to see it through to the end. You can discuss with multiple people at the same time, or sorry, you can discuss solutions with multiple people at the same time, and you have a written record of the decisions and explanations to refer to afterwards.
01:46
Okay, so flipping the question around, what are the costs to developers being required to do PRs? Well, reviewing PRs takes time. It also takes mental energy. Now, both of these are limited daily resources
02:03
shared with all of the other tasks developers are expected to do as part of their jobs. If PRs are slow to be reviewed, they can also block work from getting deployed and can be incredibly frustrating, which can in turn kill team morale.
02:20
Okay, so why improve the PR structure? Since getting PRs reviewed is something that a majority of devs and teams do on a weekly or even a daily basis, it makes sense to try and maximize the benefits whilst minimizing the downsides. And so to paraphrase what I've said so far, it would be great if PRs could be reviewed sooner,
02:43
be faster and easier to review, make it easier to spot bugs, either unintentional or malicious, make it easier to spot bad architectural decisions, keep the code quality generally high and offer better learning and teaching opportunities.
03:01
Now, I'm going to argue that you can get all of these benefits by just improving the structure of your PR completely independently from the features that you're implementing in the PR or the language that you're using. Now, the rest of the talk we'll look at tips on how to do exactly these things.
03:21
Oh, and I'll mention now there is an underlying approach to all of these tips which I'll reveal at the end. So see if you can figure it out, see if you can figure out what it is before I reveal it. Tip number one, keep PRs small because large PRs discourage reviews. So if during my working day I decide that I have a bit of spare time and I want to review some PRs,
03:45
and I come across something like this, there is a very high chance that I will do one of three things. First of all, just not do it. I just don't have an hour to spend right now on this.
04:00
Option two is that I can start the review, burn out halfway through and then all run out of time and just not finish the review. Now I'm both burnt out and the PR didn't even get reviewed. What a massive waste of my time and mental energy for the day. Option number three is even worse because I might start the review, burn out halfway through and approve the PR anyway.
04:24
I've not finished reviewing the code properly, but I just don't want to be looking at this PR anymore. Now, none of these are good. At best, you force a reviewer to use up a big chunk of their time and mental energy, which are their finite resource for the day, and at worst this leads to code not actually being checked and critical bugs getting merged.
04:45
So, keeping PRs small is the absolute best way to help them get reviewed sooner, faster and more thoroughly, as reviewers won't be scared away or burnt out by them. Okay, so how do you keep PR small? First sub-tip let's go with, do only one thing.
05:05
So, keep each PR focused on one problem. Don't try and do multiple features or tickets in the same PR. If you find an unrelated bug that you want to fix straight away, that's fine, but maybe try and pull that out into its own different PR.
05:21
This also reduces the total time a PR stays open. If you do features A and B in the same PR and there's an issue with B, this one, it will block A from getting merged even if you coded A perfectly the first time. The second thing you can do is pull out any initial refactoring.
05:43
So, sometimes in order to cleanly change functionality, the existing shape of the code needs to be rearranged. This one depends a bit on how much initial refactoring you need to do. Sometimes these are small refactors and can just be done in a separate commit rather than a separate PR. Examples of when it can be worth creating an entire new PR for initial refactors might be when the current code,
06:08
yeah, when the current code is in such a bad shape that you have problems understanding the logic before even starting. Another example might be when you want to add a layer of abstraction to the code which your functional changes will then use in a different PR.
06:24
Okay, so, thing number three is splitting your dependency diagram. Now, this is where the magic happens. Sometimes you have a large PR with lots of interconnected codes that doesn't require any initial refactoring.
06:41
The key to breaking up these PRs is to find the logical components of the change and figure out the dependency diagram. So, for example, let's say that you want to allow people to control a heat pump in their home from your website. In order to do this, you need to do a number of different things. You need to build a user interface to let them actually do stuff on your website.
07:05
You need to send the desired temperature to a third-party API, the heat pump API. It will be useful to know the current temperature, so getting that from the API. Authorization checks, so a user can only control their own heat pump.
07:20
And you'll need some concept of a heat pump and a user in your system, so I would argue database models are usually how this is done. And finally, we want to avoid the API rate limiting us, so caching our cores is probably a good idea. All of these are required for a minimum viable product, an MVP, but this is very likely too big for one PR.
07:42
This leads back to the plus 1,200 minus whatever. To break this up, you need to decide which pieces are dependent on which other pieces. Now, I'm going to give some example dependencies, but I'll sum them up next slide, so don't worry if you don't follow all of them. So, looking at these, you need the user and the heat pump database models in order to do the auth checks.
08:06
You need the API code to be able to cache those calls. And arguably, you could create the UI independently and then plug it in afterwards, but maybe you also want to allow for parallel work, so you want to have the back end in place before it can be tested, sorry, so it can be tested before or once you get on with the UI.
08:21
And you certainly can't finish the UI without the auth checks in place, preferably with the caching working too. So, a dependency diagram of what I've just described for this feature might look something like this. Now, you might disagree with this diagram, and that is completely fine. There are different versions of this diagram which are all valid.
08:43
The important thing is that you create some version of this diagram, even if it's just in your head. Now, looking at this, it's clear that you can build the database models in one PR without needing any other pieces in place. Same with setting up calls to and from the API.
09:01
So, you have three PRs right there that you can start working with without any dependencies. Once the database models emerge, you can start working on the auth checks, and once the API calls emerge, you can start working on the caching, et cetera, et cetera. Now, all of this is context dependent as to what makes the most sense.
09:21
If the API work is super simple, then maybe it can be done in one PR, so that's the setting and getting from the API. Maybe it makes sense to do the caching in that same PR as well. Being able to split out the components and figure out the dependency diagram means that you can consciously decide how to best break up large features into small but still useful PRs.
09:46
And understanding how the components relate to each other will also force you to think more deeply about the code, making more deliberate architectural decisions and raising issues earlier. So, a couple of rules of thumbs, and to note all of these rules of thumbs are definitely my own personal opinions.
10:06
PRs shouldn't be more than a few hundred lines of code. With a caveat, unless you've moved some files around as part of a refactor commit, in which case they will blow up your numbers. And if you ever feel like apologizing for the PR being too big, you already know that it's too big.
10:22
Just break it up. Tip number two, create atomic commits. You should only express one thought with each commit. So, we've now broken our feature up into multiple chunks of sensible sizes. Let's look at how to structure the commits for one of those chunks, for one single PR.
10:43
The thinking here is going to be very similar to the last section but more tightly focused. Okay, so let's say that you've decided the heat pump work is small enough, sorry, the heat pump API work is simple enough that you're going to do all of that in one PR. Stepping through the same process as before, you figure out all of the required steps.
11:02
You have to craft the correct get and post syntax for the API. You need to parse the responses that you get from the API. You need to use the correct auth, sorry, account auth token so you control the correct heat pump for an account. You also need to create an initial connection using the organization's secret key and then adding tests for all of those things.
11:28
Now, whilst the total amount of code could be suitable for a single PR, doing this in one commit requires the reviewer to keep in mind multiple different trains of thoughts, checking that each one is correct, tested, and interacts correctly with the other ones and also with the existing code.
11:46
Doing this mental juggling is a very difficult task. Even if the reviewer is able to do all of that, they are likely to only be able to do it at a superficial level. Checking a single change is much easier as the reviewer can easily keep all aspects of the change in their heads at the same time.
12:04
This, in turn, means they are able to consider each change more deeply, which can include how it interacts with existing code, if the solution is well architected, and helping to find more subtle bugs in your code. They are also more likely to take the time to explain alternative solutions
12:20
or question your approach, which are fantastic learning and teaching opportunities for you. So, in this case, the dependency diagram might look something like this. You need all of the auth stuff in place on the left to be able to make the calls, and then you need to actually make the calls to parse the responses. I've dropped the test component because, ideally, each commit should have tests
12:43
in it for the new functionality you're adding or changing in that commit. Some of these may seem like useless steps by themselves. Getting an account's auth key isn't useful without actually using it, but there's also no harm in doing it. In having a fully tested function that gets the correct key.
13:03
Doing so creates a piece of functionality that can neatly be used in later commits when we're putting together the API calls. Now, this can mean that you create many more commits per PR, and I'm going to say that that is completely fine. The number of commits per PR is very rarely the problem.
13:21
It's almost always the size of each individual commit, which is problematic. A one-line change to fix a typo takes exactly three seconds for somebody to look at, consider, and move on. And it stops it from visually polluting the next more complicated commit. So, rule of thumb for this one is if your commit message mentions doing X and Y, each of those things will probably be in its own commit.
13:48
Okay, tip number three. Refactoring versus functional commits. Don't mix them together. Now, this is partly because reviewing, refactoring, and functional changes requires slightly different things from a reviewer.
14:01
Refactoring changes don't change the behavior of the code. For these, a reviewer just needs to make sure that the code at the start and at the end of the commit does the same thing. There's arguably no need to really understand the business logic of the code since you should not be changing it. This would be things like renaming variables, pulling code out into separate functions, and moving files to different locations.
14:23
Ideally, you also have tests to make sure that all of the functionality continues working. Functional changes, on the other hand, do change the behavior of the code. These are usually the reason that you're doing any of this work in the first place, and these are also the ones that require more from a reviewer.
14:44
The reviewer should understand what the code is doing and why, along with any relevant context around the changes, and also needs to make sure that the tests that you've written actually test the changes. So, in short, you want to make the functional commits as clean as possible,
15:03
so the reviewer has the best chance to catch any errors and to give insightful comments and suggestions. Now, here's another toy example. Imagine that you're partway through a commit that spans a few dozen lines and several files, and suddenly you run into this.
15:20
Now, we used to do things with foo, but now we're doing handle foo instead, and then also doing something with the outcome of this other function. Wait, why? Well, it turns out that a few files ago, this change was made. Here, the author has added some new functionality, and whilst here, has also renamed the function to be more explicit.
15:42
Both of these changes are useful and will help to raise the quality of the code. However, if you spend the majority of your valuable mental energy checking that the functionality, raising an error if foo.x equals two is correct, then maybe it slipped your notice or your memory that the author has also improved the name.
16:03
It's just not that important a change in comparison. But if it trips you up later in the PR, it might be enough to force you to lose your train of thoughts, need to find where the name was changed, and then have to re-immerse yourself in the functional changes again. Now, obviously, this is a toy example, and simple enough that you might all have been able to do this without problem.
16:23
But for larger, much more complex commits, this can be a pretty big hit. So, splitting up the refactoring and functional changes would look like this. First, a tiny commit, which renames the function to be more explicit. This takes three seconds for a viewer to be happy with and then move on.
16:41
Next, the functional change, when in the later section, it's immediately obvious that we're running the same function, handle foo, but now we're getting the return values from it and doing something with them. Hopefully, most of you agree that this is easier to follow. Now, one final thought is that I would recommend putting refactoring changes before functional ones if possible.
17:04
Refactoring the code is usually done to improve it. So, why not improve the code first and then add the complex functional changes to the improved code? Tip number four, give enough context.
17:22
So, context is important, but it does have a proper place. And context here is a broad term. It can cover why the work is needed, deadlines that affect it, existing code that it might impact, and business rules around the changes. Whatever it is, you, as the author, understand them much better than the reviewer does,
17:42
because you've already been thinking and working with them. So, it's your job to make sure that you've explained any context which is relevant for the PR. So, in my experience, there are about five different groups of context, and these should be communicated in different ways. The first is explaining why a piece of code is doing what it's doing.
18:02
These are usually business requirements or non-obvious technical requirements. For example, if you have a piece of code that removes an email address from a log, it might be useful to have a comment to explain this is because of GDPR requirements. This context is useful to have as comments in the code or as docstrings in the code,
18:21
preferably next to the code itself. The key here is that this is something that is useful for future developers to know about, so the code is a good place for it. The second type is giving context for a commit in the PR. This is usually to explain the rationale for structuring code in a certain way.
18:41
For example, I'm creating an empty function which will be filled in in a later commit, preferably with an explanation as to why you're doing this, is a good example. This information isn't important for future developers, because by the end of the PR, the function will have been filled in, and so this context should instead live in the commit message.
19:06
The third type is giving context for the PR as a whole. This is usually explaining why the work is required. It might point to a ticket on a board, but it can also include details like deadlines or other related PRs. This context should be added to the PR description,
19:21
visible before any actual code changes, because they are useful for reviewers to think about before even starting the review. Should a different team also review it? Should they prioritize this PR because it's extremely urgent? It also lets the reviewer know to keep in mind other code that the PR might affect that maybe the author doesn't know as well or hasn't considered.
19:43
Number four is asking for help. If you've written a code change but you're not sure if it's the best solution, or you have questions about the code, comments on your own PR are an excellent choice. They can sit right next to the code in question, and they can handle a conversation between multiple devs figuring out a solution.
20:04
It's worth noting that these sometimes get converted into the first type of context, where the outcome of the discussion is stored in code to explain why the final decision was made. Number five, explaining what the code is doing.
20:22
And this is the type of context that I would argue well-written code shouldn't need, and certainly not as comments in the code. If you find yourself explaining what the code is doing rather than explaining why, you can probably improve the readability of the code instead.
20:41
This typically includes things like renaming things to be more accurate or descriptive and breaking large functions into smaller, well-named subfunctions. Tip number five, write perfect code first try. I think I can move on. No. Rewriting history to only show ideal final commits. Now obviously most of us aren't able to write perfect code first time round.
21:04
However, we are all able to create PRs that look like we did by making sure the commits show the perfect path to the completed PR. So for example, if you create a variable name in one commit and change it in the next, you might as well rewrite history so it looks like you named it perfectly the first time round.
21:22
Looking at another toy example, let's say you create a function that gets a foo and then adds a default name to it, does some other stuff, and then returns. Now if your reviewer spends time explaining why setting the name to a literal string isn't a great idea, finds documentation to help explain the problem, puts together a tentative solution for you, in this case using a constant from the constants.py file,
21:46
only to find that you've already done this in the very next commit, I can guarantee you they're not going to be super happy about it. You've just wasted 10 minutes of their finite time and mental energy for nothing to try and help you. Even worse is if you decided you didn't even need foo to be named in the first place.
22:05
If this is how you coded it when you initially wrote the code, that's fine. By the time you ask for a review from somebody else, it is worse than useless to show the intermediate stage.
22:22
Please, do not do this. Instead, merge the deletion back into the original commit that creates the function to make it look like you created it perfectly the first time round. If there's genuine reason for leaving a commit in an intermediate stage, for example, if you want to create and call a function, but the contents of that function need to be created in a separate PR for some reason, that's also fine.
22:44
But please explain that in the commit message or even as a comment in the commits that the reviewer knows that's what you're doing. And a small note that this usually only applies to a branch that you're working on before asking for reviews. Once a PR has been reviewed or once your PR might have been branched off,
23:02
there are valid reasons why rewriting history might be a bad idea. Tip number six, review your own PR. Make sure that you find your own PR easy to read before asking for reviews. Reviewing your own code before asking others to do it lets you practice all of the previous tips.
23:22
By this point, you've hopefully made sure that your code does all of the right things and is tested, so this is your chance to focus on making sure that it's a really well-structured PR without needing to worry about any further functional changes. So, whilst doing this review, try to notice what you find easier or harder to review.
23:43
Maybe one of the commits takes a lot more brain power to review. Maybe you've had to reread it twice. Whatever your problems are, your reviewer will have a much harder version of those exact same problems. So, try to make sure that all of your PR is nice and easy to read. And on a more concrete note, you might spot a few obvious mistakes,
24:03
and so fixing those up before asking for reviews, it's going to save you at least a couple of rounds of to and fro with your review of fixing those up. And finally, doing this review yourself will help to build your ability to spot ways to make PRs easier to review.
24:21
The great thing is that this is then something that you can pass on to your team to help them craft easier-to-review PRs, which is a direct benefit for you reviewing those easier-to-review PRs. Okay, so what was the underlying approach? I mentioned that all of these had a shared underlying one, and hopefully none of you will be surprised to find out what it is.
24:44
Optimize for the reviewer. I can guarantee you that the easier a reviewer finds your PR to review, the better it will be for you as the author. There's one final topic which I want to touch on, which is more about optimizing the communication between you and your reviewer.
25:01
The human side. Because your reviewer is also human, for now. The final aspect of PRs I want to touch on is the one about human communication that happens during the review. Now, it's important to remember that the person reviewing your PR is, well, a person, but also that written communication is hard,
25:22
especially when it's a discussion about your work and how it could be better. Now, this is a relatively nuanced issue, which I don't think I can do justice to with the 30 seconds that I have left of this talk. So instead, I'm going to point you to an excellent article by Michael Lynch called How to Make Your Code Reviewer Fall in Love with You,
25:42
which goes into a lot more detail about this, as well as other tips on how to write great PRs. I'm also a firm believer that we should all aim to be better, kinder communicators in the workplace, because even tech companies are made up of humans, and it's those human connections that are hardest to build,
26:00
but I believe give the biggest reward in the long term. So on that note, I'll also point you to Michael's other related articles, How to Do Code Reviews Like a Human, which unsurprisingly looks at how to communicate in a more human way when reviewing PRs. Thank you. That's the end of my presentation. Thank you for your time in your country.
26:26
That's a wonderful talk, Ben. Great, and thanks for the time also. So now it will be Q&A section, so if people have some doubts, please come to the mic point and you can ask some questions or doubts.
26:43
So if you want to just make your way up to any of the mics or... So there's some iterations of work that need big PRs. There's no way around it, right? It happens.
27:01
My question is, when we separate, following your ideas and tips, when we separate these PRs, wouldn't it be harder for the reviewer to get an overview of the feature we're trying to release? Where is it? Over here.
27:27
I would say that's a valid concern, and I think where you draw the line at what is too big for a PR comes down to you and your team and how experienced you are.
27:41
It's quite possible that more senior developers are more able to keep larger chunks in their heads and are able to understand the architecture. My experience has been that it has always been possible to break up PRs into smaller chunks. Even if, because you can architect the solution up front,
28:03
decide what the overall feature looks like, do the work to break it up into small chunks, and then those small chunks become PRs. So the whole feature itself is big and it's complex, and it should have somebody understanding it end to end. But breaking that down can happen way at the start
28:22
as part of the work to understand it, to understand the dependencies and say, hey, this is something we can separate out, this is something we can separate out. Yeah, my experience has been I've always found ways, or my team has always found ways to break up the big stuff into smaller chunks. A follow-up question is, we've gotten a review of those PRs, all of them.
28:43
How or should we rewrite history to kind of collate those into just one big commit? Like when we're merging into mastering. Another excellent question, and the answer is once again going to be, it depends.
29:00
So I think this one really comes down to how your team or how your company wants to handle its Git history into master. So at Kraken, we tend to create, so we do trunk-based developments, we pull out from master, do a PR's worth of work.
29:22
That then gets merged back into the master branch, as is. So when your PR has been reviewed and you've made changes, you can choose to do it a number of different ways, right? My preferences are if it's a short PR, just merge the changes back in and I will re-review the PR.
29:48
If it takes me five minutes to review a PR, I don't mind reviewing it end-to-end a few times. If the PR takes me an hour to review because it's a particularly chunky one for whatever reason, or someone hasn't broken it up appropriately, then I might ask them to leave, basically fix up commits at the end,
30:03
and then merge those back in after the review. And basically say, I trust you are able to merge those back in properly. And if you don't trust somebody to merge them back in, maybe that's a wonderful bit of learning and teaching that you can do with your team to help them level up. Thank you for the question.
30:24
So there will be like five minute breaks. After that, the next section will start. So please give another round of applause for Ben.