Fighting the controls: tragedy and madness for programmers and pilots
Formal Metadata

Title: Fighting the controls: tragedy and madness for programmers and pilots
Series: EuroPython 2017 (part 126 of 160)
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported
DOI: 10.5446/33696
Transcript: English (auto-generated)
00:05
That's me, Daniele Procida. That's what I look like on television. I work for a company called Divio, a Swiss company,
00:20
I'm very lucky to work for them, and we do cloud hosting for Django and Python developers who don't want to be system administrators. So, something I wish I'd had a long time ago. And we do everything, of course, also on Django and Python. If you want to talk about Django and Python deployment
00:43
afterwards, please come and talk to me. I'll be very happy to talk to you about that at length. I'm also heavily involved in the Django community. I'm a core developer of the Django project. I am a board member of the Django Software Foundation. In fact, I discovered I'm the vice president
01:02
of the Django Software Foundation, but it's not actually as glamorous as it sounds. And come and talk to me about Django. My contact details are there. I'm not in any danger of saying anything interesting on Twitter, but follow me by all means, or email me, or just come and talk to me.
01:21
I like talking to people. But that's enough about me, because I want to talk about tragedy and madness instead, because these are subjects that programmers need to talk about. You should know this picture. This is Pieter Bruegel the Elder,
01:44
Landscape with the Fall of Icarus. And you know the story of the myth of Icarus falling from the sky. And there he is over there, in the corner of the picture, falling into the sea unnoticed when the wax on his wings melted and the whole world is going by.
02:02
So you can imagine the fall of Icarus and a time of fear and agony and terror, and his father watching him helplessly. And you might think, well, this is a senseless tragedy,
02:20
a pointless, meaningless loss. And that's partly right and partly wrong. It's a senseless, meaningless loss. But a tragedy, if we understand it in the original meaning of the word, it's an ancient Greek dramatic form. And a drama is a story.
02:40
And stories are never meaningless. Stories are always meaningful because they tell us something. In fact, a story helps us to find sense in some senseless loss. So that's what a tragedy is. It helps us tell a story to make sense of something that doesn't make sense.
03:01
So what may have happened to Icarus might have no meaning in itself, but the story we tell ourselves and each other about it over and over again does have a meaning. And the meaning might be that it's really important to take your father's advice or not to be too proud or something like that.
03:26
Now, what happened belonged to Icarus. That was his time and it's gone and he's gone. But we still have the story to work with. So there was the time, but we have the story of the time
03:42
and in this talk, I'm gonna be talking about the story of the time as opposed to the time. I'm interested in what we can learn from the story of the time, from the record of the time. So, what's madness? Well, maybe you're familiar with this famous quote
04:02
from Einstein that madness, the definition of insanity, is doing the same thing over and over again and expecting different results, so Einstein. Actually, it wasn't Einstein. It was the writer, Rita Mae Brown, who said that. I don't think it was her. It was Mark Twain, apparently, who said that. And it might be an ancient Chinese proverb
04:21
and actually nobody knows who said that. But if you're a programmer, that probably sounds a little bit familiar, this experience of doing something over and over again and expecting a different result. Because that's what we do. We sit in front of our keyboards, run the same thing again,
04:40
and we're surprised and outraged when the same error happens again. And there's something slightly mad about this experience. It's an experience that programmers are familiar with. So what is going on? What happens to programmers when they're afflicted by this madness, when they're sitting there in front of the computer,
05:01
in front of their machinery, doing the same thing again because the results weren't what they wanted or expected, and they do it again, and they look in disbelief at the crash, and then, despite the evidence of their eyes, all they can say is: damn it, this can't be happening.
05:25
On the 31st of May, 2009, Air France Flight 447 was flying from Rio de Janeiro to Paris. And about three hours after takeoff, it encountered icing conditions.
05:42
And ice crystals began to accumulate in one of the pitot tubes. These are little forward-facing tubes that measure air pressure and therefore airspeed. And this caused them to give inconsistent readings, which in turn caused the autopilot to disengage.
06:01
And a warning sound went off in the cockpit. And they were in some turbulent conditions, so the pilot flying, Pierre-Cédric Bonin, had to adjust the plane's attitude. And as well as adjusting for roll, he pulled back on the stick, causing the aircraft to climb and lose airspeed.
06:21
And within about 60 seconds, the icing event was over and the instruments were reporting correctly. But the plane had climbed to its maximum altitude, it had lost airspeed and lift, and it was in an excessive nose-up position and began to stall. When it stalls, the wings aren't supporting it any longer.
06:43
The stall alarm sounded in the cockpit. And the aircraft started to fall because it wasn't flying any longer. And the pilots fought the controls down towards the Atlantic Ocean. And they were trying to make sense of what was happening to them. And all the while, Bonin kept pulling back on the stick,
07:02
trying to climb to recover altitude and not realizing that what he was doing was keeping the plane in a stall. The captain, Marc Dubois, was on a rest break. He was called back into the cockpit and he sat behind the two pilots. And on the cockpit recording, you can hear the stall warning going off about 80 times.
07:22
And finally, just a few thousand feet above the water, the first officer and the captain realized what Bonin had been doing. And they correctly put the nose down so the plane could pick up speed and recover. And it was too late.
07:40
And the plane crashed into the ocean and everyone on board, 228 people, died. So this story of Air France 447 has the same kind of hold on people's imagination
08:01
as the story of Icarus. It has the status of a myth in aviation in which people think they're going to, or hope they're going to find some kind of deeper truth or the revelation of a significant mystery, maybe like the story of the Titanic, people also keep telling each other over and over again, trying to find some meaning in it.
08:21
How do we explain how an Airbus A330, one of the safest, most reliable planes ever built, fell out of the sky? There was no reason for it to crash. The crew were not incapacitated. The engines were working normally. The plane responded to the controls. There was nothing wrong with it
08:41
apart from a brief sensor error. And it functioned normally all the way down to the ocean. And it can send a chill down your spine to think about it, for many reasons. And one of the things that's chilling about it is that when we think about what happened in the cockpit, it's like a glimpse into madness or irrationality.
09:03
And that's something that we find hard to look into. This is Laurie Anderson. I don't know if anybody knows Laurie Anderson or this piece, okay. She's a musician and performer, and her piece, From the Air, is a kind of story of an in-flight emergency,
09:21
and it starts with the kind of calm authority that we expect from aviation, and then it descends into demented confusion. If you don't know it, you should listen to it, not before you get onto an airplane necessarily. Now, she makes the point that in an air crash, there are two times. There's the time, and there's the record of the time.
09:45
There's the time that we live through when something's happening, and then there's the recreation of that time afterwards, when we tell the story of what happened. There's the time that belongs to the world now, to your flesh and blood and fear and so on, and there's the other time that belongs to a world
10:02
of technology and numbers and data. There's the time when things happen, and the time when we recreate the story. So our time now, that we live right in now, is contradictory, it's confused,
10:20
it's counted out in remaining years, or in some cases, maybe minutes and seconds, but there's the time that's going to be pieced back together afterwards, captured by the data recorders, and put back together by the researchers and investigators
10:41
who are trying to work out what happened. So this is the time, and this is the record of the time. And our time goes in one direction. It unfolds inexorably, and it comes to an end.
11:01
But the record of the time can be played backwards, can be played over and over again, backwards and forwards, we can slow it down and we can pause it. So in their time, Air France 447 is gone. But in the record of the time, we can suspend the plane in its fall, and in the record of the time,
11:22
the passengers are still there, unaware, asleep, the pilots are still fighting the controls, the passengers don't know that madness has taken over in the cockpit, and that Pierre-Cédric Bonin is doing the same thing over and over again and expecting a different result.
11:41
So we've been talking about madness, and people talk about pilot error, but these are really bad terms. They're simultaneously judgmental and unrevealing. And aviation has a much better way of talking about what happens, and it calls it loss of situational awareness. Bonin, Robert, and Dubois lost situational awareness,
12:03
and it's what happens when a pilot loses the true picture of what's happening to them. When you lose situational awareness, all the cues and clues you need might be staring you in the face, but because of a contradiction with your expectations, you will fail to understand what they are telling you.
12:24
In aviation, needless to say, loss of situational awareness is absolutely deadly. It's a kind of cognitive breakdown and has a number of notable characteristics which we'll talk about later, very briefly. Blind spots, faulty mental models,
12:42
misjudgments about cause and effect, and poor decision making. So the question I want to explore is, does the same kind of madness, the same kind of cognitive breakdown afflict programmers and pilots when they make apparently irrational judgments
13:02
and decisions? And I want to look a little bit more closely at programming now to see whether we can find some deeper understanding of what programming is. So some people think that programming is a science.
13:23
They're all wrong. Don't get confused by the idea of computer science. It's not to do with programming. It's concerned with computation. Programming is much more like engineering or flying a plane. Some people think engineering is a science, and they're wrong also. Programming, engineering, and flying a plane are crafts or skills or arts.
13:42
What in ancient Greece, again, would be called technê, literally meaning skill. In fact, if you read Plato, one of the examples that he's always using of technê, of skill, is piloting a ship. Technê, of course, is the root of our word technology.
14:01
And we can find various interesting parallels between programming and piloting. So for example, I don't know if any of you have done pair programming, but it has its counterpart in the two-pilot cockpit where there's the pilot flying, who's operating the controls, and the pilot not flying, or the pilot monitoring, who is telling them what to do
14:21
and handling communications and overall flight operations. I'm interested in finding out some parallels that happen when things go wrong. Now, in nearly all crafts, you'll find two phases or activities, a primary and a secondary activity.
14:40
And they correspond in programming to programming itself and debugging, the creative phase on one hand and the troubleshooting phase on the other. So in the primary activity, it's synthetic. We put things together. In the secondary activity, it's analytic
15:02
and we break things apart. So we've got assembly and disassembly. In the creative phase, we are moving forward. In the secondary activity, the debugging part, we're moving backwards. In the first, we have imaginative and goal-oriented thinking.
15:22
And in the secondary activity, it's logical and problem-oriented. On one side, we ask questions like, how can we get there from here? And on the other, we say, well, how did we get here? Or we say, let's make something happen. Or how did this happen?
15:41
Or why did, often in programming, why isn't it happening? So that's the distinction between programming proper and debugging. And this happens in all crafts. But for programming, it's quite unusual in how much time we spend on this half of the cycle.
16:00
This really dominates a lot of our work. If you look at how much time you spend programming, most of it will be in that side rather than this side. And that's quite unusual. And that's the part that I'm interested in, not this part because we talk about this a lot as programmers. We've got methodologies and models and frameworks and paradigms.
16:20
And a great deal has been said about the practice of programming, about how teams work and how work can be planned and managed and executed and delivered and reviewed. But about debugging, we say very little. And that's odd because we spend so much time doing that. I think there are a few reasons why this is the case.
16:41
So one is that programming is a creative craft, like writing a book. And in a creative craft, it's acceptable for us to get things wrong many times until we get it right. Unlike crafts like surgery, for example,
17:03
when you go to a surgeon, you want to be sure that the surgeon has not only done this operation many times before, but it's going to be conducted in exactly the same way with the same result. Same thing when somebody makes you a meal. That's not the time when you want to hear that somebody is doing something groundbreaking and experimental.
17:20
On the other hand, when you're writing a novel or composing a jazz masterpiece, experimentation is fine. In programming, failure is unarguable. You don't get to argue about whether it worked or not, because you're staring at a traceback. And it means we don't have to stop and wonder whether it really worked.
17:41
If you're not satisfied with your piece of writing or your painting, you can sleep on it and think, well, maybe it does work or maybe the critics haven't caught up yet. Maybe if you're John Coltrane, you can say, no, no, it works. You're the one who needs to catch up with me. In programming, you can't do that.
18:01
It has failed. In programming, trying again is instantaneous or nearly instantaneous and cost-free. And that means we're tempted to do the same thing again immediately. Whereas if you're an engineer building a bridge or an airplane, that temptation is much less
18:22
because it's so expensive to try again. In programming, feedback is nearly instantaneous and cost-free. So it means we're tempted to try things just to see if they will work. You tend not to do that with aircraft, bridges, and dams. And finally, and this is a really important one,
18:41
debugging is a mostly private affair. There's the programmer and the code and the error, and they're living in a miserable ménage à trois. It's a misery that we deal with in private, like being sick. We want to do it on our own. And so debugging stays in our heads.
19:01
Programmers love talking about programming, but you will never get a conversation out of a programmer who's in the middle of debugging. Afterwards, they'll say, well, there was this fantastic bug and I did this and then that happened and they'll happily tell you the war story. But the only thing they want to do when they're in the middle of debugging
19:20
is look at their screen. And so we fight with our code alone. So all these things come together, and what they do is tip the programmer more quickly into debugging and keep the programmer there. They form a vicious cycle, a tighter and tighter spiral, with no inclination to step back or reconsider.
19:42
It's a closed cycle in which we don't want to talk to other people about it and get away from the code. And so there we are, programmers debugging, finding ourselves doing the same thing over and over again as if we expected a different result, fighting our controls.
20:01
And my argument is that the nature of debugging leads us towards cognitive breakdown, that debugging itself provokes loss of situational awareness in programmers. I'm arguing that the activity of debugging is responsible for producing the same kind of mental conditions, cognitive conditions,
20:23
that tore Flight 447 out of the sky. You're a web application developer and you're trying to restart an application that crashed, and you've checked the logs, you've flushed the caches,
20:41
you've restarted the application, you've restarted the web server, and now you're logged in directly in the terminal, inserting print functions into the code, desperate for any clue. Damn it, this can't be happening. And next thing you know, you are in the grip of madness, doing things that you would never dream of doing, like changing production code on the fly or trying to reconfigure the web application gateway,
21:01
and finally you discover what is happening. You've been doing all those desperate and futile things to recover from the crash on the wrong server. You were typing the commands into the wrong terminal window. You were desperate for clues and all the clues were right there in your face
21:20
and you didn't see them, because you lost situational awareness and you were fighting the controls. Or you're debugging a complex algorithm that was working until yesterday, when you made some minor and insignificant changes. And you know this algorithm inside out, because you've been living inside it for a month,
21:41
and now somehow its logic is collapsing all around you, and you take it apart piece by tiny piece, and you're checking and rechecking parts of it that you know can't possibly have anything to do with the problem, changing it so much that your own mental model of it starts to break down, and damn it, this can't be happening. And finally you realize that in one of the changes,
22:02
what happened is the algorithm is now receiving a different data set. You dived deep into a rabbit hole chasing after a completely irrelevant solution to your problem. There was nothing wrong with your algorithm. It was just receiving different data.
22:20
And you lost situational awareness because you were too busy fighting the controls. And then you fly your plane into the sea telling yourself that this can't be happening. Those are amongst the last words on the cockpit voice recorder from that flight.
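Both of those debugging stories are, at root, failures to verify context before acting. As a purely illustrative sketch (the helper and its name are my own assumptions, not from the talk), a tiny "where am I?" check run at the start of a debugging session makes the forgotten facts explicit:

```python
# Hypothetical "situational awareness" preamble for a debugging session.
# It prints the context a panicking debugger forgets to check -- which
# host, which user, which directory -- before any command is typed in anger.
import getpass
import os
import socket


def where_am_i() -> dict:
    """Gather the basic facts about where this session is actually running."""
    return {
        "host": socket.gethostname(),
        "user": getpass.getuser(),
        "cwd": os.getcwd(),
    }


if __name__ == "__main__":
    for key, value in where_am_i().items():
        print(f"{key}: {value}")
```

In the wrong-server story, a habit as small as this, applied before the first restart command, would have exposed the mismatch immediately.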
22:41
So loss of situational awareness is a common enemy of the pilot and the programmer. What can we do about it? When it happens to programmers it makes us look and feel foolish. When it happens to pilots they die. So unsurprisingly aviation has developed very good strategies for dealing with it. Strategies to mitigate it. And they target the kinds of cognitive breakdown
23:04
that characterize it. So remember these, I mentioned them earlier. Your eye's got a blind spot. At the right distance, if you close one eye and fix your gaze on one dot, you won't be able to see the other dot, because it will be in your blind spot, where the optic nerve leaves the eye. Blind spots are deadly.
23:20
You can't even tell when you have a blind spot. The only thing you can do is get another perspective which is why we have two eyes or maybe invite another person in to look at it or change your own perspective. We need mental models of the world as a kind of shorthand for navigating the world
23:40
and the more complex the world, or the part of the world, the more complex the mental model. When they break down, we need to rebuild them, and we need to rebuild them from something objective that's not inside us. German's a very useful language: it has multiple words for thing. One is Ding, as in thing, and another is Gegenstand,
24:01
which means literally something like stand against. And that's what an object is. It's something that we can stand against. It's something outside of us. And we need something that we can stand against. Something that's outside ourselves in order to rebuild a mental model.
24:22
We don't have access to causality in the world. We can simply observe the world, make observations and judgements, and from those we make judgements about cause and effect. We don't see cause and effect themselves. This is difficult enough at the best of times. In complex, stressful situations it's much harder.
24:40
So the flight computers alerted Bonin to the problem. But when he pulled back on the stick and the attitude of the plane increased even further, the stall warnings stopped, because the flight computers were receiving such anomalous information that they couldn't even process it.
25:02
And when he let the nose recover, which is what he should have done, the stall warnings went off again because now the plane could detect what was happening. So he was trapped in this horrific feedback loop in which his actions and cause and effect were 180 degrees out of phase with each other. So we need to find ways of rescuing
25:21
our judgements about cause and effect, pulling them out of these complex and tight stressful loops. And finally, our decision-making processes need to be exposed and made explicit. Had Bonin said what he was doing, his faulty reasoning would have been immediately obvious.
25:43
As panic set in, his actions became more and more irrational. And the decisions he took, the way he was thinking went unchallenged because it stayed inside his head. And the rest of the crew made poor decisions too that were revealed in actions or implicit in remarks,
26:01
but they were never brought out into the light until the final moments. So aviation has spent over a century learning how to deal with these problems. Its lessons are woven into every pilot's training. In aviation, cognitive breakdown is dealt with
26:22
through the employment of very simple, practical strategies and procedures. And it means that pilots don't need supernatural competency. They are ordinary people doing ordinary, simple things. And those simple things are what make them safe.
26:46
They use checklists. When you're flying, it doesn't matter how many times you've done a procedure, even if you've done it several times that day, you do it from a checklist. So here's a checklist from a 747 normal procedures manual.
27:03
This will be on a card in the cockpit. This is for the primary activity of flying, the programming part of flying, the forward-going part. And it covers everything. And they also have checklists for emergency procedures, for the debugging, the secondary activity of flying.
27:21
There are checklists for absolutely everything. They're big manuals. Each pilot has one down by their side in the cockpit. In the few moments after the autopilot disengaged, Robert should have had the checklist on his lap,
27:41
working through it with Bonin. Instead, Bonin fought his controls, relying on his own memory, his skill, and his experience. Remember all those characteristics of cognitive breakdown: a checklist serves
28:00
as an alternative perspective to avoid blind spots. It helps rebuild a mental model. It gives us something to stand against, a kind of objectivity. It helps guard against poor decision-making. We do use checklists sometimes, in system administration,
28:23
sometimes in programming. I've never seen a checklist used in debugging. Communication in aviation is always explicit and verified. So we'll say: you have the controls; I have the controls. Descending to flight level 350;
28:40
descending to flight level 350. Again, it brings in the other perspective. It rebuilds a mental model. It helps reconnect cause and effect, and it exposes our thinking processes. This is what happened on Flight 447, and you can read this and weep, because Bonin had been pulling back on that stick, trying to climb, and the other pilots did not know, because he hadn't told them.
29:01
And Robert, on the left-hand side of the cockpit, had been pushing forward, trying to lower the nose. But the system was in dual input mode, where it sums the commands, and Robert hadn't made his decisions or actions explicit either. Nobody in the cockpit knew what was happening
29:20
until it was too late. We see all this all the time in debugging, and it happens to me: I can struggle with something for hours and hours, then finally go and ask on IRC, and in describing the problem, the answer pops into my head, because I've made it explicit. You don't need a co-pilot or IRC. You can just tell it to the wall.
29:42
Often the act of description will turn out to be 90% of the work towards the solution. Stress and panic drive us tighter into our loops of bad decision-making. One thing that aircraft like to do and do really well is fly. An airliner wants nothing more than to fly level
30:00
and in a straight line. Had Bonin let go of the controls when the autopilot disengaged, or at almost any point in the next few minutes,
30:21
So that's quite good advice, yeah? So simply doing nothing not only allows an aircraft to recover, it allows a pilot to recover. Now, a pilot can't go on doing nothing indefinitely, especially when, for example, an engine's on fire, but they can do it for longer than you think, even in an unexpected emergency. But programmers, we can leave the code alone
30:42
almost indefinitely. If you go on holiday and come back later, that code will be waiting patiently for you to come back and tackle it again from where you left off. If you walk away and come back, that's fine, because there's no pressing need for you to do anything.
31:01
So often the best thing that we can do is get away from the code, because when there's no pressing need to do anything, then we should do nothing. And I've solved more problems falling asleep or riding my bicycle or doing the dishes than I have by fighting with the controls, and I've certainly caused less damage, which is...
31:22
So we could learn to do these things. We can adopt checklists. How hard would it be for us to develop systematic checklists for debugging and programming? We can improve the way we communicate so that we do nothing and even think nothing
31:40
without communicating, without exposing it to another person, as we do, for example, in pair programming, although we rarely do pair debugging. And most urgently, I think, what we need to learn is to stop fighting the controls. So these are all ways of getting away from a failed, failing perspective, all ways of stepping outside yourself
32:01
and finding something else in the universe to get a grip on to help you. They're ways to reset your thinking and prevent bad judgments and decisions. And they can work in programming; they work in aviation. We're talking about Flight 447 not because it's something that happens, but because it's something that never happens.
32:21
A once-in-a-lifetime anomaly. Flying on an airliner, I can assure you, is the safest way to travel. In the meantime, other disciplines like nursing and surgery and the nuclear industry are also learning and adopting these strategies. And programmers love to think and talk about programming and about their practice and methodology and theory,
32:44
but they don't tend to theorize about debugging, as though it might be an embarrassing private misery. And I think that needs to change. So as I said, in an air crash there's the time and there's the record of the time.
33:01
And the record of the time is scrutinized and replayed endlessly in order to force it to yield up its secrets so that we can learn from them and make things better. And we don't do that in programming. As soon as we've, you know, the last thing you want to do when you've had a horrific debugging session is look back at it. You just want to get on with your life and work.
33:22
But we need to learn to dwell on our failures and confront them the way aviation does. Because we're so often caught up in the time and we won't look at the record of the time. By the way, these techniques also work for other things,
33:40
you know, other crises and emergencies like a malfunctioning washing machine or relationship. The truth is these things work in aviation because they're built into it at every single level. Not because somebody heard about them at a conference talk.
34:00
Aviation is safe because they're a defining part of its culture and processes. The truth is that as individuals we will only get so far. They work for aviation because they're a systematic approach adopted throughout the industry. Until our debugging mistakes routinely kill us,
34:20
our colleagues and our customers, I think our habits as programmers probably won't change very much. Our industry won't change. This is just deep in the culture of the industry. Aviation paid for its lessons with people's lives.
34:43
People died to make it safe. In aviation, every time you break a rule, however simple, however banal, every time you decide you don't need to follow a checklist, every time you do something the way you happen to feel like doing it at that moment, you are literally dishonoring the dead.
35:05
In programming, it's not gonna change. Thank you very much.
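The debugging-checklist idea from the talk can be sketched as a tiny script. The checklist items below are hypothetical examples, not a list the speaker actually proposed:

```python
# A minimal debugging-checklist runner -- a sketch of the idea from
# the talk. The checklist items here are hypothetical examples.

DEBUGGING_CHECKLIST = [
    "Are you editing the file that is actually being run?",
    "Have you restarted and reloaded everything involved?",
    "Can you reproduce the problem with a minimal example?",
    "Have you described the problem out loud or in writing?",
]

def run_checklist(items, ask=input):
    """Walk through each item; stop at the first unconfirmed one."""
    for number, item in enumerate(items, start=1):
        answer = ask(f"{number}. {item} [y/n] ").strip().lower()
        if answer != "y":
            print(f"Stop and resolve item {number} before continuing.")
            return False
    print("Checklist complete -- carry on debugging.")
    return True

# Non-interactive demo: answer "y" to every item.
run_checklist(DEBUGGING_CHECKLIST, ask=lambda _prompt: "y")
```

Run `run_checklist(DEBUGGING_CHECKLIST)` to use it interactively; passing `ask` as a parameter just keeps the sketch testable.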
35:32
We've got a few minutes for questions, so yeah, let's have some questions. Maybe someone wants to make some comments, but let's have questions first. So we'll start at the front, please.
35:42
Or maybe you should. To me, it seems like your conclusion is unnecessarily pessimistic. So for example, you changed my opinion that perhaps I should use checklists, and that seems kind of like a good idea.
36:02
So maybe by talking more about it we can actually achieve some progress in the industry. What do you think? We might. I think that yes, I am pessimistic and cynical. But let's keep talking.
36:27
Hi, thank you for your talk, it's very interesting. Do you think there's a role for professional bodies in programming, similar to how doctors, pilots,
36:40
and other professionals have their own body which regulates their behavior, their actions, and enforces penalties within their own industry? Conceivably, yes. And I think there is a kind of movement for this starting,
37:02
for example, in academic and research software. I think there have been one or two talks about this already at this conference. And the keynote, Catherine's keynote yesterday about ethics in programming also touches on that. But I mean, look at us.
37:22
Are we the kind of people who are going to do that? Does our industry look like a very regulatable industry? I think one thing that's gonna be really interesting, I'd better stay here, is to look at the companies that are building self-driving cars, Apple and Google. When you've seen how Apple's programming works
37:45
on something like iCloud synchronization, you think, okay, you're going to be making self-driving cars, that's interesting, or Google. Maybe the regulation will come in, not from the programming side, but from the engineering side and the real world effects.
38:03
But I can crash my programs 10 times a day and nobody even notices, so yeah. Thank you, Daniel. I am aware that checklists are actually used in programming, interestingly in aviation software. So you find there are things like, has the variable been initialized exactly once,
38:23
and this is because of regulations. So I would like to re-ask the question, are you aware or is anyone aware of checklists being used in Python? It's interesting you say that.
38:41
And we sometimes use checklists in programming, but never in the debugging part, yeah, in that secondary activity. And yesterday I was doing my DevioCloud workshop, and I was thinking, why the hell is this URLconf not working? And I was literally typing it into the wsgi.py file.
39:05
I was typing, adding URLconfs into the wrong file, and finally, my checklist now has: make sure that you're in the right file. You mentioned that you spent several hours trying to debug something,
39:20
and then as soon as you wrote it down, you discovered what the problem was. A few years ago, I don't know if you're aware, an Akamai engineer wrote the 15 minute rule. In case no one else has ever heard of it, it basically says try debugging for 15 minutes. If you can't solve it, write it down and ask someone else. I think it's great advice to live by, and at least then you only waste 15 minutes. I hadn't heard of that, who did you say wrote that?
39:42
It's some Akamai engineer, Akamai the CDN company. If you just Google the 15 minute rule, you'll find a blog post by him about it, it's a really great read. In case you didn't hear that, the 15 minute rule, debug for 15 minutes, and then tell someone else, make it someone else's problem, it sounds like.
40:01
So thank you very much for your talk. When I saw you putting up analysis and synthesis, I thought maybe that fits very well with highly iterative processes, and I think all these programming things that we're using, like fail fast, fail often, should lead us to doing more iterations more quickly
40:21
and maybe your proposals here could help us work quicker on iterations and acknowledge debugging as a serious step of the activity that we're doing, not like "you failed, you have to debug," but more like, "okay, you have to do debugging now." And two comments,
40:40
I'm an aeronautical engineer working for Lufthansa, so congrats on the very good and very accurate story that you told, and you should try making this into a keynote, in my opinion. Thank you. Well, thank you. Every time I get a traceback or whatever error, I'm surprised, every single time.
41:06
You know, really? And so this is the kind of mentality we're dealing with. You know, how long have I been a programmer? And I am still surprised that, oh, that really has to change.
41:21
I don't know what, you know, maybe my head's just a bit, I should bang my head hard enough so the idea somehow gets in, but there's a real block there about expectation of failure. We never expect to deal with it. The best thing, the closest we've come, I think, is automated testing. But that still doesn't help us with debugging.
41:44
First of all, thank you very much for your talk. I think we as programmers, I mean, in the programming role, don't have the most stellar record of, let's say, you know, debugging things by methodology. Because I was actually reading a blog by a guy
42:01
who started as a site reliability engineer in some small online casino that became one of the largest online casinos in the world, and he became the head of its site reliability engineering group. And I think we have a lot to learn from site reliability engineers because they have a lot of methodologies
42:21
how to debug, right? Because they don't debug code, they debug infrastructure, right? They debug services. And, well, in this blog, so I never really worked with the site reliability engineers, but in this blog he said that he spent a year going through all the tickets that were created like since the birth of the company, right? That something failed, something went wrong
42:41
in site reliability, right? And he classified everything that went wrong, and he created a huge, pretty much a combination of checklists with a flowchart of things that can go wrong. And then any junior site reliability engineer can basically just go through it. So what I'm saying is, yeah, we don't have the best record,
43:01
but there are actually people, right? Who have way stronger, more robust methodologies, and yeah. I think we can definitely learn from our site reliability engineer colleagues because they, you know, they can't just do nothing as easily as we can as programmers.
43:25
Hi, I think I'm also a convert to the debugging checklist idea. That's excellent. Although I did wanna ask about the overhead of constructing checklists. I think we're in an industry that's unique in that no one that I know of has died from one of my tracebacks.
43:43
But is there an overhead to always using a checklist given how easily and cheaply we can fail? I don't know. As I say, my debugging checklist so far has got one item on it. It took me 20 seconds to write and since I've started using it, I've always forgotten to use it. So I, you know, it's just not in,
44:04
I was gonna say in our natures, it's certainly not in our training. I have no idea what the overhead might be. Probably less than the time we've spent futilely debugging things. Do you have another question? So maybe congratulations because your talk is a real success.
44:24
Sorry, say it again. Your talk is just a success. Thank you. Welcome.