Improving the culture of automated testing in FOSS
Formal Metadata

Title: Improving the culture of automated testing in FOSS
Title of Series: FOSDEM 2020 (talk 441 of 490)
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/47364 (DOI)
Transcript: English (auto-generated)
00:06
Good morning, everyone, from me too. First of all, it's great to see so many people here today. With all the FOSDEM parties going on, that's kind of a surprise, right? Because people don't tend to show up for the morning sessions after a night of drinking, for example.
00:24
Thank you very much for coming here today. Before we start, who can tell me what's in the picture here? Any guesses? Shout if you know. Louder? Right, excellent. That's the venerable Electronic Numerical
00:45
Integrator and Computer. That's a photo of the ENIAC from the late 40s or early 50s in the US. And I find this picture very, very fascinating, for various reasons. The first is all the wires; they're just mesmerizing. And I don't know if I should be
01:02
ashamed or proud that my desk looks like this sometimes, but you know. But the most important thing here are the two programmers. In particular, the programmer on the right, who is Betty Holberton, the creator, or rather the inventor, of the breakpoint. The breakpoint is this fundamental tool
01:20
of debugging, but also of testing, especially in those days. Because how did you do testing in those days? Well, you could test end-to-end and check the end result, but if you wanted to test anything internal, you just had to stop the world and check the hardware state. And that's where the breakpoint is very useful.
01:40
And we have come a long, long way since then. We have better tools, better processes, improved practices. But at the same time, it feels that we may still be a bit stuck in that era, right? We tend to underutilize the tools and forget the lessons. And in this talk,
02:00
I'd like to explore this from a free software perspective: where I feel things go wrong and how we can improve. So this is a picture I took perhaps 10 years ago at a tourist shop, and the sign in the picture caught my attention immediately, even though I hadn't
02:23
realized what was wrong at first. Take a minute and see if you can spot what's wrong here. This was my subconscious ringing alarm bells: something is wrong, danger, danger. In the same way, over the years, I had internally developed the impression that not
02:42
everything is rosy in the automated testing world where free and open source software is concerned. Back then it was mostly based on anecdotal evidence, a general feeling: I was seeing project bugs that were inconsistent with having comprehensive automated testing.
03:07
And this kept bugging me for many years, so at some point I decided to explore it. But how do you explore this? How can you run a large-scale survey and reach some kind of conclusion for FOSS in general?
03:23
So the canonical metric for test comprehensiveness is test coverage. But this is quite hard to extract from various code bases. And for example, you have different languages,
03:42
different tools, different build systems. So I was thinking, what else can we use? And I considered two metrics in the end. The first is test commit percent, and the other is test size percent. Test commit percent is the percentage of commits in a code base which affect the tests in some way.
04:07
And it makes intuitive sense that if this number is large, then the tests are being taken care of, that they are developed in sync with the code, and thus they are probably more comprehensive.
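As a rough illustration of how such metrics can be extracted (a sketch, not the speaker's actual tooling, and assuming test code lives in paths whose name contains "test"):

    #!/usr/bin/env python3
    # Sketch of the two metrics described in the talk; illustrative only.
    # Assumption: test code lives in paths containing "test".
    import os
    import subprocess
    import sys

    def is_test_path(path):
        return "test" in path.lower()

    def test_commit_percent(repo):
        # Percentage of commits that touch at least one test file.
        log = subprocess.run(
            ["git", "-C", repo, "log", "--name-only", "--pretty=format:@@"],
            capture_output=True, text=True, check=True).stdout
        commits = [chunk.strip().splitlines() for chunk in log.split("@@")[1:]]
        touching = sum(1 for files in commits if any(map(is_test_path, files)))
        return 100.0 * touching / len(commits) if commits else 0.0

    def test_size_percent(repo):
        # Percentage of tracked lines that live in test files
        # (binary files are counted naively; good enough for a survey).
        tracked = subprocess.run(
            ["git", "-C", repo, "ls-files"],
            capture_output=True, text=True, check=True).stdout.splitlines()
        test_lines = total_lines = 0
        for name in tracked:
            try:
                with open(os.path.join(repo, name), errors="ignore") as f:
                    n = sum(1 for _ in f)
            except OSError:
                continue
            total_lines += n
            if is_test_path(name):
                test_lines += n
        return 100.0 * test_lines / total_lines if total_lines else 0.0

    if __name__ == "__main__":
        repo = sys.argv[1]
        print(f"test commit percent: {test_commit_percent(repo):.1f}%")
        print(f"test size percent:   {test_size_percent(repo):.1f}%")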
04:21
And similarly for the test code size. Neither of these metrics is foolproof, but I think they are good enough for at least large-scale surveys. The other question is: which projects do you extract the metrics from? And I decided to use the GNOME and KDE projects,
04:42
basically all the thousands of subprojects that make up these top-level projects. And the reason is that they contain a wide variety of subprojects that range from, for example, end-user applications to core infrastructure libraries to graphics libraries,
05:00
command-line tools; a bit of everything, basically. And yeah, I ran some tools I developed to extract the metrics. Who wants to hazard a guess at what this looks like? Okay, no brave souls here, but this is what I got.
05:21
So basically, notice the very tall line at the zero point. That means that about 50% of the projects don't have any testing at all, and that's kind of sad, right? And then the other line at 80% tells us that in 80% of the projects,
05:43
at most 1 in 10 commits affect the tests. So think about this for a minute: for every nine changes to the code base, bug fixes or features, there is only one commit that touches the tests. And that's a very big red flag. I mean, it's not the end of the world perhaps,
06:03
but something is very fishy here. And I got similar results for the test code size ratio for these projects. One thing I also did was get the ratios for other projects that I knew were better tested, just to have something to compare against.
06:25
And you can see that the metrics are much better here, which is a good sign that our metrics are actually working correctly. Okay, so we have some indication that not everything is right
06:41
in the free software world concerning automated testing. But why is this so? That's the burning question. There are various reasons, some of which are not particular to free and open source software. For example, many people feel that
07:00
testing code is not worth their time, or perhaps too expensive, especially in the beginning. That has nothing to do with free versus proprietary software or any particular development model. But I believe there are a few reasons that are specific to free and open source software and worth exploring. And in order to do that, I want to go back to the past,
07:21
back to 1968, when this conference happened. This was the first software engineering conference ever held, organized by NATO. Many of the big minds of the era gathered and discussed all the things
07:40
that they thought needed improvement in the field. They sometimes came up with solutions, but often with very witty and insightful quotes about the state of things. If you haven't read the proceedings, I highly recommend that you do. So here's one interesting quote from the proceedings.
08:04
This is from Alick Glennie, who is often credited as the creator of the first compiler. He said: "Software manufacturers should desist from using customers as their means of testing systems." And in the 50 years since this was written,
08:22
I'm not sure that we have learned the lesson properly. And sadly, I think that for free software, it's a bit worse than the general industry standards. So why is that? You see, in free software, having a bug in the code is often not considered to be such a big deal, right?
08:41
The software is provided as is, without any warranty. And to be honest, that's completely fair; as a free software developer myself, I wouldn't have it any other way. But at the same time, it means there's this conception that bugs are cheap and that fixing them is also cheap.
09:02
And in reality, that's a somewhat pragmatic attitude to have because why spend resources, precious resources, trying hard to prevent bugs when fixing bugs is actually quite cheap from the developer perspective? But here's the caveat. This is a very developer-centric idea.
09:22
And from the user perspective, things may not be so simple. For example, if as a user you have lost data, if as a user you have had your system compromised because of a security bug, then you certainly don't feel that bugs are cheap.
09:41
At this point you may say: okay, so are you saying that free software sucks in terms of quality? No, because there's another force pulling the rope in the other direction, and that's professional pride. Free and open source software has a lot of that.
10:00
The developer is in the spotlight for the good and the bad, right? We have blame, we have praise, and everyone knows what we're doing. It's just that many incentives in free software seem to point to a reactive rather than a proactive approach. And this is something that we're going to see
10:21
also in the next topic. Moving on, I want to go back to the future, to 1999, when this book was published: The Cathedral and the Bazaar. The book explores two different ways of developing free software, the cathedral model and the bazaar model, and argues in particular for the bazaar model.
10:44
It contains a number of quotes, or lessons as they're called, about how free software works or should work. Here are two very interesting ones from the book: "Every good work of software starts by scratching a developer's personal itch," and "release early, release often,
11:02
and listen to your customers." For the first one, you may have doubts about the absolute terms in which it's phrased, but one thing that is true is that many software projects in the free software world start
11:20
by scratching a developer's itch. Often they start small, without any plans for significant growth or adoption. At that point, the incentives to have an automated test suite, to spend time on it, are limited. At the same time, even in cases where projects start with loftier adoption goals,
11:44
they may follow, to a great extent, this mentality: release early and release often. This mentality has great benefits, but if taken to an extreme, and especially if followed very early in a project's development,
12:03
it leads to projects placing too much focus on features, on becoming as relevant to the public as possible, as soon as possible. And from that perspective, again, spending limited time and resources on writing tests
12:22
may seem like a bad use of time. But regardless of how a project starts, if it starts to grow large, bugs typically start creeping in. At this point, developers say: okay, perhaps now it's time to have an automated test suite. But it's too late by then, in most cases.
12:43
The code has become test-unfriendly. It's typically difficult to add tests at this point, and most projects just don't. So again we see this idea of tests as an afterthought rather than as a forethought;
13:04
a reactive versus a proactive approach. Now, that book contained another very interesting quote, which is probably the most well-known in the free software world: "Given enough eyeballs, all bugs are shallow."
13:22
This is called Linus's Law, in honor of Linus Torvalds; Linus didn't actually say it, it's just in the book. And it refers to code reviews. So let's see: code reviews. Free software is very privileged
13:40
because code reviews are at the heart of its development model. That's because only a limited number of people have commit access to the code base, so every change, every merge request or pull request that comes in, needs to be reviewed at some point. Unfortunately, some projects again take this to an extreme,
14:02
and the trust in code reviews is so great that other practices, including automated testing, are forsaken. Don't get me wrong: code reviews are one of the best ways to maintain quality in the code. They help maintain a sane design,
14:21
they ensure that changes align well with the architecture and the overall goals of the project. And they also help catch bugs, but only some of them, and only some of the time. The problem here is not with the idea of code reviewing itself,
14:41
but with the fact that it is we humans who do the reviewing. And humans have inherent limitations, right? Our brains are limited. We are great at creative thought, but we are also great at overlooking details. And when there are gaps in our understanding, we are very happy to fill them
15:00
with our own unicorns-and-rainbows version of reality. In addition, as the code base grows, the interactions and the possible states grow, sometimes exponentially. And it's very difficult for us to keep all that state in our minds
15:20
and follow the code paths and the implications that one change may have. In theory, this problem of human limitation is offset by the open nature of the code and the fact that we have "enough eyeballs". But what number is enough?
15:41
Even the biggest free software projects, the Linux kernel for example, only have a limited number of reviewers checking the changes. Which is good; more is better than one, right? But at the same time, that "enough" is very indeterminate. Sometimes it works, sometimes it doesn't.
16:04
In the end, for all the reasons I mentioned, that means code reviews by themselves, as excellent as they are, cannot stand as the only tool. We need to be careful not to place all our trust in code reviews and forsake the other tools we have.
16:23
For the next topic, I'd like to explore a more fundamental question, and that is the question of learning. How do people learn in general, and in particular, how do they learn that certain software practices
16:40
are beneficial, so they can follow them? They learn by example. They learn by mimicking what the best in the field are doing, and they learn about this from books, from videos, and, if they are lucky enough, from a mentor. And again, free software is very privileged here,
17:01
because we have this huge Library of Alexandria of code. We can go into this library and explore, and this is indeed what many people do: they explore it and try to imitate what their role models are doing. And when they go into this library, what do they see?
17:21
Based on the graphs I showed you earlier, they see a big pile of nothing in terms of automated testing. Or, at the other extreme, they see monstrous code bases with huge, arcane test suites. This is an example of a test
17:42
that belongs in that category. I made the font very small on purpose, because I didn't want to scar you for life, so please don't read it in detail. And the whole thing creates a negative network effect. Why should you bother doing automated testing
18:01
when your role-model project doesn't, or when what you see looks like that? I mean, this is ugly; you don't want to do something as ugly as that. We need to be very, very careful with the examples and the culture we create and promote, because it's a slippery slope, and things can get really out of hand.
18:21
I'm sure the 80s started with all the best intentions, and then we got this, right? This is not something that we want to happen to free software and automated testing. So it's very important that we create more and better examples. As the saying goes: beware of advice,
18:43
but follow good examples. I'm now going to break that rule a bit, because I'll try to give you some advice, starting with this: embrace automated testing from day zero. As we talked about before,
19:03
and as perhaps many of you have experienced, the larger a project gets without testing, the more difficult it is to add testing after the fact. And this is particularly important for free software,
19:21
because as I mentioned before, the incentives are there for being reactive rather than proactive. So having tests from day zero is an important step. But it's not enough to just start with tests. We need to maintain a good testing culture for the lifetime of the project.
19:43
And this is where the next piece of advice comes in: we need to set the bar high, first for ourselves as, say, maintainers of the project, and then also for contributors. Basically, we need to lead by example.
20:03
It's often the case that contributors need some help to get over that bar, and we can provide that help a bit proactively. For example, we can have a very nice and clean test suite, easy to understand, so people can just jump in. And we can use
20:23
testing frameworks that are well known, so that the barrier to entry is low.
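For instance, a test module in this spirit might look like the following sketch; pytest is used here only as an example of a well-known framework, and the module and function names are hypothetical:

    # test_sizes.py -- deliberately small and readable, so newcomers can
    # copy the pattern. "myproject.sizes.format_size" is hypothetical.
    import pytest

    from myproject.sizes import format_size

    @pytest.mark.parametrize("nbytes, expected", [
        (0, "0 B"),
        (1023, "1023 B"),
        (1024, "1 KB"),
    ])
    def test_format_size(nbytes, expected):
        # Each case reads as a one-line specification of the behaviour.
        assert format_size(nbytes) == expected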
20:40
And sometimes someone will come to the project with an interesting bug fix, but they're not interested in writing a test; they'll just throw it at you. It's up to us to try to encourage them, but if that doesn't work, we'll write the test ourselves. That's fine. And finally, the last point here: be humble. You may wonder what humility has to do with automated testing, and I'd say basically everything, because automated testing is an acknowledgement
21:02
of our inherent limitations, as I mentioned before, and of our inability to invariably create infallible complex systems. It's basically humility in the face of overbearing complexity. And we need to be humble in order to accept
21:20
all the help we can get, from tools like automated testing and code reviewing, whatever works. An interesting note here: when I started out programming many, many years ago, I had the impression that the more experience I got, the more exposure I got to projects and people and teams,
21:44
the more confident I would become in just sitting down, getting in the zone, and streaming out super correct code, right? And that didn't happen. Actually, the exact opposite happened. The more experience I got, the more I realized how fragile this whole process is,
22:02
how intricate the act of writing code is, and how much I needed to depend on other, external tools to help me maintain quality. So, in closing, I would like to mention one last thing. Free software is a culture of openness.
22:22
It's a culture of cooperation. It's a culture of respect. So, we want to promote automated testing, but we cannot demand it from others. We can only encourage it, but most importantly, we need to lead by example. We need to be the change that we want to see. So, thank you.
23:07
Right.
23:24
For example, that projects without tests are harder to maintain because of the problems that arise, or because of the amount of merge requests, or something along those lines.
23:40
Okay, so the question is: we have all this data telling us that projects don't have tests, but does this correlate with projects actually being harder to maintain? That's the question, right? And so, I don't have exact data for the maintenance side of things.
24:01
At least, no hard data; I do have personal experience with this, and, yeah, it's a valid question. We would need data for this: is the correlation there? But no, sorry, I don't have that data. Something like the number of bug reports would be one way to measure it.
24:23
So supposing for a moment that all tests are written properly, what would you consider to be a healthy percentage? I mean, of course, you can't test everything, but like 40%, 60%? Just in general terms, what do you consider healthy?
24:41
Right, sorry: the question is what kind of percentages would be normal for healthy projects that have a good amount of automated testing. And there's no good answer here. For example, look at this graph.
25:01
There's a variety of projects spanning all kinds of things, from databases to display servers to core libraries, which are considered well tested. Their ratios differ dramatically, but they are all high. I think we can perhaps set a minimum; from this graph we can say that at least 20% should be something
25:22
that we should aim for. For example, note the Mir project there. You'll notice the very high test commit ratio, and that is because I know it uses test-driven development, so the metric reflects that.
25:42
The exact numbers also depend on commit practices. For example, some people may squash their commits, or keep separate commits that get merged, so there's not really a definitive answer. It's more a sense of being comprehensive.
26:01
By comprehensive, I mean having at least the core functionality covered. Because in my experience, and this is actually what started my interest in this topic, I would take projects, update them, and then something very core would break. And I would ask myself: did no one test this?
26:22
And of course, the answer was no, right? There was no testing there. Testing the tests or the testing framework? Oh, the tests. The tests themselves.
26:43
It depends. I normally don't do that, to be honest, unless I feel that a test may be too complex. For example, I may have test doubles that imitate some complex part of the system. In one project, I had a test double for D-Bus,
27:03
for interactions with D-Bus. So I actually wrote tests for that, to ensure that my double was working correctly, and then wrote the tests themselves. So probably only for the more complex parts of the tests. Because otherwise, where do you stop? Testing the tests, and the tests of those tests, you'd never stop.
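To illustrate the idea (a sketch, not the actual project code): a tiny in-memory double standing in for D-Bus, plus a test for the double itself:

    # Hypothetical in-memory stand-in for D-Bus, used by other unit tests.
    class FakeBus:
        def __init__(self):
            self._handlers = {}

        def subscribe(self, signal, handler):
            self._handlers.setdefault(signal, []).append(handler)

        def emit(self, signal, *args):
            # Deliver the signal synchronously to all subscribers.
            for handler in self._handlers.get(signal, []):
                handler(*args)

    # The rest of the suite relies on FakeBus behaving like a bus,
    # so the double itself deserves a test.
    def test_fake_bus_delivers_signals():
        bus = FakeBus()
        received = []
        bus.subscribe("NameOwnerChanged", lambda *args: received.append(args))
        bus.emit("NameOwnerChanged", "org.example.Service", "", ":1.42")
        assert received == [("org.example.Service", "", ":1.42")]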
27:23
Sorry? Yeah? Yeah, if that makes sense for that project, yes.
27:40
So, one comment was that the Linux kernel has tests that test its tests. Yeah, that's a great thing to do if it makes sense for your project.
28:05
So I guess it depends. Sorry, the question is: is there a way out of the misery of not being tested? There are some books and articles
28:20
about dealing with legacy code, and those should help there. You consider the project legacy in some sense, since it's not tested, and you go through the process of figuring out, first of all, what it should be doing, write some tests for that, and then start refactoring it to be more testable, and repeat the cycle.
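One concrete starting point from that literature is a characterization test: pin down what the code does today, quirks included, before refactoring. A hypothetical sketch, where "parse_kv" stands in for some existing untested function:

    def parse_kv(text):
        # Imagine this is long-lived, untested legacy code.
        result = {}
        for line in text.splitlines():
            if "=" in line:
                key, value = line.split("=", 1)
                result[key.strip()] = value.strip()
        return result

    def test_characterize_current_behaviour():
        # These assertions record today's behaviour, quirks included;
        # they act as a safety net for the refactoring that follows.
        assert parse_kv("a = 1\nb=2") == {"a": "1", "b": "2"}
        assert parse_kv("# comment = kept") == {"# comment": "kept"}  # quirk
        assert parse_kv("no equals sign") == {}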
28:40
But in most cases, I would say no, unfortunately. I don't want to sound pessimistic, but I haven't seen it happen very often, unless it's a very high-profile project that someone is very invested in. Yeah?
29:08
Yeah, the thing is that for many people, if it's too late, then it's not worth going back and adding tests. It's just a big pain: you have a complicated project, you don't have all the internal interfaces to check things,
29:21
but of course it depends on the project. If you can do it, that's great. That's awesome. Thank you.