Kernel development: How things go wrong
Formal Metadata
Number of Parts | 64
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/45929 (DOI)
FOSDEM 2011 (20 / 64)
Transcript: English (auto-generated)
00:07
Thank you, thank you all for sticking around through to the end, I guess I'm in the unenviable position of being the only thing between you and your post-session beer or going home or whatever it is, so I'll try to be merciful.
00:20
Anyway, those of you who have seen me talk before know that I tend to talk an awful lot about how well the kernel development process is working, how well things are going. And in fact it is going really well, we put out four to five releases every year, pretty much every 80 days just like clockwork, we get something like 10,000 changes into every single one of those releases, we've got over a thousand developers participating in
00:44
every single one of those release cycles, this is a big process that's working quite well, the results are showing up in everything from your toaster through to your supercomputer and everything in between, the whole thing works, and so I spend a lot of time talking about that, but I'm not going to do that today, I'm tired of that, that's boring.
01:00
Instead, today we get to talk about when things don't work quite as well. Which is kind of fun, and I have a few reasons for wanting to do that, one of which I forgot to make a slide for, is the simple fact that development failures cost us developers that we can't afford to lose. Even with thousands of developers participating in our process, we're not so rich that
01:21
we can afford to lose talented people, we never can do that. Beyond that, you see things in the news every now and then when something falls apart, and then you see unfavorable articles, that sort of thing, this shows up, you see these kinds of headlines that pop up on reputable news sources, and so on, and it makes us
01:44
look bad, and I'd rather not have that, but the real reason for wanting to look at failures is that you learn from failure, that's how you really learn about how things are going on. So, another quote from a very influential figure in our particular community, and yes, it's fine to celebrate success, but what really matters is to heed the lessons
02:08
of failure, I'm sure that he has done this. Perhaps that's why he's out of the operating system business now. Anyway, more to the point though is this quote here, this comes from a book called
02:21
The Sciences of the Artificial by Herbert Simon, who's a Nobel Prize winner in economics and all that, here he was looking into the study of how the brain works and all that, and he was looking at how things fail, and his way of putting it was that a bridge, when it works, is just a piece of road that you can drive over, you only
02:41
learn about how it's built when it's been overloaded and something goes wrong. So if we want to learn more about how kernels are built and how the development process works, then we should look at how things go wrong, so that's what I intend to do here. But I want to put out a quick note first, because there are a lot of people
03:01
in the kernel development community, some of whom are rather more serious than others, and I'm not going to be talking about the people who are just sort of there to be strange and so on, that's not what I'm after here. I'm going to have to name a bunch of names, because I don't know how to talk about specific failures without doing that, but everybody whose name I put up here is going
03:22
to be somebody that I respect, somebody who is a welcome part of our community. I'm not putting anybody up to mock them at all, because they're all people who I think are better than me at this stuff. So even this guy, who's perhaps one of the more clownish of the bunch, who points
03:41
out that we're not failing, we're just finding ways not to solve the problem. So with that in mind, let's sort of hit the road and look at ways not to solve problems. Example number one is a file system called Tux3. Came out back in 2008, and Daniel Phillips has been around our community for a long time,
04:03
very smart guy, done a lot of stuff, and he came out in July of 2008 and said, I'm going to make a new file system. There was a lot of interest in new file systems then, as there is now, trying to take us forward in the area of next-generation file systems. So I've got all these great ideas, puts out his announcement, puts out his code.
04:20
There were a lot of very interesting discussions that went on between him and other file system developers. By November of that year, he had it to the point where he could actually boot a Linux system with a root file system based on his Tux3 file system. He had other people contributing code. It really looked like the project was getting going. But if you look forward into the next year, things trickle off.
04:41
And by August of 2009, the last commit goes in, which was some sort of a whitespace fix, something like that. And the whole thing just sort of died. The project is dead. The code's not in the kernel. Nobody is using this file system, and all the work that went into it is for naught. So somebody saw this coming.
05:00
This is Andrew Morton. Think of him as the number two kernel developer, if you like, who gave Daniel a warning saying, Don't keep adding stuff to a project that's out of the mainline kernel tree, because that just makes it harder to get it in when the time comes. So that warning went out, was not heard.
05:21
So the project kept on developing outside of the mainline tree until Daniel lost interest. He got a job at a certain search engine company that I won't name, which is often the kiss of death for community contribution, unfortunately. And he went away, and the project died.
05:41
And he even acknowledged later on, yes, I should have listened to this. I should have gotten my code into the kernel when I was warned to do that. So the lesson from this is something that a lot of us have been saying for quite a while now. If you have code that is outside of the mainline project, it is essentially invisible. It doesn't have the attention.
06:00
It doesn't have the momentum. It just doesn't have the activity around it that code does when it gets into the mainline kernel tree. If you've ever watched a bike race, you notice how they all ride together in a peloton, a tight group all together. If you've ever ridden in a group like that, you understand why. The group carries forward the air with it, and you can ride at great speed with
06:21
almost no effort within a group like this. As soon as you go outside of it, you're pushing against the wind by yourself, and you have to work much harder to ride much more slowly than you do. The kernel process is really an awful lot like this. Code that's in the kernel gets carried along by the momentum of the kernel itself. It gets carried along by all the people who are focused on it.
06:40
If you're outside of it, then you're against the wind by yourself. So the lesson is really clear. Get your code into the mainline as quickly as you can. If you look at the development of the Btrfs file system, which was being developed at about the same time, Chris Mason put that file system into the kernel even though it was nowhere near ready. It's still not considered to be ready.
07:02
The pace of development at that point picked up, and Btrfs is still a very strong project and will be the next generation file system that we'll all be running in the near future because he did that. If he'd kept it outside, it would have been harder. I'll take that as an agreement.
07:29
All right, moving on. The em28xx driver is a Video4Linux driver, it's a webcam driver, that actually was put in the mainline kernel back in 2005 by a guy named Markus Rechberger.
07:44
Over the course of the next couple of years, a whole lot of things happened, and by the beginning of 2008, he was no longer contributing to that driver. Later on that year, he actually tried to replace it outright, and that effort was rejected. And then in 2009, we saw the last patch from him anywhere in the kernel, and we lost
08:06
him from our developer community at this time. What went on was a whole lot of disagreement between Markus and the higher-level Video4Linux maintainer, and it was really best summarized by a message he sent out in
08:21
the middle of this, saying, companies should be aware that if they submit code to you, they lose control over their work. It was an issue of control over whether Markus had the absolute control over what went into that driver, or whether others could contribute to it and carry it forward and enhance it in ways that they saw fit.
08:41
Here's another example. Back in May of 2004, Hans Reiser saw an attempt to modify the Reiser 3 file system. We're talking about Reiser 3. Chris Mason came along and added code to add support for access control lists and extended attributes to the Reiser 3 file system.
09:01
This is a feature that you need to support things like the SELinux security framework, that sort of stuff. And Hans said, no, you cannot add that to my file system. It's supposed to be stable. I want people working on Reiser 4 instead. We'll talk about Reiser 4 later. So Hans lost that battle, right?
09:21
The code went into Reiser 3. The enhancements were there. They're still there. They're being used. He was overridden with that. Linus described it this way. He's had to teach this lesson a whole lot of times. If you maintain code, if you contributed it, you don't own it. If somebody else comes along with something that needs to be done to it, then you do
09:42
not have the ability to control it. This is something that's true of the kernel, but it's really true of any free software project that merits the name. When you've contributed code, you've put it under a free license. You've put it out there for the community to work on. You have given up a certain degree of control. And if anything, if things are going well at all, others will come along and work
10:04
on it. They will improve it. They will make it better. Once you put it into the kernel, you've turned it loose. You have to let it fly. In my mind, this is not a downside of contributing code. This is one of the most beautiful things that there is, that you can put some code out
10:20
there and watch it get better, and you don't have to do it. I think that's great. But if you want to maintain control over that code, then you should really just hang on to it, because that's just not compatible with how free software works.
10:40
Back in 2002, this is at the beginning of the 2.5 development kernel series. The position of maintainer for the IDE disk subsystem was actually vacant at that time. We had no maintainer. It was really a pretty critical piece of code, because we actually still had IDE disks in those days. There's no maintainer, because that code was widely held to have driven insane everybody
11:03
who tried to take control of it over the years. And so people kind of went off after a while. So Martin Dalecki shows up. This guy shows up and posts a patch that says, here's a bunch of cleanups for the IDE subsystem. It goes in. Within a few weeks, he was up to the 18th set of cleanup patches.
11:23
These are fairly significant restructuring patches at this point. And he set himself up as the maintainer of the IDE subsystem. He continued to send in more and more patches, which were being merged by Linus, until by August of that year, he was up to number 115. This is a whole lot of patches all going in there.
11:42
During this time, he was invited to the kernel summit to represent that work there and so on. So things really seemed to be on a roll there. One week after number 115, he quit the kernel development process entirely. All that code was ripped out. The IDE code was put back to where it was at the beginning of the 2.5 development series.
12:02
All that work, his work and the work that everybody else put into helping make it work, was lost and went away. It was a major loss of a developer and his time and his work. So anybody who was actually running 2.5 development kernels during those days knows what happened here, right? The IDE subsystem was highly unreliable during this time.
12:24
In fact, it was considered that if you wanted to run these kernels, you were really best off using SCSI disks during that time. And he described it this way when some people questioned him on it, said that, well, breakage is the price you have to pay for advancements. There was perhaps more breakage than advancements.
12:44
But he was really trying to carry it forward, but he did it with a sort of scorched earth policy that really made the kernel unusable for people and took things backwards. And the lesson is clear, right? Don't do that.
13:02
This lesson has become much more clear in the time since then. Code that breaks a subsystem for months at a time would just not be tolerated now because we have adopted a policy that is very strongly against regressions. When you are evolving a piece of code as quickly as the kernel is changing, you really
13:21
have to be careful to ensure that you are not going backwards in terms of quality. And knowing that you're not going backwards is actually very hard. How do you measure the quality of a kernel? It's not just a number you can pick out. It's not just a metric you can have. But one thing that you can do is you can insist that a kernel that works for people
13:41
at one point continues to work going forward. If you don't allow things to go backwards, then you should be creating kernels that are getting better over time. If you don't do that at this point, if you break things, your code's likely to come out within a week at this point. It won't go on for six months like it did here.
14:01
But one way or the other, don't break things and life will be better. This was perhaps one of the highest profile failures that we saw with a lot of media attention and so on. The scheduler early in the 2.6 series, up into the early 20s, was called the O(1)
14:27
one scheduler done by Ingo Molnar and others. Over time it had developed a whole lot of little tweaks trying to improve interactivity to make interactive desktop systems more responsive. The code had gotten very complex, very twisted, very hard to work on, full of heuristics
14:46
and it still didn't really perform the way people wanted it to with regard to interactivity. So Con Kolivas, who's actually not a kernel developer by training at all, he's a doctor, he's an anesthesiologist, but he managed to train himself in kernel development and
15:00
get quite good at it. He came along and he said, okay, I don't like this idea at all, let's just throw it away, we'll get rid of all the heuristics and we'll put in a very simple scheduler that works on simple fairness. If there are four processes contending for the CPU, each one gets 25% of that CPU, period. So much simpler algorithm, you can put it in, simplify all the code, and as it turned
15:24
out that made interactivity better than all the heuristics and the complicated code that we had before. At least in some situations. So he puts it out there, the very next day Linus looks at it and he says, yeah, I could consider merging that, I like this, it simplifies things, it gets rid of a lot of code. But if you follow the discussion, you see that within a couple of weeks the tone
15:43
was getting rather grumpier. And what was going on was once again breaking things for people, right? Con's scheduler made things better for some people, but it made things worse for other people. And he was not as responsive as he needed to be to the complaints of the people whose
16:02
performance was going backwards. This got to a point that eventually Ingo Molnar went off and, as he is wont to do at times, took a day or two and completely wrote his own thing that did it his way, using the same basic algorithm. It was called the Completely Fair Scheduler, or CFS.
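The simple-fairness idea described above (four processes contending for the CPU, each getting 25%) can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the function names, the task names, and the one-unit "tick" are hypothetical and chosen for illustration, and nothing here is taken from Con's scheduler or from CFS. It only shows that "always run whoever has had the least CPU so far" produces equal shares without any interactivity heuristics.

```python
def simulate_fair_cpu(task_names, ticks):
    """Toy model of fair scheduling (hypothetical, not kernel code).

    At every tick, run whichever task has received the least CPU
    time so far. There are no interactivity heuristics at all.
    """
    runtime = {name: 0 for name in task_names}
    for _ in range(ticks):
        # Pick the task that is furthest behind; ties go to the first listed.
        next_task = min(runtime, key=runtime.get)
        runtime[next_task] += 1
    return runtime


def cpu_shares(runtime):
    """Convert accumulated runtime into each task's fraction of the CPU."""
    total = sum(runtime.values())
    if total == 0:
        return {name: 0.0 for name in runtime}
    return {name: used / total for name, used in runtime.items()}


# Four always-runnable tasks contending for one CPU (names are made up).
usage = simulate_fair_cpu(["editor", "compiler", "player", "backup"], ticks=400)
shares = cpu_shares(usage)
print(shares)
```

With four always-runnable tasks, every task ends up with the same share. A real scheduler also has to deal with tasks that sleep, wake up, and carry different priorities, which is where the surprises on other people's workloads tend to come from.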
16:22
So that was posted, and within a few months it was CFS that was merged into the mainline, not Con's deadline scheduler. And within a couple of weeks of that, Con left the development community and he left in a very public, sort of disgruntled, unhappy sort of way, saying, I'm out of here, I'm done,
16:41
I'm going to leave before I get so fed up I end up running Windows. And we lost a developer who was really trying to do good stuff, a very smart guy, somebody we couldn't afford to lose. It was not a good thing in any way. So what do you learn from something like this? You need to learn from these things. So number one, improve the kernel for everybody.
17:02
You cannot go in and improve the kernel for one group of people at the expense of another. The kernel at this point is running on your telephone, it's running on your desktop, it's running on huge supercomputers, it's running on all kinds of things. We have a very wide-ranging user base and you simply cannot make it worse for some of those people.
17:21
So if you can't make it better for everybody, you at least need to not make it worse for people. Related to this is the fact that certain parts of the kernel are simply hard to change. This is especially true of core kernel areas that are coded with a whole lot of heuristics
17:41
that have been developed over time and where we have a lot of experience that says if you mess with these things, you tend to find surprises on other workloads far into the future, when it's harder to fix. In fact, we're still finding things that relate to the scheduler change and still fixing them. It takes a long time to do this. So there's a fair amount of resistance to making changes in parts of the kernel
18:02
and you have to have a lot of patience if you want to work in those areas. That's just the way it is. It's a hard task to do. Participate in the discussion. Con had his own mailing list for the discussion of his patches. The people who subscribed to that list were naturally the people who were interested in his work and liked it.
18:21
So he was working in an environment where everybody was saying, yeah, this is great, you're doing good stuff, keep going, we want more of it, that sort of thing. He wasn't participating in the discussion on the Linux kernel list, where people were looking at things from a wider point of view. So he missed the wider discussion, he missed the view of the situation that he really
18:43
needed to have. You just can't do that. You cannot isolate yourself from the community. Even if, say, not subscribing to Linux kernel and getting 500 messages a day out of your inbox is an appealing sort of thing to do, you really have to be part of the community,
19:01
or else you're not going to work well with the community. But perhaps the most important thing, the key lesson to draw from this in my mind, is that you really need to look for a solution to your problem and not the incorporation of a specific body of code. Because if you look at what happened with the Completely Fair Scheduler, Con got what
19:21
he wanted, he won. He was able to, through his efforts, replace the scheduler with one that was based on fair scheduling. It just wasn't his code. And so that hurt, but if he took a step back and looked at it, he got what he wanted out of all of this, and was widely credited for having pushed things that way.
19:42
Dan Frye, who's the vice president at IBM who runs their Linux Technology Center, gives a talk. He talks about how IBM approaches this sort of thing. Within IBM, if you work for IBM, and you push the community towards the solution of a problem, you're credited for having done that, whether or not it is your specific code that is merged.
20:01
They don't care if it's your code that was developed at IBM that was merged if the problem is solved. It's a very enlightened view, and you can see it in action, the way that IBM's developers worked with the community. It's something that I would like to see much more widely adopted throughout the kernel development community, and beyond, really. If you look for the solution to the problem, you're a whole lot happier than if you're
20:23
looking for an entry in the changelog. All right, so the only other time I came to FOSDEM, I actually spoke right next to Hans
20:45
Reiser. They had put us in two sessions right next to each other. Hans is a really smart guy with a lot of very interesting ideas. There are certain aspects of his behavior that one just cannot approve of, but, you know, and honestly, I don't think it's all that funny, but if we think about Reiser
21:06
4, back in 2002, it was already fairly clear that the file systems we had at that time were not adequate to what we needed going forward into the future, that we were really carrying with us file systems that were designed back in the UNIX days,
21:22
prior to Linux, really. The ext series of file systems really carries forward a lot of ideas from the Fast File System and such, from our UNIX heritage, old stuff. Our needs have come forward, the hardware has changed, and so on, so we needed something different. Hans saw that, plus he had a whole lot of wild ideas of his own that he wanted to put
21:42
into a file system, so way back in 2002, he put out the first version of the Reiser 4 file system. He worked on this in 2003, and just as Linus was finally trying to pull together a 2.6.0 release and get it out there, he said, well, why don't you throw my file system in there? You've thrown in everything else under the sun, because 2.6.0 was in feature freeze for
22:04
the better part of two years, for a very Linus sort of value of feature freeze, shall we say. So, I mean, a file system would really just be in the noise for something like that. He didn't succeed, but he did in 2004 manage to get it into Andrew Morton's
22:21
-mm tree, which was seen as being the main path into the kernel at that particular time. That's changed a bit since then. It still didn't get in. In 2005 and 2006 he made major pushes to get this stuff merged and never succeeded. Finally he kind of left our community forevermore, and Reiser 4 has since languished, and I don't think we will ever see it merged into the mainline
22:44
kernel. So, why was there so much trouble? Why did we have a next generation file system that we couldn't get put into the mainline kernel? Well, a lot of things that you can point to. It behaves very strangely, the only file system I've ever seen where you actually change
23:01
your working directory into a plain text file and then cat out the metadata like the modification time as a separate little file. No one else has done that sort of thing, so there's certain things that don't conform to the established standards for Unix-like operating systems. There are a number of technical difficulties, things like locking problems, that sort
23:22
of stuff. A lot of those result from the fact that Reiser 4 was developed behind closed doors for a long time and given to the community as a sort of finished product. If he had brought it forward sooner, a lot of these problems would have been simpler to fix earlier on. Hans' approach to benchmarks was creative, shall we say.
23:43
People who ran benchmarks independently tended not to get the same results as Hans did. His approach to others in the community was antagonistic. If you questioned his work, you tended to get put into the group of people who were conspiring to suppress his work, associated with various companies that he didn't like
24:03
who were obviously trying just to put his work down and so on. It was a very difficult thing to the point where a lot of people refused to talk to him anymore because they tended to get attacked. And finally, the episode with Reiser 3 that I mentioned before was something that was still in people's minds, and they were really afraid that Hans was going to dump
24:24
Reiser 4 into the kernel, then go off and work on Reiser 5 and not want to continue with the stabilization and development of Reiser 4. For all of these reasons, there was a whole lot of resistance to getting the code into the main line, so it never happened. So the lessons from this are fairly clear.
24:43
Linux is not a research system. There's a whole lot of very innovative work that goes into the Linux kernel. But in the end, this is a production system that is used for all kinds of real-world use cases. It's not something that you can just put any kind of wacky thing into and expect to
25:01
get away with it. So if you were going to break from something like the POSIX standard, then you have to do so very carefully in ways that don't break existing applications and so on. You have to be very careful with that. No matter how brilliant you are, and no matter what kind of vision you have, and I don't
25:22
know if his documents are still on the net, but this guy had a vision for where operating systems should go that was quite well thought out, and it may not be where you wanted to go, but he really had a lot of interesting ideas. But none of that will get you past an implementation that has technical problems. No matter how brilliant it is, if it's going to deadlock the computer, then it's
25:44
not going to make it in. You can't get past that. Conspiracy theories are not going to help you. This kind of thing happens fairly often. We've seen some of it fairly recently on the kernel mailing list, where people will say, well, you're just criticizing my patch because your employer doesn't want it in.
26:06
I won't say such things never happen, because we're human, and human things happen. But kernel developers tend to think of themselves as kernel developers first, and employees of whatever company second. They think it's fairly likely that in five or ten years they'll still be working
26:22
on the kernel, but may be working for some different company. They're not really interested in compromising the kernel for any particular company's objectives, even the one that's paying their paycheck right now. So you don't very often see corporate conspiracies of this type going on. If somebody starts accusing people of it, that's usually a sign that the discussion
26:43
is done, and that they're not really going to get much further. Just don't do it. Then finally, the community has a long memory and a long time horizon. If you are posting code to go into the kernel, people will always be thinking, what will it be like to maintain this five or ten years from now?
27:03
Because they know they're likely to be there in five or ten years and stuck with it. So they're going to want to know, will you be there and maintain it? What will this code do to our maintenance going forward? It's always on people's minds, and very strongly affects how people look at things.
27:28
SystemTap. Back in 2003, Sun Microsystems came out with this kernel-and-user-space tracing facility called DTrace, and they gave it a lot of publicity saying, this is a great tool,
27:42
we have better visibility into how our system works than anybody else has, and so certainly you want to run Solaris instead of Linux. So this of course inspired a response within the community, and within a couple of years we had an update to Red Hat Enterprise Linux 4 that included SystemTap, which is
28:01
a tool that did very much the same sorts of things that DTrace does. It allows you to put probes into the kernel, so you can do all kinds of complicated data collection, aggregation, statistics, and so on, and try to figure out what's going on within your kernel. So this was posted way back in 2005, but we never saw it merged, and instead, in 2008,
28:22
we saw a different tracing facility, a much simpler thing called ftrace, put in. In 2009 we saw perf events, which is event collection, statistics, that sort of thing. A very different sort of development was merged there. Even though last year, actually not last year anymore, but in 2009, we saw the 1.0 version
28:40
of SystemTap and 1.4 just a few weeks ago, I don't think we'll ever see SystemTap in the mainline kernel. Which is fairly surprising given that this is a development that had something like a dozen full-time engineers on it for years, funded by a number of companies who were very core to Linux development, creating a facility that everybody really sort of acknowledged
29:03
that we need. So one wonders what's going on, why did this happen? The key here, if you think back to the 2008 kernel summit, this particular group of people here was asked, how many of you have tried to use SystemTap?
29:23
About half of the people in the room raised their hands. And then how many of you have actually succeeded in doing it? And most of those hands went down. This group of people here is not just sort of any group of random users, right? This is the top level of the kernel development community, the people who can be invited to
29:42
the kernel summit. If they cannot make SystemTap work, then this is a fairly bad sign with regard to the usability of your system. And so Ingo Molnar kind of described it like this later on, that what you really have to do is to not concentrate on requirements drawn up by management or so on, which is
30:04
really what SystemTap was, and instead focus on usability, and in particular, usability for developers. That's a key aspect of getting stuff into the kernel, is usability for developers. Because if the kernel developers don't see the value of the code, it's not going
30:23
to go in, regardless of what people at the management level say. This is usually a good thing that the developers make these decisions, that it's not a management decision. That's part of why the kernel is as good as it is. Sometimes it can be problematic, because kernel developers, like anybody else, can
30:42
be kind of myopic at times, and will sometimes fail to see things that really are needed, even if it's not useful to them in particular. So as one example of this, we actually had a dynamic tracing facility that was posted for inclusion back in about 1999.
31:01
But nobody saw any value in that, and so that code languished and so on, and we had to do it all over again ten years later. So it just happens. Here's another example that kind of ties into the same thing. Back in 2008, a developer at Red Hat came out and posted a thing that he called TALPA.
31:22
This was a subsystem that provided a new set of system calls allowing virus-scanning and malware-scanning utilities to hook into system calls. The idea being that if some process on the system opens a file, then the virus scanner actually gets an event saying, somebody's trying to open this file. It can go and scan the file first.
31:42
If it doesn't find anything it dislikes, it says back to the kernel, okay, let that open proceed, and life goes on. Otherwise it can actually block the open of the file and not allow it. So the idea being, block viruses as they pass through the system. So this didn't go in at all, in fact.
32:01
Shall we say the reception was chilly? Because after all, first of all, Linux doesn't need virus scanners. That's not really a security model that has much value on a Linux system. We don't have that particular kind of problem. So why should we be bothering with broken security models? Now, of course, the real use of this was not to protect Linux systems, it was to protect
32:25
Windows systems that are mounting a mail spool or something like that that's on a Samba-exported file system, that sort of thing. So it goes beyond that, but that, again, was not necessarily a use case that is interesting to Linux kernel developers, who are not really usually concerned with maintaining
32:42
a lot of Windows systems out on the network. Beyond that, the requirements were not expressed very well, right? There was no threat model; they couldn't really say what they were trying to defend against. In fact, that sort of came into focus over the discussions, and they focused on the solution rather than the need. Their requirements said basically, we need TALPA, not we need to try to defend against
33:02
this particular sort of thing, that sort of thing. So this code went down in flames. But if you look later on, in August, we saw the merging of a thing called fanotify. fanotify is a set of system calls that provide hooks for antivirus scanners, which sounds fairly familiar, if you look at it.
33:22
So you might think, okay, well, what changed here? There are two things that were very different. This was, in fact, the same code. So one of the things that changed was the name, to sort of leave behind the memories of what had come before. But one of the things they had is, this is essentially a file system event notification
33:42
mechanism at its core. We already had two of them in the kernel before fanotify, one called dnotify and one called inotify. So we were adding a third one. What the developer did is he went and he cleaned up the existing event notification code, which was pretty ugly, and made it work for both of the existing mechanisms
34:00
and for his as well. So instead of having three, we went back down to one core notification mechanism in the kernel. And the other thing is that he rephrased the requirement. So instead of saying, we want to enable virus scanners, he said, we want them to be able to hook into the system without using the rootkit-type techniques that they're using now. Because if you actually look at some of this commercial proprietary virus scanning
34:21
code that people sell for Linux systems now, it actually will go and patch into the system call table and do things that you normally associate with rootkits so that it can intercept system calls and do what it wants. So that's really ugly stuff. That's not something we want to have happening. So this allows that code, which already exists, which is out there, to function without having
34:42
to do that kind of nasty stuff. And that's an improvement for everybody involved. So by rephrasing the requirements and by cleaning things up, he was able to get that code into the mainline kernel. So the lessons from this? Sell your patches to the developers, not to the managers, not to the customers.
35:02
You have to sell them to the developers. And if you clean things up on the way, then you build goodwill. Cleaning things up, by the way, does not mean white space patches, for anybody who's tempted to do that. It means truly cleaning up the code. So those are a few examples; I could do a whole lot more.
35:22
If any of these interest you, you can ask me during the question time, and I can go into that. But suffice to say that there's no shortage of examples out there. So one can look at the history and say, well, we have an awful lot of examples of how things can go wrong. When things go bad, you might say, well, why bother?
35:42
Why should we be concerned with, why do we want to deal with this when things can go wrong so easily? So I just wanted to talk briefly about that, starting with the fact that, for all that we have all these high-profile failures, things don't go wrong that easily. It's not as hard as it seems.
36:01
Remember that we're dealing with a development process that in every release cycle, every 80 days or so, is incorporating the work of over a thousand developers. So every few months, there's over a thousand people who succeed in getting their code into the kernel. So clearly, it can't be that hard. The barriers cannot be that high if that many people are able to get this done.
36:26
More importantly, it's fun, right? As with working in any reasonable free software project, it's a good time. You want to be a part of it. But beyond that, even though it's not that hard, even though it's fun, it's still
36:44
not a club that everybody can join. It's something that you have to want to do. It's something you have to work towards. It is not sufficient to simply look good in your swimsuit. So it's something that's fun to be a part of. And if it's something that you're concerned about, it's certainly a path towards employment
37:05
and such. The fact of the matter is, and this has been true for some time, that if you've established an ability to get code into the kernel, then people will come to you, and they will throw jobs at you. That's kind of a nice thing, if that's something that you're after. Gets a little tiring after a while.
37:27
Perhaps most importantly of all, and the message I kind of want to leave you with, is that this is how you get the kernel to meet your needs. This is how you drive it forward. The kernel is really open to everybody who is willing to push it forward in good directions,
37:43
and this is how you get it to where you want it to be. This is your vote. This is how our community works. This is true of the kernel. This is true of every other project out there. You don't just sort of go and put in improvement requests or so on. The way that you get things working the way you want them to be is to actually get your
38:02
hands dirty and to get the code into the kernel. So whether you're trying to simply make a device work, or whether you're trying to enable some of the sort of freedom-type technologies that Eben Moglen was talking about yesterday, this is how you do it. This is how you get things to where you want them to be. So I hope that all of you will be inspired to do that, to be a part of this, to try
38:25
to push things forward. Because, after all, in the immortal words of a former vice president of the United States, if we don't succeed, well, then we run the risk of failure. So I have a fair amount of time for questions, and would be delighted to answer a few.
38:42
I assume we have somebody with a microphone, so somebody might have a question.
39:06
If there are any questions, please raise your hand. Over there, front row. Hello. I note that a lot of your examples were from the 2.5 era of the kernel.
39:21
So do you think that having abandoned long periods of instability has guarded against this kind of thing happening more recently? I'm sorry, I don't hear that very well, it's very echo-y. A lot of your examples happened during the 2.5 unstable phase of the kernel. So do you think that abandoning the unstable phase has meant that these kinds of incidents
39:46
will happen a lot less frequently? OK, so a lot of the examples happened during 2.5, but in fact, really only the IDE example is exclusively within the 2.5 development series. Everything else was taken from 2.6 and forward.
40:02
So the abandonment of the old long-term unstable series has certainly changed the process and made certain things different; for example, it has much reduced the tolerance for regressions and so on, because we just don't have the time to fix them that we used to. But otherwise, I don't think it's changed much, other than just bringing
40:26
things to the fore more quickly. So what, hi, over here, at the back. Somebody's out there somewhere. Don't worry, we're way back here. So which category do the Android wake locks fall into?
40:43
Is that the badly described requirements category? Wake locks, OK. Wake locks, or suspend blockers. I could do an entire talk on that. In fact, if you look, Matthew Garrett did do an entire talk at LinuxCon, and the video of that is online for people who really want the full details of that.
41:04
The real failure with wake locks is out-of-tree development, where they actually went and they developed this feature for the Android system off in their own corner, without involving the community at all, and most importantly, they shipped this feature
41:20
to users before they ever proposed it for merging into the kernel. There were a whole lot of problems with wake locks, the way they were originally developed. They were insecure, they required a lot of changes to drivers and so on. Nobody really liked the way that wake locks worked, so they had to change, which creates
41:42
all kinds of compatibility problems with your existing user space when you have to make those changes. There's been a lot of trouble trying to come up with a suitable replacement. We actually, I believe, have a good replacement for wake locks in the mainline kernel now, although the Android people have not yet really looked at it or committed to using it.
42:00
So hopefully we'll have a happy ending to that story. If you want more details on that, either look at Matthew's talk or look at the stuff that's been written on LWN about that. Of all the things that you listed, both that you talked about and that you just put in your list, which of them, in your personal opinion, is the most significant loss to
42:23
the kernel and to the Linux community? Which is the most significant loss? Hard to say. I mean, they all represent significant losses in a way, but in my mind I still really regret the loss of Con Kolivas, because I think he was trying to work for a constituency
42:42
that doesn't always get the attention that it needs to have and so on. He was trying to do interesting things, and I wish he were still a part of our community. So then that's perhaps at the top of my list, but I think they're all significant. John, on a lot of your examples, it seems like a recurring theme is once something either
43:04
doesn't get merged or it's obvious that it's not going to get merged, it sort of leads to the sooner rather than later kind of death of it. It seemed like the one counterexample in there was SystemTap, which didn't get merged yet still got to a 1.0. Is there something about that that makes it an anomaly that it lived on even when
43:23
it became obvious it wouldn't get merged, or is it sort of just a matter of time? Well, part of that is relatively easy to answer, because SystemTap, for all its failings, is in fact very useful to the support staffs behind enterprise Linux distributions.
43:46
It comes, it works out of the box when it is packaged with an enterprise Linux distribution, and the technical support behind it can make use of it, and they like it. For that reason, the companies and the specific company, one you know well I suspect, continues
44:02
to put resources behind the development of SystemTap and will probably support it for some time yet. There's a certain commercial interest there in that sort of very rigidly defined environment. Beyond that, a lot of the development resources that went into SystemTap have just been removed
44:22
and put into other things, but I think SystemTap will continue under its own momentum for some time, because it does serve a need that some people have. We are getting to the point where the other tracing facilities can fill that in, but we're probably a few years from really replacing SystemTap for the needs of that particular use case.
44:45
There's a general pattern I've seen in open source projects where older established projects that are very popular become very, very conservative and refuse changes that would destabilize the features that their existing users appreciate.
45:01
So how do you prevent requirements like not being able to change the scheduler unless you can prove to everyone on earth that nobody has a performance degradation anywhere? How do you prevent requirements like that from leading to stagnation, where eventually projects get very conservative and stagnate until another project comes along and they can
45:23
try experimental things because they don't have that baggage? How you go forward in a situation like that can indeed be a problem. With something like a scheduler, the only thing you can do is to test it extensively
45:40
under all kinds of workloads and see if people don't scream for long enough. For certain other sorts of things, like the user space ABI, we really just don't allow ourselves to break it ever. So if a change would break applications, with few exceptions it just can't go in, at least not without a migration path that can take at least five years until you get to the
46:06
point where you're really convinced that nobody is using it anymore. We take the don't-break-things rule quite seriously, to the point that you can still run a.out binaries from the pre-1.x days, and if you've got the libraries around for them they'll still work. We're very careful about that.
46:22
That is a bit of a straitjacket at times; it does constrain how we can do things. But we want Linux to be useful going forward, and more importantly we want people to upgrade to current kernels for all kinds of reasons. So we will continue to be very careful about that. And yes, that slows certain things down, but it hasn't, I don't think, stopped the process yet.
46:44
I think we'll continue to keep things going. Hi John, during MontaVista Vision 2008 we spoke, and then we talked about your device drivers book. And we are now almost three years later; I think you decided not to do a new book?
47:06
There will be a new device driver book. I am working with the other authors, trying to figure out a model for publishing a book for something that changes as quickly as the kernel does, in ways that don't go obsolete.
47:21
I'm hoping, or ways that are maintainable, let's put it that way. So I'm hoping to have something to say at least within the next few months, but I'm not quite ready to say how that's going to work yet. But yes, something will happen, because a book that describes 2.6.10 is of limited utility in the current world, to say the least.
47:49
Are there any more questions? I think we can talk for hours with Jonathan, but sadly FOSDEM is almost over, so before you all leave I want to give the floor to Matthias, but first let's thank Jonathan once again.