MINIX 3
Formal Metadata
Title | MINIX 3
Title of Series | FrOSCon 2015
Part Number | 23
Number of Parts | 79
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/19549 (DOI)
Language | English
FrOSCon 2015, 23 / 79
Transcript: English(auto-generated)
00:09
Okay, I'm about to introduce a guy who doesn't need an introduction. Many of you remember him from a newsgroup post from many, many years ago.
00:22
Most of you will own at least one book by him. So I present to you Andrew Tanenbaum. Thank you, but wait, maybe you won't like the talk.
00:44
Okay, I'm going to talk about MINIX 3. Currently, it's a reimplementation of NetBSD using a microkernel. And I'm Andrew Tanenbaum, but my students did all the work; actually, my programmers did. Okay, the goal of the project is to build a reliable operating system.
01:05
So let's start with my definition of a reliable operating system, okay? An operating system is said to be reliable when a typical user has never experienced even a single failure in his or her lifetime, and does not know anybody who's ever experienced a failure.
01:32
In engineering terms, this probably means something like a mean time to failure of 50 years. I don't think we're there yet.
01:42
Let me describe what I think of as the television model, okay? At least old style TVs, it's getting less so. You buy the television, you plug it in, and it works perfectly for the next ten years, okay? Now let's look at the computer model, Windows edition, okay?
02:04
You buy the computer, you plug it in; now we're two-thirds of the way there, okay? It's just this little part about it working perfectly for the next ten years that's a little bit different. First you install Service Packs 1 through 9F. Then you install 18 new emergency security patches that came out after 9F.
02:24
And then you find and install seven new device drivers. Then you install the anti-virus software. Then you install the anti-spyware software. Then you install the anti-hacker software. Then you install the anti-spam software.
02:44
Then you reboot the computer, okay? But I'm not done yet, I just ran out of space on the slide, so there's more. It doesn't work.
03:01
You call the help desk, okay? You wait on hold for 30 minutes, okay? They tell you to reinstall Windows, which is what you were trying to do in the first place, okay? The typical user reaction to this is something like this. I saw a story in the New York Times which said that 25% of computer users
03:24
have actually gotten so angry, they hit the computer. Most of them don't know where the computer is though, this is the monitor, which is not where the problem is. Anyway, so you might say, is reliability so important? Who cares if it works or not? It's annoying when it doesn't work. And you might lose some work if it goes down.
03:40
But also think about other situations, like industrial process control systems in factories. Think about power grids when their computer doesn't work. Think about hospital operating rooms when the computer doesn't work. Think about banking and e-commerce servers. What happens if that doesn't work?
04:00
Or emergency phone centers. Or control software in cars and airplanes and places like that. So there are places where it actually matters whether it works or not, okay? So the question is, is it feasible? Is it possible to make software and hardware that works? Well, first of all, we won't find out if we don't really try. And so the Dutch Royal Academy gave me 2 million euros to try, so
04:24
I said thank you very much, we'll try. And then from the European Union I have an ERC Advanced Grant, if you know what that is, for 2.5 million euros to give it a shot, so we're trying. Okay, thank you very much, Royal Academy, and thank you very much, European Research Council.
04:43
The first question is, is reliability achievable at all? Is this even possible? Well, systems can actually survive hardware failures. For example, RAIDs can survive a failed disk. The disk dies, and if your RAID is working properly, the system merrily continues even with a dead disk.
05:01
Because there's some redundancy in there, and there are algorithms, RAID 1, RAID 2, and so on. You can survive a failed disk if it's configured properly. ECC memory can survive parity errors in memory, because it has redundancy, basically using a Hamming code, and it can survive memory errors. TCP/IP can survive lost packets, because there's an acknowledgement algorithm. You send the packet, and if it isn't acknowledged, you send it again.
05:22
Keep doing it until you get an acknowledgement. So you can do that. CD-ROMs, CD blanks, DVDs, and Blu-rays: about three-quarters of the bits are actually error-correcting bits. It's a very complicated scheme where they use 14 bits to code 8 bits, and then there are these 2K sectors, and there are multiple levels of redundancy.
05:42
So they can recover from a very large number of errors, because the disc is stamped mechanically. It's not that bad; maybe it is, I don't know. You can recover from hardware errors in many areas. So you would think that recovering from software errors ought to be doable if you can recover from fatal hardware errors.
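To make the RAID example concrete, here is a toy model of single-parity striping, the idea behind RAID levels 4 and 5: the parity block is the XOR of the data blocks, so any one lost block can be rebuilt by XOR-ing the survivors. This is a minimal sketch with made-up block contents and function names, not how any particular RAID controller works.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Return the data blocks followed by a single parity block."""
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Recover a stripe in which at most one block (data or parity) is lost."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity survives only one failed disk")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


stripe = make_stripe([b"MINI", b"X 3 ", b"rock"])
stripe[1] = None            # simulate a dead disk
print(rebuild(stripe)[1])   # b'X 3 '
```

Losing two blocks of the same stripe raises an error, which is the software analogue of a single-parity array losing a second disk.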
06:01
This requires organizing your software a little bit differently, okay? So I think operating systems research, which is sort of the community I'm in, needs to be refocused a little bit. We have basically nearly infinite hardware on PC-class machines. An ordinary PC you buy now, compared to where we were 10 or 20 years ago, is basically infinite, okay?
06:22
There are lots of CPU cycles, lots of RAM, lots of bandwidth. Current software has tons of useless features that nobody wants, having to do with the economics of the software business. Version 14 has to have more features than version 13, even if nobody has used any feature beyond version 7. But they've got to add new features, because that's the way they sell it, okay?
06:41
So the software is slow and bloated and buggy. And to achieve what I would call the TV model, I think future operating systems need to be changed somehow. They have to be smaller, they have to be simpler, they have to be modular, which is very important. They have to be more reliable and they have to be secure. And I think they have to be self-healing.
07:01
I think self-healing is a key word. They have to look for their own errors and try to fix them on the fly. And that's what our research has been focused on. Let me give you a very brief history of the work we've been doing since, I don't know, 1987, I think. In 1976, a professor named John Lions at the University of New South Wales in Australia wrote a book
07:21
on Unix version 6. When version 6 came out, he wrote this commentary on it, sort of like commentaries on the Bible or something. And then AT&T, in its brilliance, when it came out with version 7, had a clause in there saying, thou shalt not write a book about version 7. God forbid that students all over the world learn about their product.
07:41
We cannot tolerate this. So they did that. In 85, I said maybe I could rewrite Unix on my own all by myself. I was young and crazy, I didn't know that it was hard. So I did it, took me two years, long stories about that. But so I wrote it, and then I wrote a book about the software.
08:01
And we released it, and it was free of AT&T code. So all the source code was out there, and people could do whatever they wanted with it. There was some minor licensing initially, but basically it was available, certainly to universities and for non-commercial use. In '97 we came out with another version, version 2, and a new book. And this was POSIX-compatible rather than Version 7-compatible.
08:22
And then in 2000, we changed the license. I had arguments with the publisher, Prentice Hall, but they didn't understand software. Finally in 2000 they gave up: I said I want to use the BSD license, and they said, we don't know what it is, but do it. In 2004, I started working on this reliability stuff.
08:40
2006 was the third edition of the book, with Albert Woodhull. In 2008, I got the European grant. And then we really started to hire programmers, and it got more serious at that point. Then the focus started moving toward embedded systems, and then we moved toward NetBSD compatibility. Some of you may know that Linus Torvalds was one of the first MINIX users and
09:02
began changing it and changing it and changing it. Pretty soon, he had his own system. So to some extent, Linux is kind of a fork of Minix. Okay, there were three editions of the book. I think the cover got better as time went on. Okay, let me talk about intelligent design, at least as applied to operating systems.
09:22
Minix is a microkernel. It's got about 15,000 lines of code. Most of them are C, a little bit of assembly code. And Linux has got 15 million lines. Windows is probably above 100 million lines now. It's really pretty bad. People have done studies of bugs, companies, bug repositories, and whatnot. And one bug per 1,000 lines of code is about the best you can do with
09:43
really state-of-the-art techniques and code reviews and all those things. If you can get down to one bug per 1,000 lines of code, you've done pretty well. If you've got 50 million or 100 million lines of code, you do the arithmetic. Now, not all the bugs are serious. They may be spelling errors in messages and stuff. But some of them are always serious, right? I mean, these things are getting patched all the time because
10:02
MINIX has got maybe 15 kernel bugs. Linux has got 15,000. Windows has got a million. There are a lot of bugs out there. And drivers, as people have studied, typically have three to seven times more bugs than everything else, because everybody wants to study the memory management code, and nobody wants to study the Epson 2156 printer driver,
10:23
which is enormously complicated, etc. And 70% of the code is the drivers, and nobody ever looks at that. With open source, in theory, people can look at it. In reality, nobody ever does. And in Windows, nobody has the code except Microsoft, and they're too busy. So I think what you need is highly modular systems.
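The "you do the arithmetic" step can be spelled out. Assuming the rough figure of one bug per 1,000 lines quoted in the talk (a rule of thumb, not a measurement of any of these code bases), the expected bug count simply scales with code size:

```python
BUGS_PER_KLOC = 1.0  # assumption: the "one bug per 1,000 lines" figure quoted in the talk


def estimated_bugs(lines_of_code: int, bugs_per_kloc: float = BUGS_PER_KLOC) -> int:
    """Rough estimate: thousands of lines times bugs per thousand lines."""
    return round(lines_of_code / 1000 * bugs_per_kloc)


for label, loc in [("MINIX 3 microkernel", 15_000),
                   ("Linux", 15_000_000),
                   ("50 million lines", 50_000_000),
                   ("100 million lines", 100_000_000)]:
    print(f"{label:>20}: ~{estimated_bugs(loc):,} bugs")
```

At this rate, 15,000 lines give about 15 bugs and 15 million lines about 15,000, matching the figures in the talk. Since drivers reportedly carry three to seven times the usual bug density, passing a `bugs_per_kloc` between 3 and 7 for driver code is one crude way to reflect that.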
10:41
You need the operating system to run basically as multiple user mode processes, not in the kernel, these things separated from each other. And so step one is to isolate the components very well, okay? So move all the loadable modules and everything except the very, very hardcore kernel out of the kernel into user space, okay?
11:02
And that means all the drivers out, all the file systems out, the memory management code out. Run each one as a separate process using POLA, the principle of least authority. That is, don't give a component any more power to do damage than it actually needs to do its work. We'll come back to that a little later; the principle of least authority is a very important principle.
11:22
Step two is to isolate the I/O: isolate all the I/O devices and limit access to the I/O ports. In a conventional, everything-in-the-kernel system, the audio driver has access to the disk. It's not supposed to, but technically, if it wants to, it can write on the disk. That has to be prohibited by putting each driver in a separate
11:41
user mode process and restricting access to the I/O ports. You've got to constrain the DMA so a driver can't DMA over somebody else's memory. Then you've got to isolate the communication. You've got to limit inter-process communication. You have to restrict the kernel calls on a per-component, need-to-use basis.
12:03
You've got to restrict the inter-process communication. Not everybody can talk to everybody; you can only talk to those other components that you need to talk to to get the work done. Make sure that a faulty receiver can't hang the sender: if A sends a message to B and B never deals with it, you don't want A to hang.
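The isolation rules in steps one to three boil down to a per-component policy that the kernel consults on every request. The sketch below models that with hypothetical component names (`fs`, `disk`, `audio`) and made-up kernel-call labels (`copy`, `devio`); these are illustrative, not MINIX 3's actual call set. It also uses a bounded, non-blocking send so that a receiver that never drains its messages cannot hang the sender.

```python
from collections import deque
from typing import Deque, Dict, Set, Tuple

OK, EPERM, EAGAIN = "OK", "EPERM", "EAGAIN"
INBOX_CAPACITY = 4  # assumption: a small bounded queue per receiver

# Hypothetical per-component privileges: allowed kernel calls and IPC peers.
POLICY: Dict[str, Dict[str, Set[str]]] = {
    "fs":    {"kernel_calls": {"copy"},          "talks_to": {"disk"}},
    "disk":  {"kernel_calls": {"devio", "copy"}, "talks_to": {"fs"}},
    "audio": {"kernel_calls": {"devio"},         "talks_to": {"fs"}},
}
INBOXES: Dict[str, Deque[Tuple[str, str]]] = {name: deque() for name in POLICY}


def kernel_call(caller: str, call: str) -> str:
    """Honor a kernel call only if the caller's policy lists it."""
    return OK if call in POLICY[caller]["kernel_calls"] else EPERM


def send(sender: str, receiver: str, message: str) -> str:
    """Non-blocking send: refuse unauthorized peers and never wait on a full inbox."""
    if receiver not in POLICY[sender]["talks_to"]:
        return EPERM
    inbox = INBOXES[receiver]
    if len(inbox) >= INBOX_CAPACITY:
        return EAGAIN  # a stuck receiver costs the sender an error code, not a hang
    inbox.append((sender, message))
    return OK


print(kernel_call("audio", "copy"))             # EPERM
print(send("audio", "disk", "write block 7"))   # EPERM
```

Returning an error on a full queue is just the simplest way to get the "receiver can't hang the sender" property in a sketch; it is not necessarily the mechanism MINIX 3 uses.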
12:23
So here's the architecture of MINIX 3. Running in kernel mode on the bare metal is the microkernel. It handles interrupts, it handles process scheduling, it handles inter-process communication. It does not handle any of the device drivers, it does not handle memory management, it does not handle
12:42
anything else. It's very, very bare-bones stuff. That's still 15,000 lines of code, okay? But there's stuff you've got to do to make all the I/O devices work. When an interrupt happens, certain things have to happen next so it can do the next step. The registers have to be saved, and it turns out there's a lot of that stuff, okay?
13:01
And there's a little bit of scheduling you have to do at the bottom, and you've got to manage the MMU; there's some stuff you've got to do in the kernel. But it's only about 15,000 lines of code if you're careful. And L4, which is a comparable system, is also about this size, somewhere around 15,000 lines. The next level, which is all user mode processes, is the device drivers, okay? So the disk driver, the terminal driver,
13:21
the network driver, the printer driver, all the drivers, each one runs as a separate user mode process. With the MMU turned on, it's limited what it can do in terms of accessing physical resources and so on. Then at the next level are the servers, which are sort of the real operating system, the file server,
13:41
possibly multiple ones, process server, memory servers, things that actually we normally think of as the operating system, each running as a separate process. And the top layer are just the regular POSIX programs, okay? This is sort of the architecture of the system. Now, user mode device drivers, each driver runs
14:01
as a user mode process, protected by the MMU. It doesn't have any superuser power; it's just a regular old user process, okay? It's protected: the MMU is turned on, so it can't get out of its address space. It does not even have access to the I/O ports, so the disk driver can't even write on the disk directly. It's got to ask the kernel; it makes a kernel call saying, here's a bunch of registers,
14:22
here's a bunch of values, go write these values in those registers. The kernel first checks, is this allowed? If it's allowed, it does it. If it's not allowed, the driver gets back an error saying no permission. So the disk driver can write on the disk, but if the audio driver tries to write on the disk, it gets back an EPERM kind of message and it can't do it.
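The check the kernel performs here can be sketched as a lookup of the calling driver's permitted I/O port ranges before it writes anything on the driver's behalf. The port ranges are the classic PC ATA and Sound Blaster addresses, used purely as illustrative assumptions, and `devio_write` is a hypothetical name; the real MINIX 3 kernel calls and privilege tables are not reproduced here.

```python
from typing import Dict, List, Tuple

OK, EPERM = "OK", "EPERM"

# Hypothetical grants: inclusive (first, last) I/O port ranges per driver.
PORT_GRANTS: Dict[str, List[Tuple[int, int]]] = {
    "disk":  [(0x1F0, 0x1F7)],   # classic primary ATA register block
    "audio": [(0x220, 0x22F)],   # classic Sound Blaster range
}
io_space: Dict[int, int] = {}    # stand-in for the real port space


def port_allowed(driver: str, port: int) -> bool:
    return any(lo <= port <= hi for lo, hi in PORT_GRANTS.get(driver, []))


def devio_write(driver: str, writes: List[Tuple[int, int]]) -> str:
    """Kernel side: validate every (port, value) pair first, then perform them all."""
    if not all(port_allowed(driver, port) for port, _ in writes):
        return EPERM  # nothing is written if any port lies outside the grant
    for port, value in writes:
        io_space[port] = value
    return OK


print(devio_write("disk", [(0x1F2, 1), (0x1F3, 0)]))  # OK
print(devio_write("audio", [(0x1F7, 0x30)]))          # EPERM
```

Validating the whole batch before touching any port means a rejected request leaves the device untouched, rather than half-programmed.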
14:41
The servers are in user space. Each server runs as a separate process. Some of the key servers are the virtual file server, so you can have multiple file systems. There's the actual file systems. There's the process manager, does most of the work managing processes. The memory manager figures out who goes where in memory.
15:00
There's the network server. And there's a thing called the reincarnation server, which I'll come to later, which is an interesting little beast: it brings back the dead. Here's a simplified example of some of the stuff. Here's what happens if you try to read something with the read system call, the POSIX call, and you're lucky because the block you want
15:22
happens to be in the file system's cache. So the user makes a call to the file system. That little colored thing under FS, that's the file system's cache. The file system checks, is the block I need in the cache? If it's lucky, the answer is yes. And it calls the kernel and says, go copy that block to the user,
15:41
copies the block to the user, and everybody's happy. So that's the easy case. The harder case is when the block is not in the cache. So now the user calls the file system, and the file system calls the disk driver saying, go read that block and put it over here. The disk driver calls the kernel saying, I want to do disk I/O here, here are the parameters;
16:00
the kernel checks it, it's valid, and it turns the disk on. A little bit later, the drive sends an interrupt, basically, but at a very low level it's turned into a message. So interrupts get turned into messages at a very low level. A message comes into the disk driver saying, hardware interrupt. Then the disk driver goes and reads the registers, finds out if it worked, and eventually reports back to the file system
16:21
saying read completed correctly or there was an error. And then the user's informed and the copy is done into user space, okay? So all these are separate processes, and there's a little bit of overhead here. It's like a microsecond, I haven't timed it recently, but it's less than a microsecond of overhead.
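The two read paths can be modeled as message passing between objects standing in for the kernel, the file system, and the disk driver. This is a simplified, single-threaded sketch with assumed class and message names; in MINIX 3 these are separate processes, and the completion interrupt arrives asynchronously as a message rather than as a direct call.

```python
from typing import Dict, List, Optional


class Kernel:
    """Stand-in for the microkernel: copies data and turns interrupts into messages."""

    def copy_to_user(self, user_buffer: List[bytes], data: bytes) -> None:
        user_buffer.append(data)

    def interrupt(self, driver: "DiskDriver") -> None:
        driver.receive("HARD_INT")  # the interrupt is delivered as an ordinary message


class DiskDriver:
    def __init__(self, kernel: Kernel, disk: Dict[int, bytes]) -> None:
        self.kernel, self.disk = kernel, disk
        self.pending: Optional[int] = None
        self.result: Optional[bytes] = None

    def read_block(self, block_no: int) -> Optional[bytes]:
        self.pending = block_no       # real driver: program the controller via a kernel call
        self.kernel.interrupt(self)   # simulation: the disk finishes at once
        return self.result            # None stands for an I/O error

    def receive(self, message: str) -> None:
        if message == "HARD_INT":
            self.result = self.disk.get(self.pending)  # check status, fetch the data


class FileSystem:
    def __init__(self, kernel: Kernel, driver: DiskDriver) -> None:
        self.kernel, self.driver = kernel, driver
        self.cache: Dict[int, bytes] = {}
        self.disk_reads = 0

    def read(self, block_no: int, user_buffer: List[bytes]) -> str:
        if block_no not in self.cache:          # miss: go through the driver
            self.disk_reads += 1
            data = self.driver.read_block(block_no)
            if data is None:
                return "EIO"
            self.cache[block_no] = data
        self.kernel.copy_to_user(user_buffer, self.cache[block_no])  # hit path is just this
        return "OK"


kernel = Kernel()
fs = FileSystem(kernel, DiskDriver(kernel, {7: b"hello"}))
buf: List[bytes] = []
print(fs.read(7, buf), fs.read(7, buf), fs.disk_reads)  # OK OK 1
print(fs.read(8, buf))                                   # EIO
```

Only the miss path involves the driver and the interrupt-turned-message; the point made in the talk is that these extra hops cost on the order of a microsecond, which is small next to a millisecond disk access or a 100-microsecond SSD access.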
16:40
So there is some overhead in the process. But if you're reading from the disk, that's milliseconds. Even an SSD is 100 microseconds. So a couple more microseconds here or there isn't gonna be the real killer. Now what does the reincarnation server do? It's the parent of all the drivers and servers. So it's like up there when the system boots,
17:01
I think it's started from the /etc/rc file. So before you start forking off all these servers and drivers, it runs. It's the parent of all the servers and drivers. And if something dies, it collects it; it's the parent, okay? And what does it do if some piece dies? It looks up in a table what it's supposed to do.
17:22
And the table typically will say log it somewhere, maybe send an email to the administrator, and then try to restart it, okay? And if it's a driver, it'll go to the disk and go get the driver and start a fresh copy. You might ask, what if it's a disk driver? Well, we're clever enough to keep the disk driver in RAM all the time. So if the disk driver dies, take the RAM copy and put it in.
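The reincarnation server's reaction to a dead child can be sketched as a table lookup followed by the listed actions. The policy entries and action names below follow the description in the talk (log, mail the administrator, restart, with the disk driver restarted from a copy kept in RAM), but the table layout and names are assumptions for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical recovery table: what to do when a given component dies.
RECOVERY_POLICY: Dict[str, List[str]] = {
    "disk":  ["log", "restart_from_ram"],             # can't fetch the disk driver from disk
    "audio": ["log", "email_admin", "restart_from_disk"],
}
DEFAULT_ACTIONS = ["log", "restart_from_disk"]

events: List[str] = []


def start_fresh_copy(name: str, source: str) -> None:
    events.append(f"started {name} from {source}")


ACTIONS: Dict[str, Callable[[str], None]] = {
    "log":               lambda name: events.append(f"{name} died"),
    "email_admin":       lambda name: events.append(f"mailed admin about {name}"),
    "restart_from_ram":  lambda name: start_fresh_copy(name, "RAM"),
    "restart_from_disk": lambda name: start_fresh_copy(name, "disk"),
}


def on_child_exit(name: str) -> None:
    """Run when the reincarnation server, as parent, learns that a child has died."""
    for action in RECOVERY_POLICY.get(name, DEFAULT_ACTIONS):
        ACTIONS[action](name)


on_child_exit("disk")
on_child_exit("network")
print(events)
```

Keeping the policy in a table rather than in code means an administrator can change how a component is handled without touching the recovery logic itself.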
17:41
And once we have a working disk driver, we can read the rest of them from the disk, okay? And it also pings the various servers all the time. So the reincarnation server will say to the disk driver, hi, disk driver, how you doing? Disk driver will say, great, I did 62 reads in the last second. And then a little bit later,
18:01
pings the disk driver again, how you doing? And says, great, I did 104 requests in the last second. A little bit later, pings it again, says how you doing? Then it says, disk driver, how are you doing? Give you one more chance, how are you doing?
18:24
At this point, the reincarnation server decides the disk driver's not doing well. It kills it, goes and starts a new one, and the new one is okay. The other components are told something happened. So if the file server was waiting for a request to be completed, it has to record in its tables where it was, okay?
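A minimal sketch of the reincarnation server's ping-and-restart loop, assuming a made-up policy table and a threshold of three missed pings. The names `POLICY`, `Component`, `ReincarnationServer`, and `max_missed` are illustrative and are not MINIX 3's actual `rs` code.

```python
from dataclasses import dataclass, field

# Hypothetical recovery policy: what to do when a component is declared dead.
POLICY = {
    "disk_driver": ["log", "restart"],
    "audio_driver": ["log", "mail_admin", "restart"],
}


@dataclass
class Component:
    name: str
    generation: int = 1      # incremented each time a fresh copy is started
    healthy: bool = True     # whether it currently answers pings


@dataclass
class ReincarnationServer:
    max_missed: int = 3                          # pings tolerated without a reply
    missed: dict = field(default_factory=dict)   # name -> consecutive misses
    actions: list = field(default_factory=list)  # log of recovery steps taken

    def ping(self, comp: Component) -> None:
        if comp.healthy:
            self.missed[comp.name] = 0
            return
        self.missed[comp.name] = self.missed.get(comp.name, 0) + 1
        if self.missed[comp.name] >= self.max_missed:
            self.recover(comp)

    def recover(self, comp: Component) -> None:
        # Look up the policy table; default to a plain restart.
        for action in POLICY.get(comp.name, ["restart"]):
            if action == "restart":
                # Kill the stuck instance and start a fresh copy.
                comp.generation += 1
                comp.healthy = True
                self.missed[comp.name] = 0
            self.actions.append((comp.name, action))
```

Calling `ping` periodically replaces a driver that has stayed silent for three pings in a row, roughly the "give you one more chance" behaviour from the talk.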
18:40
And then it's told where the new one is, and it sends a message to the new one saying, go do the command again. So the commands have to be sort of idempotent in order to make this work. But basically it's doable with a little bit of structuring. So you can't make it transparent to everybody, but you can make it reasonably transparent, okay? So disk driver recovery sort of looks like this basically.
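The recovery path relies on the file server keeping a copy of the outstanding request and replaying it to the fresh driver. That is only safe because a block read is idempotent. Below is a toy sketch with hypothetical class names, where a caller-supplied `restart_driver` callback stands in for the reincarnation server's notification.

```python
class DiskDriver:
    """Toy driver. Reads are idempotent, so repeating one is harmless."""

    def __init__(self, disk, crash_next=False):
        self.disk = disk
        self.crash_next = crash_next   # simulate a transient fault

    def handle(self, request):
        if self.crash_next:
            raise RuntimeError("driver crashed")
        op, block = request
        if op != "read":
            raise ValueError(f"unsupported operation {op!r}")
        return self.disk[block]


class FileServer:
    def __init__(self, driver):
        self.driver = driver
        self.pending = None   # saved copy of the outstanding request

    def read_block(self, block, restart_driver, max_attempts=5):
        self.pending = ("read", block)
        for _ in range(max_attempts):
            try:
                data = self.driver.handle(self.pending)
            except RuntimeError:
                # Stand-in for the reincarnation server's notice: switch to
                # the freshly started driver and resend the saved request.
                self.driver = restart_driver()
                continue
            self.pending = None
            return data
        raise OSError("driver keeps failing; giving up")


disk = {7: b"hello"}
fs = FileServer(DiskDriver(disk, crash_next=True))
print(fs.read_block(7, restart_driver=lambda: DiskDriver(disk)))  # b'hello'
```

The bounded retry count reflects the point that transient faults usually go away on a second try, while a genuinely broken driver eventually has to be reported as an error.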
19:01
So the user calls the file system, the file system calls the disk driver, and now the disk driver crashes, okay? So the reincarnation server hears about that because it gets sort of the equivalent of a SIGCHLD or whatever it's called. And it says, oh, I'll start a new one, and tells the file system that the driver crashed: your problem. The file server has saved the message
19:22
it sent to the disk driver. Says, oh, there's a new one. Send the new one the message, and then hopefully the new one does the work and everything's fine. If the new one doesn't do the work, the process will repeat itself long enough until the work finally gets done, okay? If it's a really hard error, there's something very, very wrong with the code,
19:41
it probably can't recover. But our experience and everybody's experience is most errors are transient. There's some weird timing combination that causes things to fail. If you run it again, probably it won't happen again. Most errors are sort of transient. And so this is the whole point about a self-healing system. It detects its own errors
20:00
and it could correct the errors on its own on the fly. So this is the kind of property we wanna have in the system where you can detect and correct your own errors. In the same way, for example, TCP will send the packet out, it'll start a timer. If it hasn't got an acknowledgement, the timer goes off. It says, oh, there's a problem. Takes the recovery action, sends it again. That's an example of doing this
20:21
in software for the operating system. And so we've tried to use that as kind of our model. So, some of the issues about reliability and security. Well, fewer lines of code means fewer kernel bugs. So we don't have as many bugs in the kernel as everybody else, because we've got less code. I'm not claiming a better bug rate per line, I'm just claiming less code. Well, 15,000 lines of code
20:40
means a smaller trusted computing base. There's no foreign code in the kernel. It's only our code. In other systems, if you get a driver, you have to install a driver written by some kid in Taiwan whose boss was breathing down his neck, saying, we gotta ship, we gotta ship. And the kid says, the code's not ready. And the boss says, I don't care, we gotta ship. That doesn't happen in Minix because there's no driver code in the kernel.
21:01
You might get a user process that won't work, but the kernel isn't affected by installing a new driver. We've also been fairly careful about static data structures. There's no malloc in the kernel. This means you sometimes have to over-dimension things, but RAM is cheap. And so we don't have all the problems with malloc and memory leaks and all that stuff, because we don't have any dynamic stuff in the kernel.
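The static-allocation discipline can be sketched as a table whose capacity is fixed up front, with an explicit "table full" result instead of growth. Python of course allocates behind the scenes; what the sketch shows is the fixed bound and the error path. `NR_SLOTS` and `StaticTable` are hypothetical names.

```python
NR_SLOTS = 8   # hypothetical compile-time bound, deliberately generous


class StaticTable:
    """Fixed-capacity slot table: sized once, never grown or shrunk."""

    def __init__(self, nr_slots=NR_SLOTS):
        self.slots = [None] * nr_slots   # all storage reserved up front

    def alloc(self, value):
        """Store value in the first free slot; return its index, or -1 if full."""
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = value
                return i
        return -1   # no growth: the caller must handle "table full"

    def free(self, index):
        self.slots[index] = None
```

Over-dimensioning `NR_SLOTS` trades some RAM for the guarantee that nothing in this table can leak or fragment.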
21:21
Moving bugs to user space, which is what we're doing, doesn't make them fewer, but it means they're less powerful bugs. Because if there's a bug in an audio driver, or worse yet, a hacker in an audio driver, somebody's compromised it, you can make very strange noises, but you can't fork off a new shell, because when you try to create a new process, the kernel says, hey, audio driver,
21:41
you have no permission to create other processes. Sorry, you know, no perm, you can't do it. And so we reduce the power of the bugs. Okay, fixed-length messages: all the messages are 64 bytes, so there are no buffer overruns. You know, you can't have all the problems with buffer overruns. There are no variable-length messages; we don't have them.
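To make the fixed-size message idea concrete, the sketch below packs every message into exactly 64 bytes with Python's `struct` module and rejects anything that would not fit, so a receiver never has to deal with an oversized buffer. The 8-byte header (a source endpoint and a message type) is an assumption for illustration, not a copy of MINIX 3's `message` definition.

```python
import struct

MSG_SIZE = 64                              # every message is exactly 64 bytes
HEADER = struct.Struct("<iI")              # assumed header: source endpoint, type
PAYLOAD_SIZE = MSG_SIZE - HEADER.size      # 56 bytes of payload


def pack_message(source: int, mtype: int, payload: bytes) -> bytes:
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload does not fit in a fixed-size message")
    # Zero-pad so that every message has the same length.
    return HEADER.pack(source, mtype) + payload.ljust(PAYLOAD_SIZE, b"\0")


def unpack_message(raw: bytes):
    if len(raw) != MSG_SIZE:
        raise ValueError("not a 64-byte message")
    source, mtype = HEADER.unpack_from(raw)
    return source, mtype, raw[HEADER.size:]
```

Because the size is a constant, the receiver can reserve one 64-byte buffer per message, and a sender has no way to overrun it.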
22:01
It's a hard constant in one of the header files saying buffers are 64 bytes, message is 64 bytes. We had a rendezvous system that A, you know, sent to B, B listened and got the message, we copy it over, but it turns out it has some reliability issues, namely if the sender sends it to the, you know, the client sends it to the server,
22:21
the server is trying to do it, the client dies, the server can't respond, and everything hangs. So we had to go to an asynchronous scheme even though we like the rendezvous better because there are no lost messages, there's no buffer management. So we had to add asynchronous messages, but we try to avoid using it as much as we can. And we've integrated interrupts and messages
22:40
at a very, very low level; interrupts are turned into messages. Okay, you know, the untrusted code, like drivers, is heavily, you know, protected by the MMU. So our model is, you know, most of the operating system actually is untrusted code; that's kind of a different model than everybody else's.
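The reason for adding asynchronous messages can be shown with a non-blocking send into a bounded mailbox. If the destination has died, the sender gets a status back and keeps going instead of hanging in a rendezvous. `Endpoint`, `async_send`, and the status strings are hypothetical; MINIX 3's own asynchronous primitive differs in detail.

```python
from collections import deque


class Endpoint:
    def __init__(self, name, capacity=4):
        self.name = name
        self.alive = True
        self.capacity = capacity    # bounded, so buffering stays predictable
        self.mailbox = deque()


def async_send(dst: Endpoint, msg) -> str:
    """Non-blocking send: returns a status instead of waiting for dst."""
    if not dst.alive:
        return "DEAD"       # receiver is gone; the sender simply carries on
    if len(dst.mailbox) >= dst.capacity:
        return "FULL"       # the sender may retry later
    dst.mailbox.append(msg)
    return "OK"


# A server replying to three clients, one of which died mid-request.
clients = [Endpoint("c1"), Endpoint("c2"), Endpoint("c3")]
clients[1].alive = False
results = [async_send(c, "reply") for c in clients]
print(results)   # ['OK', 'DEAD', 'OK']
```

The cost is exactly what the talk mentions: buffers now have to be managed, which a pure rendezvous scheme avoids.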
23:00
So bugs and viruses and whatnot can't spread from one module to another module easily because we assume, our starting position is that most of the operating system is untrusted code, you know, the kernel is trusted, that's very small. Okay, nobody can touch kernel data structures to muck them up, nobody has permission to write on the kernel.
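The privilege restriction can be sketched as a per-component set of allowed kernel calls that the kernel consults before doing anything. The call names and the `PRIVILEGES` table are hypothetical; MINIX 3 does configure privileges per system service, but not with this code.

```python
# Hypothetical privilege table: which kernel calls each component may make.
PRIVILEGES = {
    "process_manager": {"fork", "exec", "kill", "copy_from_kernel"},
    "audio_driver": {"device_io", "irq_control"},
}


class PermissionDenied(Exception):
    pass


def kernel_call(caller: str, call: str) -> str:
    allowed = PRIVILEGES.get(caller, set())   # unknown callers get nothing
    if call not in allowed:
        raise PermissionDenied(f"{caller} may not use {call}")
    return f"{call} performed for {caller}"
```

No entry grants any call that writes kernel data, and reads happen only through a copy into the caller's own address space, so a compromised audio driver can do device I/O but cannot fork a shell or corrupt kernel tables.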
23:21
You know, if somebody needed to read the kernel data structure, I think we have a couple of system calls where you could read it, it's simply copied into your address space, but you can't write it, okay? So a lot of the problems if somebody mucks up, you know, a kernel data structure and you're hung, you know, can't happen. Bad pointers, you know, a bad pointer could crash a driver or one component, but it can't crash the kernel
23:41
because it can't, you know, get at the kernel. Infinite loops can be detected: if some component is looping and not, you know, paying attention and doesn't answer the ping from the reincarnation server, it's effectively dead, okay, and then it'll be killed and a new one will be started, okay? I should say that restarting things is tricky when there's state, and we're working on that to some extent,
24:02
but we haven't solved that problem entirely yet, but things that are stateless, and most drivers are stateless, that we can deal with, we just start a fresh copy, okay? But I'll talk more about, you know, state later. So we're restricting the power of bugs to do damage rather than reducing the number, okay?
24:21
Okay, other advantage of user drivers, well it's a shorter development cycle, you know, you do something to the thing, you compile it, you run it, doesn't work, okay, you can just start a new one, you don't have to reboot the computer, which takes, you know, five minutes. So it's a normal programming model, you start up a process that doesn't work, you know, and you can debug it, there's no crash time, there's no reboot,
24:41
you can use normal debugging-type tools, it's just another process, okay? So it makes the whole cycle easier, it's more flexible. We ran a couple of fault, one of my students ran some fault injection experiments. We injected 800,000 faults into each of three Ethernet drivers, which was done on the binary drivers at runtime, so, you know, a debugging program overwrote memory,
25:04
and we were careful about writing over, we didn't write random numbers into it, we looked for like branch less than, we changed it to branch less than or equal to, and that's the kind of error a programmer might make, so we looked for specific errors that programmers might make, and we modified the binary to emulate the kind of errors that actually happen,
25:22
and we inject 100 faults, we waited one second to see if it crashed, it didn't crash, we inject another 100 faults, we just keep going, okay? And we managed to crash drivers 18,000 times, but we never lost the operating system, okay? Drivers went up and down all the time,
25:41
but we never lost the operating system; that's the fault tolerance that I think is so important, okay? A little while back, I got a second ERC Advanced Grant, and that was about sort of trying to make this thing more useful to the outside world, and we ported Minix to the ARM, okay?
26:03
And so we had to restructure the source tree for multiple architectures; oddly enough we had that in Minix 1, but it sort of got lost along the way somewhere. We changed the booting to use U-Boot for the ARM, we had to rewrite the low-level code dealing with the hardware, because, you know, the MMU works a little differently than on the x86,
26:21
we had to change the code for context switching, some of the very low level stuff is different on the ARM than the x86, we had to change some of that. We got rid of the segmentation code, since I think Intel has also lost interest in it, and ARM doesn't have it, so we threw it out. We gradually began importing the NetBSD ARM headers and libraries, NetBSD is fanatic about portability,
26:42
they really want to make things run on every known platform, they've really been very careful not to do things like put inline x86 code in the middle of a C program, they don't do that kind of stuff, so it was nice and clean, we had to change the build system to do cross compilations, we didn't want to build it on an ARM, we wanted to build it on something else,
27:02
we wrote drivers for the SD card and some of the other Beagle devices, and our initial target was the BeagleBone and the BeagleBone Black. For those of you who don't know it, it's a single-board PC with an ARM on it, about the size of a smartphone, a small smartphone, and here are some of the characteristics of the Beagle: it's got an ARMv7,
27:22
for those of you who know the ARMs; the clock runs at a gigahertz, it's got half a gig of RAM in it, it's got four gigs of disk in it, flash memory, it's got an HDMI port at 1080p, it's got 92 I/O pins, so you can connect anything you want and have it drive things in embedded systems, 100 megabit Ethernet, and it's got a USB port on it,
27:40
it's open source, which is important, and about $45, maybe $55, depending on which model exactly; there are a couple of models out there. Some of you may know about the Raspberry Pi; if you compare it to the B+, that's got an ARMv6, which is an older and somewhat less good processor, it's at 700 megahertz, also half a gig, and it doesn't have any disk on it, so that's a problem.
28:01
It's also got 1080p, it's got 40 I/O pins, 100 megabit Ethernet, and it's got four USB ports, which is a plus; on the BeagleBone Black, you'd need a hub if you wanted more than one. It's not open source, so that's the main reason we didn't want to touch it, and it's a little bit cheaper, okay? So I will, I'm willing to admit I'm wrong,
28:22
I've been right a lot, but I've been wrong once in a while. On January 29th, 1992, I posted to comp.os.minix: don't get me wrong, I'm not unhappy with Linux, it'll get all the people who want to turn Minix into BSD Unix off my back. So I apologize.
28:42
Actually, I do want to turn Minix into BSD, it just took me 20 years to realize it. I'm sorry about that, I'm kind of slow on the draw here. So, Minix meets BSD, and the BSD daemon is copyright Marshall Kirk McKusick,
29:08
and used by his permission. Okay, or maybe, so why BSD? Minix didn't have enough application software,
29:20
and BSD has proven to be a reliable, quality product, and I think the code quality in general is better than Linux; those guys are really fanatic about good code quality, and they don't rush releases, it's a very slow release cycle. I mean, once in a while there's a new release, and they really test it pretty carefully; it's a little bit different philosophy than Linux. The package source, pkgsrc, is a really good package manager,
29:41
and so we really like that, and there are thousands of packages out there, and there's an active community, and also license compatibility. I was the keynote speaker at a Linux conference in Australia a few years ago, and I didn't mention licensing in the talk. Somebody asked me afterwards, what's the license, and I said, it's a BSD license, and the audience began cheering. It's a Linux conference, so I don't know.
30:00
So anyway, we're BSD. Why NetBSD? Because it's got a tremendous emphasis on portability. Some of the other guys care about security, but the NetBSD guys care about portability, if you're running on 80 platforms, you can't have any weird stuff that uses some peculiar feature, you know, it's undocumented on the x86 that doesn't fly in 80 platforms, so they've really made an effort to make it really, really clean code,
30:21
so we kind of appreciate that. So anyway, there's a bunch of features from NetBSD, we have the Clang LLVM compiler, that's a very nice thing, some of you may know that Linux is not written in C, it's written in GCC, and so they can't use, I mean, there have been attempts
30:40
to use the Clang compiler, we've tried to compile Linux with the Clang compiler, there are thousands and thousands of places in the code which are not ANSI standard C, we've got to go past them, people have attempted to do this, we have never got it working, it's just a disaster, and I think that the moral of the story is, write your stuff in ANSI standard C or C++, but then you can use whatever compiler
31:00
happens to be the best at whatever time. So for us to change from our own ACK compiler to Clang was a question of just changing the build system to call that compiler, and it worked right off the bat. And Clang is a very nice compiler, has some very nice properties, which I'll talk about later, as well as producing very nice code. We adopted the NetBSD build system, we adopted the ELF file format,
31:21
the whole source tree dealing with architectures is modeled on the way NetBSD does it, the headers and the libraries are all from NetBSD, it's got X11, it's got pkgsrc; last time I looked we could build about 5,000 NetBSD packages right out of the box, you just say make and everything happens right,
31:42
some of them don't work because we don't have some font library or there's some other thing, we didn't have the time to really track it down, but my guess is with a small amount of effort you could probably get thousands more packages to work, out of the box, there's a few system calls we don't have, those are actually our impediments, but a lot of the stuff is sort of minor things,
32:01
like some package needs some peculiar font we don't have, and it requires somebody to spend a day figuring out where the font is and how to install it, but a lot of stuff just works out of the box. Nevertheless, we built the NetBSD environment
32:21
on top of the Minix environment, so there are some things we don't have. We don't have kernel threads; that's a long story, but in the beginning it wasn't there, it was too complicated. We have userland threads, but things that actually require kernel threads, that's a problem. There are some system calls that we're missing, like we don't have the LWP calls,
32:42
the msg calls, the sem calls, some of which might be easy to add, we don't have clone, we don't have some of the ioctl calls, we don't have kqueue and ktrace, don't have vfork, we don't have job control, but it seems to me if you have X11, why would you want job control, you just use another window. And some minor calls are missing,
33:01
nevertheless we can build over 5,000 packages, so it's moderately close. If you're looking for a measure of how close it is to NetBSD, there's the QA test suite; we ran all these tests, and 512 failed and 2,139 passed, so about 81% of the QA tests passed (2,139 of 2,651),
33:23
so it's sort of 81% of the way there, and the things that don't work tend to be the more exotic things, so it's not all the way there, and some of that requires some work, but we're a large way along the road there. So here's the system architecture,
33:43
the bottom layer is the microkernel, that's the part that runs in kernel mode, it handles interrupts and loads the registers up, and physically manages the page table and so on, then the next layer is the drivers, all processes, then come all the servers, all that's just Minix,
34:02
but in user land, we have packages and clang and all that stuff, and that's NetBSD, so it's NetBSD sort of re-implemented on the Minix infrastructure with the reliability and self-healing properties of Minix, but to the user it looks like NetBSD, so we think this is kind of the best of at least two possible worlds,
34:21
you have all the nice reliability properties of Minix, but you have a user interface, which is familiar at least to people who are BSD people, could we have done Linux, maybe, could it still be done, I think it would be hard, because I don't think it's quite as clean as NetBSD. Okay, so here's Minix on the Beagle boards now,
34:42
you can't read this, but green is good and red is bad. So we've tested on three different Beagle boards, the BeagleBone Black, the BeagleBone White, and the BeagleBoard-xM, and on the Black, most of it works; there are a few things, like the serial peripheral interface bus, that we didn't get around to writing a driver for,
35:00
but those are all things that could be done relatively easily, we just sort of ran out of manpower for that, but most of the stuff on the Beagle board sort of works. Okay, your role in all of this: it's now an open source project, funding has run out, I'm theoretically retired although I haven't noticed it yet,
35:20
and we hope some of you will join us and help to work on it like all the other open source projects around here, and it's an interesting project, we're combining a well established user land with a somewhat novel and interesting lower levels with this modularity property, if there are crucial system calls that are missing,
35:40
people can try to add them. We don't want to gum things up with a weird system call that one package somewhere needs; if some game that nobody cares about needs this system call, we're kind of inclined not to want to do that. Certainly porting more packages: we don't have Java, we don't have a browser, well, we have Links, but we don't have a graphical browser.
36:01
I don't know if Firefox would be portable, it's a very big program, but there might be smaller Dillo, I mean there's maybe smaller browsers that are graphical that we could have. There's some missing drivers for the Beagle board, I don't think they're really important, but there might be, some people might need them, so get it running on other platforms
36:22
such as the Raspberry Pi and other platforms, that would be very nice. Rump is a project done by some guys in Norway which allows you to use a BSD driver, it's Windows drivers on BSD, it's like an interface between the driver lands and the operating system, that might be an interesting project. Certainly port libraries and port a GUI,
36:42
it doesn't have to be KDE, but there are other GUIs out there. We had one a while back, but we sort of lost it. Some kind of a GUI: all people wanna do is click on the icon basically, everything else is irrelevant, so if you had some way to show icons on the screen, you could click on an icon and something happens; that's sort of all you need basically. There may be simple GUIs out there that do the job,
37:03
and then general port more of the packages. Okay, so here's Minix 3 in a nutshell, it's a microkernel re-implementation of NetBSD, didn't start out that way, but it's sort of evolved that way over time. It's fully open source, it's got a BSD license,
37:23
so it fits into that piece of the world, that's open source and BSD. I say highly compatible with NetBSD, so people who know NetBSD or FreeBSD will sort of recognize a lot of the stuff. It supports both LLVM and GCC, the default compiler is LLVM,
37:40
but GCC is there if you really wanna use it, type gcc, but the normal way to compile things is with LLVM, I'll come to that in a minute. We use the package manager from NetBSD, pkgsrc, which is a very nice package manager, excuse me, about 5,000 packages built right out of the box. You can get it, go to minix3.org
38:00
and just download it and try it, works on virtual machines, you can try it on virtual machine of your choice. So how are we positioning Minix now? One of the things is we wanna show that a multi-server operating system with pieces all built of the little components can be made to be reliable, okay? We wanna demonstrate that drivers belong in user mode,
38:21
Microsoft has figured this out also; there's a User-Mode Driver Framework that Microsoft is pushing, and they're encouraging Windows developers to write their drivers in user mode, because they understand the same problem: some kid in Taiwan writes a driver and they put it in the kernel, it doesn't work, it brings down Windows, everybody yells at Microsoft, and that's not really their fault,
38:41
and so they would prefer that the driver run in user mode, so that it doesn't bring down the operating system; they'll simply get a message saying, printer driver crashed or something like that, but the operating system keeps going. So Microsoft understands this very well: the drivers belong in user mode. High reliability and fault-tolerant applications: there are many applications in the world,
39:01
especially embedded ones, where reliability is very important, so that's also a focus. There was that $100 laptop project at MIT; at some point somebody's gonna make a $50 one-chip computer, sort of like the Beagle board, for the third world countries, and it's gonna have a small amount of RAM,
39:20
because RAM makes the chip bigger, and so it's a size constraint. Minix doesn't take up as much RAM as some other systems; you can run it in, I don't know, I think 16 MB is sort of the smallest size. We're focused on embedded systems, but it also runs on virtual machines on the desktop, for example, if you want to play with it. There's a feature that's not in the system yet,
39:42
but we're working on it, we hope to get it out there in the next release, but it's not quite working yet, it's a long story, it's a live update, okay? So software is updated for a variety of reason, for example to fix bugs, that's a very common reason, to improve performance, somebody tweaked something, made it faster, that's fine, you wanna add new features,
40:03
and the goal is to update the operating system in real time without rebooting, okay? So your attitude may be, well, you know, so I can reboot, okay? I just hit the button and it reboots, okay? If you're running a nuclear reactor, taking down the control system for five minutes
40:20
while you're rebooting, turns out it's not a popular thing to do, they tend not to like that, okay? So there are plenty of applications where they really don't wanna go down, ever basically, in the face of constant updates. And so live update allows you to update the operating system without affecting the application programs currently running. If you're running a web server
40:41
and you reboot the operating system, guess what? Your web server goes down and forgets everything it was doing, and you've had no server for a little while, and we're trying to avoid that. We update in place while it's running, okay? And furthermore, the new operating system version may have some different data structures. So the old version may have used a linked list for something or other, and in the new version they're using a hash table,
41:01
so you can't just copy things over, you're gonna move the state from the old one to the new one because it may have changed or some structure's got a new member in the middle, so you just can't copy the bits over, much more complicated than that. And tends to be a lot of state, there's open files and timers and all manner of things. You've gotta be able to update those things, okay?
41:21
So here's an example of how this might work. So suppose you've got Apache running on, I don't know, FreeBSD 10.1 or whatever, and what you want is Apache still running and now you're running FreeBSD 10.2, okay? So you've changed the operating system and the application is still there. There'll be a very short period of time,
41:40
when the application freezes because while you're actually doing the update, nothing happens, but that's only like half a second or something like that, okay? So the goal is to replace the operating system while the user processes are running. This is very hard to do with BSD or Linux or Windows or other operating systems, and we think we can do it, and we have it running in the lab,
42:01
but it's not in the current release. So here's live update in Minix. You know, you're running, say, Apache, and you've got file system 6.0 running initially, and when we're all done, you've got file system 7.0 running, and it's a different version, you know, it's got new stuff in it. So how do we do the update?
42:24
Well, there's some manager process which tells, say, the old file system, hey, we wanna update you, okay? So what it does is it says, oh, I better finish off all the work that I'm in the middle of, because it's doing operations with various other pieces and waiting for other processes to respond to it.
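The quiescence step the manager asks for (finish in-flight work, park new requests, then report ready) can be sketched as a small state machine. `UpdatableServer` and its methods are hypothetical names chosen for this illustration.

```python
from collections import deque


class UpdatableServer:
    """Toy server that drains its work before a live update."""

    def __init__(self):
        self.in_flight = set()    # requests still waiting on other servers
        self.deferred = deque()   # new requests parked during preparation
        self.preparing = False

    def receive(self, request):
        if self.preparing:
            self.deferred.append(request)   # queue it, do not start it
        else:
            self.in_flight.add(request)

    def complete(self, request):
        self.in_flight.discard(request)

    def prepare_update(self):
        # The update manager asks us to get ready.
        self.preparing = True

    def ready_for_update(self) -> bool:
        # Quiescent: nothing half-finished with any other component.
        return self.preparing and not self.in_flight
```

Once `ready_for_update()` is true, the parked requests in `deferred` can be handed to the new version together with the rest of the state.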
42:42
So it makes sure it finishes all of those, so there's nothing sort of in the middle. If new work comes in, it just queues the messages, saves them in RAM, but it doesn't begin processing them at all. It just keeps them in a buffer somewhere, and sooner or later, everybody who it was currently interacting with has said, I'm done, so there's no work pending with other servers,
43:02
and all the incoming work has been queued, and at that point, it can tell the manager, okay, I'm done, I'm ready to be updated. So the manager then says, okay, I'll create another new process, a new file system, as a separate process with a new code in it as another process. So now you've got the old one in theory running,
43:22
although it's not being scheduled right now, and then the new one is there, but it isn't running yet. So that's where you are initially. Now, inside the new file system, in fact, inside the old file system, are all kinds of tables listing every data object there. That's one of the things that we really like about LLVM.
43:41
It's programmable. You can write new passes for it. There's a whole infrastructure for writing your own pass to LLVM. So we wrote a pass which simply collects all the information about all the data structures and puts it in a table in RAM in a certain place and there's a pointer to it somewhere to list every single data structure, where it is, what type it is, how big it is,
44:00
all that stuff is in a table somewhere. So the file system, both the old one and the new one, know where all of the data structures are because there's a table listing it exactly, okay? That's very important. Can't do that with GCC. The new file system knows which variables and data structures it needs. So it goes to the old one, sends it a message, and says,
44:21
hey, I need this variable x. Give me the variable x. And the old one then replies, here's x. And then the new one says, okay, next thing I need is y. And so it goes and gets y. And it just keeps asking one after another for everything it needs. If it turns out that some data structure has changed in an important way, like what used to be a linked list is now a hash table,
44:42
then the new version has to have conversion routines to convert from the old to the new, okay? So the assumption is the guy writing the new one wants conversion to work, knows how the old one worked, knows what the data structure used to be and what it is now, and then it gets the old version of it. Internally, it does a conversion to the new format.
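The state transfer and its round-trip check can be sketched in a few functions, under two assumptions: the list of variables the new version asks for stands in for the LLVM-generated metadata table, and the version pair ships both a forward conversion (linked list to hash table, here a Python `dict`) and a reverse one. All names are hypothetical.

```python
class Node:
    """One entry of the old version's linked list of open files."""

    def __init__(self, fd, path, next=None):
        self.fd, self.path, self.next = fd, path, next


def old_to_new(head):
    """Forward conversion shipped with the new version: linked list -> dict."""
    table = {}
    while head is not None:
        table[head.fd] = head.path
        head = head.next
    return table


def new_to_old(table):
    """Reverse conversion, used only for the round-trip check."""
    head = None
    for fd in sorted(table, reverse=True):
        head = Node(fd, table[fd], head)
    return head


def list_items(head):
    """Flatten a linked list into sorted (fd, path) pairs for comparison."""
    items = []
    while head is not None:
        items.append((head.fd, head.path))
        head = head.next
    return sorted(items)


def transfer(source_state, needed, converters):
    """Ask the source for each variable by name, converting where needed."""
    target = {}
    for name in needed:
        value = source_state[name]                    # "give me x" / "here's x"
        convert = converters.get(name, lambda v: v)   # unchanged types copy as-is
        target[name] = convert(value)
    return target


def live_update(old_state, forward=old_to_new, backward=new_to_old):
    needed = ["open_files", "mount_count"]   # stands in for the metadata table
    new_state = transfer(old_state, needed, {"open_files": forward})
    # A third instance rebuilds the old representation from the new state.
    rebuilt = transfer(new_state, needed, {"open_files": backward})
    same = (list_items(rebuilt["open_files"]) == list_items(old_state["open_files"])
            and rebuilt["mount_count"] == old_state["mount_count"])
    return new_state if same else None   # None: abort, keep the old version
```

If the rebuilt state does not match, `live_update` returns `None`, which corresponds to killing the new instance and continuing with the old one.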
45:01
It puts the new one in place, okay? And then asks the next one. So these are actively cooperating, okay? So the assumption is these are not hostile. This is the same guys who wrote the old one wrote the new one, and they want the conversion to work. When all the state is transferred, then we create a third file system,
45:21
and it runs the process backwards. It talks to the new one and gets all the state and tries to recreate the old state from the new one, okay? If that works, then we're probably in business, okay? This is somewhat analogous to using Google Translate to translate English into German and then translate the German back into English.
45:41
And if the English you got the second time around is more or less the same as what you started with, it's a reasonable bet that the German was sort of more or less right, since you probably couldn't have gotten the right English back if the German was all off, okay? So it's not gonna be perfect, but we check whether it's more or less the same. And if it's not the same, it's easy to kill,
46:01
we kill off the new file system, we go revert back to the old one, the update is aborted, and the old one continues to run. So the system is still running, it just, the update didn't take and the message is sent somewhere saying we didn't make it, okay? Okay, so how does the update work? Here's, let's say, Apache is running and the old file system is running,
46:22
and somebody says, okay, get ready, we're gonna update you. The new one is started, the new file system, and the new file system says to the old one, I need variable x, gets an answer, you know, here's x, repeats that a whole bunch of times, gets everything, then this third file system, whatever you call it, starts up. It says, I need x, it gets x,
46:44
and then it compares itself to the original one, and if they match up, we're in business, everything works, and we go forward. If they don't match up, then, you know, we abort the update and continue running the old one,
47:02
nothing is lost, except the update didn't work. Some of you may have heard of Ksplice; this was done at MIT, actually by one of my former students, Frans Kaashoek, and his student. They can sort of update Linux in real time; however, they can only handle very small security patches, so it's a couple of lines of code, you know, something's wrong, they put in a branch at the place where there's a problem,
47:21
branch to somewhere else, put in the new code, branch back, skipping over the old code that didn't work, so they can't handle major data structure changes or anything like that. And so it patches the running process, okay, but over time, crud kind of accumulates in the process, because, you know, when the second patch happens,
47:40
it's somewhere else, the other patch is still there, and then there's another jump somewhere else, over a period of a year, memory's gonna be full of these little patches, okay, and there's also, if something goes wrong with the patch, there's no way to recover, in our case, we do the check, if it doesn't work, we just kill off the new version of, say, the file system, and we go back to running the old one,
48:02
and we're back to where we were, and no harm is done; we just lose the update. If they have a problem with the update, everything crashes, so it's really a much better scheme than that. There are some other interesting uses of live update; for example, there are a lot of security problems,
48:21
where the attacker basically knows the layout of memory very accurately, and does something like, you know, it creates a gadget for a return to libc kind of attack, or something like that, where, you know, it does a buffer overflow attack, and it overwrites, you know, the stack in such a way, when the current function returns, the return address has been overwritten
48:42
to a jump into the buffer that overflowed, and there is a piece of code that, you know, does whatever the attacker's trying to do, and to make that work, you have to have a very, very detailed knowledge of exactly the layout of memory, okay? We could do an update of the operating system
49:01
at a very high frequency, just changing it in random ways, for the purpose of foiling that attack, because the attack only works if you know what the memory layout looks like. If we've got dozens of different memory layouts and we're changing them all the time, it's very hard for somebody to attack. They can guess what the memory layout is, but if they guess wrong, it's gonna jump randomly,
49:22
and you're gonna get a crash, okay? But a crash is much better than having your system taken over, so this turns a return-to-libc type of attack into a crash rather than a takeover. That, I think, is a huge advantage from a security point of view.
49:40
Also, because you don't know what memory looks like, you can't steal information easily: the place you think it is, it isn't there, it's somewhere else, okay? It's also possible to do garbage collection in C. You wouldn't think you could do it, but remember, the way it works is you start a new version, the new one has a list of what it wants, and it fetches all the things it needs;
50:00
lost memory that nobody's using doesn't get copied over, okay, because nobody ever asked for it. So when it's all done, it has copied over all the things it actually needs and hasn't copied over junk that there's no pointer to anymore. If there's no pointer to some piece of memory you malloc'd a long time ago, it doesn't get copied over; only the things that are currently active do. So effectively it's garbage collection:
50:21
it eliminates memory leaks even though you're programming in C, and the programmer doesn't have to know about it; it happens automatically. That's kind of an unusual thing, and it works because only the live data is copied over. Okay, so this can fix memory leaks. Now, another research thing, which I don't think is gonna make it into the code because of timing, is fault injection.
50:41
If you claim your system is faster than somebody else's, that's pretty easy to test, okay? If you claim it's more reliable, that's not so easy to test, so we're working on testing reliability. How do we do that? We inject a fault at runtime: we have compiled in two versions of every basic block,
51:02
the real one and the faulty one, and then there's a test before the basic block: should I run the faulty block or the real block? So we have all this code, and it's generated automatically by the LLVM compiler; again, we program the pass to inject faults as we wish. So the new program structure doesn't just have basic blocks,
51:21
but it's got these little extra blocks with the test, go this way or go that way. So we have a single binary where we can run all kinds of different tests without having to recompile it, which means we can do it very quickly; we can run lots and lots of tests
51:40
using this fault injection technique, and we can optimize the whole thing to produce a single binary to do it. The overhead for doing this is about 8%, I think we measured it, so it's not very much. It gives us a whole playground, and we've written lots of papers about it. If you look up Erik van der Kouwe,
52:02
my PhD student: he's written and published a bunch of papers and got best paper awards for this. It's very cool stuff. Okay, we have a logo, it's a raccoon. We sort of have two logos: sometimes we use the full logo, sometimes only the raccoon's face. Why a raccoon?
52:20
Well, a lot of operating systems seem to have some kind of animal logo; it seems to be kind of a thing to do. Raccoons are, well, not small, but sort of smallish; they're cute, they're very clever, they're very agile, they can open garbage cans with their little hands, you know, they eat bugs,
52:52
and they're probably more likely to visit your house than a penguin. Raccoons are very common
53:04
in North America. Okay, we have a website, minix3.org; here's a snapshot of the website. The documentation is in a wiki, so you can help us document things. Sometimes people ask, "I can't program, can I help?" And I say, yeah, you can help document the system.
53:21
There's a wiki, and everything about the system is in the wiki: how to use it, how to be a developer, all that stuff. Here are some stats about the traffic. They're a little bit old, but basically we were running about 20,000 visits a month to the website, and we've been doing this for over 10 years. There was a big spike in September of last year
53:41
when the release came out and we were on Slashdot; we had about 80,000 hits. The number of downloads since I began logging them in 2007 is about 650,000. This is a conservative estimate; I've been fairly careful about not counting spiders and that sort of stuff,
54:01
so for a small academic project, that's quite a few downloads, I think. We have a newsgroup. Maybe I should do this on Facebook, but it seems like most serious developers know about Google Groups, and it just seems a much more appropriate place to talk than Facebook. So we have a Google group; it's on the front page, where it says "for group,
54:21
click here," and people can ask questions, have discussions, and whatnot. So anyway, the conclusion is: I think current operating systems are kind of bloated and unreliable. MINIX 3 is an attempt to produce a reliable and secure operating system,
54:41
the kernel is very small, about 15,000 lines of code. The operating system runs as a collection of user processes; each driver is a separate process, and each operating system component has restricted privileges for every kind of thing it might want to do,
55:00
like what other processes can I talk to, what kernel calls can I make, all that kind of stuff. There are bitmaps and tables inside the kernel that describe what each component is allowed to do. So if a component makes a kernel call (the kernel calls are different from the POSIX calls, but those are handled at a much higher level), the kernel first checks whether it's allowed. If it's allowed, then it does it;
55:21
if it's not allowed, it sends back a message saying no permission. So there's very fine-grained control over what a component can do. Faulty drivers can be replaced automatically, and some stateful servers can be replaced automatically, but only the ones where the state changes slowly,
55:40
currently. For example, an audio driver has a little bit of state: the bass and treble levels, the volume, that kind of stuff. A driver, or any component with slowly changing state, can send a message to the data store whenever its state changes, saying please save my state, and the data store saves it. When a new version comes up, the first thing it does is go to the data store
56:01
and say give me my state, and it gets its state and can then put the state back where it was. For things like the file system, where the state changes very rapidly, we've got other techniques which we're still working on; I'm not sure if they're gonna get in there, okay? Live update is possible. We have it working in the lab, but it's not in the current release; we're trying to get that to happen. There are some pointer cases that we haven't got right yet,
56:21
but we're pretty sure it's doable. Download MINIX from the website and give it a try. There's a survey on the main page: we have 650,000 downloads, but we don't know who the users are or what they're doing, and we'd like to find out. That's one of the disadvantages of the BSD license:
56:42
we don't know what people are doing. There's a talk at two o'clock from some people using MINIX for something, and I didn't know about it. That's one of the advantages of the GPL: you sort of hear about it. Well, maybe you don't hear: if they're using the GPL, they post the source changes, but they don't tell anybody about it. Still, the changes are there, and if you look, you find them,
57:01
but it's not announced anywhere publicly. So we don't know who our users are, and we're trying to find that out. We're trying to build a community, and we're thinking of having a conference next year. I don't know, it's a wild idea, some kind of conference, whatever a conference means. There are two ways we could do it: we're thinking of maybe having a dev room at FOSDEM in Brussels, and a lot of people from this community
57:21
and other communities go to FOSDEM, which is an interesting conference. Or, if we don't get a dev room, or it doesn't look like the right thing, we might hold a small conference in Amsterdam the day before or the day after FOSDEM. For people coming from far away, say from the US, once you've come to Brussels, Amsterdam is only a small increment;
57:41
if you're coming from Spain, it's two separate trips, but we're thinking of doing it. Can I have a show of hands of how many people might be interested in coming to such a conference, in one place or the other? A small number. I mean, we're not expecting vast numbers, but if 30 people showed up, that could be quite a successful conference. We're trying to build a community, and it's hard to do.
58:01
I don't know how many of you have some piece of software for which you're trying to build a new community; it's not so easy. Just a little ad here: we have a master's program at the Free University. If you're a student in computer systems and you're interested in parallel or distributed systems, look for our website. Google me, that's the fastest way:
58:21
look at my homepage, there's a link to it, and there's a video there about our master's program. It's a research-focused master's program, and most of the people who've gotten the master's have gone on to get a PhD somewhere; it's a very research-oriented kind of program. So if you're an undergraduate and you want to get a master's and then a PhD later,
58:41
this might be an interesting program. Or look up PDCS (Parallel and Distributed Computer Systems) on the VU website. And that's the end, and I made it on time.
59:17
We'll have time for one question.
59:31
Actually, it's kind of two questions wrapped up in one. Given that you've been working on this for almost 30 years, what are the two most important things
59:41
that you would have done differently? And given that you're aiming at embedded systems, is there maybe some industrial interest in the project? Thanks. I wrote a paper called "Lessons Learned from 30 Years of MINIX," which describes exactly
01:00:00
that. It's been accepted by Communications of the ACM and is in the pipeline for publication now; they're kind of behind all the time. It'll be published there eventually, and it goes through the lessons learned in many different areas in great detail. I'd be
01:00:20
hard pressed. Knowing what I know now, I'd say, gee, we should have had kernel threads. But that makes for a much more complicated system, so I'm not sure I would do it. The main thing I think I would do differently, and this is maybe ridiculous, is that when we switched from MINIX 2 to MINIX 3, I would have renamed it something other than MINIX, because too many people used it in a course in college
01:00:47
long ago and think, oh, MINIX is an educational system, it's not a real system. But it is a real system now. It just wasn't one a long time ago, but nobody knows that because they associate MINIX with their college course. I had a friend who was in advertising,
01:01:02
and she said never throw away a famous brand name, but maybe it would have been good to throw away a famous brand name and pick some new name so people wouldn't associate it with the educational system, which it once was. What was the second part?
01:01:23
Are there industrial applications? Are we talking to companies? We were at Embedded World in Nuremberg two times. That's the biggest embedded systems fair in Europe. We had a stand and so on. We talked to many, many customers, and a lot of them were very interested. They liked the open source. We had guys
01:01:43
who make trains, and they said, you know, a train lasts 60 years. We've got to have the source code because we don't know if that company is going to be around in 60 years; we've got to be able to maintain it ourselves or hire somebody to do it. How big is your company? Will you be around for a while? We didn't have a company yet, but we couldn't get a company until we had customers, and we couldn't get customers until
01:02:02
we had a company. There's this vicious circle. There was a lot of industrial interest, but it didn't pan out because we weren't big enough. Maybe if I were 30 years younger, I would try to get venture capital and do it the right way. So I'm not against other people who want to pick up the ball and go in that direction. That would
01:02:22
be fine. We have a MINIX Foundation, and we're trying to make it go as an open source project now. So we know there's industrial interest, in the sense that many people came to our stand at Embedded World and liked a long list of the properties, like the self-healing. Take the guys who make thermostats: a thermostat is basically an iPad glued to your wall now, right? And it's on the internet, which means people can hack your house. So they're
01:02:44
worried about real-time updates to it, they're worried about security, all these things. But because we weren't a company, they couldn't start using it, and we couldn't start a company because we didn't have customers. It's that whole vicious circle, and it would have taken more time, effort, and funding to get it started. So we weren't able to pull it off, but I'm certainly not
01:03:05
against other people trying to do that. So there is industrial interest. We know that, but we just weren't able to pull it off basically in a very short time frame. Okay.