Early days of Unix and design of sh
Formal Metadata
Number of Parts | 41
License | CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/18650 (DOI)
Transcript: English(auto-generated)
00:00
And as it says here, I've been doing nothing useful at Rally Ventures for the last 15 years. And before that, I was doing things that were half useful. And as it says also, ACM Queue is the magazine that George and Kirk and Eric and a few of the other people help out with. So let me start. This talk is mostly about things
00:22
I did once upon a time and then about things we're doing today and mostly about why we did it and what we wouldn't do again next time. And by the way, is there an um-and-er counter in the room? Because there needs to be, because I do ums and ers. And my grandson said to me, don't um and er when you're up in front of people.
00:41
I want somebody to wave their hand when I um and er. I got to Bell Labs... I was in Cambridge for 10 years before I got to Bell Labs. So I'll tell you a little bit of background so you can understand why I did some of the things I did. I was there with Wilkes and Wheeler. Wilkes invented microcode. Wheeler invented subroutines. So it was kind of cool to be around those guys in those days.
01:01
I started life out building algebra systems so that we could solve differential equations for the moon orbit, which was a lot of fun. Except we didn't have an extensible language, so we were writing it all in machine code. And we started designing our own language, which was called, well, Zed was the language. And actually, just as a side story,
01:21
we actually ported Zed to a PDP-7 with a 340 display. And that's how Conway got the visualization for his life game originally, and as he was designing it. And Conway was a good mathematician. And one of the guys who was on the Algol 68 project with me, Mike Guy, he was a good mathematician, too. And so Conway would come visit Mike, and they would play with his game.
01:41
So that was pretty cool. That was a lot of fun. I spent five years in Cambridge, before I came to Bell Labs, writing an Algol 68 compiler, and that was a lot of fun, too. And actually, it still works, for what that's worth. I arrived in Murray Hill in '75, and I did wanna show you one thing, by the way.
02:03
This. The Mathematical Laboratory, as it was called in Cambridge in those days, is actually the Computer Laboratory now, which Andy Herbert and those guys have been running. We actually had these machines around. In fact, I've still got one, and it's in pieces. So if anybody knows how to put one of these things back together again, can you come see me afterwards?
02:20
I actually, to be honest, I think I lost one of the springs, but maybe we can find a spring for it. Anyway, just to keep going, the story about this is that J.C.P. Miller, who was a famous numerical analyst, as they called them in those days, used to race people. People would write a program, and he would do it on his machine, solving a differential equation. And he'd say, I beat you the first time,
02:41
but you'll beat me the second time, because the software was written for the second time. But the first time, he could actually beat machines doing that. Anyway, enough of that. So I got to Bell Labs in 75, and we were around with the UNIX group then. I'm gonna put my glasses on, if you don't mind, because I can't see my slides. And some of this is in the history books, and I don't really want to go over it too much,
03:01
except just to set the context for what we were doing. And the best version of this, if you're interested or if you haven't read it, and if you care about this stuff, is the retrospective that Dennis Ritchie wrote, which you can easily grab online or whatever you want to do. The whole project was funded by nroff and troff, which was the text processing system
03:20
that was used in the lab, and that's how they got their money. And you can see, and most of you probably know this, what the state of the system was in those days. I want to say a little bit, excuse me, I want to say a little bit about the key players.
03:41
Dennis obviously you know about. When I was there, he was doing C and writing the drivers. Ken was mostly a kernel guy. Doug McIlroy was the pipes guy, you know that. But Doug also liked SNOBOL and stuff like that, and he was into string processing and substrings and things. So there was a lot of work done on that with him and a couple of the other guys. And Doug always
04:01
had something sensible to say if you went and asked him about stuff. Joe Ossanna, he died in '77 actually. He did all the nroff and troff stuff. And you know about Brian. Mike Lesk did tbl, lex and stuff like that. So there was a lot going on. And the development situation was actually kind of fun.
04:23
We were in two rooms, two attic rooms, no windows, with air conditioning around and stuff and tables. I would say there were typically 10 to 15 guys in the room. And we had a PDP-11/45 and Model 33 Teletypes. The fun part of it was we had
04:41
Tektronix 4014 displays. I don't know if you've ever seen one of those, but they're actually very cool. They were vector displays, and you could see a couple of pages of troff output on one. The network: there was no Ethernet in those days. As you know, this was in the seventies. But we did have, excuse me, we did have UUCP running
05:01
and that was something that Mike Lesk did. And UUCP ran over 1200 baud modems out to the world. So that was how it was in those days. The thing that's important about this is that when we would write a piece of software or change a piece of software, it would get deployed in the room wherever it was sitting.
05:20
And if it didn't work, you found out pretty quick it didn't work because they would beat you up. But the point is you were personally accountable, your reputation was on the line, for what you were doing right there with your peers. So you were careful about testing your software before you put it out, even to that small group.
05:41
I've run a bunch of engineering groups since then and tried to instill that idea in people that you're actually personally accountable for the software you write. That works sometimes, it doesn't work other times. We had the source code and manual pages online. The manual pages were very cool in those days. These days manual pages are kind of hard work
06:00
because there's so many of them. But in those days you could actually read the manual page and figure out what was going on. It had nouns and verbs in it. So you could figure out what the nouns meant and what the verbs did. That was very cool in those days. And maybe some of the manual pages these days are okay, but that was what we did. Oh, the other story is, I don't have a picture of an RP05 disk, but it was like a washing machine
06:22
with a top surface on it. And you could pull the platter out. And one day somebody, I think it was Ken, or somebody, spilled some coffee in it, and, you know, the disk crashed and it wasn't a happy disk. So the next day, the RP05 disks had a sign on them on the flat part of the disk: "This is a vertical surface."
06:43
So it was a fun group. Sixth edition, so this was 1975 we're talking about. It was written in C, there were 40 system calls. The grep man page was 20 lines. You could read it, you actually understood what it was. And the rest of it you can see, it's the same.
07:01
I just want to show you, this is a bit of fun, I like to show this. This is the index to the manual. So there's page one. I'm just going to go through this slowly. You don't need to read it all, you know what it says. There's page two, we already passed the commands now. So it's pretty good though.
07:23
And then system calls, there they are, subroutines, C library, libc as it was then. There's no standard I/O by the way, it hadn't happened yet. And then special files and stuff, and that was it. That was the index to the manual. So it wasn't complicated, and most of that stuff has survived as you know, some of it didn't.
07:41
Most of it survived. So let me talk about the seventh edition. This is really a timeline here. So in '76, and I'll tell the story in a minute of how we got to doing this, we wrote the shell and wrote adb. In '77, make, lint and awk showed up, and uucp was already there. In '78 we did the 32-bit port, so this is just like a snapshot of what happened.
08:02
And I'm going to get corrected here by Kirk I'm sure, but I'm pretty sure the Berkeley guys were involved in the sixth edition too. I think you took the sixth edition and played with it and ran it on the PDP-11 like we did. Anyway, so, but then we did the 32 bit port and published the seventh edition and improved the file system and blah, blah, blah.
08:22
And I'm going to talk a little bit about that. I do want to talk a little bit about what happened in, so Bill Joy called up one day, and this is in 77 I think, around about then. He said, well, I want to put this stuff in the shell. Will you put it in the shell? And it was like history and job control and stuff. I said, well, you know, it doesn't really belong in the shell. You know, it really belongs in a terminal,
08:42
window manager, whatever you want to call it. It doesn't really belong in the shell. So he said, yeah, I know, but I'm going to do it anyway. So that's how the C shell happened. He went off and did it, which is just fine. And of course we had vi and stuff like that. The other story about this time is actually 1980
09:03
was when I was at the lab, and I'd invited Bill over to give a talk or something. And as usual, he was late. He flew through Chicago and got lost, which was typical of Bill. And he got there, and when he, so after the talk, he dug out of his bag a tape, you know, the big old tapes.
09:22
I said, what's on that? He said, BSD 3.0 or something. And I said, does it have virtual memory? He said, yeah. So I was running a computer group then. I had three or four VAX-11/780s at that time. This is 1980. And we were running Lisp, and we didn't have virtual memory because we had the Reiser operating system,
09:41
which was using the old, non-virtual memory. So anyway, he said, yeah, it's got virtual memory involved. I said, well, does it work? He said, yeah. So I said, well, let's go install it on my machines. And this was a lab. I had like 30 or 40 people using computers, you know. So Friday night, we jacked up the file systems, and by, I can't remember what time it,
10:01
how long it took, probably a few hours. And we replaced all the operating systems on all the VAX-11/780s in this lab. Monday morning, I come in and talk to my boss, who was Hal Alles at the time, great guy. And I said to Hal, guess what we did over the weekend? He said, what? I said, well, we replaced all the operating systems on all the computers.
10:20
He said, does it work? I said, yeah. He said, oh, okay, great. And it did work. I mean, and the guys who were running the LISP stuff, they were ecstatic. I mean, because they could now use as much memory as they wanted to do all the LISP things.
10:42
Anyway, so that was the BSD story. And that was the first time, I believe that was the first time Berkeley Unix got back into Bell Labs. And Bill wrote it up somewhere in some magazine and said, you've got to take risks. So, seventh edition.
11:00
So one of the things I did was for the seventh edition, I was sort of like the release cop, or whatever you want to call it. So what we would do, the rules of the game were if you touched the source of a program, you owned it. And so you were pretty careful about screwing around
11:21
with the source unless you really wanted to get into it. So I did end up owning a couple of things. I ended up owning the loader and a few other things, just because I wanted to mess with them. And that was how it worked. And make had been introduced in '77 by Stu Feldman, as you know. And that was an idea that Steve Johnson gave him for this make machine thing.
11:42
So we had makefiles for everybody. So the rules were: you put it in the standard release directories. We will compile it overnight and make sure it works, in a relatively clean environment. Relatively, I say, because we didn't really have clean rooms then. You've got to have a makefile. You've got to have a manual page. And if you don't have any of those things, well,
12:02
we didn't actually delete them. We just sent you a nastygram and said, can you go put the makefile in there and the man pages and blah, blah, blah. So anyway, we didn't have source control in those days, of course. But actually, that was how we managed the seventh edition release. And it worked pretty well. So let's get to the story here.
12:22
So Dennis and I were in a meeting in December of whenever it was, 75. And people at the time, the original shell was very limited. It didn't have variables. It didn't have all kinds of things. It didn't have any flow control. And people were starting to mess with it
12:41
and patch it over and try to fix it. And so Dennis and I got out of this meeting. I can remember it now. We walked out of the room. And I'll show you the room, if you like. We walked out of the room, and we kind of hung our heads down and, oh, just another one of these. Oh, I can't believe they're doing this. Anyway, so we said we've got to fix it. So we decided then that we would go and rewrite it
13:04
and redesign it. And we knew we couldn't patch over the existing one, because people were trying to do that, and it had too many limitations. For example, you couldn't use a shell command file as a filter, because the shell command file itself was the standard input. That was how it worked.
13:20
So you couldn't have shell files which were filters, which was like, what? And so you couldn't pipe in and out of them without doing some gymnastics. So we went off, and at the same time, it turned out Ken Thompson had left for Berkeley for a year, and he actually owned the original shell. And he probably would have been OK
13:41
if he'd been there, if we'd just gone ahead and done it anyway. But the fact that he was out of town made it easier. Ken's not here, let's just go. Also, at this time, there was the PWB shell that John Mashey wrote. PWB was Programmer's Workbench; that was another group, a business group at Bell Labs, that was using it for production
14:00
work, serious production work, actually trying to make money for the phone company. That started in mid-'75. So anyway, we started writing code, and we actually got deployed. The original version came out in early '76; it only took two or three or four months to actually write the thing and get it out there and get it working.
14:21
So let me talk about the shell as a language, first of all. And first of all, types. So the issue with a shell as a language is that strings are first class citizens. So you're screwed. I mean, you can't easily add types, or you can infer types.
14:42
But at the time, we said, the thing you want to do is use it both interactively and as a scripting language. And so, interactively, there's no way you want to quote your strings, which is what you do in a regular programming language. So you're constrained by that; that's the biggest constraint on the language itself.
15:01
And we managed to put a few things in there that allowed you to do some string processing. And these days, I've looked at some of the things that have been done since, and people have added stuff to do string processing so that, in that dollar-sign stuff, you can do crazy things. But it does provide a program interface.
15:21
And the whole point of the shell, of course, is that it's the interface to the system. So you're really trying to map the system and manage the system resources for the user and through scripting. And it turns out the hard parts were the signal management and the process management. I'll come back to that in a minute.
15:41
So where are we? Oh, yeah. So, you know, I had this Algol 68 background, which you can beat me up about later. And if you were at the USENIX conferences 30 years ago, you've already done that. But the thing that I liked about Algol 68 was that it was an expression language, which
16:01
means you could take any of the constructs and replace it with any of the other ones. So for example, you could take any command and replace it by a conditional, and it would return a value which would then be used as the value of that expression. And so you can basically substitute anything anywhere. There's no constraints on the language.
16:21
Well, there are some constraints, obviously, but there aren't many constraints on the language. The other thing that was true at the time was a lot of the utilities had arbitrary string lengths. You know, they had some 500 character buffer. And if you ran over the buffer, it either crashed or told you to go away. And I didn't really like that idea, particularly in the shell. So I made a decision early on that there
16:43
would be no arbitrary limits on anything except memory, except how much memory there was. And that'll come back to bite me in a minute. I'll tell you that. The other thing that was hard about the whole thing was the quoting rules. And they went through a bunch of iterations for the first couple of years.
17:01
And I'll talk a little bit about that, too. Oh, yeah. So it turns out we used the return code of commands as the thing to tell you what the result of a condition was. If mumble, mumble had a return code. If the return code was zero, it was true. And if it was non-zero, it was false.
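As an illustration of the convention described here (this is not code from the talk or from the original shell), a minimal C sketch of "status zero is true, non-zero is false" might look like the following. The function names `exit_status_of` and `succeeded` are invented for this example, and it assumes the modern POSIX `fork`, `execvp` and `waitpid` calls.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the given arguments and return its exit status,
 * or -1 if fork fails or the child did not exit normally.
 * (Hypothetical helper, named for this sketch only.) */
int exit_status_of(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* exec failed; 127 is the conventional status */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* The shell's rule for conditions: status zero is "true", anything else "false". */
int succeeded(int status)
{
    return status == 0;
}
```

A caller would build an argument vector such as `{"sh", "-c", "exit 3", NULL}`, pass it to `exit_status_of`, and branch on `succeeded(...)`. A program that falls off the end of `main` without a meaningful status gives a condition like this nothing reliable to test, which is the problem the next part of the talk describes.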
17:23
Most of the commands did not have return codes in them. The exit statement in most of the programs was exit(), paren paren, which means the return code is like a random number. So I had to get people to go back and fix their code so that there was a sensible return code.
17:41
So you could use it: if the named command succeeded, it would return zero. And if it failed, it would return non-zero for some reason. So that didn't work. I asked people, and they were busy doing something else. So what I did was, in the shell, when it prompted, just before the prompt, it would say what the exit code was for the command that was just
18:02
exited. And so it would go: exit = 4271, dollar sign. And that pissed everybody off, and we got the exit codes fixed, because it didn't take long.
18:22
It only took about a week. The other thing which was really important at the time, which I haven't said much about yet, but it's a thread through the whole thing. We were running on a PDP-11. Performance was really important. And so some of the things I did and some of the things that were done were done simply just to make sure
18:41
that the thing performed well. And of course, in the shell, the biggest performance hit is forking and the amount of memory you're using when you do a fork. And so you'll see why that matters in a minute. And then the other thing I did, which I'll defend somewhat, is I did my own memory allocation.
19:02
And I'll explain why, too. So what's the shell got in it? It's got all this stuff in it, and I'll just keep going because you've seen all that. And actually, here was a complete list. I'm not going to show this for long, but what I did is I went back through my notes and my files and stuff, and I dug out things which actually
19:21
got written at the time. And that was the product data sheet. I don't know what you call it these days. But anyway, this was the note that was sent out to everybody saying, OK, here's what you get, guys, with the new shell. And it was all that stuff. There's a couple of cool things in there.
19:41
You can pipe in and out of loops. I thought that was pretty cool. Yeah, so what else is in there that's interesting? Well, the signal management was really hard work. And the last two things you'll see there are about wait being interrupted.
20:01
It took me a good two or three months to understand the Unix signal mechanism and how it affected, because the shell actually has to get it right. Because you're sitting there, and if you hit interrupt, you actually want it to do something or not, depending on what you're trying to do. And there are lots of corner cases in there
20:21
that took a while to figure out. And so you can see some of them in there. And in fact, some of the bugs that were in there for a year or two after the initial release were to do with signal handling. And so anyway, that's that list. And the other thing we did was we built the pattern matching into the shell itself.
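To give a feel for what built-in pattern matching involves, here is a small C sketch of a wildcard matcher. It is an illustration only, not the shell's actual matcher: the name `wild_match` is invented here, and it handles just `*` and `?`, leaving out bracket expressions and the special treatment of leading dots that real filename matching has.

```c
#include <stdbool.h>

/* Return true if the whole of name matches pattern.
 * '*' matches any run of characters (including none);
 * '?' matches exactly one character.
 * (Hypothetical helper; bracket expressions are deliberately omitted.) */
bool wild_match(const char *pattern, const char *name)
{
    for (; *pattern != '\0'; pattern++, name++) {
        if (*pattern == '*') {
            /* Try the rest of the pattern against every suffix of name. */
            do {
                if (wild_match(pattern + 1, name))
                    return true;
            } while (*name++ != '\0');
            return false;
        }
        if (*name == '\0')
            return false;       /* pattern still has characters, name does not */
        if (*pattern != '?' && *pattern != *name)
            return false;       /* literal character mismatch */
    }
    return *name == '\0';       /* both must end together */
}
```

With this sketch, `wild_match("*.c", "main.c")` is true and `wild_match("*.c", "main.h")` is false. A shell would apply such a test to each directory entry and substitute the sorted list of names that match.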
20:40
It used to be done by a subcommand called glob, which, random stuff went on in those days. So Unix at the time, what were we doing? So this sort of explains a couple of things about why I did things the way I did them. When it was written, we didn't have standard I/O. We did have yacc and lex.
21:03
And we actually talked about using yacc and lex, but they're pretty heavyweight things to have. And it's not very hard to parse the shell syntax. It's not very difficult, guys. It's three or four pages of C, and you're done. And so that wasn't a problem. There was no standard I/O, no string copying routines,
21:21
so you had to write all that yourself. So that's how come we ended up doing that. And by the time seventh edition came along, where these things were done, I was done. It was '78, '79, and I was like, go rewrite it just to put standard I/O in, for what? So I didn't do that. And by that time, I'd also written my own memory allocator, which I'll talk about in a minute. And so it didn't make sense to go back
21:43
and rewrite all that, because there was a lot of work in writing the memory allocator, and you'll see why in a minute. And there was no setjmp and longjmp, so Dennis added those for me, because there's no other way to do error recovery in a thing like the shell, unless you can get back to somewhere safe in some context where you can actually do something useful.
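The kind of error recovery described here can be sketched with the standard C `setjmp` and `longjmp` calls. This is a toy illustration under stated assumptions, not the shell's code: `run_commands`, `eval_depth` and `fail` are names made up for the example, and a command spelled "bad" simply stands in for anything that fails deep inside evaluation.

```c
#include <setjmp.h>
#include <stdio.h>
#include <string.h>

static jmp_buf top_level;   /* the "somewhere safe" the interpreter returns to */

/* Abandon whatever is being evaluated and jump back to the top level. */
static void fail(const char *msg)
{
    fprintf(stderr, "error: %s\n", msg);
    longjmp(top_level, 1);
}

/* Hypothetical evaluator: the nested calls stand in for a deep parse/eval. */
static void eval_depth(const char *cmd, int depth)
{
    if (depth > 0) {
        eval_depth(cmd, depth - 1);
        return;
    }
    if (strcmp(cmd, "bad") == 0)
        fail("cannot evaluate command");
}

/* Evaluate each command in turn; returns how many were abandoned via longjmp. */
int run_commands(const char *const cmds[], int n)
{
    volatile int i = 0;         /* volatile: changed between setjmp and longjmp */
    volatile int failures = 0;

    if (setjmp(top_level) != 0) {
        failures++;             /* we arrived here from fail() */
        i++;                    /* skip the command that failed */
    }
    for (; i < n; i++)
        eval_depth(cmds[i], 3);
    return failures;
}
```

The design point is that `fail` can be called from any depth without every intermediate function having to check and propagate an error code; control lands back in the loop, which carries on with the next command.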
22:02
So, yeah. And the other reason not to use malloc at the time was that, as I say, I was doing my own memory allocation and I'll explain why. So, let's talk about memory management, because when people did the port from the PDP-11,
22:23
which we never expected people to do, we didn't think it was, we just were playing on the PDP-11. You know, we didn't expect it to get used by lots of people. I mean, we knew it was gonna get used in the lab, but we had that under control ourselves. So anyway, when people were doing ports, I used to go to USENIX meetings and people would sort of grab me by the throat and say, why did you do that freak stuff
22:41
the way you did it? So the way that it worked, and I'll, actually I should, I'm sort of getting ahead of myself here, but, so let me just say what the problem is and why you need to do your own memory management. Or at least, first of all, the overall flow for this thing is you read something in, stick it in an internal representation, then you evaluate it off an internal representation.
23:03
And whilst you're evaluating, you can be doing all kinds of stuff. So you can be forced back into reading again, which is back at the top of the system. So if you look at the example on the page here, it says, you know, assign something to this variable, and you're gonna, and so that gets read in,
23:20
it doesn't get executed, so now it's in the tree. And then the next thing you gotta do is go treat everything in between the quotes as something you're gonna read in and evaluate. So you've got a completely recursive system going on here. And whilst you're doing all of that, you're allocating heap space so you can store variables, as well as putting things on the stack, your own stack,
23:41
because you're gonna throw them away when you're done with them. You know, when you're done with executing this, it's gone, except the value of x is retained. So we had an interleaved heap and stack, which I'll draw a picture of in a minute, and the stack was used for non-permanent objects, including the parse tree, which goes away, and as I said, no arbitrary length restrictions.
24:02
I do wanna give credit, by the way, there's a guy, I haven't met him, George Collyer, Geoff Collyer, excuse me, I don't actually know him, who did a thing called A Partial Tour Through the UNIX Shell, and he actually had done a bunch of analysis on it, and I suspect he used a later shell than the one I wrote
24:20
because there were some bugs in it that weren't mine. There were, by the way, some bugs in it that were mine, so I'm not saying I didn't write any, and I'm not proud of that. And then I wanna show you something. I was at a conference in Japan in 1990, and some guy shows up, and he says,
24:41
I wanna give you this, Dr. Bourne, or whatever he called me, and so it's the source code, can you all see it? It's the source code, and it's annotated. It's completely annotated, and it's half in Japanese and half in English,
25:00
and it's got pictures of the memory layout and what goes on and blah, blah, blah, so that is so cool, and so thank you Akira Nakamura, I think that's his name, and if there's anybody who speaks Japanese in the room, please come by and make sure I got that right, but anyway, thank you to those two guys for going through it and actually understanding it
25:20
because they understood it better than I did. True. So the memory allocation: as Kirk pointed out, I asked him to take a look at the slide, he said, yeah, this is upside down. I know it's upside down, I'm sorry, but the bottom of the memory is at the top of the slide, sorry, and the top of the memory
25:41
is at the bottom of the slide. Having said that, the C stack runs from the top of memory down, so you don't have any control over that, you can't put stuff in there because it'll go away, and you could put some global variables in there if you like and use that, and the bottom of the memory is where the sbrk area is, in the old days anyway,
26:03
and so we ran sbrk from the bottom, and the heap and the stack are mixed up in there, and every now and again you're running a stack, and you suddenly have to allocate more heap, so that piece of stack now has to get covered up, and you have to go back and recover it later when that piece of stack gets released,
26:22
and so that gets returned to the heap, so there's an interleaving of the heap and the stack as you do that, and this next slide will show you the actual detail of that. It was actually a relatively simple scheme, and it did take a month or two to debug, because memory bugs are a pain in the neck.
26:43
So you can see the bottom of the heap is there, there are interleaved items in there, and there's a thing in there, that stack busy pointer, which was the pointer to the stack blocks that had got covered up by heap allocation that you needed to do as you were evaluating, and then the top of the heap is where it says,
27:02
and then above that is the entire stack, which is currently free, and it was usable, and so there were a bunch of routines I had which would simply write on the top of the stack. Of course, you couldn't retain pointers from stack base up for very long,
27:21
because you certainly couldn't retain a pointer over heap allocation, because the top item on the stack would get moved because you hadn't finished constructing it yet. So if you did have to cover up, I don't have a pointer, but if you did have to cover up the top of the stack item you'd have to move that item somewhere
27:42
so you could keep growing it. So anyway, that was the memory, and the thing that got everybody upset, so I went to Dennis one day, one of the things I didn't think we could afford to do was when you're pushing stuff on top of the stack was to do a bounds check
28:00
to make sure you weren't gonna hit the top of memory, because that would have been four instructions instead of one on a PDP-11, and all you're doing in the shell is constructing strings, that's all you're doing, except for forking and execing, so that's all you're doing. So most of the performance you get
28:20
is from that piece of code. So I said, I can't afford to do this, so I went to Dennis and said, what happens if I'm writing up near the break end and I get a memory fault? Can I just recover in the fault routine,
28:40
allocate more memory and then go back to what I'm doing? And he said, oh sure. So I'm sorry to all those people who had to find that out the hard way, and we didn't document it, it was never in the sbrk spec, and a lot of machines can't do that, that's not easy to do. So anyway, that's why we did it though,
29:02
simply because of the performance of that. I'm gonna skip this next one. I'll skip that. Okay, so what were the hard bits? Signals and process management were hard, I've talked about that a little bit. Quoting, and I'm gonna talk a bit more about quoting.
29:21
We did add a bunch of error recovery over time, and also error reporting was improved. Debugging the memory allocation, that's obviously not easy. The here documents, you know, the inline documents, turned out to have some non-trivial problems.
29:40
You know, for example, one of the things we did is we copied a here document to a temporary file, and then you have to make sure the temporary file goes away, and that's a good source of bugs there, and the usual corner cases. So let's talk about quoting hell, because this took two years to get straight, and I'm not sure it's straight now even,
30:02
I mean, but it's true. It's half straight. So, there are three mechanisms. Parameter substitution, argument splitting, which means you're reading words, and they're delimited by meta-characters or blanks, so you have to break things up into words.
30:21
And then file name generation, those are the three basic mechanisms. So if you look at the choices here, you know, you've got three values for three parameters. The angle brackets are meant to delimit the actual string, so that's a null string. Second one is A space B, the third one is an asterisk.
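As a rough sketch of how those three mechanisms combine in a present-day POSIX shell (which may differ in detail from the 1970s shell), the script below sets three parameters to the values on the slide: the null string, `a b`, and `*`. The helper `count_args` and the scratch directory holding the files `f1` and `f2` are assumptions made up for this illustration.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

# Work in an empty scratch directory so that * matches a known set of files.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
: > f1
: > f2

x=''        # the null string
y='a b'     # a, space, b
z='*'       # an asterisk

# Unquoted: parameter substitution, then argument splitting, then file name generation.
n_x=$(count_args $x)     # a null value leaves no argument behind
n_y=$(count_args $y)     # split at the blank into a and b
n_z=$(count_args $z)     # * is expanded to f1 and f2

# Double-quoted: substitution still happens, splitting and generation do not.
q_x=$(count_args "$x")   # one empty argument
q_y=$(count_args "$y")   # one argument containing a blank
q_z=$(count_args "$z")   # one literal asterisk

echo "$n_x $n_y $n_z / $q_x $q_y $q_z"    # prints: 0 2 2 / 1 1 1
cd / && rm -rf "$scratch"
```

The unquoted and quoted counts differ for every one of the three values, which is one way to see why the choice of rules was hard to settle.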
30:42
So what are the choices? If the argument is, and you can see, you go down through the choices, and what we ended up with was, and by the way, the single quote we didn't add until a year or two later, because all we had originally was a double quote, which meant you had to escape stuff,
31:00
and so there was no way to have, so you'd be typing in the shell script or something, and you'd have to remember to quote everything, you know, with backslash. So that was, so, but it did take a while to do that, and that's where we ended up. And of course, the difficult part of this was explaining what gets evaluated and when.
31:24
You know, for example, if you've got a command substitution and you're reading that, does it get evaluated when you first read it, or does it get evaluated when you execute it, and so on. So the evaluation rules are a little complicated. So this was how the meta, what I call the meta character rules ended up,
31:43
and inside a single quote, which is the first line, nothing is interpreted except another single quote, that's it. So that was the clean one. That was McIlroy who really pushed for that. He said, I can't deal with all this quoting stuff. You've gotta have something. So that's how we ended up with single quotes.
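A small sketch of the single-quote rule next to double quotes, as a present-day POSIX shell behaves. The variable names are made up for the illustration, and `printf` is used instead of `echo` so that backslashes are printed as they are.

```shell
#!/bin/sh
name=world

# Between single quotes nothing is interpreted: $, back quotes, backslash
# and double quotes are all ordinary characters.
single='$name `echo hi` \n "x"'

# Between double quotes, $ and back quotes are still active, and a
# backslash can switch them off.
double="$name `echo hi` \$name"

# The only special character inside single quotes is the closing quote,
# so an apostrophe has to be written outside them.
apostrophe='it'\''s'

printf '%s\n' "$single" "$double" "$apostrophe"
```

The first line comes out exactly as typed, the second as `world hi $name`, and the third as `it's`.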
32:00
The back quote, which is the command substitution, which I would have done differently these days, and in fact, I think the newer shells have a dollar, round paren, something, which is a much cleaner way to do it. Excuse me, didn't think of that at the time. So you can see which ones get evaluated then, and again, it's, you know,
32:21
backslash inside of command substitution quotes things, and then the terminator is this other back quote, and then double quotes is what it is. But even then, I mean, that looks simple, but even then, it's not obvious when you're reading scripts sometimes, what's what. So what do we do?
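To make the back quote rules concrete, here is a hedged sketch comparing the two command substitution forms in a present-day POSIX shell. The commands are placeholders; the point is only how nesting has to be written.

```shell
#!/bin/sh
# Back quotes: the closing back quote ends the substitution, so an inner
# substitution has to be protected with backslashes.
old=`echo outer \`echo inner\``

# Dollar, round paren: the parentheses nest, so no extra quoting is needed.
new=$(echo outer $(echo inner))

printf '%s\n' "$old" "$new"
```

Both variables end up holding `outer inner`; the difference is only in how much escaping the writer has to keep track of.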
32:42
So we did convert the Unix group. I've already told the story about, and here were some of the bumps in the road that we ran into as the group got converted, and it was this group in this attic I'm telling you about. So I told you about printing actually equals thing. So originally, you know, I had do-od,
33:01
you know, that's the way ALGOL 68 did it, and that's the way we did it. Of course, od was a command, od for octal dump, and so we had to change it. So of course, we found that out the hard way. And this is a quote, actually. So goto went. The goto used to work by the shell files, as I said, in the old days,
33:21
a shell script would have the script as its standard input. So goto was just a command, and you went off and executed the command, and what it would do is it would share the standard input pointer, and it would seek through the file, which the parent was executing, well, was waiting for you to finish, but when it had, it would seek through the file
33:42
looking for the label, and when the parent shell came back, the input would have been moved, and so you've gone there. That was how goto worked. I didn't believe in goto, so I got rid of it, and this was one of the groups that we gave the shell to,
34:02
or gave the system to, I should say, and you can read what it says. It says it was terribly sorry it's gone, but we'll get over it. And then the wait command being interruptible, and then this again was something, in the old days, you type wait, you waited.
34:23
So we made it interruptible. You know, what the hell, you know? Actually, it turned out that's non-trivial, and shell scripts, as Phil said, talked about that. Oh yeah, the other thing that went on was, so in the PDP-11, you could indirect through address zero and get a zero, and so a bunch of code did that, and I can't remember why we did it,
34:40
or what, you know, if it was just bugs, or if we did it for a reason, but anyway, it didn't work on other machines, so when Steve Johnson did the 32-bit port, he had a bit of work to do to get rid of some of them, and I wish, well, I didn't do the work to find out why. I talked about the sbrk story. So the other battles were the C versus shell battles,
35:02
so this was, I call it a battle, it wasn't a war, because we were all in the same room, and so this was my note, from SRB. I sent this email out to Doug, I think, and a couple of people, and said, okay, guys, because they were pushing me to make it like C, which is a reasonable thing to consider,
35:22
and the reason I didn't do it was because you can't, C is not an expression language, and there's an even more important reason I didn't do it, and that is that if-then-else in C requires you to look ahead, because you don't know until you get to the next part if it's there or not, and you can't do that interactively.
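For contrast with the C form, here is a sketch of the shell's own conditional as it looks in a present-day POSIX shell. The function name `classify` is made up for the illustration. The closing word `fi` marks the end of the construct, so a shell reading one line at a time does not have to look at the following line to find out whether an else part is coming.

```shell
#!/bin/sh
# classify is a hypothetical helper used only for this illustration.
classify() {
    if [ "$1" -lt 0 ]
    then
        echo negative
    elif [ "$1" -eq 0 ]
    then
        echo zero
    else
        echo positive
    fi    # the closing word ends the construct; nothing more can follow
}

classify -3
classify 0
classify 7
```

Typed interactively, the shell keeps prompting for more input until it sees `fi`, and then runs the whole construct at once.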
35:41
You get one line, you go to the next line, you're done. You know, you can't look ahead to the next line, so you know, you type if something something, and then it would sit there. So, interactively, you can't do the look ahead that you need to do the C-type if-then-else. That was my reason anyway. It was the best I could come up with, okay? So, but no, seriously,
36:03
and as this note says, I don't see any hope of these two things looking the same, because, you know, the shell is a string language, and C has got all this other stuff in it. So eventually I got agreement that we could go ahead with it, but then there were all kinds of debates about,
36:22
I think, McIlroy used to call it the word drow problem. You know, that was do-done and all that other stuff, if-fi, and then there were various proposals about how to get rid of those funny words, the word drow, and the other thing we did at the time was, originally I had for and while
36:41
in the same construction, which is what, for example, ALGOL 68 has, and there's no reason not to do that, but it seemed like in a programming language, you do it for a reason, and that is you're scanning through some things, looking for some condition on whatever it is you're looking at, and that doesn't happen much in the shell
37:00
in the kind of programming you do, at least I don't think it does, and so we separated out those two constructions. I'm gonna skip that slide. Yeah, oh, here's a good one. So, there was a proposal at one time just to scrap if-then-else and use ands and ors and stuff
37:23
so I just thought I'd show you what if-then-else looks like if you do that. That kind of persuaded people that it was okay to keep if-then-else. This is about keeping case-esac. Case was kind of a new idea,
37:42
you know, do the pattern matching like that, but there were various attempts to mess with it, and I don't know if you can really understand what this is, I'm not sure I can, but it looks like it was case $1 in something, and then you look inside something to find out what $1 is supposed to do.
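For reference, this is roughly what the case-esac form that survived looks like in a present-day POSIX shell. The function name `describe` and the patterns are invented for the illustration and are not taken from the slide.

```shell
#!/bin/sh
# describe is a hypothetical helper; the patterns are illustrative only.
describe() {
    case $1 in
        -*)       echo "option" ;;      # anything starting with a dash
        *.c|*.h)  echo "C source" ;;    # alternatives separated by |
        '')       echo "empty" ;;       # the null string
        *)        echo "other" ;;       # everything else
    esac
}

describe -v
describe sh.c
describe ''
describe README
```

The patterns use the same matching notation as file name generation, and the first one that matches wins.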
38:02
I'm not sure about that slide. Okay, and then, so there were various, okay, so let's move on now to environment variables. So, that was the syntax, and we sort of finally won those battles and moved on. I had been using TSO for my sins,
38:21
which is IBM's Time Sharing Option, because I'd been writing my compiler on the IBM machine, which is where it ran, so I'd been over there doing that, and the TSO command language had one thing in it I liked, which was it had named keyword parameters, which would transmit through what we would call in Unix process boundaries,
38:43
and so I put in keyword parameters early on in the shell, but there was no mechanism in Unix, and they would transmit to shell procedures, because shell procedures were executed in the same process, so within a shell process, you could share keyword parameters, and so I did that, and then I tried to talk Dennis
39:03
into putting something in Unix that would allow us to transmit keyword parameters through the process fork/exec pair. Actually, the fork does it, the exec doesn't, of course. So, in the end, we did, and there were lots of debates about what the interaction was between the shell's keyword parameters
39:21
and what they looked like, and so what the syntax was, for example. Originally, I'd had it so that keyword parameters anywhere in the command line would be interpreted as a keyword parameter, and unfortunately, there were some commands that already had that in them themselves, like dd, for example, has X equals Y in it, so great, you know. So, in the end, we ended up with keyword parameters
39:42
ahead of the command name and/or using some set thing, but it did take a while to settle that down, and it took a while to get it into Unix, and I'll show that in a second. So, here were the rules. So, this was an email I sent to Dennis, I think,
40:02
about what I thought the principles should be for environment variables, and this was a PCS, it's a process, sorry, it was a shorthand we used for process, so that you have local names, the names don't get passed to children unless you want them to,
40:22
and the golden rule of Unix was that a child cannot affect its parent except through the file system. In other words, there's no way a child process can change what a parent process wants to do, and that was the golden rule, and so we had to work through all of that,
40:43
so it says here, but then you wanna be able to, you don't wanna also, the other thing you don't wanna do is have to have each command or process know all the names that need to get passed down, because that's crazy, right? I mean, the idea that everybody in the chain between here and some context, and down here where you wanna do something
41:02
has to know what all the names are and repeat them, so we didn't wanna do that, and that's what the last comment says, the process shouldn't have to be a postman of all names which are passed, because then you're writing things that the intermediate processes shouldn't know anything about, or certainly don't wanna know anything about.
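The principles above can be sketched with a present-day POSIX shell. The variable names `demo_local`, `demo_shared` and `demo_once` are made up for the illustration, and the script assumes they are not already set in the environment it is started from.

```shell
#!/bin/sh
demo_local=here          # local to this shell: children do not see it
demo_shared=there
export demo_shared       # marked for export: passed to every child

# A keyword parameter ahead of the command name reaches that one command only.
seen=$(demo_once=only sh -c \
    'echo "${demo_local-unset} ${demo_shared-unset} ${demo_once-unset}"')

# An intermediate process passes demo_shared along without ever naming it,
# so it does not have to act as a postman.
relayed=$(sh -c 'sh -c "echo \$demo_shared"')

# A child can change its own copy, but the parent's value is untouched.
sh -c 'demo_shared=changed'

printf '%s\n' "$seen" "$relayed" "$demo_shared" "${demo_once-unset}"
```

The four lines printed are `unset there only`, `there`, `there` and `unset`: the local name stays local, the exported name travels through two levels of children, the child's assignment does not come back, and the prefixed name never exists in the parent.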
41:21
There was an alternative to environment variables which was proposed, and I won't go into it much, but it was basically, there were four levels of scope, and this was from Mike Lesk who said, hey, here's the deal. You know, there are automatic variables, subroutines, external variables, shell environment variables, whose life is the session. This was actually quite a good way of looking at it,
41:44
but he wanted to unify these mechanisms, so that you didn't have multiple ways of doing things in all these different places. And I think nothing actually came of that. So anyway, in the end, we did finally, 1978,
42:01
so it took two years to get that designed, and there were lots of debates back and forth about not using keywords as inter-process communication and back and forth, so we finally got that done. I've talked a little bit about quoting, but this is also an email
42:21
that came out in 1978, and so you can see how late it was before we fixed the quoting mechanisms. That's my marketing speech, in response to popular demand. Please let me know if there are any problems. I'll be on vacation.
42:43
And then you can see, that was when it finally got settled down to where it is today. Okay, and then these were also things that happened actually fairly late in the process. Oh, that's not true. Here documents were done early on. They were done in 76. And then we put in the quoting of here documents
43:02
so that you have them interpreted or uninterpreted as you want, so that was that. And then here was the last thing. So it was mid-77 when we finally got the shell to perform reasonably well,
43:20
and that was what it says at the bottom there. So with the modification we made, it now runs twice as fast. So that was pretty cool. And that allowed us, by the way, this is important, and if I look at the next slide, you can see why. So there was this, I told you about the group in Piscataway that was doing Programmer's Workbench. That was the group we had to sell,
43:42
because they were the biggest users of the system out there. And you can see what they said. They said, here's the deal. And the deal was, we're running this thing, we've got our own version, it runs pretty well. A lot of differences between what, detailed differences between what our shell does
44:01
and what yours does, the research shell it was called. But if you can give us a big performance improvement and/or some significant improvement in expressive power, both of which we actually ended up doing, we did end up with both more expressive power, and we beat them on the racetrack.
44:21
So that was how we got to sell the PWB. And this just gives you an example of the kind of differences there were between the two shells at the time. And these guys had, they were claiming that they would have to spend millions of dollars to convert their shell script code base
44:44
to the research shell. And so you can see the kind of, some of this is easy to understand and some of it is gobbledygook, but you can see the kind of differences there were. There were just all kinds of detailed differences, which for a group that's got a bunch of code and they're in production and they're on a deadline,
45:01
was not very exciting. So we did finally succeed in getting the shell into the PWB group and I'm guessing they were happy because I never heard from them again. Not true, it turns out John Mashey
45:21
actually lives down the street from me in California, so still around. Okay, so that was sort of how we got there and why we did some of the things and hopefully after the, after I finish nattering up here, you can come and beat me up and tell me or ask questions. I do want to talk about one or two, a couple other things and then I'll wrap.
45:44
So as I said, I was writing an ALGOL 68 compiler and I was porting it to the PDP-11, and so one of the things Dennis did for me, because it was too big to run on a PDP-11 in 64K memory, but it was big, each phase of the compiler, the three phases was big enough. So what Dennis did for me, he put in overlays,
46:03
a form of exec that just overlaid the text segment, not the data segment, the text segment, the program. So he did that and so I was able to get the ALGOL 68 compiler running, but I couldn't debug it, so I had to write a debugger, and the C debugger and all that stuff was very specific to C and that environment
46:22
and so there was nothing that would just debug an assembler program, which is sort of what I had. So I ended up writing adb, and it was meant to be sort of a clean debugger. In other words, it didn't know about the C stack trace and stuff like that, but it knew more about the memory layout
46:43
and the text segment and the data segment and stuff. So I wrote that, but interesting problem, I never actually got to fix, but I would like to have fixed it if I could have done and I should have known better when I started writing the code, was when we ported ADB, there's the machine you're compiling it on,
47:02
has certain length ints and all this other stuff, there's the machine you're gonna debug, that's over there somewhere, and you're gonna run it on a third machine potentially. So how do you write the code so that you can keep those three things clean? I started working on it before I left the labs
47:23
and then I gave up because it was too, certainly the code base I had was just, you probably have to start again to think about that. But anyway, if anybody's ever done that, let me know and you can tell me how you did it. So last slide on the system and what we were doing.
47:42
So I did say that at the time there was no standard IO, there were none of those things at the time. When I got to the lab in 75, I was kind of an ALGOL 68 freak and so I said to Dennis things like, well, why do functions always have to return a value?
48:02
And he said, well, I don't know. So that didn't go anywhere. I said, I want void, I wanna be able to say this function does not return a value. I don't like just saying return zero when I have nothing to say. Why do I have to say something? In the end, I went into his office one day, actually, I think he might have come into my office, and he said, I figured it out.
48:22
He said, if I don't have to return a value from a function, I can save one instruction in the return sequence. That's how we got void into C. And then, when I got there,
48:42
you know what PL/1 structures are like? They were offsets into some base and that was how C was when I got there. And van Wijngaarden, where's Andy? Where's Andy? I thought he was here. Anyway. Oh yeah.
49:00
So, I used to say, Andy knows van Wijngaarden, right? Because you were there. And van Wijngaarden was sort of the grandfather of ALGOL 68. He used to say, well, cows don't have peaks and mountains don't have legs. So why are you able to select the leg of a mountain when, you know, if it's a structure with fields,
49:23
when it doesn't have one? But if you have base and offset, you can do that. So anyway, we did finally get C to change the way that structure fields were done so that they were associated with the structure. So you couldn't select a field that wasn't in the structure that you were selecting from, which was the point of that exercise.
49:43
You know, I've talked about most of this. Yeah, the other thing that took me forever and I still don't understand it is what the difference is between a definition and a declaration. I used to go into Dennis's office and say, where does storage get allocated and blah, blah, blah. And I still don't understand it, but maybe all you guys who've been writing C
50:01
forever know how to do this. So, I did this for fun. Can you read it? I thought, oh, nobody's using this thing anymore. And it turns out, that's not true. It turns out people are still using it, which I thought was pretty surprising.
50:22
Anyway, that's the thing from the book. Thank you, yes, it was a surprise to me too. So, last couple of slides and I'll get out of here and let you get back to your meeting.
50:42
So, the original Thompson shell had three manual pages. My shell had six, which was a lot more, twice as many. I did it by hand. I went, you know, I had it in vi and I went, blah, blah, man, mumble on this thing for bash
51:02
and I think I got to around 110. That's kind of a lot. My brain is not that big, you know. And then the POSIX and Korn shell stuff, that was done in the early 80s. They were sort of tied together at the hip because Dave Korn was really driving some of the POSIX stuff at the time.
51:22
And I couldn't find the note, but Dennis and Doug and I sent a note to the POSIX committee, grumbling about something. I can't remember what. And then we gave up. We just said, oh, we can't deal with this. So, that was our POSIX. You know, the POSIX, I'd already given up work on the shell because at the lab, there was no real incentive to maintain stuff.
51:41
You know, at your merit review every year, if you hadn't done something new, you didn't get a pay raise, you know, so. So, anyway, I sort of gave up messing with the shell in 78, 79. And then Dash seems, I don't know, maybe we can use that, but I've tried it a little bit. Seems like it's actually not quite so heavyweight.
52:05
Okay, Shellshock. So, I just thought I'd throw this in for fun. I don't actually know anything about this. This is speculation about what I've read. And if anybody knows anything about this, I'd love to have them come by afterwards, at least, because I claim I didn't write the code.
52:20
Actually, it's true, I didn't write it. But I suspect the code was written in 93, which is when the Bash guys changed over from one company to another or something, and there was a new Bash author in, and again, I'm not using names here because I don't want to accuse people of things they didn't do.
52:40
But that's what it looks like. And it looks like it was a simple error in the way they used some part of the code that would read a command line and then just run it, as opposed to, you know, only picking off the function piece, which is what they should have done. But in their defense, or in the defense of whoever did this,
53:03
the environment for the exploit wasn't around when they wrote the code. Come correct me if I'm wrong, but I don't think Apache was really deployed much in 93, was it? Was it? Yeah, you guys know more about this than I do, I'm sure, but this is just my crack at it.
53:21
My other question is, wait a minute, this has been around for 20 years, so who's been using it? I mean, don't tell me that nobody used it in those 20 years to do an exploit. I don't know. I'm just speculating, I thought it was interesting. Okay, so here's the last slide, the last slide but one.
53:41
What did we do that I wish I hadn't done? Well, first of all, I put ALGOL 68 macros in the source code, like I defined if, capital IF, to be if paren and all this other stuff. And that was so that I could read my own code, because I couldn't read that if without the stuff in it. And you know, every time you see a little curly thing,
54:00
where's that belong? I have no idea. So I couldn't get used to that idea, so I wrote my own macros, and we did get rid of them. When I left the lab, I got rid of them before we left, because again, I would go to USENIX meetings and get grabbed by the scruff of the neck. Why do you write it like that? Memory management, that was very fragile, wouldn't do that again.
54:21
Find another way to do it, I don't know what it would be. I did not put functions in until 82, and I don't know why I didn't do that, because it's obvious you should have done it. We thought that commands, shell scripts on their own, were the functional abstraction mechanism. It's not true, it's not that.
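One way to see why a separate command is not the same abstraction as a function is the sketch below, written for a present-day POSIX shell. The names `counter` and `bump` are made up for the illustration, and the `$(( ))` arithmetic is a later addition to the shell rather than something from the original.

```shell
#!/bin/sh
counter=0

# A function runs inside the current shell, so it can update its caller's state.
bump() { counter=$((counter + 1)); }

bump
bump
after_function=$counter

# The same kind of assignment run as a separate command gets its own process
# and its own copy, so the caller never sees the change.
sh -c 'counter=99'
after_command=$counter

printf '%s\n' "$after_function" "$after_command"
```

Both values printed are 2: the two function calls changed the caller's variable, and the separate command did not.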
54:41
And so, I have no idea why we didn't think of it at the time. I would, even now, I don't know how you debug shell scripts, maybe you guys know. So, I wouldn't actually mind doing that, that's not hard to do, and it would be fun to do it.
55:02
So, I might actually do that. And the other thing we did, was the other trick we played, which not a big deal, but this also was efficiency, and huge efficiency, that we had eight bit bytes. And so, in the old days, we only used seven bit characters, only seven bits were used
55:21
in the character representation. And so, the eighth bit was mine. And so, I used it as the quoting bit. You know, this character is quoted. Great, you know, because now you can run, you don't have, of course, the Unicode guys came along and they weren't happy about that.
55:44
So, that did need fixing, but by the time they fixed it, you could afford to have two bytes per character, one for the quoting bit, and one for the rest of it, or whatever. But in those days, doubling the number of characters, you know, or finding some other way to represent arbitrary strings, and have them quoted,
56:02
would have been a pain in the neck. So, I wouldn't do that again though. I mean, in the same timeframe, for the same reasons, I would have done it, but it certainly came back to bite us. The other thing that was true for this whole project was that we deployed in 76, and we spent two years iterating
56:21
with the user community, still doing design, really. You know, we fixed the quoting, we did a bunch of other things. And that's the way software should be written, as you all know. Have real users. We did have real users. They were using the system we were building. And I can't believe nobody read Rob Pike's paper.
56:46
But, yeah, 110 pages, seriously. So, now what? I just want to tell, do I have a minute for a short story, and then I'll wrap. So, I was having a beer with Andy last night,
57:02
and I told him a story, told him the same story, because he just had his retirement party. Andy, where are you? Did he leave, honey? No, he's still here. So, when I, so I'm, that was old fart talk, right? About things I did a long time ago. So, the question is, what have you been doing for brain food lately? And when I quit Cisco in 1999,
57:22
I was having dinner with Maurice Wilkes, and he said, what are you gonna do? I said, I'm gonna retire. And he looked at me, he said, you can't retire. I said, okay. And what he meant was, you have to have brain food, right? So, what do I do for brain food?
57:41
Well, ACM Queue is a lot of fun, and we've been publishing now for 10 years. George and Kirk are on the board, and Eric is too, and some other people I may have missed who are here, I don't think so. We focus on the problems, not solutions, which is a cute way of saying the marketing guys can stay out of the room. Not that they're bad guys,
58:03
it's just we're writing it for software engineers, and software engineers don't want to hear what marketing people have to say, particularly. We are trying to move up stack, and what we mean by that is, you know, Unix has always been about infrastructure and stuff, and the Queue stuff is a lot about infrastructure and that lower level of the system and stuff like that. And we do think that as a magazine,
58:21
we need to move further up into the application space. We're not quite sure how to do it. We're experimenting, but that's one of the things we're trying to do. And we're gonna copy what you, I think you guys have copied the firmware we used to get articles, I think, is what I heard, and we're gonna copy what you're doing with the BSD publication. Journal app. Journal app, thank you.
58:40
We're gonna copy that. And so, I think we're coming out in two or three or four months with a Queue app, which will have an every-two-months publishing model. And the other thing I've been doing is, we finished the port of the ALGOL 68 compiler, which was written in itself, to a PC, and we used C as the intermediate language,
59:00
which is what Doug McIlroy said it was for. It's true. He said it's the best assembler on the planet. He was right. And so, we did the port using that as the intermediate language. Of course, it has its own register-based intermediate language, which you translate from, blah blah blah. But it needs some work. It's really an archeology project, and it's not useful because it doesn't have interfaces to libraries and stuff.
59:21
And so, we will publish it at some point, but I don't want to publish it until I put at least a libc in there or something useful. So, anyway, that's it, guys. Thank you. And... Thank you. Thank you.