I'm not a number, I'm a free file descriptor!!
Formal Metadata
Number of Parts | 34
License | CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/38552 (DOI)
Bangbangcon (!!CON 2016) 23 / 34
Transcript: English (auto-generated)
00:21
I'm here to talk about the Unix input and output system. There is one particular aspect about the Unix IO system that everybody kinda agrees on, and the idea is that everything is a file. Usually when we talk about files, we think of entries in the file system, some path name that points to something that has bytes in it.
00:41
So these dormant file system entries aren't the kind of file that I'll be talking about today. This talk is about the files that processes have opened, that they are reading and writing. These are called file descriptors, and I apologize for the quality of the drawings in these slides. So one day I was talking about Unix
01:00
with my colleague Nelson, whom you have heard of in Kieran's talk. We all work on the systems team together, and Nelson knows kernels, and he sent me down the path to see what weird things Unix lets you do with IO, and I have learned that Unix does not disappoint when it comes to weird things.
01:20
I found mechanisms to circumvent limits and break down process isolation, ways to bring a machine to its knees, and some very fun implementation details in the Linux and OS X kernels. So file descriptors are really the concept behind how Unix does input and output. To an application, they're just numbers,
01:40
but they hide a massive amount of complexity, and they also let you do incredibly weird and funny things. So a number alone doesn't carry a whole lot of information. It's just a handle to an entry in a table somewhere that holds that file descriptor's state and all the other bookkeeping stuff that the operating system keeps for you. This is a primitive but simple API design. Numbers are very easy for programs to pass around,
02:02
and the OS can keep track of all that state in the table and easily retrieve it using that number as an offset. And they're also not just files. They can be pipes, sockets, terminals, all sorts of weirdo stuff in Linux. So basically any kind of input and output on a Unix system goes through one.
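A minimal sketch of the "just numbers" idea, assuming a POSIX system and Python's `os` module. The function name `open_two_and_reuse` and the file name are made up for illustration. It shows that descriptors are small integers and that the kernel hands out the lowest free slot, so a closed number gets reused.

```python
import os
import tempfile


def open_two_and_reuse():
    """Open two descriptors, close the first, open a third.

    Returns the three descriptor numbers (hypothetical helper for illustration).
    """
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "example.txt")
        first = os.open(path, os.O_CREAT | os.O_WRONLY)
        second = os.open(path, os.O_RDONLY)
        os.close(first)
        # POSIX open() returns the lowest-numbered free descriptor,
        # so in a single-threaded program this reuses the slot just freed.
        third = os.open(path, os.O_RDONLY)
        os.close(second)
        os.close(third)
        return first, second, third


if __name__ == "__main__":
    print(open_two_and_reuse())
```

The number itself says nothing about the file; all of the state lives in the kernel's table entry that the number indexes.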
02:21
Let's look at an example. So here's the output of a program called lsof. You probably can't see this in the back. I'll describe it in words. lsof is short for list open files. It shows a bunch more stuff than just the open files, but the last three lines, glad that I have people sitting in front here,
02:41
are the important ones that I'm gonna focus on. So I ran a command to grep from a pipe and direct the output of grep into another file. And you can see that grep has, or you could see if you could see in the back, I'm sorry. Grep has a pipe opened as file descriptor zero and the output file opened as file descriptor one.
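The setup described here can be approximated without lsof. The following is a sketch, not the command from the slide: it starts a child process whose descriptor 0 is a pipe and whose descriptor 1 is a regular file, and the child reports what it sees using `os.fstat`. The names `CHILD` and `describe_child_fds` are invented for this example.

```python
import subprocess
import sys
import tempfile

# Program run in the child: classify descriptors 0 and 1 and report on stderr.
CHILD = r"""
import os, stat, sys

def kind(fd):
    mode = os.fstat(fd).st_mode
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISREG(mode):
        return "file"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "device"
    return "other"

sys.stderr.write(kind(0) + " " + kind(1))
"""


def describe_child_fds():
    """Run a child with stdin from a pipe and stdout to a file."""
    with tempfile.TemporaryFile() as output_file:
        result = subprocess.run(
            [sys.executable, "-c", CHILD],
            input=b"hello\n",        # passing input makes stdin a pipe
            stdout=output_file,      # descriptor 1 becomes a regular file
            stderr=subprocess.PIPE,  # capture the child's report
        )
        return result.stderr.decode()


if __name__ == "__main__":
    print(describe_child_fds())
```

The child never opened anything itself; the parent arranged what descriptors 0 and 1 point at before the program started.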
03:03
And then there's also a third file descriptor, number two, which is standard error, which goes to the terminal. All of these are conventions necessary for the C library. They're meaningful there. Everything after file descriptor number two is up to the programmer. And also, file descriptors that are opened by one process
03:22
can be passed off to another. So the terminal in the lsof output was inherited from the shell. This is a process called inheritance. A child process inherits its parent's open files. Or you can pass file descriptors
03:42
between arbitrary processes using Unix domain sockets. The API is complicated, so I'm not gonna talk about that very much. But to a programmer, they look like easy network sockets that you can connect to by a path name instead of a network address. So something like /tmp/mysql.sock.
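A small sketch of connecting by path name, assuming Python's `socket` module on a Unix system. The path `example.sock` and the function name are placeholders, and for brevity the server and client live in the same process.

```python
import os
import socket
import tempfile


def echo_over_unix_socket(message: bytes) -> bytes:
    """Send a message through a Unix domain socket bound to a path name."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "example.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)      # the address is a file system path
            server.listen(1)
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                client.connect(path)
                connection, _ = server.accept()
                with connection:
                    client.sendall(message)
                    return connection.recv(len(message))


if __name__ == "__main__":
    print(echo_over_unix_socket(b"ping"))
```

Apart from the address family and the path, this is the same code one would write for a TCP socket.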
04:03
They also work only on the local system, so never across a network. And because they work only on the local system, you're guaranteed to get every message out the other end that you send, and always in order. That's kind of great. They're also kind of a wormhole.
04:20
So they let you send files from one process to another. The process to do that is kind of complicated. Basically, you send a message containing out-of-band data, control messages that mention those file descriptors, and out the other end, the process reads that message, and as soon as it reads that message, it has the file descriptor in its file table. The timeline here, though, is interesting. The sending process could close that file
04:42
before the receiving process picks it up. And it's kind of like time travel, in a way. This gets pretty cool. So from this, we learned that the kernel makes sure that the files stay open somehow. There must be some global table
05:00
that holds all the state. And the kernel then closes them, hopefully, when all their references are closed. Hi, Karen, please don't read that section either. So yeah, sockets are also file descriptors. And you can totally send Unix domain sockets down Unix domain sockets. Socket recursion is totally a thing you can do.
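The "time travel" timeline can be sketched as follows. This assumes Python 3.9 or later, where `socket.send_fds` and `socket.recv_fds` wrap the control-message API, and it uses a single process with a connected socket pair standing in for two processes. The function name and file name are made up for illustration.

```python
import os
import socket
import tempfile


def pass_fd_then_close():
    """Send a descriptor, close the sender's copy, then receive it."""
    sender, receiver = socket.socketpair()
    with sender, receiver, tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "data.txt")
        fd = os.open(path, os.O_CREAT | os.O_RDWR)
        os.write(fd, b"still open")

        # The descriptor travels as ancillary data next to one byte of payload.
        socket.send_fds(sender, [b"x"], [fd])
        # Close before the receiver has read anything: the only remaining
        # reference to the open file is now in flight inside the socket.
        os.close(fd)

        _, received, _, _ = socket.recv_fds(receiver, 1, 1)
        new_fd = received[0]
        try:
            os.lseek(new_fd, 0, os.SEEK_SET)
            return os.read(new_fd, 100)
        finally:
            os.close(new_fd)


if __name__ == "__main__":
    print(pass_fd_then_close())
```

The read succeeds even though the sender closed its descriptor first, which is what implies the kernel keeps the open file alive on its own.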
05:24
So people who design IO layers like Unix, they love to put buffers and limits everywhere. Same for the number of files that the process can open. That is, there is an upper bound to that. And what if I told you that using Unix domain sockets,
05:44
you could circumvent that upper bound, you can totally circumvent that open file limit. OS X and Linux have different defaults. So OS X comes with 7,000, Linux has 1,000. You can reset that as a system administrator. You can just bump that limit for any process.
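To see the per-process upper bound in action, here is a sketch using Python's `resource` module. It temporarily lowers the soft limit (which any process may do), opens `/dev/null` until the kernel refuses with `EMFILE`, and then restores the old limit. The function name and the limit of 64 are arbitrary choices for this example.

```python
import errno
import os
import resource


def count_opens_until_limit(soft_limit=64):
    """Lower the soft open-file limit and count opens until EMFILE."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if old_soft == resource.RLIM_INFINITY:
        new_soft = soft_limit
    else:
        new_soft = min(soft_limit, old_soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    opened = []
    try:
        while True:
            opened.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as error:
        if error.errno != errno.EMFILE:
            raise
        return len(opened)
    finally:
        for fd in opened:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    print(resource.getrlimit(resource.RLIMIT_NOFILE))
    print(count_opens_until_limit())
```

The count comes out a little below the limit because descriptors that were already open, such as 0, 1 and 2, count against it too.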
06:02
If the process needs more, you kind of have to budget. But what if you require them to remain open and you don't want to talk to the sysadmin? Well, there's a workaround. There is a workaround. You could make a pair of Unix domain sockets
06:22
that both connect to the same process. Then you send the file descriptor in one, close the file descriptor on your end, and then when you need it back again, you read it out the other end. Totally works. So I wrote a little test program to do that.
06:40
This illustration hopefully makes sense to somebody, maybe me. There's a great Unix syscall called socketpair. You get two Unix domain sockets that are connected to each other. Works exactly like I described it. Works across platforms, it's wonderful. So how many more files can you have open? On Linux, about 500.
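The parking trick can be sketched like this. It is not the speaker's test program, just an illustration under a few assumptions: Python 3.9 or later for `socket.send_fds` and `socket.recv_fds`, and helper names (`stash`, `retrieve`, `park_and_recover`) invented here. Each pipe is pushed into one end of a socket pair and closed locally, then read back from the other end in order.

```python
import os
import socket


def stash(sock, fd):
    """Push a descriptor into the socket and drop our own copy."""
    socket.send_fds(sock, [b"f"], [fd])
    os.close(fd)


def retrieve(sock):
    """Pull the oldest parked descriptor back into our file table."""
    _, fds, _, _ = socket.recv_fds(sock, 1, 1)
    return fds[0]


def park_and_recover(count=5):
    """Park `count` pipes in a socket pair, then read them back."""
    inlet, outlet = socket.socketpair()
    with inlet, outlet:
        for index in range(count):
            read_end, write_end = os.pipe()
            os.write(write_end, str(index).encode())
            os.close(write_end)
            stash(inlet, read_end)  # no longer counts as open in this process
        recovered = []
        for _ in range(count):
            fd = retrieve(outlet)
            recovered.append(int(os.read(fd, 16)))
            os.close(fd)
        return recovered


if __name__ == "__main__":
    print(park_and_recover())
```

While a descriptor is parked, the process holds only the two socket descriptors, no matter how many files are waiting in the buffer.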
07:02
That's when the buffer runs full or when the kernel tells you you're over the limit. On OS X, about one and a half thousand. That's kind of great, but can we take this further? Yup. So you take one of those ring buffers that I described, and then you recursively stuff that
07:21
into another ring buffer. Totally works on Linux. So yeah, you can have 200,000 open files, and then the operating system itself can no longer open files. This is really confusing to other processes.
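A one-level version of the nesting can be sketched as below, with a single parked pipe instead of hundreds of thousands of files. It assumes Python 3.9 or later, and the function name is invented for this example. Because the talk reports that this only works on Linux, the sketch returns `None` when the inner ring comes back without its descriptor.

```python
import os
import socket


def nest_and_recover(payload=b"inner secret"):
    """Park a pipe in an inner socket pair, park that pair in an outer one."""
    outer_in, outer_out = socket.socketpair()
    inner_in, inner_out = socket.socketpair()
    with outer_in, outer_out:
        read_end, write_end = os.pipe()
        os.write(write_end, payload)
        os.close(write_end)

        # Step 1: the pipe goes into the inner ring.
        socket.send_fds(inner_in, [b"f"], [read_end])
        os.close(read_end)

        # Step 2: both ends of the inner ring go into the outer ring.
        socket.send_fds(outer_in, [b"r"], [inner_in.fileno(), inner_out.fileno()])
        inner_in.close()
        inner_out.close()

        # Step 3: pull the inner ring back out, then the pipe out of it.
        _, ring_fds, _, _ = socket.recv_fds(outer_out, 1, 2)
        rings = [socket.socket(fileno=fd) for fd in ring_fds]
        try:
            if len(rings) != 2:
                return None
            _, pipe_fds, _, _ = socket.recv_fds(rings[1], 1, 1)
            if not pipe_fds:
                return None  # the kernel dropped the parked descriptor
            try:
                return os.read(pipe_fds[0], len(payload))
            finally:
                os.close(pipe_fds[0])
        finally:
            for ring in rings:
                ring.close()


if __name__ == "__main__":
    print(nest_and_recover())
```

Between steps 2 and 3 the process holds only the outer pair, so neither the pipe nor the inner sockets appear in its file table.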
07:45
Also to the sysadmin. Since each of these file descriptors is no longer in your file table, lsof doesn't show them. All it shows is these weirdo sockets. Of which you have two.
08:05
Yeah. So Linux is really cooperative with this. OS X does not like it. What worked was that I could send messages into the inner ring, then put it on the outer ring, close the inner ring, read that inner ring back from the outer ring,
08:22
read messages from the inner ring, but those messages did not have file descriptors on them. So the only conclusion that I can draw, and I confirmed this with Nelson, who knows kernels, by the way, is that BSD-like kernels like OS X close all sockets trapped in a pair of Unix domain sockets
08:43
once you close both ends. So every time I closed the inner ring, all the files it contained were closed. It does not matter that the ring itself is trapped in another one. It's kind of a reasonable behavior.
09:03
Just like, do this if you don't want your operating system to run out of open files. It's great. And it's definitely better than leaking them. But what does that mean about Linux? So, does Linux do garbage collection? Yeah, it does.
09:20
It totally marks and sweeps from open sockets. It traverses each one to see which file descriptors are in there. And if they are Unix domain sockets, it recursively traverses those up to, I think, four layers deep. So yeah, that's mark-and-sweep garbage collection right there. And I find this kind of amazing.
09:41
Linux has a garbage collector in it to ensure that my silly test program works. It's so considerate. It's really kind. So what have I learned from this? Weird computer behavior is awesome. Building these tiny test programs and trying out how they break and what they break, that allowed me to take a mostly opaque thing
10:01
and figure out how it works. To even find some unintended consequences and pretty pathological behavior in the process. It's kind of awesome. This makes me really happy every time it happens. So thank you so much for listening to me. I hope you enjoyed my talk. If you want to try out my test code, go to boinkor.net/go/fdfun.
10:21
And if you do, I would really appreciate if you come talk to me or if you hit me up on Twitter, I'm @antifuchs. Thank you so much again. Thank you.