Past, present and future of system containers
Formal Metadata
Title: Past, present and future of system containers
Series: All Systems Go! 2018
Number of Parts: 50
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/43116 (DOI)
Language: English
All Systems Go! 2018, part 22 of 50
Transcript: English (auto-generated)
00:06
I'm Stéphane Graber, I work at Canonical, I'm the LXD project leader, and today we're going to be talking about system containers and kernel features. So, briefly, what are system containers?
00:24
They're the oldest type of containers, really, they originated with BSD jails, I don't know, 12 years ago, thereabouts. Then Linux-VServer, Solaris Zones, then OpenVZ, LXC, and now LXD, which I'm working on.
00:43
The main goal of system containers is to behave exactly like a physical system or virtual machine. There are no special images or anything, you run a full unmodified Linux distro effectively inside your container and interact with that exactly as if it was a normal system.
01:02
No virtualization is needed, because that's still a container, that's kind of the whole point of containers. Now, as for LXD itself, LXD is a somewhat new, I mean, I keep saying new, it's like three or something years old at this point, container manager with a REST API,
01:23
you can script it, it's got a nice and user-friendly command line, it's pretty fast, it is secure by default, we use user namespaces for all of our containers, unless you opt out of that, you can even have per-container maps of your UIDs and GIDs if you want,
01:43
to make them even safer, we use all of the available LSMs, so seccomp, AppArmor, we use capabilities, we use pretty much every single bit of kernel API that's available to make them safe. And it's pretty scalable, you can use it locally for like your two or three containers,
02:01
or you can go like full cluster and run 10,000 containers if you feel like it, that's the same API, same CLI, same user experience, and you can scale very easily. For those of you who've played with Chromebooks somewhat lately, they've got a new Linux apps feature on the Chromebooks, well, that's LXD they're running,
02:21
LXD is shipping on all the Chromebooks these days, and that's used to run Debian containers directly on your Chromebook. Now, as for what LXD isn't, as I mentioned, it's not a virtualization technology, it does not use any CPU extensions in that regard,
02:41
you can totally run LXD on a Raspberry Pi or whatever you feel like, it works on just about every architecture out there, it is also not a fork of LXC, it is a Go daemon that uses the go-lxc bindings and liblxc under the hood to drive the kernel interactions and use all the nice kernel features,
03:04
because doing that directly from Go is not always pleasant. And it's also not an application container manager, so we will not be running Docker containers with LXD, we don't really have an intention of doing that anytime soon, you can totally install Docker inside an LXD container,
03:21
if you feel like it, that works just fine, but we really see application containers as a way of distributing a particular piece of software, whereas our focus is running an entire machine inside a container. Also, for the rest of the talk, we're going to be going through a bunch of new kernel features and other bits of interesting API we are using for system containers.
03:48
I'll mostly be focusing on unprivileged containers, as I don't really recommend anyone run privileged containers in general, so when we say something cannot be done or we need new kernel APIs, it usually means something cannot be done inside an unprivileged container.
04:03
Yes, if you've got full root access within a privileged container, you can probably do it, but you can also probably break the entire system. So, just something to keep in mind for the rest of this talk. Now, the first thing I want to go through is devices, why would you want devices attached to a container?
04:22
Well, maybe you want a GPU, maybe you want some USB device, maybe you're doing HPC and you care about InfiniBand networking and RDMA, maybe you need direct network access because you don't want to use bridging and all that stuff on your fancy 100 gigabit network card, for example, or you just want access to any character or block device on the system,
04:43
say, a USB cellular link or some science equipment or whatever. Containers are a bit special in that regard. There's no such thing as a device namespace, there's no nice way of attaching devices to a container. Containers can run udev and legacy containers usually do,
05:02
but they don't really get any uevents, which makes that somewhat pointless, until we've got some kind of API, which we're working on. Containers also cannot use devtmpfs, or at least not in a very useful way, which means that you need to pre-create all the device nodes that a given container needs to use.
05:21
That also gets funny when a container is running and you want to inject a new device inside it because, as I'll show in a tiny bit, you can't actually mknod inside an unprivileged container, so you need to use mount propagation tricks to propagate a device from the host into a running container. All right, so let's just show a few interesting things.
05:45
Let's do that, and let's do that, and there. OK, the first thing I want to show is the entire make node issue. I'm running a modern kernel, so I'm running a 4.18 kernel on there, which has an interesting behavior.
06:01
Interesting in that it broke a bunch of user space, but you can actually mknod. Oops, it already exists, so let's just delete and create. There, I've mknod'ed major 1, minor 3. That means /dev/null. So if we compare /dev/blah with /dev/null,
06:22
they look kind of the same. You can write to both of them, and major and minor line up. OK, so that should work, right? So if I was to write to /dev/null, no problem whatsoever. Now, if I try to write to blah, it doesn't work. That's the new behavior in the 4.18 kernel, which does let you mknod things, but also marks them in a way that makes them
06:41
completely useless afterwards. That's slightly frustrating for any piece of software that tries to mknod, and then if that fails, do something sensible, because now it doesn't fail. It just fails when you try to use it later on. So that's something to keep in mind. The old behavior was that mknod just wasn't allowed at all in any way.
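The major:minor pair mentioned here (1:3 for /dev/null) can be checked and recreated with Python's standard library; a minimal sketch (the mknod call itself needs the right privileges, and inside a user namespace on a 4.18 kernel the resulting node is the non-functional kind described above):

```python
import os
import stat

# /dev/null is character device major 1, minor 3
dev = os.makedev(1, 3)
assert os.major(dev) == 1 and os.minor(dev) == 3

def make_null(path):
    """Try to create a /dev/null clone. May raise PermissionError
    when unprivileged, or succeed-but-be-useless inside a user
    namespace on kernels >= 4.18."""
    os.mknod(path, mode=stat.S_IFCHR | 0o666, device=dev)

# Comparing an existing node, as in the demo:
st = os.stat("/dev/null")
print(os.major(st.st_rdev), os.minor(st.st_rdev))  # typically 1 3 on Linux
```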
07:01
Even though you can mknod these days, you can't mknod anything useful, so you might as well consider you don't have it. As far as devices go, let's look at a GPU case. So I've got a container. This container is kind of boring, because, oh, I forgot to delete the device.
07:20
Sorry. It should have been empty. Let's make it empty. So, no device nodes anymore, and now let's say I want to pass a GPU, and I want a specific one, so I'm actually gonna give it a PCI address. Now, we need to do the interesting logic of going through /sys to figure out
07:41
what driver is tied to that particular address, and what device nodes are tied to it. LXD does that. It uses a mount propagation trick to inject those devices inside the container, and you get those. And you can sometimes actually make use of that.
08:02
That's much better. So that's what I'm showing here, which is the Unigine Heaven benchmark running inside the container that's got the GPU access and access to the X server. And as you can see, that's running just fine.
08:25
Let's close that stuff. And back to this. So that's kind of where things are, but we also added a new kernel API recently that lets you actually inject uevents from user space into a particular container.
08:44
So going forward, the idea is that we will have LXD, as it does today, listen to uevents. If they're relevant to the container, they can then be injected inside the container, which then means udev inside the container can react to them and can do useful things. We've got people trying to run Kodi and X servers
09:01
and whatnot inside containers, and they've got a bit of a problem when they plug in a USB keyboard or mouse: that kind of stuff just doesn't show up and X just ignores it. You need to actually bounce X so that it notices something's been plugged in. With the uevent injection work that's been done by Christian Brauner on my team, we're gonna be able to fix that.
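Uevents like the ones being injected are simple newline-separated KEY=value payloads, as read from a kernel uevent file or a netlink socket; a hedged sketch of parsing one (the sample payload below is made up for illustration):

```python
def parse_uevent(payload: str) -> dict:
    """Parse a KEY=value uevent payload into a dict, skipping
    anything that isn't a proper assignment."""
    env = {}
    for line in payload.strip().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            env[key] = value
    return env

# A hypothetical "USB keyboard plugged in" event:
sample = "ACTION=add\nSUBSYSTEM=input\nDEVNAME=input/event5\nMAJOR=13\nMINOR=69"
event = parse_uevent(sample)
assert event["ACTION"] == "add" and event["MAJOR"] == "13"
```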
09:21
That's already mainline, we just need to use it from user space. Another thing that we're looking into for, well, that we need to deal with for system containers is security modules. So if you've got a full machine, you may want to protect your services.
09:42
That means attaching AppArmor policies or seccomp policies or SELinux directly to a bunch of services running in there. Sometimes the init system will do that for you, sometimes you do it on the side, whatever you feel like, but that was a bit of a problem when we couldn't do that inside containers. We also had the issue back in the day
10:01
where the host policies, at least for AppArmor, which is path-based, were actually leaking into the container. So if you had a policy for some binary on the host and the same binary existed in the container, the policy would just magically apply to it, even though it might be a completely different distro and the profile might not be relevant at all.
10:21
And when you're doing nesting as well, so you run Docker inside the LXD container, it kinda matters for Docker to be able to load its normal AppArmor profile, or seccomp or whatever else. So that's been, we'll go into some more details slightly later, but that's been fixed for AppArmor, at least.
10:43
It is possible to load AppArmor profiles as an unprivileged user, effectively, so as root inside an unprivileged container, and have things namespaced in a way that the container has its own set of profiles, the host profiles don't leak into the container, but the host policy still applies on top of whatever is loaded inside the container.
11:01
It does get tricky for some other APIs. For example, anything that's gonna be based on eBPF is not suitable for that, because eBPF cannot be trusted for unprivileged users, mostly because eBPF can be used for timing attacks and effectively exploiting the Spectre bug. So ever since the Spectre/Meltdown mitigations,
11:22
eBPF is no longer allowed for unprivileged users, and so not available for unprivileged containers anymore. Now, for AppArmor, as I mentioned, AppArmor does support running inside containers; that's done through internal nesting support in AppArmor. You can create a namespace, create, effectively, a stack,
11:43
and then say that's your inner profile, and if that profile allows policy loading, then the container can load extra policies. That lets you load, unload, and list profiles, but there is one big limitation right now, which is that it's single level. So you can do it in a container, but the container cannot then
12:00
create a second level inside it. That's something that's being looked at, but there are a bunch of missing LSM hooks that need to be sorted out for that. But I can show you that part already. So let's get out of that container, and so for AppArmor, I've got a basic container.
12:22
Let's install something that's confined, so there's that convenient hello-world snap, which, if it feels like installing, comes with a convenient AppArmor profile and a test. So if we look now, AppArmor status shows
12:41
that the profile's been loaded, actually a bunch of profiles for different subcommands in there, and if we run the command itself, it's fine. If we run the evil subcommand, which tries to do something it's not supposed to be able to do, it is not able to do it. And if we go and grep the logs, we should see,
13:01
the bottom one shows a denial by AppArmor, preventing it from writing to a path it was not supposed to. So that's AppArmor stacking. Now the real thing we want to get to is what's called LSM stacking and namespacing.
13:21
With that, instead of having a per-LSM, AppArmor-specific solution, all the major LSMs should be able to stack and run at the same time. So you should be able to boot a system with both SELinux and AppArmor enabled at the same time. Then you set the display LSM for the system, so the main LSM is gonna be SELinux.
13:41
And then when you start a container, set that container's display LSM to AppArmor, if it's, you know, Debian, Ubuntu, or whatever else is using AppArmor. And then inside there, they can interact with AppArmor and pretty much never know that SELinux is even a thing on the system. The SELinux host policy will still apply, so every access actually ends up going
14:01
through the entire stack and being validated by both. That's work that's been going on for a few years now by Casey Schaufler and John Johansen. There are patches that do work. We're still pretty far from them being merged, but that's where we're headed, and it's gonna be pretty neat, because it will let us run Ubuntu and Debian containers
14:21
on CentOS and have both the host and containers fully secure. And similarly, we'll be able to run Android or CentOS containers on Ubuntu and have SELinux running inside the container. So that's gonna be pretty darn neat. Now, another interesting topic is file capabilities.
14:43
You may know that in some distros, things like ping, mtr, and some of the privilege escalation wrappers actually use file capabilities instead of setuid, because it's much more granular and in general a better idea. We had a bit of a problem with containers in that it was not possible for a privileged user,
15:03
so root inside an unprivileged container, to set a file capability, the main issue being that if you could do that, then you would be able to exploit it from outside the container, so that was considered to be bad and therefore blocked. The v3 file capabilities support that was merged a few kernel releases ago changes that.
15:22
As part of the capability record on disk, it stores what the root UID was, which then lets the kernel know when to actually consider the capability. We can see an example of that. If I go into a CentOS container,
15:42
if I can type, there we go, and say install httpd, oopsie, sorry. Let me fix that, there we go. So I'm installing httpd. Again, if the network wants to cooperate, we'll see.
16:01
Well, the network doesn't, eh, okay. It failed, but then it seems to still work. Okay, fine, I'm retrying, there we go. All right, so httpd is installed. That used to fail miserably; a kernel that doesn't support v3 caps would just fail the unpack because cpio wouldn't be able to set the capability.
16:20
wouldn't be able to set the capability. Now, if we check that one file that ships with the package has got two capabilities set on it, and that works exactly as expected. And if we look, yep, Apache can work just fine.
16:46
Another thing that we've had issues with in the past has been mounting stuff inside containers. That's been a bit of a recurring problem. Some people do want that for things like loop-mounted files or mounting squashfs images
17:00
or mounting network storage or even like passing some networked block device and wanting to mount that inside a container. It is not supported in general because it's a very bad idea from a security point of view. That's because the kernel will have to parse the block device you give it, which the user has complete control over, and you can then exploit very interesting kernel bugs
17:22
and do a bunch of nasty things. In the case of the loop device, there's also some issue with you being able to still modify the device after it's been mounted and confuse the kernel even more. So yeah, that's a bit of a problem. We do not expect file systems to really fix that, but there are some ways out of there.
17:42
For virtual file systems, it's usually pretty safe, so we should be able to make things like NFS work just fine in theory. Though NFS is a bit of a weird beast sometimes. One thing that has been done is FUSE, so we can actually mount anything that FUSE supports
18:03
and we can see that here. I've got that container, that container's got a squashfs image and I can mount stuff and it works just fine. So that's the unprivileged FUSE support that's been merged, I think in 4.17 or 4.18, thereabouts; it took a while.
18:22
We had it in Ubuntu for a long time, but upstreaming it took a while. So that's one way of doing things. The other thing that we are working on is the issue of UIDs and GID maps, which is a bit of a problem with mounts in general because you may want some,
18:42
like in our case, we're dealing with system containers. So we've got a full root file system per container, we don't have the entire issue of read-only images and all that stuff. But we still have the issue of having containers with different maps and wanting to share data between them. That's a bit of a problem in general.
19:02
Right now, there's no good solution for that. You can try doing POSIX ACLs and that kinda works, but it's very confusing for people. And for the root file system itself, it means that when we create the container, we've got to shift it, which means we need to go through every single file inside there. We need to change the UID and GID
19:21
and we need to change any POSIX ACLs and we need to change any file system capabilities. Not very fun. It's fine, it works. We've done it for years now, but we want something faster. And that's what ShiftFS gets us. It's in progress. It was written originally by James Bottomley. I've got Seth Forshee on my team actively working
19:42
on fixing a bunch of extra issues with it. With ShiftFS, it lets you take a directory that's not mapped and tell the kernel, please mount it over there, but apply this map. And you can do that multiple times to different containers with different maps and they will all see it as their own UIDs and GIDs.
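The shifting itself is just idmap arithmetic, the same form as `/proc/<pid>/uid_map` entries; a sketch of what a ShiftFS-style translation does conceptually (the example map below is hypothetical):

```python
def shift_id(container_id: int, idmap) -> int:
    """Translate a container-side ID to a host-side ID through an
    idmap of (container_start, host_start, count) ranges, as in
    /proc/<pid>/uid_map."""
    for cstart, hstart, count in idmap:
        if cstart <= container_id < cstart + count:
            return hstart + (container_id - cstart)
    raise ValueError(f"id {container_id} not covered by the map")

# Hypothetical map: container root (0) is host UID 100000, 65536 IDs wide
idmap = [(0, 100000, 65536)]
assert shift_id(0, idmap) == 100000    # container root -> host 100000
assert shift_id(33, idmap) == 100033   # e.g. www-data -> host 100033
```

Two containers with different maps would each apply their own ranges to the same underlying files, which is exactly the sharing problem ShiftFS is meant to solve without rewriting every inode.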
20:03
And things should just work effectively. So that's pretty interesting and hopefully we'll be there sometime next year. Now the other thing that we are working on is, like, what if you trust your users? I mean, that might be a thing. Right now there's no way in the kernel to allow those mounts.
20:21
It's just not possible. But with work being done right now by Tyco Anderson, we can, with seccomp, intercept system calls in user space. And then I'll have user space run whatever it wants as real root, which then lets us catch mount, for example, compare the arguments to a white list we've got.
20:41
And if we consider that this one container we actually trust and we're fine with that file system being mounted from that location to that location, we can perform that action as real root and then move on and you've just performed the mount. Things just work. So we're pretty excited about this particular feature. Mount is one of the things we want to use it for.
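A toy version of that whitelist comparison — the policy format and field names here are made up; only the idea (match the intercepted mount() arguments against per-container allowed entries before performing the mount as real root) comes from the talk:

```python
def mount_allowed(container: str, source: str, target: str,
                  fstype: str, whitelist: dict) -> bool:
    """Return True if this container's whitelist has an entry
    matching the intercepted mount's source, target and fstype."""
    for entry in whitelist.get(container, []):
        if (entry["source"] == source
                and entry["target"] == target
                and entry["fstype"] == fstype):
            return True
    return False

# Hypothetical policy: container "c1" may mount one loop device
whitelist = {
    "c1": [{"source": "/dev/loop3", "target": "/mnt/data", "fstype": "ext4"}],
}
assert mount_allowed("c1", "/dev/loop3", "/mnt/data", "ext4", whitelist)
assert not mount_allowed("c1", "/dev/sda1", "/mnt/data", "ext4", whitelist)
```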
21:01
There are a bunch more use cases. We do want to let you mknod things like /dev/null and not have it be useless. So that same feature will let us do that. Another thing we're working on, well, have been working on in the past, is limits, which is something that containers kind of need.
21:22
I'm just gonna go with the demo instead for now since we're running out of time a bit. Let's see, so here I've got, the uptime is two days. If I go inside the container we can see that the uptime is a few minutes.
21:42
So we've got the actual uptime of the container. We can see that the container right now sees four CPUs and 16 gigs of RAM. But we can change that, set CPU, set memory. And go through, uptime hasn't changed. We've not restarted the container.
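Those live limit changes land in cgroup files, and the container's /proc view then has to be derived from them. A simplified sketch of that derivation for memory (field names from /proc/meminfo; the limit plumbing is hypothetical):

```python
def container_meminfo(host_meminfo_kb: dict, memory_limit_bytes: int) -> dict:
    """Clamp MemTotal (and dependent fields) to the cgroup memory
    limit, roughly the way a FUSE overlay like LXCFS presents
    /proc/meminfo to a container."""
    limit_kb = memory_limit_bytes // 1024
    view = dict(host_meminfo_kb)
    view["MemTotal"] = min(view["MemTotal"], limit_kb)
    # Free/available figures can't exceed the (possibly smaller) total
    for key in ("MemFree", "MemAvailable"):
        if key in view:
            view[key] = min(view[key], view["MemTotal"])
    return view

# Host with 16 GiB, container limited to 1 GiB (values in kB)
host = {"MemTotal": 16 * 1024 * 1024, "MemFree": 12 * 1024 * 1024}
view = container_meminfo(host, 1 * 1024**3)
assert view["MemTotal"] == 1024 * 1024  # 1 GiB expressed in kB
assert view["MemFree"] <= view["MemTotal"]
```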
22:02
We've got two CPUs, and we've got one gig of RAM. The limits are applied through cgroups, no big surprises there. But cgroups are not reflected in proc files. So we're using LXCFS, which is a FUSE filesystem that we wrote a while back. And that's mounted on top of those proc files and gives you the actual output you would expect
22:22
based on the limits applied to the container. Now, cgroup v2 is obviously something we're looking at. We're currently missing a few things before we can switch. One of those things is the freezer cgroup,
22:42
or an equivalent of it. We do need to occasionally freeze all the tasks in the container in a reliable way. So that's something we need to resolve still. We also need to figure out a nicer way of managing device filters, because that's another thing we need to do, at least for privileged containers.
23:01
And the BPF API is a bit tricky to deal with sometimes for that. Overall, cgroup v2 will get us less overhead, safer to use, more suited for containers. We do have a bit of an issue still with legacy workloads. So if you run a system container that uses an init system that configures cgroups but doesn't know what cgroup v2 is,
23:21
on a cgroup v2 only system, it's gonna have a very bad time. So we need to figure out whether we can delay things enough that it's not a case we need to care about because those containers will be end of life. Or if we need to do some fuse trickery or something to fake a cgroup v1 file system on cgroup v2,
23:42
which we've done before. LXCFS does support faking an entire cgroup v2 tree already so it's not that much of a stretch for us to do that. But still, we would like to avoid doing it if at all possible. And lastly, because I'm really out of time, another thing that's always kind of exciting is checkpoint restore,
24:01
which lets you do live migration and rollback of state of processes inside a container. It's a very, very complex problem because you need to serialize everything to disk, which is a bit of a pain. It's also the biggest game of whack-a-mole in kernel town because every time someone implements a new kernel API,
24:20
they break checkpoint restore and they need to figure out a way of getting that stuff out of the kernel into a file so it can be recreated. So that's a lot of fun. Rather than go into more details, I can just show you what it looks like when it works, if it works. So let me switch the screen there. So I've got a container running now.
24:42
I can do stateful stop. Please work, yes. So that works. The container is now gone. It's not running anymore. I could now reboot my system to apply a kernel update or whatever. And then you just start it again, and it's restored.
25:00
If I just do it again, I can show you the mess it creates on the file system. That's what happens. Every single process has got a file with dumps of various kernel structures and whatnot. And when you start it, everything is read back and all the processes are recreated. That lets you do live migration
25:20
because you can move that to another machine and restart them. It lets you do, say you had your IRC bouncer or something and you don't like losing state. You could totally do that to your container, restart the system to a new kernel, restore it. If you do it quickly enough, your TCP connection might not even have timed out. The problem is that, as I said, it's the biggest game of whack-a-mole,
25:42
and it only works with extremely simple workloads or very, very specific workloads you've tested in the past. And with that, it's the end. I don't think I actually have any time for questions, so if you've got any, catch me afterwards. We've got stickers here and on the table downstairs if you want those.
26:01
Thank you very much.