systemd: Two Years Later

Formal Metadata

Title: systemd: Two Years Later
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Abstract
The systemd project is now two years old (almost three). It found adoption as the core of many big community and commercial Linux distributions. It's time to look back what we achieved, what we didn't achieve, how we dealt with the various controversies, and what's to come next.
Transcript: English (auto-generated)
OK. Hi. I'm Lennart Poettering. I'm going to talk today about systemd. I did a talk about systemd pretty much exactly two years ago at FOSDEM, and yeah, that's why I called the talk "Two Years Later". OK. I'll start the talk with a little bit of an overview. So the focus of systemd has always been that we kind of
cover the full range of Linux use cases. So we care about mobile, about embedded, and about desktop and server the same way. It basically means that the basis for your phone OS could be the same as for your supercomputer. And that's actually kind of impressive, because a lot of
the other software, like, for example, Solaris SMF is probably nothing you ever want to use on mobile. But we want to really try to cover all these different bases. And that's actually really interesting in many, many ways, because quite often it turns out that the problems that are specific to one of these areas, like specific to the server, end up being hugely useful on the
other end of the spectrum as well. For example, the embedded people, they supplied us patches for doing hardware watchdog stuff. Hardware watchdog stuff is relatively simple. It basically just says: you have this device, and you ping it, and if you stop doing that, the machine will automatically reboot. It's a technique to increase the reliability of embedded hardware.
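systemd applies the same ping-or-reboot idea to services, too: it advertises a timeout to a service via the WATCHDOG_USEC environment variable, and the service periodically sends "WATCHDOG=1" back over the notify socket. A minimal Python sketch of that service-side keep-alive, following the sd_notify datagram protocol (the abstract-socket variant and error handling are omitted here):

```python
import os
import socket

def watchdog_interval():
    """Seconds between keep-alive pings: half of the timeout systemd
    advertises in WATCHDOG_USEC (half is the customary safety margin)."""
    usec = int(os.environ.get("WATCHDOG_USEC", "0"))
    return usec / 2 / 1_000_000 if usec else None

def watchdog_ping():
    """Send one keep-alive datagram to systemd's notify socket.
    Returns False if no notify socket was passed to us."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
        s.connect(path)
        s.send(b"WATCHDOG=1")
    return True
```

A service would call `watchdog_ping()` on a timer of `watchdog_interval()` seconds; if it hangs and the pings stop, systemd restarts it, just as the hardware watchdog reboots a hung machine.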
Now, as it turns out, something like this is not only hugely useful for embedded hardware, but on the other end of the spectrum, on the server, that's also what you require for high-availability setups. So yeah, we added this feature there, and it ended up being really useful on the other end as well. And there's a lot of other stuff, like, for example,
resource management. Resource management basically is what you have to do if you run a lot of services and have only limited resources available. You need to make sure that the available resources are nicely distributed according to your rules among the services that are running. Now, resource management has, for quite some time, been important for servers. You want to make sure that Apache and MySQL both get an equal amount of CPU and memory, and not one gets way more than the other, and then things starve, and you get all kinds of problems. Or even more, like you have a couple of customer sites running on a single server, and you want to make sure that no single customer can monopolize the CPU and starve the others. So you want to make sure that the resources that you have available on the server are actually distributed in some
kind of way among your services properly. Now, this resource management also ends up being very useful on embedded hardware, because on embedded hardware you usually have very few resources. So it becomes even more important that what you run on the embedded hardware gets an equal amount, or like the
appropriate amount of resources assigned. So there are lots of examples like that. Sometimes there's desktop stuff that ends up being hugely useful for the server as well, or embedded stuff that ends up being useful for the desktop. So the same way as the Linux kernel manages to cover all the bases from embedded and mobile to desktop and server, we want to
cover the same thing for systemd as well, because we figured out that most of the problems actually reappear on the other end of the spectrum as well. Yeah, we nowadays have consistent boot times of less than one second for user space. We'll actually do a demo here on Kay's laptop, which is lying right here.
We're not going to do this on this laptop, because the initialization times for projectors during boot are a little bit too slow, so you couldn't actually see that. We have these boot times of less than one second for user space. Of course, unfortunately, Fedora, as the distribution that I use and that I work for, doesn't really provide
that out of the box. A couple of reasons for that. One of the bigger ones is LVM and these kinds of things. We want to go for supporting that out of the box on Fedora eventually, too. But right now, Fedora will boot slower. But anyway, Kay, can you show that?
So Kay has a laptop here, which runs pretty much unmodified Fedora. The only change is basically that LVM and a very few other things are not enabled. So this has already been rebooted, actually; this is the information from the last boot. Now what you see here is that we will actually break
down, with the systemd-analyze command, how the performance of the last boot was. Now we can see that the firmware, like the BIOS, really the POST, the initial initialization of the hardware, took seven seconds. The boot loader took only 35 milliseconds. That's definitely not GRUB, because just initializing GRUB already takes a second or something like that. This is gummiboot, actually. The kernel takes 1.2 seconds. And then user space, currently in this boot here,
required slightly more than one second. There's a sleep bundled in there. In total, it's 9.6 seconds. That's, of course, not good enough yet. On modern hardware the firmware is faster: take the Windows 8 certified machines. If you want to
have a certified machine, to get the certificate your hardware needs to get through the POST in less than two seconds. So with that in place, we can boot an entire Linux machine in like five seconds or so. So if you look at this distribution, you will
actually see it's quite complete, right? We have all these desktop-y things, like UPower. We have udevd anyway. We have accounts-daemon, PolicyKit, colord, even the profiling stuff, Avahi, NTPd, GDM. It's a really complete distribution. The only thing that is basically changed from the defaults is that we did not install LVM, because
LVM is like, it's a major source of slowness. I mean, we have some LVM developers... still away? Sorry, I will just say it. I mean, it was that way two years ago and still is. Well, let's discuss that elsewhere. But anyway, so we did show you just these numbers.
How fast it is. But let's actually see how fast it is by demoing it. Now, this is a Rawhide machine, so we hope it's actually working. But so yeah, now we're in the POST. These are the seven seconds, basically.
And the screen turns black when the bootloader comes. And so yeah, it's hard to see. But you see, actually, it just got up. There was that one second for user space, one second for kernel. It's actually gray on the machine that projects it.
But you see, you recognize that this is GNOME, right? But it changed. Yeah. Something changed with that projector. I think maybe that's because of the, can you shift it? Yeah, actually, in real life, this thing is green,
what we see here. Like if we see that's gray, the projector is a little bit confused. But anyway, what you saw, basically, is you saw those seven seconds for the post. And then you saw the two seconds after the post when it was black. And then it was already GNOME up there.
So I mean, the reason why we use this weird projector is basically that, yeah, well, the projectors usually take time when video modes are switched. And that sometimes, or usually, takes more than one second. So it kind of destroys the demo. But this thing destroyed the demo anyway,
because you can't actually recognize that it's GNOME. But anyway, yeah, you see the GNOME 3 stuff here. Anyway, let's turn back to the slides. By the way, again, if you guys have questions at any time, just interrupt me. I'm really interested in questions.
So yeah, we have consistent boot times of less than one second. This one wasn't, but we generally do. A custom kernel?
No. This is pretty much unmodified Fedora 19, but without LVM, basically. So it doesn't use anything, a newer D-Bus or anything, that wasn't in any standard distribution. I think it basically uses Fedora 19 and maybe a snapshot of current systemd, but that's it.
OK. As we develop systemd, we have obsoleted a couple of things. I mean, obsolete is a very wide term here. So we replaced ConsoleKit, which used to be the daemon that managed logins, and sessions, and seats, by
something called logind, which is a component of systemd. It provides quite a bit more than ConsoleKit used to do, for example, proper multi-seat support. So you can actually, if you have multi-seat hardware, like these little USB boxes that provide you an additional screen, an additional keyboard, additional sound,
and these kinds of things, you can now plug that into your systemd machine. systemd will recognize that it's one of these devices, and will just say, here's a new seat. GDM will then pick that up and bring up a login screen. And that's fully automatic, without reconfiguring anything. So yeah, we replaced ConsoleKit this way. It's much nicer than ConsoleKit used to be, because we can
rely on integration with the kernel in a way that was not available before. For example, we can easily track membership of a process in a session by doing cgroups magic. I'm pretty sure two years ago, when you were at that talk, you probably heard the term cgroups. I'm not going to go into detail on what that is.
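At its core, that cgroups magic is simple: the kernel lists every process's cgroup memberships in /proc/&lt;pid&gt;/cgroup, so session tracking boils down to reading and parsing that file. A small illustrative parser (the file format is one hierarchy-id:controller-list:path entry per line):

```python
def parse_cgroup(text):
    """Parse the contents of /proc/<pid>/cgroup into a dict mapping
    controller/hierarchy name to the cgroup path the process sits in."""
    paths = {}
    for line in text.splitlines():
        # Each line looks like: <hierarchy-id>:<controller-list>:<path>
        _, name, path = line.split(":", 2)
        paths[name] = path
    return paths

# On a Linux machine one would feed it the real file:
# with open("/proc/self/cgroup") as f:
#     memberships = parse_cgroup(f.read())
```

The path component is what encodes the session: a login manager can place each session's processes under a dedicated cgroup and then reliably read membership back out of /proc.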
But suffice to say, the ConsoleKit replacement that we did with logind is hugely more powerful than ConsoleKit used to be. We replaced System V init, which is kind of obvious, and the initscripts package. The initscripts package, I mean, in this case, kind of means all kinds of different implementations of that.
Like, Debian has its own initscripts package, and Fedora used to have its own as well. We kind of replaced that, because for all the early boot stuff we nowadays have parallelized little implementations in C, which are nicely configurable via simple configuration files. We obsoleted pm-utils. pm-utils had the job of suspending and hibernating the
machine. It used to have a huge amount of quirks, because it did user-space quirk handling for specific hardware that exposed certain bugs. But as it turns out, today that's mostly not necessary anymore. There's only very exotic hardware that still needs that,
because the kernel got fixed. The kernel drivers nowadays are able to apply these quirks in a much saner way. Anyway, we don't need these user-space components anymore. So because systemd already managed powering off the machine and powering it on, and rebooting and these kinds of things, we just added support for suspending and hibernation as well, because it's kind of
the natural thing. It's so that systemd can manage the entire life cycle of the machine, and that definitely includes suspending and hibernation. Of course, we did it really nicely, like better than pm-utils did it, because we now have inhibitors and these kinds of things, so that arbitrary user software can
actually hook into the suspend process and do something right before the machine goes down, which is very useful. For example, let's say you have your text editor, and you want to make sure that the text editor actually saves everything to disk before going into suspend, so that if the battery runs out, you don't lose anything. And there are many other uses for that as well.
So we replaced inetd. That's probably one of the more obvious things as well, at least if you know that systemd does socket activation. I know for sure that last time, two years ago, I did talk about socket activation in more detail. Again, I'm not going to talk about that very much, because it's very technical. But suffice to say, we do basically everything that
classic inetd did in systemd as well. And of course, we try to make it more useful. So we will not just cover internet sockets, but all kinds of other sockets as well. And even some things that are technically not really sockets, like, for example, FIFOs or POSIX message queues and these kinds of things.
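The inetd-style protocol behind this is small: an activated service inherits its listening sockets starting at file descriptor 3, with LISTEN_FDS and LISTEN_PID set in the environment. A sketch of the service side in Python, for the single-socket case only (libsystemd's sd_listen_fds does the full checking):

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first fd systemd passes, by convention

def get_listen_socket(port=0):
    """Return a listening TCP socket: the one inherited via systemd
    socket activation if present, otherwise a freshly bound one."""
    nfds = int(os.environ.get("LISTEN_FDS", "0"))
    if nfds >= 1 and os.environ.get("LISTEN_PID") == str(os.getpid()):
        # Socket-activated: adopt fd 3, which is already listening.
        return socket.socket(fileno=SD_LISTEN_FDS_START)
    # Not socket-activated: create and bind our own socket.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", port))
    s.listen(5)
    return s
```

Because the fallback path exists, the same daemon runs unchanged under systemd, under classic inetd-less setups, or started by hand.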
We replaced acpid. acpid is a little daemon whose only purpose was, when somebody presses the power key, to shut down the machine. That's basically what it did. It was a bigger project that did a lot of other things too, but nobody ever used those. People just used it to hook up the stupid power button with
power off. And we thought maybe that's some functionality that should not require any additional software, that should really just work. It's about powering off the machine, especially since the reboot button, and since Ctrl-Alt-Del, is implemented either in hardware or by systemd anyway. So we thought, well, maybe the power button should be
handled the same way. And everybody needs that, right? Like embedded hardware, servers, desktops, they all have power buttons that should work as they're supposed to. So yeah, we decided, well, listening to one key is not particularly hard, and everybody wants it, so maybe we should just move it down
into systemd and just do it, and it will just work. Something else: we kind of replaced syslog with something called the journal. I figure that has been discussed widely in the community already. Basically, we looked at syslog and figured out,
well, I mean, we had been looking at syslog for quite some time, and what we always wanted is that we can actually query the syslog logs for all messages from a specific service. However, syslog only stores linear text files from top to bottom without any kind of indexable information, right? It does not actually know anything about which service
things came from. Basically, I mean, it's untrusted data. Every application can send data into it and pretend it was Apache, and syslog will just store it and not care. So we had a lot of issues with that, because we want to have a secure system where the services cannot lie about who they are, and where we actually have this
information about which service did what, and can index it so that querying the log database is fast. But we are perfectly compatible with, I mean, all of the projects that I listed here. Basically, you can still install acpid. You can still install syslog. And in fact, Fedora does that by default. You can install inetd.
You can still install pm-utils. We're totally compatible with that. We will not break these things. The only thing that you basically can't install anymore is System V init, because only one process can be PID 1. But yeah, it's just that you don't really need these things anymore, because we have simpler and usually more powerful replacements for the basic stuff anyway.
Something else we replaced is the watchdog support. I already talked about that earlier. It's something that is usually useful on embedded and on servers. It's not so useful on the desktop. But it's an absolutely trivial thing to do, and actually should really be done in systemd, because only
then you have this nice chain that the hardware watches what systemd does, and then systemd watches what the services do. It's actually very simple to use and very powerful. Then cgrulesd, I'm not sure if you guys know that. It's something cgroup-specific. It basically is a daemon that tried to move, during
runtime, services into specific cgroups to apply resource limits to them. It was a very weird thing, because it actually did that asynchronously. So instead of right when the service was started up, making sure that before the service was forked off, it was already clear it would end in the cgroup.
It would just watch what was happening in the system, apply some rules, and magically move things around. Which, of course, yeah, it's ugly. Then we moved quite a bit of functionality that was traditionally done by cron into systemd. Not everything. And again, we are totally compatible with the classic
stuff, but there are many, many reasons why it is actually useful to have something like cron, like calendar-based timing in systemd as well. Yeah, the stuff that we have in systemd is in some ways more powerful than cron is, but in other ways less
powerful. For example, we don't do user cron jobs or anything similar to that. But we basically have a calendar language now that is more expressive than what cron can actually express in its tables. Like, it has second granularity and stuff like that. Yeah, and at the same time, we kind of consider it the same problem.
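To make that calendar language concrete, here is roughly what such a timer-plus-service pair looks like as unit files (the unit and binary names here are made up for illustration; OnCalendar= accepts expressions down to second granularity):

```ini
# backup.timer -- hypothetical unit name
[Timer]
# Every day at 03:15:30 -- note the second granularity cron lacks
OnCalendar=*-*-* 03:15:30
Persistent=true

[Install]
WantedBy=timers.target

# backup.service -- the matching service the timer activates
[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup
```

Enabling backup.timer then fires backup.service at the given time with no cron involvement, and the job gets the same logging and resource controls as any other service.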
As mentioned, this is obsoleted, but you can still install all of this in parallel, and probably most distributions will. I don't expect Fedora, for example, to stop installing cron by default anytime soon. I mean, I personally believe cron is a really useful tool. Even though systemd can schedule things by time, you
would have to create a service file and a timer file for it, and admittedly, using cron and just writing one line can often be easier. So this is not an attempt to remove this from the distributions or to make this unavailable to administrators. It's just, yeah, we have this in systemd, and there's a
lot of functionality in there that you don't have with the classic tools. So you can use it, but you don't have to. OK, so much about the status quo, and a little bit of an explanation of where we currently are with systemd. Are there any questions at this point?
There's a question. Can you... just a... Jesus Christ, sorry. Just a short question. Can you essentially, if you install a Fedora 19 system, can you actually de-install all these things and just run with systemd, just to see if it would work?
I didn't entirely get that. What can you uninstall? Just all the stuff that you obsoleted. Can you just de-install everything? Actually, you mostly can. You can uninstall inetd and acpid anyway. You can uninstall syslog, watchdog, cgrulesd. You can mostly uninstall cron, but then many packages drop in cron jobs.
We're working to make that at least possible, and there's currently talk on the Fedora mailing list to port over at least some stuff from cron. We'll have to see about that in detail. You can remove atd. You can't really remove initscripts, because things depend on it. But most of the stuff you don't really need anymore. SysV init doesn't really exist anymore, and ConsoleKit is not shipped by default anymore anyway. What's still in initscripts? So, I mean, the thing that initscripts still does that is useful is static network configuration, like all the stuff that's not NetworkManager, basically. But yeah, you basically can remove all of it.
Yeah, I was just wondering, so systemd interprets the fstab right now and creates, like, virtual units. Will the same thing happen with crontab? Yeah, that's actually something. So, did everybody get the question? The question is: currently, fstab is parsed by systemd for compatibility purposes, and because we actually think that fstab is a cool thing,
because it's very simple and things like that, could we do the same thing for the crontabs? So, I think crontab is not a very nice language. I think it's very simplistic. So my guess is that we probably will cover cron.daily and cron.weekly and these kinds of drop-in directories,
because they're really super nice to use, and we can nicely integrate them into the systemd world as well. But I don't think we'll do the same for cron.d. I think, like, for the actual crontab stuff, we kind of assume that, at least for the Fedora case, the way it currently looks, we might convert everything that actually installs more than just cron.daily and
cron.weekly and cron.hourly scripts, which will probably use timer units, but we have to see about that. But of course, even if a distribution doesn't want to go systemd all the way, the way Fedora decided to do it, they can, on their own, write a generator that generates this stuff. That's totally doable.
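A generator of the kind mentioned here can be quite small. The sketch below is hypothetical (pure string output, function name invented); it shows roughly the unit text such a tool might emit for one cron.daily script:

```python
import os

def units_for_cron_daily(script_path):
    """Return (service_text, timer_text) for a cron.daily-style script:
    roughly what a cron-to-systemd generator might produce."""
    name = os.path.basename(script_path)
    service = (
        "[Unit]\n"
        f"Description=Daily job {name} (generated)\n"
        "\n"
        "[Service]\n"
        "Type=oneshot\n"
        f"ExecStart={script_path}\n"
    )
    timer = (
        "[Unit]\n"
        f"Description=Daily timer for {name} (generated)\n"
        "\n"
        "[Timer]\n"
        "OnCalendar=daily\n"
        "Persistent=true\n"
    )
    return service, timer
```

A real generator would write these files into systemd's runtime generator directory at boot, so the admin-visible cron.daily scripts keep working untouched.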
It's just that it's probably not something we want to do upstream. But yeah, there's another question. This way. Just a little question. systemd is today integrated into a lot of systems, like Red Hat, Fedora, Arch Linux, but you are
superseding projects like syslog-ng and rsyslog. So how do you deal with the communities? Are you offering them to work with you? So we actually, like, before we did the journal, I talked frequently to the rsyslog guy.
He doesn't like me very much anymore. I don't know why. We actually suggested the stuff we wanted. We wanted the indexing, and we weren't necessarily looking into doing that on our own. I mean, there are actually patches from me in rsyslog, for example, for the socket activation stuff
that we do in systemd. So yes, we did work with them, but eventually we came to the conclusion that they probably never would want to give us what we wanted, which is basically the indexing. They had different focuses than we had, and then we eventually decided to just do it on our own.
One of the syslog-ng guys is going to be at the hackfest, as I saw on Google events, actually. So yeah, there's cooperation there. There used to be with the rsyslog guy. But yeah, is that an answer? I mean, of course, if you install syslog-ng or rsyslog,
it just will work, right? We quite carefully made sure to not break anything. In fact, even if the journal records everything as well, and everything is routed to the journal, this ultimately is a good thing for rsyslog and syslog-ng as well, simply because we collect much more data via the journal.
Because the journal can run during early boot, and the journal actually gets connected to every single service of the system, so that we get stdout and stderr. And we will then forward all of that, from early boot and so on, so that it all gets forwarded to syslog-ng and rsyslog.
So yeah, they ultimately benefit from this in a way, too, even though it's not really necessary anymore to install them by default. But I mean, there's no doubt in the world that syslog-ng and rsyslog have really useful use cases for the future as well, simply because the journal does not implement the
syslog protocol, and we don't intend to. It's a protocol that has grown over years, and for most of its history, didn't really have a specification. So it's a complex protocol because you always have to deal with the different ways the hardware and software implemented it.
So we don't have an interest in that. We totally see the use case for that, like if you have a server setup and you want to collect all the log data from various hardware. It's a different use case, basically, than the journal. But again, they benefit from the journal as well, as the journal is this concentrator thing that pulls way more data than syslog traditionally got,
and we pass it on. And we pass it on with all the metadata, if they want it. So I have a simple question. How long until systemd merges with the Linux kernel? Thanks. And also, the name of your blog, Albreaker Audio, is it intentional? Is it trolling or what? Thanks.
I didn't really get the question. Sorry. I meant, how long until systemd merges with the Linux kernel? You mentioned already a lot of stuff, like udev and so on. Thanks. Well, we will probably move a couple of more things into systemd, to be frank.
The thing is, it is, of course, a question of where we put the limit. Of course, there is stuff where we absolutely clearly know that we'll never move it into systemd. But basically, our definition of what should be in systemd is everything that you need to build the basic building blocks. And it doesn't really mean that everything becomes monolithic, and you couldn't separate things, and
things weren't modular anymore. It just means that we develop them in the same git repository and share more code between them. It's not unlike how, for example, the BSD model is. In BSD, much of the core OS user space is kept in a single CVS or SVN repository and is actually developed in sync with their kernel.
And we're not going that far. We will never merge the kernel into our tree, or the other way around, because that doesn't make any sense, and there are other user spaces for Linux anyway. But what we want to do is look at what the other Unixes are doing and take a little bit back, because I think it's a good model.
On Linux, traditionally, all these little components that constitute the boot were separate, like, for example, a separate project and a separate git repository for the bloody power-key handling. That's not something that really scales. Because every single one of these components, however trivial it
might be, like, for example, ACP ID and the most extreme thing, they re-implemented all the service management. They shared a lot of boilerplate code. And it was basically, yeah, all of them recompiled, rebuilt, that wrote it in their own code. And most of them didn't really do that, actually, that well. So with systemd, we want to unify much of this really
more trivial code, especially. Because everybody needs it. Because if you share code, you can actually reduce footprint, because you have less code to execute. And we improve the testability of things. Because if you share code, you can much more likely actually test the stuff that you run. I mean, yeah.
And then there's also this issue: if you have all these various components that you put together in different combinations, like many of the distributions really like doing, so that you can use any version of the kernel with any version of udev with any version of libc and so on, it explodes the test matrix.
And that's actually a huge issue. Because if we ever want to ship these kinds of things to people, we need to have a rough idea that it will actually work for them. Anyway, so our approach there is that some of the basic building blocks should just be developed in sync, updated in sync, and tested in sync, so that we actually
have a good idea that they work together. It's a question, basically, of how you do your design process, your development process, your testing process. And we believe we should copy from the traditional Unixes, which did that development more in a single repository. I hope that's kind of an answer. Are cgroups going to make it so that on my desktop
system, if I have a simple fork bomb, it's not going to make my desktop shell unresponsive anymore? And how's that going to work? So there is a controller that is supposed to deal with fork bombs. I didn't entirely get the question, but it was something about fork bombs on the desktop, right? There's a controller that does fork bomb protection
developed by some people at Red Hat. I don't know what the current state of that is, or whether it's going to get in. We'll probably support something like that one day. I don't know. It's a very specific question. Ask me afterwards, I think. Are there any further questions? There are more questions.
Hi, it's about compatibility with previous behavior. In particular, things like fstab. Older systems, when an fstab entry failed to mount, would continue to boot, and I believe that's not always the case by default now. Are you going to try to
maintain that behavior? Or what's your behavior-on-failure philosophy? I didn't really get the full question, but it was something about compatibility with fstab and things. It's more that on older systems, if fstab tried to mount a device that was no longer present or had failed, the system would continue to boot.
Now we don't do that, and it's basically a question of behavior-on-failure philosophy. OK, so basically, in classic System V init systems there were already the flags noauto and nofail. They had slightly less meaning than they have for
systemd, because System V cared less about this. But basically, where you traditionally already had to use noauto and nofail in fstab, you still have to do that. It's basically the same definition. What currently happens is: if you do not mark a file system as nofail, then systemd assumes that it
is important that it is mounted at boot. It might be security sensitive if it's not mounted. And then it will wait for it to show up. If it never shows up, the system will put you in emergency mode, where you have to authenticate, and it will give you a little bit of log output explaining why you ended up there.
So that's the approach there. By default, unless an entry in fstab is marked nofail, we assume that everything you list there must be around during boot. And if it isn't, we'll enter emergency mode. It's a little bit different from the traditional behavior, a little bit harsher, but I think it's technically more correct to do it this way.
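To illustrate the nofail semantics described above, here is a sketch of two fstab entries (the UUIDs and mount points are made up for the example):

```
# /etc/fstab — illustrative entries, identifiers are hypothetical

# No nofail flag: systemd treats this mount as required and will
# enter emergency mode if the device never shows up at boot.
UUID=11111111-2222-3333-4444-555555555555  /data    ext4  defaults  0  2

# With nofail: the boot continues even if this device is absent,
# matching the older System V behavior for optional mounts.
UUID=66666666-7777-8888-9999-aaaaaaaaaaaa  /backup  ext4  nofail    0  2
```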
Hi. I know about systemd using cgroups for tracking processes, but is there any plan to use LXC to confine services in the near future? So that's an interesting question. Actually, it's one of the things I have here on the,
actually it was my next slide, about container support. So let me talk a little bit about the future at this point of system D. I have three slides here about the future. I don't really like to talk too much about the future, because then we'd probably have to fight the fights of tomorrow already today, and I'm not looking forward to that.
But anyway, for the near future, we want to add nicer container support. We already have relatively nice container support, but there are a couple of things we would like to add. Containers, for those who don't know, are basically a lightweight form of virtualization, where one kernel runs a couple of operating systems, basically, side by side.
In systemd, that is very nicely supported. We actually ship a tiny binary called systemd-nspawn that just uses the kernel interfaces for setting up containers and can boot an operating system in one. It is very useful: you can use it on Fedora to boot up a Debian system, and it
will just work. It feels a little bit like the chroot command, the tool you use, for example, when your machine is broken: you mount all the file systems and then chroot into them to fix something. systemd-nspawn does basically the same, but in contrast to
just setting up a chroot environment, it will set up an entire container for you. And then you get a shell, and it does much more than chroot does, like mounting things and all these kinds of things. It's a really useful tool. The container support in systemd is, of course, not limited to this nspawn tool, which is really tiny. We also support LXC and libvirt-LXC, which, by the
way, are completely different projects. The names might suggest they're the same, but they're not. Or any other kind of container virtualization. systemd actually comes with an interface for container managers. It's very, very simple stuff.
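A quick sketch of the systemd-nspawn workflow just mentioned (illustrative only: it needs root, the target directory name is made up, and debootstrap has to be installed on the host):

```sh
# Install a minimal Debian tree into a directory on, say, a Fedora host.
debootstrap unstable /srv/debian-tree

# Boot that tree as a container: -D points at the directory to use
# as the root file system, -b boots the init system inside it
# instead of just running a shell, as a plain chroot would.
systemd-nspawn -bD /srv/debian-tree
```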
Like, for example, the container manager can tell systemd the UUID that should be used to identify the system, and then systemd will use that to initialize the UUID of the system itself. It's really nice. In the long run, we want to bring systemd more to the
level of what Solaris can do with zones. On Solaris, you can not only get an idea of the services running on the host, but with the same command, you can actually enumerate all the services running in all your containers, all your zones, on the system as well. This is all very useful functionality, and,
individually, really not that hard to implement. And we should provide the same on Linux, too. So basically, if you type systemctl, it will show you everything from your host and from the containers inside. The container support in systemd is actually really nice and can do tricks that nobody else could do before.
One specific thing here is auto-spawning. We can auto-spawn entire containers on demand. For example, you have one container where you put your Apache and MySQL in. With the container support in systemd, you can use socket activation for that, so that the host initially listens on your HTTP port.
And the moment the first connection comes in, it will actually spawn the entire container, the entire OS inside of it, with both Apache and MySQL inside, hand over that socket, and then the systemd inside the container will pass it on to httpd.
So that is a hugely useful technology, actually, because it allows you to increase the density of customer systems on your server. Because it basically allows you to run a ton load of containers on one system and actually only have them use up resources when they're actually used.
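The fd-passing protocol behind this can be sketched in a few lines. This is a minimal stand-in for the sd_listen_fds() helper from systemd's sd-daemon library, not the real implementation: systemd binds the socket itself, and when the first connection arrives it starts the service with the listening socket as file descriptor 3, announced via the LISTEN_PID and LISTEN_FDS environment variables.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first fd number systemd hands to a service


def listen_fds():
    """Return the file descriptors passed in by systemd, if any.

    Minimal sketch of the sd_listen_fds() protocol: LISTEN_PID names
    the process the fds are meant for, LISTEN_FDS says how many there
    are, numbered consecutively starting at 3.
    """
    if os.environ.get("LISTEN_PID") != str(os.getpid()):
        return []  # variables were inherited, not meant for us
    count = int(os.environ.get("LISTEN_FDS", "0"))
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def serve_one(fd):
    # Adopt the already-listening socket instead of binding our own,
    # then answer a single connection (hypothetical toy response).
    srv = socket.socket(fileno=fd)
    conn, _ = srv.accept()
    conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello\r\n")
    conn.close()
```

In the container case the same hand-off happens twice: the host's systemd passes the socket to the container, and the container's systemd passes it on to the web server inside.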
This is especially useful for a couple of companies, like web hosts, which have these kinds of one-click offerings where you can, with one click, set up a test machine and then try your stuff on it. With this, that basically comes for free, because you don't actually have to run anything for it. All you have to do is create the container and make the host
listen on the socket, and then, if somebody actually uses it, you spawn the container. This is really, really nice stuff. This also works the other way around, with auto-shutdown, which basically means that we figure out when a container is idle and shut it down again, so that you make the best of your resource usage.
Now, the auto-shutdown stuff is actually really interesting, because what I talked about earlier about this thing that we want to cover the whole range of Linux users from embedded to desktop to server, the auto-shutdown stuff, like figuring out when a container is idle and shutting it down then, is actually the very same problem that you have on laptops, where you want to suspend the laptop as
soon as the machine gets idle. And so that's one of these nice things, because we already have the auto-suspend stuff. We can extend it just a little bit and make it work for containers as well. So yeah, I hope that kind of answers the question regarding containers. So it's going to be awesome, and it is already awesome.
Yeah, just a second. There is a previous question here. Thanks. I just wanted to ask you to elaborate on something you said earlier. You said that when you have a web server and a database server, systemd can make sure they fairly share resources. Does that just come down to managing nice levels and POSIX limits, or is it more advanced than that?
So nice levels and resource limits are inherently process bound, which makes them pretty much useless on anything you want to do these days, because Apache is not one single process. Apache is usually, it spawns worker processes, quite a few of them, and then these worker processes spawn CGI
processes and PHP whatever processes. So Apache actually is usually not one process, but more like 100 or so, depending on how big your system is. So setting the classic POSIX resource limits on them is pretty useless. Now what systemd enables you to do is use cgroups for that.
cgroups are a kernel feature that allows you to group processes together and then apply resource limits to the group. So this stuff is kernel supported, basically, and systemd gives you a very nice interface to make use of it. So you can basically say: whatever happens, Apache, with all its worker processes and everything,
gets only this much memory, and it will get this CPU priority in total. So even when MySQL has only three processes running, and Apache has 100 processes running, because each one forms one group, Apache gets half, and MySQL gets half, and it's
going to be evened out. So it actually makes resource management workable for the first time on Linux. Before that, the only thing you could really do was run each of your services as a single process, which is completely illusory. So yeah, I hope that answers the question. Thanks, that's pretty cool.
It is. Any chance we'll see user space checkpoint and restore for these containers? Checkpoint restore? Good question. Let's just say, I'm not convinced that checkpoint restore is such a fantastic idea, I don't know. I'm not sure it really can work.
So the question was regarding checkpoint restore, it's basically a facility that if you have a process, you can save the current state of the process to disk, and then later on, possibly even on another machine, you read that image again, and the process continues running where it is. There's a huge amount of problems with that, I think, because processes use a lot of resources of the operating
system, and getting those resources, like file descriptors, sockets, internet connections, whatnot, getting those back into the same state as before, to me, appears not workable. I know that people disagree on that. Maybe I can be convinced. I don't think so, but I'm not convinced of the idea.
So any further questions? There's questions. Where was it? No, there was another guy.
What is your view on taking systemd to tiny embedded applications where the flash size is like 4 MB, 8 MB, or 16 MB maximum? Is it a good idea to use systemd for Buildroot- or BusyBox-based tiny embedded applications?
Does that really fit the systemd approach? This seems very interesting, because we are talking about boot times of only a few seconds, which also involves a lot of hardware initialization and stuff like that.
We are dealing with so many sleeps in the startup that it becomes a problem. Really, in embedded we want a graceful startup and shutdown of services. So this looks very interesting, actually. So what is your view on such tiny embedded applications?
I'm not entirely sure I understood all of that. But the question was basically about using systemd on the lower end of embedded, right? So, of course, if you have an embedded system that just wants to run one process and nothing else, there's no point involving systemd in that.
That said, I think it reaches quite low. I know that a lot of people use systemd in conjunction with BusyBox, for example. systemd is nowadays used in wind machines and in cars. It's built into cars. Like the GENIVI people, for example: the coming cars from the companies involved in GENIVI, like
BMW and things like that, will run systemd, which is kind of, oh my god. But yeah, you find it in toys, you find it in everything already. I'm not sure how far you can scale it down. But you should know that most of the stuff we include in it is compile-time optional.
So you can just select it with configure. It's basically like the Linux kernel itself, where you can choose while you build it what exactly you want to build. You have the same in systemd, where you can pass a lot of flags to configure. So yeah, I think my time is over, though. Yeah, unfortunately, the time is running out. So if you have any other questions, just ask the
speaker. We will probably hang around the conference. Thank you very much. Thank you very much for the good questions.