Beyond Init: systemd
Formal metadata

Title: Beyond Init: systemd
Series: FOSDEM 2011, talk 49 of 64
License: CC Attribution 2.0 Belgium: you may use and modify the work or content, and copy, distribute, and make it publicly available in unmodified or modified form, for any legal purpose, provided you credit the author/rights holder in the manner specified by them.
DOI: 10.5446/45900
Transcript: English (automatically generated)
00:07
And now, can you now hear me? Does it work? OK, wonderful. So yeah, I will be talking about systemd today.
00:20
By the way, I prefer my talks more of the interactive kind. So meaning that if you have a question, just go and interrupt me. I much prefer that over questions at the end. So that all the questions we can have are right on the topic that we're discussing. And I would also much prefer if you guys would lead the talk
00:41
into the right directions by your questions, instead of just me picking a couple of things that I like to speak about. So yeah, I have a couple of slides prepared here. I will start with an introduction to systemd. And then hopefully, based on your questions, we can touch particular areas in more detail.
01:02
So let's jump right in. This slide contains the original description of systemd from the systemd website. It's a long, long paragraph. This is the first half sentence of it. It's a paragraph with a lot of information on very, very little room.
01:20
It's not necessarily easy to understand. And that's why we're hopefully going to parse it a little bit so that everybody understands what is meant by this. So the first half sentence reads, systemd is a system and session manager for Linux. So what does that mean? A system manager, probably everybody might have an idea. The system is like this operating system thing.
01:41
And a system manager, what we mean by that is that it's an init system, that it manages the system, that it manages the components of the system, meaning that it controls a little bit what processes are being run based on a couple of things. A session manager, on the other hand, the term session manager is probably known to many people, like gnome-session is a session manager
02:01
and kde-session is a session manager. And systemd also manages the session. So basically, it's a replacement in some ways, or it can be used to augment gnome-session or kde-session for Linux. Let's go on. It's compatible with System V and LSB init scripts.
02:21
System V and LSB init scripts, probably everybody knows. At least if you ever came in closer contact with a Linux system, you probably have played around with System V init scripts. They are basically these things in /etc/init.d, some service you can start and stop. We're compatible with that. We're an init system that tries to stay compatible with existing System V or LSB
02:43
init scripts. LSB and System V is basically the same thing in this context. Basically, System V introduced the original concepts, and then LSB standardized around this a little bit and extended the original System V conventions, like standardizing exit codes of those init scripts and then standardizing the verbs
03:02
that you can pass to it and standardizing a comment header and a couple of other things. But it's mostly synonymous. systemd provides aggressive parallelization capabilities. Parallelization, probably many have quite an idea what that is. What that means in an init system context is probably not so difficult to understand either.
03:24
Basically, it just means we start everything that we start in parallel. But the word aggressive here means that it's probably a little bit beyond what existing init systems do. And in a later slide, we'll go a little bit into detail on what precisely this kind
03:41
of aggressive parallelization means. It uses socket and D-bus activation for starting services. That's probably one part here of this paragraph that is only understandable after you had a look on the further slides we have. Basically, it means that we can start services
04:01
if something happens on a network socket or if something happens because somebody required a D-bus server or something like that. So why this is so very useful and it's so interesting that we put it on this initial paragraph, we'll see a little bit later. But yeah.
04:21
It offers on-demand starting of daemons. I figure most people will also get a kind of an idea what that could mean. Could mean, for example, that we start a daemon the moment we use it. For example, if we have a daemon like Bluetooth, which is responsible for maintaining the Bluetooth hardware, that we then start the daemon only if the Bluetooth hardware is actually plugged in.
04:41
But this is actually very generic. It could mean a lot of other things, that if some service requires another service, that we start the service right the moment we actually need it. It's an interesting feature because you do less work, but it's not the core feature. It keeps tracks of processes using Linux cgroups. Processes probably, I hope that everybody
05:01
knows what a process on Linux is, but what does it mean keeps track of it using Linux cgroups? Linux cgroups is a new kernel interface that has been introduced, I don't know, five, no, 10 versions ago. Cgroups is short for control groups. What that really precisely means, I
05:22
have a couple of slides about this later, but this is actually very, very useful to not only start and stop services, but also keep precise track of everything, every process that a service spawns. And that can be quite a lot. For example, if Apache starts up, it can start a gazillion of CGI scripts and whatnot.
05:42
We use Linux cgroups to keep track of them. Again, a little bit more detail about that later on. It supports snapshotting and restoring of the system state. Snapshotting probably people know from context of databases or stuff like that. It basically means that you take a snapshot of the system like it is, you store it away, and later on you
06:01
can return to it. systemd supports that, so you can say, OK, I've started this service and that service and that service and that service, but I haven't started that and that and that. Then you say, save that state. Later on, you return to it. This is, for example, useful for quite a few things, like you're administrating your machine. You have Apache started and everything else.
06:22
Now you say, oh my god, I need to administrate something, and I need to make sure that nobody and nothing does anything to the system at this time. So you change to single user mode. You do your changes. You can be sure that nobody interferes with you because you're the only one. And then you return to the original state again as if nothing happened.
06:43
It maintains mount and automount points. This is probably something surprising to many people because init systems so far didn't really do mount and automount handling. Mount points, probably everybody who ever dealt with Unix has an idea what it is. Automounting, not necessarily. Automounting is a lot like normal mounting, except that instead of actually mounting a file system
07:03
to someplace, you just mount an automount point to it, which is something like a magic thing that just stays there. The moment somebody accesses it for the first time, it is backed by the real file system. In systemd, we use that to parallelize boot up. And we also use it to delay certain jobs during boot
07:23
up so that we don't have to do them right away during boot but only when they actually need it, thus speeding up the boot up. It implements an elaborate transactional dependency-based service control logic. That is quite a hard sentence there. Transactional, probably everybody
07:40
heard in the context of databases. You have something where a couple of operations, you bind them together, call them transaction, and you either execute them or you don't execute them, but you do not half execute them. In systemd, we have a very weak definition of transaction, but we have one in there. Basically, that means if you start Apache
08:01
and that pulls in MySQL, then either you end up with both or you end up with nothing, but you do not end up with Apache half started and MySQL half started as well. Dependency-based basically means, I mean probably everybody heard that in the context of package managers, you install the MySQL client and pulls up the MySQL server or the other way around,
08:22
something like that. We have the same thing in systemd that you can say, well, D-Bus requires syslog. So syslog is pulled in by D-Bus. And service control logic basically means, yeah, you can control the logic. You can control the service with systemd. Surprise, surprise. It can work as a drop-in replacement for System V init.
08:41
System V init, everybody knows, I hope so, is a classic implementation of an init system for Linux. It has been used by almost all distributions. Historically, very recently, a couple of distributions changed to an alternative implementation called Upstart. We're not going to talk too much about Upstart here,
09:01
but yeah. So this is a written paragraph that's on the website. It's quite a lot of information for very little text. But I hope everybody has a rough idea of what it actually might be. So on the next slides, we're going to get a couple of these topics mentioned here into a little bit more detail.
09:21
So let's talk a little bit about init. Init, as I kind of mentioned already, is a special process. It's the first process that is started after the kernel is booted up. It's the init system. It's PID 1. And it has magic capabilities. systemd installs itself as one implementation
09:41
for this magic process number 1. It's magic for a couple of reasons. It's magic, for example, because if you press Control-Alt-Delete, this gets forwarded as a special request to PID number 1. If a process dies, then all its children will be reparented to this magic process number 1.
10:02
Every single process that is not a child of something else is automatically a child of this magic process. For this reason, it has a couple of additional requirements, additional constraints that it needs to implement. Because basically, the entire user space depends on this to be running and to be controlling
10:24
everything. So yeah, as mentioned, there are a couple of implementations. These are roughly the big ones: System V init, and Upstart, and now systemd. I like to believe that systemd is the most advanced of those three.
10:41
So by the way, if anybody has any questions to all of this, just ask. Raise your hand, and there's some people here with microphones who will then give you a microphone. If anybody has a question, ask. The next topic we'll touch is parallelization. Parallelization is one of the key things that systemd is about. Yeah, as mentioned, probably everybody
11:01
has a rough idea of what that means. It means if you boot up your machine and you start a couple of services, depending on what you're running, it might be, I don't know, up to 50 services or so. Then we start them as much as possible in parallel, so that whenever the CPU has nothing to do, it can do something else. We have this wonderful graph here, graphic here,
11:22
which tries to explain the way that systemd implements parallelization and how traditionally parallelization was implemented or was not at all implemented. To the left, we have this traditional system 5 parallelization. It's basically how most of the distributions five years ago
11:41
worked, and actually Fedora until Fedora 14 still works. This shows you basically the order in which four services, we just picked four services for the graphic here, are started. Syslog, Dbus, Avahi, and Bluetooth. There are a couple of dependencies here between these services, which
12:02
is why this order is the one that is actually used here. Dbus uses Syslog. So Dbus has started second, and Syslog has started first. Avahi uses Dbus and Syslog, so it started after those two. Bluetooth also uses Syslog and Dbus and started after those two. Bluetooth and Avahi, however, they don't have any dependency. Bluetooth does not use Avahi, and Avahi does not use Bluetooth.
12:23
However, since traditional System V boot-ups, like Fedora implemented them up to Fedora 14, were strictly serialized, this meant that we still had to pick an order. We had to start one first and the other one second. In this case, we just picked the alphabetical order because we didn't know any other thing to do.
12:41
Of course, it could have been started the other way around. There would not have been a problem. Now, a couple of people looked at this and said, oh my god, yeah, well, the ordering between Syslog, Dbus, and Avahi we can't do much about. And the one between Syslog, Dbus, and Bluetooth, neither. But we could do something about the ordering between Avahi and Bluetooth. Because it can be started parallel,
13:01
we should start parallel. And then people came up with this, the middle kind of parallelization. Syslog and D-Bus are still started one after the other. And Avahi and Bluetooth are started afterwards. But Avahi and Bluetooth are started at the same time. This is the traditional parallelization, how Upstart works, and how SUSE updated the classic System V boot
13:21
process. It's an improvement. I mean, if you look, the arrows altogether basically should give you an idea how long this takes to bring up all those four services. And you notice that the traditional System V took like four arrows, and this one just takes three. So it's a little bit faster. But it's not as good as we do it in systemd.
13:42
Because in systemd, we actually start all four of them completely in parallel. And that is kind of surprising. How can we do this? Because there is still a dependency between syslog and D-Bus, and between Avahi and D-Bus. How do we actually pull it off that we can actually start them completely in parallel? And this is a technology called
14:01
socket-based activation. It's something that Apple pioneered in launchd, which is a core part of the Mac OS operating system. They basically looked at these kinds of boot-up graphs and thought, hmm. So if we look at all of this here, why precisely is it
14:21
actually that Avahi has to wait for D-Bus and syslog? What is the one thing that Avahi waits for? What is the one thing that D-Bus waits for in syslog? And they looked at that, and looked in all detail. And then they noticed, it's about the sockets. It's about the sockets that are created. The syslog socket /dev/log that is created, that is
14:42
bound to by syslog, that D-Bus waits for before D-Bus can start up, because D-Bus wants to connect to that socket and write messages to it. And then they looked at the other dependencies and said, OK, yeah, so why exactly is it that Avahi has to wait for D-Bus? And again, it's about the sockets. Avahi wants to connect to the D-Bus system socket. It's a socket called /var/run/dbus/
15:03
system_bus_socket. And they looked at it and said, well, if it's really just about the sockets, can't we somehow do something about that so that we can start things in parallel? If the socket is really everything that it's waiting for, wouldn't it be possible to somehow speed that up? And then they came to a solution.
15:20
And the solution they came to is that they pull the actual socket binding out of the daemons, do it in one big step in the init system itself. And then they just pass these sockets pre-initialized, pre-bound to the actual services. And that's what they did with syslog. I mean, they don't use D-Bus, actually, on Mac OS,
15:41
but they have a couple of other services that work like that. And then they pulled the socket binding out of those, did that in launchd. So in one big step, all the sockets that all the services need, be it AF_UNIX sockets or AF_INET sockets or whatever, they're all created in one big step. And then in a tight inner loop, that gets really, really fast. And then they start every single service
16:02
that is supposed to be started at the same time. And the services get passed the sockets they should be listening on later on. And this is actually really, really nice, because suddenly you can start everything in parallel, because the sockets are already established. So if D-Bus wants to connect to syslog, it doesn't have to wait for anything,
16:20
because the listening was already done before syslog was even started. There's a question. Sorry for that. By the way, I know that I speak very, very fast. I'm sorry for that. If I speak too fast, say something. I'll try to slow down.
16:41
Hi. This socket-based activation seems to lend itself very well also to work across the network. But from what I understand, you rely on the kernel to kind of resolve the dependencies by queuing. Won't that break if, for instance, one of the servers in your cluster hasn't been booted properly yet,
17:02
or if there's, will the dependency resolution still work properly in those cases? So the focus of this kind of socket activation for us is mostly AF_UNIX, actually. It's not so much AF_INET. So it's a little bit different from the traditional inetd stuff, which focused on Internet sockets.
17:21
We also cover Internet sockets. In the case of cluster stuff, where you have an actual network, you probably need to program your stuff more defensively anyway. So you probably need to continue trying to make the connections. Generally, in the local case, the dependencies, if you have cyclic dependencies, whatever you have, they don't become different
17:41
through adoption of this scheme. It just means that you create the listening sockets earlier, it doesn't mean that if there is a cyclic dependency or anything like that, it becomes, they suddenly go away, or suddenly more cyclic dependencies get created. That doesn't really change much in that way. But I would say, if you focus on clusters and stuff like that, you probably should program defensively, you should retry
18:01
because packets get lost all the time. So does that kind of answer your question? Okay, so socket activation has many, many advantages. One of them, as mentioned, is that we can do this drastic form of parallelization, where we can start every single service at the very same time, and then you make
18:20
the best of the CPU and the IO time available. But there's a couple of other additional, really great advantages. One of them is, suddenly, you do not have to encode any kind of dependencies anymore. Because in one big step, all the sockets are actually established. So whether D-Bus uses syslog or not,
18:42
doesn't matter anymore, because it can just connect to the socket and it will be there. And traditionally, you needed to make sure that D-Bus got started after syslog, so you needed to write down somewhere that D-Bus requires syslog. But it's not necessary anymore, because all the sockets are created at the very same time, and everybody can just connect.
19:00
So it's a lot simpler for the administrator and for the developer, because they don't need to think anymore about all these dependencies. It has a couple of other advantages, too. For example, we can actually restart stuff without having this service being unavailable for the tiniest bit of time, even.
19:21
For example, you start syslog, and then syslog crashes, for example, because, I don't know why. Syslog implementations tend to be gigantic beasts nowadays with all kinds of enterprisey SSL and whatnot, so they have every reason to crash. So if they crash and use this kind of socket activation, then this socket they are listening on
19:42
got created by the init system and is there. So if the process goes away, the init system still retains that original socket, and if the init system then notices, oh my god, syslog crashed, and was configured to say, okay, if it crashes, then just restart it, then it will do that, and will pass the original socket again to syslog. And however, this is still the original socket,
20:02
so everything, every message that got queued into that socket is still there, to the effect that nobody will actually notice that a syslog crashed, because not a single message will be dropped. Every single message that is in the socket will be read by the syslog implementation. And this is really cool,
20:20
because you can actually write robust software that can just crash, and the only thing you might lose is one transaction that it was actually processing while crashing. But otherwise, you don't lose anything at all. You can even use this for really amazing stuff like upgrades. You say, okay, my syslog implementation, I got a new version. I can shut it down,
20:41
and I can start up the new one, but because the socket listening is done by the init system, and it still always keeps that reference to that socket, you can do that, and you won't even lose a single log message. You can use it for a couple of other things, too, like you can actually replace the implementations of what you do. And again, syslog is a good example for that.
21:01
We, for example, have a little bit of a very tiny bridge that connects syslog to kmsg, kmsg being the kernel log buffer that you can see with dmesg. We always thought, it's kind of sad, that during early boot, no proper logging is available. In systemd, that's different, because in systemd, we very, very early created that listening socket for /dev/log,
21:22
then spawned this little, little bridge thing that just pushed everything that comes in through /dev/log into the kernel log buffer, and eventually, when the real system goes up, and the real syslog starts up, we can just start them, and replace the implementation of syslog on the fly, without losing a single message. And that is really, really great.
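That bridge really boils down to just a few lines. Here is a rough sketch of the idea, not systemd's actual code (the real bridge additionally parses syslog priorities, exits when idle, and handles errors):

    /* Sketch: read datagrams from the already-bound /dev/log socket that
     * systemd passed to us as file descriptor 3, and push each message
     * into the kernel log buffer via /dev/kmsg. */
    #include <fcntl.h>
    #include <unistd.h>

    int main(void) {
        const int sock = 3;                     /* first socket passed by systemd */
        int kmsg = open("/dev/kmsg", O_WRONLY); /* the kernel log buffer */
        char buf[2048];
        ssize_t n;

        if (kmsg < 0)
            return 1;

        while ((n = read(sock, buf, sizeof(buf))) > 0)
            write(kmsg, buf, (size_t) n);       /* forward one log message */

        return 0;
    }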
21:40
And there's a couple of other things, like with this kind of design, it is the kernel that schedules the execution order for us, because let's, for example, the syslog example again. Let's say you create the socket, then you start syslog. At the same time, you start all kinds of clients off syslog, then they will connect to that socket,
22:02
they will write the messages to it, but they will never actually block on it, because the socket buffer of /dev/log is kind of large, so every time they write a message, they will just write the message, and the kernel will just put it into that socket buffer, and will return immediately. So the clients do not have to wait, ever, for anything. They can just push data into it,
22:21
and eventually, if the socket buffer really runs over, if you really logged a couple of megabytes of stuff, then they will have to wait until the other side caught up, but only in that case. And in the syslog case, it works really, really well, because syslog is really strictly one way. You never expect a reply from syslog. You just push data into it, and syslog takes it,
22:42
but it will never actually respond to you. With D-Bus it's a bit different: every time, the client will write a message to the D-Bus socket, and D-Bus will take some time to reply, but it's only that one application that will wait for it, and at the same time, you can start everything else, and they can also access the socket
23:00
and push data into it and stuff like that, but yeah, the kernel is the one that will order the execution for you, and there doesn't need to be a scheduler in the init system anymore to make sure that the things are started at the right time and make the best use of the CPU. So yeah, this is socket-based activation. It's one of the greatest things that are in launchd,
23:22
and so we thought, okay, this is so awesome, we want that in systemd too, because it simplifies everything, because you don't have to configure dependencies. It parallelizes things like nothing else, and it makes things more robust, because you can change things and replace implementations and stuff like that. Is there a question?
23:41
There's a question. Oh, there's a question, there, there, there, there, there. One thing is, how do you know which sockets have to be created in the first time? It's basically, if somebody wants a service to be started, like you install a syslog implementation,
24:02
then it will tell us not only that it's this syslog binary that should be started eventually, but it will also tell us, okay, please create the /dev/log socket for us. So it's basically, at installation time, you drop in a service file and a socket file, that's how we call it in systemd, that contains configuration parameters. Usually, that's very, very short.
24:20
You just say, listen on a datagram socket /dev/log for me, full stop. So that's really, really short. And you can actually maintain those sockets and the services in systemd completely independently. You can start the syslog socket very early, and then very late, actually, start the service.
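As a rough illustration, such a pair of unit files might look something like this; the unit names and the rsyslogd command line are made up for the example, they are not something the talk spells out:

    # syslog.socket: bound by systemd very early during boot
    [Unit]
    Description=Syslog Socket

    [Socket]
    ListenDatagram=/dev/log

    # syslog.service: only started once something actually writes to the socket
    [Unit]
    Description=System Logging Service

    [Service]
    ExecStart=/sbin/rsyslogd -n
    Sockets=syslog.socket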
24:41
You can even do stuff like this bridge I mentioned. It will actually terminate when idle. Like, if there's no log messages, if it didn't receive any log messages in the last 15 seconds or so, it will terminate itself, because the init system still listens on it. It's not a problem, because the moment somebody actually writes something again to the thing,
25:03
then the init system will notice, oh my god, something wrote something to the socket, but there's nothing backing it. Then the bridge is just started again. Then it will process the message that got queued in the socket, and then eventually, the bridge will terminate again. So you have this kind of on-demand
25:20
starting in this as well. All of this is not really a new idea in Unix. As mentioned, Apple came up with using this for parallelization. But actually, this was known before, in inetd. I already mentioned that earlier. inetd is one of those classic Unix services that have been around for ages.
25:42
It did very similar things, but it didn't do it for parallelization or for robustness reasons. It mostly did it to simplify implementations of daemons and do on-demand starting of daemons. They had a little bit different models. They were mostly focusing on Internet sockets and not so much on Unix sockets.
26:01
We focus more on Unix sockets, but also do Internet sockets. And while it was possible to start services that would then take the listening socket, it was mostly focused on spawning one instance of a daemon for each connection socket. But actually, it supported both ways, and we support both ways too. It's just a little bit of a different focus.
26:20
We want, if you start Apache on-demand, we want that Apache gets the real listening socket, and they, back then, wanted to just hand off one connection socket and have a couple of Apache instances, which is not a recipe to make things fast. And we want things fast. But there was a question. Yeah. I'm assuming that daemons need to be modified
26:42
in some way to understand this, or? That is a very good question. And I have a slide for that. So yeah, we need to patch daemons for this, in some cases at least. It's actually very, very simple to patch daemons for this,
27:01
for a couple of reasons. The code of the daemon actually becomes much, much simpler. Because currently, if a daemon creates a socket that it wants to listen on, what it does, it calls the socket system call, it calls the bind system call, it calls the listen system call, and maybe calls a couple of setsockopts in between. If you use this kind of socket activation,
27:20
then the socket just gets passed to you via process execution. You don't have to do anything. You just take the socket and it's there. So yeah, but you need to patch most of the daemons. We already did most of the work for the stuff that is running by default on a Fedora system. Like we patched rsyslog, we patched D-Bus, and all these kind of things
27:41
so that they actually work like this. It's a really, really simple interface. It's basically, you just get an environment variable which tells you, hey, you got a socket passed, and then you just take it and use it. Also, quite a few daemons already support socket-based activation due to inetd history. For example, sshd, we won't bother
28:02
with actually patching that because sshd can be used to start it per connection via inetd. We say, that's good enough. Apple actually starts it that way too. We don't need any further socket-based activation. We can just use this classic inetd mode. We support that and it works fine. There's a couple of other reasons
28:22
where socket, this kind of patching, becomes very simple. For example, because Apple supports this on MacOS, quite a few software that exists out there is already patched for these kind of, the same mindset, basically. It doesn't use the same APIs as we do.
28:41
Our APIs are much, much simpler. Our APIs really are just check the environment variable and just use the socket, while with the launchd APIs, you have to check in and say, yeah, give me those sockets, and it's kind of complicated. But if the software already got patched for launchd, it's very, very simple to update it to support our mode as well.
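For illustration, the daemon-side check can be sketched roughly like this; real daemons would rather use sd_listen_fds() from systemd's sd-daemon.h, the socket path here is just an example, and error handling is trimmed:

    /* systemd passes pre-bound sockets as file descriptors 3, 4, ... and
     * sets LISTEN_PID and LISTEN_FDS in the environment of the daemon. */
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    #define SD_LISTEN_FDS_START 3

    static int get_log_socket(void) {
        const char *pid = getenv("LISTEN_PID");
        const char *fds = getenv("LISTEN_FDS");

        /* Were we socket-activated, and is the fd really meant for us? */
        if (pid && fds && atoi(pid) == (int) getpid() && atoi(fds) >= 1)
            return SD_LISTEN_FDS_START;   /* socket already bound by systemd */

        /* Otherwise fall back to the classic socket()/bind() dance. */
        struct sockaddr_un sa = { .sun_family = AF_UNIX };
        strncpy(sa.sun_path, "/dev/log", sizeof(sa.sun_path) - 1);

        int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
        if (fd < 0)
            return -1;
        if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }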
29:01
So yeah, you have to patch, but it's really, really simple. So regarding this socket-based activation, have you given any thoughts to services that are associated with the user sessions? That is a good question. So yeah, later on, if you saw the original slide, it was not only saying system manager,
29:21
but also session manager. So eventually, we do not only wanna manage the system with it, because the system boots up fast is one good thing, but that the session, after you're logged in, boots up fast is almost as important, might actually be more important on most desktop machines because we start a shitload of stuff in the sessions nowadays. So yeah, we definitely want to also start GNOME like this.
29:45
It's not on the to-do list right now, because first, we need to get into distributions for the system stuff, but it's going to be the next thing that we look into. Actually, you can already run it like this. You can just run systemd --user, and it will work as something like a session manager,
30:00
but you can't really use it yet to spawn up GNOME. In the long run, it's our secret hope, but we haven't really figured out all the details that we basically pull out the session management, take this away from the different desktop environments, and just have one simple one that everybody can use so that you can, instead, that you have a GNOME session, a KDE session, and then something else.
30:21
People just can mix and match stuff, and there's only the Linux session, which would be systemd. But yeah, all these things, like parallelizing boot up, they are the same problem for starting up the system and starting up the session. I guess that answered your question. Okay, there's another question.
30:43
Yeah, just a quick note. It's a good idea to try to replace GNOME session and KDE session and XFC4 session, for example, but you still have to keep in mind that systemd will only work on Linux. So... I didn't follow? systemd will still only work on Linux,
31:01
so we still need the traditional session manager on the other operating systems. So yeah, in systemd, we do care only about Linux. This offers us so much advantages that we, yeah, I personally don't care about the other operating systems, basically. If people care, it's their problem.
31:25
The thing is, if we focus on Linux, then we have so many opportunities because we can use all those Linux-specific APIs. And Linux is simply the most advanced kernel there is, and it has all these awesome things, like Linux cgroups, all these kinds of properties. And if we would want to make these portable
31:42
to other Unixes, which basically, I don't know, was the API stood still in the, I don't know, 10 years ago, we could do this, but we couldn't use all these functionality. Also, it makes our code much, much simpler because we do not need to care anything about abstraction. We do not need to abstract kernel interfaces
32:03
because we could just develop it on Linux. So our code becomes much shorter, it becomes much easier to read, and becomes much more powerful. And we actually use a shitload of Linux-specific interfaces. We use eventfd, we use timerfd, we use cgroups, we use slash proc mark, really, yeah. A lot of those things,
32:20
you can't even do on other operating systems. For example, just watching the mount table, what is mount and what's not, there is not really a nice API for this on any other operating system, but on Linux. So, well, I mean, I do respect, if people want to spend time on other operating systems, they may, but I think the focal point
32:43
of free software development is Linux. And I don't think I need to care, oops, I need to care too much about keeping compatibility with 10 operating systems. Just one is good, and it's going to create the best product and I think most of the people who actually do the work in GNOME
33:00
and do the work in any of the other operating systems do mostly care about Linux. So, yeah, basically my position on all of this is, yeah, my focus is Linux. I offer you something for Linux. If you care about something else, I won't make your life extra difficult. I will respect that, but don't ask me to care for FreeBSD or whatever.
33:24
I won't. There's another question. What do you do if you have a dependency relationship that's not based on sockets, so it's some other dependency? So socket activation is not the only kind of activation we support. We also support D-Bus activation.
33:41
So on current Fedora systems, actually, by default, more D-Bus services are installed than actually services listening on sockets. So everything I told you about socket activation, we also support for D-Bus activation, meaning if somebody wants to talk to a service that is not available on the bus right now, it will be spawned.
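Concretely, the hook for that is a single extra line in the service's D-Bus activation file that points the bus back at the corresponding systemd unit. A sketch, with the service name and paths invented for the example:

    # /usr/share/dbus-1/system-services/org.example.Frobnicator.service
    [D-BUS Service]
    Name=org.example.Frobnicator
    Exec=/usr/sbin/frobnicatord
    User=root
    SystemdService=frobnicator.service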
34:00
This actually exists in D-Bus anyway, but in this new scheme, it will actually forward to systemd. systemd will start the service and it works like this. This is much nicer than the traditional stuff, actually, because you can actually start a service, for example, Avahi. Avahi, I hope everybody knows that it's service discovery. You need it if you have a network.
34:20
You need it if an application requests for it via socket, meaning it does an NSS request, and you need it if somebody requests it via D-Bus. So these three triggers, and if nothing of this happens, you have really no reason to start Avahi because if you don't have a network and no application wants to use it, why start it? So in systemd, we have this scheme in there
34:42
that you can have three triggers, hardware, D-Bus, socket, and they will result in just one instance being started, and they're completely race-free and completely safe. And yeah, does this kinda answer your question? Yeah, thanks, that was great. So if you look at the man page, you'll see we have a couple of other triggers as well,
35:03
like we have mount triggers and then all these kind of things. So you have quite a bit of flexibility. There's another question somewhere? No, they're working on this, and there's an XPC to everything that's out there. Oh, by the way, so what Kay just,
35:20
by the way, this is Kay Sievers, he is the other systemd guy here. So if you have questions, you can not only ask me, but also him. But what he mentioned is that while in systemd, most of the dependencies completely go away, you don't have to configure them anymore, you still can if you want. And this is necessary, for example, for early boot up,
35:41
because at early boot up, you have dependencies like you wanna first set the clock and then do a couple of things like writing log messages because you wanna make sure that the right clock is used when writing those log messages. So sometimes you do need explicitly configured dependencies,
36:01
and systemd exposes a very elaborate system, if you want to. But normally, for most of the services, most of the services like syslog, D-Bus, whatever, you actually do not have to configure a single dependency. It will all happen for you. So the dependency configuration is mostly something for the people who actually build the operating system for you, like the upstreams
36:21
or the Fedora maintainers and the Debian maintainers and all these people, they will use explicitly configured dependencies. But the people who develop services, the administrators who want to write service files for existing services, they don't actually have to deal with that anymore.
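When you do need such an explicit ordering, like the set-the-clock-before-logging case from a moment ago, it is only a couple of lines in the unit file. A sketch, with unit names invented for the example:

    # something-that-logs.service (illustrative names only)
    [Unit]
    Description=Early-boot service that writes log messages
    Requires=set-the-clock.service
    After=set-the-clock.service

    [Service]
    ExecStart=/usr/sbin/something-that-logs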
36:42
I don't really understand how this is supposed to be a drop-in replacement for System V init if you need to patch the daemons. You don't want to think about the other operating systems, but you expect the daemon people to patch specifically for systemd and Linux.
37:01
So to make this really clear, you do not actually have to patch anything. You can if you want to. But systemd supports classic System V init scripts just fine. It will read them as if they were native configuration. It's just a different way to write configuration down. So if you don't want to use a socket-based patched,
37:21
I don't know, implementation of MySQL, you're welcome to. So far nobody has patched it anyway, so you can't even use it in a socket-based activation. So there is no need for you to patch. We welcome you to do this. We think it's a great improvement if you do, because you can get all the robustness. A lot of people agree with us that it makes sense. For example, really not so mainstream software
37:43
is even patched these days, like Dovecot, that IMAP server, actually supports socket activation. But yeah, you really don't have to. You can continue to use System V init scripts. You can continue to not use socket activation. But if you do, you gain all the advantages that we offer you, that you get rid of all the manual dependency configuration,
38:01
that everything can be started up in parallel and all these kind of things. That answers the question, I hope? Okay, there's another question. Somebody got a mic? Hello, yeah. I didn't understand the last answer you just gave there
38:22
about optionally supporting system v init scripts. Surely, for the dependency resolution process to work, you have to have a full set of dependencies. If you only have a few exposing their sockets and then a mishmash of half-assed system v init scripts that no one has been bothered to port, then the functionality that you're relying on
38:42
to make systemd work just won't work very well. It'll be pointless. But it should really be all or nothing. No, it's not all or nothing, because actually, if you install 10 services on your machine, like an IMAP server and Apache, then those do not actually have dependencies between them. But you're right.
39:00
If one of the things that are more at the root of things is not socket-based activated, then you cannot parallelize the stuff that's started afterwards. That is true. But in real life, that actually doesn't happen that much, because at least as far as we see it, all the core stuff, we have patched. Syslog, D-Bus, blah, blah, blah, blah, blah.
39:20
But yeah, you're right. And in some cases, like for example, for the D-Bus activation, it becomes much easier, because D-Bus activation already existed, so manual configuration of dependencies is never necessary. But you're right. If you have a long chain of System V init scripts that use classic dependencies between them, then yes, the system will make use of that. But so basically, my look at this is,
39:44
if you start converting, then start at the root. Don't start at the end. And then the problem doesn't exist. Okay, any further questions at this moment? There's one. So when you first actually blogged about this and did your first release, one of the things you talked about was aggressive starting of daemons
40:01
versus the sort of start them whenever they become required. And you'd put forth a few statements about how you thought that one would be faster, but you needed further testing. I'm curious, I have not seen you post anything about any further testing you've done, and what has been the performance impact of that?
40:21
So yeah, interestingly, our focus is not only speed. It is speed too, but the central focus is to make things correct. That's what we wanna do. We wanna have clean code that does things correctly. So if you look at systemd and everything is started in parallel, then you have this problem that on a classic rotating disk,
40:41
actually, this does not necessarily improve the performance. The reason for that is that while traditionally, if you started everything one after the other, then because all these blocks that these applications used were on disk one after the other, you would actually have linear reads through the disk, and that is the best thing you can do on rotating media. If you however use systemd and suddenly start everything in parallel,
41:02
then basically the read requests to the hard disk come completely randomly, because there's this service which needs access there, and then the next one needs to access some completely different place, so yeah. This did not result in an extreme improvement in speed if you use rotating media. However, this is not unfixable.
41:22
The reason it's not unfixable is that we have these elevators, these IO elevators, and a good elevator benefits if it gets a lot of requests that it can choose from. So far the classic Linux elevator wasn't very good at that, but we're now generating those workloads, and elevators tend to be optimized
41:41
for the workloads that they have. But the summary of it all: it isn't worse than the current stuff, and it's much better on SSD, and the future is SSD anyway. This is of course a little bit disappointing for rotating media, because rotating media is still, I guess, what most of the computers, or the majority of computers,
42:00
probably still use. It's a little bit disappointing. So we looked a little bit into this, and tried to find a couple of fixes. systemd actually comes out of the box with a readahead implementation. Readahead implementations are something of which actually at least five or six exist already. A readahead implementation is basically,
42:21
they look at one boot, deduce from that the sectors and the order in which they were accessed, and the next time, very early at boot, they read all those sectors, in the right order that they are on disk, thus optimizing things, under the assumption that they're then already in memory when they're actually used, and that speeds things up. We installed it by default,
42:42
to remedy this problem. It gets us about 10%, depending on what kind of machine you have. If you don't start any services at all, like if you have very little to start, then of course the speed up of this stuff will be minimal. If you start a shitload of stuff, then you will actually notice things. But yeah. But to be honest, I don't really want to really get too much
43:01
into optimizing this with stuff like readahead, because readahead is actually not a nice thing. Because what you do with readahead is that you second-guess the IO scheduler of the kernel, because you actually try to be smarter than the IO scheduler of the kernel. So if we really wanna do this, then we probably should upload the requests
43:20
that we know will happen to the IO scheduler, so that the IO scheduler, which has much more information about the actual seek times of the disk, can make the decisions. But we talked to a couple of kernel guys, like, what's his name? Anyway, the IO scheduler guys.
43:41
How about all of this? And to be honest, the interest in read ahead is not the biggest, because they always say, well, if you want it fast, use SSD, and all these problems go away anyway. So yeah, I hope this kinda answered your question. Well, I have a follow-up. I was probably speaking more directly in terms of Dbus, because obviously the way that systemd works,
44:01
you start up everything in parallel up front that's socket-activated, but then Dbus sorta trickles along. As dependencies arise, as things actually get called, then the process actually starts to handle the Dbus request. So essentially, assuming, for instance, a typical desktop, a boot scenario, your desktop basically becomes available
44:21
after the last daemon that had some request was started. So in that sense, you are parallelizing everything up front with socket activation, but with Dbus, you're actually creating a long trail of dependencies. So has there been any thought about optimizing that? Well, I mean, we can start Dbus at the same time as the process is.
44:41
Also, I mean, there's a lot of stuff. The desktop can also appear while, for example, PulseAudio is still starting, because if PulseAudio is starting and you wanna have the welcome sound, there's no need that this in any way actually delays the graphical stuff. But yeah, I mean, D-Bus is probably the central dependency of the desktop in this area.
45:03
But I mean, yeah, we can't parallelize it already. And yeah, of course it's a bottleneck, and there are a couple of people working on Dbus optimization, but to be honest, Dbus actually starts really, really fast these days. I don't feel too concerned about that. Thank you. Are there any questions at this moment?
45:21
I don't see anything, so let's go to the next slide. So yeah, one nice thing that we actually can do is parallelize file system jobs. So if you have a computer these days and it has a couple of hard disks, then you actually run fs check at boot. And then the entire boot waits
45:40
until the fs check finishes, and then you mount everything. As mentioned, systemd actually supports auto-mounting. And that enables us to actually parallelize the fs check with the actual startup of the system. Because what we need to do, we need to of course do the fs check of the root file system, but we do not actually need to wait until /home becomes available.
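Roughly, what that ends up looking like in unit terms is a mount unit paired with an automount unit for /home; the device and file system type here are made up for the example:

    # home.automount: installed right away, before /home has been fsck'ed
    [Unit]
    Description=Automount point for /home

    [Automount]
    Where=/home

    # home.mount: the real mount, set up once the file system check is done
    [Unit]
    Description=/home

    [Mount]
    What=/dev/sda2
    Where=/home
    Type=ext4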
46:01
Because what we can do here is we start fs check to file system check the home directory, the home file system, and while that is active, we already install the automount point for /home, and continue booting. And then everything will work fine until the moment some service or some user logging in actually accesses it, because, for example, Samba wants to share it.
46:21
In that moment, Samba will do the file system access. This request, via the automount point, will go to systemd. While that actually happens, Samba will wait for that. It will automatically be frozen by the kernel. And eventually, when the file system check has finished and systemd has replaced the automount point by the real mount,
46:42
the execution of Samba will just go on. So all this kind of parallelization that we have with socket-based activation, or bus-based activation, where we can start syslog and D-Bus and all this stuff in parallel, we can extend to the file systems. We can run fs check, the quota check, and everything else at the same time as other stuff that is starting on the system
47:04
is still in progress. So that's the question. And there's another one. Just a question about that. How does systemd interact with early user space, and as an example, for instance,
47:20
like crypto roots and things like that? So, systemd nowadays is not only this init system that you can install, and where you then can integrate your classic shell scripts with. You can do that if you want. systemd nowadays tries to standardize the entire boot process for you. We looked at all the boot processes
47:40
of the different distributions, and they all have these gigantic init scripts that do the early boot stuff, and we looked at them and noticed they all do the same thing. And they all do it completely differently, and in a gigantic shell script, that is a horrible mess, usually. So what we did is, we said, we can do this better, because shell scripting isn't necessarily nice. Shell scripting is necessarily slow.
48:02
It's necessarily slow, because it involves a shitload of forking of processes. We thought we could do this nicer. So what we did, we looked at this over a longer time, and always picked these little pieces out of it, and implemented that in C code. For example, most trivial thing, setting the hostname. We thought, well, if it's just about reading one configuration file and calling the sethostname system call,
48:22
why do we need to fork a process? And then we looked into a couple of other things like this and then said, okay, let's find a nicer place where we can just do this in C. And for the hostname stuff, for example, we said, okay, it's in systemd itself. systemd itself, when it boots up, will now set the hostname for you. And so we don't need to do that in the shell script anymore. So we covered actually everything now
48:42
that is in the default Fedora boot. If you boot F15 and didn't install any kind of magic stuff like NFS or something like that, you will not have executed a single shell while the system boots up, because everything is now done properly in C.
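To give an idea of how small such a piece becomes in C, here is a sketch of the hostname case; this is not systemd's actual code, and error handling is trimmed:

    /* Replace a shell snippet like `hostname "$(cat /etc/hostname)"` with a
     * few lines of C inside PID 1, so no extra process needs to be forked. */
    #define _GNU_SOURCE
    #include <limits.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int set_hostname_from_file(const char *path) {
        char name[HOST_NAME_MAX + 1];
        FILE *f = fopen(path, "re");

        if (!f)
            return -1;
        if (!fgets(name, sizeof(name), f)) {
            fclose(f);
            return -1;
        }
        fclose(f);

        name[strcspn(name, "\n")] = 0;           /* strip the trailing newline */
        return sethostname(name, strlen(name));  /* the actual system call */
    }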
49:00
For the dm-crypt stuff, we actually provide something: systemd is nowadays extensible. So you can actually create systemd units during boot-up. These units are basically what systemd deals with: a unit is a service, a unit is a socket, a unit is a mount point, and stuff like that,
49:21
and you can have dependencies between them if you want to. So we have this plugin which looks at /etc/crypttab and automatically generates units from it on the fly, and systemd then reads them and can integrate that into the usual flow. The effect of all of this is that the crypto setup stuff can be executed in parallel
49:42
with fs checking the next file system already, because instead of having this gigantic script where everything bit by bit is executed, we can actually parallelize that completely, and because all this unit stuff in systemd is perfectly parallelized, this is actually quite an improvement. So yeah, in systemd, by default, if you install things, you actually get support for crypto stuff.
50:01
It's an optional dependency; probably all the embedded people don't want to use it, because they don't need crypto stuff, but on the desktop we probably all want it, and so we support that. But your Samba example, is it really a good idea to start up Samba if the file system really isn't accessible and won't be for minutes or potentially hours
50:22
if it's running fsck and waiting for it to finish? And that's a question for other services as well. Sometimes you really want it to be available only if it will have a response time that's reasonable for the system as it is set up. I mean, do you take that into account?
50:40
I'm not sure I understood the question. You give the example of samba, starting it up, and it will automatically block if it tries to access the file in the auto-mounted file system, because the file system is being checked. But that might take minutes or hours. So sometimes some administrators might not consider it
51:04
a good idea to pretend that the service is available, even though it won't actually be in practice. So my reply to that is, in the traditional mode, if the file system really takes half an hour or something, then in the traditional system, your boot took half an hour to boot. So we allow you to already run the samba
51:22
at a much, much earlier point, or something else, for example, you can already log in via SSH at the point where Samba is still waiting for the fs check to complete. But my reply to that is, in systemd, all operations, really all of them, actually time out, and you can actually configure that, to the effect that if something really takes ages, we will just go on booting, for example,
51:42
or basically configure what happens. So the idea in this case is, if the file system check takes too long, we will actually fail this request to samba. samba will get a clean error code, but we'll just get this thing like, yeah, not available, EIO or something like that, and can then continue from that. But yeah, I don't think that actually
52:01
really something changes there. If Samba previously had to wait for the fs check for half an hour, this is still what happens, except that this is delayed to the moment where it actually accesses the file system. Well, there is a difference in that Samba actually appears on the network, it's visible, so clients might try to use it.
52:21
Well, there is some point in that, but yeah. To be honest, I don't think that in the future, all the file system checks will be that slow, I don't know. But yeah, it's a valid point, that it will appear available, accessible for a while, but if you actually access it, it will time out after a minute, that is true.
52:40
Okay, my time's up, so thank you very much for your interest, and if you have any questions, then I won't bite, and just ask me or Kay, and yeah, we'll be available for your questions all the time. Thank you.