
systemd in 2018

Transcript (English, auto-generated)
Hi, I'm Lennart Poettering, and I'm going to talk a little bit about what's new in systemd in 2018. This talk is basically just a collection of a couple of things that I found interesting in the development of systemd over the last year, basically since the last All Systems Go!
There is no particular order, and that's completely fine if we don't cover everything that I have on the slides here. So if you have any questions regarding any of the topics I'm bringing up here, by all means do interrupt me and ask questions right away.
I'd very much prefer if we can make this interactive rather than just doing the questions at the end. That said, I've only got half an hour, so I should probably get started. So, as mentioned, in no particular order. The first thing is portable services. I'm not going to talk much about this because I actually have another talk later today, in an hour or so, just about that.
But it is a biggie, so I put it first. What I actually do want to talk about is boot counting. This is something that's not merged yet, but it's pending as a PR against systemd, and we're probably going to have it with the next release. Boot counting matters if you build operating systems that shall be somewhat resilient to failure
and that can actually recover automatically from failed boots. You need something like boot counting, meaning that the boot loader needs to somehow know if an update was successful
and if it wasn't, revert back to the old version of the operating system. There are various operating systems that implement something like this. For example, Chrome OS does, and so does CoreOS. But most of the general-purpose distributions have nothing like this, and all the solutions so far have been local solutions for the specific operating systems.
Nobody tried to make this a commodity, to make it generic so that it's generally useful for general-purpose Linux distributions. With this feature set in systemd, we want to generalize the concepts around it, as well as provide one specific implementation for one boot loader, the one that we ship ourselves, systemd-boot.
But everything this entails is actually generic enough that you can hook it up to other boot loaders, and there has been work to make it work with GRUB as well. How this will ultimately be noticeable to users is that when boot counting is enabled, the boot loader will boot one version of the operating system, or rather of the kernel,
and then some tests can run after the kernel has booted up and figure out if everything's okay. And only when these tests pass will the boot menu item be blessed. And if it's blessed, then the next time it's going to be booted again.
If it's not blessed, a counter is decreased, counting down from three or so towards zero, and when it reaches zero this entry will not be attempted again. So if you do high-reliability server stuff or embedded stuff, this is interesting to you; if you do desktop stuff it's probably not so interesting, because if there's a human sitting in front of the computer this matters less. But for everything else it is.
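To make the counting concrete, here is a sketch of the naming scheme the feature ended up using once it was merged as "automatic boot assessment" (so this reflects the later implementation, not the slides): the tries-left and tries-done counters are encoded in the boot loader entry filename, and systemd-boot renames the entry on every attempt.

    fedora-4.18.conf          no counter suffix: boot counting not active for this entry
    fedora-4.19+3.conf        freshly installed entry, 3 tries left
    fedora-4.19+2-1.conf      after one unsuccessful attempt: 2 tries left, 1 done
    fedora-4.19+0-3.conf      all tries used up: entry is considered bad and is skipped

Once a boot is blessed from userspace, the counter suffix is removed again and the entry counts as good permanently.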
There was a question somewhere. Do we do the microphone? Or I can repeat the questions if that's quicker.
The revert happens on the next boot. Okay, so the question was what happens with failures regarding, for example, video drivers, which usually require a user to confirm that everything's okay. Actually, it's a good point. One of the Fedora people, Hans de Goede, has come up with exactly that issue. They want, for the desktop stuff, something where only after GNOME Shell has been booted up
and the desktop environment has figured out that everything's okay as well, is the particular kernel that was booted blessed. The PR that is almost ready to merge does not have that functionality yet,
but most of the concepts, the general ones, are extensible to that point, and it's very likely that before we do the next release we'll also add that functionality so that this can be used in Fedora right away. Okay, let's talk about the next one. Something that is also pending as a PR: you know, nspawn is this small container manager
that is inside of systemd. It's like chroot on steroids. It's what we wrote to test systemd with. It's actually generally useful now. I have added OCI runtime support. You know, OCI is a specification that came out of the Docker container world that's supposed to generically define what containers are supposed to look like.
nspawn pretty much implemented everything needed to do OCI stuff, except that it didn't implement OCI itself. So given that it looks like, maybe, hopefully, people can agree that OCI is the way containers are put together, it made sense for us to support that in nspawn natively. So yeah, the idea is basically that with the basic building blocks of the operating system
you can just run your containers, with nspawn as the executor. This is not going to solve how the containers actually get onto the system; that's for other people to solve. But I think the long-term goal that would actually be useful is that Kubernetes could just use this thing directly
so that the actual execution of the container is no longer something that people have to think about in the upper layer, but is actually just functionality of the operating system itself, as long as it's an OCI container. So this is also pretty much ready and just needs to be finally reviewed and merged into systemd, and hopefully then it just works for everybody.
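As a sketch of how this looks from the command line once merged (the exact option name is taken from the later nspawn documentation, so treat it as an assumption rather than something shown in the talk):

    # run an unpacked OCI runtime bundle (a directory with config.json and rootfs/) directly
    systemd-nspawn --oci-bundle=/var/lib/machines/mycontainer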
But then again, I mean, I tested with a couple of OCI containers, but before this ends up in the big distributions it will probably require more testing with real-life containers. Any questions on that? Otherwise, next one. This one is an interesting one. This is already in systemd.
Like, if you follow systemd development, you might have noticed that we try to put a lot of focus nowadays on sandboxing of system services, the idea being that most operating systems are still put together mostly out of system services, and if we add sandboxing to those, we can generally make operating systems a lot more secure
because we have so many different services, and they tend to be imperfectly written because they're written by humans. Yeah, so system call filters have been implemented in systemd for a while, but they were not overly useful because you had to figure out exactly the system calls you wanted to allow and the system calls you didn't want to allow.
And it was mostly blacklisting, so you basically told the system not to allow Apache to change the clock, say. In general, though, if you do security, you usually prefer whitelisting, where instead of saying that Apache is not allowed to touch the clock, you just list what it is allowed to do.
This is not easy to do, though, because there are so many system calls. Since the latest release, we have this system call group called @system-service: we sat down and tried to figure out what a good set of system calls is to allow regular system services by default, the basic set of system calls that everybody needs. Then we gave that a name, and the idea is that basically from now on,
people who put together system services will just enable this group, plus the couple of individual system calls they need that are not in this group, for example the right to change the system clock. So the idea is really that we want to push people to do whitelisting of system calls by default
and make that easier than it used to be. Questions regarding that? Otherwise. So the question was which is the most controversial system call in there, the one we had to argue about most. Good question. I mean, this didn't come out of nothing.
We have had such groups for a while, but those groups were all very small, so previously you had to list a lot of these groups to actually run any regular service, like, for example, Apache. So after learning from that how this actually played out in real life
and trying to look at generic services, like, for example, Apache or nginx, which don't do anything magic. They do very basic stuff; there's nothing particularly kernel-related that they do. And the idea is basically that this group just contains everything that you need to run nginx, but not more. So it's not enough to run an NTP server because that needs to change the clock.
It's not enough to run a network management service because that needs to be able to reconfigure the network. But it is enough to run an HTTP server that just does basic file serving or something like this. But yeah, there wasn't anything controversial; it just came out of observing how things actually are.
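As an illustration (a minimal sketch with a made-up unit, not taken from the slides), a service that needs the generic baseline plus clock access could whitelist system calls like this:

    [Service]
    ExecStart=/usr/sbin/example-ntpd
    # allow the generic @system-service baseline, plus the clock-related calls this daemon needs
    SystemCallFilter=@system-service @clock
    # let filtered calls fail with EPERM instead of killing the process
    SystemCallErrorNumber=EPERM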
No other questions? Let's go to the next one. This one's actually kind of useful. It's a minor thing. You know, if you do service management with systemd, you always have to specify Type= something to specify how the service tells systemd when it's ready, when it has finished initialization. Type=exec is actually something we should have added a long time ago.
It basically means that systemd will consider a service started at the moment of the execve(): when systemd invokes the binary it calls execve(), and the moment the execve() succeeds is when systemd considers the service to be successfully started up.
This doesn't sound surprising in its definition, but given the way Unix is built, it is not the obvious way to implement this. Previously, what came closest to this was Type=simple, but in that case systemd would consider a service started up the instant the fork() completed, right?
Like, if you are a Unix developer, you know that when you start a process you first fork(), and then in the child you exec(). Previously systemd would consider the service ready at the fork(), and now it can optionally consider it ready at the exec(). Why is that interesting? It's interesting because it basically means that systemd will no longer consider a service whose binary is absent as successfully started,
because previously, by the time it reached the exec(), it already thought the service was started. If the binary wasn't there, the exec() would fail, but the service startup would still be considered successful. So with this, things are a little bit more debuggable. But then again, for compatibility reasons we can't make this the new default,
so if you want to use it, you have to explicitly specify it. But it's incredibly useful and, quite frankly, something we should have had from the beginning.
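A minimal sketch of opting in (the unit and binary names are made up):

    [Service]
    # startup only counts as successful once execve() of the binary succeeded,
    # so a missing or non-executable binary now makes the start job fail
    Type=exec
    ExecStart=/usr/bin/example-daemon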
I don't think the mic works, so I'll repeat that. So the question is, why didn't we make it the new default? Why did we make this opt-in? The reason is that between the fork() and the exec(), systemd executes a lot of operations, like dropping privileges and so on, and for a couple of these operations, for example the dropping of privileges, you need to resolve a username.
Resolving a username might need NSS, which might need IPC to another service, and so you suddenly create races, because suddenly something is blocking: systemd as PID 1 will block on an NSS lookup before it starts other services, which it previously didn't. So it's just the risk of deadlocks that we saw there
that meant we couldn't make it the default. Yeah, we didn't actually try whether it still boots if we change it, but just knowing that NSS is a major source of deadlocks, we couldn't switch this. I hope you guys could follow in any way what we were discussing there. DNS over TLS: you know, systemd has this resolved component
that acts as a local DNS caching server. A recent addition, which is merged and even released, is DNS over TLS. The logic behind it is that this appears to be the way DNS is going to work in the next ten years: everybody is going to do this. It basically just does transport encryption,
but the way TLS works is that you always do a TCP connection to a central server, and if you do that, you really want a local caching singleton service that handles it, because it becomes too expensive
if every process did the TCP and TLS connection itself, because there would be huge latency involved. So this is actually, I think, a major step forward, because it gives you a lot of reason to use resolved: you actually start needing it if you want to work with the way DNS is going to work in the future,
simply because you don't want the latency, and resolved gives you exactly this: it caches locally and keeps the connection established, so you don't need to set it up at the moment you actually need it.
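For reference, a minimal sketch of turning this on in /etc/systemd/resolved.conf (the server address is just an example, and at the time of the talk the opportunistic mode was what had shipped):

    [Resolve]
    # example upstream resolver; pick one you trust
    DNS=1.1.1.1
    # use DNS over TLS where the server supports it
    DNSOverTLS=opportunistic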
Any questions about this? This one looks a little bit cryptic. So, something systemd recently learned: this is basically about service management when you write a unit file. You already had the ability to extend a unit file with these drop-in files, right? So if you had foobar.service, you could create a directory foobar.service.d
and drop a file in there called something.conf, and it would be read after the service file itself and could override or extend what the service file did. We have slightly extended this now: we look not just for the unit name plus .d and then everything with the suffix .conf in that directory,
but we'll also look for all dash-truncated prefixes of the name. The idea basically being: if you have a service called foo-bar-waldo.service, then we'll first look in foo-bar-waldo.service.d as before, but now we'll also look in foo-bar-.service.d
and at all the files in there, as well as in foo-.service.d, if you still follow. I probably should have put that on the slides, I guess. Anyway, long story short: a lot of people, when they put together their systems, usually have a lot of services that somewhat belong together, right?
So, I don't know, Samba, for example, comes with smbd and nmbd and so on. They are related services that are usually shipped together, run together, and that, hence, you might want to manage together. With this change here, basically, systemd allows you to,
as long as you follow a very simple naming scheme, where you always name these related services with some common prefix, a dash, and some suffix, extend all of these service files in one go, because systemd now allows this prefix-based extension.
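As a sketch with made-up unit names: a drop-in placed in the dash-prefix directory applies to every unit sharing that prefix.

    # /etc/systemd/system/foo-.service.d/10-common.conf
    # applies to foo-bar.service, foo-waldo.service, and so on
    [Service]
    MemoryMax=512M

For a unit named foo-bar-waldo.service the drop-in directories searched are foo-bar-waldo.service.d/, foo-bar-.service.d/ and foo-.service.d/.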
So, did this make any sense to anyone here? Like, I know this is kind of, wow, surprisingly many. Who did not understand what I was just talking about? Wow. Yeah, well, this is a little bit confusing.
Just imagine this part isn't there. So, if you have a service called foo-bar-waldo.service, then we would first look for foo-bar-waldo.service.d, then we would remove everything after the last dash, right, after the last dash, so actually without this part. Dash dot. Dash dot, yeah.
Then we would remove everything after the dash before that, and so on, so the next thing would be foo-.service.d, right. So the dash always has to be there, because the dash implies that we will do this extension checking, right. It's a very natural extension, we believe, because at least in systemd itself all our services were already named like this,
and if you go through the Fedora packaged services, you will actually see that people implicitly did the same kind of thing, where they always used some common prefix, a dash, and some specific suffix, right. So the idea is basically to make it a lot easier to extend these in one go.
Yeah, I'm supposed to repeat the question: the question was what happens if I create a directory called systemd-.service.d and drop something in. Yes, it will change every single service that we ship, automatically, in one go.
I'm not sure if that's desirable, but knock yourself out. So, next slide. This is super important. We realized that today's graphical terminals
all support a special ANSI sequence that lets you generate clickable hyperlinks in them, and that's just awesome. So, I recently prepared a PR, which got merged, so that everywhere it makes sense in systemd output we now create clickable links in your terminal, and that is really, really nice because, for example,
if you do systemctl status now, you know how the current output looks: it shows you the unit file that something is defined in, and all the others, the drop-ins and so on. These are now clickable links, so in the systemctl status output you can just click on one and it opens the editor or whatever you have configured to have a look at it. It's really nice.
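For the curious, the sequence in question is the OSC 8 hyperlink escape. As a small illustration (not from the talk), you can emit one from a bash shell like this:

    # print "systemd" as a clickable link pointing at https://systemd.io
    # \e is ESC; ESC ] 8 ; ; URL ESC \ opens the link, ESC ] 8 ; ; ESC \ closes it
    printf '\e]8;;https://systemd.io\e\\systemd\e]8;;\e\\\n'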
So, I invite everybody to extend the tools you work on the same way, because, I mean, come on, links. It's almost as good as emojis, right? The only problem with this is that while all the current terminals do implement this,
like the graphical terminals, less does not, right? Like, the pager does not. So, if you use the pager, which we actually do by default because we do git style auto paging in most SystemD tools, yeah, you're not going to see this. So, that's a bit of a limitation. But we hope that less will be updated eventually
to support this as well, and then it's going to be so much better. Any questions about that? No questions about that. Something we recently did is turn on memory accounting by default. The background of this is basically how systemd exposes cgroups, the various controllers there are.
I mean, the cgroup controllers allow you two things, accounting and resource management, right? Figuring out how many resources a service uses, and putting limits on how much it may use. There are multiple controllers. These controllers had, like, different qualities in the kernel implementation, and some of them were really expensive.
So until recently, if you turned on the memory controller, for example, to get memory accounting per service, this slowed down your machine by 10% or so on average, according to Tejun. We have recently turned this on by default because this has changed in the kernel: with current kernels, memory accounting is, I mean, not going to be completely free,
but it's very close to free. So this was enough for us to say, okay, it's enabled by default now. We also turned on block I/O accounting and, not CPU accounting, but tasks accounting. So three of the really interesting ones are enabled by default.
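In configuration terms this corresponds to the following manager defaults; a sketch for setting them explicitly, which is only needed where the shipped defaults differ:

    # /etc/systemd/system.conf
    [Manager]
    DefaultMemoryAccounting=yes
    DefaultTasksAccounting=yes
    DefaultIOAccounting=yes

The same switches exist per unit as MemoryAccounting=, TasksAccounting= and IOAccounting=.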
Effectively, what does that mean? It means that if you type systemctl status on a completely regular default system, you will now see how much memory a service uses and how many processes it has. Unfortunately not yet how much I/O it used, but that's just an omission because we were lazy, not because we didn't want to. I also heard from the guy working on it that we can probably very soon enable IP accounting
the same way by default, because they managed to make the cost of getting that information per service so low that it's something we can enable by default. I'm totally looking forward to this because I think it makes service management a lot more explorable
because you don't even have to do anything magic: it will just tell you out of the box how many resources a service takes up, which is, quite frankly, something we should always have had but never had. What's interesting: CPU accounting is still too expensive. That's very unfortunate. I think it would actually be the most interesting one, like how much CPU time a service actually uses.
So we're going to have to wait a little bit longer for that, but as soon as the kernel guys who work on this let me know that it's safe now and no longer costs 10% or whatever in CPU time, just turning this on will enable that too, and I'm looking forward to that. Any questions regarding that? Something we also added is IP accounting and firewalling.
I mean, we had CPU accounting and management, and we had block I/O and memory, as mentioned. We now also have that for IP packets. So if you turn this on, which, as mentioned, is not entirely free yet, but the kernel people are working on making it free,
then systemd will track per service how many packets have been received and sent by each service, and how many bytes that is. I think it's incredibly useful. There's also firewalling related to that, where you can basically specify IP address ranges that a service may or may not contact. We actually turned the firewalling on for all the services that we ship, by default.
For example, udev cannot access the network anymore, which is a really good thing because udev shouldn't be able to access the network. And we went through all our services, so it's enabled everywhere. I think this is really an awesome feature because it basically allows you to do service-level firewalling, and it's fully dynamic. I mean,
if you do traditional firewalling on Linux with iptables or something like this, you do it at a level where you look at the individual packets that flow through the network, so you have no local context anymore. The packets stand for themselves; you don't really know which program they belong to. With this new stuff that is resolved, because this is inherently local, right?
Like, all the accounting and all this access control is inherently something you configure per service. So I think it's a massive step forward, and it is how local firewalling should work. By the way, the guy who's doing the video implemented this part, so say thanks to Daniel.
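To make that concrete, a minimal sketch of the per-service directives (unit contents and addresses are made up):

    [Service]
    # count packets and bytes sent and received by this service
    IPAccounting=yes
    # whitelist approach: deny everything, then allow loopback and the local subnet
    IPAddressDeny=any
    IPAddressAllow=localhost 192.168.0.0/16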
I don't think I have much time anymore. Let's spend it on questions, if you have any. If nobody has a question... How is the firewalling implemented? Do you use BPF for that? Yeah, we use the per-cgroup BPF packet filter hooks that exist in current kernels.
This functionality is only available if you enable cgroup v2, by the way, which I think should actually be one of the major reasons why distributions should really, really look into turning on cgroup v2 now. The container mess is kind of stopping us from that,
but that's a highly political discussion that is really messy. But yeah, it's all BPF, and if you want to experience this and how awesome it is, you have to make sure that you run a distribution that supports cgroup v2. On Fedora, for example, you can specify that on the kernel command line, and it should just work then.
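For reference, the kernel command line switch in question is systemd's own option for selecting the unified hierarchy; the GRUB detail below is just a sketch of one way to set it:

    # boot with the unified cgroup v2 hierarchy
    systemd.unified_cgroup_hierarchy=1

On a GRUB-based system you might add this to GRUB_CMDLINE_LINUX in /etc/default/grub and regenerate the GRUB configuration.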
Another question? Do you also support live reloading of rules, and maybe some API to change them on the fly while the service is still running? Yes, we do. This is exposed the same way as all the other resource-management properties of system services. So you can do systemctl set-property, some service name,
IPAccounting=yes, and there you go, you have IP accounting. But you can also do the same thing with IPAddressAllow= something, and you can even reset it like this, so it's entirely dynamic, entirely focused on the specific service. The interesting thing is also that this is available for slices as well,
so you can actually build entire trees of this, and the firewall that you specify for a slice and the firewall that you specify for a service inside that slice get merged. So you can say something like: for the root slice, no traffic is allowed, and then a leaf slice, or a leaf service, shall be able to do traffic to this address range,
and then they get merged, so the blacklist at the top gets masked out by the whitelist at the bottom, which is the behavior you want. So yeah, it's kind of cool actually. And it's all available via D-Bus APIs if you like that, and from the shell you can do set-property.
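As a small sketch of the dynamic side (the service name is made up; these are ordinary systemctl invocations):

    # turn on per-service IP accounting at runtime
    systemctl set-property example.service IPAccounting=yes
    # add an allowed address range on the fly
    systemctl set-property example.service IPAddressAllow=10.0.0.0/8
    # assigning the empty string resets the list again
    systemctl set-property example.service IPAddressAllow=

The same properties can be set on slices, for example IPAddressDeny=any on a parent slice combined with IPAddressAllow= entries on the services inside it, which then get merged as described.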
But my time's probably over. Time for one more question, maybe? No more questions? Okay, if you have any further questions, meet me in the hallway track. Thank you very much.