Portable Services are Ready to Use
Formal Metadata

Title: Portable Services are Ready to Use
Number of Parts: 561
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/44485 (DOI)
Transcript: English (auto-generated)
00:06
Hi, I'm Lennart Poettering, I work on systemd. I'm going to talk about "Portable Services are Ready to Use". Portable services are a new concept in systemd. Yeah, I'd like to introduce you to this. I did this talk before, by the way. You might have seen it if you attended DevConf,
00:22
or if you, say, have seen it at All Systems Go. There's not going to be much new in it if you saw it there, and it probably would be cool if there are still people outside. Are there? Yeah, there are still people outside. So if you have seen it, yeah, you wouldn't miss anything. But apparently nobody has seen it yet.
00:41
If you have any questions, completely, totally interrupt me right away. I much prefer my talks to be like conversations, instead of just a Q&A at the end. So do not hesitate, interrupt me, I love that. Yeah, let's jump right in. Portable services, what are portable services?
01:01
Portable services, you can see them two ways. One way to see it is they're system services with some container features, but you can also see it the other way around: they're kind of like containers, but with some system service features. What does that mean? First of all, we have to understand what containers are. I don't really know what containers are.
01:21
Different people have different ideas about containers, what they precisely consist of. I think the definition that I tend to agree with is that they combine three concepts. One is resource bundles. That's what you do with containers: you pack it all up in a tarball, like shared libraries and all the dependencies.
01:41
There's isolation, meaning that they generally run in some form of sandbox. Might be a better one, might be a worse one, but there's at least some form of isolation generally. And delivery: you deliver them on the server you want to deploy them on, and then you can run them there. These three concepts are what at least I find most interesting about containers.
02:03
I'm pretty sure that other people who care about containers probably would list more of that. Portable services try to take some of these features and add them to classic service management. Specifically, portable services are about making resource bundles available
02:21
for regular system services. They're about integration. If you look at the previous slide, I had isolation there. I put integration here because that's what system services really are in comparison to containers: they're generally much more tightly integrated with the host system. And sandboxing. I do not put delivery on this slide
02:43
because I don't really care about delivery. The difference really here is I don't care about delivery, and instead of isolation, I put integration and sandboxing here. Sandboxing for me is slightly different from isolation because the way I see isolation, it's really about creating a new world that is separate from the host you live in,
03:01
while sandboxing I see more as: you're still living in the same world, but you can't do everything you want to do. Another key aspect of portable services is that they're supposed to be highly modular, which is in contrast to classic container management. In container management,
03:21
you generally tend to have to buy into the whole idea, right? And then some people don't do that and, for example, end up introducing the concept of super-privileged containers, which are basically containers where you throw one half of the concept away by turning off all the isolation features. But yeah, portable services are not supposed to be like that.
03:42
Portable services are supposed to be modular. You pick exactly what you want. Do you want resource bundling? And if you want to have sandboxing, you pick exactly how much isolation or integration you want, so it's really supposed to be modular and very, very fine-grained and doesn't require you to buy into the whole idea, but into parts only.
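To make the "pick exactly what you want" idea concrete, here is a hypothetical unit-file sketch: RootImage=, ProtectSystem=, ProtectHome=, PrivateTmp= and PrivateNetwork= are real systemd settings, but the service name, paths, and this particular selection of knobs are made up for illustration.

```ini
# demo.service -- hypothetical example; enable only the knobs you want.
[Service]
ExecStart=/usr/bin/demo-daemon
# Bundling: run from an image instead of the host root file system.
RootImage=/var/lib/portables/demo.raw
# Sandboxing: opt in to exactly as much isolation as you need.
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
# Integration: deliberately keep host networking, i.e. no PrivateNetwork=yes.
```

Each of these settings is independent of the others, which is the modularity being described here.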
04:03
Another way to look at it, consider range from integrated to isolated. If I was a good graphics artist, I would actually have drawn a graph here, but I'm lazy, very lazy, so I just put a couple of words here with an arrow in the middle. Think about a range from integrated to isolated.
04:20
On one side, you have the classic system services. These are System V services or systemd services. They tend to be very well integrated into the host. They live in the same world. They see the same network interfaces, the same file systems. They can create file systems, mount them, and do things like that. They see the same users. They see everything, because they are part of the host,
04:41
and they have full integration. On the other extreme, there's VMs or KVM. They tend to live in their entirely own world. They could be living on a different host after all, and the way how they communicate with the rest of the stuff that runs on the local system is actually across the network, so you get a maximum isolation.
05:02
Docker-style microservices are probably somewhere in the middle. It's not entirely clear if they're more like system services or more like VMs. I put them on my little range here in the middle. Full OS containers, LXC, generally try to be something very similar to KVM. They expose a system that runs an init system inside
05:22
and that you can SSH into. SSHing into is probably something you wouldn't do into Docker containers. Now the concept that I'm going to introduce is portable system services, which I put close to classic system services but somewhere on the range that goes towards Docker. So just to position this on a one-dimensional axis
05:41
of integration versus isolation. Think about what's actually shared and not shared with these forms of virtualization. Classic system services obviously have shared networking. They don't configure that. If I install nginx as a System V service or as a systemd service on my system,
06:01
you don't configure networking explicitly for it. You just use the host networking. While on the other extreme, of course, it's completely separate. They have to run their own network management solution inside of the VM for things to work. If you go along the axis, of course, Docker-style microservices generally do not configure networking,
06:20
but they also don't share it with the host, so you have a relatively... Yeah, but it really depends on how you configure things. LXC generally also is a little bit vague. Think about file systems. Generally, Docker-style microservices do not share the file system; they have a different root. They are relatively separate in this regard.
06:41
Classic system services, of course, are fully integrated. They see the same files and directories as everybody else. VMs live in a completely different world. They have their own block devices, even mount these block devices, and see something that the host cannot even see. Think about PID namespaces. PID 1 in classic system services is the same as on the host, because they don't live in a new PID namespace.
07:03
The PID 55 that a classic system service sees is actually the same PID 55 that's actually running on the host. In Docker-style microservices and everything to the right, that's not the case. They generally live in their own little world where they have PID namespaces. PID 1 inside of it, or PID 55 or 77,
07:23
is going to be something very different inside the container from what the host sees. And then VMs, of course, PIDs are separate too. So you see, yeah, for the PID namespace the boundary is somewhere there. The init system, it's kind of similar, right? Like, where is the cut there?
07:41
If you have classic system services, they, of course, share the init system with the host. I mean, that's what started them. If you do Docker-style microservices, that's kind of weird, because they generally don't have an init system. Like, there's no init system visible from the containers, but they also don't have their own. If you do LXC, then, yes, you generally have an init system inside of the container. I mean, yeah, people can disagree with me on this,
08:04
that, yeah, you can configure it differently. But, yeah, I'm trying to position this in the general case how people tend to actually use this, right? Device access is also, yeah, if you do classic system services, you generally have raw device access, right? Like, you can access the block devices or the sound card or whatever physical devices your computer has
08:23
directly, because you're living on the system, right? This is very different, of course, with VMs. VMs are generally isolated completely. Yes, I know that you can do pass-through and things like that, but that's kind of, that's magic manual work to make this happen. It's not how this works out of the box, right? And logging, on the other hand,
08:40
tends to be much more integrated. Like, even Docker-style microservices tend to have logging provided for them; it's not done by the payload itself. So, yeah, the point of all this is that even though the axis suggests it's a linear axis of integration, it's actually more multidimensional, right? Like, depending on what you look at,
09:02
the cut where you get the integration and the isolation is more to the left or more to the right. I hope this makes sense so far. Okay, then portable services. One of the goals we had with portable services is: leave no artifacts, right? Like, in containers it's kind of a given, but for system services it's not, right?
09:20
What do I mean by that, actually? If on a Linux system you install nginx or MySQL or whatever package you like on your server, right, download the debs or RPMs, install them, and then remove them again, right? This leaves artifacts around, major artifacts. For example, on Unix, users are generally,
09:43
like, you cannot sensibly delete system users, or any kind of users, actually, because if the user has created some directory, some file, and you remove the user, these files will still be owned by the user ID of that user. So the file ownership is sticky, and if the user that the file ownership refers to doesn't exist anymore,
10:00
then you have a problem, and then if the UID gets recycled later on, you have a security issue, right? So this has been a problem, I think, with Unix since forever. Most distributions generally never delete users, right? So that basically means you install nginx once, you remove it again. Yeah, your UID is gone for good.
10:23
With portable services, our attempt was to fix that problem and provide a way in which user IDs can be used but also become inherently transient. We'll talk about what this specifically means in more detail later on. But it's not just about system users, actually.
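The transient-user mechanism systemd grew for this is the DynamicUser= service setting: a UID/GID is allocated when the service starts and released again when it stops, so no permanent account is ever registered. A minimal sketch, with a hypothetical service name:

```ini
# demo.service -- hypothetical name
[Service]
ExecStart=/usr/bin/demo-daemon
# Allocate a transient UID/GID for the lifetime of the service;
# nothing is permanently entered into /etc/passwd.
DynamicUser=yes
# Persistent state is confined to a directory systemd manages for
# the service (/var/lib/demo), keeping file ownership consistent.
StateDirectory=demo
```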
10:41
There should be no artifacts left around. Like, if you ran a service and you remove it again, for example, temporary files should be gone as well, right? So, yeah, it's about binding lifecycles. When a system service starts up, it can allocate a couple of resources. When it shuts down, they are released, and unlike classic system services,
11:00
we don't leave stuff around. Another goal of portable services is to have everything in one place. This is kind of the bundling thing, right? Like, yeah, it's not a new concept, of course, because chroots have existed in Unix since forever. In fact, the whole of portable services you can summarize as, yeah, making chroots useful.
11:21
But, yeah, it's one of the key goals. You know, I'm a system service guy, of course, because, yeah, I wrote most of systemd. So, for me, yeah, doing chroots is awesome, but I want to make them available to the service manager, like just a regular system service, in a very powerful way, and that's what portable services are.
11:43
One other goal is that I want this new concept, which adds these couple of container features to service management, to behave a lot like a native service, right? Because I want it to actually be a native service. By native, I mean a regular systemd service that has a .service file and so on.
12:02
So the ultimate goal is that if portable services are used, you end up having an init system that supports three formats for services: the native ones, classic systemd service files; the old System V init scripts, right?; and these new portable services. And, like, if you're using systemd,
12:20
you already noticed that for System V init scripts and regular systemd services the behavior is the same: you do systemctl start and stop on them, can control resources, and see their logs. It doesn't really matter. Like, this distinction is removed. The key with the portable service concept is to do the same here, too. Okay, so much about the goal,
12:42
so much about the positioning of this new concept of portable services. Let's talk about the why, right? Like, this is a question we always have to ask. Yeah, I already mentioned that I'm a service management guy, so we don't live in a vacuum, right? Like, things happen around us. Containers happen, of course, right? They have a lot of mind share, and they are in a way a form of service management,
13:04
but a lot removed from the core system. But there are definitely a couple of good ideas that I think make a ton of sense to take and apply to system service management as well, right? It's not about coming up with anything new. It's just about looking what's good, what's out there,
13:22
and then figuring out: is this something that we want for service management, like for regular service management, as well? And I think the bundling and the sandboxing are such things, so let's apply them there as well. Also, what's really interesting to notice is that at this point in time, pretty much all packaged services
13:43
that are available tend to have native systemd service files, right? So, in a way, most of the stuff that already exists on the Internet tends to have these service files, and that's actually kind of cool, because it allows us, if we add a little bit on top, to do something that goes in the direction of containers
14:01
without actually defining any new kind of metadata, because we already have the service files, and pretty much everything has these service files at this point already. Yeah, another point is, you know, containers are a separate world where you use different tools. Admins are generally used to system services already,
14:22
so maybe we can just make them more powerful in some regards, because, yeah, some of the features that you want to have in containers you can just make available for regular system services as well. One primary use case for portable services, I mean, just to make this clear: this is not an attempt to reinvent containers or something like this. I explicitly want to position it as something
14:41
that is more low-level than this, and I explicitly want to position it for use cases where containers might not be the most appropriate way to do things. Like, if you want to use containers for something and containers are the right choice for what you do, continue doing this, this is supposed to be a little bit more low-level. So one primary use case is what people have dubbed
15:01
super-privileged containers so far. For example, storage people like to do this, right? Like, they want to ship a complex stack in one image onto your server machine, right? That's why they want to use containers.
15:21
But on the other hand, they need a really strong integration into the whole system, because they need to do device management, like block device management. They need to figure out what's being plugged in, what iSCSI does, and whatever else, right? So they are in this weird position that they would like to ship a complex software stack with its own dependencies, because it's all far from trivial,
15:41
onto existing machines. But they also want the full integration into the host, because they do device management. And if you do device management, that's where you really, really need it. So, yeah, inside of Red Hat, there were these people working on super-privileged containers. When I saw that, I said, this is horrible, because they ended up using Docker initially,
16:00
and then they turned off all the sandboxing features and created all these bridges so that they could escape from inside of the Docker container to the host, so that they could do the manipulations of the devices and everything is like... But yeah, portable services are supposed to cover that use case perfectly, right? So that you can have your bundling
16:22
and all these kinds of things, but you can pick exactly how much you want to see from the system, or how much you want to be isolated from the system, without it being completely and terribly ugly. Yeah, also one of the key ideas is that integration is often a good thing, not a bad thing, right? Like, with containers and all these things,
16:40
they're very strong about isolation. I think in many use cases, you want the integration. Not in all, but in many you do, right? So, yeah, it really depends on your use case, integration, and the ability that you can introspect the rest of the system, in particular for tracing tools and debugging functionality and metrics and these kind of things.
17:01
It's a really, really good thing, if you do not have to first play games with the sandboxing to escape it. Okay, I kind of mentioned this already. One of the goals is that, yeah, the system five services, the native services and the portable services are supposed to be next to each other and equally well supported, with the same interfaces and same behaviors, same resource management, same everything.
17:23
The building blocks this all is built from actually predate it, right? Like, the portable services code base is actually relatively separate. Like, when we added that to systemd, like the portablectl command that makes all of this available, it didn't actually require any changes
17:40
on the systemd core itself. It just added a new utility that allows you to interface with the existing sandboxing options and bundling options in a nicer, more integrated way, right? So, this is kind of, because it is implemented that way, it basically even would allow you to come up with your completely
18:00
own service delivery framework, use all these basic concepts, and build something completely different, right? Like, for example, people have been working on making OCI stuff work and translating it dynamically into native systemd services and things like that. Yeah, one key idea about,
18:20
any questions so far, by the way? Nobody has interrupted me about all the stuff. Okay, let's talk a little bit about disk images, right? When we think resource bundling, we have to think about disk images in some form. Docker, as you know, uses tarballs and then weird layers and AUFS and these kinds of things. One of the goals with portable services
18:41
was no new metadata, right? I didn't want to sit down and come up with a new OCI spec. I have no interest whatsoever in that. I didn't want to define a new image format or anything. I have no interest whatsoever in that, and it's a massive, political job. So, with systemd portable services, the key really is we use the metadata we already have.
19:02
Specifically for disk images, this means we don't care how the bundled images come onto the system; as long as, the moment you actually want to start them, they are accessible as a file system the Linux kernel can access in some way, we're happy. What does that mean?
19:21
You can ship things as a tarball, if you like, and unpack them, and then systemd can use that as a portable service, but you don't have to. You can also use a disk image, right, like a raw block device image that you can mount. In systemd, with the portable services concept, we support both equally, right? The key, what we just require from people,
19:42
is that they provide images on the block layer or on the file system unpacked layer, but in both cases, it needs to be something that the Linux kernel natively supports. Yes, that's kind of the point that I was making, right?
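As a concrete sketch of the tarball path (all paths, names, and file contents here are made up for illustration): pack a root tree into a tarball, ship it, and unpack it on the target; the resulting directory tree is already in a form the Linux kernel, and hence systemd, can work with directly.

```shell
# Build a tiny illustrative root tree (hypothetical contents).
mkdir -p /tmp/demo-rootfs/usr/lib/systemd/system
echo 'ID=demo' > /tmp/demo-rootfs/usr/lib/os-release
printf '[Service]\nExecStart=/usr/bin/true\n' \
  > /tmp/demo-rootfs/usr/lib/systemd/system/demo.service

# "Delivery": pack it up, move it, unpack it -- no special image format.
tar -C /tmp/demo-rootfs -cf /tmp/demo-rootfs.tar .
mkdir -p /tmp/demo-unpacked
tar -C /tmp/demo-unpacked -xf /tmp/demo-rootfs.tar
```

The block-device alternative would instead pack the same tree into, say, a squashfs image, but the unpacked directory is equally acceptable.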
20:00
It's like, I don't really care what you use, and we explicitly support both block-device-level stuff and unpacked tarball kind of stuff, right? So, I'm supposed to repeat the question, right?
20:20
So the question was regarding whether whatever is mounted as a disk image has to be a proper block device, or if it can be just a regular file. The idea is it's just a regular file, right? It can be a block device, but the idea is it's normally a regular file, and systemd will internally do the loopback mounting
20:40
and stuff like that. So you will never actually see that there are loopback devices and block devices involved in the end, but the idea is generally, yeah, either take a tarball, uncompress it, and you operate on the file system level, or provide us with maybe a squashfs file, and systemd will set it up as a loopback device and mount it, and then it's kind of the same thing from that point on.
21:05
So the question was regarding, yeah, if we can point it to a folder, yeah, that's the tarball option that I mentioned, right? The fact that tarball is used, I don't give a damn about that, right? As long as it's there in a format that the Linux kernel natively supports, which could be a directory,
21:21
or could be something that I can mount, I'm happy. So the question was regarding whether I know of anybody using OSTree with this. I heard of people who were interested in this, but I didn't follow up in detail.
21:41
But of course, you can totally use OSTree, because as mentioned, if it's a directory, it's good enough for us, and OSTree stuff, after you check it out, it's just a directory, so all is good.
22:14
So yeah, the question was probably, I guess I can summarize it, the relationship to OSTree and Snaps, right?
22:22
Yeah, to Flatpaks and Snaps. So I mean, the key really here is, this stuff is system-level stuff, right? Flatpak is not system-level stuff. Snap is, though, right? This is actually in its design very close to what Snap does. But I mean, I don't really care about the disk images, as mentioned, right? You can actually use even a Snap disk image,
22:41
if you want. The focus that I have really is, after it's there, how to make the stuff that's in that image available as a regular system service, right? And this is what they generally don't care about or want to provide. So yeah, Flatpak, different story, desktop stuff. This stuff requires privileges.
23:00
People have asked about making this available unprivileged, but it's kind of difficult, because at least if you do disk images, it's all about mounting and loopback devices, and none of that is available unprivileged. I hope that answered the question a little bit. Yeah, the key here really is, let's avoid something new, right? I don't want to be in the business
23:20
of defining image file formats. So yeah, let's just take simple directory trees, or better, sub-volumes, or maybe a GPT image containing squashfs. But actually we don't really care if it's squashfs, and we don't really care if it's GPT either. I just think it's a nice thing to do. The services run directly from these images, right? There has been the RootImage=
23:41
and RootDirectory= service settings in systemd for ages. We just make use of these here to say, yeah, now I run the service from this image, and then the moment the service starts, this image is mounted, if it's not mounted yet, or bind-mounted, and yeah, the moment the service shuts down, the image is not used anymore.
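The two settings mentioned here are real systemd service options; a minimal sketch of both variants follows, with the unit name and paths being hypothetical:

```ini
# Variant A: run from a disk image; systemd loop-mounts it on start.
[Service]
RootImage=/var/lib/portables/demo.raw
ExecStart=/usr/bin/demo-daemon

# Variant B: run from an unpacked directory tree (the fixed-up chroot case).
# [Service]
# RootDirectory=/var/lib/portables/demo
# ExecStart=/usr/bin/demo-daemon
```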
24:01
In a way, this is about fixing chroot. Chroot has been around for ages, and some people have deployed things like that since the 90s, but chroot has a couple of serious problems. For example, one of the bigger ones, which we'll talk about later on, is about /etc/passwd,
24:21
because you kind of have to synchronize the /etc/passwd on the host with the container. We'll talk about what we're doing in this area. What's, by the way, interesting: RootImage= is basically where you specify a squashfs, like any kind of file system the kernel mounts. RootDirectory= is where you specify a directory.
24:40
But the root image thing is actually kind of nice, because when we mount the stuff, we can actually take benefit of all the weird storage stuff that the Linux kernel has. Two things are particularly interesting, I think. One is you can LUKS-encrypt, for example, an image and then make systemd start the service from it, so you can have a model of basically encrypted services.
25:02
dm-verity I find even more interesting. dm-verity, for those who don't know, is a kernel concept where every access to the disk, or in this case to the image, is cryptographically verified, as the access happens, against some predefined hash value that can be signed. So with that, you can actually make
25:21
trusted services, in a way, where you basically say: I have this computer here and it will only run software that is signed by me. Then you deploy an image on it and systemd will start it, but only if the top-level hash of that image passes the signature check,
25:41
which makes sure that only my stuff runs. I don't want to talk too much about that because it's probably a talk of its own. I just wanted to mention that simply by relying on whatever the Linux kernel provides us with, we can make these things happen. And we did make this happen; it is all hooked up. So the question was whether this means
26:06
that we can use another distribution to start services on some host system. And yes, this is what it means. The idea is that much like for containers, the distribution that is used inside of this image doesn't have to match what's on the host,
26:22
and things should still work just fine because the Linux kernel people might not be perfect in maintaining compatibility with everything, but they're pretty good, right? Okay, yeah. The only thing that an image needs to have
26:40
to qualify as a portable service is that it carries systemd unit files, and that it carries the file /usr/lib/os-release, and that's it. It has to carry systemd unit files because we need to know what to actually start in it, right? So it has to have that. As mentioned, it's kind of cool that basically all software that currently exists
27:00
generally has these already. And the os-release file has existed for five years already. We came up with it originally in the systemd context, but it's actually been adopted universally; even the systemd haters ship this file now. It's a very simple file that is supposed to describe the distribution you use, but it's actually extensible.
27:21
So our idea was: we use that for metadata. If you want to declare what the image you have there is about, you just add a field there. So again, the key here is: no new metadata, no new formats. We use Linux file systems. We use systemd unit files. We use the os-release file.
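For illustration, an os-release file inside a portable image might look roughly like this. The values are invented, and the name of the extension field portable services use (PORTABLE_PREFIXES=) is my assumption here, so treat the whole fragment as a sketch:

```ini
# /usr/lib/os-release inside the image (values made up)
ID=myimage
VERSION_ID=1.0
PRETTY_NAME="My Portable Service Image"
# assumed extension field declaring the unit-name prefix the image carries
PORTABLE_PREFIXES=foobar
```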
27:40
Nothing of that is new. All of this has existed for at least five years, and in many cases for 10 or 20 years. Any questions about this so far? So the question: in this case, if you don't want to use portablectl to start the portable service, do you have to declare a host system service file to load the portable service,
28:01
which then in turn carries the systemd service file? So, that was a fairly involved question, and I would like to defer it to the end, if we have enough time. I'll introduce you to the command line a little bit, and I think that will answer your question. Somebody else had a question? Okay, then let's talk about that a little later.
28:23
One quick slide. I mean, there's lots of stuff on this, but I don't want to go into too much detail. The point I want to make here is that systemd has for a long time had all these sandboxing options for system services, right? This is independent, like everything else I've said, of the actual portable service concept. These options have existed for a while, some longer, some shorter,
28:41
but they are generally options for locking down your system services. Just as an example: PrivateDevices= is a boolean you can set on a system service. If you turn it on, it basically means that the service gets its own instance of /dev that doesn't contain any real devices, but only /dev/null, /dev/zero, /dev/urandom and these kinds of pseudo-devices that are part of the Unix or Linux API
29:02
but don't actually reflect real physical devices you could touch. And there are a couple of other options like this; they are generally designed to be super easy, booleans or something very close to a boolean, for locking down your services. In the portable services concept, we just make use of the fact that this already exists.
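A few of these boolean knobs, sketched as a hypothetical service fragment (the particular combination is made up; the setting names are the real systemd ones):

```ini
[Service]
# private /dev containing only pseudo-devices (/dev/null, /dev/zero, ...)
PrivateDevices=yes
# empty /home and /root inside the service's mount namespace
ProtectHome=yes
# /usr and /boot (and with "full", also /etc) become read-only
ProtectSystem=full
```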
29:21
There's one major difference, though. You can already turn these on for your classic system services, but so far they are generally opt-in, right? We would like to turn that around, of course, but at this point we can't, for compatibility reasons, because system services have been around forever.
29:40
I mean, they inherit everything from System V even, and in the beginning system services did not have sandboxing. If we turned it on by default now, we would break everything, right? So for the classic stuff, it's opt-in. For portable services we have the luxury, because it's a new concept we just introduced, of reversing this: it's opt-out, right? So by default you get a policy,
30:00
but you can actually opt out of everything, and if you do, then you get full access, so the service can do whatever it wants. Yeah, these are a couple that we already have. I did talks in the past covering all of them in detail, because they really are a talk of their own. There are going to be more. Nowadays, which is really interesting to know,
30:21
there's per-service firewalling these days, where inside the unit file you can do access control on the IP level and things like that, and IP accounting, which is just so awesome. It fits into this whole sandboxing concept. Yeah, I already mentioned this: sandboxing is opt-out for portable services rather than opt-in.
30:42
How much time do I have? This is 20 minutes. Okay, let's talk a little bit about these hard problems. I kind of already mentioned this: if you do classic chroots on Unix, you have the problem that the chroot environment generally doesn't see the user database of the host,
31:02
so both the host and the chroot environment might have a different idea of what UID 500 or UID 1000 means, right? So most of the how-tos you find on the Internet that tell you how to manually set up a chroot say: copy over /etc/passwd.
31:21
Yeah, because portable services are ultimately just a way to make chroot actually work well, we tried to figure out what we can do in this area. For this, we added a concept called dynamic users. Dynamic users are particularly useful in the context of portable services, but you can already use them on your system independently of it. It's one building block, and the building block can be used in any context you like.
31:43
It just happens to be one of the building blocks portable services are built on. What are dynamic users? Dynamic users are a concept where you can say for a service that when the service starts, a system user is registered, and when the service shuts down, it's released again, right? And I've already mentioned the problem you now have: file stickiness, right?
32:00
So when the service goes down and the user ID is released, what happens to the files the service created while it was running? Our solution is a couple of things. First of all, when the service writes something to /tmp or /var/tmp, it doesn't actually write to the real ones.
32:21
What it will do instead is get its own fake little /tmp and fake little /var/tmp that are actually backed by the real ones, but whatever the service writes into them is automatically removed when the service shuts down, right? This is what I call life-cycle binding: the life cycle of the service is bound to the life cycle of the temporary files,
32:44
so when the service goes down, the temporary files go with it, right? So that's one facet of it, but it's not particularly useful yet. So the other thing is, yeah, to deal with the sticky file ownership problem, our solution is we simply disallow the service to write anywhere, right?
33:03
So it's a nice way to avoid the problem with sticky files by simply prohibiting the files altogether, right? So this, of course, limits the usefulness because you then have a service that can write stuff to temp and var temp but can't do anything else. I mean, it's good enough for probably some use cases,
33:20
probably most use cases want to be able to actually write what they generate to disk. So our way out of this is a systemd concept that is also useful independently of dynamic users and independent of portable services: you can specify StateDirectory=
33:42
inside the service file. If you do that, it basically means that a directory in /var/lib gets created, and its ownership gets changed to the service's user the instant the service is started, right? So basically the idea is that systemd manages for you
34:05
the ownership of specific directories, and the service then gets write access, right? This is a little bit ugly, right? Because it means: you start up your service for the first time, systemd creates a new directory in /var/lib for that specific service, changes the ownership of that directory
34:23
to the system user it also allocated for you. Then you run, you write some stuff to it, you shut down again. Now the user gets released, the data shall stick around, and it does, and then you start again. But now you might have gotten a different user ID. So what systemd has to do is recursively chown everything.
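The combination just described might look like this in a unit file; a sketch, with the service name, binary and directory made up, while DynamicUser= and StateDirectory= are the real setting names:

```ini
[Service]
# allocate a transient system user at start, release it at stop
DynamicUser=yes
# /var/lib/foobar is created and chowned to the current dynamic user
StateDirectory=foobar
ExecStart=/usr/bin/foobar-daemon
```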
34:41
That recursive chowning is horrible, but it's actually not as bad as it sounds, because Linux is very much optimized for this. Even with a directory tree of a couple of gigabytes, at least on my machines, it never took more than a couple of seconds to chown recursively through it. If it's smaller than a couple of gigabytes, it's practically not noticeable. Also, systemd tries really, really hard to assign the same service
35:02
the same dynamic user ID, by hashing it from the name of the service and things like that. However, given that the UID range is a little bit too small, collisions will happen, and in that case it has to pick a different one. So the question: if in the meantime the service stopped after running the first time, do you change the files to root-owned? That's a very good question.
35:21
So the problem is, of course: the service started up, has its own directory, then the service goes down. Now these files are there, still owned by this user ID that now ceases to exist, right? This is, of course, the problem we always wanted to avoid. So what do we do? We take a lesson from how containers are managed. Containers are generally stored somewhere in /var/lib/containers or /var/lib/docker or something like that.
35:43
And they have the very same problem: they also have a concept of users that only exist while the container is running. The way they avoid it is that they have a top-level directory where all the containers are stored, and this top-level directory is not readable by anybody but root, basically.
36:02
So they avoid the problem by adding a barrier in the middle, so that it doesn't matter if the files stored below are owned by a user ID that is potentially recycled. They simply cut it off in the middle and say: that entire subtree is not available to you. That's exactly what we do here. Now, this is actually harder than it sounds,
36:21
because we want to make /var/lib/foo available for a service foo. So what do we do? Do we change the access mode of /var/lib to 0700? We can't really do that, right? So what actually happens in the background is trickier. There's a directory /var/lib/private.
36:41
And /var/lib/private, that thing is actually mode 0700. I hope you guys can still follow this; without slides it's really nasty to follow. So that one is 0700, and then there's a symlink automatically created from /var/lib/foo to /var/lib/private/foo, to make it invisible to the outside. And from the inside, because the inside shall have access to this directory
37:04
even though it's unprivileged, through bind mounts we hide /var/lib/private and instead mount the directory at the top level. I hope you could follow at all while I was babbling here. It took us a while to figure out that this is actually workable, and it is nice. I'd really like to get rid of much of this code, and maybe one day we can, if we get shiftfs in the kernel,
37:23
the file system where you can actually shift user IDs. So far we can't. I tried to come up with anything better and talked to a lot of people; we couldn't. The general runtime behavior of this, even with the recursive chowning, is kind of nice, right? It's not a costly operation: inode updates,
37:42
because we never actually rewrite file contents but just change the inode ownership, are surprisingly fast on Linux file systems these days. There was someone here. What if you need two services to share a directory? That's a very good question. So the question was what happens if you have two services that want access to the same directory. The thing is, if they have two dynamic users attached to it,
38:03
that's not going to work; Unix doesn't allow that. It would allow it if we had shiftfs, which we don't. I hope that we can eventually fix this and make it happen. As long as that is not available, though, what you can do is this: systemd, when it creates these dynamic users, actually honors the user name you specify inside of the unit file.
38:22
Now, if you have two unit files, both with DynamicUser= turned on, and both specifying the same user name, systemd will actually create one and the same user for both. And if you do this, then you can actually share the directory, right? But it's really on you; systemd won't help you with this right now, because I still kind of hope that shiftfs becomes the real thing eventually.
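As a sketch of that workaround (unit, user and directory names made up), two units that deliberately share one dynamic user could look like this:

```ini
# alpha.service
[Service]
DynamicUser=yes
User=sharedsvc        # same name in both units, so same dynamic user
StateDirectory=shared

# beta.service
[Service]
DynamicUser=yes
User=sharedsvc
StateDirectory=shared
```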
38:43
Actually, at one of the container mini-conferences just before this one, one of the guys talked about it, and he said it's going to happen. But yeah, we'll see when. So yes, you can do it, but you have to be careful to use the same user name for the services that want access. Sorry?
39:02
So the question is: couldn't you use something with groups instead? But the general problem is that it's then up to the programs to make this happen, because they would have to, for example, create files with group-writable permissions, things like that. So yeah, we thought about this,
39:23
but using ACLs in particular, to make something like this happen. The problem I always saw with that is that these solutions tend to end up being something applications need to explicitly support, because many applications manage the access control manually, while with the solution we went for,
39:42
it's transparent to the applications. They don't know that there is anything special. Well, if they look from the outside, they will see the weird symlink, but from the inside, there's no difference from a regular directory. So the question was:
40:04
how does the file system look from inside the service's namespace, basically? Like, you provide an image, for example a SquashFS image or a directory, with a unit file inside of it, and then you use portablectl, which
40:20
is the command that I'm going to show you later, to attach this thing to the host. That basically means: copy the unit file out of it, put it on the host, and update it slightly so that a RootDirectory= or RootImage= setting points it back at the image file. If you then start the stuff, what it sees from the inside is exactly what is inside that image,
40:42
except for the places where you punch holes into it. And the holes that you punch are generally things like StateDirectory=, which I already mentioned. If you specify StateDirectory= in the unit file, it basically means that the directory you picked there in /var/lib is shared between host and service, and if DynamicUser= is turned on,
41:01
which is an optional feature, then it does the magic UID stuff. And then there are a couple of other settings like this. Besides StateDirectory=, there is CacheDirectory=, which does the same in /var/cache. There is ConfigurationDirectory=, which, you might guess, does the same thing in /etc.
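Collected in one hypothetical unit fragment, the whole family of these settings looks roughly like this (the name foobar is made up; the setting names are the real ones):

```ini
[Service]
StateDirectory=foobar          # managed directory under /var/lib
CacheDirectory=foobar          # under /var/cache
ConfigurationDirectory=foobar  # under /etc
RuntimeDirectory=foobar        # under /run
LogsDirectory=foobar           # under /var/log
```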
41:20
There's RuntimeDirectory=, which does the same thing in, you might guess, /run. And there's one more: LogsDirectory=, which gives you the same thing in /var/log. But the model really is about pushing people to be more declarative in their services
41:40
by declaring exactly which directories are actually relevant to the service; this then doubles as a safe way to punch holes into the sandbox, because we do the dynamic-user rechowning if necessary. The question is whether we support arbitrary bind mounts from the host. We do, but if you do that,
42:02
then of course you can't use the dynamic user ID thing so easily, because we would have to do the chown magic, but for arbitrary paths we can't insert the /var/lib/private thing in between, so everything explodes. But by all means, go and do that. For example, the storage people would run their stuff as root anyway, not as a dynamic user. And for them: yes,
42:21
use as many explicit bind mounts as you want, and make available whatever you want to have available. So the question is whether distributions could ship tiny portable images and use stuff from the host. Yes, they can.
42:41
I'm not going to prohibit that. This is a completely generic tool; you can use it any way you like, and I'm not going to police that. Would I do it? I'm not sure. Part of the idea of portable services, of course, much like with containers, is that you distance yourself from the ABI of the host
43:01
to make the stuff more portable. Sorry? Yeah. So 10 minutes. Okay. I've got a couple of more slides.
43:21
It's completely fine that I didn't cover this, because I got so many good questions, and even more here. Is it possible with portable services to start multiple instances, like with system services? So the question was whether it's possible to start multiple instances of a portable service, like it is possible with regular services. The answer to that is a clear and resounding yes,
43:41
because these things are, in the end, native services, right? So if you put a template unit inside the portable service image, then you can instantiate it as many times as you want, just like you could with a template unit installed on the host. I want to, yeah, so let's...
44:06
The question was whether, if you do the multiple-instance stuff, the instances can share the same user ID. Yes, absolutely. But they don't even have to be multiple instances of the same service in that case, right? As mentioned earlier, if you have two otherwise unrelated services that are not instances
44:21
of the same one, you can do the same thing. What's key is that you turn on DynamicUser= in both cases and set User= to the same string. Is the opposite also possible? So to repeat the question: if you want a different user ID for every instance, yes, you can do that too. Which is actually really, really awesome
44:41
with the dynamic user stuff, because you can now trivially implement a service that is socket-activated, right? For each incoming connection, a new instance is created, and each instance gets its own dynamic user that lives as long as the connection exists and then goes away again. To me, this really is the concept
45:02
of making user IDs something cheap that you can allocate and return, rather than this extremely expensive thing that sticks around forever, so that your package can only allocate one, two, maybe three of them, but never a hundred. I see it as breathing new life
45:22
into the Unix concept of user IDs, because suddenly you can use them for much more than you traditionally could. And that is so awesome, actually, because user IDs are the core security feature of Unix, after all, right? All the other stuff we have these days, like SELinux and AppArmor
45:41
and whatever else there is, seccomp, came later and is only supported in specific pieces of software, but user IDs as a security concept have existed always and are built into every piece of our software, after all. So adding the dynamic system user concept to that turns something really established into a much more powerful concept.
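The socket-activated, per-connection pattern mentioned above might be sketched like this; names, port and binary are made up, and Accept=yes is what makes systemd spawn one instance per connection:

```ini
# echo.socket
[Socket]
ListenStream=7777
# spawn one instance of echo@.service per incoming connection
Accept=yes

# echo@.service, each instance gets its own transient UID
[Service]
DynamicUser=yes
# inetd-style: the connection becomes stdin/stdout of the process
StandardInput=socket
ExecStart=/usr/bin/echo-handler
```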
46:03
Okay, I'm gonna cover this slide here as my last one then, just to give you a little bit of a feeling. I don't have demos, because demos tend to go wrong, but I want to at least show you how the concept generally works. Yeah, the command you use for interfacing with portable services
46:20
is called portablectl. You invoke it, portablectl, with the verb attach, for example. foobar.raw in this case is a disk image; it could, for example, contain a SquashFS. When you invoke this, the portable service image is attached to the host. What does that mean?
46:41
It means that portablectl will do a little bit of verification of the image, and then it will just copy a couple of unit files out of it. Which unit files will it copy? The ones that start with the same name as the image file itself. So if the image file is called foobar.raw, the unit files it copies out are foobar-whatever-you-like,
47:03
as well as foobar.whatever-you-like. The unit files that are copied out don't have to be service units, by the way. They can be socket units as well, they can be target units, they can be most of the other unit types we have in systemd; not all of them, but most.
47:21
So what this basically means is that you can package a couple of related units into one image, right? You just have to follow a little naming regime: you always call them some-prefix dash some-suffix, and the prefix is always the same. You can use socket activation, timer activation, all of this at the same time, and just by doing portablectl attach,
47:42
all of them become available on the system. And from that point on, they are regular services. So at that point, you can do systemctl start, systemctl stop, systemctl set-property, systemctl kill, whatever. You can do journalctl --unit with them, because after they're copied out like that,
48:01
they are regular system services. There's nothing distinguishing them anymore, except for the fact that they originally got copied out of some image file. And then, of course: portablectl attach, if you call it like this, actually persists across reboots, because the files are copied into /etc.
48:22
There's also portablectl attach --runtime, and you might guess it copies them into /run instead. So the attachment, the fact that the unit files exist on the host, goes away when you reboot. There's obviously the other verb, detach, that undoes all of this. It just removes the files that were copied out, and at that point the services are no longer available on the host, right?
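Put together, the attach/detach lifecycle might look like this on the command line; a sketch, with image and unit names made up:

```sh
# Attach the image: copies foobar-*.{service,socket,...} onto the host
portablectl attach foobar.raw

# From here on it is a regular service
systemctl enable --now foobar.service
journalctl -u foobar.service

# Tear it down again; besides logs, nothing remains
systemctl disable --now foobar.service
portablectl detach foobar.raw
```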
48:42
Key again is: leave no traces. The idea really is that besides logs, because we should never delete logs, nothing remains on the system after you do the detach. So the question is whether these can be template unit files.
49:03
Yes, they can. The files that are copied out just have to have foobar as a prefix, followed by either a dot or a dash, but what comes after that doesn't really matter. It can be a template. It can even be an instance.
49:26
And by the way, the idea is that by default you enable them all at the same time as attaching, but you don't have to. You can attach them and then enable them later, or not. It's completely up to you. Yeah, sorry?
49:52
Oh, you mean drop-ins. Okay, the question was about the drop-in files we support for unit files, so that you can extend them on the host. Yes, because after they're copied out,
50:02
they are regular unit files, you can also extend the contained unit files on the host by dropping files in, in /etc and /run, because they are native unit files at that point. They're not distinct anymore; you can do everything you normally can. You can even do systemctl edit, if you like, and systemctl will create the drop-in for you.
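A drop-in for an attached portable service might be created like this; the service name and the setting are made up, the directory layout follows the usual systemd drop-in convention:

```sh
# Extend foobar.service without touching the copied-out unit itself
mkdir -p /etc/systemd/system/foobar.service.d
cat > /etc/systemd/system/foobar.service.d/override.conf <<'EOF'
[Service]
Environment=LOG_LEVEL=debug
EOF
systemctl daemon-reload

# Or, interactively: systemctl edit foobar.service
```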
50:24
If you do detach, however, those drop-ins would not be removed, right? I mean, we could probably add that, but it would probably need an extra switch, because admins might be pissed if we removed configuration changes they meant to keep. That was my last question,
50:40
so thank you very much, everybody. If you have further questions, I'm going to be outside.