
System containers at scale


Formal Metadata

Title
System containers at scale
Subtitle
An introduction to LXD clustering
Alternative Title
Running full Linux systems in containers, at scale: A look at LXD and its clustering capabilities
Author
Stéphane Graber
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Language
English

Content Metadata

Abstract
LXD is a system container manager; its goal is to safely run full Linux systems at very high density and low overhead. Containers may be created from pre-made images, covering most Linux distributions, or by importing an existing virtual machine or physical system. Advanced resource controls and device passthrough are available to expose as many or as few system resources to those containers as needed. Snapshot and backup tooling is available to safeguard those containers and their data. Storage pools and networks can be used to offer a variety of storage and network options to the containers. Management happens through a REST API with a default CLI client. LXD has built-in support for clustering, which makes it trivial to scale a deployment to dozens of servers, all acting as one virtual LXD server. In this presentation, we'll go over LXD's main features through a demonstration, including usage of LXD's clustering abilities, running a variety of Linux distributions, and converting existing systems to containers.
Transcript: English (auto-generated)
Our next speaker is Stéphane Graber. He is the LXD project leader, one of the maintainers and core developers, and he's going to talk about system containers at scale. Hello.
All right, so let's talk first about system containers, LXD, and clustering. So, what are system containers? Well, system containers are effectively the oldest type of containers. They've been around for quite a while, originating with BSD jails, then Linux-VServer on Linux, which was a patch set from over a decade ago, then Solaris Zones obviously, and OpenVZ on Linux, which was also an insanely large patch set on top of the Linux kernel. And then, as we were upstreaming things into the Linux kernel, LXC and LXD came along as the runtime to drive all that.
System containers behave very much like standard old systems: you run a full Linux distribution. It's not the single-process type of thing you get with Docker and other application containers. You don't need specialized software images or anything; you treat them just like virtual machines, effectively. They're very low overhead and easy to manage; you can run thousands of them on a system. There's no idle overhead just taking up physical resources, and no need for hardware-accelerated virtualization or anything. And as far as the host is concerned, a container is just a bunch of processes, so you can go on the host and see all of your processes; it's nice and easy.
Now, what's LXD? LXD is a modern system container manager. It's written in Go, and it uses LXC to drive the containers. It's got a REST API and a bunch of REST API clients: you can have multiple hosts running Linux, then the LXC layer driving the kernel, then LXD on top exposing the REST API, and a number of clients that can talk to that.
More on what LXD is. LXD is designed to be simple: a very clean command-line interface, a pretty simple REST API, and bindings in a lot of languages to make it easy for people to drive system containers through LXD. It is very fast; it's image-based, so there's no more creating a root filesystem with debootstrap or whatever. It's got optimized storage and migration over the network, and it's got direct hardware access, because these are containers, with nice semantics to pass through GPUs, USB devices, et cetera.
It's secure: we use all of the kernel namespaces by default, we use LSMs like AppArmor, we use seccomp, we use capabilities, we use pretty much everything that's at our disposal to make it safe. And it's scalable, which is what we'll see most in this talk: it can go from just a single container on a laptop to tens of thousands of containers running in a cluster.
As far as what we can run on top of LXD, we've got a lot of images that are generated daily for all of those distros, plus a few more that literally couldn't fit on the slide anymore. We build for about 18 different distros and about 77 different releases, which all combined ends up being over 300 images we build every day that people can use on LXD. You can also build your own, but we've got a lot of them.
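You can browse those prebuilt images on the public image server and launch straight from them; a minimal sketch (the container name is illustrative):

  # List the prebuilt images on the public image server
  $ lxc image list images:

  # Launch a CentOS 7 container from one of them
  $ lxc launch images:centos/7 mycontainer

LXD is effectively on Chromebooks: if you've seen that Linux feature on Chromebooks, it gets you a Debian shell that's using LXD.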
So we've got a decent user base through that, and the feature includes integration for snapshots, backups, file transfer, GUI access, GPU access, sound card access, webcam access. They really went all in with it on the Chromebooks. The other place where LXD is used right now is Travis CI: if you run any job on Travis that is not Intel 64-bit, so if you use ARM 64, PowerPC, or IBM Z, all of those platforms are using LXD containers, with an extremely quick start-up time of usually less than two seconds, running on shared systems with all of the security in place. Some of the syscall interception work that Christian demoed has been done partly as part of that.
Now for the LXD components; I'll go through this quickly. These are the main things we've got in our API. Clustering is what we'll demo today. I said we're image-based, so we've got images, and image aliases to have nice names for images. We've got instances: those are containers, but these days they can also be virtual machines, which is a new thing we added a few weeks back. We've got snapshots and backups for instances. We've got network management to create new network bridges you can use for your instances. We've got projects, which get you your own individual view on a shared LXD, so there are no more conflicts with container names or any of that, so long as they're in different projects. We've got storage, with a variety of storage drivers we support, and you can create custom volumes and snapshot them and all that. Some internal bits are mostly there to get notified when something happens on LXD, or for access control. And we support doing file transfers and spawning applications directly in containers and virtual machines, accessing the console, and publishing containers to images.
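A few of those day-to-day operations, sketched as CLI calls (instance, snapshot, and file names are illustrative):

  # Snapshot an instance, then restore it
  $ lxc snapshot c1 before-upgrade
  $ lxc restore c1 before-upgrade

  # Export a full backup tarball
  $ lxc export c1 c1-backup.tar.gz

  # Push a file in, run a command, attach to the console
  $ lxc file push ./app.conf c1/etc/app.conf
  $ lxc exec c1 -- ps aux
  $ lxc console c1

  # Publish a (stopped) container as a reusable image
  $ lxc publish c1 --alias my-golden-image

Now for the main topic of this talk.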
LXD has had clustering support for about two years now. It works, and it's really built into LXD; there are no external dependencies. It works on LXD 3.0 or higher. An installation can just be turned into a cluster member, and you can easily join another installation into the cluster. There's really no external component you need for any of that. It works using the same API as you have for a single node. There are a few more bits you can use through the API to say you actually want something to be specific to one machine, but if your client is not aware of clustering and just throws things at LXD exactly as if it were a standalone node, things will just work: the cluster will balance things for you, and you'll never know you're even talking to a cluster. And it can scale quite nicely. We can run containers on dozens of nodes; we've actually run clusters of 50 to 100 nodes, and they still mostly work. And each of those can run hundreds to thousands of containers, so very high density, depending on what you're running. We've also added, very recently, a few weeks ago, support for mixed architectures. So you can have cluster nodes of different architectures, and when you ask for a particular image to be used to create a container or virtual machine, LXD will just pick whatever node is capable of running that given architecture.
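The clustering-specific surface of the API is small; placement can be left to LXD or pinned with a target. A sketch (instance and member names are illustrative):

  # Let the cluster pick the least busy member
  $ lxc launch images:ubuntu/18.04 web1

  # Or pin the instance to a specific cluster member
  $ lxc launch images:ubuntu/18.04 web2 --target node2

All right, now for an interesting part of this: let's see how that works. For this, I've got three systems.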
Actually, I need to connect to a third one. Okay, now it's connecting. So, LXD is installed; that's LXD 3.20, which we released two days ago. Let's just configure the first node. Do you want to set up a cluster? Let's go with yes. Let's enter its IP, because the link-local one is not going to be fun enough; this one. We're not joining an existing cluster, because we want to build a new one. Yeah, let's set the password. Let's configure some storage; let's go with btrfs. Create the btrfs pool, that's fine. Was there anything special to do on this one? Yeah, okay, that one is a bit different: we just need to tell it what the shared subnet is for all of those. That's my subnet at home.
Okay. So right now you've got a single LXD that's part of a cluster, but it's the only one in there, as you can see. Now let's go to the next one and repeat this. It's going to ask fewer questions, because it's just joining. So: it wants clustering. Its IP address is dot... that one, I believe. Joining an existing cluster is yes. And the other node was on 1646. Okay, so it's asking for the password entered earlier. Yes, everything on this node is going to go away when we're joining. Size, we don't care; yes, we don't care.
Come on... okay, so now we're joined. We should see that we've got two nodes, and things still work. Now, to make things slightly more interesting: those systems were Intel x86, nothing super special, Xeon CPUs. Now we've got one that is not Intel x86; this one is running on ARM 64. Same thing: lxd init, cluster... no, this is wrong, I forgot to connect that. It's actually a nested container, because I didn't have a spare ARM 64 system, so I'm just using LXD nesting for that one. But I connected it to the wrong network, so I'm just fixing that. Okay, so there it is again; the IP should be right now. So: clustering, yes; the name is correct; the IP is correct this time. Joining an existing cluster is yes, and we said it's 1646. Cluster password. Yes, we're cool with that; size, don't care. Because it's a nested container, it can't create a loop device, so I need to actually tell it where the storage is. And that should be the end of that.
Okay, let's go back to one of the x86 nodes. If I list the cluster, we see we've got three, and one of them is arm64 instead of Intel. Now let's throw some stuff at it. Just create a container called c1. I didn't specify what architecture I actually want, so it's going to kind of surprise me: LXD will pick whichever node it considers to be the least busy and just schedule the container there. So it's probably going to be on one of the x86 ones. Yeah, it's on edfa, which is one of the x86 servers. Okay, now let's do another one. Let's do CentOS. My guess would be it's going to go on notero, which is the other x86 system, and then the third one would most likely be scheduled on ARM. Let's see. Yep. It doesn't have an IP yet, but that's going to fix itself; there we go. And let's do Alpine. That name is already taken... oh no, they're just out of order, never mind. Okay, so c2, and that's going to be on ARM. So now, if I go on there... I keep forgetting that Alpine doesn't have bash. There we go. So I'm just executing a command in there, and we can see it's running on arm64. LXD is doing all the API forwarding for us: I'm on one machine, talking to the cluster, and it just goes to the right node and queries it there.
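What just happened, expressed as commands (images and instance names are illustrative; note Alpine ships sh, not bash):

  # Each launch is scheduled wherever the cluster is least busy
  $ lxc launch images:ubuntu/18.04 c1
  $ lxc launch images:centos/7 c3
  $ lxc launch images:alpine/3.10 c2   # lands on the arm64 member

  # exec is transparently forwarded to whichever node runs the instance
  $ lxc exec c2 -- sh
  $ lxc exec c2 -- uname -m
  aarch64

The other thing that's somewhat interesting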
is that we've got a tool to convert an existing system into a container. That's what we've got here: vm01 is a CentOS 7 VM that's just doing nothing, but it's there. We've got a tool called lxd-p2c that takes the address of the cluster, asks for the same password we set, and then transfers the entire thing over the network into LXD, creating a new container for you; that's the entire content of the system. That's going to take a little while, so I'm just going to let it run.
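Run from inside the machine being converted, the invocation looks roughly like this (the target URL and container name are illustrative):

  # On the system to convert: send its root filesystem into the cluster
  # as a new container named vm01
  $ lxd-p2c https://10.166.11.1:8443 vm01 /

While that's running, I want to show the new, cool thing we've added.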
All of the LXD networking, storage, and configuration bits: because our containers act so much like virtual machines, the same concepts really apply to actual virtual machines. So we figured, well, why not just allow running virtual machines as well, using the exact same storage and configuration? And that's what we've got. I can do a launch, and notice I've got just one extra thing at the end: dash dash VM. That's pretty much the only difference. In this case, I don't want it to go on ARM, because that ARM host is a container inside a VM on ARM, so there's no way I can run a VM inside there. But the x86 machines are running on physical hardware, so those will be just fine. We do support running VMs on ARM, but you need to be on the actual hardware, which is not the case here. The images are a bit larger, because they're VM images, but it's still downloading, unpacking that, creating the storage volume on btrfs, I think it was, this time, yeah.
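So the VM variant is just the container launch plus one flag; a sketch (the image, name, and target are illustrative, and image availability varies by remote and LXD version):

  # Same image/storage/network handling, but a virtual machine
  $ lxc launch ubuntu:18.04 v1 --vm --target node1

  # Attach to its console once it boots
  $ lxc console v1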
And now, if you do console... console works fine on containers too, just to show you: if I do c1, there we go, on the container you get attached to the console. And on a VM... what did I call it? Yeah, not fun: vm01 is the one I'm transferring in as a container. I would have expected console on v1 to actually function. Where did it go? Is it just because it's confused? Oh, okay, all right. So this one is a bit picky. We see the same thing: the VM was booting, I just attached a bit late. Let's go back to this guy here. We can just launch a second one of them; let's see if this one behaves properly. Creating v2... come on, you can do it.
Oh yeah, that one takes a tiny bit, because since it can be placed on any node within the cluster, and the node it picked doesn't have the image yet, it's doing an internal cluster transfer of the image. So it's not pulling it from the internet again, but it still needs to move it around. It's optimized: it uses btrfs send/receive in this case, it doesn't use rsync or anything. So we can see we're in the bootloader, and then the VM boots. And lastly, just to show you that... I'm hoping that VM is done transferring, the CentOS thing... it is. So if we go here, we can start vm01, which is now a container created from the CentOS system, and there we go. All right, how am I doing on time?
Okay, we're two minutes behind, yeah. So, LXD is available on Linux obviously, that's where it runs, but we also have Windows and macOS clients, so that you can talk to a remote LXD if you've got a Raspberry Pi or an Intel NUC or something you want to run LXD on. If you want to contribute to LXD: it's written in Go, it's fully translatable, and we've got client libraries for a bunch of languages. It's Apache 2.0 licensed, with no copyright assignment or anything like that. We've got a good community you can work with, and we usually have a bunch of smaller issues that are good starting points for contributing. That's it. If we've got some questions, we can take them now.
And we've got stickers towards the exit when you leave, if you want any of those. Questions? Sorry for speaking so fast, but as it turns out, 20 minutes is pretty short. Thank you very much. Two quick questions: what do you think about running Kubernetes inside LXD containers? Yeah, so, we can do it either way around. You can run Kubernetes, especially things like the API servers and such, inside LXD containers, no problem. You can even run the kubelet inside LXD containers, because we support nesting and we support running Docker inside LXD containers, so that's possible, and people have done it before. And you can actually do it the other way around as well: LXE is a community project that implements a CRI for Kubernetes that then drives LXD containers. So you can kind of do it either way, but yeah, it's possible.
why do Kata Containers exist if LXD is secure? Why what? Why do Kata Containers exist if LXD system containers are secure? Well, that depends on people, and on what they trust. We've seen that hardware is not particularly safe either. Some people think that relying on the Linux kernel for the entirety of the security story is quite fine. Some people think that VMs are the only option. Some people think that you need both. In fact, on Chromebooks, Google is on purpose using both: they're using a virtual machine layer and then running only unprivileged containers inside there, so that if the kernel is busted, you're still in a VM, and if the VM is busted, you're still an unprivileged user in a user namespace, because we've seen exploits against both in the past. Recently, we've actually seen more CVEs and security issues around the hardware bits of virtualization and some of the hypervisor stuff than we have against the Linux kernel, as far as container escapes go. But, I mean, there's always a risk, and it's kind of up to you what's fine for you and what's not. Combining both is also the slowest option, but it's there, and some people have done it. Okay, two short questions. First, do you support foreign containers on the host,
I mean, running ARM containers on x86-64 or something like this? Sorry? ...Foreign containers. Oh, yeah, okay. So, architecture emulation on the system, effectively, like ARM on x86. Yeah, we did that in the past. I implemented support for that in LXC almost, I don't know, five, six, seven years ago, using qemu-user-static. It is possible. It is not pleasant, and it's not something we ever want to have to support again. The main issue is that the qemu-user-static layer cannot properly handle threads or netlink and some other things like that. So we had to build a very, very weird container where most of the binaries were indeed ARM, but the init system and the network tools were x86, which works, but is really, really weird. And as soon as you start doing updates and such in those containers, you quickly get into a really weird state. So it's not something we're particularly keen on revisiting at this point. It is possible, and you can make it work: you could create a custom image that bundles qemu-user-static at the right location, and with the right binfmt configuration LXC will let you do it, and it will just work. But it's not something we want to support. Actually, just a bind mount from the host system works.
I just wanted to see a native implementation. But anyway, second question: you talked about clustering; what about roaming cluster nodes? If you move your notebook, for example, with a node, somewhere else? Yeah, so that part is kind of tricky. LXD has move support to move containers around. That works fine, but usually you want to stop them first. If they're running, then you need CRIU, which Adrian talked about earlier; that can work in some cases, but it also tends to fail with a lot of modern software. As far as the storage bits go, one thing that's interesting is that LXD supports Ceph as a storage driver, so if your container is backed by Ceph, then at least if a node goes down, you can always start the container back up anywhere else you want, because the data is on the network.
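A sketch of those two pieces (instance, member, and pool names are illustrative; the instance is stopped before the move, and Ceph pool creation on a cluster involves a few per-member steps omitted here):

  # Move a stopped container to another cluster member
  $ lxc stop c1
  $ lxc move c1 --target node2

  # A Ceph-backed pool keeps instance data on the network
  $ lxc storage create remote ceph

So that's kind of what we have there. And I think we're out of time. Yep, we're out of time. Thanks very much. Thank you.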