Jailhouse, a Partitioning Hypervisor for Linux


Formal Metadata

Title: Jailhouse, a Partitioning Hypervisor for Linux
Number of Parts: 199
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
This talk will introduce the architecture of Jailhouse, describe typical use cases, demonstrate the development progress on a target system and sketch the project road map. The Jailhouse project provides a minimal-sized hypervisor for running demanding real-time, safety or security workloads on fully isolated CPU cores alongside Linux. In contrast to other commercial and open-source hypervisors of similar scope, it is booted and managed via a standard Linux system. Its focus is on keeping the core code base as small as feasible, generally trading features for simplicity. Jailhouse has been released under GPLv2 and is being developed in an open manner. The talk aims at attracting further users and contributors, specifically from the embedded domain, but may also trigger discussions about additional use cases.
Transcript: English (auto-generated)
OK, so welcome, everyone, to the last talk of the virtualization
track today here. So I'm pleased to present a project here. First of all, did anyone attend my talk in Edinburgh on this? That's good, because otherwise some slides would feel repetitive, since this is partly a refresher talk. So I'm going to talk about a new hypervisor called
Jailhouse, which is being used for partitioning Linux systems. I'm working for Siemens, as you see on the slides, in a central research and development department. So this is not a product presentation; this is a project presentation from research. So please don't ask me about products on this topic.
But I will try to give some hints about where this could be used. We'll start with the motivation for this activity and the needs we have, present the approach and what is different in this case, give a status of the current development, and also try to squeeze in a live demo.
And I'll also argue a bit about why we went open source with it. So first of all, there are several use cases where you basically want, in a multi-core system, the full use of a certain number of cores, where you want to run a service, a task, at 100% on a core and not be disturbed. The first thing that may come to mind is some kind of high-speed control system, where you want to talk to hardware and react to certain events at a high rate and with low latency. Here every microsecond counts, not just to meet deadlines but simply to achieve a higher control frequency: every bit of latency you get from the software stack between you and the hardware basically lowers your achievable maximum control rate. So we want to keep the caches active and hot, we don't want disturbances from other activities, the operating system for example, and of course we don't want to miss the very demanding deadlines in these scenarios. But there's more beyond that. There's also the high-performance computing area: they also have an interest in dominating a single core or a single CPU fully, simply because they want to keep caches hot and finish their calculations earlier.
And finally another scenario, one we basically learned about when we went open with this project. It's not in our focus right now, but it's interesting for other users: software-based data planes. If you think about software-defined networking these days, there's a data plane with a lot of data running through it. And if this is done completely in software, not with the help of dedicated hardware, you have the same typical requirements of high throughput and low latency. So the same applies for this scenario as well as the scenarios above.
What you see today to fulfill these requirements with Linux is what I've summarized under the CONFIG_NO_HZ_FULL configuration option. The idea here is to keep the Linux programming model and dominate a single core, a single CPU, with a single Linux task: no interrupts on this CPU unless the task requested them, no maintenance work, no housekeeping work of Linux on this CPU, while we keep the current programming model. But this is not really trivial; if you talk to the developers, Linux is not prepared for this right now. There's ongoing work in this area, it's improving, it's getting better and better, but there's still something remaining. Right now, for example, you still have to run the maintenance work on this CPU at least once per second, and there is other work that requires offloading things to other CPUs just to get one CPU free. So it's not yet perfect, at least if you think of a long-running task where you really want the full CPU over a long period without any disturbances. It's an approach in the right direction, and maybe interesting.
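As a rough illustration of that approach (not part of the talk; the exact parameters depend on the kernel version), isolating, say, CPU 3 for one demanding task typically combines kernel boot parameters with explicit pinning:

    # kernel command line: keep CPU 3 away from the scheduler, the timer tick and RCU callbacks
    isolcpus=3 nohz_full=3 rcu_nocbs=3

    # pin the demanding task onto the isolated CPU (my_control_loop is a placeholder)
    taskset -c 3 ./my_control_loop

Even with such a setup, some housekeeping still reaches the isolated CPU, which is exactly the limitation described above.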
But there are other workloads which do not follow this programming model of Linux. If you look at industrial scenarios, you often have some kind of pre-existing software, let's call it like this, where you have some RTOS running certain workloads, and these days you want to combine those workloads with a general-purpose operating system, typically Linux, on multi-core machines. So how do you get these together on a single piece of hardware? Well, the typical approach to this is to use virtualization,
for example a real-time extended KVM setup. You can basically lift the pre-existing software into a virtual environment, make sure that the virtualization fulfills your timing requirements, and that's the approach. We actually did some studies on this and have some prototypes running with these scenarios; I don't want to go into details. This is basically a slide from a previous presentation about making KVM real-time capable. We measured, in a setup between a real-time virtualized node and another networking node, the latency you get on the round trip. What we also did here was to dedicate a core to the virtualization task, that is, to the virtual machine.
And in this scenario, with the whole stack involved (this ran through KVM; QEMU was not involved in this scenario), we came up with maximum latencies of about 330 microseconds. Depending on the scenario, that could already be enough; in other scenarios it's definitely too high.
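For a quick first impression of what latency a dedicated core can deliver (my addition, and a narrower metric than the network round trip quoted above), cyclictest from the rt-tests suite can be pinned to the isolated CPU:

    # wake up every 100 us on CPU 3 at real-time priority and record the timer latency
    cyclictest -m -n -a 3 -p 98 -i 100

This only measures timer wake-up latency on that core, so it complements rather than reproduces the round-trip measurement described in the talk.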
So even if these timing requirements were OK for you, for your scenario, there's another area, and that's safe and secure scenarios. Specifically the safety scenario is interesting for us. There you have to go through certain certification processes, let's call it like this, and in these processes you have to look very closely at the hardware, but also at the software stacks. This involves review, testing, arguing about the software, and possibly some formal validation of it. And, simple rule: the larger your system is, the larger your software is, the more effort you have to spend on this.
So the typical approach is to split it off. In the olden days we had separate machines for this, safety on one and the other stuff on another machine. These days everyone wants to consolidate, so suddenly you have safety and non-safety mixed in the same machine. So you want to separate these workloads in some way, to keep the critical components away from all the complexity of a modern operating system and its stacks. And for these scenarios you can also see that virtualization could help with this segregation. It could basically isolate non-safety-rated workloads from the safety-rated workloads, at least as long as the virtualization itself is not adding more complexity than it encapsulates. If your virtualization stack becomes as complex as the system you want to isolate from, you have won nothing. So for these scenarios you want a really small hypervisor, and with small I mean really small, something which is focusing only on this specific task. That means mostly statically isolating workloads from each other, spatially and temporally.
So there are quite a few solutions out there in fact, although this is a niche market, there are many commercial offerings on this. Unfortunately there are not many open source activities in this area. They either have hardware restrictions, or they're not targeting industrial use cases. So this is first of all one thing that we saw,
we want to be independent of a particular vendor solution, we want to have control over it. So we want an industry-targeted open source solution for this. That is one area we thought about; the other is that if you develop such a micro-hypervisor, well, it takes over control of the hardware, that's its purpose. So this is basically what you boot now instead of your full operating system. It takes over control, and then it still has to boot your non-critical workload, the general-purpose operating system, and getting just this operating system up and running is a quite complex task. So the bootstrap process of your guest systems keeps you quite busy and increases the complexity of your hypervisor significantly, while in the end, once you have a running system, you usually get away with much less code at runtime just to keep the isolation between the non-critical workload and the critical workload alive. That's what static partitioning is about. So we thought it would be interesting to go for a different approach than the classic approach of booting the hypervisor first. And this is where the Jailhouse approach comes in.
So we boot Linux on the system, just as before. It takes full control over the hardware. That's the boot phase. And if you look at the embedded world specifically, there is barely any hardware where you can't boot Linux. But then, when we come to the partitioning phase, when we want to separate certain workloads from the Linux world, we load the hypervisor belatedly. We lift Linux up from the hardware, move the hypervisor underneath, and of course configure the hypervisor according to our requirements so that we have a partitioned system.
And there we are, with a statically partitioned system where Linux can't access resources that the real-time domain is using and vice versa, fully controlled by the hypervisor. But this hypervisor just has to keep the system running in its current state. It doesn't have to care about booting Linux up or shutting it down.
And even if you have a specific dedicated workload on the right side, for real-time purposes, for safety purposes, its needs and its requirements on the virtualization layer can also be reduced. So we have to look for the right balance for these kinds of workloads. And that means balancing between features, because it's always easy to add features, and simplicity, the simplicity of the hypervisor providing these features. The purpose of Jailhouse is really to focus on simplicity, to keep things as simple as reasonable, let's say. Not as simple as possible, you can always do things in crazier and seemingly simpler ways, but when we have to decide about a feature, simplicity is very important for us. So let's have a look at how the architecture of Jailhouse looks. As I said, we have multi-core systems
with multiple CPUs. Let's say we have two of them split off for the real-time domains, with Linux running on the other CPUs. The hypervisor takes full control of the hardware; there's really no interaction, no possibility for Linux to bring the system down during runtime. Still, we need some kind of control over it, and there we make use of Linux.
So we have a loader and control module which talks to the hypervisor and which also brings up the hypervisor, as I said. This module has a standard Linux interface, a character device, and via this device we basically talk to the loader module and can bring up the system: we load the hypervisor itself, and we also provide a configuration for the system, describing what resources are there and how they should be divided between the instances during startup and also later on, when we load what we call cells. These are the real-time applications; it could be a full RTOS, it could be bare metal,
whatever you would like. So these are the main components used to get the system up and running. If you look at this, it is basically more about access control than about true virtualization. We want to manage the available hardware in a partitioned way. We don't want to virtualize the resources, and we don't want to overcommit them. We basically just intercept and then filter access to all the sensitive resources in the system, as far as the hardware doesn't support us in this. Ideally, hardware virtualization would provide all the means we need to do this partitioning, so we would just have to configure it. But where we can't do this, because the hardware is incomplete in this regard, because the virtualization support is incomplete, we have to intercept and decide whether a certain access is valid or not. The goal is basically to avoid that any cell has some kind of system-wide impact on the other cells. A simple example: no one should be able, in a Jailhouse setup, to reboot the system as a whole, or to crash it in whatever way. That means, of course, one-to-one resource assignment. So we don't do any overcommitment,
we don't support any scheduling. There is no scheduler involved. That's boring for some research topics, which are often about scheduling, but in this area we really have no scheduler. That means, of course, better predictability, which is very interesting for real time, and less complexity. And we do not hide much of the hypervisor's existence. We don't emulate resources that we take away from one side. So if you boot one of those specific cells, you will not find the typical PC hardware there on a PC, because it is already assigned to Linux or it is completely blocked for other reasons. This, of course, keeps the complexity of the hypervisor down.
Linux won't notice, because it's already running, so it won't access this hardware anymore. If it did, there would be a fault; the hypervisor wouldn't allow this during runtime. If you want more, if you have a full system which can't be paravirtualized or adapted for this scenario, there's still KVM for that.
So this is basically where we draw the line between our approach, Jailhouse, and the full virtualization approaches out there. Linux is our friend in this scenario. First of all, as I said, bootstrapping is done by Linux. Also the loading; you will see an example later on, so I will just go through it quickly. Cell creation: we have a command interface for this. Again, this reduces the complexity of the hypervisor, and we also get some kind of Unix look and feel for the whole system. You don't have to fiddle with the bootloader, for example, to get this thing running. You have a normal Linux environment where you can use files to pass in the images or the configuration, and you can handle it on the command line, things like this. The same holds during operation: if you want to reconfigure your system, you also issue the corresponding commands for this, at least as far as the system as a whole allows it. So you can destroy a cell, reconfigure it, bring up a different configuration, and use a different cell. Of course, this also enables us to use logging and monitoring from the Linux environment. And shutdown is possible as well, as far as the configuration allows it. Again, this reduces the complexity of the hypervisor and also shortens turnaround times: you don't have to reboot the whole system if you just want to reboot a certain cell. That's typically the case with many of the static, real micro-hypervisors which boot during the early boot phases. So at this point, let me move on to the status. We currently focus on x86.
This was the first development target, depending, of course, on the hardware virtualization features that you find these days: CPU virtualization and device virtualization are the requirements. We enabled direct interrupt delivery even where the hardware doesn't support this, so we have very low latencies for the real-time cells. If you write your cell in the proper way, you can even get down to zero exits, so the cell can run independently of the virtualization, at full speed, without any interaction with the hypervisor at all during normal operation. We are also using virtualization to bring up the whole system during development. Basically, the first development stages were all done in QEMU with KVM, so we didn't work on any real hardware. This is also very helpful if you want to debug the environment. As we learned today, OSv is also using this approach of developing the operating system completely in a virtual machine and debugging it the same way we did here, which caused some interesting effects, because it means nested virtualization: we are using virtualization to run virtualization. The code for this wasn't completely stable at that point, so many bugs were fixed in this area, but it was quite interesting. Unfortunately, we are lacking VT-d support in KVM right now, so there's no VT-d emulation or virtualization available, and we can't exercise this kind of feature in the virtual development environment. That's why VT-d is currently still only a soft requirement, the hypervisor boots even without it, just to enable this kind of development model. So we went public about four months ago now,
and development moved on. We have now implemented cell destruction, so we can shut down a complete guest cell again. We now have complete support for device pass-through to other cells, which means encapsulation of the DMA requests sent from the device; this is now also implemented. Another mechanism we implemented just recently is access control: cells which perform a certain critical task should probably not be destroyed at arbitrary points, so we probably need at least some kind of shutdown warning. Basically, what we implemented here is an interface for critical cells to talk with the hypervisor before a destruction command is executed and to vote against it, or at least delay this kind of operation for an ordered shutdown. There are further improvements on usability. This is still an early stage, there are still some rough edges on it,
but it's improving step by step. So, now for a short live demo. I'm using my notebook here, which has a multi-core CPU, two cores with two threads each, and it's running Linux, of course, with a few devices underneath. What I'm basically doing right now is loading the hypervisor, and then I'm using a special loopback cable setup, which is what I have here, just to see what the hypervisor is doing: it has a serial console just to dump its status, and we're also using the serial console to dump the output of the workload which is running there. So we're basically feeding the serial port back via a USB cable to the console so you can see it on the screen here. What I'm going to start is a very simple demo, a timed event loop: you set up a timer in this guest cell which triggers 10 times per second, and you measure the latency of this timer against a known clock, the PM timer, precisely. So let me see if this works.
OK. So this here is the serial console; currently there's nothing written on it. And this is the command line on my host machine. First of all I'm loading the helper module. Nothing special happens from this, except that you now get a device node for it. And it is through this device node that the tool, the jailhouse command-line tool, a very thin tool that just talks to this device, communicates. So I'm now enabling the hypervisor with the system configuration for my notebook, which basically describes the resources I have: how much RAM there is and where exactly it is, how many CPUs there are, and so on and so forth.
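For reference, the steps just described look roughly like this on the command line (my sketch, not part of the talk; the configuration file name is a placeholder and the tool syntax may differ between Jailhouse versions):

    # load the Linux helper module, which creates /dev/jailhouse
    modprobe jailhouse

    # enable the hypervisor, passing the system configuration for this machine
    jailhouse enable my-notebook.cell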
So this is done now. If you look at the console now, you see the boot-up messages. I shrank the output down a bit, because we can't write out all the data while the hypervisor is starting; otherwise we would run out of buffer space on the serial port. If you do this on a real machine, with a different machine attached to the serial port, you would actually see that the last messages got cut off. Basically, it's booting up on all four cores, running the hypervisor there, and at this point my machine is under the control of the hypervisor. I couldn't reboot right now anymore, for example; the hypervisor would simply refuse it. So let's create a test scenario. Again, I'm using the command-line tool, here to create a cell with a specific configuration for it, and I'm also specifying what should be running in the cell, so the binary image, basically, and where it should be loaded.
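With the current tooling this corresponds roughly to the following (again my sketch; the cell and image names are placeholders, and in the early version demoed here cell creation and image loading may have been a single command):

    # register the new cell with the hypervisor using its cell configuration
    jailhouse cell create apic-demo.cell

    # load the bare-metal demo image into the cell and start it
    jailhouse cell load apic-demo apic-demo.bin
    jailhouse cell start apic-demo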
And this is the output. The numbers here are not very impressive. They are nanoseconds, so this is basically the latency of this timed event loop. It goes up even beyond 20 or 30 microseconds because of the hardware here. What you're measuring here is not any kind of software; you're really measuring the bare hardware latency of this machine, including the chipset and all the components involved on x86. On different hardware we are seeing even lower numbers, single-digit microsecond latencies in other runs. So it's running now, and I could now also destroy it again.
So I'm specifying the configuration again, saying which cell I would like to destroy. And it tells me: oh, I'm not permitted to do it. That's simply because this cell said: I'm rejecting the first request, I'm doing an ordered shutdown, and once I'm done with it I signal to the hypervisor, OK, now you can destroy me. So let's try this again.
And we are done. We see that the CPU has been returned to Linux. And I can even, oops, wrong console, disable the whole thing again. Now the control is back with Linux, the whole system is back to Linux, and the hypervisor is unloaded.
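In terms of the command-line tool, that teardown corresponds to something like this (my sketch, with the same placeholder cell name as above):

    # ask the hypervisor to destroy the cell; a critical cell may reject the first request
    jailhouse cell destroy apic-demo

    # hand the hardware back to Linux and remove the helper module
    jailhouse disable
    rmmod jailhouse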
So what are we planning as next steps? Well, we still have some work to do on the x86 side. Specifically, interrupt remapping is currently being worked on to plug this last hole, and access to certain PCI resources still has to be moderated by the hypervisor, so there is ongoing work there. Another open area is inter-cell communication. Right now the cells can only talk to the outside world, but there are also scenarios where you want to talk between the cells, between Linux and a cell, for monitoring for starters, for controlling, whatever. So we need to establish some kind of channel for this, obviously. It's not yet decided what kind of mechanism to use, but we are big fans of reusing existing ones, so we will probably look pretty closely at virtio. Not only because it's more standard, but it's also portable to other, non-PCI architectures, for example. So it is a very interesting approach, and it would basically enable us to do inter-cell communication. A further area required, specifically for safety scenarios, is some kind of validation of the setup: we want to prove that the hardware has been brought into a state that we know is proven to be working.
So this is also ongoing work, to build some kind of mechanism for this. Then, of course, the question arises: what about other architectures? This is feedback we often receive: OK, x86 is nice, but what about ARM? Of course we want to be portable. The hypervisor is written in a portable way, at least as far as we have been able to prove so far, and we will go for ARM soon. The requirements there are, again, hardware virtualization support, which means anything ARMv7 or better, and also device isolation, of course. The first target would probably be some Cortex-A15 SoC, most likely some Exynos 5 system. We are in discussions on this; the current question is who will start and who will do the first steps. We are in contact with Linaro in this regard, there are some telecommunication suppliers interested as well, as I said, these data-plane scenarios are very interesting, and other, smaller companies are interested in this area too. So we are trying to coordinate the interest and the efforts. Some communication has already happened on the mailing list, other discussions are currently happening in the background. I hope we will soon be able to have a full roadmap established on the mailing list and then see who is working on what in the coming weeks and months. We definitely have a rough roadmap, which means that something should be there by the end of Q3 of this year. That basically means reaching with ARM roughly the state we had with x86 at the end of last year, enough for the first steps. But of course, this is an open source project.
So early contributions in this area will accelerate the roadmap, of course. Now, open source: why open source? When we discussed this internally, there were some comments like, well, a small hypervisor means just a few lines of code, this is easy to write ourselves and maintain ourselves, why should we go open source? But it's not as easy as it looks, even if there are only a few lines of code. This is hardware virtualization, and that is a nasty beast on most architectures, I think on all architectures. So you really need to have experts looking at this, and you need experts from different areas, with different experience behind them. Also, this is about supporting a broad range of systems, a broad range of architectures, of CPUs, of boards. And that also means that at some point you hit the scalability barrier of your own resources. We want to attract others to work on this. We want to benefit, of course, from hardware vendors working in this area, from board support done by vendors. So this is also a reason to go open source. And of course we want to broaden the usage. We have certain use-case scenarios in mind, but I can see that this kind of component could be useful in broader areas: think of automotive, think of avionics, possibly. The more we get this thing out into the field and the more it is actually used, the higher the test coverage we get, and the earlier we get it. In the end, we all benefit from it; that's all well-known wisdom.
Finally, there's a close relation to the Linux kernel. The code is not part of the kernel right now, but we are of course reusing Linux as far as possible, and working as open source also enables this cooperation, definitely. And maybe one day, this is also a question I got back, what about integrating it into the Linux kernel? Yeah, why not? Not today, probably; maybe not even in five years, who knows? But if this actually gets established as a kind of standard way of doing this kind of isolation, then it may at some point become interesting, and we don't want to close that door, we want to keep it open. Of course, that is also a reason to be open source, and it was one reason for the choice of license. We chose the GPL intentionally, to keep the whole thing open. We want to ensure that everyone who works on this, and possibly builds products on it, is also required to release what they have done in these areas. So we chose the GPL, and we also made a clear statement, just like the Linux kernel has, that everything you run in a guest environment is not affected by it. That's just to make it clear, just to avoid any kind of discussion about it. So the whole thing is GPLv2, just like the Linux kernel. OK, to summarize and give an outlook: you see now that there is a need for isolating workloads on single cores, on multi-core systems specifically.
There's a need to do undisturbed processing work on single cores, with low latency on I/O, and there's also a need to do this with very thin software layers, just to keep validation efforts down. Jailhouse provides a building block for this. It's not the solution for the whole thing, of course, but it's a building block for this kind of full CPU isolation, giving 100% of a CPU to a given task. We want to reduce it to the minimum needed for these kinds of scenarios. The goal is to keep the code base below, let's say, 10,000 lines of code, and it is currently definitely below this limit. And of course, when adding any kind of feature, this is an important requirement to keep in mind, so we won't add everything to it. And it uses Linux, just to have handy infrastructure at hand. So it's a different way of working with these kinds of scenarios, a different way of doing static partitioning.
So thank you for your attention, and I'm open to questions. That's the first one.
Speak up. How difficult, oh, there we go. How difficult was it to be able to insert yourself underneath Linux? Amazingly simple, at least on x86. OK, I'm just curious, could you briefly summarize it? Because actually, at the Xen project, we had thought about... To be fair, the current configuration has a lot of holes. If you start up a system, for example here on my notebook, I cheated a lot: a lot of resources are simply passed through right now which shouldn't be passed through. This is, of course, a tuning thing you also have to do for specific hardware. But generally, what we did is to look at what is accessed during runtime and then decide whether this is allowed or not, like setting up security policies: first you block everything, but in monitoring mode, then you look at what gets trapped, and then you decide, does this make sense or is it something that should be changed? So it was pretty simple, actually.
When you are separating the cores, the cores usually have shared caches. How do you separate this? Just as far as the hardware supports it, that is of course the point. This is also the reason why you see 20 or 30 microseconds of latency on these cores, even though they are just running one workload. The hardware, specifically x86, is not really well made for these scenarios, so you get latencies and interaction between different workloads just because of hardware dependencies: you have shared resources like shared caches, shared buses and things like this. And where there is any kind of support for it, where some hardware provides some kind of quality-of-service management for these resources, we will configure it properly and try to enforce and reduce dependencies as far as possible. But if you can't, then you either have the wrong hardware or you have to live with the consequences. OK, and one other question: what about the combination of Jailhouse and, below it, KVM?
Below it? Or above it? Above it, OK. Well, we have a demo, more or less; I played with it, basically trying to pass the CPU virtualization support through to Linux in a moderated way. So this should be possible. Specifically with newer hardware, where you have support in the hardware, in the VT-x extensions, to do nested virtualization, it should be possible quite efficiently. Of course you will have a little slowdown there, but at least I got as far as booting a KVM guest inside the Jailhouse environment, on Linux. But of course it was not confined: there would be a lot of code required to confine the access to the VT-x resources on x86, for example. So it really depends on the hardware, but it would be possible. For a scenario where, let's say, you have some non-Linux operating system running alongside, that would possibly be a way to go.
Do you support hot plug, CPU hot plug? In essence, if Linux decides to shut down some CPUs for power management, is the hypervisor able to deal with that? Well, it depends basically on what kind of control you grant over the hardware interfaces managing this. If you want to shut down a CPU and really physically turn it off, or whatever, this interface has to be moderated. Either it is already properly separated per CPU in hardware, and if it's a per-CPU command you can say, OK, you can do this on your Linux CPU but not on a non-Linux CPU, that's one way to model it; or the other way is to trap all the accesses and then decide, based on the CPU assignment of the different cells, whether the access is allowed or not. Conceptually nothing speaks against it; practically, we haven't implemented it. What we actually do right now is software CPU hot plugging, just because we need to take CPUs away from Linux and assign them to a different cell. So what you saw in the demo was basically CPU hot plugging in software: we offline one CPU from Linux and then assign it to a different cell, so Linux won't see that CPU anymore. If I onlined it again while it was not assigned to Linux, we would get an access violation.
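For illustration (my addition; this is just the standard Linux CPU hotplug interface, not a Jailhouse-specific command), the software offlining mentioned here looks like this from user space:

    # take CPU 3 away from the Linux scheduler via the standard hotplug interface
    echo 0 > /sys/devices/system/cpu/cpu3/online

    # bring it back once the cell has been destroyed and the CPU returned
    echo 1 > /sys/devices/system/cpu/cpu3/online

As described in the answer above, the driver does the equivalent of this offlining when CPUs are handed to a cell.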
Yeah, so going back to the cache discussion: I think the problem is that your L2 cache is almost always going to be shared between cores, and you also have the coherency protocol running between your cores to make sure that cache lines are moved around, which will always, I think, cause some overhead. And I don't entirely see how you can avoid that by using virtualization, unless you would take the core out of coherency, but I don't think there is any hardware which allows you to do that. This is a remaining problem with all the virtualization approaches so far: there may be some interference effects. The question is, are they bounded or not? And this is something we also have to discuss with the hardware vendors, of course.
I think it's very hard to characterize boundaries for that, although it might be possible, but I don't think classical SMP systems are designed with bounded latencies, guaranteed upper bounds for latencies when it comes to coherency protocols.
The coherency protocols, I don't think the coherency protocols are designed with these kinds of requirements in mind, at least. It might work, but I'm not sure. Hi, do you have any problems with firmware and things like ACPI or SMIs, like the firmware basically just taking away a CPU from a cell? SMIs are a different level again. If an SMI happens and it hits an isolated core, you have a problem, just like you would have without virtualization. So typically what you do in this scenario is make sure you have control over SMIs, so you don't allow arbitrary BIOSes, arbitrary firmware, to run on your system. That depends, of course: you can't do this if you are just buying stock hardware, where you usually don't have that control, but if you are designing your own hardware and you are interacting with or even modifying the BIOS, then you may have control over this. But it's a separate issue. Generally, anything which is beyond the hypervisor's control, which sits underneath the hypervisor, can always disturb you. That is something you have to keep in mind; it is not solved by virtualization. It is a hardware property, and you have to deal with the hardware in this regard, OK?
Hi, we need to close the room soon, so.