
NOVA Microhypervisor on ARMv8-A


Formal Metadata

Title
NOVA Microhypervisor on ARMv8-A
Series Title
Number of Parts
490
Author
License
CC Attribution 2.0 Belgium:
You may use, change, and reproduce the work or its contents in unmodified or modified form for any legal purpose, distribute it, and make it publicly available, provided that you credit the author/rights holder in the manner specified by them.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
NOVA is a modern open-source microhypervisor that can host unmodified guest operating systems next to critical host applications. Although originally developed for the x86 virtualization extensions of Intel and AMD, the internals of the microhypervisor and its external API were designed with flexibility in mind, such that the code could also be ported to other architectures. In this talk we present the first ever version of NOVA on ARMv8-A. We will show how the NOVA abstractions map onto the ARM architecture, how modern virtualization features such as GIC and SMMU are being used, discuss the ongoing evolution of the NOVA API and how the ARM port differs from the earlier x86 version. The talk will conclude with a short demo, an outlook into the NOVA roadmap and the formal verification efforts around the code base, as well as opportunities for collaboration with the NOVA community.
Transcript: English (automatically generated)
The next talk is given by Udo Steinberg. I'm excited about Udo because he hasn't been giving a talk here for many years, I think. Six. Six years. Seven, seven. Now he's back to talk about new developments in the NOVA world.
Yeah, I'm excited to be here too. Let's first do a little poll. Who in the audience has heard of NOVA, or who has worked with NOVA? Maybe a quick show of hands. Okay, I think about half the folks. So, today I'll be giving a talk about the NOVA hypervisor on ARMv8-A, which is our port of NOVA to the most recent ARM architecture.
And since only half of you know NOVA very well, the first part of the talk will go a little bit into detail about what NOVA is and where we came from. Then in the second part of the talk, I'll talk about virtualization on ARMv8-A. ARM-A is the A-profile for the performance cores.
There's also ARM-R for real-time cores and ARM-M. We are talking about Cortex-A. And then I'll conclude the talk with the current status. We'll do a little demo here. We've already put some equipment here, and then I'll talk a little bit about the roadmap. So those of you who've been following the microkernel dev room today, you've seen this system,
which is NOVA on x86, already in action, when Norman gave his talk on Genode. So, NOVA on x86 is a system that has existed for more than 10 years. And the way that virtualization works on x86 is that without the virtualization extensions,
you have two processor rings, really four, of which only two are used. You have ring zero, which is used by the kernel, and ring three, which is user mode. And you have an operating system and applications running in those two modes. With virtualization, these rings are now called guest rings, which is why we call them ring zero G and ring three G. And you get the same set of rings
duplicated for host mode. You have ring zero H and ring three H. So ring zero host is occupied by the NOVA microhypervisor, and ring three host is occupied by what we call hyper-processes. And the whole Genode framework that you saw two talks ago is a framework that runs in ring three host.
It's interesting to note that the NOVA microhypervisor is the only component that runs privileged, and everything else, all the colorful apps that run in the host, and everything that runs in the guest, runs deprivileged. And there's a little footnote there that says obviously we have no control of firmware. What's also interesting to note
is that we give every virtual machine its own instance of a user-level virtual machine monitor, which means that should something go wrong in a virtual machine, should you have a VM escape, should an instruction not be virtualized properly, or something like that, then a VM escape will only affect that VMM instance, and the rest of the system will not be affected. So now we took this system, and we said,
can we build something like that on ARMv8? And the next slide shows what NOVA looks like on an ARMv8 architecture. So at the bottom, we have the hardware, and then we have four privilege levels, which ARM calls exception levels. So starting at the top, we have EL0, which is user mode, EL1, which is supervisor mode, where a kernel runs.
EL2 is called hypervisor mode; this is where NOVA lives. And then we have EL3, which is called monitor mode, in which a firmware monitor sits that can switch between the non-secure and the secure world. And we do not own this monitor, we do not control it, and we also do not control a trusted execution environment like OP-TEE that runs alongside.
But in the world that we own, which is the top left corner, the non-secure world upwards from EL2, NOVA is the only privileged component, and all the other properties that we had on x86, including a virtual machine monitor per guest VM and the deprivileged host environment, all of that has been carried forward to ARM.
Now let's talk a little bit about NOVA. NOVA is a hypervisor, a micro-hypervisor, that takes the best ideas from microkernels, capability-based systems, and high-performance virtualization, and puts all of that together in a small piece of code, where the x86 version is less than 10,000,
and the ARM version currently is less than 7,000 lines of code. It is based around capabilities. What does that mean? You have a bunch of kernel objects that I've shown in the middle. We have protection domains, these are address spaces. We have execution contexts, you can think of them as threads or virtual CPUs. We have portals, these are communication endpoints
by which protection domains and threads inside them can communicate. And all these kernel objects cannot be referenced directly. Instead, we give every protection domain that needs to access them a capability. What is a capability? You can think of a capability as a pointer to a kernel object plus associated permissions. So possession of a capability
gives you the right to invoke that capability and to access that object. And if you don't have a capability to an object, you can neither name the object nor access it. So that's a very powerful security system. seL4 has the same, and a lot of newer microkernels do. And you can build really powerful other systems on top, as Norman has demonstrated with Genode.
The same principle, by the way, works for memory, where you can think of a virtual address as being a capability to a physical memory page with read-write-execute permissions. So that same principle is also applied for NOVA. So we have multiple spaces: we have an object space for these capabilities, and we have a memory space for delegation of rights to memory.
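To make the capability idea a bit more concrete, here is a small sketch, purely illustrative and not the actual NOVA data structures, of a capability as a kernel-object pointer plus permission bits, with a per-PD object space indexed by selectors:

    #include <cstddef>
    #include <cstdint>

    class Kobject {};                        // base of PD, EC, SC, portal, ... (illustrative)

    enum Perm : uint8_t {                    // permission bits carried by a capability
        PERM_CTRL  = 1u << 0,
        PERM_CALL  = 1u << 1,
        PERM_GRANT = 1u << 2,
    };

    struct Capability {
        Kobject *obj  {nullptr};             // pointer to the kernel object
        uint8_t  perm {0};                   // what the holder may do with it
    };

    struct Object_space {                    // per-PD table indexed by capability selectors
        static constexpr size_t SLOTS = 8;   // small, just for illustration
        Capability slot[SLOTS];

        Capability lookup(unsigned sel) const {      // a selector is simply an index
            return sel < SLOTS ? slot[sel] : Capability{};
        }
    };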
In the older version, the x86 version, we used to have a mapping database that tracked all these dependencies of who gave the capability to whom and things like that. We got rid of all that and replaced it with a new hypercall called control protection domain, which takes two capabilities, the sender and the receiver protection domain, and you have a take-grant model
where, given possession of, let's say, a capability A and a capability B for those two protection domains, you can say: take the two capabilities from slots two and three in protection domain A and put them in slots six and seven of protection domain B. And obviously, we have more than eight slots, this is just for illustrative purposes. And when the user wants to refer to a capability,
they give a selector, which is a number that indexes into that space, and say: I want to do something with capability two. This is how it works; it's like a file descriptor. So upon this, we built some basic abstractions, and these are the basic abstractions that the NOVA version on x86 has had for a long time.
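Before moving on to those abstractions, here is a hedged sketch of what such a take-grant delegation could look like from user land. The wrapper name pd_ctrl_delegate and its parameters are invented for illustration; the real NOVA hypercall encoding is defined in the interface specification.

    using Sel = unsigned long;    // a capability selector: an index into an object space

    // Hypothetical wrapper around the "control protection domain" hypercall:
    // copy 'count' capabilities starting at slot 'src_base' of PD 'src_pd'
    // into the slots starting at 'dst_base' of PD 'dst_pd'.
    long pd_ctrl_delegate(Sel src_pd, Sel dst_pd,
                          Sel src_base, Sel dst_base, unsigned count)
    {
        (void) src_pd; (void) dst_pd; (void) src_base; (void) dst_base; (void) count;
        return 0;                 // stub: the real implementation traps into NOVA
    }

    void example()
    {
        Sel pd_a = 4, pd_b = 5;   // selectors for capabilities to the two PDs
        // "Take the capabilities from slots 2 and 3 in PD A
        //  and put them into slots 6 and 7 of PD B."
        pd_ctrl_delegate(pd_a, pd_b, 2, 6, 2);
    }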
So on the left, you can see two protection domains, think of them as two address spaces, two applications that want to communicate with each other, and in protection domain A, we have a caller thread, an EC, here, and it has a scheduling context attached to it. And those of you who saw Gernot's talk this morning know that the scheduling context is a really nice abstraction for expressing scheduling parameters
and for delegating it along a call to implement priority inheritance, and NOVA has had this for 10 years. And we also have, on the other side, a callee execution context that has no scheduling context, for the very same reason that Gernot gave: that you can completely account the execution of a server invocation to the caller.
So what would that look like? The caller makes a hypercall, in this case called IPC call. It gives a capability selector for a portal, and what's called an MTD. We call this a message transfer descriptor, and the message transfer descriptor tells NOVA how many words to copy from the UTCB of the caller to the UTCB of the callee.
So the UTCB is like a message in an outbox. It's very abstract, and the same mechanism applies no matter whether you're on x86 or on ARM. So once that call has been made, we now transition over to the right side. The situation looks like this. The scheduling context has been given to the callee. This achieves priority inheritance,
or in case of real-time, bandwidth inheritance. The callee can now execute, because it has a time slice. The time is accounted to the caller, and at some point, the callee is going to do a reply. It invokes IPC reply, gives a message transfer descriptor, says how many words do I want to copy back from my UTCB to the caller, and then we are back to the left side. This is the basic mechanism
by which protection domains and threads in protection domains communicate with each other. And this also illustrates how the different kernel objects sort of come into play there. So the MTD defines the number of words to copy between those two yellow UTCBs, and we'll see in the next slide how the MTD is also used for other purposes.
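A minimal sketch of that call/reply flow from user land, assuming hypothetical wrapper names ipc_call and ipc_reply (the real NOVA UTCB layout and hypercall encoding differ):

    using Sel  = unsigned long;
    using Word = unsigned long;

    struct Utcb { Word msg[64]; };                      // per-EC message buffer (simplified)

    static Utcb utcb;                                   // each EC has its own UTCB
    static Utcb *my_utcb = &utcb;

    // Hypothetical wrappers; the real hypercalls are architecture specific.
    void ipc_call (Sel /*portal*/, unsigned /*mtd*/) {} // send 'mtd' words, lend SC, block for reply
    void ipc_reply(unsigned /*mtd*/)                 {} // copy 'mtd' words back, return the SC

    void caller(Sel portal)
    {
        my_utcb->msg[0] = 42;                           // put the request into the UTCB ("outbox")
        ipc_call(portal, 1);                            // NOVA copies 1 word into the callee's UTCB
        Word answer = my_utcb->msg[0];                  // the reply has been copied back
        (void) answer;
    }

    void callee_handler()
    {
        Word request = my_utcb->msg[0];                 // the request arrived in our UTCB ("inbox")
        my_utcb->msg[0] = request + 1;                  // compute the answer in place
        ipc_reply(1);                                   // 1 word goes back to the caller
    }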
Now, I said NOVA is a micro-hypervisor. This means at some point we need to deal with virtual machines, with virtual CPUs, with emulating instructions and devices. So a lot of you will notice that this slide looks very, very similar to the previous slide. Why is that? Because we layer VM exit handling and the emulation exactly on top of IPC.
So the difference is that we now have two protection domains: one that is called a virtual machine, in which an execution context runs that is a vCPU, and a VMM, which in this case is the server. It provides the virtual machine service. And the execution context here, we call it the handler. So at some point, the virtual CPU is going to take a VM exit on x86 or an exception on ARM,
which requires the attention of the virtualization layer. This VM exit takes us into NOVA. NOVA will save the state in a page which we call virtual machine control block, because we came from x86. And then NOVA is going to synthesize an IPC call on behalf of the virtual CPU, send it to the handler,
and the question is now, what state do we transmit? And that state is described by what we call the MTD arch. It comes out of a portal, where every event has its own portal. So you can say: if I see this event, here's the state that I want. If I see some other event, here's the state that I want. And then the VMM gets this message in its UTCB. And the reply works exactly like an IPC reply,
except that the handler specifies an MTD arch, the result gets stored in the VMCB, and then when the vCPU resumes, or ERETs back to the guest, that state will be restored. So the MTD arch is similar to the MTD that I showed you before, which copies state around, except this time between VMCB and UTCB,
and not between two UTCBs. Same with the scheduling context, the entire accounting of the emulation goes onto the scheduling context of the vCPU. Now, we get into ARM territory, and before we do that, I'll show you how we hide the differences between x86 and ARM. And you don't have to understand all these details,
but here I've shown you the MTD, the MTD arch, for ARM and for x86, which is basically a bit field that describes the different state fields. And this is also listed in the NOVA interface specification. So the VMM gets to decide for every event which parts of the architectural state it wants to see.
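As an illustration of the idea (the bit assignments below are invented, not the ones from the NOVA interface specification), such an architecture-specific MTD can be thought of as a bitmask where every bit selects one group of guest state to transfer:

    #include <cstdint>

    // Illustrative bit layout only; the authoritative one is in the NOVA spec.
    enum Mtd_arm : uint64_t {
        MTD_GPR     = 1ull << 0,    // general-purpose registers x0..x30
        MTD_EL1_SYS = 1ull << 1,    // EL1 system registers (SCTLR_EL1, TTBR0_EL1, ...)
        MTD_ESR     = 1ull << 2,    // exception syndrome / fault information
        MTD_FPU     = 1ull << 3,    // SIMD and floating-point register file
        MTD_TMR     = 1ull << 4,    // virtual timer state
        MTD_GIC     = 1ull << 5,    // virtual interrupt controller state
    };

    // A portal registered for data aborts might request only what it needs:
    constexpr uint64_t mtd_data_abort = MTD_GPR | MTD_EL1_SYS | MTD_ESR;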
And by using this architecture-dependent message transfer descriptor, we can completely reuse all the mechanisms. It's just a message that looks different depending on which architecture you're on. The FPU is an interesting state because it's kind of large. So on ARM, it's 32 128-bit words,
which is 512 bytes. This is something you do not want to switch on every VM exit or on every context switch, so we switch the FPU lazily, which has some consequences. We call the thread which currently has its state saved in the FPU the FPU owner. And then whenever some other thread
wants to access the FPU, we take the state from the owner, we save it lazily, and give it to the other guy. So that means that the hypervisor has to do two things. When we switch away from the FPU owner, we have to disable the FPU so that we see any other access trapping. And when we switch back to the FPU owner, we have to re-enable the FPU because we don't want the owner to trap. The owner has its state in the FPU.
And that means on a typical kernel exit path, you have multiple conditions to check, like: am I going back to the FPU owner? Is the FPU enabled? If not, this can all be very time consuming because it's branch-heavy code. So how do we make this efficient? NOVA has a notion of hazards, which are exceptional conditions that the kernel needs to fix up before it goes back to user mode.
And we have two types. We have a CPU hazard for certain exceptional conditions on the CPU, like the CPU needs to reschedule, the CPU needs to flush its TLB, the CPU needs to restore a certain state. And we have the same thing for ECs. The EC needs to reload some state. The EC has been recalled and things like that. So what we do is we overlay two bits of the state
from the CPU hazard and the EC hazard, namely the states "the FPU is disabled" and "the EC is not the FPU owner", versus "the FPU is enabled" and "the EC is the FPU owner". And only if the XOR of both states yields a value of one, which means these two conditions are out of sync, do we actually vector through the hazard path and do this FPU switching magic.
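A small sketch of that check, simplified and with made-up names (the real NOVA hazard encoding differs): the FPU bit sits at the same position in the CPU hazard word and the EC hazard word, so a single XOR decides whether the slow path is needed.

    #include <cstdint>

    constexpr uint32_t HZD_FPU = 1u << 3;   // illustrative bit position

    // cpu_hazard has HZD_FPU set while the FPU is disabled on this CPU;
    // ec_hazard  has HZD_FPU set while this EC is not the current FPU owner.
    inline bool fpu_fixup_needed(uint32_t cpu_hazard, uint32_t ec_hazard)
    {
        // Fast path: the bits agree (disabled and not the owner, or enabled and the owner).
        // Only when they disagree does the exit path vector through the hazard handling.
        return ((cpu_hazard ^ ec_hazard) & HZD_FPU) != 0;
    }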
In the fast case, where the FPU is already in the right state, so it's disabled and the EC is not the owner, or it's enabled and it is the owner, we will not take the slow path. So these are some of the performance tricks that NOVA is using to get really excellent virtualization performance. I'll talk a little bit about interrupt virtualization on ARM,
because unlike x86, we get a lot of help from the hardware. ARMv8 has the interrupt controller split into two parts. There's one part that lives in the chipset, which is the upper part, the upper box, consisting of the GIC distributor and the GIC redistributors.
You can think of it like an I/O APIC. The distributor is the global part, which takes shared peripheral interrupts coming from devices, and the redistributors are CPU-local parts, which take interrupts like core-local timer interrupts, core-local performance counter interrupts and things like that. And then the distributor and redistributor send physical interrupts to the CPU interface,
which is called GICC, and then they normally go to the processing element, which is your processor. So that's the left side of this diagram, and this is the thing that gets used when virtualization is not in the game. With virtualization, you get the right side of this processing element, consisting of something called GICH and GICV.
So what ARM did there is they created a virtual interface of the GICC, which looks exactly the same as the physical CPU interface, except that it doesn't get its interrupts from the distributors. It gets its interrupts from the hypervisor control interface, and this is what NOVA drives. So what we do is we map GICV directly into the physical address space of the VM,
so the guest can access it any way it wants without causing any exits, and all the interrupt injections that come from the hypervisor go to the control interface, and this control interface autonomously injects interrupts into the guest. So this is very efficient. How we expose this at the API level is twofold.
First of all, the interrupt injection logic between GICv2 and GICv3 uses a different format, which is really not so nice. So we abstract the differences away at the injection interface. We let the VMM always give us interrupts in GICv3 format, and if you run on something older, we convert. And then we have a hypercall called assign interrupt,
which the partition manager can use to steer interrupts to specific cores. Next thing is timer virtualization. ARM has a global system counter, which distributes the time in the entire system to all the cores, and this is exposed on each core in form of two counters and two timer instances.
One is called the physical timer, which directly reflects the system counter, and the other is called virtual timer, which has an offset applied or subtracted from the system counter so that you can hide time progression from a guest. The good thing is that the physical timer can be trapped.
So it can be disabled, you can trap and emulate it, and this is what we do. And this emulation works much like on x86. So the VMM uses a timer interrupt semaphore to catch this timer expiring, and then this interrupt is asynchronously delivered. It's much more complicated for the virtual timer because the ARM architecture does not allow you
to trap or to intercept the virtual timer, which really forces you to context switch it. So every time we switch from one vCPU to another, we context switch the virtual timer, which is obviously a little bit ugly from a context state saving perspective, but the good thing is that the timer interrupt
will always be synchronous to the execution of the vCPU. So a timer interrupt can never come when that vCPU is not running. And whenever a timer interrupt comes, it is for exactly this vCPU. So we deliver such a timer interrupt directly, synchronously, via a portal. The next thing I wanna talk about is the System MMU. The System MMU is a crucial piece
of a security-critical system, because you not only need to protect applications from writing to each other's address spaces by means of virtual memory and the MMU, you also need to prevent rogue drivers that can do DMA from DMAing all over the place, like overwriting the hypervisor,
overwriting the neighboring protection domain, and this is what x86 uses an IOMMU for, and the ARM concept is called a System MMU. So a DMA transaction that arrives at the System MMU on the left side of the picture has two attributes. It has a stream ID, which identifies which device it's coming from, and it has a, let's call it a virtual address or an IO address,
which is where the device wants to DMA to. And then ARM has a configuration piece in the SMMU that consists of stream mapping groups, which allow you to associate streams with translation contexts. So we have, for example, two streams mapping to one translation context, and one stream mapping to another translation context. The two streams mapping to the same context would be two devices assigned to the same PD,
and then a translation context directly maps to a DMA page table, and there's a TLB in the middle that can cache frequently used translations. So it's really important to have this piece supported in the hypervisor, otherwise DMA is a critical attack vector. So there's a limited number of these stream mapping groups
and contexts, which is why we let the partition manager manage them, and NOVA then manages the translation part, keeping the page tables and the TLB consistent and assigning devices to PDs, and there's an assign-device hypercall for doing this assignment and binding a device to a protection domain.
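A hedged sketch of what that assignment could look like from the partition manager's point of view (the wrapper name assign_dev and its parameters are invented for illustration; the real hypercall is defined in the NOVA interface specification):

    #include <cstdint>

    using Sel = unsigned long;

    // Hypothetical wrapper: bind the device identified by 'stream_id' behind SMMU
    // instance 'smmu' to the DMA translation context of protection domain 'pd', so
    // that all DMA from that device goes through the PD's DMA page table.
    long assign_dev(Sel pd, unsigned smmu, uint32_t stream_id)
    {
        (void) pd; (void) smmu; (void) stream_id;
        return 0;               // stub: the real implementation is a hypercall into NOVA
    }

    void partition_manager_example()
    {
        Sel net_vm_pd = 7;              // PD capability selector of the network VM
        assign_dev(net_vm_pd, 0, 0x42); // e.g. the network controller's stream ID
    }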
So with that, let's take a look at all the platforms that we currently support. We'll do a little demo here in a few minutes. We'll do this demo on the top left board, which is an Avnet Xilinx Ultra96 board, and the reason we are doing that is precisely because it has an SMMU. A lot of the other boards do not, but this board has one,
and we obviously wanna use the best infrastructure that we can have. We also run on the NXP i.MX 8M Quad, and those of you who were in Stefan's talk just a couple minutes ago, you already saw that board in action, and the VMM that we use on top of NOVA was, as Stefan pointed out, a joint development effort between Bedrock and Genode,
and it was a lot of fun to work on this together. We also run on a Renesas R-Car. The difference for the R-Car is that it actually has two clusters of different cores, so it's a big.LITTLE system, so we also support big.LITTLE, and then, as we know, FOSDEM is a community conference, and a lot of you have probably gotten a Raspberry Pi 4,
so NOVA also runs on that, and we also did a lot of prototyping on QEMU, so naturally, we support QEMU out of the box. The NXP, by the way, is the only board we have that has a GICv3, so we have sort of all combinations: different GICs, with SMMU, without SMMU, single cluster, multi cluster,
all of this is supported. So with that, I'd like to invite my colleague, Shantanu, up here, who will do a little demo for you. I'll quickly summarize what we are going to show. So we have Nova Hypervisor, we have a bunch of devices, we'll assign different devices to different virtual machines using the SMMU,
as you can see here in these shaded colors. Every virtual machine has a bunch of virtio interfaces, like a virtual Ethernet and a virtual UART. We connect the different virtual machines together using a virtual Ethernet switch, and we also have a console into every virtual machine using a UART multiplexer, with the UART driver being a component in the host
that has direct access to the UART hardware. So with that, I'll give the mic to Shantanu, and he will walk you through the demo and show you a bunch of things.
Okay, so, Udo, do you make this full screen? Can you make this full screen? I don't know, just this straw, sorry.
Okay, it was that easy. Okay, so, yeah, going by the picture that Udo showed earlier, so we've now booted into our UART multiplexer. This is the administrative console of our multiplexer, it's an asynchronous console,
and we have, basically, we can cache all the data coming from our virtual machines, so we can list the clients that are connected right now. So as you see, we have three virtual machines, so three VMMs, we have the console from Nova, we have the MSC, that's the master controller, that's our root hyper-process, and then we have the three guests, which is an Android and two Linuxes.
So just to show you the boot logs, maybe from Nova, because we just had the talk, so the screen is not big enough, but what you see here is the typical boot log you get from Nova, so most of what's here is basically the interrupts that have been passed on and configured for the guests.
So to show you maybe a virtual machine monitor, so these are the logs from our VMM, this one is hosting an Android system, so what you see here is a pass-through for the devices, the display, the USB controller, because of the touchscreen attached,
and also a connection to the vSwitch, so we have a virtual switch that connects three virtual machines here, like Udo showed earlier. So let me quickly get into a guest here. So this one is running the network driver,
so it has our wifi driver, and I'll also set it up to use the virtual console, sorry, to use the virtual switch and set up a DHCP server so that the other machines can get an IP. So what I need to do is probably assign myself
an IP address and also start a DHCP server, so to make sure that this is working, I have another Linux system, this one doesn't have network connection right now,
but I can run DHCP clients here,
so now I have a virtual connection, and I also got myself an IP address from the other VM, we can exercise this interface, maybe sending some packets, possibly doing throughput test with the other VM,
sorry, I think I put the wrong IP address,
so I can definitely ping,
so I put the wrong IP address earlier, so we have a throughput of 580 megabits or so between the two virtual machines, this is just exercising the interface, and because this is the VM
that has access to our wifi driver, I can also connect it, maybe I'll just set up a quick hotspot on my cell phone, not trusting the wifi here,
so I just use wpa_supplicant to set up a wifi network, so now I have wifi on my Linux, I have a virtual ethernet, now I can show you the console from Android, this is a development build, so please ignore all the audit logs coming from SELinux,
and as you can see here, Android has also picked up an IP address because of our DHCP server running, so basically what it now needs is just a connection to the internet, and to do that, I'll just set up NAT tables and make my network VM behave as a router,
so now my Android should have access to the network, it does, so basically this is how we isolate the network driver from Android,
it's running in a separate virtual machine, and Android itself can be connected to the outside world through this connection, and because there was already a Star Wars shown before this, I will probably run some other webpage,
maybe the date and time today, so I don't know if people at the back can see,
I think I mistyped something.
There's some issue with my, can you type that?
I can. So unfortunately not very used to German keyboards, but you should see a page loading here
telling you the date and time today, and I think that is it for the demo. It's a very slow internet connection, sorry for that, but it's kind of loading. Yeah, so we have today's date's time getting it from the network, so our Android has a network connection.
So I'll hand it back to Udo to conclude this.
Thanks, Shantanu, for this really impressive demo. Shantanu spent a lot of time working on this together with the other parts of our Userland team, so it's only fair to let him demo this system in action. So let me just repeat what you just saw. We showed three VMs, an Android with two virtual CPUs,
a Linux, and another Linux with a single core. So we actually can support SMP and uniprocessor virtual machines. We can show different types of virtual machines, Android, and we saw two different versions of Linux. We have virtual devices, we have pass-through devices with the SMMU. We have a virtual Ethernet switch that can connect all the different VMs together.
We can do host drivers and all of this on an ARMv8 system. So let's talk a little bit about Roadmap because this is obviously not the end of the road. We made the ARM port as a rapid prototyping effort
together with our friends from Genode who helped us on the VMM side, and we developed this independently on a separate branch, which means that today the ARM and the x86 version have diverged a little bit. So the next effort that I'll be spending time on is to merge significant portions of the x86 and ARM branches
and I said we have roughly 10,000 lines of code on one side and 7,000 on the other side, and we can probably get that well below 17,000, so maybe half that or something like that. We wanna support newer ARM features because ARM has put a lot of cool features into the architecture such as pointer authentication, memory tagging, all of which is useful in a secure system.
In terms of what additional functionality we'll then add to NOVA in the x86 and ARM version: we'll make the hypervisor binary relocatable, which means you can move it anywhere in memory and it will run, so that's good for different systems which have different physical memory layouts. We'll add a lot of features which will help people build VM introspection on top of NOVA,
which means you can use the hypervisor to peek into what's happening inside a virtual machine, maybe protect stuff inside a virtual machine from the outside. We have a lot of ideas how we can improve the kernel resource management and obviously as part of the effort of merging these two branches, we'll also take a look at all the cool things that other people have built on top of Nova and see which of those are useful for us
and there's, I bet, also a bunch of bug fixes that we haven't yet incorporated. There will always be performance optimizations. I think performance is already on par with the best microkernels out there, but we encourage independent benchmarking, and we have a big effort going on around formal verification of the hypervisor code and of the verification of the components
that run on top of it. And those of you who heard the seL4 talk today know that verifying something is a really hard exercise, especially if you're talking about something like ARMv8 and especially if you're talking about SMP. So what does formal verification on NOVA roughly look like? And I don't want to go into the details here.
The head of our formal methods team is here, so if there are any detailed questions, I will just directly hand those questions to him. But to give you a feeling for what that would look like, you have a really simple function here, and obviously functions in NOVA are a lot more complicated. A function that adds two unsigned 32-bit numbers, X and Y, and returns the result.
And what programmers do to make their code formally verifiable is they annotate it with pre- and post-conditions. Component lit already showed sort of a similar structure. So what we say here is: we introduce in the precondition two new variables, V1 and V2, that carry integer values, and we bind them to the parameters X and Y.
Otherwise the precondition is empty, because this function does not touch any global state. And we have a post condition that also does not touch any external global state, except for the return value. And the return value is the sum of V1 plus V2, but since we are adding two 32-bit numbers, this addition can obviously overflow.
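A rough sketch of that example in C++, with the specification paraphrased as comments (the actual annotation syntax used with the Coq tooling differs):

    #include <cstdint>

    // Precondition : x carries an integer value V1, y carries an integer value V2;
    //                no other (global) state is accessed.
    // Postcondition: no global state is modified, and the return value is
    //                (V1 + V2) truncated to 32 bits, i.e. (V1 + V2) mod 2^32.
    uint32_t add(uint32_t x, uint32_t y)
    {
        return x + y;   // unsigned overflow wraps modulo 2^32, matching the postcondition
    }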
So we have to trim the result to a 32-bit value. And all the functions in the kernel will be annotated with specifications like this. And then we have a tool chain that looks like the bottom of this slide. So the way this works is as follows, from a high-level perspective. We have the source code, which is written in CPP or HPP files.
Then we run a tool on it, which is called CPP2V. CPP2V is a tool that we wrote at Bedrock. Gregory wrote it, the gentleman sitting there. It's a Clang plugin, which takes the abstract syntax tree of Nova and converts it into a Coq representation of that abstract syntax tree. So what you get out of this
is a file called foo.cpp.v. And the annotations of the source code, which you can either write inline, or we nowadays prefer to write them out of line, come in a separate file called foo.cpp.spec.v. These two files are put in the Coq compiler, along with semantics of the C++ language.
And we are also heavily investing in proof automation, because proving something, as you heard earlier today, is a lot of manual effort. And the more you can automate this, the faster you can re-prove things. Like if you make a change which touches only a minor portion of the code, there's a high chance that significant portions of that proof will just go through the automation.
What you get out of it is then a file called foo.cpp.proof.vo, which is the machine-checked proof. And this is an ongoing effort, so we're not as far as seL4, but I think the toolchain that we have there is already looking really, really good. So with that, I'll conclude my talk and take questions.
We just made the source code of the Nova ARM port public. As the rest of Nova, it's available under GPLv2 license. You can download it from those two links. There's a bedrock systems repository, and there's the repository where Nova has always lived, my private one.
It's important to pick up the ARM branch, not the old x86 branch. And there's webpages that have further information, papers, links, and so forth. And with that, I'll conclude the talk, and I'll be happy to take any questions you may have.
So as far as I understand, seL4 has mostly... so the question was, where do we see the differentiating points to seL4?
So I think the major focus of seL4 has always been on the formal verification of the kernel. The major focus for NOVA's existence was always to make virtualization and component-based systems on top of a single hypervisor possible. At some point, you also have to formally verify to get a compelling security story.
So this is something where I see seL4 clearly as the thought leader. But performance-wise, I think we are very competitive. I'm not sure, maybe Gernot can comment on that. The way I understood it, seL4 only runs on ARMv7, and there's an ongoing effort to port it to ARMv8,
so NOVA is now running on ARMv8. You're running on ARMv8? Okay. So seL4 also runs on ARMv8, but it's not verified. We do not plan to support ARMv7 as a hypervisor interface, but we support 32-bit guests
which are ARMv7 backward-compatible. On the verification bit, I do understand that you actually could derive some kind of input for a formal verifier from the C code. That's my understanding.
But that's different from having, well, a design, actually breaking down stuff and proving things with regard to the design. So what you can prove is that a function has specific characteristics, but this says nothing about the overall architecture. And what is the approach there? So you need a model.
So the question was, if you have a simple function like this, how does this connect to the overall architecture? Because proving a simple function doesn't prove that your whole system is correct. We use a concept called separation logic to be able to reason independently about different functions of the system which makes the whole proof effort very modular.
Which means if you can reason that this function only touches a certain fraction of the state and does not affect anything in the other portion of the state, then you can verify that function and you know it doesn't affect the other side. You need to prove certain things at different levels. So we need to have a model of how the system behaves,
like how Nova behaves. But you also need to show that the source code actually conforms to that model. So the system specification and the specification of the Nova functions will be connected through those annotations. The actual source code will be connected to the functional specification of all these functions
through the tool Cpp2v, which shows that the code that the compiler generated, the abstract syntax tree, conforms to the specification that this function has. That's also something that we built.
Gregory, you want to take that question? Yeah. So the question was, is the aim full functional correctness? So one of the properties that we will prove at Bedrock is what we call the bare metal property. And the bare metal property, in a nutshell, expresses that if you execute something in a virtual machine, it will behave
as if you executed it on bare hardware minus some timing requirements. But from a correctness perspective, you want to show that execution of something in a VM does not differ, except for small timing effects from execution on a bare metal machine. And this property obviously also entails
that NOVA does not crash and that the VMM emulates the instructions correctly and so forth. Yes, here's another question. Do you have any interest in re-architecting NOVA, or do you expect to keep the architecture?
So the question was, will formal verification require us to re-architect certain aspects of NOVA or do we expect it to be formally verifiable as is? I expect that we may find maybe one or two things that we want to change, maybe also not because they were incorrect
but because they make the verification easier. But overall, I don't expect too many surprises in terms of the code being correct, because we exercise this already quite a lot, and other people like Genode do too, and you've seen the complex scenarios that run on top of it. I may be surprised, of course. Of course. The question in the middle?
So the question was, we've listed VM introspection in our roadmap, what kind of VM introspection do we do? So we provide mechanisms for VM introspection for being able to look at the memory of a virtual machine,
to be able to look at register state, maybe set breakpoints, intercept execution control at critical points, but these are low-level mechanisms. The richer VM introspection features that we build on top, we have not yet disclosed.
So the question was, what is our plan for open sourcing the remaining parts of the stack? So NOVA has just gone open source. The remaining parts of the system will progressively go open source over the course of the next year.
Like we said, we jointly developed the user-level virtual machine monitor with Genode. Before we make that public, we want to make sure that it really fits their model and our model very well, because we want this to be a community effort. So we don't want to go out with something that is half-baked, so we'll polish that a little bit before we make it public, but the plan is to make the entire virtualization layer, basically everything that's shown in the host, public.
Come on. I'll let Gregory answer. So the question was, will the verification artifacts be open source?
So the answer was we would like to make the artifacts public, but this is a separate discussion and we will discuss it offline. Yep, I think Gregory would agree with that.
There was another question here. The primary use case for the proof automation is that we want the code to be agile, not in the development sense, but in the sense that we wanna add new features,
we wanna make changes, we wanna make additions, new architecture port, and we do not want to go through a lot of manual labor of fixing up the proof. So we want to automate this as much as possible. So the whole purpose of proof automation is to make the proof go through faster, to make the proof less manual.
And I think we also have a plan to work a lot with academia on developing new tactics to make the proof automation smarter. So if anybody is interested in working in these areas, talk to Gregory. Any further questions? I think that's it.
Thank you. Thank you very much. Thank you.