
Mitigating Processor Vulnerabilities by Restructuring the Kernel Address Space


Formal Metadata

Title
Mitigating Processor Vulnerabilities by Restructuring the Kernel Address Space
Number of Parts
287
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
In this talk, I will present a new Spectre/Meltdown mitigation that I have prototyped for the Hedron microhypervisor. This prototype has also been used to quantify the runtime overhead of the proposed mitigation. Processor-level vulnerabilities, such as Meltdown and Spectre v1/v2, allow attackers in userspace to leak information from the kernel address space. This is particularly devastating for kernel designs where the kernel address space is identical for all processes and thus allows the attacker to break the system's confidentiality boundaries. Previous mitigation attempts, such as kernel page-table isolation (formerly KAISER) for Meltdown and various branch predictor/speculation barriers for Spectre v1/v2, introduce costly instructions into performance critical parts of the operating system kernel. Especially mitigations related to the branch predictor are only possible if the CPU vendor has exposed special functionality. During the last six months I investigated an alternative mitigation strategy on the kernel design level that shows good mitigation properties, but adds negligible runtime overhead. This alternative mitigation involves moving process-related information in the kernel into a process-local part of the kernel address space. A userspace attacker that can infer the content of its associated kernel page table can thus only read information about its own process. Switching between these kernel address spaces is done as part of the normal address space switch when a thread in a different process is scheduled and thus comes with no additional cost.
Transcript: English (auto-generated)
Hello everyone, I'm Sebastian Eydam. Thanks for having me and thanks for organizing this devroom. Today I want to talk to you about mitigating processor vulnerabilities by restructuring the kernel address space.
But before we start, a few words about me and about Cyberus. I am a computer science student at BTU Cottbus since 2015. I just handed in my master's thesis at the end of 2021, so I'm nearly done.
I have worked for Cyberus as an intern since 2017 and as a full-time employee since the start of this year. Cyberus is a German cybersecurity software company. It was founded in 2017, so it is rather young.
It focuses on secure virtualization and automated software testing. If you want to learn more about Cyberus, you can click one of these links. So, what are we going to talk about today? In 2018, several processor-level vulnerabilities were found.
Two of them were Spectre and Meltdown. These vulnerabilities allow attacks that leak kernel memory from user space. There are other attacks possible, but today we will focus on this class of attacks. There are several mitigations for these attacks, but all of them introduce costly instructions into performance-critical parts of the kernel.
They slow down your computer, and we don't want that. We want fast computers. I looked into an alternative mitigation strategy that modifies the kernel design,
and that ideally adds no runtime overhead and is CPU independent. What I mean by CPU independent, I will talk about later. Today, I first explain the current status: why is it so bad that attackers can leak kernel memory, and how do existing mitigations mitigate that?
Then I present the proposed mitigation. I implemented a prototype in Hedron, an open-source microhypervisor, and I will talk about that. I have always said the mitigation is zero-cost,
so I did some measurements, and I'll show you whether this statement was right or wrong. We will then talk about the mitigation properties of this mitigation, with a short conclusion at the end. What you can see here are two virtual address spaces.
The top one is an attacker, in this case a Meltdown attacker. The bottom one is some sort of crypto process. Let's assume this is a process that encrypts your emails or signs your emails or something like that. On the left side of the virtual address space is the user part of this address space,
which is not so relevant here; it is just white. On the right, you see the kernel part. Here you can see that we have some information that belongs to the attacker, some information of the crypto process, and some information that belongs to the operating system.
You see on the right there is a big part where we have crypto keys in the kernel memory. You can also see that both processes have the same kernel address space. If the attacker can leak its own kernel memory,
it can also leak the crypto keys that belong to the crypto process. In this case, an attacker leaks your crypto keys and can decrypt your emails or can sign emails in your name and send them to other people. All sorts of bad things can happen, and we don't want that.
One mitigation for Meltdown is KPTI, or kernel page-table isolation. KPTI works by hiding the information while the attacker runs in user mode. Again, you see two virtual address spaces.
This time, both address spaces belong to the attacker. The upper address space is the address space of the attacker running in user mode. If the attacker, for example, tries to leak memory from kernel space,
the memory is just hidden. There is no memory the attacker can leak. If the attacker changes to kernel mode, for example by executing a system call, the operating system does an address-space switch and switches to the lower address space.
There, all information is available. The operating system can execute the system call, and when the system call is done executing, it switches back to the upper address space, and the attacker again can't leak any memory.
KPTI works by hiding the memory from the attacker. As I already said, we have two additional address space switches for each system call or for every switch into kernel mode.
This means that we have an overhead. We do extra work, and KPTI has an overhead of 5% to 30%. This depends on the workload. Workloads where you do many system calls, you have a higher overhead, and if you have fewer system calls, you have a lower overhead.
Different mitigations have different overheads. For example, speculative load hardening is a mitigation for Spectre v1 and has an overhead of 10% to 50%. Retpoline is a mitigation for Spectre v2 and has an overhead of up to 20%.
Disabling speculative execution in this case doesn't mean that I tell the CPU to stop speculating; it means using serializing instructions. For example, on x86 you can use the lfence instruction, and here you can see the overhead is devastating.
This is just a temporary solution for when you can't use any of the other mitigations. Then we have the indirect branch control mechanisms. These are a mitigation for Spectre v2, and those mechanisms are only available on x86
and only on newer x86 processors. Here you can see what I mean by CPU independence. This mechanism is not CPU independent, because you can only use it on x86, not on ARM or anything like that, and only on newer processors. This is also a disadvantage of this mitigation.
Let's talk about the new mitigation. You can already see what the new mitigation does. Here you can again see the two virtual address spaces, and you also see that in the kernel memory
of these address spaces, only the memory, the confidential information, of the process itself is available, plus that of the operating system. This mitigation works by creating a process-local memory area in the kernel, and then we move as much information as we can
into this process-local memory area. That way, in this case, the crypto keys are hidden from the attacker, and switching between these process-local memory areas is done as part of the address-space switch
that we do for each process context switch. So we don't have any additional overhead here. No overhead from this side.
We can see the memory is hidden from the attacker. We have no overhead. Everything is fine. The next question is, which information can we move into this process local memory area? For that, I created a decision tree. What you would do here is
you go through your operating system kernel, you take one data structure, and then you ask these four questions. The first question, does it contain secrets? This is more an optional question because, of course, you can make data structures process local if they don't contain any confidential information,
but it may not be feasible for you. You can ask this question, but you don't have to. The next question, does it have to be shared? Is it necessary that this data structure is available in multiple address spaces?
If this is the case, we can't make it process local because we just can't. It has to be available in multiple address spaces, but if it doesn't, you have to ask if we need this information or this data structure for a context switch. Even if you need it,
maybe you can split up the access to this data structure. What I mean by that, I will show you later. I have two examples where I will show you how to use these questions. The first example is the UTCB, the user thread control block.
Everyone uses the UTCB as a message buffer for IPC, for inter-process communication. Let's go through the questions. Does it contain secrets? The answer here is yes, because in this message buffer you can find all sorts of information. You don't know what information
the process puts into this message buffer, so you have to assume that it contains secrets. Does it have to be shared? The answer is yes, because during IPC the sender puts the information into the UTCB, and then the kernel has to copy the information from the sender's UTCB
into the receiver's UTCB. At this moment, both UTCBs have to be available to the kernel, and that means we can't make it process local. It has to be available in multiple address spaces. The next data structure is the FPU state.
The FPU is a floating-point unit of your processor, and the FPU state is the memory area where the content of the FPU is saved while a thread doesn't execute.
Does this memory area contain any secrets? The answer is yes, because the FPU is also used in cryptographic computations. There can be a key or something like that in there, so this is very much a secret. We don't want this to be leaked.
Does it have to be shared? No. The FPU state is tied to one thread in one process, so no need to share that. Is it accessed during context switch? Yes, because when you do a context switch, the FPU state of the currently executing thread
is saved to memory, and the FPU state of the thread that we are about to activate is restored. So yes, we need this during a context switch. But the access can be split, and here you will see what I mean.
We can save the FPU state of the currently executing thread before the address-space switch, then we can do the address-space switch, and then we can restore the FPU state of the thread that is about to execute. So we can make this process local,
as you can see in the decision tree. So for my prototype, I took this data structure and made it process local. What did I have to do for that? The prototype needed a memory allocator for process-local memory, because the memory allocators that were available
don't make the memory process local, so a new allocator was needed. I had to modify the context switch: as I already said, I have to save the FPU state before switching address spaces
and restore the new FPU state after that, so some modifications were made there. And I needed a mechanism to initialize the process-local memory. You may ask: I had to initialize memory before, why is this something special? Glad you asked, so let me tell you about the initialization problem.
What you see here, on the left and on the right, are two virtual address spaces. The left one is currently active and the right one is inactive. T_create is a thread that is currently running, executing in the left address space,
and T_create now wants to create a new thread that will execute in the right address space, which is currently inactive. So T_create would create a new thread object, and by that it would also call the constructor of the thread class, and the thread class wants to allocate process-local memory,
so it does that, and it writes that into the correct page tables, and everything is fine here. This allocation call returns a pointer, and now we want to initialize the memory,
so we want to write to that pointer, either to clear this memory or something like that. But because the left address space is still active, we have a problem here: either a page fault or maybe memory corruption.
So we can't initialize the memory here. In my prototype, I added an initialization phase to the FPU, so the FPU doesn't initialize the memory when it is created. It initializes it when it first restores its state.
So, the measurements. As I already said, I don't want any overhead from this mitigation, or rather, I think this mitigation has no overhead. I focused on the context-switch mechanism, because I thought if I have any overhead,
I have it there. There was another place where I thought some overhead might be possible, which is the creation of threads, but this is not so relevant in Hedron, because it is solely used for virtualization and rarely creates any threads, so this was just not that interesting here.
To benchmark this context-switch mechanism, I used one microbenchmark, and because it is possible that I forgot something or that I created some overhead somewhere else,
I used two other benchmarks. One is a Linux kernel compile benchmark: I create a virtual machine on top of Hedron, and inside this virtual machine I compile the Linux kernel and measure how long it takes. Then I had the Windows disk speed benchmark:
again, I create a virtual machine on top of Hedron, which this time runs Windows, and the disk speed benchmark measures how fast I can access the disk or the SSD or something like that. So I have one computation-heavy benchmark
and one IO-heavy benchmark. So let's at first take a look at the microbenchmark. This was a simple benchmark, so you can see I have two threads and two processes, and these threads use two semaphores
to pass the control flow around. The local benchmark loop does a down operation on semaphore A, and then the remote benchmark loop is active, and it uses semaphore B to pass the control flow back,
and they do that in a loop. I measure how long this loop takes, then I divide this value by the number of repetitions, and then I know how long one context switch takes. And here are the results. The unmodified Hedron was
21 cycles faster than the mitigated Hedron, but this is about 1%, and I thought this was not enough to be attributed to the mitigation. I found papers stating that moving code around
and modifying the link order and things like that can have this kind of effect on a program, or in this case on an operating system, so I don't think that this difference comes from the mitigation. For the Linux kernel compile benchmark,
this time I plotted the results as a boxplot, and here you can see that the results are very similar, so no evidence for an overhead here. The Windows disk speed benchmark was not so easy to interpret,
because it doesn't give me one number that I can look at and say, this is good or this is bad. The disk speed benchmark uses four different configurations (random or sequential access to the disk, and how many threads it uses),
and then it measures the read and the write speeds, so I get eight numbers. In the subtitle you can see the configuration. This result looks like the mitigated version is slower than the unmodified version,
but you can also see that I have a really high standard deviation. And if I look at some other results, for example this one, this time it looks like the mitigated version is faster than the unmodified one,
but again, together with the standard deviation, all I can say is that I don't have any evidence for a runtime overhead here. So those measurements look very promising. Okay, the next point is the mitigation properties of this mitigation,
and for that I created a table. We will make a tick if a mitigation has a given property, and leave the cell blank if it doesn't. KPTI, as I already said, is a Meltdown mitigation,
and it is also CPU independent, because we don't need any special instructions or anything like that. Before we come to the Spectre mitigations, we have to classify Spectre attacks, because Spectre attacks can be cross-process, which means that the attacker triggers a Spectre gadget inside the victim,
and then the victim executes and loads the desired information into the side channel. This is a cross-process attack. In an intra-process attack, the attacker itself executes and tries to load the desired information into the side channel.
With this differentiation in place: disabling speculative execution mitigates Spectre v1 and v2, and the dot means that it mitigates both intra-process and cross-process attacks.
So Spectre v1 and v2 are mitigated, but as I already said, this is absolutely not zero-cost, and it is not CPU independent, because on x86 you use the lfence instruction, and on ARM you would use another instruction,
or some interface the CPU provides, so this is not CPU independent. Speculative load hardening mitigates Spectre v1. It is also not CPU independent, because it makes assumptions about how the CPU executes certain instructions.
Retpolines are Spectre v2 mitigations. Again, not CPU independent, because they also make assumptions about how the CPU works. And the indirect branch control mechanisms are Spectre v2 mitigations,
and as I already said, this is an x86 interface, so not CPU independent. The proposed mitigation mitigates Meltdown. It mitigates intra-process Spectre v1 and v2 attacks,
because, as I already said, it hides information from the attacker, so as long as the attacker executes, it cannot leak the desired information. When the victim executes, the desired information is there in the kernel address space, so that case the proposed mitigation cannot mitigate.
Unknown attacks means that it mitigates all attacks that leak kernel memory from user space. So this whole class of attacks is mitigated using this mitigation. As I already said, it is zero-cost, and I have shown that. And it is CPU independent,
because I don't need any special instructions or anything like that. So I think this looks very good for the proposed mitigation. To mitigate cross-process attacks, you would need another mitigation, like running trusted and untrusted threads on different CPU cores
or something like that, and with such a mitigation combined with my proposed mitigation, you would also mitigate cross-process Spectre attacks. So, the conclusion. I investigated, and have shown you, an alternative mitigation
for side-channel attacks. This mitigation works by creating a process-local memory region in the kernel, where the content of this region is distinct for different processes. I switch between these process-local memory regions
via the normal virtual address-space switch during a context switch, so I don't create any overhead there. The mitigation is zero-cost, as I have shown, and it is CPU independent. To show that this mitigation works,
I implemented a prototype as a proof of concept for Hedron. The measurements, as I already said, show no overhead. If you want to take a look at the implementation, here's a link to my GitHub. Thanks for having me,
and if you have any questions, feel free to ask. This is a decision the programmer of the operating system has to make.
So, yeah, you can configure this, but this is nothing the programmer of the user-space application can do. The operating system does this,
so you can configure it there. The operating system decides this, so maybe this wasn't clear. All right. The next question is that the idea reminds someone strongly of
a kernel address space isolation that was presented two years ago at FOSDEM for Linux, which supposedly is not the KPTI version. Is the similarity just accidental, or are you aware of this work?
So this was accidental; I wasn't aware of this talk. All right. Yeah, I think those
were all the questions for now. Well, is there anything you would like to add, or maybe give some prospect of what the next steps for building on this kind of work would be? So, as I said, I implemented a prototype in Hedron,
but this is nothing that we can ship, so it is not production-ready. I think the next step would be to make this production-ready; the allocator is very simple at the moment,
so the next step would be to work on that, and then to identify more data structures that can go into process-local memory, or to think about how to handle data structures that we can't just put into process-local memory,
and how to work with these data structures. Like the UTCB, as I said, we can't make it process-local at the moment, so then there's the question of how to create a more secure IPC mechanism. All right.
And by now another question popped in: can you imagine implementing this for Linux? So, yeah, this is something I already thought about, and you can't just say yes or no. You can also create
a process-local memory area in Linux and put some information into this process-local memory area, but this won't work for everything. I think the biggest problems are drivers, because they run in the context of the current process, and they have memory where you can find confidential information,
but you can't just make this process local, because there is no user process, or no single user process, that is associated with this information. So for this
you can't just put it in process-local memory; you have to use some more case-specific mitigations. So you can do that for part of the Linux kernel, but not for the whole Linux kernel. Okay.
All right. So, I'm not sure if there is another question coming up. Well, anyway, this channel will be public in a few minutes,
so the audience can ask you other questions in this room. So thank you, Sebastian, for your talk, and take care. Thanks for having me. Bye. You too, bye.