
ZombieLoad Attack


Formal Metadata

Title
ZombieLoad Attack
Subtitle
Leaking Your Recent Memory Operations on Intel CPUs
Number of Parts
254
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
The ZombieLoad attack exploits a vulnerability of most Intel CPUs, which allows leaking data currently processed by other programs. ZombieLoad is extremely powerful, as it leaks data from user-processes, the kernel, secure enclaves, and even across virtual machines. Moreover, ZombieLoad also works on CPUs where Meltdown is fixed in software or hardware. The Meltdown attack published in 2018 was a hardware vulnerability which showed that the security guarantees of modern CPUs do not always hold. Meltdown allowed attackers to leak arbitrary memory by exploiting the lazy fault handling of Intel CPUs which continue transient execution with data received from faulting loads. With software mitigations, such as stronger kernel isolation, as well as new CPUs with this vulnerability fixed, Meltdown seemed to be solved. In this talk, we show that this is not true, and Meltdown is still an issue on modern CPUs. We present ZombieLoad, an attack closely related to the original Meltdown attack, which leaks data across multiple privilege boundaries: processes, kernel, SGX, hyperthreads, and even across virtual machines. Furthermore, we compare ZombieLoad to other microarchitectural data-sampling (MDS) attacks, such as Fallout and RIDL. The ZombieLoad attack can be mounted from any unprivileged application, without user interactions, both on Linux and Windows. In the talk, we present multiple attacks, such as monitoring the browsing behavior, stealing cryptographic keys, and leaking the root-password hash on Linux. In a live demo, we demonstrate that such attacks are not only feasible but also relatively easy to mount, and difficult to mitigate. We show that Meltdown mitigations do not affect ZombieLoad, and consequently outline challenges for future research on Meltdown attacks and mitigations. Finally, we discuss the short-term and long-term implications of Meltdown attacks for hardware vendors, software vendors, and users.
Transcript: English (auto-generated)
You probably remember the Meltdown attack in 2018. It was a pretty big flaw in modern CPUs, and the CPUs that came afterwards got fixed, or at least seemed to be fixed, so the problem of Meltdown seemed to be solved.
Well, Michael, Moritz and Daniel will show us that this is not the case. A new attack named ZombieLoad is possible, and in the following hour we'll learn all about it. Please give a really warm round of applause to Moritz, Michael and Daniel.
Thank you for this introduction, welcome everyone to our talk about the ZombieLoad attack. So my name is Michael Schwarz, I'm a postdoc at Graz University of Technology in Austria.
So you can find me on Twitter, you can write me an email, I will be here the rest of the Congress anyway, so if you're interested in these topics or anything around that, just come talk to me, we can have a nice discussion. My name is Moritz Lipp, I'm a PhD candidate in the same office as Michael and Daniel.
You can also reach me on Twitter or just come and talk to me. Yeah, and my name is Daniel Gruss, and I, yeah, I don't have to repeat all of this. No, but before we dive into ZombieLoad, we will start with some... Wait a second, Moritz, wait a second, I added a last-minute slide, you don't know about that.
You cannot just add a slide. But it's important, I mean, it's right after Christmas, right, and we all remember this, ah. Come on, oh, come on, you're kidding. And last year at CCC, we also had this Christmas-themed talk, right, and now
we all still have this ringing in our ears, and this was a really nice talk as well, I think, where we presented a lot of new Spectre and Meltdown variants, maybe not as dangerous as ZombieLoad, but still interesting. And when we presented this, it was uploaded to YouTube afterwards, and I was running around
in a suit at that point, and someone wrote, ditch the suit, please, he looks so uncomfortable. And today, I have a T-shirt, that's much better, right. And in this talk, we presented a systematization tree, and you can see all the different attack variants here, Spectre-type attacks,
Meltdown-type attacks, and yeah, so the question is, how does this all relate to ZombieLoad? And to start that, I think we will just present Spectre in a nutshell. Yes, and I think, what?
That's Spectre in a nutshell, yes. Yes, and maybe something more, there was also this song about Spectre, do you remember this song about Spectre? I think they also had a movie with that title, yeah.
Yeah, this is about the most technical explanation that you will get about Spectre today, because the relation between Spectre and ZombieLoad is not that close. We are here to give a technical talk, not some goofing around. So maybe we need some background first, to have a really technical talk here, right?
So can you explain microarchitecture, I mean? Of course I can, I mean, it's really easy. So we all know we have a CPU, and then we have some software that runs on this CPU. That's what we always do.
And the software has this ISA, it can use this instruction set architecture like x86. So this application can use all the instructions defined by this instruction set architecture, and the CPU executes that. And of course the CPU has to implement this instruction set architecture to actually execute the instruction.
This is what we call the microarchitecture. Could be for example an Intel Core, Xeon, or some AMD Ryzen, or stuff like that. And CPUs are really easy, I learned that in my bachelor. So when you want to execute a program, there are just a few steps that the CPU has to do. So first it fetches the instruction, it decodes the instruction, it executes the instruction,
and then when it's finished executing, it writes back the result. Yes. It's really easy, you see? Yes, but this is very high level. I think we should go a bit more into details if you're asking for that. So maybe to go a bit more into details, we should look at what these boxes actually
do. Let's start with the front end. In the front end, we have some part that decodes the instructions that we send to the CPU. There is already a lot of parallelism in there, and we also have a branch predictor which tells us which micro-ops we should execute next.
There's a cache for that, and we have a MUX that combines all of this, and then we have an allocation queue which determines what the next instruction will be and sends that onwards. We also have an instruction cache, of course. We need to get the instructions from somewhere, and of course the instruction translation
lookaside buffer, the ITLB, will be connected to that. This one basically translates addresses from virtual to physical. Yes. The next step would be the execution engine. In the execution engine, we have a scheduler and a reorder buffer.
The reorder buffer, although it is called reorder buffer, it actually contains all the micro ops in order, in exactly the order in which they should be executed. It's called reorder buffer because the scheduler just picks them as soon as they are ready and then schedules them on one of the execution units.
For instance, there are some for the ALU, there are some for loading data, some for storing data, and, yeah, it just schedules them as soon as possible, and then they are executed there, and as soon as they are finished executing, they will be retired from the reorder buffer, and that means that they will become architecturally
visible to all the software. And if something fails? Yeah, if something fails. If something fails, you mean a CPU exception? For instance, yes. Yes, then, of course, the exception has to be raised, and this happens at retirement.
So, first, the execution unit finishes the work, and then the exception is raised, and all the things that the execution unit did are just kicked out, just thrown away. So, then we go to the memory subsystem. Of course, if we want to make changes, we don't want to keep them in some internal registers. We want to store them somewhere, maybe load data from somewhere, and for that, we have
the load buffer and store buffer. And the load buffer and store buffer, they are then connected to the cache, the L1 data cache, and we again have a TLB to translate virtual to physical addresses, and the line fill buffer to fill cache lines in the L1, also for some other purposes, but we will
get to that later on. Yes, and caches, I think I also talked about caches in... We know about caches. Yeah, we've heard that term. Yes, caches, they are pretty easy. For instance, you have a simple application just accessing variable i twice. The first time, it's not in the cache, so we have a cache miss, so the CPU has to ask
the main memory, please give me whatever is stored at this address. The main memory will respond with the value and store it in the cache. So the second time you try to access this variable, it's already in the cache, so it's a cache hit, and this is much faster. So if it's a cache miss, it's slow, because we need a DRAM access.
On the other hand, if it's already in the cache, it's fast. And if you have a high-resolution timer, you can just measure that by measuring how long it takes to access the address. Can you really do that? Yes, I implemented that, and as we can see, around 60 cycles if the data is stored in
the cache, and around 320 cycles if it's a cache miss and we have to load it from main memory. Oh, wait, I remember something. We learned at university that we can use these caches and cache hits and misses for attacks. So there was this Flush+Reload attack where we have two applications, an attacker
and a victim. We have our cache, and we have some shared memory, for example, a shared library like the libc. And if shared memory is in the cache, it's in the cache for all the applications that use it. So if we have, for example, an attacker that flushes it from the cache, it's also flushed
for all the other applications from the cache. So here, my cache has like four cache sets here, four parts, and the shared memory is in there. It was used before. So as an attacker, I can simply flush it from the cache, so it's not cached any more. Then an attacker can simply wait until the victim is scheduled, and if the victim accesses
the shared memory, it will of course be in the cache again. That happens transparently, as you just explained. And then as an attacker again, when the attacker is scheduled, it can simply access the shared memory and measure the time it takes. And from the time, the attacker can infer whether it's in the cache, if this access
is fast, and then the attacker knows that the victim accessed the shared memory, and if this access is slow, it was not accessed in the meantime and has to be loaded into the cache again. Really simple. Yes, you paid attention in my lecture, I see. But actually, there are some more details that we might want to show here.
So if we look at a cache, how a cache actually works, a cache today works not by just having these cache lines, but it divides these storage locations also into so-called ways, and they group these ways into a cache set.
So instead of a cache line, we now have a cache set. And the index, the cache index, now determines which cache set it is and not which cache line. So you have multiple congruent locations for data. The question then is, of course, how do you find the right data if you want to look something up in the cache? And for that, you take the remaining bits.
So the lowest bits are the offset, then we have n bits for the index, and the remaining parts, maybe the physical page number, is used as a tag, and this tag is then used for the comparison, and if one of the tags matches, we can directly return the data.
I prefer my simple cache. It's a lot easier. So if we combine the cache attack that Michael showed us with the thing that Daniel told us in the beginning, that except these are only handled when an instruction is retired, we can build the Meltdown attack. So let's talk about Meltdown in the beginning, because this is an attack that we built up on.
Yes, Moritz, I think for Meltdown, I mean, we already saw Spectre, and for Meltdown, I think there was a song about Meltdown, wasn't there?
That's not about the Meltdown attack. No. They sing about Meltdown, and it's clearly related to Meltdown. And this sounds serious, yes. But let's get back to the real attack. So it's really simple.
We just access an address we are not allowed to access, which makes an application crash, but we can take care of that. So a page fault exception happens, and what we do now is use this value that we illegally read, which is still processed transiently, and encode it in our lookup
table in the cache. So here, the value is K, so what we do is we access the memory location on the left of the user memory where the K is, which means this value is loaded into the cache. And now what we can do is, after we executed this illegal instruction and recovered from the fault, we can just mount the flush and reload attack
on all possibilities of the alphabet. And at letter K, we will have a cache hit, so we know we read the value K. Yes, this is nice, but this doesn't really explain why this actually works. So let's look at the microarchitecture again. The Meltdown attack, actually the instruction that performs the Meltdown attack
is just one instruction, one operation that loads from a kernel address, moves something into a register. That's it. That's the entire Meltdown operation. Now we have our value in a register, and now we can do with it whatever we like. We can transmit it through the cache if we like, but we could use any other way.
The Meltdown attack is this reading from the kernel address that actually ends up in our register under our control. Now this enters the reorder buffer. It will be scheduled on a load data execution unit, and then it will go to the load buffer. In the load buffer, we will have an entry, and this entry has to store approximately something like the physical page number, the virtual page number
for the virtual address, the offset, which is the same for virtual and physical pages, the lowest 12 bits, something like that, and a register number. If you are familiar with register names like RAX, RBX, RCX, and so on, those are just variable names that are predefined.
There's actually a set of 160 registers, and the processor will just pick one of them independent of your variable name. And then, yes, we access the load buffer here, and in the next step, we will do a lookup for this memory location in...
Oh, sorry. We first have to update the load buffer, of course. We have to get a new register, right? These are the old values; the new values are marked in red. The register number, the offset, and the virtual page number are updated. The virtual page number is not used for the lookup in the L1, the store buffer, or the LFB. We only use the lowest 12 bits, the offset, here.
And then what happens next is we do the lookup in the store buffer, in the L1 data cache, in the LFB, and also in the DTLB, where we check what the physical address is. We get this from the DTLB. Now, in the next step, we look at what this DTLB entry says. And it says, oh, yeah, I have a physical page number.
It's present, and it's not user accessible. But the fast path, what the processor always expects, is that this is a valid address, and it will, in the fast path, copy this physical address up here. At the same time, it realizes that this is not good. I shouldn't be doing this. But also, I mean, the virtual address matches, the physical address matches.
Why wouldn't I return the data to the register? And then the data ends up in the register. That's the Meltdown attack on a microarchitectural level. So how fast is this attack? This is one question, and the other is also: why does the processor do this?
And there are actually multiple patents describing this. One says, if a fault occurs with respect to the load operation, it is marked as valid and completed. So in these cases, the processor deliberately sets this to valid and completed, because it knows the results will be thrown away anyway.
So why not let it succeed? So how fast is this attack? Actually, it's pretty fast. So it's 550 kilobytes per second, and the error rate is only 0.003%. Yeah, I can confirm that. So I also implemented that, and I put a secret into a cache line,
a known secret in kernel memory. And then when I try to leak that with this Meltdown attack we've just seen, I get the value X, but also some P in there. So X is the secret, the secret I put in there. What is the P? It's some noise, I guess. So it's a bit noisy, as you said, like this error rate from before. I'm not exactly sure what this noise is.
And actually, Intel explains that in more detail in their security advisory. So for instance, on some implementations, speculatively probing memory will only pass data onto subsequent operations if it's resident in the lowest level data cache, the L1 cache, as we've seen now.
This can allow the data in question to be queried by the malicious application, leading to a side channel that reveals supervisor data. Wait, I'm not sure that's correct. For me, it also works on the level-three cache. The last-level cache. Yeah, but it works. I implemented that. Did you try that? And it's not as fast anymore.
Yeah, it's just around 10 kilobytes per second. The error rate is 10 times as high as before, but still it works. So I removed it from the L1 cache and just have it in the L3 cache, my secret X again in kernel memory. And when I try to leak it, I get the X, and I get some P, Q, P, X. Look, there's the X, but also other letters. There is some X in there and a bit of error.
There are more Xs than other letters, but still. Yeah, but I can't see the secret. But how can you get rid of that? So if you read a P? Yeah, I don't know. How can I get rid of the noise? So I need to get rid of the noise. I can't hear anything. Noise canceling headphones to get rid of the noise.
Yes, yes, yes. No, you just throw statistics on this. That's basically the message here. Just throw statistics on that, and it will be fine. Makes sense. And if I think about what happened last year: we presented the Meltdown attack at Black Hat, and then we had one slide because we
did one additional experiment because we said L1 is not a requirement. Because we can use uncacheable memory, where we mark pages as uncacheable in the page tables so the CPU is not allowed to load them into the cache. Wait, but if I do that, it doesn't work. So if I remove it from the L3 as well,
and I only have the DRAM, now my secret X, and I try that, I don't get it at all. I just get random noise here. A lot of noise. Did you read the slide? No, it just said something about not in the cache. Yeah, but there was more on the slide. Oh, I always stop reading. It can leak from that, but only if we have a legitimate access on the sibling hyper-thread.
So this is a legit access to this memory location that you try to leak. Did you try it that way? So you mean I have to leak it, and in the meantime, have a legitimate access from somewhere else? Yes, then you can just grab it from the other one. Huh, that works. I told you. I really should continue reading after the first point.
Maybe that helps. So OK, there is some noise in there, but yeah, that works. And if some people remember what we wrote in the paper back then, which I want to quote, we suspect that Meltdown reads the value from the line fill buffers. As the fill buffers are shared between threads running
on the same core, the read to the same address within the Meltdown attack could be served from one of the fill buffers, allowing the attack to succeed. However, we leave further investigations on this matter open for future work. Oh, I don't like this sentence. You always leave the stuff you don't want to do for future
you. Yeah, fuck future you. Yeah, but I can understand that at this point we had some kind of mental resource exhaustion already, but all this new stuff there. OK, so maybe back to the technical details, right? We want to understand why this works, right? And if we look at this diagram again, it pretty much is the same as before.
We have our load operation. It goes through the reorder buffer, through the scheduler, to the load data execution port, and then has an entry in the load buffer. And there we will still update the same entries. Everything's the same so far. But now we know that it is not in the L1 data cache. So even if we do the look up there, we are sure that we won't find it there. But there are other locations where we can still get it from.
And that's why Meltdown uncacheable works. It just gets it from a different buffer. Yeah, what else could we do with this? I mean, future work should probably investigate that. Future work, of course. Yes, yes. Sure. I mean, at some point, you're at this point
where future you becomes present you, and you actually have to do the stuff you left as future work. So yes, at some point we arrived at this point where we said, OK, we have to do this future work here. Yes. And maybe also here is a good point. During all these works that we published here in this area,
Meltdown, Specter, Zombieload, what we learned was that actually there is no noise. And this has become pretty much a mantra in our group. Every time someone says, oh, there's a lot of noise in this experiment, there is no noise. Noise is just someone else's data.
So what you say is we should analyze the noise, right? Oh, yeah. Because maybe it's something interesting. So maybe we do it in a scientific, mathematical way, like this lemma here: noise is someone else's data. And we take the limes, the limit, here of Meltdown, because if you have Meltdown plus this noise
and we let the Meltdown attack go to nothing, then we are left with the noise, right? I don't think this is an appropriate use of the limes. I don't think it works. Oh, it looks science-y. Yes, it does. But no. So from the deep dive, what Intel states is: fill buffers may retain stale data from prior memory
requests until a new memory request overrides the fill buffer, like Daniel showed in the animation. Under certain conditions, the fill buffer may speculatively forward data, including stale data. So under certain conditions, we can read what some other
instruction or program read before, due to a load operation that will cause a fault or assist. So we just need a load operation that faults. And with that, we can leak data. Wait, assist, what is that? That sounds confusing.
Let's look at that with an experiment, right? We are scientists. So let's look at a simple page here. And this page contains cache lines, as you explained before. And then we have some virtual mapping to this page. And if you remember Meltdown, as we had before, then we have this faulting load on this mapping because it was a kernel address.
It faulted there. And it was like this scenario of Meltdown. But now we need some complex situation or something. So let's map this physical page again with a different virtual address, so with a different mapping. And then we do something complicated for the CPU. So we have one access that's faulting.
And we have a different access in parallel to the same cache line that removes it from the cache, the very thing we want to access. What would the cache do then? Would it return? It might run out of resources there. It's like, oh, super confusing. So that's a certain condition, I would say. OK, so maybe we should also look at that ZombieLoad case
in more detail in the microarchitecture again. And in the microarchitecture, we start again with the same single instruction. It's all the same. The difference between these attacks lies in the setup of the microarchitecture, not in this specific instruction that is executed.
And what we see here is that we again go through the same path. And this time, the load buffer entry is again updated. And again, this part is not used for the lookup in the L1, the store buffer, and the line fill buffer. The lookup happens. But here now, there is a complex load situation, as Michael just described.
So the processor realizes, oh, I'm not sure how to resolve that, right? And says, I will stop this immediately. And now we have an interesting problem here. Because what happens? The execution port still has to do something. It still has to finish something. And it will finish as early as possible. And now, I mean, we have a PPN.
We have a cache line that matches. So why not return this one? And then we can just read any data that matches in the lowest few bits. Very nice. So this is basically use after free in the load buffer. So it's a software problem in the hardware now. Great thing.
But how do we then get the data out of that? I mean, it still dies, right? Yeah, but it's the same thing as in Meltdown. So instead of accessing the kernel address, we just have a faulting load with a complex load situation. It's the same thing. And then again, we encode the value in the cache,
use Flush+Reload to look it up, and then we know exactly what was written there. OK, so I can do that. So I can really build that. It's not only theoretical. I can get to this complex situation here, actually, in software. So if I look at my application, I have the virtual address space, split into user space and kernel space. If I allocate some physical page in physical memory,
I get a mapping in user space. And then I need a second mapping. How do I get that? It's a nice thing, really convenient. The kernel maps the entire physical memory as well in the direct physical map. And so for every physical page I have, there's also a kernel page that maps to this physical page.
So I have this situation as before here. So the physical memory and the virtual memory are not the same size, then? No, of course not. Virtual memory is a lot larger than that. But with that, I have one physical page mapped with an accessible page and mapped with an address I
cannot access. That's one of the variants; variant 1 was the easiest to come up with. I also have another variant, variant 3. So I have this physical memory. I can map a page in user space, a simple allocated page. And then I use shared memory. If I have shared memory with myself,
I share the space with myself, I have two addresses to the same page. Wait, wait, wait, shared memory? That shouldn't fault. Yes, that's correct. But it still works; there's a nice trick with that. So of course, I can access that. It's my shared memory. I set it up. But there's something really interesting in the CPU,
the so-called microcode assists. So if you have the instruction stream that comes in, it has to be decoded. We have a decoder that can decode a lot of things to micro ops. And these micro ops then go to the mux somewhere and to the back end. And we had that before.
I listened to what you said. So yes, we have that decoder going on, back-end scheduler, blah. But sometimes, there's something complicated. So maybe the decoder can't decode something because it's really complex. And it needs some assistance for that, a microcode assist. And it goes to this microcode ROM,
which stores software sequences that can handle certain things in the CPU. And this microcode ROM then emits the micro-ops that are used in the back end. Huh, so that was not in my figure. No, these are interesting, complicated things here. So this is for really rare cases. So that shouldn't happen a lot of the time,
because this is really expensive. It has to go to the ROM and insert micro-ops into the scheduler. So it's really complicated. It's a kind of fault in the microarchitecture, a microarchitectural fault. This happens, for example, in some cases. But one of the examples is when setting the accessed or the dirty bit in a page table entry.
So when I first access a page, this microarchitectural fault happens. It needs an assist. And if we do that the first time, then it's a fault. And a nice thing: on Windows, the accessed bit is regularly reset. So we always get such a fault every few seconds.
All this stuff about the zombie load attack, I think we also want to think about something else here. Because for Spectre, there was a movie and a song. For Meltdown, there was a movie. No, no, no, no. Come on, Daniel. There's no zombie load song. Just a few seconds, maybe. Everyone knows that.
There's a zombie loading there. That's the original. That's the original, yes. It's completely unmodified.
I'm sure this is the original.
I got this from the internet. OK, OK. We're doing a talk here.
We can continue playing it if you like it. Maybe later. We need to discuss some things. So what can we actually attack with zombie load? So what we know is we can leak data on the same and from the sibling hyper-thread. So what we can do is we can attack different applications
running on the system. We can attack the operating system. We can attack SGX enclaves. We can attack virtual machines. We can also attack the hypervisor running on the system from within a virtual machine. So it's really powerful, but we still have a problem there. So for Meltdown, it was really easy. You provide the entire virtual address,
leak the data from there. For Foreshadow, you can provide the physical address. You leak the data from there. Fallout, different attack. You can at least specify the page offset. But for zombie load, you can only specify like a few bits here in the cache line what to leak. So there's not any control there, right?
No, you can't really mount an attack with that. That's it, yeah. So we end here. It's impossible. No, it's not impossible. So what we can do is the so-called domino attack. What we do is: we read one byte, and then we use the least significant four bits
as a mask and match them against the next value that we are going to read. And if they overlap and are the same, we know that this second byte belongs to the first byte. And we can continue and continue and read many, many bytes following after each other. So despite you saying we have no control,
we have pretty much control. Oh, that's nice. So I really implemented that. It's demo time. I hope it works. Let's see. So I need a credit card pin from someone. We don't see anything yet. I know, I know. Oh, no, the O.
Oh, no, what is my password? Oh, it's secure, right? Ah, yeah. No one tries a one-letter password. OK, so where is my? I have here this easy passcode.
What is it? It stores all my secure passwords in there. OK, and you use a pin for that? Yes, my credit card pin. Anyone wants to give me that four-digit credit card pin? I can try to leak that here. Yeah, yeah. Oh, no, that's boring. No one has 1234 as a credit card pin, I hope.
And it runs inside a virtual machine without internet, so nothing can leak here. A different code. It looks staged if we do that. Anyone else? 1337. Let's see. Well, you can do multiple numbers.
Three, seven. Nice. Live leakage, although it's in the VM, without any internet connection, without anything, just ZombieLoad leaking the things I input inside my virtual machine from the outside. If you do that again with a different number? Yeah, because no one believes that, right? Yeah. OK.
Let's see. Different number or no? Anyone? 1280. 1280. Yeah, yeah, yeah. I can actually steal data with it. Nice.
So the question is what else can we do with that? Can we do something else? I don't know. Did you prepare any other demos? I need to find the slides again. You go back to the slides. Ah, there. Oh, so only this one demo. Oh, that's sad.
Wait a second. I find this very odd, right? There's variant one and three. Isn't that odd? No, we use the trinary system now to count. Trinary system. OK, whatever. No, we shouldn't skip here. So we have different attacker models. On the one hand, with variant one, as a privileged attacker,
where we have the kernel address and stuff like this, we can do this on Windows and on Linux. For the microcode assist of variant 3, we can also do that as an unprivileged attacker on Windows, because Windows regularly clears the accessed bit in the page table. So it's cross-platform, that's nice. OK, how fast is it? It's 5.3 kilobytes per second for variant 1,
and for variant 3, 7.7. That's not so impressive. I mean, if I want to load a logo on a website and everything, this won't... We need to get better than that. But it's a bit bad, right? We should still mitigate that, right? Yeah, yeah, yeah. So one thing we can do is disable hyper-threading. Yeah, no, not practical.
So we can disable that. Group scheduling is more realistic. Yeah, but this is so hard to implement. We can also overwrite the microarchitectural buffers, so that if the data is not there anymore, we can't leak it. There's the VERW instruction, which was updated to overwrite all the buffers, which just costs a bit of performance.
There are also software sequences that can evict all the buffers so there's no data there anymore. Which is quite odd because the software shouldn't see the buffers. OK, then we buy new CPUs. Learn new CPUs, which are not affected anymore. That's a good thing. So 8th, 9th generation, like the Coffee Lake, and then the Cascade Lake. So it says on the website, like,
it fixes Meltdown, Foreshadow, RIDL, Fallout, MLPDS, MDSUM. So all these attacks are listed there. Ah, you copied this from the website? Yeah, it's from the website. Why are there three question marks? There's no ZombieLoad in there. I don't know. So it didn't say anything about ZombieLoad on the website. Maybe it's fake. Ah, we'll see.
We'll see. OK. So if we go back to the timeline, we have been working on attacks in this direction already in 2016. And the Kaiser patch was actually a mitigation for a related attack. And we published this on May the 4th.
And, yeah. And in June, Jann Horn reported the Meltdown attack. And later that year, we also independently reported the Meltdown attack. Much later, though. Yes. So on February 15, we reported Meltdown-Uncacheable, because Intel said, no, you can only leak from L1.
And we said, no, you can not only leak from L1. So we implemented this proof of concept. Yeah, we had quite some emails exchanged around that time. We made it nicer, then sent it again on March 28th. It was also difficult to convince our co-authors, actually, that we can leak data that is not in the L1 cache. But finally, before the paper was submitted,
actually, we were able to convince them with a convincing PoC. And we also explained the line fill buffer leakage on May 30th. We then reported ZombieLoad on April 12th, 2019. ZombieLoad went public shortly afterwards, because it was already
under embargo for a long time. How convenient that on the same day there's this: new CPUs announced. So I bought a new CPU because I wanted to be safe. So everything is fine. Well... It's still fine. No, well...
I'm fine. Everything is fine. Are you sure? I'm not sure if everything is fine. Maybe we have a problem. Ah, maybe. Which ZombieLoad variants work despite MDS mitigations? Variant 1. Variant 3. Variant 2.
Or none of them. I want to use a joker. You don't have any jokers. So, Daniel, trinary was fake. There's no such thing. None. None. No. I will go with variant 2. It's the last question.
And... This is variant 2. Wait a second. You told me that there is no variant 2. Yeah, that was a joke. You really bought the trinary system? It's not even a word. OK, I'm a bit confused. There actually is a variant 2. So we count in normal numbers like everyone else.
And if we go back to this, we have this Meltdown setup, and then we have this certain-condition setup with the double mapping of one page. But this is so complex. Yes, it was too complex for you. So you simplified that. I didn't understand it when I came back from holiday. That's no joke. So you suppressed all the exceptions
with TSX transactions, so you don't see any exception there. And then you decided to say: oh, you have two mappings to one page. Why do I need a kernel mapping? I mean, it's the same physical address, so I can just use one address. Yeah, you use the same address here. And then I wrote that in four lines.
And it works. And it works. Well, that's bad. You use the transaction abort here. What can cause an abort in TSX? Many different things: data conflicts; resource exhaustion again, if you use too much data there; certain instructions like I/O and syscalls; and synchronous exceptions can also abort a transaction there.
Yeah, and Intel also gave out a statement that asynchronous events which occur during a transactional execution and lead to a transactional abort, for instance an interrupt, might be
a problem. So what is really happening? Because in the code, we just access one address we're allowed to access, and then we end the transaction. So what we do is: we start a transaction, we want to load our first address, which is our mapped address. This will be executed, and the value that we read from there we pass to our oracle to load it
into the cache. So this is executed; if it returns the value, we access the address in the cache, and the transaction ends. And everything is fine. So why does this leak? Like Daniel said, with an asynchronous abort, which we do not cause by our own code within our transaction,
something can go wrong. So in this case, when we start loading this address, and while this is still happening, at some point in time an interrupt can occur, like an NMI. And when this happens, this transaction has to be aborted. And now the load, the load execution, also needs to
be aborted, and it picks up a stale value, from the line fill buffer or the load ports for instance, and leaks that, which we then can recover. But this is a bit slow, because we need to wait for an NMI to occur, hitting the load execution at the right time. So what we now do is, as in the previous
variants, we use the flush instruction, because with it we induce a conflict on the cache line. So what is happening now? We dispatch the flush instruction, we start our transaction, we start our load and execute it. This induces a complex situation,
which causes the transaction to abort, allowing our load, which is now aborting, to leak to our oracle so we can recover our data. And this is really nice, because this variant 2 now only relies on TSX. No complicated setup, nothing anymore; as long as you have TSX, you can leak data. OK, but how fast is this? Is this
now better? Yes this is really nice because now this is really fast you get up to 40 kilobytes per second. That's already a lot faster. We can really use that to spy on something. Wait a second if it's that fast could you leak something like something with a higher frequency with like a song? A song yes. Maybe we can leak
a song. You didn't like the song though right? No no I made it a bit faster. Sure. We can't see it though. I know. It's just a song.
Come on. Sounds better. No, this does not sound... no. And what do you want to do with that now? You want to leak this? Yes. I'm going to do that with a muted player. OK, with a muted player. With a muted player, and then I run ZombieLoad at the
same time. You still can't see anything. Yeah, I know. And then it should be able to pick up all the things I play. OK, and then we can play that live. I mean, this is... you said there's a lot of noise for this attack, right?
So it will be very noisy then. So I can play here. And I can leak here. Let's see if it might be a bit noisy. Let's see if it works.
It sounds a bit like a metal version of it. But you can imagine.
I think we can sell this as the ZombieLoad filter. Yes. It's really good, but imagine if you spy on a Skype call like that; you can still understand it, it works. So for the timeline: we reported ZombieLoad on April 12th, and then on April 24 we reported variant 2. And we showed that it
works on a new CPU that shouldn't be vulnerable anymore. Yes. That was just before the embargo ended. That was fun. And then we had an emergency call. Another embargo and a disclosure. Without this variant, of course. Yes, which was quite funny, because we had these ifdefs in the TeX code of the
paper and just removed this variant. On the same day when ZombieLoad was disclosed, the new MDS-resistant CPUs came out, so you could actually buy them. Yes. We also reported on May 16 that the VERW and software sequences are insufficient. There's still some remaining leakage. It still makes
attacks a lot harder. But yes, Intel also documented this. Only last month, variant 2 was disclosed. We had the public disclosure. I always wanted to be on a movie poster. Where am I, actually? Here.
So the process with Intel improved quite a lot over the last year. They invested a lot of effort into improving their processes. And I think by now I'm really happy to work with them. I think they are also quite happy because they sent us a beer and we were so happy about that
and excited, and we didn't have time until last week, and then we finally had the beer, and that was also very nice. Wait a minute. So the TAA attack, variant 2, is just TSX as a leak gadget? Yes. Like I said earlier,
when you go back one year, we had some slides at Black Hat again where we had this code, and if I look at this code, it looks the same as before. Just without the flush, so if I just wait, I leak. This is basically just our code from GitHub. We had this on GitHub. And on the slides for one year.
And it was described in the Meltdown paper. Not good. No one tries our PoCs on GitHub. Maybe you should also fix that. It's really easy, right? What about the mitigations? If you don't have TSX anymore, then you can't abuse TSX.
So super easy fix, right? You're kidding. No, actually that's one of the mitigations. You can just disable Intel TSX and that's the default after the latest microcode update. When you try to run the attack again it doesn't work and then you have to figure out why. It's the same performance penalty. But on the other hand we also have
VERW to overwrite the affected buffers, as before. But unfortunately, they do not work reliably. Neither do the software sequences. So under certain conditions, you can still get leakage despite having these mitigations. Did we get any insights from that?
I mean, so for ZombieLoad, it again falls under this category of transient execution attacks. It's a Meltdown-type attack. It uses faults there. You can classify the different variants by the fault they use. So we have the page fault for variant 1. We have the microarchitectural faults for variant 2 and variant 3.
One is the TAA, the TSX abort, which is not an architecturally visible fault but a microarchitectural fault; the other is the microcode assist for the accessed and dirty bits. And as we've presented last year, we've put this up on a website, transient.fail. So you can play around with that, see what kind of attacks have been explored already.
Yes, another insight is here. So we had these memory-based side-channel attacks for quite some years now, where we look at addresses and then see whether an address is accessed or not. We can infer the instruction pointer. Then we had this Meltdown attack, where we had an address and we actually got the data from this address. It was completely new. And this looked a bit different.
And now with this data sampling, with ZombieLoad here, we have the missing link between them. Because now we know: when we have this certain instruction pointer, then we get the data. So we can't specify the address of the data we want to leak, but at a certain instruction pointer, we simply get the data. And we've seen it's a nice triangle that combines all these things
and gives us more powerful primitives. So what are the lessons that we've learned? So when Meltan and Spectre came out for us it was like Spectre is here to stay that's a long problem we have to take care of and for Meltan everything is fixed. But by now we've seen much more Meltan-type attacks than Spectre-type attacks.
Yeah, so we were wrong in that assessment. If you want to play around with that, so everything is also on GitHub, all the variants, so you can try yourself to see if you can reproduce that and build your own nice music filters or stuff like that. And also in 2019 there were other papers in the same space there was the Fallout paper and the Riddle paper
which also presented attacks in this area. So to conclude our talk, transient execution attacks are now the gift that keeps on giving. Yes, and as we have seen, this class of Meltdown attacks is a lot larger than previously expected. So we thought it's only one, but we now have several Meltdown-type attacks
that we know. There might be more. Yes, and CPUs are deterministic. Largely. There is no noise. If you see noise then usually it means it's data from somebody else. And now, do we still have time for the remaining part of the song?
Thank you very much.
We have some time left for questions so please line up at the microphones if you have questions.
We have questions from the internet, so if you have any questions, that's really nice. Signal angel, please. Can users recognize attacks with power monitoring tools, CPU frequencies, memory IOPS, or other freely accessible tools? I don't think there is a tool tailored to detect those attacks.
Certainly, with the current PoCs that we have, you would see significant CPU utilization and probably also a lot of memory traffic. Other than that, there are no dedicated tools so far. But also, I think it's better to just patch these
vulnerabilities than to try to detect them. Thank you. Microphone 4 please. In the timeline we saw that you reported variant 2 at the very end. You already had variant 1 and 3 so why the bizarre numbering? So we actually had variant 2
right in the beginning when we reported it, but we only discovered shortly before the embargo ended that it actually behaves a bit differently. So there are two key moments. In April we reported variant 1 and 3, and then two weeks later, on April 24,
we reported variant 2. But for Cascade Lake we really wanted to buy a CPU but university budget is limited so a few days before the embargo ended I ordered one online to test it. Also the CPU wasn't available before that. So that was apparently an accident
of the cloud provider. We suspect that we should not have been able to actually buy one before May 14 because that was the announcement of the CPU. When we were able to mount the attack on Cascade Lake which they assumed is not affected by MDS-type attacks
things got busy again because now we have an embargo ending in four days and there's a new variant that still is capable of leaking data on the newest CPUs that they have. So previously none of the POCs showed that there is a difference in the microarchitectural behavior between those
variants. So that the TSX transaction, the transactional abort, the asynchronous abort, behaves differently only became known at that point. OK, question answered, or do you still have one more? OK, thank you. We have more questions from the signal angel, or somebody lining up at microphone 1, please.
May we ask do you have any other embargo going on right now? I don't know. Alright, so I don't see any other people any other
guys lining up at the microphone so thanks again a warm round of applause for those three.