Exploiting Out-of-Order-Execution

Video in TIB AV-Portal: Exploiting Out-of-Order-Execution

Formal Metadata

Title
Exploiting Out-of-Order-Execution
Subtitle
Processor Side Channels to Enable Cross VM Code Execution
Part Number
6
Number of Parts
18
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
2015
Language
English

Content Metadata

Abstract
Given the rise in popularity of cloud computing and platform-as-a-service, vulnerabilities inherent to systems which share hardware resources will become increasingly attractive targets to malicious software authors. This talk first presents a classification of the possible cloud-based side channels which use hardware virtualization. Additionally, a novel side channel exploiting out-of-order-execution in the CPU pipeline is described and implemented. Finally, this talk will show constructions of several adversarial applications and demo two. These applications are deployed across the novel side channel to prove the viability of each exploit. We then analyze successful detection and mitigation techniques of the side channel attacks.
This year, my master's research at RPI focused on exploiting out-of-order execution and using it to try to enable cross-VM code execution. It all begins in the cloud. Everyone here is probably pretty familiar with how the cloud is structured, but just to go over some of the basics: you have a bunch of virtual instances, or virtual machines, running on shared hardware. The shared physical resources are allocated by the hypervisor to the different guest operating systems, and that allocation is dynamic. It happens over time, so it is always changing, which reduces costs for everyone, and that means everyone's happy.
So there are a few problems with how this situation is set up. First of all, your data is stored remotely, so it might not be secure or private. The host you're sharing your data with may be vulnerable or untrustworthy. And finally, the one people often talk about: your VMs, which are running your processes and holding your data, are co-located with foreign virtual machines. You don't know who they are or what they're doing, and they're all sharing the same physical resources. It's this physical co-location that leads to side-channel vulnerabilities.
Again, here's the basic hardware structure of the cloud. The hypervisor layer in the middle takes the shared physical hardware and dynamically allocates it to the different virtual machines, so each virtual machine sees its own virtual allocation of that shared hardware. The vulnerability lies in that translation between the physical and virtual hardware. Because the allocation happens over time and is based on the needs of each virtual machine, one virtual machine can cause contention with another for the same resource. That means your activities, even if all they do is tell someone else that you need physical resource Y, are not opaque to someone else on the same hardware.
So how can we exploit this? We can do something like a side-channel attack in cryptography, where you gain information from the surrounding system that's implementing the crypto scheme rather than from the program that's encrypting something. What we're doing is quite similar: it's a hardware-based side channel, which means the physical environment surrounding the virtual machines is what's leaking information. Even though a VM is like a black box to other VMs (they can't query inside it, and they can't directly access anything inside it), they can learn about the surrounding environment the virtual machines are running on. This does mean the information must be recordable, and repeatedly recordable: you have to be able to reliably learn information from the environment and map it onto the same known processes, or the same known information, with a certain degree of certainty.
So you can structure a basic send-and-receive model of this hardware-based side channel. It's hardware-agnostic: you have one transmitter, which is forcing artifacts into the environment, knowingly or unknowingly, and that shared hardware is being monitored by the receiver, or the adversary, to learn information about the transmitter.
There are different ways this can actually be used. If you just have a receiver, listening to the benign noise in the environment, you can do things like leaking which processes are running on other virtual machines, or fingerprinting the environment you're running in. You can create a unique signature from the environment, for instance from the average usage of the cache, and use it to identify the specific server you're running on in the cloud, that is, to determine which physical resource you're running on. At the other end of the spectrum, if you're just running a transmitter, only forcing artifacts into the shared hardware, you could do something like a denial-of-service attack, where you clog the pipeline or the cache so other processes that need it can't use it. This can be pretty basic, just forcing someone else off the cache, but it can actually alter the execution and results of other processes. If you mix these two together, you have something more like a communication network: someone is knowingly forcing patterns into the environment and someone else is receiving them, so they can send messages back and forth. This is what most people think of when they think of side channels.
This is what it would look like: a simple communication network where several virtual machines force artifacts into the shared hardware medium, and one VM reads those artifacts from the medium and averages them to create meaning from them.
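The receiver-only fingerprinting idea can be sketched as a nearest-signature match. This is a minimal C sketch under our own assumptions, not the speaker's code: a signature is a vector of per-time-frame averages (out-of-order-execution counts or cache usage), a new trace is compared to stored signatures by squared Euclidean distance, and `FRAMES`, `dist2`, and `match_signature` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define FRAMES 4 /* hypothetical number of time frames per trace */

/* Squared Euclidean distance between two per-frame average vectors. */
double dist2(const double *a, const double *b, size_t n) {
    double d = 0.0;
    for (size_t i = 0; i < n; i++) {
        double diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

/* Return the index of the stored signature closest to the observed trace.
 * Assumes nsigs >= 1. */
size_t match_signature(const double sigs[][FRAMES], size_t nsigs,
                       const double *trace) {
    size_t best = 0;
    double best_d = dist2(sigs[0], trace, FRAMES);
    for (size_t s = 1; s < nsigs; s++) {
        double d = dist2(sigs[s], trace, FRAMES);
        if (d < best_d) {
            best_d = d;
            best = s;
        }
    }
    return best;
}
```

For example, with one made-up signature of {12, 15, 11, 14} and another of {30, 28, 33, 29}, an observed trace of {29.0, 27.5, 31.0, 30.0} matches the second one (index 1). A real fingerprint would need many more frames and some distance threshold for "unknown host".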
To bring this into a more concrete example, take the cache. There's an attack called Flush+Reload on the L3 cache. The receiver, or the adversary, which is listening to the environment, flushes a pre-agreed line of the L3 cache and queries it later for information. The victim, the one that's also using the shared resource, accesses that same shared line of the L3 cache. In this case the victim was doing crypto, accessing that same line of L3 cache with its private key, and the adversary was able to leak that private key at the end. There's more about that specific attack on my website; you can read it there.
More specifically, my research involves attacking the pipeline. So how can we use the pipeline the way most people use the cache to create side channels? There are a couple of benefits of the pipeline versus the cache. First, it's quieter: it's much harder to detect that someone is misusing the pipeline as opposed to the cache, simply because the cache is easier to query and interact with. It's also not as affected by noise in the system; it isn't affected by cache misses or other errors the system may have. In a super noisy environment, much like the cloud, where tons of processes are operating normally on the same hardware, the pipeline side channel is actually amplified in strength, which is great.
So how are we actually doing this? How are we targeting the pipeline? Well, first of all, the attack vector: we want this side channel to exploit inherent properties of the hardware medium, things we can be assured are there. This means there are some basic requirements to create this side channel. We have to have shared hardware, and that resource has to be dynamically allocated; both are inherent properties of cloud systems, so that's great. Then we have to be co-located with our victim VMs, or with other collaborating adversaries. That's a little harder to determine, but like I said earlier, it is possible, so we're going to assume it going forward.
Specifically, we chose to target the processor as the medium, and on the processor, the CPU's pipeline. The difficulty with the pipeline is that we need to query these artifacts, the messages we're forcing, dynamically. There's no really easy way to query the pipeline for a specific state, and if you did, you would probably be affecting the pipeline's state. So all we really know about the pipeline is the set of instructions we feed it from our processes, the order of those instructions, and the results of those instructions: we get to know the values the pipeline returns to us. That basically means we can use out-of-order execution. That's the artifact we're going to be forcing, and also recording from the pipeline, to learn the state of the pipeline as well as to learn about other processes sharing the pipeline with us.
This is how it's going to work: we have a bunch of VMs all running on a shared processor, and as you can see here, the processes are all sharing it between two cores. This does assume that SMT is turned on, but in most modern systems that's the case, so it's not a big deal. Finally, one interesting thought here is that your instructions, especially in the cloud on shared hardware, are being executed together with instructions from other processes in foreign VMs that you know nothing about. They're all just being processed in the pipeline in one big pool, as if they were all from one big program, which is kind of scary, because they're supposed to be separate.
So how are we going to receive out-of-order executions from the pipeline? Like all good presentations, we have a picture from the Intel manual. This picture basically shows that we can get cases in the pipeline that are out-of-order executions, so we can get results from the pipeline that are not expected, and that's what we can record.
This is what it looks like, specifically, in the receiver. We have two threads, thread 1 and thread 2, and they both store a value to a specific spot in memory and then load from memory. The key thing here is that each load reads from the opposite location in memory: thread 1 stores 1 to X and then loads Y into r1, while the load from X into r2 happens in the other thread, after it stores 1 to Y. In a perfect world, where both threads run at exactly the same time, you get r1 = r2 = 1. More often than not, though, the threads are not in step, so you can also get a case where the store and load of one thread happen before the store and load of the other; in that case one of r1 and r2 is 1 and the other is 0. Both of these cases are pretty normal, so we're going to ignore them. However, in the final case, the out-of-order execution case, the loads are actually reordered ahead of the stores. Since X and Y were preset to 0, in that scenario we get r1 = r2 = 0, and that's the out-of-order execution case we can count.
So this is the code of our receiver: we iterate through these two threads thousands and thousands of times in certain time frames, and by doing this we get a count of the average number of out-of-order executions received in a specific time frame. That's great for us, because we can take these averages over a specific time frame and learn what an expected average should be and what an anomalous one would be.
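The receiver described above is the classic store/load litmus test. The following is a minimal C11 sketch of it, assuming pthreads and C11 atomics rather than the speaker's actual C/assembly code: two worker threads run one round at a time behind a simple spin barrier, and the function counts rounds in which r1 == r2 == 0. `count_reorderings` and `worker` are hypothetical names. The stores and loads use `memory_order_relaxed`, so the compiler may also reorder them, and nothing here pins the threads to sibling hyperthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Shared locations from the receiver description; reset to 0 every round. */
static atomic_int X, Y;
/* Round counter and completion counter used as a simple start/finish barrier. */
static atomic_int round_no, finished;
/* Per-round results: written by the workers, read by the caller after the barrier. */
static int r1, r2;
static int total_rounds;

static void *worker(void *arg) {
    int id = *(const int *)arg;
    for (int round = 1; round <= total_rounds; round++) {
        /* Wait until the calling thread releases this round. */
        while (atomic_load_explicit(&round_no, memory_order_acquire) < round)
            sched_yield();
        if (id == 0) {
            atomic_store_explicit(&X, 1, memory_order_relaxed);  /* store X = 1 */
            r1 = atomic_load_explicit(&Y, memory_order_relaxed); /* then load Y */
        } else {
            atomic_store_explicit(&Y, 1, memory_order_relaxed);  /* store Y = 1 */
            r2 = atomic_load_explicit(&X, memory_order_relaxed); /* then load X */
        }
        atomic_fetch_add_explicit(&finished, 1, memory_order_release);
    }
    return NULL;
}

/* Run `rounds` rounds and count how often both loads returned 0 (r1 == r2 == 0). */
long count_reorderings(int rounds) {
    pthread_t threads[2];
    int ids[2] = {0, 1};
    long count = 0;

    total_rounds = rounds;
    atomic_store(&round_no, 0);
    atomic_store(&finished, 0);
    for (int i = 0; i < 2; i++)
        pthread_create(&threads[i], NULL, worker, &ids[i]);

    for (int round = 1; round <= rounds; round++) {
        atomic_store_explicit(&X, 0, memory_order_relaxed);
        atomic_store_explicit(&Y, 0, memory_order_relaxed);
        atomic_store_explicit(&finished, 0, memory_order_relaxed);
        atomic_store_explicit(&round_no, round, memory_order_release); /* start round */
        while (atomic_load_explicit(&finished, memory_order_acquire) < 2)
            sched_yield();
        if (r1 == 0 && r2 == 0)
            count++; /* each load ran before the other thread's store was visible */
    }
    for (int i = 0; i < 2; i++)
        pthread_join(threads[i], NULL);
    return count;
}
```

Calling `count_reorderings` repeatedly inside fixed time frames and averaging the results per frame gives the kind of per-frame out-of-order-execution count the talk records. The actual numbers depend on the CPU, the scheduler, and whatever else is running, so no particular value should be expected.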
So the transmitter has to force patterns into this received average out-of-order-execution bitstream that we've now constructed, which means we need the ability to force the out-of-order execution averages to increase or decrease. We're going to force the average out-of-order execution counts to decrease, using memory fences. Everyone has probably heard of mfence, the x86 instruction: it prevents memory reordering of any kind, which is exactly what we want in order to decrease the number of out-of-order executions. It is a more expensive operation, but that's actually great for us.
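To see the effect of a fence on a single pair of threads, here is the same hedged sketch with a full fence between each thread's store and load. It uses C11 `atomic_thread_fence(memory_order_seq_cst)`, which x86-64 compilers emit as a full barrier (typically `mfence`). With that fence in both threads, the C11 memory model rules out r1 == r2 == 0, so the count should always be zero. This only models the fence inside the receiver's own instruction stream; it does not model the cross-VM effect the talk relies on. `count_fenced_reorderings` and `fenced_worker` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int X, Y;               /* shared locations, reset to 0 each round */
static atomic_int round_no, finished; /* start/finish barrier between rounds */
static int r1, r2;                    /* per-round results */
static int total_rounds;

static void *fenced_worker(void *arg) {
    int id = *(const int *)arg;
    atomic_int *mine = id == 0 ? &X : &Y;  /* location this thread stores to */
    atomic_int *other = id == 0 ? &Y : &X; /* location this thread loads from */
    int *result = id == 0 ? &r1 : &r2;
    for (int round = 1; round <= total_rounds; round++) {
        while (atomic_load_explicit(&round_no, memory_order_acquire) < round)
            sched_yield();
        atomic_store_explicit(mine, 1, memory_order_relaxed);
        /* Full (StoreLoad) fence: orders the earlier store before the later load. */
        atomic_thread_fence(memory_order_seq_cst);
        *result = atomic_load_explicit(other, memory_order_relaxed);
        atomic_fetch_add_explicit(&finished, 1, memory_order_release);
    }
    return NULL;
}

/* Same round structure as the unfenced receiver; counts r1 == r2 == 0 rounds. */
long count_fenced_reorderings(int rounds) {
    pthread_t threads[2];
    int ids[2] = {0, 1};
    long count = 0;

    total_rounds = rounds;
    atomic_store(&round_no, 0);
    atomic_store(&finished, 0);
    for (int i = 0; i < 2; i++)
        pthread_create(&threads[i], NULL, fenced_worker, &ids[i]);
    for (int round = 1; round <= rounds; round++) {
        atomic_store_explicit(&X, 0, memory_order_relaxed);
        atomic_store_explicit(&Y, 0, memory_order_relaxed);
        atomic_store_explicit(&finished, 0, memory_order_relaxed);
        atomic_store_explicit(&round_no, round, memory_order_release);
        while (atomic_load_explicit(&finished, memory_order_acquire) < 2)
            sched_yield();
        if (r1 == 0 && r2 == 0)
            count++; /* should never happen with the fences in place */
    }
    for (int i = 0; i < 2; i++)
        pthread_join(threads[i], NULL);
    return count;
}
```

Running this next to the unfenced version shows the contrast the talk describes: the fence removes exactly the r1 == r2 == 0 outcome, at the cost of a more expensive instruction in every round.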
This is what the pipeline would look like: our transmitter is forcing its memory fences into the same pipeline as the receiver. The key thing here is that these mfences are being shoved in during the same time frames that our receiver is recording. In this scenario, the mfence forces the pipeline to store 1 to X and 1 to Y before the loads (these should actually be flipped on the slide), so it forces the proper ordering of our instructions. This brings us to the importance of memory models. There are two different types of memory reordering: compile time and runtime. Obviously, we're focusing on runtime, the out-of-order execution case, where the pipeline is dynamically reordering our instructions.
We're also focusing on a usually strong memory model, the x86 architecture. This basically means that for the most part the pipeline is going to reorder instructions safely: it's going to give us results that are expected and correct. However, "usually strong" doesn't mean always, which means we can still get those incorrect cases of r1 = r2 = 0, so we're exploiting this inherent property of pipeline optimization. Then there are four different types of memory barriers. In the specific case we're focusing on, we want to force the store to occur before the load. Of the four types, unfortunately, the StoreLoad barrier is the most expensive, which must be true for most things in life. To reiterate what we were saying earlier: we want to force out-of-order executions, so we're going to assume SMT is turned on, and we're going to use a StoreLoad barrier, mfence, to prevent that out-of-order execution case and so decrease the average number of out-of-order executions we can read from the pipeline in specific time intervals.
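The four barrier types are usually named LoadLoad, LoadStore, StoreLoad, and StoreStore, after the pair of accesses they keep in order. This small C sketch (the enum and function names are hypothetical) encodes why x86 is "usually strong": under the x86-TSO model, for ordinary memory accesses, only a later load may pass an earlier store to a different address. So the receiver's store-then-load pattern is the only one that can show r1 == r2 == 0, and a StoreLoad barrier such as `mfence` is the one that prevents it.

```c
#include <assert.h>
#include <stdbool.h>

/* The four barrier kinds, named after the earlier/later access pair they keep in order. */
typedef enum { LOAD_LOAD, LOAD_STORE, STORE_LOAD, STORE_STORE } barrier_kind;

/* Classify a pair of accesses (in program order) by the barrier that would order them. */
barrier_kind barrier_for(bool earlier_is_store, bool later_is_store) {
    if (earlier_is_store)
        return later_is_store ? STORE_STORE : STORE_LOAD;
    return later_is_store ? LOAD_STORE : LOAD_LOAD;
}

/* Under x86-TSO, for ordinary (write-back) memory, the hardware already keeps
 * load->load, load->store and store->store order. Only a later load may be
 * satisfied before an earlier store to a different address becomes visible,
 * so StoreLoad is the only ordering that needs an explicit fence (e.g. mfence). */
bool x86_needs_explicit_fence(barrier_kind kind) {
    return kind == STORE_LOAD;
}
```

In the receiver, each thread does "store X, then load Y", which `barrier_for(true, false)` classifies as StoreLoad, the one case where x86 needs a fence and therefore the one case the channel can observe.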
So for our victim, or our transmitter, we're going to force patterns and affect the order of stores and loads, and, like I said earlier, it's time-frame dependent. Now we have the ability to force out-of-order executions in the pipeline as well as to receive them, so we have to design the channel. In our setup, we used the Xen hypervisor, just because it is the most popular commercial platform, with Xeon processors as the shared hardware, specifically four cores and six virtual machines, and obviously SMT was turned on. This is what it would look like: we have six Windows 7 virtual machines all running noisy operations. That was just to create similarities between our lab environment and the real-world case, in which one server might have thousands and thousands of processes on it. Specifically, extending our model to this, we have one VM acting as the sender and one as the receiver, or, in a bi-party communication situation, each VM having both. Because ours is a side channel over the pipeline, these send and receive processes all have to be executed on the same core.
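A transmitter that is time-frame dependent can be sketched as one fixed-length frame per bit. This is a hypothetical C sketch, not the talk's implementation: during a 1 frame it issues fenced store/load pairs in a tight loop, and during a 0 frame it just spins without fences (the talk does not say what the transmitter does in idle frames, so that is our assumption). `transmit_frame`, `transmit_bits`, and the frame length are illustrative; the sketch uses wall-clock time from C11 `timespec_get`, and it does not pin itself to a hyperthread sharing a core with the receiver, which the setup above requires.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

static atomic_int tx_x, tx_y; /* transmitter-side scratch locations (hypothetical) */

/* Wall-clock time in nanoseconds via C11 timespec_get. */
static long long now_ns(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Fill one frame of length frame_ns. For a 1 bit, issue fenced store/load pairs;
 * for a 0 bit, spin without fences (assumption). Returns the number of fences issued. */
long transmit_frame(int bit, long long frame_ns) {
    long fences = 0;
    long long end = now_ns() + frame_ns;
    while (now_ns() < end) {
        if (bit) {
            atomic_store_explicit(&tx_x, 1, memory_order_relaxed);
            atomic_thread_fence(memory_order_seq_cst); /* full fence, typically mfence on x86-64 */
            (void)atomic_load_explicit(&tx_y, memory_order_relaxed);
            fences++;
        }
    }
    return fences;
}

/* Send n bits, one frame per bit; fences_out[i] records the fences issued in frame i. */
void transmit_bits(const int *bits, size_t n, long long frame_ns, long *fences_out) {
    for (size_t i = 0; i < n; i++)
        fences_out[i] = transmit_frame(bits[i], frame_ns);
}
```

Both sides must agree on the frame length and on when the first frame starts; how sender and receiver synchronize is not covered by the talk or by this sketch.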
Right. So to demonstrate this, this is just a Windows 7 VM with XenCenter on it, managing XenServer, and we have our clones here, all just sharing the same hardware. I forgot to say: you can follow along if you want. On my website you can get the sender and receiver Python scripts; these are wrappers around the actual C and assembly code that I used to force and receive out-of-order executions. Python is great here because it lets you easily adjust for noise, different time frames, and things like that during testing.
In this scenario we have just one thing, a receiver, and the receiver is first just receiving the noise from the system, so we're canceling out the noise in the system, and you can see that it's zero here. This is just reading from the system, saying there's nothing being forced, so the count is all zeros. It's plotting this to a graph, which is part of the Python script, and it also writes all the out-of-order execution averages to a file. You can see it here, and you can see where the different time frames are. The differences between the time-frame averages look a little more drastic than they actually are. However, like I said, each system might have a unique signature given the different processes running on it, so you would see different average patterns like that depending on what system you're on.
Now, to show the communication, our sender is going to force out-of-order executions in the same time frames the receiver is sampling. In this scenario it's just going to send two high bits, and those two high bits are going to trigger something in the receiver. This is just a simple example to show that it works; someone with more engineering could create something a bit more elaborate, I guess. So that's what you see on the screen.
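The demo's decoding step, learning what the noise-only average looks like and then spotting frames that depart from it, can be sketched as a simple k-sigma threshold. This is our own hedged C sketch: `estimate_baseline`, `decode_frames`, and the choice of k are hypothetical, and the rule flags deviations in either direction, so it does not depend on whether forced frames raise or lower the count. A real receiver would likely need per-system tuning.

```c
#include <assert.h>
#include <stddef.h>

/* Noise baseline: mean and sample variance of per-frame averages recorded
 * while nothing is being sent. */
typedef struct {
    double mean;
    double var;
} baseline;

/* Estimate the baseline from n noise-only frame averages (n must be > 0). */
baseline estimate_baseline(const double *noise, size_t n) {
    baseline b = {0.0, 0.0};
    for (size_t i = 0; i < n; i++)
        b.mean += noise[i];
    b.mean /= (double)n;
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = noise[i] - b.mean;
        ss += d * d;
    }
    b.var = n > 1 ? ss / (double)(n - 1) : 0.0;
    return b;
}

/* A frame decodes as 1 when its average lies more than k standard deviations from
 * the baseline mean (compared via squares to avoid sqrt), otherwise as 0. */
void decode_frames(const double *frames, size_t n, baseline b, double k, int *bits_out) {
    for (size_t i = 0; i < n; i++) {
        double d = frames[i] - b.mean;
        bits_out[i] = d * d > k * k * b.var;
    }
}
```

For example, with noise frames {10, 12, 11, 9, 10, 12, 11, 9} (mean 10.5) and k = 3, the frame averages {10.0, 25.0, 11.0, 24.0} decode to the bits 0, 1, 0, 1.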
So, in conclusion, and like all good academic papers, we had potential mitigation techniques. The one with the most chance of success was basically this: if your VMs are on separate hardware, you're definitely not going to be affected by this attack. You can also turn off SMT, or you could have a custom hypervisor that actually watches to make sure processes from different virtual machines are not sharing resources at the same time, or, if they are, that they're separated somehow. However, the downside of all of these is losing some of the cloud benefits you get from sharing.
So, in conclusion, the largest part of our contribution was creating the novel side channel over the pipeline. Even though the cache is a bit more popular and the pipeline's state is harder to query, it is possible, and we showed this dynamic method. We then showed the application of this in the cloud, as well as some potential mitigation techniques. I'd like to acknowledge Jeremy Blackthorne from MIT Lincoln Lab for introducing me to this topic, as well as RPISEC and Trail of Bits.
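The custom-hypervisor mitigation mentioned above can be pictured as a scheduling check that never lets two different VMs occupy SMT siblings of the same physical core at the same time. This is a hypothetical C sketch of such a policy check; the `hw_thread` type and `smt_sharing_violation` are illustrative names, not from Xen or any real hypervisor, and a real scheduler would enforce the rule when placing vCPUs rather than only detecting violations afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One hardware thread (logical CPU) and the VM currently scheduled on it. */
typedef struct {
    int physical_core; /* physical core this hardware thread belongs to */
    int vm_id;         /* VM currently running on it, or -1 when idle */
} hw_thread;

/* Return true if two different VMs currently share a physical core via SMT siblings. */
bool smt_sharing_violation(const hw_thread *threads, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (threads[i].vm_id < 0)
            continue; /* idle hardware thread cannot leak anything */
        for (size_t j = i + 1; j < n; j++) {
            if (threads[j].vm_id < 0)
                continue;
            if (threads[i].physical_core == threads[j].physical_core &&
                threads[i].vm_id != threads[j].vm_id)
                return true;
        }
    }
    return false;
}
```

The trade-off the talk points out shows up directly here: every time the check forbids a placement, a hardware thread that could have run another tenant's work sits idle.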
If there are any questions, you can email me, or, if you're really adventurous, you can read my thesis. It's on my website; it's all 120 pages, so good luck. Are there any questions I can take?
Q: What would you define as a noisy environment?
A: When I say noise, I'm basically saying there are processes or activity in the system creating load on the resource you're targeting. In the case of the cache, you have a bunch of other processes shoving values into the cache or using the cache for something.
Q: So in your simulated noisy environment you just had stuff going on, nothing particularly interesting?
A: Exactly, that's the point. You want an environment with enough processes to create some sort of entropy in the signals you're reading, just to see that your noise-canceling algorithms are working, that your receiver is not too delicate, and things like that.
Q: How can you characterize the normal level of noise in a specific environment?
A: What I did in this case was basically take thousands and thousands of recordings from the system and average those together.
Q: Have you tried that on EC2?
A: We're trying that now, actually. The key thing is that you have to know you're co-located if you want some sort of send-and-receive model. However, you can just download my scripts now, run them on EC2, and look at the levels of noise; with enough consistency you can create a unique signature for the box you're running on. The algorithms and averages you use over time would alter the granularity of that signature, but you could do that right now.
Q: What about the stability of the patterns you were observing? Is there a way to characterize that stability? Did you see any specific pattern for a specific system or process?
A: Like I said, you can either take system readings that are unique to the box you're running on, or, and I did see this, patterns associated with different processes. I tested it; you can read about it in my thesis. I did a bunch of different tests and inference tests. If someone is running a bunch of things on their VM, you actually get a unique signature from that. You won't be able to see which YouTube video they're watching, but you can see that someone was watching something.