Virtunoid: Breaking out of KVM (Kernel Virtual Machine)

Video in TIB AV-Portal: Virtunoid: Breaking out of KVM (Kernel Virtual Machine)

Formal Metadata

Virtunoid: Breaking out of KVM (Kernel Virtual Machine)
Alternative Title
Virtualization under attack
Title of Series
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
KVM, the Linux Kernel Virtual Machine, seems destined to become the dominant open-source virtualization solution on Linux. Virtually every major Linux distribution has adopted it as their standard virtualization technology for the future. And yet, to date, remarkably little work has been done on exploiting vulnerabilities to break out of KVM. We're here to fix that. We'll take a high-level look at KVM's architecture, comparing and contrasting with other virtualization systems and describing attack surfaces and possible weaknesses. Using the development of a fully-functioning exploit for a recent KVM vulnerability, we'll describe some of the difficulties involved with breaking out of a VM, as well as some features of KVM that are helpful to an exploit author. Once we've explored the exploit in detail, we'll finish off with a demonstration against a live KVM instance. Nelson Elhage is a kernel hacker for Ksplice, Inc., where he works on providing rebootless security updates for the Linux kernel. In his spare time, he mines for bugs in the Linux kernel and other pieces of open-source systems software.
All right. Hi, everybody. I'm going to be giving a talk entitled "Virtunoid: Breaking out of KVM", the Linux Kernel Virtual Machine. So what exactly is this KVM thing, and why do I care about it? KVM is sort of the new hotness for virtualization on Linux. It's a virtualization system that was developed more or less from scratch to be the official, supported-by-upstream virtualization solution, one that works well with the Linux community. It started later than Xen and everyone else, but it's ramping up speed, and it's become the officially blessed platform for a lot of distributions to do virtualization. So it's an exciting new platform that I think is going to be seeing a lot of attention in the security space soon, and so I decided to take a look at it, come up with some conclusions, and share my results here.

So who am I, and why am I talking at you? My day job actually has absolutely nothing to do with this. I'm a kernel engineer at a company called Ksplice, but it does mean I spend a lot of time staring at low-level software systems, and so in my spare time I tend to do a lot of low-level security stuff, including this work on KVM.

All right, so, the structure of this talk. We're going to start by taking a high-level look at KVM, this Linux virtual machine. We'll look at what the different pieces are and how they fit together. Then we'll go back and focus in on each of them from an attack-surface perspective: for an attacker trying to break out of a KVM virtual machine, what's interesting about each of the components, how are they exposed to the attacker, and what kinds of things should we be thinking about and looking for? Then we'll go into a deep technical dive into the guts of actually writing a breakout exploit against KVM, learning in the process about some useful features of the KVM architecture that come in handy and that we can exploit in creative ways. We'll start by
just looking at the bug that my exploit centers around, and then we'll talk about the actual exploit that I wrote. Then I'll discuss some conclusions that I think are implied or suggested by my work, and some future directions that I hope to proceed in or that I would love to see other people do work in. And then we'll demo the exploit, because no talk on exploitation is complete without an on-stage demo that can potentially screw up hilariously. It's been reliable so far.

All right, so, KVM: how does it work? There are three main components to the KVM virtual machine. There's kvm.ko, the core kernel module. Then there's a pair of kernel modules for Intel and AMD hardware support. And then there's the user-space driver program, qemu-kvm.

Looking a little bit at the function of each of those and where they sit: kvm.ko is the core of the kernel-side support for KVM. It's responsible for emulating and tracking the virtual CPU and memory management unit, the core of the virtual machine, using AMD's and Intel's hardware virtualization extensions. It emulates a number of devices and input/output operations in the kernel directly for efficiency, but for the most part it doesn't deal with emulating hardware. It also provides a large interface for a user-space driver program like qemu-kvm, which we'll talk about in a second, to communicate with it: to allocate new virtual machines, allocate new virtual CPUs on those machines, set up the emulated physical memory, and all of that.

One of my favorite bits of trivia about kvm.ko is that, despite the fact that we're using hardware virtualization extensions here, which means that for the most part the virtual machine is just executed directly by the processor in a virtual-machine non-root context, as Intel calls it, KVM still contains an entire x86 emulator in the kernel module that's used for handling certain rare traps. And just from a code complexity and attack
surface perspective, it's an interesting thing to know.

kvm-intel and kvm-amd are the other half, really less than half, of the kernel component here. They just provide the glue code for communicating with the hardware virtualization extensions. More or less, you take those four chapters or whatever from the Intel manuals and implement the software half of that, and you get these. They are relatively small: they're one C file of four thousand lines or so each, which compared to any of the other components is tiny.

And then finally we come to the user-space component, qemu-kvm, which is the direct user interface and user-space driver for KVM VMs. It's based on the classic QEMU emulator, which I'm sure almost all of you have heard of and probably even used, because it's amazingly useful and handy. In addition to providing the interface and driver loop, qemu-kvm implements basically all of the virtual devices that your VM talks to. A virtual machine, much like a real machine, is not just a CPU talking to a block of memory. In order for it to be useful there need to be peripheral devices: PCI buses that have all kinds of devices hanging off them, your Ethernet card, your display, a serial console, a timer device of some sort. All of that lives in qemu-kvm. It's responsible for emulating those devices, and devices are complicated, it turns out. As a result it contains a huge quantity of code, an order of magnitude more code than kvm.ko, even if you only consider the devices that are actually in use by a typical VM. If you consider all of the possible devices, it's even bigger.

There's currently a project underway by the upstream Linux community to potentially replace this user-space component with a separate KVM binary, developed more or less from scratch, that would be maintained in the Linux kernel tree. But that work is relatively new, not more than a year or two old, and it's going to be some years before it's stable enough that people are
seriously using it, and then some further years before distributions are actually shipping it and lots of people are using it in production. So if we're thinking about attacking or defending KVM for the near future, qemu-kvm is what we need to be talking about.

All right, so those are the three main components. Now we can go back and take a look at each one with an eye for attack surface. kvm.ko, for an attacker, is a very tempting target because it runs in kernel mode on the host, in ring 0. If I successfully find and exploit a bug here, I have ultimate privileges on the host, with no further exploitation or privilege escalation needed. However, despite being a tempting target, it's also a bit of a tough target, because there's not very much code there compared to the user-space component, and much of what code there is is dedicated to interfacing with the user-space component and so is not directly available for attack by a guest.

That x86 emulator that I mentioned is definitely an interesting target, because there's a lot of code there that is rarely exercised, since the emulator is only used in some edge cases. Whenever I'm auditing a system, lots of subtle code that's rarely used is the first place that I look for bugs. There have been a number of interesting bugs there that allowed privilege escalation within a guest because of bugs in the x86 emulator, so there is a bit of a track record. Those aren't as interesting as breakout bugs, but again it's a hint that this is an interesting place to look. It's unfortunately not the focus of this talk, but it's something that I want to highlight because I think future research should take a close look here.

In addition to privilege escalation within the guest, if we're talking about kvm.ko we should keep in mind that there's also the possibility of privilege escalation in the host, because unprivileged users can communicate with this privileged kernel module. That's just
another thing to note if we're looking at the whole scope of KVM exploitation.

kvm-intel.ko and kvm-amd.ko, the kernel modules that interface with the hardware, are not super interesting targets, because there isn't a ton of code there. It's mostly straight-line code that just translates between the kernel's data structures that represent the VM and how the hardware wants to model those, and bridges back and forth. But there's a lot of subtlety and complexity in interacting with these hardware components, and so there's potentially scope for some interesting bugs where KVM is using the hardware support slightly incorrectly, or in an unusual way that allows for interesting attacks. But it's not the first place I'd look.
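For a sense of what the host-side interface to kvm.ko mentioned earlier looks like, here is a minimal sketch in Python. It computes two of KVM's ioctl request numbers and, if `/dev/kvm` is present and accessible, asks the module for its API version. The constants follow the `_IO` encoding used on x86 Linux with KVM's magic byte 0xAE from `linux/kvm.h`; the helper names `_io` and `kvm_api_version` are made up for this illustration and are not part of any library.

```python
import fcntl
import os

KVMIO = 0xAE  # ioctl "magic" byte used by KVM (from linux/kvm.h)


def _io(magic, nr):
    """Encode a no-argument ioctl number like the kernel's _IO() macro.

    Assumes the x86 encoding, where direction and size bits are zero.
    """
    return (magic << 8) | nr


KVM_GET_API_VERSION = _io(KVMIO, 0x00)
KVM_CREATE_VM = _io(KVMIO, 0x01)


def kvm_api_version(path="/dev/kvm"):
    """Return the KVM API version, or None if the device is unavailable."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        return None
    try:
        # With an integer argument, fcntl.ioctl returns the ioctl's result.
        return fcntl.ioctl(fd, KVM_GET_API_VERSION, 0)
    except OSError:
        return None
    finally:
        os.close(fd)


if __name__ == "__main__":
    print("KVM_CREATE_VM request number:", hex(KVM_CREATE_VM))
    print("KVM API version:", kvm_api_version())
```

Every VM and virtual CPU that qemu-kvm creates goes through further ioctls on file descriptors obtained this way, which is why this interface counts as host-side attack surface for any user allowed to open the device.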
Then finally we come again to qemu-kvm, which, as you can probably guess by now, is the easiest place to look for targets. Here we have hundreds of thousands of lines of code emulating devices. The devices are talking directly to the guest via emulated memory-mapped I/O or I/O ports, so they're parsing control structures that the guest is exporting and speaking these strange, arcane hardware protocols. Lots of interesting code. And much of this code comes straight from QEMU, which was mostly written by one guy, Fabrice Bellard, who is an absolute genius, a brilliant programmer. But with just one guy spewing out code over years with no one auditing it, bugs are going to happen.

The one unfortunate thing about this for attackers is that qemu-kvm is often sandboxed using SELinux or AppArmor or some other technology. So if we do successfully break out from KVM into qemu-kvm and get code execution in that process, we'll probably need another privilege escalation attack to get full privileges on the host. Fortunately we're all running Linux, which as any Slashdot reader knows has no bugs whatsoever, so we should be safe.

All right, so that's the structure of KVM and a bit of a look at the attack surface. Now we're going to dive in on the bug that I use, which is in fact a bug in qemu-kvm, in that user-space driver program. I've got here the text from the Red Hat security advisory. The bug is CVE-2011-1751. All the major distributions have patches out now, so you can go look it up. Red Hat described it as: it was found that the PIIX4 power management emulation layer did not properly check for hot plug eligibility.

So what does that mean? First off, what is PIIX4? PIIX4 was an actual physical chip that was the southbridge in circa-2000 Intel chipsets. Since it's a southbridge, what that means is that most of the physical devices in computers of that era hung off of the southbridge, architecturally. And so the PCI bus, the ACPI support, the real-time
clock, all of that stuff, the CPU communicates with through the southbridge. This is the default southbridge chip that QEMU emulates. It supports a PCI bus, and it supports PCI hot-plug. In physical hardware, what this would mean is that to hot-unplug a device, the chip instructs the PCI bus to electrically disconnect it so that you can pull it out safely. In a virtual machine, what this means is just that we disconnect the virtual mappings for those I/O ports and free the memory backing that device.

It's expected that the destructor function for a virtual device performs this unplugging of everything. But not every device in QEMU was implemented under the expectation that it might be hot-unplugged, and so some of these destructor functions are no-ops or insufficiently clean up state. That's supposed to be okay, because you're not supposed to be able to hot-unplug these devices, so it's fine if they don't successfully clean themselves up. But it turns out, as the advisory says, that there were insufficient checks for hot-plug eligibility: in the PIIX4 PCI hot-plug path, if you handed it a device identifier on the PCI bus to hot-unplug, it would just blindly go and unplug it without actually checking the flag that says "this is a hot-pluggable device".

In particular, a tempting target is the emulated ISA bridge. So who here actually remembers physical ISA cards and ISA buses? All right, most of us. We don't have those anymore, at least outside of hobbyist or archaic-hardware shows. But it turns out that your southbridge, even in your modern Intel chipsets, has an emulated ISA bus with virtual ISA devices hanging off of it, and QEMU faithfully emulates this behavior. It has a bunch of virtual ISA devices hanging off of a PCI-ISA bridge, and we can just unplug that, and all
of those ISA devices, which include such things as the real-time clock that keeps calendar time on the virtual machine, just go away. And that real-time clock, it turns out, is not expecting to be unplugged. In particular, it's a real-time clock, so it has timer events that it uses to keep track of the real time, and it leaves those timers hanging around on QEMU's run loop.

We can look a little bit at what this means in code. The real-time clock is emulated by this struct RTCState at the bottom of the screen, and it has this second_timer that it just schedules to fire every second so that it can update the time. If we look up above at how a QEMUTimer works, it has an expire time that says when this timer should fire, and then it has a callback and an opaque pointer that is passed to that callback. So once a second the QEMU timer fires and calls this RTC update function on that RTCState struct. But if we unplug the device, it frees the RTCState structure, but it doesn't free or unregister either of the timers that it used. I've only shown the one that's relevant for this exploit. And so we're left with dangling pointers: that opaque pointer there is actually used as a pointer to the RTCState, so we have a dangling pointer to a freed object. So within a second, on the next second tick, we call this rtc_update_second function on a freed object, and we have a classic use-after-free bug. As those of you who are exploit developers probably know, a use-after-free like that is almost the most beautiful case you could hope for in an exploitable bug.

And just to show how easy this is to reproduce, that's the reproducer. Compile that program, run it as root in a Linux machine on a vulnerable version of KVM, and KVM will segfault. All this does is call iopl, which just gets privileges to be able to write to I/O ports, and then you write a single value to a single I/O port.
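The dangling-timer situation described above can be modeled in a few lines. The following is a toy Python simulation, not QEMU code: the `Heap`, `Timer`, and `run_timers` names are invented for illustration, and the "heap" is just a dictionary of slots that hands a freed slot to the next allocation. It shows how a timer whose opaque pointer still refers to a freed slot ends up running its callback on whatever was allocated there next.

```python
class Heap:
    """Toy allocator: hands out integer 'addresses' and reuses freed ones first."""

    def __init__(self):
        self.slots = {}
        self.free_list = []
        self.next_addr = 0x1000

    def malloc(self, obj):
        if self.free_list:
            addr = self.free_list.pop()
        else:
            addr = self.next_addr
            self.next_addr += 0x100
        self.slots[addr] = obj
        return addr

    def free(self, addr):
        # The old contents stay in place (stale); the slot becomes reusable.
        self.free_list.append(addr)


class Timer:
    """Simplified stand-in for a timer: expiry, callback, opaque 'pointer'."""

    def __init__(self, expire_time, cb, opaque):
        self.expire_time = expire_time
        self.cb = cb
        self.opaque = opaque


def run_timers(timers, now, heap):
    """Fire every timer whose expiry has passed, dereferencing its opaque pointer."""
    return [t.cb(heap.slots[t.opaque]) for t in timers if t.expire_time <= now]


def rtc_update_second(state):
    return "tick on " + state["owner"]


heap = Heap()
rtc_addr = heap.malloc({"owner": "rtc state"})
timers = [Timer(1, rtc_update_second, rtc_addr)]

# Hot-unplug: the state is freed, but the timer is never unregistered.
heap.free(rtc_addr)

# A later allocation with attacker-influenced contents lands in the same slot.
spray_addr = heap.malloc({"owner": "attacker-controlled data"})

result = run_timers(timers, now=1, heap=heap)
print(result)
```

In the real bug the stale pointer is a C pointer into freed heap memory rather than a dictionary key, but the shape of the problem is the same: the callback keeps running on memory that now belongs to someone else.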
Boom. In fact, this bug was found by a fuzzer that just literally wrote random values out to random I/O ports. Eventually it stumbled on this value, I got a signal 11, and I went in and debugged it and then wrote this talk.

All right, so that's the bug. We have this beautiful use-after-free. So what does it take, in the KVM environment, to go from a use-after-free like this to a working exploit? We'll talk about this process in three stages. First off, how do we go from this use-after-free to controlling the instruction pointer? How do we get KVM to jump to some address that we specify? Then, once we have that, how do we leverage it into executing arbitrary shellcode in the host process? I will have talked about those two assuming that I can guess addresses in the qemu-kvm process, i.e. that there's no address space randomization in the process, so I can predict addresses. Then we'll talk a little bit about what we have to do to get rid of that assumption, to work on a process with a randomized address space, like KVM is going to have in the wild on any real deployment.

So for getting RIP control, the high-level to-do list is fairly straightforward if you look at how the code works. First, we're going to need to create a fake QEMUTimer object that we control, and inject it at a known address. Then we'll trigger the eject: we'll unplug the ISA bridge. Then we will force an allocation into the space previously occupied by the RTCState structure, one that points the second_timer field at our timer. Then on the next second boundary, rtc_update_second will run on our fake RTCState, and it will reschedule the second timer to run one second later. But we've hijacked that pointer to point at our timer, and so one second later our timer will run, which includes our callback, and it will jump to code that we control.

So there are three steps here. I've already shown you how to do number two, how to
eject the ISA bridge and actually trigger the vulnerability, which leaves two things left to talk about. One: how do we construct fake structures in qemu-kvm's
address space? How do we get objects that (a) appear in that address space somewhere, so that we can write pointers to them, and (b) appear at a known address, or at an address we can find out, so that we can predict it and write that second_timer pointer to point at our thing?

This is one case where exploiting a virtual machine actually has, in some ways, a unique advantage over many other types of exploits, in that this process turns out to be incredibly simple. The way that qemu-kvm handles the emulated RAM for the guest is that it's just a big mmap'd region in qemu-kvm's address space that is the physical RAM of the guest. So there is actually no injection step like I talked about here at all. We just literally allocate an object however the hell we want (statically, with mmap, on the stack, whatever we feel like) and find the physical address of it in the guest, which we can do in a couple of ways. We can do it in kernel mode by walking the page tables explicitly, or the kernel, it turns out, exports a proc file that will just give us this information. Either way, clearly we can find that from the guest. Then we just add that to the base address of the mmap in the host process, and we've found that object. For now we're going to assume that we know the base address of that mmap, because this base address is predictable assuming no ASLR, and again we'll talk about how to get past that assumption later.

So that's step one: injecting data at a controlled address is actually totally easy when you're attacking a virtual machine. I believe basically all current virtual machines do this or something similar, so this is not totally unique to KVM, although this is the first exploit I'm aware of that's talked about this technique.

The second technique is forcing allocations inside the qemu-kvm process. We need to get qemu-kvm to do a malloc and then populate it with data that we control
of an appropriate size, so that with high probability it will get allocated into the space that the struct RTCState used to occupy. To do that, I'm going to use a feature of the qemu-kvm network stack.

qemu-kvm inherits from QEMU a user-mode networking stack that implements an entire virtual LAN inside the qemu-kvm process. This is used as a way of giving guests network access without having to mess around with bridge devices and tun/tap devices or other such madness on the host, and also without requiring privilege on the part of the qemu-kvm process. It's the default networking setup, so it's a reasonable thing to attack, although in production environments it's probably more common that we'll see bridged networks or something else, and so we'd have to modify this step there. But the same fundamental principles of most of the attack would apply.

So this virtual network stack emulates a DHCP server, a DNS server, and a gateway NAT, all plugged into this virtual LAN. The way that this virtual LAN handles packet delivery is that normally packets are handled synchronously: when you inject a packet onto the virtual LAN, it just looks up the recipient and calls their deliver-packet callback synchronously, so there's no buffering and no queuing. But in order to prevent recursion, if a virtual device's or virtual host's delivery mechanism then injects a second packet in response to the first packet, that packet is queued using malloc, with a small header attached, and then delivered later once we return to the main run loop.

So if we can find a device on the virtual LAN that responds to packets from the user by synchronously generating a second packet whose contents are all, or almost all, controlled by the contents of the first packet, then that will generate a malloc with controlled contents. Can anyone think of a network service that has that property? Ping: ICMP echo packets. So the way we're
actually going to force allocations inside the qemu-kvm process is by pinging the virtual gateway. It will reflect the ICMP packets back at us with a payload whose length and contents we control, which will force mallocs and let us exploit this use-after-free.

So now, putting the pieces together: we allocate a fake QEMUTimer in our address space with its callback pointed at whatever address we want KVM to jump to. We calculate its address in the host using the arithmetic I showed earlier. We do that eject dance, and then we ping the emulated gateway as fast as we can with ICMP packets that contain pointers to our fake timer in the host. With extremely high probability, one of those will end up allocated into the space previously occupied by the RTCState, and we win: we have the ability to get QEMU to jump to an address we control.

All right, so that's part one. We have RIP control: we can get QEMU to jump to any address we want. But we still have to deal with non-executable pages and with injecting shellcode somehow, so we're not quite there yet. So what are our options for what to do next?

Well, we'll start with the classics. We can set RIP = 0x41414141 ("AAAA") and declare that that clearly demonstrates it's possible, and we're done. That's boring. I could disable NX, non-executable pages, in my BIOS or in my kernel, and just put shellcode wherever the hell I want and jump to it. That's also boring. Getting more interesting, we could use ROP, return-oriented programming, to chain together bits of QEMU's own code to do the standard mmap or mprotect to allocate executable memory for shellcode and then jump into the shellcode. That's the standard technique used in most exploits these days, and it would be a perfectly fine strategy here. But I have a slightly different strategy that I stumbled upon here and happen to like, one that I think has some cleaner properties and was easier to develop than a ROP payload.
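As an aside on the address arithmetic referred to above (guest-physical address plus the base of the guest-RAM mapping in the host process), here is an illustrative Python sketch under stated assumptions. It assumes the proc file in question is `/proc/self/pagemap`, which the talk does not name at this point, and uses that file's documented 64-bit entry layout. The function names are hypothetical, and the base address in the usage lines is a made-up value, not one taken from a real qemu-kvm process.

```python
import struct

PAGE_SIZE = 4096
PFN_MASK = (1 << 55) - 1   # bits 0-54 of a pagemap entry hold the page frame number
PRESENT_BIT = 1 << 63      # bit 63 is set when the page is present in RAM


def pfn_from_pagemap_entry(entry):
    """Extract the page frame number, or None if the page is not present."""
    if not entry & PRESENT_BIT:
        return None
    return entry & PFN_MASK


def read_pagemap_entry(vaddr, path="/proc/self/pagemap"):
    """Read the 8-byte pagemap entry covering a virtual address.

    Note: newer kernels report a zero PFN to unprivileged readers.
    """
    with open(path, "rb") as f:
        f.seek((vaddr // PAGE_SIZE) * 8)
        return struct.unpack("=Q", f.read(8))[0]


def guest_physical(pfn, vaddr):
    """Combine the frame number with the offset of vaddr inside its page."""
    return pfn * PAGE_SIZE + (vaddr % PAGE_SIZE)


def host_virtual(ram_base, gpa):
    """Guest RAM is one flat mapping in the host process, so just add the base."""
    return ram_base + gpa


# Usage with synthetic values (no real process is inspected here).
entry = PRESENT_BIT | 0x1234
gpa = guest_physical(pfn_from_pagemap_entry(entry), 0x7F0000000ABC)
hypothetical_base = 0x7F0000000000  # assumed, for illustration only
print(hex(gpa), hex(host_virtual(hypothetical_base, gpa)))
```

The only part that depends on the target is the base of the mapping, which is the value the talk assumes is predictable until ASLR is dealt with later.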
we'll talk about that. So let's take another look at that QEMUTimer structure that we're faking. We focused on the callback and opaque members before, but now let's look at this expire time and this next pointer. The way that these timers are implemented is that they're stored in a sorted linked list threaded through by that next pointer, and every time the QEMU main loop wakes up it runs these timers, which just walks this linked list by following that next pointer as long as that expire time is before the current time. And we control that next pointer, because we control this entire timer. So what we can do is construct multiple timers and chain them together by that next pointer, so that we can cause qemu-kvm to execute multiple functions in a row and perform limited strings of computation in the host, rather than just doing one jump. And we're going to point these at existing functions inside the host that we're going to string together in an unexpected way. So this is sort of halfway between traditional return-to-libc attacks and ROP attacks, with a slightly unorthodox method of doing the dispatching.
Now, one feature we notice here is that we have a large number of one-argument function calls. We want to do function calls with more arguments, so to do that we're going to dive a little bit into a detail of how the amd64 calling convention works. On amd64, which I'm assuming, we pass arguments to functions in registers, starting with RDI, RSI, RDX, etc. So RDI is the first argument; that's going to get the opaque member of our fake timers. RSI we don't directly control. However, every compiled version of qemu_run_timers that I've encountered leaves RSI untouched; it doesn't clobber RSI. So suppose that we found a function that looked like this hypothetical set_rsi function: it sets the RSI register based on the RDI register, which is the first argument register.
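To make the dispatch idea concrete, here is a minimal Python sketch of the mechanism just described. It is a toy model, not QEMU's code: the Timer class, run_timers, set_rsi, and record_two_args are illustrative names, the register file is just a dictionary, and the behavior "RSI survives between callbacks, RDX does not" is taken from the talk and modeled by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Regs = Dict[str, int]


@dataclass
class Timer:
    """Toy stand-in for the timer structure from the talk (names are illustrative)."""
    expire_time: int
    cb: Callable[[Regs], None]      # callback, invoked with the register file
    opaque: int                     # becomes the first argument (RDI)
    next: Optional["Timer"] = None  # sorted singly linked list


def run_timers(head: Optional[Timer], now: int, regs: Regs) -> Optional[Timer]:
    """Run every expired timer in list order; return the first unexpired one."""
    t = head
    while t is not None and t.expire_time <= now:
        regs["rdi"] = t.opaque   # first argument register is loaded from opaque
        regs["rdx"] = 0          # assumption from the talk: RDX gets clobbered
        # RSI is deliberately left alone between callbacks.
        nxt = t.next             # read the link before the callback runs
        t.cb(regs)
        t = nxt
    return t


def set_rsi(regs: Regs) -> None:
    """Hypothetical one-argument function that copies RDI into RSI."""
    regs["rsi"] = regs["rdi"]


def demo() -> List[Tuple[int, int]]:
    """Chain two timers so the second callback sees two chosen arguments."""
    seen: List[Tuple[int, int]] = []

    def record_two_args(regs: Regs) -> None:
        seen.append((regs["rdi"], regs["rsi"]))

    second = Timer(expire_time=0, cb=record_two_args, opaque=0x1000)
    first = Timer(expire_time=0, cb=set_rsi, opaque=0x2000, next=second)
    regs: Regs = {"rdi": 0, "rsi": 0, "rdx": 0}
    run_timers(first, now=1, regs=regs)
    return seen
```

Calling demo() records a single pair, (0x1000, 0x2000): the second callback's first argument comes from its own opaque field, and its second is whatever the first timer left behind in RSI.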
If we go back to our chain of function calls, we set f1 to set_rsi and we pass it some argument, so then we're going to populate RSI with that first argument.
That argument to the first timer is what RSI is going to get populated with, and then when we call the second function we now control both arguments, so we can get two arguments this way. The same trick doesn't work with RDX, the third argument register, because qemu_run_timers does clobber that, and so we can't count on it being preserved across different timer calls. We'll talk about what we do there in a second. First: this function happens to be the set_rsi gadget that I chose. What this actually does in QEMU is relatively unimportant, but the detail is that it takes a value as its first argument and then it calls a function with that value as the second argument, which means that it moves it into RSI. And so as long as we choose a value of addr that makes ioport_write a no-op, or a mostly harmless operation, this has the effect of populating RSI from RDI. So that's that.
Now, to get to a three-argument mprotect call: mprotect, which is what I call to set things executable, needs to be called with an address, a length, and a protection value that we want to contain at least PROT_EXEC. A little bit of searching for useful patterns with some grep finds us this interesting function, ioport_readl_thunk. Again, what this is actually supposed to do is totally irrelevant, but it takes two arguments, which again we control, and then it follows some function pointers to make a call. So we control those function pointers, to call it on an address we control, with a size that we control, with a third argument that just conveniently happens to be equal to PROT_EXEC. So it will set that memory executable.
All right, so to put the steps together: we need to allocate a fake IORangeOps struct, which is that operations structure, and we need to set the read op to mprotect. We need to allocate an IORange object, which needs to be page-aligned because we're going to call mprotect on it, and it has ops pointing at the fake ops, and base is set up such that that addr minus base computation yields a useful length. Then we're going to copy our shellcode into that same page, immediately following the IORange. And then we'll do this timer chain: we'll call cpu_outl to set the second argument register to zero, because that happens to be a value that makes ioport_write up here harmless; then we do this ioport_readl_thunk, which through this hilarious chain of indirection will result in an mprotect; and then we'll just jump right into that fake ioport plus one, which is where we stuck our shellcode. All right, and so that works: turn off ASLR, code up all the offsets right, and you've got code execution.
So one question, before we jump into getting by ASLR, is why didn't I use ROP? Return-oriented programming is a standard, well-understood technique that is the standard way to write these exploits; why did I do this more creative thing? There's a couple of reasons. One is that this mechanism makes getting the qemu-kvm process to continue executing dead simple, because we're not corrupting the stack, we're not really smashing any state; we're sort of hijacking the legitimate functionality of the run loop. And so after all of this returns, everything, except for the things that we freed as a result of our exploit, is in a sane state, and we can just continue executing. Another thing that I like about this technique is that I've shown you virtually no assembly on the previous few slides. I've referred to one detail, but there's very little dependence on how these functions actually got compiled, and ROP cares very deeply about how the functions got compiled, because you're chaining together strings of assembly. When you're exploiting a program under Linux, you have to deal with the fact that every Linux vendor has their own build of the program, with a slightly different source version, a slightly different GCC version, a
slightly different set of flags, and so if you have to try to find ROP gadgets across every single one, it's a fair amount of work, and there aren't yet great tools for doing this in a completely automated fashion on Linux. Whereas here, I just need to look at the source, and I just need to grab the symbol table from all of those different versions, and I have all the addresses I need, because they're just functions that exist in the C code and so are more or less preserved across compilation by different versions. And then finally, the cop-out answer is that I'm just personally not that good at ROP; I don't know great tools for doing it on Linux, and so I decided to try something different.
All right, so we've got code execution, but I've been assuming no ASLR, so let's get rid of that restriction. There's sort of fundamentally two addresses that we've been using. One is the base address of the qemu-kvm binary: if we assume we're attacking a known binary, we know the layout of all of the functions in memory, and so we know their offsets from the executable base, but we need to know where that executable is loaded. And secondly, we need to know the address of that physical memory mapping inside qemu-kvm, in order to get the address of the fake objects we're injecting. So there's a couple of answers here. The classic answer is to find a sufficiently powerful information leak that lets us leak the contents of pointers or the contents of memory, and back-solve or back-derive these addresses somehow from the leaked information. That's the classic way of doing this. I didn't end up going that way; I decided to do, again, something else that I think is a little more informative and interesting. The second option is that we can take advantage of the fact that every major distribution still compiles KVM as non-PIE, not as a position-independent executable. If you're not familiar with the implementation of address space randomization, what this ends up
meaning is that the actual core qemu-kvm ELF binary is loaded at a fixed address every time. It's always loaded at the same address in memory, and so addresses in the binary itself are not subject to randomization and we can assume that we know them. So we can just assume that we know, for a given binary, all of those code addresses, and that means the only thing we have left to find is the physical memory base address.
To do that, we're going to use yet another obscure feature of qemu-kvm that comes in handy here. There's this fw_cfg, for firmware configuration, subsystem in qemu-kvm. It's sort of a virtual device; it emulates two I/O ports. You're not supposed to know or care about it, because it's used by QEMU's BIOS to communicate with the emulated hardware, but it's just listening on I/O ports, and there's no reason other software that's not the BIOS can't talk to it. Its purpose is to export data tables that the BIOS needs, such as the e820 map that describes the layout of physical memory, the ACPI tables that describe how to interface with the emulated ACPI hardware, etc. It also has support for the BIOS feeding information back to the emulated hardware. This is a little odd, in that as of the versions that I checked, this support for writable tables isn't actually used anywhere in qemu-kvm; there's actually no place that it exports tables that the BIOS is expected to write. But the infrastructure is there, and not only is it there, it actually lets the BIOS, or any software, write to any of these exported tables. And conveniently, several of these tables are backed by statically allocated buffers inside the host, which means, again assuming no PIE, they're loaded as part of the ELF binary at fixed addresses. Which means that we can write to them, and we get nearly 500 bytes of writable data at a fixed, static, predictable address, even under ASLR. So that's enough that we can inject our fake timers, our fake structures, into that space rather than into the guest physical memory mapping. There's one
complication, which is that mprotect, as I mentioned, needs a page-aligned address, and a 500-byte region is not likely to land on a page-aligned address if it's at a random offset, which it basically is. So what we actually need to do, or what I did at least, is construct a different set of fake timer chains, using the same techniques, that does basically a read4: it reads four bytes at an address we choose from the host process, and it writes them to a space that's visible to the guest. And so we use that read4 basically as an information leak; we build up our own information leak, and we chase pointers using those reads to derive the value of that physical memory base mapping. Once we've found the address of that
mapping, we proceed exactly as before. There's a little bit of complexity here, in that now this means we actually do that read4, then do computation based on the results of that read, and then execute more gadgets in the host. So rather than ending our timer chains with next equals NULL, we actually end them with another timer that calls that rtc_update_second function again, which, remember, follows pointers to find a timer and then schedules that timer at one second in the future. So we execute this chain, we do the read4, and then one second from now a timer that we control will again get executed. But until then, the host returns to the guest context; the guest can do arithmetic, do computation, glue bits together based on that read, and then one second later we'll jump back into another set of gadgets. And so we use this to chain multiple timer chains one after another, with guest execution in between. And so that, then, is what it takes, or at least the steps that I took, and the broad outline of what it takes, to make this exploit work against ASLR, bypassing non-executable pages, and work completely on a stock install of KVM from a vendor. We'll demo that in just a moment.
So before we get to the demo, so that I can end the talk on a high note with a demo, we'll talk about a couple of conclusions, thoughts that I think are raised by this work. One thing I want to emphasize, that I hope this work emphasizes, is that virtual machine breakouts aren't magical at all. I think there's often a tendency to think that virtual machines are kind of magic. Honestly, they do this weird thing where you get multiple computers in one computer, and they do that by doing lots of crazy low-level hardware and software tricks to make it possible. And so people tend to assume that attacking virtual machines must be at least as complicated as writing one in the first place, and so therefore they're probably a pretty strong security
layer. But that's just not true. All of the usual software protections, ASLR, NX, etc., apply to virtual machines, but that's about it. Virtual machine breakouts typically are memory corruption bugs like any other bugs, and frankly, today I would prefer to be attacking a virtual machine than a modern web browser, because the state of mitigating exploits, defending against exploits, and sandboxing is much more advanced there. And so don't think of virtual machines as magic security boxes; they're just as vulnerable as anything else. And of course, as you might expect, the fake devices are the weak spot.
Just to drive this point home and to give some context, we'll take a brief look at some past virtual machine breakouts. In 2008, Invisible Things Lab put out this paper, Adventures with a Certain Xen Vulnerability, where they demonstrated a Xen breakout. The bug that they were exploiting was an integer overflow in the paravirtual framebuffer, and they did much the same standard tricks that I did: ret-to-libc or return-oriented programming to mprotect, copy a buffer in, and then jump into it. Immunity at Black Hat 2009 presented Cloudburst, a breakout exploit for VMware: again, missing bounds checking in the virtual SVGA device gets memory corruption, and entirely the same techniques. And then, just to give a sense of where virtual machines can get interesting, just earlier this year Invisible Things Lab again put out another attack against Xen, but this one based on a bug in the way that Xen uses Intel's VT-d, which is the IOMMU and hardware I/O protection technology. It turns out that there was a subtle bug in the way that interrupts were handled that actually allowed a guest to escape. And so that's an example of the kind of thing I was talking about that might be possible with the kvm-amd and kvm-intel modules: these things are complicated, and there probably are subtle attacks that involve the hardware
interaction. But once again, the actual primitive that they got out of that attack was memory corruption in the host, and so from there they proceeded with exactly the same set of techniques that everyone else uses for exploiting memory corruption bugs. And so virtual machines are interesting, but they're not that much more interesting, and they're not that much more secure than anything else. So that's some conclusions.
Some further work that I want to see done, and that I'm going to be pursuing, is in hardening KVM. The first takeaway is that you absolutely should be sandboxing qemu-kvm if you're running it in a production environment. Fortunately, if you're running it at least under Ubuntu or Red Hat via libvirt, they are already sandboxing it using SELinux and AppArmor. I haven't yet, but I want to take a look at those sandboxes and see how hard they are, but at least this is work that people do realize needs to be done. But if any of you are deploying KVM in any remotely, possibly sensitive context, make sure that you're using a technology that gives you that sandboxing. I think another no-brainer is that I'd like to see the distributions start building qemu-kvm as PIE, as fully position-independent. The traditional reason not to do this is performance, because it costs an extra register to build as PIE, but virtually no one is running KVM on 32-bit platforms anymore, and amd64 has enough registers that the performance impact is really negligible. I haven't done benchmarks on KVM, it's also on my to-do list, but I think that's an obvious thing that will improve the state of the world. Then we can try some crazier ideas to make things a little bit harder. The standard technique is XOR-encoding pointers: XORing them with some constant value that's stored at some address, and then un-XORing them when you get them out, so that if a memory corruption bug tries to clobber them,
they'll just basically end up with a random address. I think we'll need to do a little more thought to figure out which things are likely to be targets that it might be worth applying this technique to, but it's a standard technique that's used to harden applications against memory corruption bugs, and I think it's time that we start asking the question of whether it's appropriate here. And we can try doing even crazier things, like protecting that guest memory region by lazily mapping or protecting regions of it, or sandboxing individual devices so that they run in a thread that somehow only has a view into that specific piece of memory. This is a topic that I've seen some discussion of on the KVM lists, but that I think, hopefully, this work demonstrates does have potential value. And of course I want to see more people looking at qemu-kvm, auditing it and fuzzing it, and finding and fixing bugs.
Some future research directions from the offensive side: I want to go after kvm.ko; I'd like to see someone else look into it, because I think that there probably are bugs there, and we should understand those bugs and we should fix them. Another interesting question that I don't know the answer to, and want to know the answer to, is: given that I know that I'm running in a guest on qemu-kvm, how hard is it to fingerprint the exact qemu-kvm version, by comparing the behavior, or possibly even by more subtle timing attacks against the memory layout or something? Because I've been assuming that I know which qemu-kvm binary I'm running against. In the wild you can usually assume that you're running against a distro's binary, but that's still a range of binaries. So how well can we fingerprint? That will give us an understanding of how weaponizable these attacks really are, how general such exploits can be made. And then another question is what information leaks in qemu-kvm look like. Assuming that we
get things built as fully randomized, we're going to start wanting to find information leaks to extract addresses in order to write these kinds of exploits. Is there a whole class of information leaks that we haven't realized yet, or whole classes where the standard idioms that QEMU uses for communicating with devices have a tendency to leak addresses in a way that we hadn't realized? That would be analogous to Dan Rosenberg's work on the Linux kernel, where he demonstrated that as small as a 4-byte leak of uninitialized memory can reliably be used to deduce kernel addresses with a bunch of cleverness. All right, so that's sort of my immediate checklist of future work on KVM. I also hope to improve my fuzzer and point it at other virtual machines, because I haven't had a chance to do that, but for KVM this is my checklist, and if you're interested in any of these, I encourage you to talk to me and share whatever thoughts you have.
All right, so finally, after all of that talking at you, it's time to demo my exploit in action. So, oh well,
that text is not relevant. I just ran a shell script that launched a KVM VM, and I hope you can read this; it's not super relevant. So I'm booting KVM just into an Ubuntu kernel, just to show that this is a working KVM virtual machine. You know, we're in a VM: /proc/cpuinfo shows that we're running a QEMU CPU. And the way that I've
implemented this exploit is that, because we are hot-unplugging the ISA bus, there are a lot of virtual devices that go away, and if anyone tries to use those there's a high risk of the machine segfaulting, which is not useful if we want to exploit it. So I've actually packaged my exploit into an initrd that just contains a statically compiled version of my exploit, and a minimal kernel that doesn't use any of those vulnerable devices, and I put it in GRUB so that we can boot straight into the exploit and greatly increase the reliability of this attack. So we'll reboot the VM now,
and from the GRUB prompt we'll select my exploit. Doing things, chasing pointers, and we pop up a calculator in the host.
And obviously, as I'm sure most of you are aware, the calculator is the standard 'yes, I've exploited it' demo, but that could be arbitrary code. All right. As you can see here, the guest kernel is very upset that its real-time clock has gone away, but it's valiantly stumbling forward. All right, so that is the end of
this talk. The goons have told me that I shouldn't bother heading to the Q&A room, because there's no one after me, so if you have questions, just head up here and we can chat.
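One of the hardening points from the talk, whether a given qemu-kvm binary was built as a position-independent executable, can be checked from the ELF header alone. The following is a minimal Python sketch under stated assumptions: the function names are made up for illustration, and it only inspects the e_type field. A value of ET_DYN is also what shared libraries use, so for a main executable it indicates PIE but is not a complete audit of the binary's randomization.

```python
import struct

ET_EXEC = 2  # executable linked to load at a fixed address
ET_DYN = 3   # shared object, or a position-independent executable


def elf_type(header: bytes) -> int:
    """Return the e_type field given at least the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[5] is the data encoding: 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        endian = "<"
    elif header[5] == 2:
        endian = ">"
    else:
        raise ValueError("unknown ELF data encoding")
    # e_type is a 16-bit field at offset 16 in both 32-bit and 64-bit ELF.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return e_type


def looks_position_independent(header: bytes) -> bool:
    """True if the header says ET_DYN, i.e. the image can be relocated at load time."""
    return elf_type(header) == ET_DYN
```

To use it, read the first 18 bytes of whichever binary your distribution installs (the path is left to the reader) in binary mode and pass them to looks_position_independent. A result of False on the emulator binary corresponds to the fixed-address situation the talk relies on.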