Hardware-Assisted Rootkits and Instrumentation: ARM Edition

Video in TIB AV-Portal: Hardware-Assisted Rootkits and Instrumentation: ARM Edition

Formal Metadata

CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Security researchers have limited options when it comes to debuggers and dynamic binary instrumentation tools for ARM-based devices. Hardware-based solutions can be expensive or destructive, while software tools are often restricted to user mode. In this talk, we explore a common but often ignored feature of the ARM debug architecture in search of other options. Digging deeper into this hardware component reveals many interesting use cases for researchers, ranging from debugging and instrumentation to building a novel rootkit. First, we will shine a spotlight on a debug interface that dates back to ARMv6, and demonstrate how to control it from software in order to instrument code in the normal world. We will introduce a prototype toolkit with an IDA plugin that can perform real-time tracing, code coverage analysis, and more, on the Android kernel of COTS smartphones without requiring virtualization extensions or special hardware. Next, we will compare implementations of this hardware unit across multiple chipset vendors, and discuss applicability to other ARM CPUs found in your phone, such as Wi-Fi and cellular basebands. The second half of our talk will add new meaning to the phrase “hardware-assisted rootkit”. Abusing this same debug interface, we will have some fun with the Krait architecture in order to demonstrate a kernel-level rootkit for Android that can bypass the current state of the art in rootkit detection. We’ll discuss hijacking exceptions, interacting with TrustZone, and methods for detecting this unconventional rootkit. Finally, we will wrap up by highlighting a use case for exploit mitigations on embedded systems.

Now we can really get started with the first talk. I'd like to quickly introduce our speaker, who will be talking about hardware-assisted debugging and reversing on ARM. So, this is the first time I've been able to present research publicly, so I'm excited to be here, but also a little bit nervous standing up in front of all of you, so bear with me.
I work at Endgame, on a team focused on vulnerability research and prevention. We do some vulnerability discovery, but more recently we've been focused on exploit mitigation research. I've been doing mobile and embedded research since the Nokia N series was the most popular smartphone, before the original iPhone.
As for today's outline, I had to scratch the TrustZone material, since this is like three talks in one and I'll have to talk fast. But if there's interest, I can either blog about the TrustZone part or take questions; I'll be here throughout the conference.
To set the stage, part of the original motivation is that I get frustrated with the tools and approaches available for debugging embedded systems. In particular, I like working with basebands, and there is really no good universal solution. On the hardware side, JTAG is ideal, but getting widespread support across multiple COTS devices is not realistic. Software, on the other hand, is portable and scalable, but most of the tools you find for Android and iOS are limited to user mode. Emulation is really cool, but I'm always interested in the low-level interfaces, like the interaction with the modem or other drivers, and spending time emulating that is just too costly.
My personal philosophy has always been to make use of real hardware: everything I want is in this phone right here, so I make use of it. And I like software-based tools because they're more portable.
So I stepped into the ARM debug architecture in search of something else. The debug architecture basically splits into invasive and non-invasive debugging. With invasive debugging you can do things like halt the CPU. The takeaway from this slide should be that there are a couple of authentication signals, DBGEN (debug enable) and SPIDEN (for the secure world), and if neither of those is asserted, the only debug event supported is the software breakpoint instruction. So our options are kind of limited on COTS devices because of that. On the flip side, non-invasive debugging includes features like trace, for example the Embedded Trace Macrocell or program flow trace. That's not the subject of this talk, but it would make for an interesting subject, and it's interesting because trace can be controlled from software. It may vary which devices and chipsets support that, but it is worth mentioning that trace is something to explore further. Another non-invasive feature that we don't really care about right now is sample-based profiling: there's a register you can query periodically to get the state of the program counter. The focus of this talk is on the PMU.
I know that PMU might be a bit of an overloaded term in embedded systems. We're not talking about power management; we're talking about the Performance Monitoring Unit, ARM's flavor of hardware performance counters, and that's the subject of today's discussion.
The PMU is an optional extension, but ARM recommends it, and most importantly it has a mandatory software interface via the system control coprocessor, CP15. It was introduced in ARMv6, and you can find it in ARM11 cores, Cortex-R, and Cortex-A; the only family that kind of isn't covered is Cortex-M. It shows up in standard and custom cores as well. The premise behind the PMU is that you've got one or more counters and a set of events that can be counted, and the PMU can fire an interrupt when a counter overflows. That gives you the ability to do this kind of sampling: if I wanted to instrument every tenth occurrence of an event, I could initialize the counter to negative 10.
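The "initialize to negative 10" trick is just modular arithmetic on the counter width. Here is a minimal Python model. It assumes 32-bit event counters (as on ARMv7) that signal overflow when they wrap from 0xFFFFFFFF to 0. The helper names are illustrative, not part of any PMU API:

```python
COUNTER_BITS = 32                      # assumption: 32-bit event counters (ARMv7)
MASK = (1 << COUNTER_BITS) - 1


def preload_value(period: int) -> int:
    """Value to write into a counter so that it overflows after `period` events.

    This is -period reduced modulo 2**32, i.e. "initialize the counter to -N".
    """
    if not 1 <= period <= MASK:
        raise ValueError("period must be between 1 and 2**32 - 1")
    return -period & MASK


def events_until_overflow(start: int) -> int:
    """Number of increments before a counter holding `start` wraps to 0."""
    start &= MASK
    return (1 << COUNTER_BITS) - start  # start == 0 needs a full 2**32 events


if __name__ == "__main__":
    value = preload_value(10)
    print(hex(value))                   # 0xfffffff6
    print(events_until_overflow(value)) # 10
```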
It provides you feedback: it lets you inspect the CPU and see how it performs, which could be useful for software or hardware purposes. There are tools out there, commercial and open source. ARM DS-5 Streamline is really cool; here's a screenshot from it, and you can check out a trial version. And there's Linux perf, the profiler built into the Linux kernel, which supports more than ARM: x86 and other architectures as well.
Before stepping forward, a few quick abbreviations: you'll see PMU, PMI (performance monitoring interrupt), and PMC (performance monitoring counter) throughout this talk. I realize some of you may not be as familiar with the ARM architecture, so I threw in the exception vector table. These are the different exception vectors; the one you should probably remember is the supervisor call, which is used to service system calls. Then there are the privilege levels: you're going to see PL0 and PL1 throughout the talk. If you're more familiar with the ring terminology, these are roughly the corresponding modes.
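For reference, here is the ARMv7-A vector layout the slide refers to, written as a small Python lookup table. The offsets follow the ARM Architecture Reference Manual. The mapping from privilege levels to x86 rings is only a rough analogy, and `vector_name` is a hypothetical helper:

```python
# ARMv7-A exception vector offsets, relative to the vector base address
# (VBAR, or 0xFFFF0000 when high vectors are selected via SCTLR.V).
VECTOR_OFFSETS = {
    0x00: "Reset",
    0x04: "Undefined instruction",
    0x08: "Supervisor call (SVC)",      # used to service system calls
    0x0C: "Prefetch abort",
    0x10: "Data abort",
    0x14: "Not used (Hyp trap in the Hyp vector table)",
    0x18: "IRQ",                        # PMU overflow interrupts usually land here
    0x1C: "FIQ",
}

# Rough correspondence between ARMv7 privilege levels and x86 rings.
PRIVILEGE_LEVELS = {
    "PL0": "user mode (roughly ring 3)",
    "PL1": "kernel modes such as SVC and IRQ (roughly ring 0)",
    "PL2": "Hyp mode (hypervisor, with the Virtualization Extensions)",
}


def vector_name(vector_base: int, address: int) -> str:
    """Name of the exception vector at `address` for a table at `vector_base`."""
    offset = address - vector_base
    if offset not in VECTOR_OFFSETS:
        raise ValueError(f"0x{address:08x} is not a vector entry")
    return VECTOR_OFFSETS[offset]


if __name__ == "__main__":
    print(vector_name(0xFFFF0000, 0xFFFF0008))  # Supervisor call (SVC)
```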
PMU-assisted research has been done before. One of the more interesting papers used hardware performance events to do instruction-level monitoring: basically they trap every single instruction or branch to a hypervisor, but on x86. There's also cool ROP detection work using the PMU by various researchers, based on mispredicted return branches. And we've seen rootkit detection using performance counters, and control-flow integrity: you can check out our team at Black Hat this year, talking about control-flow integrity using the PMU on Intel architecture. That's really the motivation for this talk: we've been doing this research on Intel, but I'm kind of a mobile security guy at heart, so I was curious about the portability to ARM, and this is where it ended up. As you'll notice, all this related work is focused on Intel architecture, so there's plenty of room for exploration on ARM.
The sample events that we can count are similar to what you'd find on other architectures: instructions, branches, cache accesses, cache misses, and so on.
Now, a quick breeze through the registers we can access through the system control coprocessor. The control register tells us how many counters exist on the PMU, and its least significant bit is a sort of global enable, like turning the PMU on and off.
There are registers for enabling counters: once you turn on the PMU, you also have to enable a specific counter, so you write the appropriate bit offset to turn the various counters on, or off. And finally there's a counter selection register, where you tell the PMU: from this point on, I'm going to be referring to this counter.
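Here is a sketch of decoding the control register and building the bit masks for the counter enable registers. It uses the ARMv7 field layout: PMCR.E in bit 0, PMCR.N in bits [15:11], and bit 31 of PMCNTENSET for the cycle counter. On real hardware these values are moved with MRC/MCR to CP15 c9 at PL1. The Python helpers below only do the bit manipulation and are illustrative:

```python
def decode_pmcr(pmcr: int) -> dict:
    """Split an ARMv7 PMCR (performance monitors control register) value."""
    return {
        "E": pmcr & 1,                  # bit 0: global enable for all counters
        "P": (pmcr >> 1) & 1,           # bit 1: write 1 to reset event counters
        "C": (pmcr >> 2) & 1,           # bit 2: write 1 to reset the cycle counter
        "N": (pmcr >> 11) & 0x1F,       # bits [15:11]: number of event counters
        "IDCODE": (pmcr >> 16) & 0xFF,  # bits [23:16]: implementation-defined ID
        "IMP": (pmcr >> 24) & 0xFF,     # bits [31:24]: implementer code (0x41 = ARM)
    }


def counter_enable_mask(*counters: int, cycle_counter: bool = False) -> int:
    """Mask to write to PMCNTENSET (to enable) or PMCNTENCLR (to disable)."""
    mask = 0
    for n in counters:
        if not 0 <= n <= 30:
            raise ValueError("event counters are numbered 0..30")
        mask |= 1 << n
    if cycle_counter:
        mask |= 1 << 31                 # bit 31 controls the cycle counter
    return mask


if __name__ == "__main__":
    fields = decode_pmcr(0x41002001)    # made-up value: ARM, 4 counters, enabled
    print(fields["N"], fields["E"])     # 4 1
    print(hex(counter_enable_mask(0, 1)))  # 0x3
```

The counter selection register (PMSELR) then simply takes the index of the counter that later event-type and counter accesses refer to.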
Once you do that, you can set the event you want to count, or query or write the actual counter value. Notice that the event number field in the event type register is only 8 bits wide, so that's kind of limited, but the most significant bits are used for mode inclusion. The table on the right shows that if we want to count branches, we can do so in user mode, kernel mode, and hypervisor mode, and we can do all of those by default because they're all in the normal world. It's a lot more tricky with the secure world, because we need the secure non-invasive debug signal to be asserted too, although there are COTS devices with that enabled. Finally, the event counter register I mentioned holds the actual count, and you can read and write it.
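The event type register packs an 8-bit event number together with the mode-filtering bits. This sketch assumes the ARMv7 PMXEVTYPER layout (P in bit 31, U in bit 30, NSH in bit 27) and lists a few of the architecture's common event numbers. Whether a given core implements the filter bits varies, so check the core's manual before relying on them:

```python
# A few ARMv7 common event numbers (ARM ARM, "common event numbers").
EVENTS = {
    "SW_INCR": 0x00,
    "L1D_CACHE_REFILL": 0x03,   # data cache miss
    "L1D_CACHE": 0x04,          # data cache access
    "INST_RETIRED": 0x08,       # instruction architecturally executed
    "PC_WRITE_RETIRED": 0x0C,   # software change of the PC
    "BR_MIS_PRED": 0x10,        # mispredicted or not predicted branch
    "CPU_CYCLES": 0x11,
    "BR_PRED": 0x12,            # predictable branch speculatively executed
}

# Filtering bits in PMXEVTYPER (assumption: the PMU implements them).
P_BIT = 1 << 31                 # 1 = do NOT count at PL1 (kernel)
U_BIT = 1 << 30                 # 1 = do NOT count at PL0 (user)
NSH_BIT = 1 << 27               # 1 = DO count in Non-secure Hyp mode


def pmxevtyper(event: int, *, kernel=True, user=True, hyp=False) -> int:
    """Build an event type value counting `event` in the selected modes."""
    if not 0 <= event <= 0xFF:
        raise ValueError("ARMv7 event numbers are 8 bits")
    value = event
    if not kernel:
        value |= P_BIT
    if not user:
        value |= U_BIT
    if hyp:
        value |= NSH_BIT
    # NSK/NSU (bits 29/28) stay 0, so Non-secure PL1/PL0 filtering follows P/U.
    return value


if __name__ == "__main__":
    print(hex(pmxevtyper(EVENTS["INST_RETIRED"], user=False)))  # 0x40000008
```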
As far as configuring counters: first, we set the bit that turns on the PMU, the enable bit of the control register. Next, we select the first counter, counter 1, and write 8 to the event type register, which is instructions executed, so we're configuring the counter to count instructions. We initialize the counter to minus 3, meaning one sample every three events, and finally we enable counter 1 by writing to the counter enable register. The other half of the PMU is interrupts. There's a pair of registers for enabling or disabling interrupts for a given counter, so if we wanted to enable both counters 1 and 2, we would write the value 3 to the interrupt enable register. Then, when an interrupt fires, we have to query the overflow status register.
So how do you know whether a given chip has a PMU you can use? You start by querying the debug authentication status register, which is in the debug coprocessor, CP14, and that tells you whether invasive or non-invasive debug is supported. Once you know non-invasive debug is supported, you can query the debug feature register to get the specific version of the PMU that this particular chip supports. ARMv8 introduced PMU version 3.
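The following toy Python model makes the configuration sequence concrete without needing PL1 access. `SimulatedPMU` and `configure_sampling` are hypothetical stand-ins, not a real API; on a device, each `write` would be an MCR to CP15 c9 from kernel mode. The model follows the order described above and covers only the enable, select, event, counter, interrupt-enable, and write-1-to-clear overflow behavior:

```python
MASK32 = 0xFFFFFFFF


class SimulatedPMU:
    """Toy model of the ARMv7 PMU registers (hypothetical, for illustration)."""

    def __init__(self, num_counters: int = 4):
        self.enabled = False                  # PMCR.E
        self.selected = 0                     # PMSELR
        self.evtype = [0] * num_counters      # PMXEVTYPER, per counter
        self.counter = [0] * num_counters     # PMXEVCNTR, per counter
        self.cntenset = 0                     # PMCNTENSET
        self.intenset = 0                     # PMINTENSET
        self.ovsr = 0                         # PMOVSR overflow flags

    def write(self, reg: str, value: int) -> None:
        value &= MASK32
        if reg == "PMCR":
            self.enabled = bool(value & 1)    # only the E bit is modeled
        elif reg == "PMSELR":
            self.selected = value & 0x1F
        elif reg == "PMXEVTYPER":
            self.evtype[self.selected] = value
        elif reg == "PMXEVCNTR":
            self.counter[self.selected] = value
        elif reg == "PMCNTENSET":
            self.cntenset |= value
        elif reg == "PMINTENSET":
            self.intenset |= value
        elif reg == "PMOVSR":
            self.ovsr &= ~value               # write-1-to-clear
        else:
            raise KeyError(reg)

    def count(self, event: int) -> None:
        """One occurrence of `event`: bump every enabled counter programmed for it."""
        if not self.enabled:
            return
        for n, evt in enumerate(self.evtype):
            if ((self.cntenset >> n) & 1) and (evt & 0xFF) == event:
                self.counter[n] = (self.counter[n] + 1) & MASK32
                if self.counter[n] == 0:
                    self.ovsr |= 1 << n       # wrap sets the overflow flag

    def irq_pending(self) -> bool:
        return bool(self.ovsr & self.intenset)


def configure_sampling(pmu, counter: int, event: int, period: int) -> None:
    """Enable the PMU, select a counter, program it, preload it, and enable it."""
    pmu.write("PMCR", 1)                      # real code would read-modify-write
    pmu.write("PMSELR", counter)
    pmu.write("PMXEVTYPER", event)
    pmu.write("PMXEVCNTR", -period & MASK32)  # overflow after `period` events
    pmu.write("PMINTENSET", 1 << counter)     # raise a PMI on overflow
    pmu.write("PMCNTENSET", 1 << counter)


if __name__ == "__main__":
    pmu = SimulatedPMU()
    configure_sampling(pmu, counter=0, event=0x08, period=3)
    for _ in range(3):
        pmu.count(0x08)                       # three "instructions executed"
    print(pmu.irq_pending())                  # True
```

With `period=3`, the third instruction event wraps the counter and sets the overflow flag. Writing 1 to PMOVSR clears it again, as an ISR would.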
Here's a sampling of various chipsets; I tried to get a device from each chipset manufacturer. The main device I worked with was a Nexus 6, but just to show you, all these Cortex cores, including the Huawei HiSilicon chips, have non-invasive debug supported, and they all have PMU version 2. In the note at the bottom, I included the Broadcom chip from my Nexus, which also has a PMU capable of counting. Obviously the Huawei device looks like it's ready to be debugged; it looks like anything goes. But this slide is just a reflection of how common the PMU is in COTS devices.
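Checking the PMU version comes down to extracting one field from the debug feature register, ID_DFR0, which is read with `MRC p15, 0, <Rt>, c0, c1, 2`. Below is a small decoder. It assumes the PerfMon field sits in bits [27:24] and uses the encodings from the ARM Architecture Reference Manual. Newer architecture revisions define additional values, which this sketch reports as unrecognized:

```python
# ID_DFR0.PerfMon (bits [27:24]) encodings, per the ARM Architecture Reference Manual.
PERFMON = {
    0x0: "not implemented (or not described by this field)",
    0x1: "PMUv1",
    0x2: "PMUv2",
    0x3: "PMUv3 (ARMv8)",
    0xF: "no ARM-architected PMU (IMPLEMENTATION DEFINED, if any)",
}


def pmu_version(id_dfr0: int) -> str:
    """Describe the PMU version advertised by an ID_DFR0 value."""
    field = (id_dfr0 >> 24) & 0xF
    return PERFMON.get(field, f"unrecognized PerfMon value 0x{field:x}")


if __name__ == "__main__":
    print(pmu_version(0x02000000))  # PMUv2
```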
The first case study is using the PMU to do instrumentation, specifically tracing. Our approach, much like the paper I referenced, is that we want frequent PMU overflows in order to interrupt and trap instructions or branches. CoreSight program flow trace basically captures branches, and we can come pretty close to the same functionality using the PMU: we count all branches, both predicted and mispredicted, set our counters to minus 1, and use our interrupt service routine (ISR) to do instrumentation.
To visualize this: we initialize our counter to minus 1, and we're counting branches. A branch is about to occur; the counter overflows, and the PMU triggers an interrupt. Our ISR captures the saved program counter, or registers, or memory, whatever you want to do. Then we reset the counter to minus 1, and the code continues to execute. There are no branches here, so the counter is not incrementing. We come to another branch; the branch occurs, the counter overflows, and the ISR does our introspection and instrumentation. Then we reset the counter, and execution continues.
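The trace loop just described can be modeled in a few lines of Python. This is an idealized sketch with three simplifications: it assumes the PMI is taken immediately after the overflowing branch (no interrupt latency), it ignores the branches the handler itself executes, and `trace_branches` is a hypothetical name:

```python
MASK32 = 0xFFFFFFFF


def trace_branches(executed, branch_pcs):
    """Idealized PMU branch tracing with the counter re-armed at -1.

    executed   -- PCs in the order they retire
    branch_pcs -- set of PCs that are branch instructions
    Returns the saved PCs the ISR records: the instruction each PMI
    interrupts, which is the branch target.
    """
    counter = -1 & MASK32                 # armed: the next branch overflows
    samples = []
    for i, pc in enumerate(executed):
        if pc not in branch_pcs:
            continue                      # non-branches leave the counter alone
        counter = (counter + 1) & MASK32
        if counter == 0:                  # overflow: PMI before the next instruction
            if i + 1 < len(executed):
                samples.append(executed[i + 1])
            counter = -1 & MASK32         # ISR resets the counter to -1
    return samples


if __name__ == "__main__":
    executed = [0x100, 0x104, 0x200, 0x204, 0x108]
    print([hex(pc) for pc in trace_branches(executed, {0x104, 0x204})])
    # ['0x200', '0x108']
```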
Those of you familiar with perf might ask: why not just use perf? I decided to implement all this from scratch for a few reasons. First, I wanted to write my own custom ISR, even if perf has some way of calling back into your code from its interrupt handler. Second, the long-term vision was more about basebands, so I didn't really want to be married to Linux. And finally, getting my hands dirty was important in order to learn how the PMU works on ARM. That said, the perf source is a really handy resource for the PMU, not just on ARM but on other architectures as well.
A quick aside: I've got two kids at home, and much of this research started out after hours. As you can imagine, it's hard to find time to do research with kids; there were always just small windows, like while they were watching cartoons. I feel like the graphics in this slide deck accurately reflect my state of mind while doing this research.
The first challenge we have is figuring out how to register for interrupts. The Generic Interrupt Controller (GIC) spec outlines both private peripheral interrupts (PPIs) and shared peripheral interrupts (SPIs), along with their interrupt IDs. In Linux, for example, we need to register for an IRQ, but we need to know the number the PMU is going to use. The spec recommends using interrupt ID 23, but not every chip has to use the Generic Interrupt Controller; there are custom ones.
The first place we can look is the device tree sources, the .dts and .dtsi files you might find in an Android kernel source tree. Those files outline the specs and configuration of the hardware, and sometimes you'll find the PMU in there. This one is from the Nexus 6, and we see this interrupts tuple. The first value is the type, which tells us whether it's a shared or a private peripheral interrupt; in this case it's private. The interrupt number is an offset, and from the last slide we know that private interrupts start at offset 16, so we can do the math and come up with 23, which matches the spec. The problem is that not every device uses the PMU even if it's there, so you're not always going to find something like this in the device tree source. For that I went with a brute-force approach. On an Amazon tablet with a MediaTek chipset, I basically read /proc/interrupts, found every unused shared and private interrupt, and registered for those. Then I configured the PMU to overflow a specific number of times and went back to /proc/interrupts to see which interrupt number had that many hits. That works, but it's not as straightforward as looking in some source code. As far as implementation goes, there are a couple of handy APIs in Linux, and if we think about embedded firmware, ultimately we need to patch the IRQ vector handler, probably at the end, before it returns to the mode it came from.
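Both lookups can be sketched in Python.

- The first helper applies GIC numbering (SGIs 0-15, PPIs 16-31, SPIs from 32) to a device-tree specifier whose first cell is 0 for SPI and 1 for PPI, as in the Linux GIC binding. Under that scheme, PPI 7 becomes ID 23.
- The second helper mimics the brute-force approach by diffing two /proc/interrupts snapshots, taken before and after forcing a known number of overflows. Its parsing assumes the usual layout: an IRQ label, per-CPU counts, then controller and name columns. On many kernels the first column is Linux's IRQ number rather than the GIC ID.

Both helper names are illustrative:

```python
GIC_SPI = 0  # first cell of a GIC "interrupts" specifier (Linux GIC binding)
GIC_PPI = 1


def gic_irq_id(kind: int, number: int) -> int:
    """Convert a device-tree GIC interrupt specifier to a GIC interrupt ID.

    GIC numbering: IDs 0-15 are SGIs, 16-31 are per-core PPIs, 32+ are SPIs.
    """
    if kind == GIC_PPI:
        if not 0 <= number <= 15:
            raise ValueError("PPI numbers run from 0 to 15")
        return 16 + number
    if kind == GIC_SPI:
        return 32 + number
    raise ValueError(f"unknown interrupt type {kind}")


def parse_proc_interrupts(text: str) -> dict:
    """Map each IRQ label in /proc/interrupts to its total count over all CPUs."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue                      # header row: "CPU0 CPU1 ..."
        total = 0
        for tok in fields[1:]:
            if not tok.isdigit():
                break                     # reached the controller/name columns
            total += int(tok)
        totals[fields[0][:-1]] = total
    return totals


def irqs_with_delta(before: dict, after: dict, expected: int) -> list:
    """IRQ labels whose count grew by exactly `expected` between snapshots."""
    return sorted(irq for irq, n in after.items()
                  if n - before.get(irq, 0) == expected)


if __name__ == "__main__":
    print(gic_irq_id(GIC_PPI, 7))         # 23
    before = "      CPU0  CPU1\n 23:   0   0  GIC  probe\n 45:  10   2  GIC  foo\n"
    after = "      CPU0  CPU1\n 23:  60  40  GIC  probe\n 45:  15   5  GIC  foo\n"
    print(irqs_with_delta(parse_proc_interrupts(before),
                          parse_proc_interrupts(after), 100))  # ['23']
```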
The second big problem is something called interrupt shadow. Let's say our counter overflows on this branch, but the interrupt doesn't occur right away. Another branch occurs and increments our counter to positive 1, and eventually we get an interrupt. These highlighted instructions represent the interrupt shadow, and we typically see a skid of about four instructions before the PMU fires the interrupt. Because of this, we can potentially lose up to 15 percent of trace data, as a rough metric. I'll show in a minute how we can kind of overcome that.
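Here is a hedged Python model of the interrupt shadow. The PMI is delivered `skid` instructions after the overflowing branch. Any branch that retires in that window only pushes the counter past zero, so it never produces a sample. The function name and the fixed-skid assumption are simplifications for illustration; real skid varies from interrupt to interrupt:

```python
MASK32 = 0xFFFFFFFF


def trace_with_skid(executed, branch_pcs, skid):
    """Branch tracing where each PMI arrives `skid` instructions late.

    Returns (samples, lost): the PCs the ISR records, and the number of
    branches that retired inside the interrupt shadow and were never sampled.
    """
    counter = MASK32              # -1: the next branch overflows
    pending_at = None             # index of the instruction the PMI will interrupt
    samples, lost = [], 0
    for i, pc in enumerate(executed):
        if pending_at == i:       # PMI taken just before this instruction runs
            samples.append(pc)    # saved PC seen by the ISR
            counter = MASK32      # ISR re-arms the counter to -1
            pending_at = None
        if pc in branch_pcs:
            counter = (counter + 1) & MASK32
            if pending_at is not None:
                lost += 1         # counter goes 0 -> 1 -> ...; no extra PMI
            elif counter == 0:
                pending_at = i + 1 + skid
    return samples, lost


if __name__ == "__main__":
    executed = [0x100, 0x104, 0x200, 0x204, 0x108]
    branches = {0x104, 0x204}
    print(trace_with_skid(executed, branches, skid=0))  # ([512, 264], 0)
    print(trace_with_skid(executed, branches, skid=2))  # ([264], 1)
```

With no skid both branch targets are sampled. With a two-instruction skid, the second branch falls inside the shadow and is lost, and the only sample lands later than the first branch target.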
So I put together a prototype for Android with three components: a kernel module, a userspace daemon, and an IDA plugin. The kernel module configures the PMU on each core and registers for interrupts on each core. As branches occur, the interrupt service routine captures the saved program counter at the time of the interrupt and buffers it, so it isn't sending small amounts of data at a time. I use the Linux relay file system to pipe that to userspace, and the daemon serves it to IDA. We can use IDA for visualizing our coverage as well as for controlling the daemon: which threads we want to monitor, starting and stopping, and some of the other features I'll show here in a minute.
This picture represents an ideal case where we're counting branches. We might get a saved program counter from the branch to sys_read, from the branch from fget_light, from the branch returning from vfs_read, and from the branch returning from fput. So we just need to connect the dots. We can use IDA to our advantage because it has done all the hard work of recovering the control flow of each function. If we have a tracepoint in a basic block, we know that block was executed, so we can count it and color it. And if a block has only one cross-reference, coming from another block, we can color that block as well, since we know execution came that way. That helps with the interrupt shadow problem I mentioned. Let's say the return from vfs_read continued executing and we didn't trigger the interrupt until this mov instruction. Now we've got a block that doesn't have a tracepoint in it, but the algorithm I just described helps cover that. Also, if you're using this for fuzzing code coverage, you don't really care, because you're going to hit these code paths so many times that eventually an interrupt will occur inside that basic block. We could probably make this more precise by using a second counter to count the number of instructions between interrupts and then properly filling in the gaps, but this seemed effective enough.
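The coverage rule described above can be sketched as a fixpoint over a predecessor map. This is not the plugin's code. It assumes a hypothetical `preds` dictionary mapping each block to its predecessor blocks, the kind of information IDA's flow graph provides. It applies only the single-cross-reference rule: if a covered block has exactly one predecessor, that predecessor must have executed too:

```python
def propagate_coverage(preds: dict, hit: set) -> set:
    """Extend the set of blocks with tracepoints to blocks that must have run.

    preds -- block -> list of predecessor blocks (its incoming cross-references)
    hit   -- blocks that contain at least one sampled PC
    """
    covered = set(hit)
    worklist = list(hit)
    while worklist:
        block = worklist.pop()
        incoming = preds.get(block, [])
        # Exactly one way into this block: that predecessor executed as well.
        if len(incoming) == 1 and incoming[0] not in covered:
            covered.add(incoming[0])
            worklist.append(incoming[0])
    return covered


if __name__ == "__main__":
    # Diamond: A -> B, A -> C, B -> D, C -> D (block names are made up).
    preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
    print(sorted(propagate_coverage(preds, {"B"})))  # ['A', 'B']
    print(sorted(propagate_coverage(preds, {"D"})))  # ['D'] (two ways in)
```

A block reached from two places gains nothing from this rule. That is where the second-counter refinement mentioned above, or simply repeated execution, would have to fill the gap.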
First demo here. These are recorded demos, just because there are too many moving pieces. The requirements: the device has to be rooted, obviously, and the kernel might need to be recompiled to support loadable kernel modules. I used preempt notifiers to turn the PMU on and off based on the thread I'm interested in tracing, so the kernel needs CONFIG_PREEMPT_NOTIFIERS; I haven't seen many kernels that don't have that enabled by default. And finally, one of the big challenges of tracing kernel mode is this. If we were to reset our counter to minus 1 within our interrupt service routine, then as soon as we returned out of it, that return would cause an overflow, and we'd find ourselves in an interrupt loop. The way I got around it was adding about four instructions to the end of the IRQ vector handler, before it returns to supervisor mode, to reset the counter there to minus 2. That way, the return from the IRQ vector handler increments it to minus 1, and the next branch we encounter is the one we want to trap. That could potentially require recompiling the kernel, but I think you could do the same by hot-patching the kernel from the kernel module.
So, I have two demos of this tool: the first is a trace of kernel mode on a Nexus 6, and the second will do something in user mode. We load the plugin, and the device is already attached. Here we can choose whether we want to trace user mode or kernel mode, and we can see all the running processes or kernel threads to attach to. We attach to the WL event handler, which is a Wi-Fi kernel thread, get the ISR handler loaded, and start the trace. Then we turn on Wi-Fi, we start to see blocks getting colored in, and the window in the upper left shows us all the functions that have been hit.
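The minus-2 trick is easy to check with a little arithmetic. This Python sketch assumes 32-bit counters and that the handler's exit path contains a known number of counted branches after the reset (here just the exception return). `branches_before_next_pmi` is a hypothetical helper:

```python
MASK32 = 0xFFFFFFFF


def branches_before_next_pmi(reset_value: int, exit_branches: int = 1) -> int:
    """How many branches of the interrupted code retire before the next PMI.

    reset_value   -- value the IRQ handler writes to the branch counter
    exit_branches -- counted branches the handler executes after that write
                     (modeled here as just the exception return)
    Returns 0 when the handler's own exit path overflows the counter,
    i.e. the PMI would fire again immediately: an interrupt loop.
    """
    counter = reset_value & MASK32
    distance = (-counter & MASK32) or (1 << 32)  # increments until wrap to 0
    if distance <= exit_branches:
        return 0
    return distance - exit_branches


if __name__ == "__main__":
    print(branches_before_next_pmi(-1))  # 0: the return itself overflows (loop)
    print(branches_before_next_pmi(-2))  # 1: the next real branch is trapped
```

Resetting to -1 returns 0, because the exception return itself overflows the counter and the PMI loops. Resetting to -2 lets exactly one branch of the interrupted code through before the next PMI, which matches the fix described above.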
that have been hit and the number of unique tracepoints we've gotten from those functions and we can a click around and notice likable code-coverage it's this good telling us what percentage of at the basic block level instructional from that instructions this is the 2nd demo now a during the user mode so the positive set but it's were doing pretty well and time so this time I have 80 library loaded from that's dealing with uh Q my which is some of the communications coming back from the mode of soaring attached to a real D and the device and in the so this Grogan started trace and From with with the attached found to make a phone call and there's so much data coming back right now that the trace functions window can't keep up but I'm calling cooled Columbus time and weather and and the function I think that's loaded here's something that yeah to my voice or call status indicators so we can see you know which which instructions have been so composite traces of this this window can catch up and so the user is all the functions that have been hit if we scroll down we see a lot of Q my voice related things the but who where where I can see this being powerful as an with embedded systems is is doing like differential debugging so hearing a do sort of a differential debugs will tell the plug-in remember everything we've seen but in clear out uh that in the trace function history and now I will resume the trace and send ourselves a text message career on like isolate a semester handling so some myself a text uh and we see it text comes up says trace this please if we look at our trace functions those that that have symbols we can see you most of them are human SMS and so maybe we see this as a as command callback function and registered in this 1 interested in looking at it further so I just I can assure you can do more than just a capturing program counter we can go and say hey capture registers every time a tracepoint happens in this function so Nelson myself another 
text message, and we should see every new tracepoint get highlighted, where we can click on it and get a register snapshot and the stack at the point in time when the interrupt occurred.
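The counter-reset trick from the start of this demo section can be modeled with a toy 32-bit event counter. This is a Python sketch, not kernel code. `ToyCounter`, `preload` and `trace` are hypothetical names. The model assumes an event counter that raises its overflow interrupt when it wraps from 0xFFFFFFFF to 0, and it assumes the handler's own exception return is counted as one branch. It shows why reloading -1 in the handler loops, while reloading -2 at the very end of the IRQ vector handler traps each branch exactly once.

```python
MASK32 = 0xFFFFFFFF


def preload(n):
    """Counter value that overflows after n more events, i.e. -n as a 32-bit value."""
    return (-n) & MASK32


class ToyCounter:
    """Toy model of one 32-bit PMU event counter (illustrative only)."""

    def __init__(self, value=0):
        self.value = value & MASK32

    def count(self):
        """Count one event; return True if the counter wrapped to 0 (overflow IRQ)."""
        self.value = (self.value + 1) & MASK32
        return self.value == 0


def trace(branch_targets, reload_value):
    """Return (trapped_targets, looped).

    reload_value is written at the very end of the IRQ vector handler, just
    before the handler's exception return, which is itself counted as a branch.
    Branches inside the handler before the reload do not matter here because
    the reload overwrites whatever they counted.
    """
    ctr = ToyCounter(preload(1))  # trap on the very next branch
    trapped = []
    for target in branch_targets:
        if ctr.count():
            trapped.append(target)
            ctr.value = reload_value & MASK32
            if ctr.count():  # the handler's own return branch overflowed
                # On hardware this re-enters the handler forever; stop the model.
                return trapped, True
    return trapped, False


if __name__ == "__main__":
    print(trace(["a", "b", "c"], preload(2)))  # prints (['a', 'b', 'c'], False)
    print(trace(["a", "b", "c"], preload(1)))  # prints (['a'], True)
```

Reloading -2 leaves exactly one count of headroom for the return branch. That is the "four instructions at the end of the IRQ vector handler" idea expressed as arithmetic.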
Everywhere we see these red tracepoints, we can click on them to jump to the trace, which obviously could be useful for dynamic analysis. So you
might ask yourself: so what, there are already dynamic instrumentation and analysis tools. The thing about it is more the approach, the fact that we're using the PMU to do this, and in theory we can apply this to other chips. In fact, I got pretty far with the Broadcom Wi-Fi chip, where I had patched the interrupt service routine and could get it to count branches; where I kind of got held up was identifying the interrupt number. But this is a less invasive approach than software breakpoints or instruction tracing, and we can easily support user or kernel mode. And again, we're not limited to branch tracing: you can put whatever instrumentation logic you want in the interrupt service routine. And finally, the basebands, Wi-Fi and cellular, also have these PMUs: Intel, MediaTek, and potentially other ARM-based cellular basebands. Apple's chips found in iPhones and iPads also have the PMU. And we're not limited to ARM here either: PowerPC, MIPS, and other architectures have similar performance monitoring units. So this is kind of where the
research took a 180, because originally I was thinking along the lines of extending this to some sort of baseband, but I got majorly distracted. So, a quick primer on ARM-based rootkits. I think the most traditional kernel rootkit would be to patch the syscall table or patch the exception vector table. We've also seen hot-patching of kernel functions, TrustZone-based rootkits, and moving the exception vector table by toggling a bit in a control register. So there's been some interesting research on ARM rootkits. So the
inspiration for me came while reading the manual. ARM has a set of maybe 30 or so events that are considered architectural, and a subset of those are mandatory. But because they leave flexibility for vendors to extend the PMU, they have this table that lists the recommended event encodings and what they are. If you notice here, there are various suggested events in a space that looks like the exception vector table. Couple that with the fact that we were just aggressively trapping branch instructions by configuring a counter to -1, and in theory we can trap any of these exceptions. Now, I will mention that ARM does have a count of all exceptions taken, and you could try to use that. The problem is that since the PMU interrupt is delivered via IRQ, and IRQs are included in all exceptions, you'll run into a big mess: an IRQ occurs, which triggers a PMU IRQ, which triggers another, and you find yourself in a messy interrupt loop. This, however, is perfect, because we can pick, for example, supervisor calls and focus on those.
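The argument for filtering on a single exception type can be checked with a small queue model. This is a hypothetical sketch. The exception names are symbolic labels, not the PMU's actual event numbers. The model assumes the ISR reloads the counter to -1 each time, so every counted exception overflows it. It ignores IRQ masking and priority, which delay the PMU interrupt but do not break the cycle.

```python
from collections import deque

ALL_EXCEPTIONS = {"UNDEF", "SVC", "PABORT", "DABORT", "IRQ", "FIQ"}


def simulate(counted, incoming, max_steps=1000):
    """Model a PMU counter kept at -1 that counts the exception types in `counted`.

    Every counted exception overflows the counter, and the overflow interrupt
    is itself delivered as an IRQ exception, so it is queued like any other.
    Returns (pmu_interrupts, looped); looped is True if the queue never drains.
    """
    queue = deque(incoming)
    pmu_interrupts = 0
    for _ in range(max_steps):
        if not queue:
            return pmu_interrupts, False
        exc = queue.popleft()
        if exc in counted:
            pmu_interrupts += 1
            queue.append("IRQ")  # the PMU overflow interrupt arrives as an IRQ
    return pmu_interrupts, bool(queue)


if __name__ == "__main__":
    print(simulate({"SVC"}, ["SVC", "IRQ", "SVC", "DABORT"]))  # prints (2, False)
    print(simulate(ALL_EXCEPTIONS, ["SVC"])[1])                # prints True
```

Counting only SVC means the PMU's own IRQ is never counted, so the queue drains. Counting every exception feeds each PMU interrupt back into the counter.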
This is kind of where I was like, what do we do next? We could go offense, we could go defense, we could look closer at TrustZone. So rather than doing one thing, it kind of ended up doing a bunch of things, which is why this talk could be separated into individual
talks. But a quick note on ARM licenses. There's basically a core license, which is like "I use ARM's core design", and then there's an architectural license, which allows you to build a custom design as long as you implement the ARM instruction set. Qualcomm, Apple, and lots of other vendors have architectural licenses. I mention that
because I wanted to show which cores are capable of counting the exception vector table events: the Cortex-A7, A53, A57, and A72, and those are
all ARM's designs. I wanted to show that not only is it in the manual as a suggested event, it's included by
default on these, even on the newest 64-bit cores like the A72. On the right, since I did most of my research on this Nexus 6 device, which is the Krait architecture, I wanted to show that the custom designs from Qualcomm also count these. But I just want to make the distinction that this is not unique to the custom designs: it stems from the ARM manual and is included by default in some of ARM's cores. So if we look at the
manual: as I said earlier, I ran through the c12 to c14 registers (under CP15 c9), and those are the ARM performance monitoring extensions. But the ARM manual states that if you want to extend the PMU, CP15 c15 is reserved for what they call implementation-defined monitors. It makes sense to want to extend it, because vendors might add custom features in custom designs. Qualcomm uses this, and I looked at an older chip, I think an Apple A6, and it also uses this for kernel profiling, and there are probably others that extend this as well.
So, a quick word about Qualcomm's architectures. There have basically been three iterations of their custom cores: Scorpion, Krait, and Kryo. At any point in time you'll generally find them in the high-end smartphones; right now you can find Kryo in the HTC 10 and other devices. I focused on the Krait architecture, but the same definitely applies to Scorpion. I haven't had time to play with Kryo, but just from looking at source it seems to follow the same methodology. What they've done is extend the PMU: they've added four event select registers, one for the vector floating-point unit and three for various other parts of the CPU. Krait events are encoded based on a combination of a code, a group, and a region of the CPU. In order to use these extra event select registers, after programming them you basically have to set up what I like to think of as a link, so the ARM event select register can point
to the appropriate Krait region and group. In this table at the bottom we see the three different regions and how we can read and write to them. You can find some documentation and event codes in some really old Scorpion source, but to come up with these event code numbers I had to do a little bit of black-box analysis. I walked every possible event code until I saw a wrap-around, a duplication of the events, which signifies you've hit bits the hardware ignores. There are a lot of events and only some of them are documented; I'd love to see the full list, because I think there's some potentially powerful stuff you can do. Finally, we see the bottom line there, which is what I call the link: we're setting the ARM event select register to some base value ORed with the group, which is kind of like telling the PMU to look over here for the event code. So, for example, if we wanted to count prefetch aborts on the Krait architecture, that particular event happens to be in group 3, region 0. In order to configure the PMU to count prefetch aborts, we shift the event code over by 3 bytes and write that to the Krait region 0 event select register. Then we go back to the ARM event select register and point it at Krait region 0, group 3, with the value 0xCF. You can actually look at the source: this is encoded in both the Scorpion and Krait PMU source files, and that code basically lets you extend the PMU and count custom events. So the idea behind the PMU-assisted rootkit is: hey, let's count supervisor calls, since most OSes use that vector for handling system calls. Let's try to trap all the supervisor call instructions, and our routine becomes the rootkit. We get code execution at some point after the supervisor call instruction, and then we can redirect code execution by modifying the saved registers however we want. One of the advantages is that this avoids the current state of the art in patch protection and kernel integrity measurement. In the mobile space, I'm only aware of two: Samsung's TIMA, a kernel integrity monitor that runs in TrustZone, and Apple's KPP, Kernel Patch Protection. I would guess that neither of those would care about this, because you're just registering an interrupt service routine and not touching the kernel image. So installing the rootkit means configuring the PMU, which is a few MRC and MCR instructions, followed by registering for interrupts.
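The Krait "link" described above comes down to a few lines of arithmetic. First, write the event code into the region's resource register, shifted by group × 8 bits. Then point the ARM event select register at that region and group. The sketch below only computes values and touches no hardware. Its constants come from my reading of the upstream Linux Krait PMU support in `arch/arm/kernel/perf_event_v7.c`: event-select bases 0xCC, 0xD0 and 0xD4 for regions 0 to 2, and bit 31 as the resource register's enable bit. Treat those constants as assumptions to verify against your kernel source. `krait_event_config` is a hypothetical name. The talk does not give the prefetch-abort event code, so the code value below is only a placeholder.

```python
# Assumed constants (from upstream Linux Krait PMU code); verify for your kernel.
KRAIT_PMRESR_GROUP0 = {0: 0xCC, 1: 0xD0, 2: 0xD4}  # ARM event-select base per region
PMRESR_EN = 1 << 31                                # enable bit in each PMRESRn


def krait_event_config(region, group, code, pmresr_old=0):
    """Return (new PMRESRn value, ARM event-select value) for a Krait event.

    Pure arithmetic: nothing here reads or writes hardware registers.
    """
    if region not in KRAIT_PMRESR_GROUP0:
        raise ValueError("region must be 0, 1 or 2")
    if group not in (0, 1, 2, 3):
        raise ValueError("group must be 0-3")
    # Each group owns one byte of PMRESRn; bit 31 is the enable bit, so the
    # group-3 field only has 7 usable bits.
    limit = 0x80 if group == 3 else 0x100
    if not 0 <= code < limit:
        raise ValueError("event code does not fit in this group's field")

    shift = group * 8  # group 3 -> shift by 3 bytes
    mask = 0xFF << shift
    pmresr = (pmresr_old & ~mask & 0xFFFFFFFF) | (code << shift) | PMRESR_EN

    # The "link": the ARM event selector points at region base + group. The
    # bases have their low two bits clear, so ORing the group equals adding it.
    evtsel = KRAIT_PMRESR_GROUP0[region] | group
    return pmresr, evtsel


if __name__ == "__main__":
    placeholder_code = 0x12  # NOT the real prefetch-abort code
    pmresr, evtsel = krait_event_config(0, 3, placeholder_code)
    print(hex(pmresr), hex(evtsel))  # prints 0x92000000 0xcf
```

With region 0 and group 3, the event-select value comes out as the 0xCF mentioned in the talk, whatever event code is used.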
There are some challenges to the implementation. First off, as soon as the SVC instruction goes to the exception vector handler, interrupts are immediately disabled, and they're re-enabled later on. Most interrupt controllers are configured to deliver the PMU interrupt at normal priority. So even if the PMU was ready to fire an interrupt, we wouldn't get code execution until after the CPSIE instruction in this particular handler's source. We have three cases to deal with because of the skid I mentioned before. Even though interrupts are enabled right here, the PMU interrupt may be handled somewhere in the vector_swi handler before it branches to the resolved system call routine. We could be interrupted right at the entry point of the system call routine, or we could be interrupted after that. So our malicious ISR has to deal with all three of these.
So, the first case. I should mention that I put stats on the frequency I saw for each of these. By far the most common case is the green section, but we do have to account for the others if we want to make sure we can always redirect.
Basically, we retrieve the saved supervisor-mode and user-mode registers in our ISR, and then we use register 7 to filter; maybe we're looking for sys_read, for example. Once we've identified that we are in the green case here, we check how far past that CPSIE instruction we've been interrupted, compute an offset, and then emulate the remaining instructions. We can ignore every instruction that deals with resolving the syscall routine's address from the syscall table lookup, because we're going to take care of that. But we apply whatever the rest of the code is doing to the saved registers, and at the end we set the link register and set the program counter to our hook. So that's sort of an easier case. This case is
even easier: this is where the interrupt occurred and we happen to be right at the entry point of the syscall routine we want to hook. In this case we can just check that the saved program counter matches the address of the legitimate sys_read function and swap the program counter with the address of our hook. So the final case, which
is the most challenging, is where we're at some point in the middle of sys_read, past the entry point. I dealt with this by allowing the syscall routine to complete and hooking it after it was done. Here we see ret_fast_syscall being pushed onto the stack. So I grab the saved stack pointer, walk backwards until I find ret_fast_syscall, and replace that with a trampoline address. Our trampoline fixes up the link register and branches to what I call a post-hook function. The post-hook function can then query the saved user-mode registers to get the original parameters passed into the syscall routine, and then copy buffers back from the kernel and modify them as necessary. The limitation of this approach is that it assumes we only care about data coming from the kernel to user mode. If, for example, we wanted to hook write or send or something, we would have to come up with another approach, because we can't just let sys_write complete; the damage would already be done at that point. But for a proof of concept, that was sufficient. So I have to give the
demo slide for the rootkit. The first
is using a PMU hook on getdents64, which is a popular choice for process and file hiding. To set up this demo, I'm loading a sort of fake kernel integrity monitor that just periodically scans the syscall table and other regions in the kernel and computes a hash. I've also started a malware thread running in the kernel that just prints out a nasty message. If we look at the file system, we can see this malware secrets file, and if we do a process listing, we should see our malware worker thread. This again is on the same Nexus 6 that I was working with earlier. To show it's legit, I'm using Reflector 2, which I highly recommend for mirroring and casting devices. Now we load the PMU rootkit; it says PMU-assisted rootkit, and now it's cloaking stuff. If we look at the file system, our malware secrets file is gone, and if we do a process listing, we shouldn't be able to find our malware worker thread. The same goes on the device: we should be able to see that the
malware worker is missing. Then I just unloaded that rootkit. If you notice, the kernel integrity monitor was still happy. Now we can see the worker thread again because the PMU-assisted rootkit was removed, and the file should be back. Finally, I'll do a traditional rootkit, where I'm just going to patch sys_getdents in the syscall table. To prove it was a legit patch, we shouldn't see our malware secrets file, but now notice that our super-duper secure kernel integrity monitor says things are corrupt. So the intent of that demo was to show a PMU-assisted rootkit that is actually able to redirect and modify data returning back to user mode, all while using just PMU-based traps. But in my opinion, Linux
rootkits are boring. So to make it a bit more interesting, I decided to hook read, but in the context of qmuxd, which is a userspace daemon on Android. qmuxd is responsible for routing QMI packets to and from the modem and to their respective clients in the high-level OS. You can think of QMI as something like a custom AT command set. We're going to do essentially the same thing we did with getdents, but only using the PMU. So that's our
next demo here. In this demo I have two phones, and as I mentioned, I'm using Reflector 2 to mirror the devices. The iPhone on the left is a clean device, and the device on the right is going to be the rooted one. So I send a text from the iPhone to the Android, and the Nexus receives it; it says "hi, how are you". Now we install the rootkit, and if we grep the kernel log we should start to see QMI packets being printed out. I send a text from the iPhone, and we notice that on the Nexus screen it's displaying "hello" with "sent from my PMU rootkit phone" appended. I decided that preventing SMS from showing up is not a very attractive demo, so I decided to basically append 7-bit
ASCII text to the incoming SMS, which was a bit of a pain. So now the iPhone sends a text that says "O-H", and the rootkit appends "I-O", so it's clearly a Buckeye fan, if anybody knows what that even means. Finally, just to show that we're meddling with the incoming QMI stream, I'll check my balance on T-Mobile, which sends the request. In the QMI dump we should see the response, the incoming USSD message from the network, and it tells us our plan is active until whenever. That demo was basically showing that we were able to man-in-the-middle the incoming stream of data, which could potentially open interesting channels for doing malicious things. So one more goodbye text, and again the rootkit appended the same "sent from my PMU rootkit phone" message. So, some analysis and
limitations. Trapping supervisor call instructions seems to add anywhere from 2 to 5% overhead, depending on the load on the device. As I mentioned, it should evade current kernel integrity monitoring algorithms. I didn't validate this, but I'd be surprised if these monitors care, since all you've done is register an interrupt service routine. A couple of limitations. First, the PMU registers themselves aren't persistent, so this rootkit would not be persistent either: every time a core comes out of reset, the PMU registers are all loaded with their respective reset values. Second, any other code running in the kernel or higher could tamper with the registers and stop our events from counting, so that could become problematic. So how would we detect this rootkit? You could try just reading /proc/interrupts, but the data shown there is easy to modify, and potentially we could prevent the PMU's IRQ line from even showing up. You could try to query the registers and see if somebody's counting supervisor call instructions. It's worth noting that access to the PMU can be trapped to hypervisor mode, so if an attacker had control over the hypervisor exception vector table, that could get interesting. And as we'll see in a minute, not every usage of this is malicious. More generically, identifying an increase in interrupts is probably your best bet, and there are tracepoints in the Linux kernel that would be hard to avoid. Linux also has a radix tree holding all the interrupt handlers, keyed by IRQ number. I suppose you could iterate over it and validate that the handler addresses fall within the static kernel image or something. But if you're familiar with Shadow Walker, you might be able to do interesting things by trapping data and prefetch aborts, so that any time somebody tries to read memory where our code lives, we
could potentially trap that and serve up bogus data, like Shadow Walker did. For a final case study, on defense, I want to show you that this is probably more powerful on the defensive side. Same approach, trapping the supervisor call instruction, but we use it for syscall monitoring. If you're familiar with Microsoft EMET or anti-ROP tools like it, they're all pretty similar: on Windows, they inject a DLL into the process and hook functions like VirtualProtect. That's kind of interesting, because they're trying to prevent code-reuse attacks, but in doing so they increase the surface for code reuse. By doing the monitoring from the kernel, we can avoid that. This is also much easier to implement than the rootkit, because we don't have to worry about redirecting code execution; we just need to apply some sort of integrity policy to make sure things are legit on this syscall. That could allow us to protect COTS binaries without having to lean on vendors to compile in the protections. And we don't have to patch the kernel; we just need a kernel module loaded, or some other way to register an interrupt service routine. I decided to select Stagefright, which has been very popular since last Black Hat. This is data from the Nexus security bulletins, and as we can see, there's been a ton of remote code execution and privilege escalation bugs labeled either Stagefright or mediaserver; I'm just collectively calling them media. So I
decided to try to create a little proof of concept that basically just does syscall monitoring and detects some of the same anti-ROP conditions that EMET does. I used proof-of-concept code from Mark Brand of Google, who has a great blog on this particular CVE. It's the same CVE that NorthBit has the Metaphor proof of concept for, and I used both of those in combination. So first I'm going to show
exploiting the device, and then we'll load our mitigations and try again. We're starting a netcat listener and browsing to a site that plays a particular media file, and we should see we get a connection back. Now we have a shell in the context of mediaserver. We can run getprop to show you the device: it's a hammerhead device running 5.1, so it's a Nexus 5. Exit that shell, close the browser, and now I load the mitigations, which again just trap system calls in order to apply these anti-ROP checks. Let's see what happens. It's a similar kernel module, and we can see some verbose output I put in here where I'm configuring the counters to count system call instructions. Now we browse to the same site, and we should see some ASCII art appear saying "exploit blocked". If we look at the bottom now, we can see that it detected a stack pivot.
In this Binder thread, which is part of mediaserver, it was an mprotect call, and it notes that the current stack pointer is outside the range it was expecting, so it terminated that thread. So this is kind of a neat extension, demonstrating some defensive use cases that you could do on ARM-based cores with pretty light overhead, under 5 percent.
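The check described here can be written as a pure policy function over the saved registers. On an mprotect system call, it confirms that the saved user stack pointer still lies inside the calling thread's stack. This is a hypothetical Python model, not the kernel module from the talk. It assumes the 32-bit ARM EABI convention, with the syscall number in r7 and arguments in r0 to r2, and that `__NR_mprotect` is 125 on 32-bit ARM. It also assumes the thread's stack bounds are already known; the talk doesn't say how the real module obtained them. `SavedUserRegs`, `StackRange` and `check_syscall` are made-up names.

```python
from dataclasses import dataclass

# Assumption: 32-bit ARM EABI, syscall number in r7, mprotect is number 125.
NR_MPROTECT = 125


@dataclass
class SavedUserRegs:
    """User-mode registers saved when the SVC was trapped (hypothetical layout)."""
    r0: int  # mprotect: start address
    r1: int  # mprotect: length
    r2: int  # mprotect: protection flags
    r7: int  # system call number
    sp: int  # user-mode stack pointer


@dataclass
class StackRange:
    """Bounds of the calling thread's stack: low inclusive, high exclusive."""
    low: int
    high: int


def check_syscall(regs: SavedUserRegs, stack: StackRange) -> str:
    """Anti-ROP style policy: flag mprotect calls made from a pivoted stack."""
    if regs.r7 != NR_MPROTECT:
        return "allow"  # this sketch only polices mprotect
    if stack.low <= regs.sp < stack.high:
        return "allow"
    return "kill"  # SP outside the thread's stack: likely a stack pivot


if __name__ == "__main__":
    stack = StackRange(0xBE800000, 0xBE900000)  # made-up bounds
    print(check_syscall(SavedUserRegs(0x1000, 0x1000, 7, 125, 0xBE8FF000), stack))  # prints allow
    print(check_syscall(SavedUserRegs(0x1000, 0x1000, 7, 125, 0x41414000), stack))  # prints kill
```

A real policy would probably narrow this further, for example to calls that add execute permission. The sketch keeps only the stack-range test the demo output describes.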
So, potential future work. I got sidetracked from basebands originally, but I'd like to run that first demo I showed you, the tracing, on some basebands. Also, doing something similar on the iOS kernel would probably be more valuable than Android. I know I'm running low on time,
but this slide is important to me, so I need to recognize some folks at Endgame. Cody Pierce has been extremely supportive and encouraged me to publish. Eric Malaise, a mobile researcher at Endgame, helped me get some devices and provided feedback on some ideas. Octavia Butler helped educate me on some rootkit techniques, and several others at Endgame wished to remain anonymous. I also want to acknowledge the researchers who pioneered PMU-assisted security research, because I think it's a pretty interesting subject. So we don't have
enough time for questions, but I'll be around, so feel free to ask me anything.