Hardware-Assisted Rootkits and Instrumentation: ARM Edition
Formal Metadata
Part Number | 1
Number of Parts | 20
License | CC Attribution 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/32735 (DOI)
REcon 2016, part 1 / 20
Transcript: English (auto-generated)
00:24
Okay, we're going to get started with the first talk. I'd like to quickly introduce Matt Spisak, who's going to be talking about hardware-assisted rootkits and debugging and reversing on ARM. Cool. All right, thanks.
00:44
So this is the first time I've been able to present research publicly, so I'm excited to be here, but also a little bit nervous standing up in front of all of you, so bear with me. I work at Endgame, on a team focused on vulnerability research and prevention.
01:06
We do some vulnerability discovery, but more recently we've been focused on exploit mitigation research. I've been doing mobile and embedded research since the Nokia N-Series was the most popular
01:23
smartphone, before the original iPhone. So, our outline today: I had to scratch TrustZone. This is like three talks in one, so I'm going to have to talk fast here, but if there's interest I can either blog about the TrustZone part or, if you have questions, I'll be here
01:45
throughout the conference. To start, I'm going to set the stage for the original motivation. Part of the motivation is that I get frustrated with the tools and approaches available for
02:01
debugging embedded systems. In particular, I like working with basebands, and there's really no good universal solution. With hardware, JTAG is ideal; the problem is that getting widespread support on multiple devices is
02:21
not realistic on COTS equipment. Software, on the other hand, is portable and scalable, but most of the tools you find for Android and iOS are limited to user mode. Emulation is really cool, but the problem is that I'm always interested in the low-level interface,
02:41
maybe the interaction with the modem or other drivers, and trying to spend time emulating that is just too costly. So my personal philosophy has always been to make use of real hardware. Everything I want is in this phone right here, so why not make use of it? And I like software-based tools just because they're more portable.
03:06
So we're going to step through the ARM debug architecture, kind of in search of something else. The debug architecture is basically split into invasive debugging and non-invasive debugging. With invasive debugging you can do things like halt the CPU.
03:23
Really, the takeaway from this slide should be that there are a couple of authentication signals, DBGEN (debug enable) and SPIDEN for the secure world, and if neither of those is asserted high, then the only debug event supported is the software breakpoint instruction.
03:40
So our options are limited on COTS devices because of that. On the flip side, non-invasive debugging includes features like trace. You might be familiar with the Embedded Trace Macrocell or program trace. That's not the subject of this talk, although it would make for an interesting subject, and
04:01
it's interesting because trace can be controlled from software. But I think it might be hit and miss which devices and chipsets support that. It is worth mentioning that trace is something to explore further. Another non-invasive feature that we don't really
04:24
care about right now is sample-based profiling: there are registers you can query periodically to get the state of the program counter. But the focus of this talk is on the PMU, and I know that PMU might be a bit of an overloaded term in
04:40
embedded systems. We're not talking about power management; we're talking about the performance monitoring unit, or you may have heard of hardware performance counters. That's the subject of today's discussion. The PMU is an optional extension in ARM, but ARM recommends it, and most importantly it has a mandatory software interface via the system control coprocessor, CP15.
05:06
It was introduced in ARMv6, so you can find it in ARM11 cores, Cortex-R, and Cortex-A; the only family you won't find it in is Cortex-M. You'll find it in custom cores as well. The premise behind the PMU is that you've got
05:23
one or more counters and a set of events that can be counted, and the PMU can fire an interrupt when a counter overflows. That gives you the ability to do sampling: if I wanted to instrument every time an event occurred ten times, I could initialize a counter to, say, negative ten.
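To make the "negative ten" idea concrete, here is a minimal sketch of how a preload value relates to a sampling period. It assumes a 32-bit event counter that raises its overflow interrupt when it wraps from 0xFFFFFFFF to 0; the function names are hypothetical and are not taken from the talk.

```python
COUNTER_BITS = 32                      # assumption: 32-bit event counters
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def preload_for(period: int) -> int:
    """Value to write into a counter so that it overflows after `period` events."""
    if not 0 < period <= COUNTER_MASK:
        raise ValueError("period must fit in the counter")
    # Writing "-period" means writing its two's-complement bit pattern.
    return (-period) & COUNTER_MASK


def events_until_overflow(value: int) -> int:
    """How many more events a counter holding `value` can absorb before wrapping."""
    return (COUNTER_MASK + 1) - (value & COUNTER_MASK)


print(hex(preload_for(10)))            # 0xfffffff6
```

With this convention, "negative ten" is simply the bit pattern 0xFFFFFFF6, and a period of one (trap every event) is 0xFFFFFFFF.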
05:44
What's it there for? It provides feedback: it allows you to inspect the CPU and see how it's performing, which could be useful for software or hardware purposes. There are tools out there, commercial and open source. ARM Development Studio 5's Streamline is really cool. This is a screenshot from it. You should check it out; there's a trial version.
06:06
And Linux perf and OProfile are obviously in the Linux kernel, and they support more than ARM: x86 and other architectures as well. Before stepping forward, a few quick abbreviations: you'll see PMU, PMI, and PMC throughout this talk.
06:23
And I realize some of you may not be as familiar with the ARM architecture, so I threw the exception vector table in there. These are the various exception vectors. If you only remember one, just remember the supervisor call, which is mostly used to service system calls.
06:42
As far as privilege modes, you're going to see PL0 and PL1 throughout the talk, so if you're only familiar with the ring terminology, those are the corresponding modes. PMU-assisted research has been done before.
07:00
One of the more interesting papers used hardware performance events to do instruction-level monitoring. Basically they trap every single instruction or branch to a hypervisor, but on x86. We've seen some cool ROP detection work by various researchers using the PMU with mispredicted return branches, and
07:24
we've seen rootkit detection using performance counters, and control-flow integrity. You can come check out our team at Black Hat this year; we're going to be talking about control-flow integrity using the PMU on the Intel architecture. And really, that's the real motivation behind this talk: we've been doing this research on Intel.
07:41
I'm a mobile security guy at heart, so I was curious about the portability to ARM, and this is where it ended up. As you'll notice, all this related work is focused on the Intel architecture, so there's plenty of room for exploration on ARM. So, a few sample events that we can count, and this is similar to what you'll find on other architectures:
08:03
instructions, branches, cache accesses, cache misses, et cetera. I'm going to breeze through the registers that we can access through the system control coprocessor. The control register tells us how many counters exist on the PMU, and its least significant bit is a global enable,
08:24
so it's like turning the PMU on and off. There are registers for enabling and disabling counters: once you turn on the PMU, you have to enable a specific counter, so you write the appropriate bit offset to turn the various counters on or off.
08:42
Then there's a counter selection register, with which you tell the PMU that from this point on you're going to be referring to this counter. Once you do that, you can set the event that you want to count, or query or set the actual counter value. Notice that in the
09:01
event filter register there are only eight bits that ARM has for events, so they're limited there. But the most significant bits are used for mode inclusion. On the right there's a little table that shows that if we wanted to count branches, we could do so in user mode, kernel mode, or hypervisor mode.
09:22
We can do all of those by default because they're all in the normal world. It gets a little bit more tricky with the secure world, because we need the secure non-invasive debug signal to be asserted, and there are COTS devices with that enabled. Finally, the event counter register I mentioned: that's
09:43
the actual count, and you can read and write it. As far as configuring counters, the first step basically turns on the PMU: we're setting the least significant bit of the control register. Next we select the first counter, counter one, and then we write an eight to the event
10:06
type, which is instructions executed, so we're configuring the counter to count instructions. Next we initialize the counter to minus three; maybe we want to sample every three events. Then finally we enable counter one by writing to the counter enable register.
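The configuration sequence just described can be sketched as a small software model. This is an illustration only: `ToyPMU` and `configure_sampling` are hypothetical names, nothing here touches real hardware (on a device each step would be a coprocessor register write from privileged code), the mode filter bits are left out, and "counter one" is assumed to mean the first counter, index 0.

```python
from dataclasses import dataclass, field

MASK32 = 0xFFFFFFFF
PMCR_ENABLE = 1 << 0          # least significant bit of the control register
EVENT_INSTRUCTIONS = 0x08     # the "eight" written to the event type in the talk


@dataclass
class ToyPMU:
    """Software stand-in for the PMU register file (not real hardware access)."""
    num_counters: int = 4
    control: int = 0
    selected: int = 0
    enabled: int = 0
    event_type: list = field(default_factory=list)
    count: list = field(default_factory=list)

    def __post_init__(self):
        self.event_type = [0] * self.num_counters
        self.count = [0] * self.num_counters

    def write_control(self, value):
        self.control = value & MASK32

    def select_counter(self, index):
        if not 0 <= index < self.num_counters:
            raise ValueError("no such counter")
        self.selected = index

    def write_event_type(self, event):
        # Only the low eight bits carry the event number in this model.
        self.event_type[self.selected] = event & 0xFF

    def write_count(self, value):
        self.count[self.selected] = value & MASK32

    def enable_counters(self, mask):
        self.enabled |= mask & MASK32


def configure_sampling(pmu, counter, event, period):
    """Replay the steps from the talk against the model."""
    pmu.write_control(pmu.control | PMCR_ENABLE)   # 1. turn the PMU on
    pmu.select_counter(counter)                    # 2. choose which counter we mean
    pmu.write_event_type(event)                    # 3. pick the event to count
    pmu.write_count(-period)                       # 4. preload, e.g. minus three
    pmu.enable_counters(1 << counter)              # 5. enable that counter
    return pmu


pmu = configure_sampling(ToyPMU(), counter=0, event=EVENT_INSTRUCTIONS, period=3)
print(hex(pmu.count[0]), bin(pmu.enabled))
```

The ordering matters in the model for the same reason it does in the talk: the event type and count writes apply to whichever counter was most recently selected.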
10:23
The other half of the PMU is interrupts. There are separate registers in ARM for enabling and disabling interrupts for a given counter, so if we want to enable interrupts for both counters one and two, we write the value three to the interrupt enable register.
10:41
Then, on an interrupt, we have to query the overflow status register and clear it. So how do you know whether a given PMU can count? You can start by querying the debug authentication status register, which is in the debug coprocessor, CP14, and that will tell you whether invasive or non-invasive debug is supported.
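As an illustration of the interrupt side described above, the sketch below builds the per-counter bitmask (bits 0 and 1 give the value three) and models the query-then-clear handling of the overflow flags. The class and function names are hypothetical, and the write-one-to-clear behaviour is an assumption about how the overflow status register works rather than something stated in the talk.

```python
def counter_mask(counters):
    """One bit per counter index; the first two counters are bits 0 and 1."""
    mask = 0
    for index in counters:
        mask |= 1 << index
    return mask


class OverflowFlags:
    """Toy overflow status register with assumed write-one-to-clear semantics."""

    def __init__(self):
        self.value = 0

    def hardware_sets(self, counter):
        # Stands in for the PMU latching an overflow on `counter`.
        self.value |= 1 << counter

    def read(self):
        return self.value

    def write(self, ones):
        # Every bit written as 1 clears the corresponding pending flag.
        self.value &= ~ones


def service_interrupt(flags, num_counters=32):
    """Query which counters overflowed, then acknowledge exactly those."""
    pending = flags.read()
    fired = [i for i in range(num_counters) if (pending >> i) & 1]
    flags.write(pending)
    return fired


print(counter_mask([0, 1]))  # 3
```

Clearing only the bits that were read keeps an overflow that arrives during the handler from being acknowledged by accident.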
11:04
Then, once you know non-invasive debug is supported, you can query the debug feature register to get the specific version of the PMU that this particular chip supports. ARMv8 introduced PMU version 3.
11:27
So here's a sampling of various chipsets. I tried to get a device from each chipset manufacturer. The main device I worked with was a Nexus 6, but just to show you, all these Qualcomm, MediaTek, Samsung,
11:44
and Huawei HiSilicon chips have non-invasive debug supported, and they all have PMU version 2. Note at the bottom there I included the Broadcom chip from my Nexus; it also has a PMU and is capable of counting.
12:01
Obviously the Huawei device looks like it's ready to be debugged, because it looks like anything goes. Hopefully this is a reflection that the PMU is quite common in COTS devices. Our first case study is going to be using the PMU to do some instrumentation, specifically
12:24
tracing. Our approach, much like the paper I referenced, is that we want to cause frequent PMU overflows in order to interrupt and trap instructions or branches. CoreSight's program flow trace
12:42
basically captures branches, or waypoints as it calls them, and we can come pretty close to the same functionality using the PMU. We want to count all branches, both predicted and mispredicted. We're going to set our counters to minus one, and we'll use our ISR, our interrupt service routine, to do the instrumentation.
13:01
To visualize this, let's step through it. We initialize our counter to minus one, and we're counting branches here. A branch is about to occur; the counter overflows and the PMU triggers an interrupt. Our ISR can then capture the saved program counter, capture registers, capture memory, whatever you want to do. Then we reset the counter to minus one.
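The walk-through above can be approximated with a short simulation. This is an idealized sketch, not the author's kernel module: it assumes the interrupt is delivered exactly on the overflowing branch (no skid), and the instruction stream, addresses, and the name `trace_branches` are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def trace_branches(instructions, period=1):
    """Return the program counters sampled by a toy overflow ISR.

    `instructions` is a list of (pc, is_branch) pairs. Only branches
    increment the counter, matching a counter configured for branch events.
    """
    counter = (-period) & MASK32          # arm the counter, e.g. minus one
    samples = []
    for pc, is_branch in instructions:
        if not is_branch:
            continue                      # non-branches do not count
        counter = (counter + 1) & MASK32
        if counter == 0:                  # overflow: the PMU raises its interrupt
            samples.append(pc)            # the ISR records the saved PC
            counter = (-period) & MASK32  # and re-arms the counter
    return samples


program = [
    (0x1000, False),
    (0x1004, True),    # first branch
    (0x1008, False),
    (0x100C, False),
    (0x1010, True),    # second branch
]
print([hex(pc) for pc in trace_branches(program)])  # ['0x1004', '0x1010']
```

Passing a larger `period` turns the same loop into sampling, so every Nth branch is recorded instead of every branch.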
13:24
Code continues to execute. None of these are branches, so the counter is not incrementing here. We come to another branch; the branch occurs, the counter overflows, and the ISR can then do our introspection
13:41
or instrumentation, and then we reset the counter and it continues. Some of you who are familiar with Linux perf might ask, well, what about Linux perf? I decided to do all this from scratch for a few reasons. First, I wanted to write my own custom ISR, even if perf had a
14:01
callback for being invoked within the ISR. Second, my long-term vision was more towards baseband, so I didn't really want to be married to Linux. And finally, getting my hands dirty was important in order to learn how the PMU
14:20
works on ARM. That said, the perf source is a really handy resource for the PMU, not just on ARM but on other architectures as well. I should probably mention I've got two kids at home, and much of this research started out after hours.
14:40
As you can imagine, it's hard to find time to do research with kids, so it was always small windows, whether I was putting them to bed or while they were watching cartoons. So I feel like the graphics in this slide accurately reflect my state of mind while doing this research.
15:03
The first challenge we have is that we need to figure out how to register for interrupts. The ARM Generic Interrupt Controller spec outlines both private peripheral and shared peripheral interrupts and the interrupt IDs. In Linux, for example, we need to register
15:21
for the interrupt, but we need to know the IRQ number that the PMU is going to use. The spec recommends using interrupt ID 23, but not every chipset has to use the Generic Interrupt Controller; there are custom ones. So the first place we can look is device tree source: those are the .dts and .dtsi
15:43
files you might find in, say, Android source. Those files outline the specs and configuration of the hardware, and sometimes you'll find the PMU in there. This one is from the Nexus 6. We see this interrupts tuple: the first value is the type, which
16:05
tells us whether it's a shared or private peripheral interrupt; in this case it's private. The interrupt number is the offset, and from our last slide we know that private interrupts start at offset 16, so we can do the math and come up with 23, which matches our spec.
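The arithmetic in that last step can be written out as a small helper. It assumes the usual device tree convention for the Generic Interrupt Controller, where a type of 0 means a shared peripheral interrupt (IDs starting at 32) and 1 means a private peripheral interrupt (IDs starting at 16). The interrupt number 7 is inferred from the talk's result of 23, and the function name is hypothetical.

```python
GIC_SPI = 0          # shared peripheral interrupt
GIC_PPI = 1          # private peripheral interrupt
PPI_BASE = 16        # private peripheral interrupt IDs start here
SPI_BASE = 32        # shared peripheral interrupt IDs start here


def gic_interrupt_id(kind: int, number: int) -> int:
    """Translate the first two cells of an `interrupts` tuple into an interrupt ID."""
    if kind == GIC_PPI:
        if not 0 <= number < 16:
            raise ValueError("private peripheral interrupts are numbered 0..15")
        return PPI_BASE + number
    if kind == GIC_SPI:
        if number < 0:
            raise ValueError("interrupt number cannot be negative")
        return SPI_BASE + number
    raise ValueError("unknown interrupt type")


print(gic_interrupt_id(GIC_PPI, 7))  # 23
```

The third cell of the tuple (trigger flags and CPU mask) does not affect the ID, so it is ignored here.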
16:23
The problem is that not every device uses the PMU even though it's there, so you're not always going to find something like this in the device tree source. For that I went for the brute-force approach. On the Amazon tablet with the MediaTek chipset,
16:41
I basically read /proc/interrupts, found every unused shared and private peripheral interrupt, and registered for those. Then I configured the PMU to overflow a specific number of times and went back to /proc/interrupts to see which interrupt number corresponded to X number of
17:04
IRQs. That works, but it's not as straightforward as looking in some source. As far as implementation, I mentioned there are a couple of handy APIs in Linux, and if we think about embedded firmware, ultimately we need to patch the IRQ vector handler,
17:24
probably at the end, before returning to the mode it came from. The second big problem is something called interrupt shadow. Let's say our counter overflows on this branch, but the interrupt doesn't occur, and then another branch occurs,
17:42
the counter increments to positive one, and then eventually we get an interrupt. These highlighted instructions represent this interrupt shadow. We've had a skid of four instructions before the PMU fired the interrupt, and because of this we can potentially lose up to 15%
18:01
of trace data (that's a rough metric), but we'll see in a minute how we can overcome that. I put together a prototype for Android. There are three components: a kernel module, a userspace daemon, and an IDA plug-in. The kernel module just configures the PMU on each core and registers for interrupts on that core, and then as
18:26
branches occur the interrupt service routine just captures the saved program counter at the time of the interrupt. It buffers it, so it's not sending small amounts of data, and I use the Linux relay file system to pipe that to userspace.
18:41
That then gets sent over to IDA, so we can use IDA for visualizing our coverage as well as controlling the daemon in terms of which threads we want it to monitor, starting and stopping, and some other features I'll show in a minute. This picture represents an ideal case where
19:02
we're counting branches. We might get a saved program counter from the branch to sys_read, from the branch from fget_light, from the branch returning from vfs_read, and from the branch returning from fput. Then we just need to connect the dots.
19:21
We can use IDA to our advantage, because it's done all the hard work on the control flow of each function. We can say that if we have a trace point in a basic block, we know that block is being executed, so we can count and color that one. And if there's only one
19:40
cross-reference to the block or from the block, we can go ahead and count and color those as well, since we know that execution is going that way. That helped us with this problem. I mentioned interrupt shadow: let's say that the return from vfs_read continued executing and we didn't trigger the interrupt until this move instruction. So here now
20:05
we've got a block that doesn't have a trace point in it, but our earlier algorithm helps cover that. Plus, if you're using this for code coverage while fuzzing, you don't really care, because you're going to be hitting these code paths so many times that eventually an
20:20
interrupt is going to occur inside of that basic block. We could probably make this more precise by using a second counter to count the number of instructions between interrupts and then properly fill in the gaps, but this seemed effective enough. So, first demo here.
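The coloring rule described above (mark a block that contains a trace point, then extend to any neighbor that is the block's only cross-reference in or out) can be sketched on a plain control-flow graph. This is a reconstruction under assumptions, not the plug-in's code: the graph is a hypothetical dictionary of block names to successor lists, and the rule is applied repeatedly until nothing new is colored.

```python
def infer_coverage(successors, traced):
    """Blocks considered executed, given blocks that contain a trace point."""
    # Build the reverse edges so "only one cross-reference to the block" is cheap to test.
    predecessors = {block: [] for block in successors}
    for block, succs in successors.items():
        for succ in succs:
            predecessors.setdefault(succ, []).append(block)

    covered = set(traced)
    worklist = list(covered)
    while worklist:
        block = worklist.pop()
        # A sole successor or sole predecessor must also have executed.
        for neighbours in (successors.get(block, []), predecessors.get(block, [])):
            if len(neighbours) == 1 and neighbours[0] not in covered:
                covered.add(neighbours[0])
                worklist.append(neighbours[0])
    return covered


# Hypothetical diamond-shaped function: entry -> check -> (then | else) -> exit
cfg = {
    "entry": ["check"],
    "check": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}
print(sorted(infer_coverage(cfg, {"then"})))
```

A single trace point in `then` is enough to color `entry`, `check`, and `exit`, while `else` stays uncolored because nothing forces execution through it. This is also why a missed trace point from interrupt skid is often recovered.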
20:40
I have all recorded demos, just because there are too many moving pieces. The requirements here are that the device has to be rooted, obviously. The kernel might need to be recompiled to support loadable kernel modules. I use CONFIG_PREEMPT, the preempt notifiers, to
21:04
turn the PMU on and off based on the thread I'm interested in tracing. That needs CONFIG_PREEMPT, though I haven't seen many kernels that don't have that enabled by default. And finally, I trimmed out a slide, but one of the big challenges with tracing kernel mode is that
21:23
if we were to reset our counter to minus one within our interrupt service routine, then as soon as it returned out of the ISR that would cause an overflow, and we'd find ourselves in an interrupt loop. The way I got around it was I added about four instructions to the end of the IRQ
21:43
vector handler, before it returns to supervisor mode, and I reset the counter there to minus two. That way the return from the IRQ vector handler increments it to minus one, and then the next branch we encounter is the one that we want to trap.
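A toy model shows why minus one loops and minus two does not when kernel-mode branches are counted. It assumes, as the talk describes, that the return from the IRQ vector handler is itself counted as one branch; the function name and the iteration limit are only there to keep the sketch finite.

```python
MASK32 = 0xFFFFFFFF


def interrupts_before_progress(rearm_value, limit=1000):
    """Back-to-back interrupts taken before the traced code runs again.

    `rearm_value` is what the handler writes to the counter just before it
    returns. The exception return is assumed to count as one branch event.
    """
    spurious = 0
    while spurious < limit:
        counter = rearm_value & MASK32
        counter = (counter + 1) & MASK32   # the handler's own return is counted
        if counter != 0:
            return spurious                # no overflow: traced code resumes
        spurious += 1                      # overflow: straight back into the handler
    return spurious


print(interrupts_before_progress(-2))            # 0
print(interrupts_before_progress(-1, limit=50))  # 50, i.e. it never makes progress
```

Re-arming with minus two spends one count on the handler's return and leaves exactly one for the next real branch, which is the behaviour described in the talk.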
22:00
That could potentially require recompiling the kernel, but I think you could do the same by hot-patching the kernel from the kernel module. So, I'll do two demos of this tool. In the first we're going to trace kernel mode on the Nexus 6, and in the second we'll do
22:22
something in user mode. Basically, we're going to load our plug-in, and the device is already attached, so we're going to attach. Here we can choose whether we want to trace user mode or kernel mode, and we can see all the running processes or kernel threads to attach to.
22:44
For this demo, I'm going to attach to wl_event_handler, which is a Wi-Fi kernel thread, and I've got the escan handler loaded here. Now I've started the trace, I just turned on the Wi-Fi, and we start to see blocks getting colored in.
23:03
The window in the upper left shows us all the functions that have been hit and the number of unique trace points we've gotten from those functions. We can click around and notice there's a little code coverage figure, which tells us what percentage has been hit at the basic block level and instruction level for that function.
23:23
This is the second demo, and now we're doing user mode. I'm going to pause it a sec; we're doing pretty well on time. This time I have a library loaded that's dealing with QMI, which is some of the communications coming back from the modem. We're going to attach to rild on the device, and
23:52
so basically we're going to start a trace, and with the attached phone I'm going to make a phone call.
24:01
There's so much data coming back right now that the trace functions window can't keep up, but I'm calling good old Columbus time and weather. The function that's loaded here is, I think, the QMI voice all-call status indicator, so we can see which
24:21
instructions have been hit. Now we can pause the trace so that this window can catch up. These are all the functions that have been hit; if we scroll down we see a lot of QMI voice related things. But where I see this being powerful
24:41
with embedded systems is doing differential debugging. So here we're going to do a differential debug: we'll tell the plug-in to remember everything we've seen but clear out the trace function history, and then we'll resume the trace and send ourselves a text message, because maybe we want to isolate SMS
25:05
handling. I'll send myself a text, and we see the text come up; it says "trace this please". If we look at our traced functions, those that have symbols, we can see most of them are QMI SMS. And so we see this SMS command callback function; maybe we're interested in this one.
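The differential step in the demo boils down to a set difference: remember everything hit during the baseline trace, then report only what is new in the next trace. The sketch below illustrates the idea; the function names in the sample data are invented placeholders, not symbols from the actual library.

```python
def differential(baseline_hits, new_hits):
    """Functions seen in the new trace that were never seen in the baseline."""
    return sorted(set(new_hits) - set(baseline_hits))


# Hypothetical symbol names, for illustration only.
phone_call_trace = ["voice_status_handler", "common_dispatch", "log_message"]
text_message_trace = ["sms_command_callback", "common_dispatch", "log_message"]

print(differential(phone_call_trace, text_message_trace))  # ['sms_command_callback']
```

Shared plumbing such as dispatch and logging routines drops out, which is what makes the remaining list a useful starting point for reversing the feature of interest.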
25:26
We want to look at it further. Just to show that you can do more than capture the program counter, we can say: capture registers every time a trace point happens in this function.
25:41
So now I'll send myself another text message, and we should see every new trace point highlighted, where we can click on it and see the state at that point in time. We can step through our code here, and
26:01
everywhere we see one of these red trace points we can click on it to view the trace, which obviously could be useful for some dynamic analysis. So you might ask yourself: so, big deal, Android instrumentation?
26:22
I guess the way to think about it is more the approach. We're using the PMU to do this, and in theory we can apply this to other chips. In fact, I got as far with the Broadcom Wi-Fi where I had patched the interrupt service routine and I could cause it to count branches. Where I kind of got held up was on
26:43
identifying the interrupt number. But this is kind of less invasive than software breakpoint instruction tracing. We can easily support user or kernel mode. And again, we're not limited to branch tracing; you could put whatever instrumentation logic you want in your interrupt service routine.
27:06
And finally, basebands, whether Wi-Fi or cellular, also have these PMUs. So Intel, MediaTek, and potentially other ARM-based cellular basebands also have PMUs. Apple's
27:23
chips found in iPhones and iPads have a PMU, and we're also not limited to ARM here; PowerPC and MIPS and other architectures have similar performance monitoring units. So this is kind of where the research took a 180, because originally I was, you know,
27:43
going to extend this to some sort of baseband, but I got majorly distracted. So, quick prior art on ARM-based rootkits. I think the most traditional ARM kernel rootkit would be to either patch the syscall table or patch the exception vector table.
28:04
We've also seen hot patching of kernel functions. We've seen TrustZone-based rootkits. We've seen moving the exception vector table by toggling a bit in a control register. So there's been some
28:22
interesting research in ARM-based rootkits. So the inspiration for me came while reading the manual. ARM basically has a set of maybe 30 or so events that are considered architectural, and a subset of those are mandatory. But then, because there's sort of flexibility for
28:44
vendors to extend the PMU, they have this table that lists recommended event encodings and what they are. And so, if you notice here, for these
29:01
suggested events, this basically looks like the exception vector table. So, coupled with the fact that we were just aggressively trapping branch instructions by configuring a counter to be minus one, in theory, couldn't we just trap any of these exceptions?
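The "counter set to minus one" trick works because an event counter raises its interrupt when it overflows, so preloading it one step below the wrap point turns every counted event into an interrupt. A small model of that arithmetic, assuming a 32-bit counter and a handler that reloads the same preload value each time (the class and its fields are illustrative, not a real PMU interface):

```python
COUNTER_MASK = 0xFFFFFFFF  # assumption: the event counter is 32 bits wide


class EventCounter:
    """Toy model of one PMU event counter with overflow interrupts enabled."""

    def __init__(self, preload):
        self.preload = preload & COUNTER_MASK
        self.value = self.preload
        self.interrupts = 0

    def event(self):
        """Count one occurrence of the selected event."""
        self.value = (self.value + 1) & COUNTER_MASK
        if self.value == 0:            # wrapped past the top: overflow
            self.interrupts += 1       # the overflow interrupt would fire here
            self.value = self.preload  # the handler re-arms the counter


trap_every_event = EventCounter(-1)    # 0xFFFFFFFF: overflow on the next event
sample_every_4th = EventCounter(-4)    # 0xFFFFFFFC: overflow every 4 events

for _ in range(8):
    trap_every_event.event()
    sample_every_4th.event()
```

With a preload of -1 the eight events produce eight interrupts; with -4 they produce two, which is the usual sampling-profiler configuration.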
29:22
Now, I will mention that ARM does have a count-all-exceptions event, and you could try to use that. The problem is that since the PMU interrupt is delivered via IRQ, and IRQs are included in the all-exceptions event, you'll run into this big mess of, you know,
29:42
a legit IRQ occurs, which triggers a PMU IRQ, and then you might find yourself in a messy interrupt loop. So this, however, is perfect, because we can pick, for example, supervisor calls or something and focus on those. So this is kind of where I was like,
30:01
what do we do next? Because we could go offense, we could go defense, we could look closer at TrustZone. So rather than doing a lot of one thing, I kind of ended up doing a bunch of things. That's kind of why this talk could be separated into individual
30:24
talks. But a quick note on ARM licenses. There's basically a core license, where you use ARM's core design, and then there's the architectural license, which allows you to build a custom design as long as you implement the ARM instruction set. And so
30:41
Qualcomm, Apple, lots of vendors have architectural licenses. But I mention that because I wanted to show which cores are capable of counting the exception vector table events. The Cortex-A7, A53, A57, and A72 are all ARM's designs, and I wanted to show that, you know,
31:05
not only is it in the manual as a suggested event, it's included by default on the newest 64-bit cores, the A57 and A72. And then on the right: I did most of my research on this Nexus 6 device, which is a Krait architecture.
31:22
But I wanted to show that custom ARM-based designs (these three are Qualcomm's) also count them. I just want to make the distinction that that's obviously not unique to their custom design. This stems from the ARM manual and is included by default on
31:41
some of ARM's cores. So if we look at the manual: earlier I kind of ran through the c12 through c14 control registers, and those are the ARM performance monitoring extensions. But ARM in their manual states,
32:00
you know, for those that want to extend the PMU, and it makes sense to want to extend it because vendors might add custom features, might have custom designs, they reserved c15 for what they call implementation defined monitors. Qualcomm uses this. I looked at an older, I think an Apple A6,
32:23
chip, and it also uses this for their kperf kernel profiling. And there's probably others that extend this as well. So, quick: what about Qualcomm's architecture? So
32:40
basically their custom cores have been through three iterations: Scorpion, Krait, and Kryo. At whatever point in time, generally you'll find them in the highest-end smartphones. So right now you can find the Kryo in the HTC 10 and some of the other devices that just came out, and
33:01
I'm going to be focusing on the Krait architecture. But the same definitely applies to Scorpion, and I haven't had time to play with Kryo, but just from looking at source, it seems like it follows the same methodology. So what they've done is extended the PMU, and they've added four
33:24
event select registers: one for their vector floating point unit and three for various other parts of the CPU. The Krait events get encoded based on a combination of a code, a group, and a region of the CPU, and
33:40
in order to use these extra event select registers, after programming them you have to basically... I kind of like to think of it as setting up a link. So the ARM-based event select register, you kind of have to point it to the appropriate Krait region and group.
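The code/group/region encoding and the "link" can be written down as two small computations. This is a sketch under stated assumptions: the group selects a byte lane in the region register, and the link value is a per-region base ORed with the group. The region 0 base of 0xCC follows from the 0xCF value the talk gives for group three; the bases for regions 1 and 2 are assumptions from my reading of the Linux Krait PMU driver and should be checked against source. Any enable bits a real driver sets in the region register are not modeled.

```python
# Base "link" values written to the ARM event select register, per Krait region.
# Region 0 is consistent with the talk's 0xCF example; 1 and 2 are assumptions.
REGION_BASE = {0: 0xCC, 1: 0xD0, 2: 0xD4}


def krait_event(code, group, region):
    """Return (region_register_value, arm_event_select_value) for one event."""
    if not 0 <= code <= 0xFF:
        raise ValueError("event code must fit in one byte")
    if not 0 <= group <= 3:
        raise ValueError("group must be 0..3 (one byte lane each)")
    region_value = code << (group * 8)      # place the code in the group's byte
    link_value = REGION_BASE[region] | group  # point the ARM selector at it
    return region_value, link_value


# Worked example: event code 0xB in group 3 of region 0.
region_value, link_value = krait_event(0x0B, group=3, region=0)
print(hex(region_value), hex(link_value))
```

For code 0xB, group 3, region 0, this yields 0xb000000 for the region register (0xB shifted over by three bytes) and 0xcf for the ARM event select register.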
34:00
So in this table at the bottom we see three different regions and how we can read and write to them. You can find some documentation on event codes from some really old Scorpion source, but to come up with these event code numbers I had to do a little bit of black box analysis.
34:22
So basically I walked every possible event code until I saw a wrap or a duplication of events, which signifies that I've run out of bits. And it seems like there are a lot of events and only some of them are documented.
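One way to picture that black box walk: program each candidate code, run a fixed workload, and look for the point where the observed behavior starts repeating. The sketch below assumes a hypothetical `measure(code)` callable that returns a reproducible fingerprint for a code; real counts are noisy, so an actual experiment would need repeated runs and a tolerance. The fake hardware at the bottom exists only to make the example self-contained.

```python
def find_code_width(measure, max_bits=16, samples=8):
    """Return the smallest n such that event codes appear to wrap modulo 2**n.

    `measure` is a hypothetical callable mapping an event code to a
    reproducible fingerprint (for example, a count from a fixed workload).
    """
    for bits in range(1, max_bits + 1):
        period = 1 << bits
        # If code c and code c + period always look the same, the bits above
        # position `bits` are being ignored, so the field has wrapped.
        if all(measure(c) == measure(c + period) for c in range(samples)):
            return bits
    return None


def fake_hardware(code):
    """Stand-in for real hardware: only the low 8 bits of the code matter."""
    return ((code & 0xFF) * 37) % 251


width = find_code_width(fake_hardware)
```

Here `width` comes out as 8 because the stand-in ignores everything above the low byte; on real hardware the answer depends on the actual register layout.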
34:42
I'd love to see the full list, because I think there's some potentially powerful stuff you could do on ARM. And then finally we see the bottom line there; that's what I call the link. So we're setting the ARM event select register to some base value
35:03
that's ORed with the group, and it's kind of telling the PMU to look over here for the event code. So for example, if we wanted to count prefetch aborts with that combination in the Krait architecture, that particular event code happens to be 0xB, and it's group three, region zero. So
35:24
in order to configure our PMU to count prefetch aborts, we have to shift the 0xB over by three bytes, and then we write that to the Krait region zero event select register, and
35:40
then we go back to the ARM event select register and we point it at Krait region zero by coming up with this value, 0xCF. And you can actually look at the source in Android and see how this is encoded, both in their Scorpion and Krait
36:01
PMU source. And basically it allows you to extend the PMU and count custom events. So the idea behind a PMU-assisted rootkit is then: hey, let's count supervisor calls, since most OSes use that vector for
36:25
handling system calls. Let's try to trap all these supervisor call instructions, and then our ISR becomes the rootkit. So we get code execution at some point after the
36:43
supervisor call instruction, and then we can redirect code execution by modifying the saved registers however we want. And so one of the advantages of this is it kind of avoids the current, you know, state of things in terms of patch protection or
37:01
kernel integrity measuring. Now, I didn't test this against anything in the mobile space. I'm only aware of... Samsung has, I think it's called TIMA; it's a kernel integrity monitor, but it runs in TrustZone. And then Apple has their kernel patch protection, and I would guess that neither of those
37:23
would care about this, because you're just registering an interrupt service routine and you're not touching the kernel image. And so the installation of the rootkit is basically: you configure the PMU, which is a few MRC/MCR instructions, followed by registering for interrupts.
37:44
Now, there are some challenges to the implementation. First off, as soon as the SVC instruction goes to the exception vector handler, interrupts are immediately disabled, and then they're re-enabled later. And so even if the PMU is ready to fire an interrupt,
38:04
because most interrupt controllers are configured to deliver the PMU interrupt with normal IRQ priority, we wouldn't get code execution until after this CPSIE instruction in this particular Android
38:22
source. So we have three cases we have to deal with because of the instruction skid I mentioned before. Even though interrupts are enabled right here, the PMU's interrupt might be handled somewhere in the
38:40
vector_swi handler before branching to the resolved system call routine, we could be interrupted right at the entry point of the system call routine, or we could be interrupted after. And so our malicious ISR has to deal with all three of these. So, the first case. And I should mention I put the stats on the frequency that I saw for each of these. So by far
39:05
the most common case is the green section, but we do have to account for the others if we want to make sure we can always redirect. So basically, we can retrieve the saved supervisor mode and user mode registers in our ISR, and then we use register 7 to
39:27
filter. Maybe we're looking for sys_read; we're looking to hook read, for example, in this example. And then from that point, once we identify that we are in the green case here, we basically need to see how far
39:44
we've been interrupted past that CPSIE instruction. So we compute an offset, and then we can emulate the remaining instructions, but we can ignore every instruction that's dealing with resolving the syscall routine's address from the syscall table lookup, because we're going to be taking care of that.
40:03
We just apply whatever this code's doing to the saved registers, and then at the end we can set the link register and set the program counter to our hook. So that's sort of an easier case. This next case is even easier.
40:21
This is where the interrupt occurred and you happen to be right at the entry point of the syscall routine that we want to hook. So in this case we can just look and see that, hey, the program counter matches the address of the legitimate sys_read function, and then we swap the program counter with the address of our hook.
40:44
So then the final case, which is the most challenging, is we're at some point in the middle of sys_read, or past the entry point. How do we deal with this? I chose to implement it by allowing the syscall routine to complete and then hooking it after it was done. So the way I did that:
41:04
we see up here that ret_fast_syscall is being pushed onto the stack. So I grab the saved stack pointer and I walk backwards until I find ret_fast_syscall, and then I replace that with a trampoline address, and our trampoline can
41:22
fix up the link register and then branch to what I call a post-hook function. And so our post-hook function can then query the saved user mode registers in order to get the original parameters passed into the syscall routine, and then we can copy buffers back to the kernel and modify them as necessary.
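The three cases just described come down to a decision on the saved program counter at the moment the interrupt is handled. The sketch below models only that decision; the address ranges are hypothetical, and it simplifies by treating anything outside the vector handler and not at the entry point as "already inside the routine", which real code would have to verify.

```python
from enum import Enum


class SkidCase(Enum):
    IN_VECTOR_HANDLER = 1   # interrupted before the branch to the syscall routine
    AT_SYSCALL_ENTRY = 2    # interrupted exactly at the routine's first instruction
    INSIDE_SYSCALL = 3      # interrupted somewhere past the entry point


def classify(saved_pc, handler_start, handler_end, syscall_entry):
    """Decide which of the three cases a saved program counter falls into."""
    if handler_start <= saved_pc < handler_end:
        return SkidCase.IN_VECTOR_HANDLER
    if saved_pc == syscall_entry:
        return SkidCase.AT_SYSCALL_ENTRY
    # Simplification: assume any other value is inside the syscall routine.
    return SkidCase.INSIDE_SYSCALL


# Hypothetical addresses, chosen only to exercise each branch.
HANDLER = (0xC000E000, 0xC000E080)
SYS_READ = 0xC0123450

case_a = classify(0xC000E040, *HANDLER, SYS_READ)
case_b = classify(0xC0123450, *HANDLER, SYS_READ)
case_c = classify(0xC0123478, *HANDLER, SYS_READ)
```

Each case then maps to the handling described above: emulate the remaining handler instructions, swap the program counter, or defer to a post-hook.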
41:43
So the limitation of this approach is that it kind of assumes we only care about hooking data that's coming from the kernel to user mode. If, for example, we wanted to hook write or send or something, we would have to come up with another approach, because we can't just let this write complete, because the
42:05
damage would be done at that point. But for a proof of concept I decided to just leave it at this. So I have two demos for the rootkit. The first is using the PMU to hook sys_getdents64, which is a popular choice for
42:27
process and file hiding. So to set up this demo, basically I'm loading sort of a fake kernel integrity monitor, where I'm just periodically
42:44
scanning the syscall table and other regions in the kernel and computing a hash. And then I just started some malware thread that's running in the kernel; all it's doing is printing out "this is nasty printk malware". So if we look on our file system we can see this malware secrets file.
43:01
And if we do a process listing, we should see our malware worker thread. And this again is on the same Nexus 6 that I was working with earlier. Just to show that this is a legitimate phone, I'm using the Reflector 2 app, which I highly recommend for mirroring and casting devices. But
43:24
so now we're going to load our PMU rootkit. So now it says the PMU-assisted rootkit is now cloaking stuff. So if we look on the file system, our malware secrets file is gone, and if we do a process listing we shouldn't be able to find the malware worker thread, and
43:44
same on the device: we shouldn't be able to see the malware worker thread. So then I've just unloaded that rootkit, and notice that the kernel integrity monitor was still happy, and now we can see the worker thread again because the
44:03
PMU-assisted rootkit was removed, and the file showed back up. So then finally I'm going to do a traditional rootkit where I'm just going to patch sys_getdents in the syscall table. And to prove that it was a legit patch, we shouldn't see the malware secrets file, but now notice our
44:22
super-duper secure kernel integrity monitor says that things are corrupt. So the intent of that demo was just to show a PMU-assisted rootkit in action, where we're actually able to redirect and modify data returning back to user mode, all while using just PMU-based traps.
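The fake integrity monitor in this demo is described as periodically hashing the syscall table and other kernel regions. A toy version, assuming "kernel memory" is just a byte array and the monitored regions are given as (start, length) pairs, shows why a table patch is flagged while a change that lives outside the hashed regions is not; the region names and layout are invented for illustration.

```python
import hashlib


class IntegrityMonitor:
    """Toy monitor: hash fixed memory regions and compare against a baseline."""

    def __init__(self, memory, regions):
        self.memory = memory
        self.regions = dict(regions)
        self.baseline = {name: self._digest(name) for name in self.regions}

    def _digest(self, name):
        start, length = self.regions[name]
        return hashlib.sha256(bytes(self.memory[start:start + length])).hexdigest()

    def scan(self):
        """Return the names of regions whose contents no longer match."""
        return [n for n in self.regions if self._digest(n) != self.baseline[n]]


kernel = bytearray(64)  # stand-in for kernel memory
monitor = IntegrityMonitor(kernel, {"syscall_table": (0, 16), "vector_table": (16, 16)})

kernel[48] = 0xAA                  # state changed outside the monitored regions
after_unmonitored_change = monitor.scan()

kernel[4] = 0xFF                   # a syscall table slot is overwritten
after_table_patch = monitor.scan()
```

The first scan reports nothing and the second reports the syscall table, mirroring the two outcomes shown in the demo. A monitor of this shape only sees what it hashes, which is the gap the PMU-based approach relies on.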
44:46
But in my opinion Linux rootkits are boring, and this is a phone, so to make it more interesting I decided to try to hook read, but in the context of qmuxd. qmuxd is a user space daemon in Android;
45:02
much of the same functionality is handled by CommCenter in iOS. But it's basically responsible for routing QMI packets to and from the modem and to their respective clients in the high-level OS. And you can kind of think of QMI as like a custom Hayes AT command set or something. But
45:26
we're going to hook this read the same way we just hooked getdents, only using the PMU. So that's our next demo here. For this demo, I have two phones, and
45:41
as I mentioned before, I'm using the Reflector 2 app to mirror the devices. The iPhone on the left is a clean device, and then we see the Nexus there on the right, and that's going to be the rootkitted device. So I'm going to send a text from the iPhone to the Android, and
46:03
the iPhone, or rather the Nexus, receives it; it says "hi, how are you?" So now we're going to install our new rootkit, which is hooking read, and we're going to read the kernel log, and we should start to see QMI packets being printed out.
46:23
So I just sent a text from the device, and we notice that on the screen, on the Nexus, it's displaying "hello", but it appended "PMU rootkits are fun". So I decided preventing SMS from showing up is not a very attractive demo, so I decided to
46:44
basically parse the QMI and append 7-bit ASCII to the end of the incoming SMS, which was a pain. And now the iPhone sends a text that says "OH", and the rootkit has appended "IO".
47:03
So it's clearly a Buckeye fan, if anybody knows what that even means. And then finally, just to show that we're man-in-the-middling that incoming QMI stream, we'll check our balance on T-Mobile, which will send a
47:21
USSD request, and we should see in the QMI dump the response, or the originating USSD message from the network, and it tells us, you know, current plan active until whenever. So yeah, that's pretty much that demo, basically just showing that we're able to man-in-the-middle that
47:45
incoming stream of data in order to potentially have interesting channels for doing malicious things. So yeah, we sent one more goodbye text, and again the rootkit appended the same "PMU rootkits are fun" message.
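On why appending 7-bit text to an SMS is "a pain": in the GSM 7-bit encoding, characters are packed as septets across byte boundaries, so text cannot simply be concatenated at the byte level. It has to be unpacked, extended, and repacked, and the character count changes. The sketch below shows only that packing step, assuming characters whose GSM 7-bit code equals their ASCII code (letters, digits, space); the function names are illustrative, and the surrounding PDU and QMI length fields that would also need fixing up are not modeled.

```python
def pack_septets(septets):
    """Pack 7-bit values into bytes, least significant bits first."""
    out = bytearray()
    acc = 0
    nbits = 0
    for s in septets:
        acc |= (s & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # final partial byte, zero padded
    return bytes(out)


def unpack_septets(data, count):
    """Recover `count` 7-bit values from packed bytes."""
    septets = []
    acc = 0
    nbits = 0
    for b in data:
        acc |= b << nbits
        nbits += 8
        while nbits >= 7 and len(septets) < count:
            septets.append(acc & 0x7F)
            acc >>= 7
            nbits -= 7
    return septets


def append_text(packed, count, extra):
    """Unpack, extend with `extra`, and repack; returns (bytes, new_count)."""
    septets = unpack_septets(packed, count) + [ord(c) for c in extra]
    return pack_septets(septets), len(septets)


original = pack_septets([ord(c) for c in "OH"])
modified, new_count = append_text(original, 2, "IO")
decoded = "".join(chr(s) for s in unpack_septets(modified, new_count))
```

Here `decoded` is "OHIO" and `new_count` is 4. As a check on the packing itself, the five characters of "hello" pack to the bytes E8 32 9B FD 06.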
48:03
So, some analysis and limitations. Trapping supervisor call instructions seems to add anywhere from two to five percent overhead, depending on
48:21
the load on the device. And I mentioned it should evade current kernel integrity monitoring algorithms. I didn't validate this, but I'd be surprised if these monitors cared that all we've done is registered an interrupt service routine. A
48:40
couple of limitations: the PMU registers themselves are non-persistent, so this rootkit would not be persistent. Every time a core comes out of reset, the PMU registers will all be loaded with their respective reset values. And then any other code running in the kernel or higher could tamper with the registers, so they could
49:04
stop something from counting, so that could become problematic. So how would we detect this rootkit? You could try just reading /proc/interrupts; however, the data that's shown there is easy to modify, and potentially we could prevent the PMU's IRQ line from even showing up.
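For what it is worth, the /proc/interrupts check amounts to taking two snapshots and comparing per-IRQ totals, with the caveat just stated that an attacker in the kernel can alter or hide those numbers. A minimal sketch, assuming the usual layout of an IRQ label, one count per CPU, then descriptive text; the sample snapshots and the IRQ name are made up for illustration.

```python
def parse_interrupts(text):
    """Map each IRQ label to its count summed across all CPUs."""
    totals = {}
    for line in text.splitlines()[1:]:      # first line is the CPU header
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        count = 0
        for token in rest.split():
            if not token.isdigit():         # counts end where the text begins
                break
            count += int(token)
        totals[label.strip()] = count
    return totals


def interrupt_deltas(before, after):
    """Per-IRQ increase between two snapshots of /proc/interrupts text."""
    old, new = parse_interrupts(before), parse_interrupts(after)
    return {irq: new[irq] - old.get(irq, 0) for irq in new}


# Hypothetical snapshots taken a short interval apart.
SNAP_1 = """\
           CPU0       CPU1
 20:       1000       1200   GIC  arch_timer
 39:          0          0   GIC  example_pmu
"""
SNAP_2 = """\
           CPU0       CPU1
 20:       1100       1300   GIC  arch_timer
 39:      52000      48000   GIC  example_pmu
"""

deltas = interrupt_deltas(SNAP_1, SNAP_2)
busy = [irq for irq, d in deltas.items() if d > 10000]  # arbitrary threshold
```

In this made-up data IRQ 39 jumps by 100000 between snapshots while the timer moves by 200, so only 39 crosses the threshold.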
49:26
You could try to just query the registers and see if somebody's counting supervisor call instructions. It's worth noting that access to the PMU can be trapped to hypervisor mode, so if an attacker had control over the hypervisor
49:40
exception vector table, that could be interesting. And as we'll see in a minute, not all usage of this is malicious. I guess more generically, just identifying an increase in interrupts is probably your best bet. There are trace points in the Linux kernel that you can't really avoid, or we couldn't avoid, and
50:01
I suppose you could try to iterate... Linux has a radix tree that has all the interrupt handlers in it, based on the IRQ number. So I suppose you could try to validate addresses in there, that the handler is in, maybe, the static kernel image or something. But if you're familiar with Shadow Walker, you know,
50:22
you might be able to do interesting things by trapping data or prefetch aborts, so that any time somebody tried to read memory where our code lives, we could potentially trap that and then serve up bogus data like Shadow Walker did. So our final case study here is on defense.
50:44
And just to show you that it's probably more powerful on the defensive side. So, same approach: we're going to trap supervisor call instructions, but we're going to do syscall monitoring. So if you're familiar with Microsoft's EMET or
51:01
anti-ROP tools like it, they're all pretty similar in that, on Windows, they all inject DLLs into the processes they want to protect, which is kind of interesting because they're trying to prevent code reuse attacks, but in doing so they're increasing the surface for code reuse attacks.
51:20
And so by doing monitoring from the kernel we can avoid that. And this is much easier to implement than the rootkit because we don't have to worry about redirecting code execution. We just need to apply some sort of integrity policy to make sure things are legit on the syscall. And further, that could allow us to protect COTS binaries, so we don't have to lean on vendors to compile in
51:46
protections, and we don't have to patch the kernel. We just need a kernel module loaded, or some other way to register an interrupt service routine. So I decided to select Stagefright, which has obviously been very popular
52:03
since last Black Hat. This is data from the Nexus security bulletins, and as we can see there's been a ton of remote code execution and privilege escalation bugs, either in libstagefright or labeled as mediaserver, but I'm just collectively calling them media.
52:22
So I decided to try to create a little proof of concept: basically just do syscall monitoring and perform some of the same anti-ROP checks that EMET and ROPGuard do. And I used proofs of concept from Mark Brand of Google, who had a great blog on this particular CVE, and
52:43
that's the same CVE that NorthBit has the Metaphor proof of concept for. So I used both of those in combination. So for our last demo I'm going to show exploiting the device, and then we'll load our mitigations and try it again. So we're going to start a netcat listener,
53:05
and we'll browse to a site that plays this particular media file, and we should see we get a connection back. So now we have a shell in the context of mediaserver. And we can do a getprop to show you it's a legit device; this is a hammerhead device running Android 5.1.
53:27
It's a Nexus 5. So we'll exit that shell, close out our browser, and now we're going to load our mitigations, which again are just going to trap system calls in order to apply those anti-ROP checks.
53:43
And we'll see what happens. So we load our kernel module. We can see some verbose output I put in here where I'm configuring the counters to count system calls, or supervisor call instructions. So now we're going to browse to the same site,
54:01
and we should see some ASCII art up here which is saying "exploit blocked". So if we look at the bottom there, we can see that it detected a stack pivot in this binder thread, which is part of mediaserver. It was on an mprotect call,
54:22
and it notes that the current stack pointer is outside the range that it was expecting, so it terminated that thread. So that's just kind of a fun extension, I guess, demonstrating some defensive use cases that you could do on ARM-based cores.
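The stack pivot check shown in the demo reduces to one comparison at syscall time: is the saved stack pointer inside the range the thread's stack is expected to occupy? A sketch of that policy, assuming the monitor already knows each thread's stack bounds; the set of monitored syscalls, the thread name, and the addresses are hypothetical, and a real implementation would run in the interrupt handler against saved register state.

```python
from dataclasses import dataclass


@dataclass
class ThreadInfo:
    name: str
    stack_low: int    # lowest valid stack address (inclusive)
    stack_high: int   # one past the highest valid stack address


# Hypothetical policy: only check calls commonly abused by ROP chains.
MONITORED = {"mprotect", "mmap", "execve"}


def check_syscall(thread, syscall, saved_sp):
    """Return "allow" or "block" for one trapped system call."""
    if syscall not in MONITORED:
        return "allow"
    if thread.stack_low <= saved_sp < thread.stack_high:
        return "allow"
    return "block"    # stack pointer is outside the thread's stack: pivot


binder = ThreadInfo("Binder_1", stack_low=0xB6D00000, stack_high=0xB6E00000)

normal = check_syscall(binder, "mprotect", 0xB6DFF000)
pivoted = check_syscall(binder, "mprotect", 0xB3A01000)  # e.g. heap address
ignored = check_syscall(binder, "read", 0xB3A01000)
```

The second call is blocked because the stack pointer lies outside the expected range, which corresponds to the condition the demo output reports before the thread is terminated.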
54:43
With pretty light overhead, you know, under five percent. So, potential future work. I got sidetracked from doing this originally, but that first demo I showed you of tracing, I'd like to be able to do that on some basebands. And then also, if you're able to do something similar on the iOS kernel,
55:04
that would probably be more valuable than Android. So I know I'm running low on time, but this slide's important to me. I need to recognize some folks at Endgame. Cody Pierce has been extremely supportive in encouraging me to publish.
55:22
Eric Miller is a mobile researcher at Endgame who helped me get some devices and provided feedback on some ideas. Jamie Butler helped educate me on some rootkit techniques, and there are several others at Endgame that wish to remain anonymous. And
55:40
then I'd like to acknowledge the researchers that have kind of pioneered PMU-assisted security research, because I think it's a pretty interesting subject. So with that, I don't know if there's time for questions. If not, I'll be around
56:00
all weekend, so feel free to ask anything.