Emulation driven Reverse Engineering for Finding Vulns
Formal Metadata
Number of Parts | 85
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/62182 (DOI)
Transcript: English(auto-generated)
00:00
This morning, we have a prerecorded talk for you. Atlas is gonna be talking to us about emulation-driven reverse engineering. Please sit back and enjoy. I unfortunately was unable to make it out there this year due to family health issues, but hopefully I'll be back there next year. Thankfully, Nikita was kind enough
00:23
to let me record something and send it out to you. Hopefully you'll enjoy it. Now, some of you have seen me speak in the past, and if you've seen me talk in the last few years, you've noticed that I've talked a lot about emulation. I've talked a lot about symbolic execution and emulation
00:43
and static analysis for reversing and bug hunting. So today's talk, unsurprisingly, is about emulation. Today is about emulation-driven RE. You'll figure out what that means in a few minutes. Basically, how do we let the computer
01:00
do more of the hard work so you can do more of the fun stuff and not rot your brain? So a little bit about who I am. Many of you know me. I broke out of VMware with an amazing team, done a lot of power grid hacking, created a tool called RfCat, done a lot of car hacking lately. I'm a vivisect core dev.
01:23
I played a lot of capture the flags. Some might say that I'm addicted. More importantly, I am a Jesus follower, a daddy and a husband, and I'm a principal researcher for a company named GRIMM. Just to give you a quick refresher before we jump into the fun stuff on emulation,
01:41
what is emulation? How do you do emulation? Oh my gosh, you're going crazy. Emulation is actually very simple. You have to keep track of registers. You have to keep track of memory, and you have to implement the instructions that an architecture would, the ones that act on the memory and the registers.
02:02
You might implement peripherals if you're trying to go a little bit further. I know that QEMU and a lot of the emulators that you see definitely do. For our purposes, it's not always important. Sometimes it can help. Sometimes you will also implement system calls and interrupts and handle those.
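As a minimal sketch of those three moving parts (registers, memory, and instruction handlers), here is a toy emulator. The instruction set, register names, and the `TinyEmu` class are invented for illustration and do not correspond to any real architecture or to vivisect's emulators.

```python
class TinyEmu:
    """Toy register machine: registers + memory + instruction handlers."""

    def __init__(self):
        self.regs = {"r0": 0, "r1": 0, "pc": 0}  # register state
        self.mem = bytearray(64)                 # flat memory

    def step(self, insn):
        op, *args = insn
        if op == "movi":      # movi reg, imm
            self.regs[args[0]] = args[1] & 0xFFFFFFFF
        elif op == "add":     # add dst, src
            total = self.regs[args[0]] + self.regs[args[1]]
            self.regs[args[0]] = total & 0xFFFFFFFF
        elif op == "stb":     # stb addr, src (store the low byte)
            self.mem[args[0]] = self.regs[args[1]] & 0xFF
        else:
            raise ValueError(f"unimplemented instruction: {op}")
        self.regs["pc"] += 1

    def run(self, program):
        # Fetch and execute until the program counter runs off the end.
        while self.regs["pc"] < len(program):
            self.step(program[self.regs["pc"]])


emu = TinyEmu()
emu.run([("movi", "r0", 40), ("movi", "r1", 2),
         ("add", "r0", "r1"), ("stb", 0, "r0")])
```

Peripherals, system calls, and interrupts would be additional handlers hung off `step`; they are left out here on purpose.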
02:22
So what does emulation do for us? Well, in many cases, we can write analysis modules using an emulator, a lightweight emulator that can answer questions that might be a lot harder to answer or at least abstract to cover a whole bunch of different circumstances
02:40
and architectures. You can also use an emulator to get immediate context within a function, and I'll show you that in just a minute. And basically, the answer of the day, emulation does the thing you need it to. So it's like you're creating these little minions that you go off and you tell them to do this thing, and they return back with this big milk crate full of junk
03:04
that may have really awesome stuff, and it may have a little bit of cruft. You gotta weed through it, or you write your code to be more intelligent so it weeds through it even better. So a little bit more beyond the idea of emulation, because you guys have run into emulation before, whether it's Game Boy emulation or QEMU.
03:23
Partial emulation is the idea of emulating code that you don't have the full context for. So most emulators that you would run into, probably all the emulators you knowingly ran into, start at a given point, like the start of a binary program or the start of a firmware, and you start off and it's got the initialization process,
03:43
and it goes through everything, setting up everything. And gee, that sounds a lot like execution, and you can debug execution. Partial emulation allows you to get around some of the issues of getting full context by providing things like safe reads and writes.
04:02
If the emulator runs into a read of a memory location that doesn't exist, or a write, it'll just say, let's pretend that that worked. And on a read, it'll just give you the correct number of bytes back of some known value so that you can say, oh, hey, that looks like a lowercase a or a bunch of them. In our case, that's what we'll run into for safe reads.
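The safe read and write idea can be sketched as a memory object that never faults. This is an illustration only: `SafeMemory` and its methods are hypothetical names, not vivisect's API, and the lowercase `a` filler simply mirrors the value mentioned in the talk.

```python
class SafeMemory:
    """Memory that pretends unmapped accesses worked, and logs everything."""

    def __init__(self, fill=b"a"):
        self.maps = {}    # base address -> bytearray
        self.fill = fill  # byte handed back for unmapped reads
        self.log = []     # (kind, address, size) tuples

    def add_map(self, base, data):
        self.maps[base] = bytearray(data)

    def _find(self, addr, size):
        for base, buf in self.maps.items():
            if base <= addr and addr + size <= base + len(buf):
                return base, buf
        return None

    def read(self, addr, size):
        hit = self._find(addr, size)
        if hit is None:
            # Unmapped: hand back recognizable filler instead of faulting.
            self.log.append(("safe-read", addr, size))
            return self.fill * size
        base, buf = hit
        self.log.append(("read", addr, size))
        return bytes(buf[addr - base:addr - base + size])

    def write(self, addr, data):
        hit = self._find(addr, len(data))
        if hit is None:
            # Unmapped: pretend the write worked.
            self.log.append(("safe-write", addr, len(data)))
            return
        base, buf = hit
        buf[addr - base:addr - base + len(data)] = data
        self.log.append(("write", addr, len(data)))


mem = SafeMemory()
mem.add_map(0x1000, b"\x01\x02\x03\x04")
mapped = mem.read(0x1000, 4)
unmapped = mem.read(0xDEAD0000, 4)
mem.write(0xDEAD0000, b"xx")
```

Seeing the filler value later in a register or operand is then a hint that the data came from somewhere the emulator had no context for.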
04:23
We can log reads and writes through an emulation pass. So if you run through a function, emulate through it, you can have all the things that are read from and written to tracked for you, as well as the path
04:40
that your emulation pass carved through the function or code. You can also snap in analysis monitors or emulation monitors. An analysis monitor is just a special version of an emulation monitor with a lot of goodies wrapped in. And an emulation monitor is simply something that gets to watch as emulation happens and maybe make decisions, or at least track information.
05:03
And we'll talk about that in a few minutes. And then taint tracking. We've got a great taint engine where we can say, hey, just give me a number. I'm gonna shove that number in here, and that number really means uninitialized register RDX, or argument zero, for example.
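A taint engine of this kind can be approximated as a table that hands out recognizable numbers and remembers what each one stands for. The sketch below is an assumption-laden illustration: `TaintTable`, the base constant, and the step size are made up here and are not vivisect's implementation.

```python
class TaintTable:
    """Hands out marker values and remembers what each one means."""

    BASE = 0x41560000  # arbitrary marker range chosen for this sketch
    STEP = 0x2000      # gap between taints so small offsets still resolve

    def __init__(self):
        self.taints = {}
        self.next_value = self.BASE

    def add(self, kind, detail):
        value = self.next_value
        self.next_value += self.STEP
        self.taints[value] = (kind, detail)
        return value

    def lookup(self, value):
        # Resolve a value (possibly taint + small offset) back to its meaning.
        for base, meta in self.taints.items():
            if base <= value < base + self.STEP:
                return meta, value - base
        return None


taints = TaintTable()
rdx = taints.add("uninitreg", "rdx")  # stands in for uninitialized RDX
arg0 = taints.add("funcstack", 0)     # stands in for argument zero
```

If code adds 8 to the tainted RDX value and dereferences it, `taints.lookup(rdx + 8)` still reports "uninitialized rdx, plus 8", which is the kind of context the talk relies on.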
05:25
So this kind of feels like baby monitor protocol and in a lot of really amazing ways, it is. In a good way, don't be offended. So a few examples of emulation at work. We're gonna talk about four specific ones today. Immediate context, we've already gone over it before.
05:42
If you've seen one of my previous talks, but it's still so amazing and so easy to use that I gotta call it out. We're gonna talk about the built-in vivisect i386 calling convention analysis module because it uses analysis monitors, it uses a lot of the emulation stuff that we're talking about. And then we'll go on to a little gizmo that I created
06:02
and continue to implement on my own called func recon or function recon. I implemented it as a vivisect extension. Vivisect is my tool of choice and it provides a ton of these emulation, partial emulation toys for me. And then at the end, the whole reason that you're here
06:21
and not at a previous talk is Ninja EMU and how we can use emulators to drive our reverse engineering. So starting off, immediate context. So built into vivisect, when you're displaying a function, you can right click anywhere in the function
06:42
and go to the function sub-menu and there's an option called show emulator state. Basically what's gonna happen is when you click on this, vivisect is going to spin up a workspace emulator which has all the bells and whistles with safe reads and taint tracking and path tracking and all that.
07:03
And it will say, I'm going to find a path and emulate to this location. And when I do, I'm going to spit out the context of that path. So at the bottom of the screen is the output from show emulator state: running emulator to, and there's that address, 32D0E. And you can see that right in the function title
07:22
at the head or on the top. Showing register magic state at that location, stack delta zero. We haven't actually changed the stack at all. Okay, that's good to know. And here's the instruction: at 32D0E is the LDRH R2
07:41
from a dereference of the R2 register. This is an ARM function. Now it says, I know these operands. Operand R2: I know that at this emulation point it is 0x08010e0c.
08:04
And here's the decimal version of it. And the other operand, it's just a dereference of the register R2, shows the dereferences of R2, same address, it's the same register. And it spits back capital A's, 41414141.
08:23
Now the original safe reads used to read all capital A's. We shifted to lowercase A's for various reasons. And actually it is settable now. So this can be wildly beneficial while trying to reverse
08:42
through a complicated function, trying to get an idea of where you've taken an argument and you've added to it and you've subtracted from it and yada, yada, yada down the line. Next example, we're going to talk about calling.py, the analysis module used for Intel
09:03
i386, in other words, 32 bit functions, to identify special things like the arguments to a function call, the local variables that are used in a function call, how deep the stack goes, and including something cool called mnemonic distribution,
09:21
where it goes through all the instructions in a function and it just tracks how many uses of each are in the function. So if you've got 32 moves, it'll say MOV 32, and any pushes, pops, other reads, writes, XOR. XOR is a pretty interesting one.
09:41
And it just stores that as part of the metadata for the function and which can be very useful and particularly when finding interesting things like hash functions, crypto functions, things like that. One example of using emulation in vivisect on a daily basis is calling convention identification.
10:01
So for example, what I'm talking about is this function right here, sub_020A030. In this code it has a number of arguments. It is identified as a System V AMD64 call and has these stack variables identified, and we can go through and we can name them
10:21
as we do analysis. We're just going to use Intel 386 as the example here. Each of the architectures has its own version. We have an analysis module which starts out as analysis modules do, using analyzeFunction. calling.py has an analyzeFunction.
10:43
We hand in a vivisect workspace and a function virtual address. It immediately spins up an emulator and this analysis monitor, snaps in the emulation monitor right here, and then runs the function. Here it says run function. Here's the virtual address, max hit one.
11:02
So it does all the paths. If you're running into something you've done before, quit. Down here, it then calls build function API based on the results of that emulation pass. So it hands in the workspace, the function address, the emulator itself, and the emulation monitor.
11:23
And what we get back from that allows the setting of all the things that vivisect makes use of. So let's take a look at that build function API. Specifically, one of the things that it does is it grabs the number of arguments
11:43
and it starts off determining that using the emulation monitor's tracking of the stack max. What is the maximum stack address that was accessed? Because for Intel 386, you put your arguments on the stack. So the function accesses them upward from the stack base.
12:05
So if we end up with more than 40 arguments, we think maybe that's a little weird. So we default to cdecl, the calling convention in i386. There are dozens of calling conventions, and the compilers all went nuts and thought their stuff was the best.
12:22
But then if we return using ret with a number on Intel, that means return but also clear this much space off the stack, because that was arguments. Basically, it's callee cleanup. We have a different way to identify the arg count
12:43
because the ret byte count divided by four, 32 bits each, actually is the number of arguments, clear and simple. We then go on to say, hey, any uninitialized registers that are used
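The two argument-count estimates described here can be sketched in a few lines. This is a simplified illustration, not the code in calling.py: the function name, the meaning given to `stack_max` (bytes of argument area touched above the return address), the choice to report zero arguments for an implausible stack access, and the use of "stdcall" as the label for callee cleanup are all assumptions of this sketch.

```python
def guess_argc(stack_max, ret_bytes, ptr_size=4, max_args=40):
    """Estimate (calling convention, argument count) for a 32-bit function.

    stack_max: bytes of argument area accessed above the return address.
    ret_bytes: the N in `ret N`, or 0 for a plain `ret`.
    """
    if ret_bytes:
        # Callee cleanup: the bytes popped by `ret N` are the arguments.
        return "stdcall", ret_bytes // ptr_size

    argc = stack_max // ptr_size
    if argc > max_args:
        # Implausibly deep stack access: distrust it, keep the default.
        return "cdecl", 0
    return "cdecl", argc


plain = guess_argc(stack_max=12, ret_bytes=0)
cleanup = guess_argc(stack_max=12, ret_bytes=8)
weird = guess_argc(stack_max=400, ret_bytes=0)
```

The `ret N` path wins when it is available because it is an exact count, while the stack-max path is only an upper bound based on what one emulation pass happened to touch.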
13:00
in the function without an initialization, let's figure that out and identify the calling convention from there. Because cdecl and stdcall are very common, but there are a whole bunch of derivations off them that add in handing in arguments in EAX and ECX and yada yada. So we have a dictionary that we look up
13:21
based on the undefined registers. And voila, we got our calling convention and our argument count. All right, the next example we're gonna talk about is isProbablyCode. This is another internal function of vivisect that heuristically determines what a pointer points to,
13:43
particularly looking for executable code. It's part of a family of is-probablies, and we'll show that in just a second. It starts out by spinning up a workspace emulator with partial EMU, bells and whistles. It attaches an analysis monitor
14:01
and then emulates every branch. Now, one of the cool things about our partial emulation, if I didn't mention it earlier, is the ability to say, hey, we're going to go to a conditional branch. Yeah, let's do both. So I'm going to take one path this time. We'll just store off the state over here. And then when I'm done doing this path,
14:21
I'll come back and I'll emulate through that path as well, given the same state that we hit right there. Wildly powerful. And so we hand in the argument max hit. So sometimes as we're going through a function, we'll get to a loop or we'll get to something
14:41
where we've got a couple of different paths that end up in the same place. And as soon as we run into the same opcode twice in the same location, then we just say, ah, we're good. We have enough, which allows us to get through all the code without triggering the halting problem
15:02
and using up all the time, all the RAM, all the resources in the universe. And at the end, make a decision. Is this code? Is it not? So this example of using partial emulation is built into vivisect proper as an analysis module.
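The "take one branch now, save the other for later, stop when you hit something you've already seen" strategy can be sketched over a toy control-flow graph. This is only an illustration of the traversal: `explore`, the dictionary-based graph, and the tuple used as "saved state" are assumptions, and a real emulator would snapshot registers and memory instead.

```python
def explore(cfg, entry, maxhit=1):
    """Walk every path through cfg, visiting each address at most maxhit times.

    cfg maps an address to the list of addresses that can follow it.
    Returns the addresses in the order they were emulated.
    """
    order = []
    hits = {}
    pending = [(entry, ())]  # (address, saved state); state is just the path

    while pending:
        addr, path = pending.pop()
        while addr is not None:
            if hits.get(addr, 0) >= maxhit:
                break  # seen often enough: end this path here
            hits[addr] = hits.get(addr, 0) + 1
            path = path + (addr,)
            order.append(addr)

            successors = cfg.get(addr, [])
            # Follow the first successor now; queue the rest with this state.
            for other in successors[1:]:
                pending.append((other, path))
            addr = successors[0] if successors else None
    return order


# 1 is a conditional branch, and 2 loops back to 1.
toy_cfg = {0: [1], 1: [2, 3], 2: [1], 3: []}
visited = explore(toy_cfg, 0, maxhit=1)
```

Because each address is capped at `maxhit` visits, the loop between 1 and 2 terminates, and the saved state at the branch still lets the walk reach 3.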
15:21
I want to take you through the core code here. analyzePointer is called from a number of places, wherever viv finds a pointer. And it looks first to see if there's already a location specified, because at that point you don't care. Is it probably Unicode? It checks to see if it's Unicode, and then it checks to see if it's a string.
15:42
And then it calls isProbablyCode. So isProbablyCode makes use of partial emulation, a vivisect emulator or a workspace emulator. We hand in a virtual address, the location that we're looking at to see if it's code. And it does a few simple checks like,
16:02
hey, is this executable? If not, probably not code. Or is there a function signature? If so, very likely it is. Another check is, have we already run this? We then set down the log level so we can emulate through nasty stuff that's not code and not throw error messages everywhere.
16:21
We then call getEmulator. And since this is the workspace itself, it's calling it on itself. Normally I will name a workspace vw, so you'll often see vw.getEmulator in my code. And it hands in anything that we happen to hand in to isProbablyCode.
16:41
It then creates this watcher object, which is an emulation monitor that we'll look at in just a second. We call setEmulationMonitor on the emulator that we just created. And we hand in the watcher and then we try to run the function.
17:00
We hand in specifically the virtual address that we want to run and the max hit. So if you remember, a lot of partial emulation isn't straight through code. It actually says, oh, there's a branch here, save that for later, we'll run it again later. And we're just gonna continue through one branch.
17:20
And so this is a way of saying emulate everything. But if you run into code you've already run into in some other branch, just stop. Call that an end. And that allows us to actually emulate through every part of the code without eating up a ton of time and give relatively good context to each instruction.
17:45
If we throw an exception, then no, it's not code. So we just store that it's not code. And we then do a check at the end. Hey watcher, does it look good? Is that code cool? If it is, we store it, if not, we move on. So let's take a look at this watcher,
18:01
which is an emulation monitor. And it stores a number of different local fields and has this ability to log anomalies. If something goes weird, we can log anomalies. And then it has looksgood and iscode, but specifically this emulation monitor has a pre-hook.
18:27
So before each instruction is emulated, this pre-hook code is run. And so we hand it an emulator, the emulator we're using, we hand in the op code that we've parsed out
18:41
and the starting instruction pointer. We check whether our opcode is in our list of bad opcodes. We've generated a list of bad opcodes that are very, very common when you're looking through firmware or code, including things like, how do you parse out zeros?
19:00
How do you parse out all ones, like FFFFFF? And so we'll parse that out and we'll get whatever instruction that is. And then we have a list of those we compare against. A bad op means we're emulating into something that's very likely just all zeros or all ones.
19:20
And so we throw an exception there. We then look through and say, hey, have we run into a return in any of the code paths that we've emulated through? If we have, probably pretty good. So done with that. We then grab the location of the current instruction
19:42
as stored in the Vivisect database. If the type that we run into is not an opcode, okay, we're done. This is an API built into Vivisect where we'll go through and analyze, does this not actually return?
20:01
So there are many functions, particularly in libc or Windows kernel32, that legitimately don't return. Many of them are error codes or exits or things like that where you go here and you're just done. So we work hard to identify them. And so we store that information, make it available to the API.
20:22
Is this a non-return VA? If it is, then we say, okay, this is known not to return. So we're gonna say that it has a return because we want it to look good. And that's very likely code again. And then we tell the emulator to stop. So at the end, we call,
20:43
after we've emulated through the function with a max hit of one, as long as it didn't throw an exception. And many times when you have code, if you're trying to emulate things that aren't code, you'll throw an exception. Most often it's just not a valid instruction. And so this will catch most of the things.
21:01
And if not, we ask the watcher, does this look good? looksgood is pretty simple. If it doesn't have a return set or there is bad code, then return false. It doesn't look good. Otherwise, if you have just repetitions
21:21
of the same instruction, you can end up with what looks like good code, but it just really isn't. And so we look through the mnemonic distribution and we do some heuristics here and say, you know, if we're over a certain percentage of one instruction, then that's probably not code either.
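The watcher logic walked through above (a pre-hook that rejects bad opcodes, notes returns, and counts mnemonics, followed by a final "looks good" check) can be sketched as below. This is a standalone illustration rather than vivisect's watcher: the `Op` tuple, the bad byte patterns, the driver function, and the 67 percent threshold are assumptions, since the talk only says "a certain percentage".

```python
from collections import Counter, namedtuple

# Hypothetical decoded instruction: mnemonic, raw bytes, return flag.
Op = namedtuple("Op", "mnem raw is_ret")


class BadOpBytes(Exception):
    """Raised when emulation runs into filler such as all zeros or all ones."""


class CodeWatcher:
    BAD_RAW = {b"\x00\x00\x00\x00", b"\xff\xff\xff\xff"}

    def __init__(self):
        self.hasret = False
        self.badcode = False
        self.mndist = Counter()  # mnemonic distribution

    def prehook(self, op):
        # Called before each instruction is emulated.
        if op.raw in self.BAD_RAW:
            self.badcode = True
            raise BadOpBytes(op.mnem)
        self.mndist[op.mnem] += 1
        if op.is_ret:
            self.hasret = True

    def looksgood(self, max_share=0.67, min_insns=4):
        if not self.hasret or self.badcode:
            return False
        total = sum(self.mndist.values())
        # One mnemonic dominating a non-trivial run is probably not code.
        if total > min_insns and max(self.mndist.values()) / total >= max_share:
            return False
        return True


def is_probably_code(ops):
    """Toy driver: feed ops to the watcher until a return or an exception."""
    watcher = CodeWatcher()
    try:
        for op in ops:
            watcher.prehook(op)
            if op.is_ret:
                break
    except BadOpBytes:
        return False
    return watcher.looksgood()


ret = Op("ret", b"\xc3", True)
good = [Op("push", b"\x55", False), Op("mov", b"\x89\xe5", False),
        Op("xor", b"\x31\xc0", False), Op("pop", b"\x5d", False), ret]
filler = [Op("add", b"\x00\x00\x00\x00", False)] * 3
repetitive = [Op("nop", b"\x90", False)] * 9 + [ret]
```

In a real monitor the hook would also receive the emulator and the instruction pointer, which is what lets it consult the workspace for existing locations and known non-returning functions.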
21:41
We're gonna just basically reproduce the functionality listed here in isProbablyCode. And we're going to use an area that came from the Hack-A-Sat qualifier rounds this year, something from the Rassanante authenticator. So I've zeroed in, I've got an interesting function here
22:00
and I want to know more about it. All right, so I ran the function, max hit one, and it completed. Okay, so watcher, these are all the details of watcher. Watcher, it's got the workspace, tryva in hex. That's actually the address that we handed in. Does it have a return?
22:21
Yes, it has a return, yay. It has 27 different instructions. Is there bad code? No, so that looks pretty good. So I wanted to show you that because you create an emulation monitor, you store what's in it, you tell it what to do with it
22:41
and it drags along behind. So you can write an analysis module that easily straps in an emulation monitor and you let it go and then you get back the results. You can look at, so here's our mnemonic distribution. Probably one of the coolest things about is the watcher emulation monitor.
23:03
Okay, the next one we're going to look at is the func recon vivisect extension that I wrote. It's my way of easily deciding or determining what kinds of things are going on in a given function branch. So oftentimes you'll start off in maybe a main function
23:22
or you'll have a main loop, something deeper down in, but you've got huge amounts of potential functionality and you can get down a rabbit hole really fast and lose complete sight of where you're going and where you've been and what's next. So I wrote function recon so that I can say,
23:41
oh, going through here, I just want to stay right here. I'll just do a recon of that function, recon of that function, recon of that function and try to get a really fast feeling for what's going on there. So this is part of my own collection at the Icarus collection that I started years ago
24:03
and given a starting point, a function, we emulate through all the paths and all the function calls and we grab information like strings, imports, functions, immediate pointers and indirect branches, which actually isn't going to be a part of today's demo because that's not ready for you yet.
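The shape of that recon report can be sketched over a hand-written table instead of a real binary. Everything here is illustrative: the real extension gathers this data by emulating the code, while this sketch assumes the per-function strings, imports, and callees are already known, and names such as `function_recon` and `decrypt_blob` are hypothetical.

```python
def function_recon(funcs, start):
    """Collect strings, imports, and named callees reachable from start.

    funcs maps a function name to a dict with optional "strings",
    "imports", and "calls" lists.
    """
    report = {"strings": [], "imports": [], "functions": []}
    seen = set()
    todo = [start]

    while todo:
        name = todo.pop()
        if name in seen or name not in funcs:
            continue
        seen.add(name)
        info = funcs[name]

        for key in ("strings", "imports"):
            for item in info.get(key, []):
                if item not in report[key]:
                    report[key].append(item)

        for callee in info.get("calls", []):
            if callee not in funcs:
                continue
            # Names that are not the default sub_ form are worth reporting.
            named = not callee.startswith("sub_")
            if named and callee not in report["functions"]:
                report["functions"].append(callee)
            todo.append(callee)
    return report


toy_funcs = {
    "sub_1000": {"strings": ["authdata.bin", "rb"],
                 "imports": ["fopen", "fseek"],
                 "calls": ["sub_2000", "decrypt_blob"]},
    "sub_2000": {"strings": ["rb"], "imports": ["fopen", "ftell"],
                 "calls": ["sub_1000"]},
    "decrypt_blob": {"strings": ["using key"], "imports": [], "calls": []},
}
report = function_recon(toy_funcs, "sub_1000")
```

The `seen` set keeps mutually recursive functions from looping forever, which plays the same role as the max hit limit during emulation.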
24:22
So now let's show you function recon in action. We're gonna go back to our Resenante authenticator from Hack-A-Sat, and I've picked a function here that's kind of annoying, but there's a lot of stuff going on here.
24:41
I'm just gonna say, let's run function recon right here. See that, well, let's magnify this. Okay, so I'm looking through and I see HMAC,
25:01
I see delete auth, create auth, liborbital tautup. I'm seeing decrypting protected data, I like that. Authdata.bin, that's something that they provided along with the binary. I see unset, I see RB, so we're probably gonna be reading something with binary.
25:20
I see decrypting, using key, decrypted, blah, blah, blah, file, and I see error messages. And then I see rollover period and latitude and longitude. We're talking about Hack-A-Sat, so that's to be expected. HMAC key, secret key, allow me to make a lot of sense
25:41
out of this function without actually having to delve into it. I also see that it's importing a number of things, including dlopen and dlsym. That's a dynamic library loader and symbol resolver. We also see allocator and a bunch of C++ standard lib
26:03
function calls, which is probably why I chose this one. fopen, fseek, ftell, rewind, operator new. So those are the imports. And then it goes down and says, okay, do I recognize any functions that don't start
26:20
with the name sub_, which is the default unknown function name? And so we see actually the PLT entries for a lot of these. So I haven't updated the string version as recently as I've updated the non-string version where
26:41
we can see here are the different strings that are interesting, again, the same ones. Here are the imports that are referenced. And again, these are, I like this better than the text version right now,
27:02
because it just reads nicer. I can select and show things and I can actually drag that into, I can drag that into someplace and view what's going on there.
27:24
Down here, looking at the addresses, calling functions, you know, what other functions that we saw. And then we've added in, since the strings were updated, we've added in a search for immediate values. So if there are immediate values that we found
27:42
during emulation, including where we build a pointer. On Intel, this isn't as big a deal, but on many RISC architectures, actually almost every other architecture besides Intel, you'll find the code building absolute addresses, because you've got a fixed-width instruction set
28:01
that simply can't store the entire address in the instruction. So it'll say, okay, load this half and then add this half to it, or OR it in, or whatever. So the emulator will go through that and at each point say, hey, I got this number here, is it interesting to you? And so we store immediate values: zero is referenced 152 times,
28:22
one is 34, kind of like you'd expect. And then we've got different smaller numbers. Safe reads will give you, in this case, lowercase a's, so hex 61616161.
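A small sketch may help make the immediate-value search concrete. It assumes a 32-bit target that builds addresses from two 16-bit halves, and it uses a plain counter in place of the emulator's callback; `build_address`, `tally_immediates`, and the sample values are all hypothetical names and data chosen for illustration.

```python
from collections import Counter

SAFE_READ_FILL = 0x61616161  # "aaaa", the safe-read filler mentioned in the talk


def build_address(high_half, low_half):
    """Combine two 16-bit immediates, as a load-upper / OR-lower pair would."""
    return ((high_half & 0xFFFF) << 16) | (low_half & 0xFFFF)


def tally_immediates(observed):
    """Count how many times each immediate value was seen during emulation."""
    return Counter(observed)


# Hypothetical values an emulator callback might have reported.
observed = [0, 0, 0, 1, 0x400, build_address(0x0040, 0x1F7C), SAFE_READ_FILL]
counts = tally_immediates(observed)

for value, hits in counts.most_common():
    note = "  <- safe-read filler, read from unmapped memory" if value == SAFE_READ_FILL else ""
    print(f"{value:#x}: {hits}{note}")
```

A static scan of single instructions would only ever see the two halves; it is the emulated result of the pair that produces the full address worth reporting.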
28:42
So if we see that, we know that we've read from someplace that we didn't know what to do with: it wasn't in an existing map, or it was trying to reference something built on an argument that wasn't legitimate. You'll also see these 4156 values,
29:01
these represent vivisect emulator taints. So at the beginning of an emulator pass, the emulator defaults to loading all the things that could hold arguments for a function call with a taint that indicates what it is. And all the registers are pretainted with a taint
29:23
that returns, oh, this is uninitialized ECX or RCX or whatever. So as you can see, there's a lot of potential values. So if you see like the number 1024, or you see the number 8080, or you see the number C00, blah, blah, blah,
29:41
some error message. These are numbers that just inherently make sense to you once you've been reversing in particular areas long enough, and it can make your life a lot easier, like hex 7DF, for example, if you're an automotive person. Hey, next, let's jump on and take a look at vivisection.
30:02
Vivisection is a huge helper, trying to give you all the access to be powerful in emulation and driving your reverse engineering using an emulator. So the problem is reversing is,
30:21
I couldn't think of a good word. It's hard, it's tedious, it's exhausting. Yeah, it's all these things, but I don't want to sound like a wimp because reversing is really fun. I get a huge rise out of it. It's what gets me up in the morning, that and making tools that allow reversing to be easier, but it can really rot your brain. You can get lost down big rabbit trails.
30:40
I just recently had to reverse engineer and do vulnerability research on a modified form of Apache as part of a CTF. It was insane. Brain drain is very real. So how do we make tools that allow for great understanding and easy setup, so that it's a tool where we're not like,
31:02
I'm not sure if I really want to do that, it's heavy. No, it's easy, let's make that easy. We want to make it so that it's repeatable, something where I don't have to get the program to a certain point with certain inputs with the debugger and stand on one toe and hold my hand up here.
31:20
And then maybe it might work. No, I don't want to have to. Again, easy setup. I want it to be repeatable easily. I want it to limit my brain drain. And maybe be a little fun. I find it very fun. So that's why I'm actually talking about it today. So Ninja EMU is a part of vivisection, and it basically wraps a workspace emulator
31:43
that we can get from a vivisect workspace. It provides interactive debugging, kind of like a more intelligent version of GDB. I wouldn't go as far as GEF or one of those full GDB environments, but something more along the lines of
32:01
just giving you what you need at each step along the way. So Ninja EMU, in addition to this UI, allows you to hook function calls and then replace them with Python code. So let's say you have a function call into some logging thing in the target binary
32:23
where you know the arguments, you can kind of just write it in so you're not actually stepping into custom focused code, or maybe the code is fraught with peril, like, you know, things that need a ton of setup. And you're like, yeah, I know what that is.
32:41
I don't need to know how to set it up. Let's just go. We can hook, oh, one of the biggest uses that I've found is I hook malloc, calloc, heapalloc, all the things that actually do a memory allocation. I hook and I emulate, and I've got basically a special heap implementation
33:03
that actually doesn't free anything, so you end up tracking everything. It's not really good for heap grooming, but it's amazing for tracking and seeing what's going on in a system. You can hook system calls, like interrupt calls, int 0x80 or int 0x2e, not a problem. In the process of doing all this
33:21
in order to support all the function calls that I wanted to emulate, we've created three primary fake OS kernels. We've got the fake Win32 kernel. We've got the fake POSIX kernel for Linux, whatever. And we've got a raw kernel, which basically is just kind of fitting in the kernel space,
33:43
but it's for firmware that doesn't actually have a kernel. But it allows us a great deal of flexibility to expand and extend. We provide our own heap. We provide rudimentary file system ability, including the ability to create fake files just by name.
34:02
Hey, here's a path. Here are the bytes that are in that file and some attributes for the file. Or we also allow the mapping of a particular file directory, a fake directory space, and we redirect that into a real directory space
34:23
in your host operating system. So be warned and be careful. Right now, we don't actually protect against directory traversal. So if you're reversing something
34:40
that knows that you're reversing it in vivisection or in Ninja EMU, they may be able to do shenanigans. So just be warned. We've also developed file policies that try to help keep you from shooting yourself in the foot. We also have a fake registry because a lot of times when you're reversing Windows programs,
35:03
a fake registry is pretty important. And so you can just stick in your own stuff and satisfy what you need to get by. And we also do environment variables because those are very important as well. They're all hackery. This is not something that's intended to run an enterprise software solution on.
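To make the fake-file and directory-mapping idea more concrete, and to show what a traversal check could look like, here is a minimal sketch. It is not the tool's implementation (the talk says no such protection exists yet); the `FakeFileSystem` class, its method names, and the guest paths are all assumptions for the example.

```python
import os


class FakeFileSystem:
    def __init__(self):
        self.fake_files = {}   # guest path -> (bytes, attributes)
        self.mapped_dirs = {}  # guest directory -> real host directory

    def add_fake_file(self, path, data, attrs=None):
        self.fake_files[path] = (bytes(data), dict(attrs or {}))

    def map_directory(self, guest_dir, host_dir):
        self.mapped_dirs[guest_dir.rstrip("/")] = os.path.realpath(host_dir)

    def resolve(self, path):
        """Return ("fake", data) or ("host", host_path) for a guest path."""
        if path in self.fake_files:
            return ("fake", self.fake_files[path][0])
        for guest_dir, host_root in self.mapped_dirs.items():
            if path.startswith(guest_dir + "/"):
                relative = path[len(guest_dir) + 1:]
                candidate = os.path.realpath(os.path.join(host_root, relative))
                # Refuse anything that resolves outside the mapped host directory.
                if os.path.commonpath([host_root, candidate]) != host_root:
                    raise PermissionError(f"path escapes mapped directory: {path}")
                return ("host", candidate)
        raise FileNotFoundError(path)


fs = FakeFileSystem()
fs.add_fake_file("/etc/motd", b"hi", {"mode": 0o644})
fs.map_directory("/data", os.path.join(os.getcwd(), "sandbox"))
print(fs.resolve("/etc/motd"))
print(fs.resolve("/data/authdata.bin"))
```

With this check in place, a guest path such as `/data/../../etc/passwd` raises `PermissionError` instead of reaching the host file. Resolving the real path before comparing is the design choice that matters, since a plain string prefix test would miss `..` segments and symlinks.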
35:20
So please don't yell at me if it bites you in the ass. And then Ninja EMU, one of the more recent additions is the ability to drive a function graph to walk through emulation through a graph view. There are two ways that we do this. First of all, we've got in-process function graph driving,
35:43
which means basically from within vivisect, you drop out to a shell from the same terminal that you started vivisect up in. It gives you an IPython or interactive prompt and you set up the Ninja EMU and you run it from there, telling it the name of the function graph that you wanna drive.
36:02
And as you emulate through, it will track it through that function graph. The other way is the more massively online multiplayer version, either using a viv server or a shared workspace where they're intended to have many people all sharing a vivisect workspace,
36:25
we're able to set up a follow the leader session from the EMU and the follow the leader session then just sends out where it's at at any given point in time. And any number of people can follow through, follow along with the emulator as it walks through.
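The follow-the-leader idea can be sketched as a simple publish and subscribe loop. This is an in-process toy under stated assumptions: the real feature runs over a viv server or shared workspace, and the `FollowTheLeaderSession` and `Follower` classes and the addresses below are hypothetical.

```python
class FollowTheLeaderSession:
    """Fan out the leader's current address to everyone who joined."""

    def __init__(self):
        self._callbacks = []

    def join(self, callback):
        self._callbacks.append(callback)

    def publish(self, va):
        for callback in self._callbacks:
            callback(va)


class Follower:
    def __init__(self, name):
        self.name = name
        self.history = []

    def on_location(self, va):
        # A real client would scroll its function graph to `va` here.
        self.history.append(va)


session = FollowTheLeaderSession()
alice, bob = Follower("alice"), Follower("bob")
session.join(alice.on_location)
session.join(bob.on_location)

# The leader's emulator would call publish() after each instruction it executes.
for va in (0x401000, 0x401001, 0x401004):
    session.publish(va)

print([hex(va) for va in alice.history])
```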
36:42
It is a lot of fun. A little bit of setup goes a long way. So you can start off with basically not knowing anything and just spin up the emulator and walk through. You'll have taint tracking, you'll have safe reads and writes, but as you start zeroing in on important functions,
37:02
you can then have special setup like, oh, make this the arg list. And I wanna hand in this string, which will generate a heap allocation for it and then put that value into the right location. I wanna hand in this number and this number, you hand in a list and Ninja EMU sets it all up for you
37:24
and allows you to just continue to make more and more sense. Now let's take a look at this 2005F7C function that I mentioned earlier. It did turn out to be very interesting, but it's a little bland.
37:41
But let's start off by running function recon on this boy. So we don't see a whole heck of a lot. Calls to calloc, memcpy, malloc and free. So pretty generic, but if you think about it, compared to the last thing that we looked at, very limited.
38:01
This isn't really about function recon. This is about making sense of what a function does. And you're gonna have to forgive me because I only have one screen that I can present to you. I'm not going to be able to do as good a job showing off just how awesome this is.
38:22
Typically I will have three monitors running at the same time so I can do things on one and have my disassembler window completely full screen on another and my emulator window open on another. This will be something that you select off of a context menu for a function. But for now, let's walk you through the old fashioned way.
38:43
So for this example, a lot of times when I'm using a Ninja emulator, I am starting to set up just function calls that set up the environment the way that I want. This is not necessary, but as I dive deeper and I do more,
39:04
I want to set up the call with arguments the way that I want them. So if we don't, we just jump in and we say, import vivisection.
39:22
Let's just say our name here is vivisection. Oh, okay. Okay.
39:45
Okay, so we start off, you look at the registers are printed here and you recognize these now as being tainted. Doped, if you will. This user interface prints out the address,
40:01
the bytes at that address, the instruction that we get there. And then as a comment, our first operand, because notice that this actually has an operand, which is the weirdest thing ever, is, and this is translating from the taint value,
40:21
because as you can see it, EDX is the actual, the full thing is RDX and it is a taint value. And so that taint value translates back into uninitialized register RDX. So we know that. Our next thing, push rbp,
40:40
which we also helpfully spit out that that's an uninitialized rbp register. And now we move rsp into rbp. Now rsp is initialized, but still this is saying our first operand is an uninitialized rbp. So let's create some space on the stack.
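The translation from a taint value back to a description, as seen in these register comments, can be sketched roughly as follows. The encoding here is an assumption made for the example (a base echoing the 4156 prefix and a fixed spacing), not the emulator's actual scheme, and `TaintTable` and `pretaint_registers` are hypothetical names.

```python
TAINT_BASE = 0x41560000  # hypothetical base, chosen to echo the 4156 prefix
TAINT_STEP = 0x1000      # hypothetical spacing between taint values


class TaintTable:
    def __init__(self):
        self._by_value = {}

    def new_taint(self, kind, detail):
        value = TAINT_BASE + len(self._by_value) * TAINT_STEP
        self._by_value[value] = (kind, detail)
        return value

    def describe(self, value):
        # Round down so a tainted value plus a small offset still resolves.
        key = value & ~(TAINT_STEP - 1)
        entry = self._by_value.get(key)
        if entry is None:
            return None
        kind, detail = entry
        return f"{kind} {detail}"


def pretaint_registers(table, names):
    """Give every register a unique taint before emulation starts."""
    return {name: table.new_taint("uninitialized register", name) for name in names}


table = TaintTable()
regs = pretaint_registers(table, ["rdi", "rsi", "rdx", "rbp"])

edx = regs["rdx"] & 0xFFFFFFFF  # reading the 32-bit view of rdx
print(hex(edx), "->", table.describe(edx))
```

Because each taint is a recognizable number, it survives being copied, pushed, or offset by a few bytes, and the UI can still report where the value originally came from.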
41:03
Now what you're not able to see, and I'm going to move over because you don't need to see everything going on with the registers all the time. Do notice that registers that change, they get highlighted. But I'm just going to drag this off over here like this for a moment,
41:22
so we can see what's going on on this side of the fence. Okay. Oh, nothing is going on on this side of the fence.
41:41
Why, you might ask? Well, because I didn't tell it to. This is all stuff that's often set up and it will be set up for us as we, as we move forward with the release of vivisection. So let's just say,
42:04
yeah, we handed in an emulator, but we didn't actually hand in a GUI func graph name. This is one way to have the emulator drive a function graph around. There's another that is more network friendly where you're going to have a shared workspace and dozens of people can follow along with the emulator,
42:24
but I'm not going to demo that for you today. That is a part of the vivisection release coming up though. So let's start over again. Run step. Okay, we got the same thing as we had. Now, as we hit enter, we notice that our function graph
42:41
highlighted the next instruction. So this allows us to be very emulator driven. Even if I don't want to pay attention to the emulator, I can say, you know, I'm more interested in the disassembler right now. So I'm going to say that's not really that important, but I just hit enter into the emulator UI
43:04
and it moves on and I'm able to now look through as the code goes through. But sometimes, looking, I'm more looking at patterns in here and sometimes I will miss actual detailed things. So sometimes having the emulator force me to just say, oh, okay, cool.
43:20
Going through and actively, visually helping me focus and identify what's going on. I'm going to call into this get EMU, because that's going to return me an emulator for do the crypto thing decrypt. Now I did some work for that name. We basically just say, hey, parse this expression
43:43
and here's a string. I could have used the file name plus an offset or sub underscore blah, blah, blah, whatever the name is. And it takes that string, converts it into an integer. Then grab me an emulator and set that address as the virtual address that we start.
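The step of turning a name or a name-plus-offset string into a start address can be sketched like this. It is a toy resolver under stated assumptions, not the workspace's real expression parser: `parse_expression`, the `SYMBOLS` table, and every address in it are hypothetical.

```python
def parse_expression(expr, symbols):
    """Resolve 'name + offset' style expressions to an integer address."""
    total = 0
    for term in expr.split("+"):
        term = term.strip()
        if term in symbols:
            total += symbols[term]
        else:
            total += int(term, 0)  # accepts 0x-prefixed hex or decimal
    return total


# Hypothetical symbol table: a file base address and two function names.
SYMBOLS = {
    "authenticator": 0x400000,
    "do_the_crypto_thing_decrypt": 0x401F7C,
    "sub_00401000": 0x401000,
}

start_va = parse_expression("do_the_crypto_thing_decrypt", SYMBOLS)
print(hex(start_va))
print(hex(parse_expression("authenticator+0x1f7c", SYMBOLS)))
```

Using names instead of raw addresses in a starter script keeps it readable and lets it survive a rebase of the binary, as long as the symbol table is updated.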
44:01
Let's log reads, let's log writes and turn off safe mem because if it goes wonky and tries to read from someplace that I don't want, then I actually want to know it. So I can come back here and fix this up. And then we hand that emulator, the workspace emulator into this Ninja emulator that's generated here.
44:26
We then read the authdata.bin that I mentioned before and put that into the file. If we don't hand in file data, 'cause I can hand in a string that is the same thing as the file.
44:40
And then I call, this is actually pretty magical. Ninja emulator has a setup call function and then it takes a list of arguments. And if it's an integer, it'll push that to the right location, the stack or the register that's used for this calling convention. And file data is actually a string.
45:02
So it's going to call its internal malloc function, create some space that Ninja emulator manages and shove that string into that space just by handing it in here. We then hand in the length of the file data. So that's just a number. It's going to push that into the register
45:23
for the second argument. We then know that this is a secret key. I know, it's a super secret key. And so the Ninja emulator takes this and says, again, this is a string, malloc something
45:41
that's that size plus some buffer space and store this here and then put that address into the appropriate location for the calling convention. And then here's a special way of handing in an argument. I've labeled it out count. And I'm just going to say, hey,
46:00
this is a 64 bit value of zeros. I could have said, yeah, no, I wanted this to malloc out a location because it's going to be handed in by reference. So this is a call malloc. It'll put in the emulator metadata, this name out count and give you the address.
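The argument setup described here can be approximated with a short sketch. It assumes the System V x86-64 convention (first arguments in rdi, rsi, rdx, rcx, r8, r9) and a toy bump allocator; `ToyCallSetup`, `OutParam`, `setup_call`, and the sample key string are hypothetical stand-ins, not the tool's API.

```python
SYSV_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


class OutParam:
    """A named, zero-filled buffer passed by reference."""

    def __init__(self, name, size=8):
        self.name = name
        self.size = size


class ToyCallSetup:
    def __init__(self, heap_base=0x10000000):
        self.regs = {}
        self.heap = {}   # address -> bytearray
        self.meta = {}   # label -> address, for inspection after the run
        self._next = heap_base

    def _alloc(self, data):
        addr = self._next
        self.heap[addr] = bytearray(data)
        self._next += max(16, (len(data) + 15) & ~15)
        return addr

    def setup_call(self, args):
        if len(args) > len(SYSV_ARG_REGS):
            raise NotImplementedError("stack-passed arguments are not modeled")
        for reg, arg in zip(SYSV_ARG_REGS, args):
            if isinstance(arg, int):
                value = arg                                   # plain number
            elif isinstance(arg, str):
                value = self._alloc(arg.encode() + b"\x00")   # C string
            elif isinstance(arg, (bytes, bytearray)):
                value = self._alloc(arg)                      # raw buffer
            elif isinstance(arg, OutParam):
                value = self._alloc(bytes(arg.size))          # zeroed out-param
                self.meta[arg.name] = value
            else:
                raise TypeError(f"unsupported argument: {arg!r}")
            self.regs[reg] = value


emu = ToyCallSetup()
file_data = b"\x01\x02\x03\x04"
emu.setup_call([file_data, len(file_data), "SuperSecretKey", OutParam("out_count")])
print({reg: hex(val) for reg, val in emu.regs.items()})
```

Recording the out-parameter's address under a label is what makes the later inspection easy: after the run, `emu.meta["out_count"]` points at the buffer the function wrote through.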
46:22
So I can do post-mortem inspection when I'm done. And so that's initialized. So I'm just going to go in and do that there. So auth EMU is this thing that I was just showing you. The get EMU do crypto thing decrypt.
46:42
And I hand it in the workspace and it does the rest. And then I do run step. So if we emulate through, boom, boom, boom, boom. Okay, so RDI, we're going to store RDI. What is RDI?
47:00
RDI is our first heap allocation. And we can look at the heap allocations by typing in heap here. And our heap dump shows, this is what was handed in from the auth data. Here's our,
47:22
here's our passcode, our secret. And here's the output buffer that was allocated. So I want to talk more about this later, but let's just take a look at, this looks like something interesting probably as a string.
47:41
So let's just say, hey, show me that as a string. Sure enough, it's a string. There's a whole bunch of functionality that you can use from this horrible command line. I apologize in advance. It looks horrible. We'll keep working on it,
48:02
but we do have a help that gives you the commands, including quit and go silent. So you don't print out all this memory and register stuff. Then we can do a backtrace. We can, we use the go command to say, hey, go until this virtual address
48:21
or go this number of instructions forward. Then we also have the next instruction. So if we go to a call, for example, and we don't want to actually dive into the call, we can just do NI and skip it. It'll emulate it. It'll actually spit out all the data from the emulation, but you won't have to single step it.
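The stepping commands in this part of the demo (go to an address or for a count, step over a call, run to the next branch) can be sketched on a toy machine. The instruction set, the `ToyEmu` class, and the program below are invented for illustration and are much simpler than a real emulator.

```python
class ToyEmu:
    """Tiny instruction stepper; the 'instruction set' is made up for illustration."""

    def __init__(self, program, start):
        self.program = program  # va -> (mnemonic, target or None)
        self.pc = start
        self.callstack = []
        self.trace = []

    def step(self):
        mnem, target = self.program[self.pc]
        self.trace.append(self.pc)
        if mnem == "call":
            self.callstack.append(self.pc + 1)
            self.pc = target
        elif mnem == "ret":
            self.pc = self.callstack.pop()
        elif mnem == "jmp":
            self.pc = target
        else:
            self.pc += 1

    def go(self, until_va=None, count=None):
        """Run until `until_va` is reached or `count` instructions have executed."""
        if until_va is None and count is None:
            raise ValueError("give an address or an instruction count")
        steps = 0
        while self.pc != until_va and (count is None or steps < count):
            self.step()
            steps += 1

    def nexti(self):
        """Execute one instruction, emulating through a call instead of stepping into it."""
        depth = len(self.callstack)
        self.step()
        while len(self.callstack) > depth:
            self.step()

    def next_branch(self):
        """Run until the current instruction is a branch."""
        while self.program[self.pc][0] not in ("call", "jmp", "ret"):
            self.step()


PROGRAM = {
    0x1000: ("nop", None), 0x1001: ("call", 0x2000), 0x1002: ("nop", None),
    0x1003: ("jmp", 0x1005), 0x1004: ("nop", None), 0x1005: ("ret", None),
    0x2000: ("nop", None), 0x2001: ("nop", None), 0x2002: ("ret", None),
}

emu = ToyEmu(PROGRAM, 0x1000)
emu.go(count=1)     # one instruction forward
emu.nexti()         # over the call at 0x1001, still emulating its body
emu.next_branch()   # stop at the jmp
print(hex(emu.pc), [hex(va) for va in emu.trace])
```

Note that stepping over a call still executes the callee, so its effects and any logged reads and writes are kept; only the single-stepping is skipped.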
48:41
B means emulate to the next branch instead of, again, single stepping. We can show our stack as we did before, but our stack, we can do, we can control how much of the stack we want to show. Apparently I ran past the end of the stack because there's not a hundred entries. I can create new heap items just because,
49:09
like if I'm emulating through and I run into a thing where I'm like, yeah, no, that's really supposed to be a heap alloc that I need to shove some data in. So I can say, yeah, malloc that, give me a hundred bytes.
49:25
And it says, okay, your new chunk is there. So now I can say, hey, make that, and that's a string, and make that equal to yo dog, hello, DEF CON crew.
49:43
Boom. So now we say, what's on the heap? Okay, again. Let's take a look at that as a string. All right. So, I mean, at that point, if we're like, no, that really needs to be in rbx right now,
50:02
we can say rbx equals that address and then refresh. And now we see rbx is that address. And if that's the thing that keeps the thing going, you can go back and update your starter script,
50:20
but you can also keep going. Now, the beauty is if it doesn't work out and I figure out, oh, whoops, I screwed up. It's in the, if I have it in the starter script, I can just say, hey, go back to the starter script and then go to this address and it'll, boom, snap through all the stuff again.
50:44
All right, so moving right along. So much I want to show you. Sorry, I only have a little bit of time. Boom, boom, boom, boom, boom, boom, boom, boom, boom, boom. Oh, branch. Okay, cool, branch. I don't really care what's going on in this. Okay, I really do care, but for the sake of time,
51:01
we do nexti. It emulates through the function, returns. I got a return value. Huh, let's take a look at the heap for a second. So in addition to what I allocated there, there is what looks to be some text, some string data.
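The reason new data shows up in the heap listing is that allocation calls are hooked and nothing is ever really freed. A minimal sketch of that combination follows; the `TrackingHeap` class and the `HOOKS` table are hypothetical illustrations of the idea, not the tool's own heap.

```python
class TrackingHeap:
    """Bump allocator that never reuses memory, so every chunk stays inspectable."""

    def __init__(self, base=0x20000000):
        self.chunks = []
        self._next = base

    def malloc(self, size):
        chunk = {"addr": self._next, "size": size,
                 "data": bytearray(size), "freed": False}
        self.chunks.append(chunk)
        self._next += max(16, (size + 15) & ~15)
        return chunk["addr"]

    def calloc(self, count, size):
        return self.malloc(count * size)

    def free(self, addr):
        # Only mark the chunk; its contents remain available for later review.
        for chunk in self.chunks:
            if chunk["addr"] == addr:
                chunk["freed"] = True
        return 0

    def write(self, addr, data):
        for chunk in self.chunks:
            offset = addr - chunk["addr"]
            if 0 <= offset and offset + len(data) <= chunk["size"]:
                chunk["data"][offset:offset + len(data)] = data
                return
        raise ValueError(f"write outside any tracked chunk: {addr:#x}")


heap = TrackingHeap()

# Import name -> Python replacement, called instead of emulating the real routine.
HOOKS = {"malloc": heap.malloc, "calloc": heap.calloc, "free": heap.free}

ptr = HOOKS["malloc"](32)
heap.write(ptr, b"decrypted text")
HOOKS["free"](ptr)

for chunk in heap.chunks:
    print(hex(chunk["addr"]), chunk["size"], chunk["freed"], bytes(chunk["data"][:14]))
```

As the talk notes, this design is poor for heap grooming work, since real allocators reuse freed memory, but it keeps every buffer the target ever touched available for review.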
51:26
So let's see the return value that they returned. Let's just use rax string. And this is looking very close to what they gave us
51:43
in the hint for the resonante authenticator challenge. So that's vivisection, guys. Here is the location of the project. It should be released by the end of DEF CON. And if you have any questions, please feel free to holler.
52:01
Anyway, what are you waiting for? Solve problems with emulation. Reach out for help if you need it. Here's how to get a hold of me. Play around, the rest will come. And when you get the slides, there's a whole bunch of salvage yard stuff.
52:22
Thank you very much. Looking forward to seeing you guys next year. Thank you.