Fun with Symboliks
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 109 | |
Author | ||
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/36331 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
DEF CON 2392 / 109
12
19
20
23
24
29
32
33
36
51
58
60
62
66
67
68
69
70
71
77
82
84
85
88
89
92
98
99
103
104
107
00:00
Mathematical analysisDebuggerEmulatorBinary fileSoftware frameworkData modelBitSoftwareComputer programmingWordFirmwareFocus (optics)Software frameworkBinary codeSoftware developerCore dumpEndliche ModelltheorieEmulatorDebuggerComputer configurationMotion capturePoint (geometry)Scaling (geometry)MathematicsGenetic programmingArithmetic meanCross-platformMathematical analysisFlow separationCodeMultiplication signAcoustic shadowVulnerability (computing)Computer hardwareFlagGraphical user interfaceDifferent (Kate Ryan album)Scripting languageStructural loadEntire functionFigurate numberAlpha (investment)Disk read-and-write headElectronic mailing listEvent horizonLevel (video gaming)Computer fileMereologyReverse engineeringDampingGoodness of fitGodRevision controlSlide ruleHacker (term)AreaInteractive televisionBinary filePlastikkarteSpacetimeClient (computing)Server (computing)
05:52
Vulnerability (computing)Addressing modeNetwork socketBuffer solutionSystem calloutputFunktionalanalysisDirected graphBuffer overflowSlide rulePort scannerReading (process)Vulnerability (computing)
06:47
Vulnerability (computing)DisassemblerEmulatorInformationGenetic programmingOpcodeTerm (mathematics)Graph (mathematics)TheoryBlock (periodic table)CodeThread (computing)Visual systemRight angleBuffer overflowString (computer science)Exploit (computer security)Sound effectGraph (mathematics)Graph theoryStandard deviationCommon Language InfrastructureRepresentation (politics)ArmDifferent (Kate Ryan album)ThumbnailAsynchronous Transfer ModeCodeEmulatorComputer fileFunktionalanalysisSemiconductor memoryPoint (geometry)MathematicsGraphical user interfaceRow (database)Mathematical analysisGenetic programmingComputer architectureBitCoprocessorView (database)Software frameworkCore dumpSpeicheradresseModule (mathematics)Graph (mathematics)Server (computing)Revision controlProcess (computing)AreaElectronic mailing listBinary codeInteractive televisionMereologyQuicksortThread (computing)Single-precision floating-point formatComputing platformSpacetimeState of matterMultiplication signVisualization (computer graphics)Vulnerability (computing)Computer programmingPhysical systemEvent horizon2 (number)Binary fileComputer cluster
12:38
Function (mathematics)Graph (mathematics)Prime idealTheoryCodeThread (computing)Visual systemPoint (geometry)MultilaterationCodeNetwork topologyDecision theoryGraph (mathematics)View (database)FunktionalanalysisBlock (periodic table)Vulnerability (computing)DataflowDirection (geometry)Social classDifferent (Kate Ryan album)Level (video gaming)Arithmetic mean
13:25
ArmTheoryGraph (mathematics)CodeGraph (mathematics)Pairwise comparisonDataflowCodeLevel (video gaming)Branch (computer science)System callMereologyFlagCondition number
15:03
Principle of maximum entropyRegulärer Ausdruck <Textverarbeitung>Video trackingConstraint (mathematics)CAN busDisassemblerRead-only memoryVariable (mathematics)DataflowMultiplication signBranch (computer science)Letterpress printingSound effectSpeicheradresseCodeState of matterGenetic programmingDifferent (Kate Ryan album)Interactive televisionConstraint (mathematics)Gastropod shellSystem callWrapper (data mining)Water vapor1 (number)Electronic mailing listGreatest elementMathematical analysisMereologyTerm (mathematics)Metropolitan area networkAddress spaceSet (mathematics)Physical systemBitGraph (mathematics)OpcodeDifferenz <Mathematik>EmulatorType theoryRevision controlVariable (mathematics)String (computer science)FunktionalanalysisLevel (video gaming)2 (number)Goodness of fitSemiconductor memoryParsingBinary codeDrag (physics)
21:06
DisassemblerVariable (mathematics)Read-only memoryOperator (mathematics)System callInheritance (object-oriented programming)Reduction of orderRegulärer Ausdruck <Textverarbeitung>Hash functionTrailGenetic programmingFunktionalanalysisRippingSound effectBitSocial classConnectivity (graph theory)MathematicsState of matterOperator (mathematics)System callReading (process)CASE <Informatik>Translation (relic)Image resolutionOpcodeGraph (mathematics)Product (business)AdditionDifferenz <Mathematik>Object (grammar)Constraint (mathematics)Electronic mailing listSemiconductor memoryComputer configurationVarianceRight angleDecision theoryPrimitive (album)Computer architectureCodeIndependence (probability theory)ArmArithmetic meanFormal languageMessage passingBinary codeMathematical analysisHash functionPoint (geometry)Moment (mathematics)EmulationWritingGroup actionNatural numberEmulatorVariable (mathematics)Directed graphNumber
27:09
Variable (mathematics)EmulatoroutputSet (mathematics)Function (mathematics)IntegerState of matterSummierbarkeitWrapper (data mining)Set (mathematics)Genetic programmingSubstitute goodEmulatorVariable (mathematics)Lecture/Conference
28:18
Mathematical analysisComputer configurationBranch (computer science)Constraint (mathematics)CASE <Informatik>Subject indexingCodeSubstitute goodRange (statistics)CompilerDifferent (Kate Ryan album)Mathematical analysisVariable (mathematics)Block (periodic table)Electronic mailing listFunktionalanalysisPointer (computer programming)Letterpress printingGroup actionState of matterSpacetime
30:31
Independence (probability theory)Library (computing)ArchitecturePattern languagePattern matchingObservational studyObservational studyFunktionalanalysisPoint (geometry)Vulnerability (computing)FingerprintCodeIndependence (probability theory)AreaBack-face cullingReverse engineeringGenetic programmingPairwise comparisonModule (mathematics)CASE <Informatik>State of matterMathematical analysisLatent heatRadical (chemistry)Right angleComputer architectureAbsolute value
33:14
State of matterVariable (mathematics)CodeLecture/Conference
34:23
ComputerRegulärer Ausdruck <Textverarbeitung>BackupPrice indexFunction (mathematics)Module (mathematics)Point (geometry)Mathematical analysisObservational studyEnterprise architectureKeilförmige AnordnungCodeVulnerability (computing)Computer configurationGenetic programmingCybersexHalting problemPointer (computer programming)Different (Kate Ryan album)Graph theoryElectronic mailing listVulnerability (computing)CodeCASE <Informatik>outputPoint (geometry)Computer architectureMessage passingLevel (video gaming)Address spaceSubject indexingBlock (periodic table)Constraint (mathematics)FunktionalanalysisBitPrinciple of maximum entropyBranch (computer science)Graph (mathematics)Ratsche <Physik>Graph (mathematics)Revision controlSystem callHydraulic jumpPivot elementDirected graphMathematical analysisComputational intelligenceSound effectMultiplication signCycle (graph theory)Set (mathematics)Complex analysisVirtualizationBuffer solutionPortable communications device
38:57
Enterprise architectureKeilförmige AnordnungCodeObservational studyVulnerability (computing)Computer configurationGenetic programmingBuffer solutionFunction (mathematics)Directed graphMathematical analysisConstraint (mathematics)CASE <Informatik>Graph (mathematics)Level (video gaming)Vulnerability (computing)String (computer science)Observational studyModule (mathematics)Port scanner
40:00
outputLimit (category theory)Reading (process)Observational studyBuffer overflowMereologyPort scannerSystem callLecture/Conference
40:37
InformationComputerVideo trackingObservational studyMathematical analysisGraph (mathematics)Binary codePoint (geometry)Message passingMathematical analysisContext awarenessGenetic programmingAsynchronous Transfer ModeGraph (mathematics)Computer fileComputer configurationCombinational logicSound effectSpacetimeStructural loadCommon Language Infrastructure
Transcript: English(auto-generated)
00:01
everybody. How are we doing? Good? Good. Awesome. How many people gone over to CTF today? Yeah? How many first timers do we have here? Okay. At some point you need to go over to the CTF area. When I came to my first DEF CON and I saw what CTF was and all the lights and everybody hacking
00:20
everybody else, I thought it was the coolest thing ever and it's still the coolest thing ever. So take some time to go over there. Now the DEF CON CTF is harder than just about any other contest in the entire world. And Atlas has won this four times. Is that right? Part of a bigger team but his teams have won it four different times. So one of the
00:43
best things I like about DEF CON is coming and finding people that are always just head and shoulders better at all kinds of stuff than you are. At home we're all the alpha nerds. Here we're all learning from everybody else. It's interesting to see people in our brain with some very technical stuff. Let's give them a big hand. How many of
01:10
you have participated in a capture the flag? How many of you are doing so right now and coming to see me? Thank you. That's very hot. How many of you are government? I
01:26
saw that. Well, welcome. Today we're going to talk about symbolic analysis specifically focused on a tool called symbolics. The creator of which was threatening to show up today and I don't see him yet. Dodge that bullet. How
01:45
many here have heard of symbolic execution? Very good. Most of us. How about symbolic analysis in a larger scale? Is symbolic execution the only thing you've heard of? Today
02:00
we're going to talk about some of the difficulties with doing binary analysis. We're going to talk about some of the holy crap. I love it. So today we're going to talk about
02:26
symbolic analysis and its use in determining very interesting things about binary. A little bit about me. Very fast. I'm a Jesus follower. Walk out if you like. I'm a husband of one wife, father of three. I have horses
02:43
and goats and chickens. And I ride a Honda shadow 2002. But that's not why you're here. I'm also this guy likes to tear things apart. Make them break in very interesting ways.
03:03
And I have learned some from several of you in the crowd that I've already seen. So this is not all about hey, I'm fucking rod star. This is about this is cool shit. Let's make good use of it. And whether you're on the side that wants to kill all the bugs or let them lead long
03:22
productive lives, hopefully this will help out. I love vulnerability research, reverse engineering, I play with hardware, radio, firmware, software and cars, medical devices, smart meters, the whole thing. I'm a bit ADD. If it
03:43
weren't for my wife ratcheting down, I'd probably be dead. Also a core developer for vivisect, a binary analysis framework created by Invisigoth. A little bit about vivisect to start off. It's a binary analysis framework like I just said. It is written in pure python and you can use
04:04
interactive python to poke at, figure out how code flows through your binary. It provides several different scripting options. It's written from scratch to be collaborative. There's a client server model, even a shared
04:22
work space model. It includes a multi-platform debugger. Emulators for multiple platforms. A gooey for those who want one. The focus is on programmatic analysis. We want to write programs that find zero day and exploit it. Yeah, I
04:45
think we're in the right time. Give you the colorized version because the last one was so hard. I'm going to throw some code at you early on so it's familiar to you.
05:01
You can come back to the slides later and start tearing apart a binary in your own interactive python session. To
05:21
analyze the crap out of it. And for today's specimen we'll be talking about stage 3 from the 2005 ctf from conchoto. It creates a .viv file which is you guys don't need to know this but it's basically a list of events that happened from the
05:40
creation. Which means it's not fully implemented yet but there are many ways to back out changes. Those of you who use IDA a lot, you know why that's important. Can you see a vulnerability in the slide? Yes. It's too small. Look
06:06
again. If you can't read it, that is a push 2047. A call to and then pushing an input buffer. Pushing our arc 0 for
06:22
this function which is the socket. And then reading. So we're reading in 2047 bytes. Into a buffer that basically goes on forever. So this is not a buffer overflow. However, down below you see load effective address blah, blah, blah. EBP minus 1048. That's basically a 1k buffer. And
06:45
then a call to scan F. Scan F reads in well up to the end of the string, right? Causing a stack smash and a fairly easy to implement overflow exploit. Here's how I like to view
07:05
vivisect. Most often I will have two versions of two ways of accessing vivisect. I will have the viv bin binary, I'm sorry, the viv bin GUI so I can scroll through function analysis, blah, blah, blah. Poked into a server which
07:22
actually houses all the changes and handles interaction. And I have a command line tool that allows me direct access to it as well. It doesn't matter. I make changes in the GUI, I make changes on the command line. It all updates and the GUI changes when I run my analysis stuff. So I get in using ipython. If you're using python and not ipython,
07:42
well, I'm sorry. Please consider, you know, seeing the light. Import vivisect.CLI as viv CLI. That's just standard python stuff. We create a vivisect work space. We then load from a file and it will look for, is this an elf
08:03
binary, is this a PE, is this an ihex or a binary, a blob. And it just puts the loadable modules into the work space and does nothing else. I then call analyze. Analyze does all
08:21
automatic analysis based on the architecture and the platform. So Linux on arm, for example, things like that. And then I call save work space and it saves out all the events that have happened during analysis including the loading. There we
08:41
go. So that's enough about viv for now. It's enough to get you started. Let's talk a little bit about symbolics. So one of the core foundations of vivisect is the NV disassembly
09:00
framework. NV is not just disassembly, though. In order to make an NV module, you're supposed to create an emulator for it as well because it's amazing what an emulator does when you're analyzing functions and the rest of code. Particularly an arm where, well, they have conditional
09:21
everything. And there's this whole hopping back and forth between arm and thumb mode. The idea of symbolics is the dragging of system state from a beginning of a code path
09:41
through to some arbitrary end state. So maybe that's from the entry of a function to a return. Many of you will know there are many ways through a function that has a beginning and an end. So we choose one. Graph theory helps us out here.
10:01
But for this point, just think of a list of assembly instructions that would get executed in a row. Those assembly instructions are translated into the symbolic effects that they would have on the underlying processor. For
10:21
example, push EVP, move EVP, ESP. Shout out if you know what that is. Yes, sir. It's function prologue. Exactly. So we translate these into symbolic effects, simple symbolic effects, and then later I translated this into
10:45
applied effects. We'll talk about the difference in a second. So it becomes set ESP to ESP minus 4. And then memory location that ESP points to now holds the EVP value. And then we
11:04
update EVP with the new ESP. I have a fly. And it's bugging me. So we have to talk a little bit about graph theory
11:21
though. This single thread of execution through a program that can find vulnerabilities in a specially crafted thing. But what we're really trying to do is exercise many code paths, as many as we can that are valuable. And to do so we rely on graph theory. Ever heard of graph theory? Yeah. I
11:43
hope so. Graph theory is amazing. It's not necessarily easy, but it can cover some very complex problems. There have been times where I found that Viv a while back didn't do a
12:00
very good job of creating a graph for a particular function and it caused me great headache. So that's where the first bullet came from. You've probably interacted with certain visual aspects of graph theory. For example, if you've
12:23
looked at graph view, it is kind of a visual representation of a graph. Of a code graph. It is a graph, obviously. So you can all hold your applause until later for my
12:42
visual wonderfulness. So you see at the top we have a code block that has a decision tree. Either goes left or goes right, they re-merge and end up exiting the function. Very simple view of a code graph. It is a directed graph, which
13:10
means that edges flow in one direction. Very important because if you could actually make your x86 code flow backwards, then we'd have a whole different class of
13:22
vulnerabilities. So to take this back to stage 3 just briefly, what you can't really read here is the code graph from the child request handler in stage 3. So I said that
13:42
it's not quite the same thing to have a code graph and have an IDA graph. And the reason is IDA and Viv, they don't follow every call and there are things that are conditional that don't necessarily show up as a
14:03
different node, which they should. If they are conditional, they are either executed or they are not and yet compare exchange for example on x86 just shows up in the code flow. So if we were to take this graph and take all
14:21
the calls and link them to other parts of the graph and have more code flow from there and then back into this graph, it would be more of a specific or of an exact code flow graph. If we were to take conditional instructions like
14:43
branch, for example JZ, if we jump, if the zero flag is set in reality, that's kind of its own thing. It deserves its own node because it may or may not actually do what you think it should do. We prettify it because that would be hard to
15:02
follow. So as we are analyzing a code path, we step through and we drag the initial state through
15:22
symbolically so that as we modify memory locations, as we modify register values, they are modified and represented and stored in terms of the initial state. So if EAX started off as zero, all of your state throughout each instruction would
15:46
be aware that EAX started out as zero and reference EAX as offsets and whatever as you add and subtract to it because it needs to maintain that initial state for us to do the analysis that we need. So when we are walking through code, we first
16:05
translate a binary opcode into a simple list of effects. As we hit conditional flow, we add constraints. So as the graph
16:21
branches based on a yes or a no, a constraint is added for a code path that goes left and its opposite is added for path that goes right. This allows us to determine a code path that we want and then figure out what the hell gets
16:40
us there. So I keep showing you things that are not really, what is that? It looks different every time. I don't quite understand. Well, busy in his wonderfulness created all of symbolics to supply a wrapper version and a string version
17:03
to represent what they mean. This really helps while developing because it allows us to see at a second's notice exactly what the state of the symbolic state is. So the top part here, we are looking at the wrapper version. It was
17:22
created such that you can copy and paste it into another interactive python shell or another code and it actually recreates the symbolic state because that's the name of all of the effects and the symbolic ‑‑ can I get some water? Anybody? Drag it over my tongue. Thank you. So if we
17:46
print symbolic state, you notice these are constrained paths at the top. Well, in the pretty version down below, this is also constrained paths. It says all that goodness that will recreate the python symbolic state, this is what they
18:04
really mean. So if a call to this function returns a nonzero, they didn't have any vodka in the speaker room. I was kind of depressed. This is very good. So I think you'll agree the
18:24
bottom one is a lot easier to read than the top one. Again, all leading back to interactively working with the system, creating code ‑‑ thank you ‑‑ that tears apart code very powerfully and very easily and easily debugged.
18:42
So a little bit more example. Set variable EAX to a constant one. Okay. Cool. Set variable ESP to the subtraction of const ‑‑ the top of the stack. I'm using tools that
19:01
actually turn the top of the stack into something very easy that we ‑‑ most of us kind of jive with. So basically we are subtracting from ESP for ‑‑ we're shoving then setting EBP to EBP. Then we add to ‑‑ oh, man, I'm not even
19:23
going to continue. Look at the bottom one. The bottom one is the exact same symbols. I'm just using print instead of repper. And it says, hey, I set EAX to one, ESP equals ESP minus four, EBP equals EBP, blah, blah, blah, blah, blah.
19:42
Much easier to read, I think you'll agree. So I have to call out, though, symbolic has two different ways of ‑‑ two different stages in symbolic analysis, both of which are actually very powerful and important to work together when you're doing ‑‑ when you're writing tools. I said
20:02
before we translate binary op codes into simple effects. We then apply those simple effects to a binary ‑‑ or a symbolic emulator and it extrudes the initial state through into every single effect that you've passed in. So you're left with
20:22
simple effects. So I know this is a push EBP so it subtracts from EAX ‑‑ from ESP4. It doesn't actually keep the state and then it pushes into the memory location of ESP, the thing. The applied effects are the ones that keep the state
20:42
all the way through. So to give you a couple of things to type in when you go home and want to play around with us, once you've set up your work space, you disassemble the op code. You say op equals VW dot parse op code and you give it
21:03
a virtual address. You then translate that op code into simple effects, having a translator and executing translate op code with the op code that you just created. And then you run the simple effects through the emulator, giving a list of applied effects. EMU dot apply effects
21:25
and then you get the effects from your translator and it spits out a list of applied effects. Basically, symbolics is architecture independent. The only got you there is the name
21:40
of things changes with the architecture. For example, R15 on arm would be PC or EIP or RIP on x86 or x64. So how is symbolics put together and why do I care? So used many
22:02
powerful things of the python language and sub class basically the arithmetic functions that every object has like addition, subtraction, X or whatever. And I'm jumping ahead of myself.
22:21
Please forgive me. I've been hacking on binaries for way too long. Sorry. I am in the middle of CTF. So python, I'm sorry, symbolics has the following primitives. We have a constant, which is just a constant value. We have a variable, which is a variable of whatever name. It could be a
22:44
register. It could be some known symbol in the work space. A memory object which represents memory. And we have a call and this allows us to keep track of where a call might
23:02
fit into a symbolic state but not necessarily go do all of the call before finishing our analysis pass. You may if you choose. An ARG meaning something we were handed into a given function. A C not effect which is basically saying, hey, do the
23:22
opposite of that. So if you have a VAR EAX, you register and you say not EAX, you end up with a C not of the variable EAX. And then an operator. So our operators are where hooking the python object methods come into play. So
23:43
basically we have an operator O underscore add which is used to represent the addition of two symbolic states. O underscore sub representing the subtraction of two symbolic states. Obviously the order is important here. These are
24:03
implemented using symbolic base which is the python class that all of symbolic components subclass. And simply overloading the underscore add function and I add because it doesn't matter if it's signed or unsigned in our case. We
24:23
have effects and these are the things that actually happen. These are action verbs. We have set variable, read memory, write memory, call function and constrained path. So the constrained path obviously is where you hit a decision case and you have to choose where to go from there. Your constraints
24:42
are little objects, well, little names anyway, called EQ, NE for not equals, greater than, less than, greater equal, as you know. When you run into an unk or a not unk, this is where we really can't reason very well about the
25:03
state of, about what the constraint is arithmetically because it's the product of an or or some other bit wise, I'm sorry, or X or some other bit wise effect. So let's talk a little bit about how to be powerful about this. I like to
25:26
use symbolics just interactively. I get code that I don't really know what it does. Throw it into a code path that's interesting to me. I symbolically emulate it and I get to see what the effect is and the symbolic state is at the
25:43
point of what I'm interested in. Well, that can be overwhelming. I'll tell you why in a moment. Our applied effects get run through the symbolic emulator, we just talked about that. We then have the option to run reduce on the symbolic effects.
26:02
This takes things like X or EAX, EAX and says, oh, that's zero. So EAX equals zero. And things of that nature. Addition, subtraction, they all kind of coalesce. If mathematically you can combine them easily, then they can be reduced. Why? I
26:22
mean, that kind of sounds nerdy, right? Just a simple number or something. Well, because this effect right here is enough to blow your mind and yet it reduces to something exceedingly simple. We're also given the capability of solving. Now, as many of you know, a symbolic state may be
26:46
discreet or it may not be discreet. If it's not discreet, how do we solve it? Well, if it is discreet very easily, we just run the numbers through and spit out the answer. But if it's not discreet, symbolics gives the ability at least to compare two symbolic states even though
27:02
they're not discreet. And that is using the hash of its basic repper. So, for example, VAR can't be discreet by definition. Spits out a long integer of an MD5 sum of its
27:27
thing. We can also update the symbolic state using an emulator that already has data. And as of, what, about a year
27:45
ago, busy wrapped in the ability to create substitutions. Now, actually, in my opinion, this should have been called solve as well because we put together a set of values that a
28:01
symbolic variable can have and then we ratchet that into the symbolic state solver and we get back a generator which gives us all the different things that those values would have provided. So here's an example of using substitution. I use
28:30
this in switch case analysis in my branch of vivisect. Basically, I put together a list of ranges given a
28:45
constraint. For example, when you're looking at a switch case, you generally start at some base zero index and you roll through, you know, three, five, 50 different options that follow that. I don't know how many of you have done the work to
29:00
analyze switch cases as spit out by a compiler. When they happen, there's often, they're broken up into like groups. Because if you have a switch case that has a zero case and a 32,000 case, you don't want to have a 31,999 if you
29:23
there's just those two. Or if there's like a pocket of five or 20 around each one. So generally they represent different code paths and they end up starting at a zero index with some relative base. So we come through and you can see the
29:41
debugging here. See how good my laser skills are. I'm not used to being this far behind the thing. So you can see my debugging here with the print of the variables of the given state. We create a range and we roll through every index that
30:03
we've identified that this switch case handles. And then by solving that, we're able to see what the outcome of the switch case is. So if it's a switch case zero, then there's some place in an array that has a pointer to a code
30:22
block that handles switch case zero and so on and so forth. So we ratchet through that so we can create cross references in the vivisect work space. I won't talk much about this right now, but I recommend that if you check this out,
30:42
look into arch in. In vivisect in symbolics, there's an arch in module that allows you to do a lot towards architecture independence. I happen to know firsthand that it's been used to solve completely the function comparison problem. So
31:04
what do we care about this? I know, I'm a nerd. Well, vulnerability research and reverse engineering, they
31:21
basically are solving problems or answering questions that are very difficult to answer. Reverse engineering is identifying behavior and vulnerability research is about finding very specific behavior. Vulnerable behavior. So we're
31:41
hunting fat, juicy behaviors? Absolutely. Got a couple case studies here. ROP gadgets. Who here is dug through looking for ROP gadgets? Yes, we all have. Well, it turns out that by
32:03
searching through a binary and executable area of a binary, you can trace symbolic state up to some known terminator point and ask very specific questions about what that little
32:22
code snippet does. So rather than starting at a ret and stepping back byte by byte by byte, making sure that it still decodes into a ret after some things and trying to figure out, oh, this is a really cool ROP gadget that does this thing
32:42
and it kind of writes over there, but it really updates these other things that I'm really interested in. You can do symbolics to do analysis on code snippets and spit out, hey, this moves EBX into EAX and it's three bytes long or
33:06
three instructions long. You can do actual culling of ROP gadgets using a symbolic state engine. For example, forgive
33:35
me. For example, we roll through a snippet of code and then we
33:45
dig into the variables that are discovered. So we say symbolic state, what things have been written to. We then look for, hey, is that thing this register and is it writing to a
34:07
register copy. And hey, if the value of the second register ended up in the first register, we know we have an exchange. These are things that are programmatic and solvable by
34:21
your own code. And just to give you something else to think about, I won't dig into these, but we already talked a little bit about switch case analysis. So basically what we
34:43
are doing is we are trying to tell the computer to do the things that we do in our own super magical portable. So in switch case analysis in my branch, we start off at every
35:02
dynamic branch. Then we say dynamic branch can be a call to a register or some dereference or jump to a register or some dereference or whatever your architecture version of those two are. So in the analysis pass, we already identified these
35:20
things. It's just a virtual address set. So we pull what vivasec has already given us. We then start at that point and we back up until we are able to identify which register is used as an index pointer. We then roll through looking
35:45
for anything that modifies that index pointer and it gives constraints that say this is like from 50 to 75. So our
36:03
switch case is 50 through 75 in this case. So let's now start at the beginning and ratchet through this one code path. We ratchet through it over and over with different indexes and it gives us the next code block that gets
36:21
executed for every different index and we are able to wire up the function, the code block edges and that helps us a lot. And that leads us back to stage 3. So as you have been
36:46
hearing about the cyber grand challenge, this whole idea of automating the discovery of vulnerabilities is a pretty big hit list. How do we do that? There are a couple of different
37:03
ways actually. Some people have taken to just basically symbolically fuzzing where you start at some place and you just keep going through different code paths until some
37:21
desired effect like, for example, I don't know, EIP equals something of our input. We can do that. And it can be a very impactful way. It can also be highly cumbersome to the computer. Yes, I know computers do awesome things
37:41
repetitively over and over. But there's this whole halting problem. With graph theory and with code path tracing, we end up running into code paths that may never end and we also can
38:04
meander and take up all the cycles for all the time in the world and still not find what we are after. So I prefer starting at where we are trying to go and back up and seeing if we are trying to go with a particular code path can
38:20
provide us with the behaviors that we are after. So we start with mem copy and all the references to mem copy and we trace backwards. And with a good graph, then we are able to say, well, this mem copy is called with two fixed buffers
38:41
and a fixed size. Crap. Not vulnerable. Find something that allows us to compare and look for a dynamic sized either source destination and move size. Now, it can be a little complex.
39:04
That's a fairly simple way and that's definitely one of the common analysis modules. But back at our stage 3 case study, the vulnerability is the fact that we have allowed creating of a string up to 2047 bytes and then run F scan F on that
39:28
string and put the output into a buffer that's too small. So we have to identify the size of our destination and our source and the constraints on them because our source is
39:42
actually unconstrained. It's huge. But we have to be able to copy them into a buffer that is too small and not have constraints applied that keep us from overflowing and overwriting. So here's that example. We call to read 2047
40:08
bytes, call to F scan F. Oh, I forgot the bacon part. We all love bacon, right? So since BFBFEBE4 is 1052 bytes from the
40:25
top of the stack which is red, then our overflow, we overflow by 995 bytes. We can do this programmatically. I'm not going to show you code for it. Go write your own. So as I said,
40:46
starting where you want to go and backing up is my preferred method. Starting at a known entry point and going forward is also very powerful. Turns out combination of the two is probably the best option. So I leave you a couple of things
41:01
for your playtime. Import vivisect.CLI as viv CLI, create a vivisect work space with vw equals viv CLI dot viv CLI. Vw dot load from file, some poor binary. I like to turn on
41:21
verbose mode because it spits out a lot of messages that I otherwise wouldn't have gotten. Import vivisect symbolic analysis and you read that correctly. Create a symbolic analysis context. Create a symbolic graph. And then get some
41:46
symbolic paths going. As you interactively create symbolic paths and you review the symbolic effects, I think that you will see just exactly how powerful you can be. And here
42:10
are a couple of places to go look. Thank you very much.