
Let's break modern binary code obfuscation


Formal Metadata

Title
Let's break modern binary code obfuscation
Subtitle
A semantics based approach
Title of Series
Number of Parts
167
Author
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Do you want to learn how modern binary code obfuscation and deobfuscation works? Did you ever encounter road-blocks where well-known deobfuscation techniques do not work? Do you want to see a novel deobfuscation method that learns the code's behavior without analyzing the code itself? Then come to our talk and we give you a step-by-step guide.
Keywords
Transcript: English(auto-generated)
Welcome to the 6:30 talk of today. One of the many things I love Congress for is the bold programme.
Normally this is the arts-and-culture and society stage, and so I'm very excited that there's also a security talk happening today. Our next talk, "Let's break modern binary code obfuscation", is going to be translated into German
at this address, streaming.c3lingo.org, which should also appear on the screen now. Wonderful. And from Bochum, I'm going to welcome now Tim Blazytko and Moritz Contag,
and they're going to explain to us how you can deobfuscate code without looking at the code, but only at its behavior. A warm welcome and a lot of applause.
Thank you. First, can we enable the TV down here? Tech crew, could you… Well, hi. So, welcome to our talk.
So, Moritz and I are two PhD students and reverse engineers at the Ruhr-Universität Bochum, Germany. In our daily work we do reverse engineering and things like that, and as part of one of our academic projects, we started working on program synthesis for binary deobfuscation.
And then, thank you. So, this is basically the more technical version of our academic talk that we presented and published at USENIX Security this year. So, first, Moritz will talk about code obfuscation techniques
and deobfuscation techniques, and I will later join in with program synthesis and how to apply it. Thank you. All right. Thank you for the introduction. Okay, first things first. Why do we want to obfuscate code in the first place? So, to set the scene, it's important to note
that we cannot really prevent reverse engineering attempts, but rather we seek to complicate them, at least to some degree. Reasons why we would want to do so are mainly intellectual property and the protection thereof. So, like, if you got some super secret algorithm which gives you some competitive advantage over some competitor, you can just protect it and, yeah,
just have a head start over this other corporation or something. Another reason why we want to obfuscate code would be malicious payloads. So, we want to make them harder to detect so that analysts are not easily able to create signatures for them and the malware lives longer without being picked up by an AV.
And the third, I think, most interesting case is digital rights management where software, especially AAA games or larger games, are just obfuscated in order to prevent cracking attempts and illegal distribution of the game itself.
And especially in the context of digital rights management, there's a very fitting quote which sets the scene as to what we can expect from code obfuscation. It's from Martin Slater of 2K Australia and was issued regarding the release of BioShock back in 2007.
And he told us that they achieved their goals because they were uncracked for 13 whole days. So, apparently, we cannot really prevent cracking. We can just make sure the software withstands cracking attempts for at least 13 days, which is about the time in which the game distributor
makes most of its revenue. Okay. How do we protect software? So, there are different approaches to protecting software, some better, some worse.
For example, we can just take a look at the software that's being used to analyze the programs itself, like OllyDbg, like disassemblers, and so on. And one idea that comes to mind is just to abuse shortcomings of these tools. So, if we know OllyDbg crashes during its code analysis of a specific FPU instruction sequence,
we could just emit that sequence and make OllyDbg crash. Or similarly, if we've got some process dumper which relies on fields of the PE header in memory, we can easily confuse it and render these tools unusable. Another approach that's very popular is just to detect the environment the program runs in and check if we are running the application
in an environment that gives us information about the presence of a debugger. So, there's some capability provided by the operating system. For example, the BeingDebugged bit in the PEB, which is set if a debugger is attached. Similarly, there are more low-level tricks we can abuse,
or some operating-system internals, which allow us to somehow detect the presence of a debugger and then just abort execution, to prevent any reverse engineering attempts. However, both techniques have the drawback that, once you know how they work, they are easily fixed.
You can just patch your tool, or you can circumvent the anti-debugging tricks by supplying the correct values. Also, if you go to Google and just search for "game doesn't start debug detected", you get over six million people complaining that they cannot run their latest AAA game because the anti-debugging technique
which might have worked reliably on Windows XP does not really hold up on Windows 7 and issues false positives. So, benign customers cannot really use their game, cannot really play their game, because the debugger detection was faulty. So, let's set up some requirements we need for code obfuscation.
An important one is that we want the code to be semantics-preserving. So, the mere fact of protecting the application shouldn't change its observable behavior. So, we do not want the game to break only because we want to protect it. A second point: we want to avoid external dependencies,
at least in the context of our discussion here. So, there are various attempts to outsource data, like to an Internet server, a USB dongle, or other separate media. But we are mostly interested in techniques that protect the code only. So, we've got a white-box attack scenario
where the attacker has everything he or she needs to attack the application at hand. And finally, the most important point probably is that we want to employ techniques that are way easier to deploy for us than for the reverse engineer to attack. So, the anti-debugging tricks we've seen,
or also the shortcomings of the tools which we can abuse, defeating them is more or less a one-time effort. Once you know how it works, you can easily bypass it. But we want to employ techniques that are easy to deploy for us, but very hard to attack for other parties.
Okay, on to code obfuscation techniques. One technique that's been used in commercial protection engines is what is known as opaque predicates. So, consider this rather simple CFG on the left side. So, we've got some application with a linear control flow graph.
If we now insert what are known as opaque predicates, the program looks vastly more complex. So, we've got more branches in the control flow, which we have to track, and we have no clue that the underlying program, in fact, is very simple. Let's zoom in on one of those predicates.
Okay, so we've got a bit of code, we've got a true branch and a false branch, both branching to a different basic block. So, if we're just looking at this, we might come to the conclusion, okay, we don't know which branch is going to be taken, right? However, it turns out that this is what we call an opaque true predicate.
So, in this case, the true branch is always taken, regardless of the behavior here in the predicate. So, what's happening here is that we've got an opaque predicate on top. This whole block is just constructing a predicate based on an opaque value,
like the return value of the API call GetCurrentProcess. And if you've ever worked with the Windows API before, you might know that GetCurrentProcess always returns a constant value, namely minus one. So, by just issuing this predicate,
we make sure that we always take the left branch. The right branch might point to dead code, or somewhere else, to confuse the reverse engineer, but at runtime, we always take the left branch. Similarly, there are opaque false predicates, which just invert this condition and follow the false branch.
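The GetCurrentProcess trick is Windows-specific, but the same idea works with purely arithmetic facts. Below is a minimal Python sketch of an opaque true predicate; the payload and decoy functions are made up for illustration:

```python
import random

def real_code():
    # the code the protector actually wants to run
    return "payload"

def junk_code():
    # dynamically dead; exists only to mislead static analysis
    return "decoy"

def opaquely_predicated(x: int) -> str:
    # x * (x + 1) multiplies two consecutive integers, so the product
    # is always even: the branch condition is opaquely true.
    if (x * (x + 1)) % 2 == 0:
        return real_code()    # always taken at runtime
    return junk_code()        # statically plausible, never executed

# whatever input we feed, only the true branch is ever taken
outcomes = {opaquely_predicated(random.randint(-10**9, 10**9)) for _ in range(1000)}
print(outcomes)  # {'payload'}
```

A static analyzer that does not know this number-theoretic identity must conservatively follow both branches, while a concrete execution trace resolves the branch immediately.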
And there's another flavor called random opaque predicate, in which you branch upon a random value. This means that we can potentially, at runtime, follow both branches. So, the challenge here is that we have to make sure that both blocks, or both paths that follow, have to be semantically equivalent,
because the program breaks otherwise. However, it is also a challenge to ensure that it's not easily detected by the attacker, that both branches are in fact semantically equivalent. Okay, recapping on the opaque predicates, obviously they increase the complexity of the application.
Also, they can be built on hard problems, but what is most important is that they force the analyst to encode additional knowledge that either the analyst himself has to know it, or he has to just extend his tool to know maybe about the WinAPI,
know about FPU instructions, know about arithmetic identities, or some other hard problems that he all has to encode in his analysis tool to provide reasonable results. So, they are hard to solve statically. However, if we now opt to only look at concrete execution traces,
so we just let the application run and look at concrete values, we know that opaque predicates are essentially solved for free for us. So, this is good to keep in mind. Another very interesting technique that has also been deployed in, I guess, every major copy protection out there are what you call virtual machines.
So, take for example this native code on the left. We've got some loop here and some x86 code, and let's just assume the loop here is our precious intellectual property we want to protect. Obviously, we can use our common tools like disassemblers,
like IDA or OllyDbg, to look at this code and reason about it and get to know what this code does. So, the idea here is to replace the code on the left with something our common tools cannot understand. So, what we're doing, we're just getting creative, and we're just making up an entire instruction set,
as you see on the right. So, VLD, VPOP: instructions that are not known in this form by any other architecture, complete with new registers, with new encodings, and so on. And both the native code and the new instruction codes
are semantically equivalent, again. What we're now doing is just replacing the native code with a call to what we call a virtual machine, also known as an interpreter, which is basically a CPU in software that lets you run the imaginary architecture we've thought up.
So, if we now try to take our common tools like IDA or OllyDbg, or some other tools, to analyze this code, it's not really viable anymore because they don't know about our made-up architecture. But we still have to make sure that the transition from native code to a virtual instruction goes seamlessly.
And to this effect, we have to look at the components. So, there are three core components to a virtual machine. Mainly, we have VM entry and exit. And what they do is just perform the context switch from native to virtual context and back. So, the entry just copies the native context, say, registers and flags of the native architecture to the virtual context,
and the exit copies them back to the native context. Usually, the mapping of native to virtual registers is one-to-one, which makes it a bit easier. Then we've got a traditional fetch decode execute loop, like in a traditional CPU. And what it does, it just fetches and decodes one instruction
and forwards the virtual instruction pointer accordingly. Then it looks up the handler which defines the virtual instruction in what we call a handler table here on the right and just invokes this handler, and the handler goes on to actually execute the instruction.
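The fetch-decode-execute loop and handler table described above can be sketched in a few lines of Python. The three-opcode instruction set and its encoding are invented for illustration; real VM protectors use far larger, mutated handler sets:

```python
# Toy virtual machine: VM entry sets up a fresh virtual context, the
# dispatcher loop fetches one opcode at a time, and a handler table of
# function pointers (indexed by opcode) executes it.
VPUSH, VADD, VEXIT = 0x00, 0x01, 0x02

def vm_run(bytecode):
    ctx = {"stack": [], "vip": 0, "running": True, "result": None}  # VM entry

    def h_push(ctx):
        ctx["stack"].append(bytecode[ctx["vip"]])  # immediate operand follows
        ctx["vip"] += 1

    def h_add(ctx):
        b, a = ctx["stack"].pop(), ctx["stack"].pop()
        ctx["stack"].append(a + b)

    def h_exit(ctx):
        ctx["result"] = ctx["stack"].pop()
        ctx["running"] = False                     # triggers VM exit

    handler_table = {VPUSH: h_push, VADD: h_add, VEXIT: h_exit}

    while ctx["running"]:
        opcode = bytecode[ctx["vip"]]              # fetch
        ctx["vip"] += 1                            # advance virtual IP
        handler_table[opcode](ctx)                 # decode via table, execute

    return ctx["result"]                           # VM exit hands result back

print(vm_run([VPUSH, 2, VPUSH, 3, VADD, VEXIT]))   # 5
```

Dumping `handler_table` immediately reveals every handler, which is exactly the weakness the hardening techniques discussed below try to remove.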
So, as we've seen, the handler table is just a table of function pointers indexed by opcode, and we have at least one handler per virtual instruction. Each handler then decodes the operands, operates on them, and updates the VM context accordingly. Okay, so this is an example of an unobfuscated version
of a popular VM obfuscation, and we might be able to make out these components here. So, on top, we see the VM entry. We're coming from the native code, calling into the interpreter, and here we're just switching the context to the virtual context,
initializing the virtual context. Then we're at the VM dispatcher loop, which just looks up the current virtual instruction in the handler table, which in turn points to the individual handlers you see here below. So get direct reference to these handlers, and the handler then updates the virtual context and branches back to the VM dispatcher.
Eventually, we end up at the VM exit handler, which just performs the context switch from virtual context back to the native one. Okay, taking a closer look. In RSI, we have the virtual instruction pointer, and in RBP, we've got the VM context. So you see that the VM dispatcher, all it does is just
taking one byte of memory, increasing the instruction pointer, looking up the corresponding handler, and jumping to it. It might jump to a NOR handler, as seen here, and as you can see, it just reads out of the virtual context, performs some semantics, and then writes values back into the context,
and finally jumps back to the dispatcher to execute the next virtual instruction. Okay, so this is rather simple and easily understandable. How do we harden this whole concept? For one, obviously, as we've seen, the handlers are simple. There are only a few instructions, and they are easily understood.
What we can do here is just apply traditional code obfuscation transformations to make analysis of them harder, like substituting operations, inserting opaque predicates, inserting junk code, and so on. So on the right, you see a slightly more complex assembly listing than before. Another technique: imagine you've only got four handlers in your virtual instruction set.
You want to make it more complicated, more work for the reverse engineer. And what we're doing is just duplicating individual handlers to increase the workload for the attacker. So because the handler table is usually indexed using one single byte,
we've got up to 256 entries we can populate. So what we've been doing is just duplicate existing handlers to populate the full table, and then again use traditional code obfuscation techniques to hinder the attacker from easily finding out that two handlers are, in fact, similar.
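Handler duplication can be sketched as filling the 256-entry table with differently written but semantically identical implementations. The rewrites below are standard arithmetic identities; the single-ADD handler set is hypothetical:

```python
import random

MASK = 0xFFFFFFFF  # 32-bit wrap-around

def add_v1(a, b):
    return (a + b) & MASK

def add_v2(a, b):
    # same semantics, different syntax: x + y == x - (-y)  (mod 2**32)
    return (a - ((-b) & MASK)) & MASK

def add_v3(a, b):
    # yet another spelling: x + y == (x | y) + (x & y)
    return ((a | b) + (a & b)) & MASK

# populate all 256 table slots with randomly chosen duplicates, so the
# table looks diverse even though the virtual ISA has a single ADD
handler_table = [random.choice([add_v1, add_v2, add_v3]) for _ in range(256)]
```

On top of this, each duplicate would additionally be run through the junk-code and opaque-predicate transformations mentioned above, so that proving two handlers equal is no longer a simple text comparison.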
And again, we want to increase the workload for the attacker, who now has to analyze up to 256 differently obfuscated versions of the handlers. Another technique you might recognize: here we've got the same dispatcher and multiple handlers, which all branch back to the dispatcher, which executes the next instruction. What we can do here is remove this single point of failure
and just inline the dispatcher at the end of each handler. So what happens here is that we don't have a central dispatcher, but every handler branches to the next handler. In summary, we inline the dispatcher into each handler
because a central VM dispatcher just allows an attacker to more easily observe what happens inside the VM, like recording every handler that has been executed. And the third technique, we can also get rid of the handler table, actually, because the explicit handler table easily reveals all VM handlers, as we've seen earlier,
because it points to every single start of each handler. What we can then do, instead of querying the explicit handler table that's been lying around in memory, is just encode the next handler address in the VM instruction encoding itself. So this might look a bit like this.
We get the opcode, the operand, and then somewhere in the encoding, we encode the next handler address. This has the effect that it essentially hides the starting location of each handler that has not been executed yet. With a handler table, we would easily see where every handler lies and could deobfuscate and analyze it,
but with this indirect encoding, we can only observe those handlers that occur in the concrete execution trace. Okay, going on to mixed Boolean arithmetic, which is another obfuscation technique, not so widely used in modern obfuscation yet,
but to give you an idea how it works, just look at this instruction here, or at this term. So it looks a bit involved, and you might not be surprised when I tell you that there's a simpler version of this expression, which is a simple addition of the values x plus y. You might guess that even for this more involved term,
there's again a simpler variant. Specifically here, it's the addition of x plus y plus z. Okay, you might believe me that it's easy if you throw the first term and the second term into your favorite SMT solver and prove equivalence. However, what is interesting is how we get to the second term
if we only have the first. So we can try to maybe apply Boolean identities we know from school or something. Maybe arithmetic identities; maybe we might even go so far as to draw a nice Karnaugh map. However, it becomes evident that this doesn't really help us here.
In fact, what we are using here is the concept of a mixed Boolean-arithmetic algebra, which gives us a ton of different operations that live in this algebra, like Boolean operators, comparisons, and even arithmetic operators.
It's worth noting that mixed Boolean arithmetic includes both a Boolean algebra and an integer modular ring, and that no techniques exist to simplify expressions that contain both.
We know how to reduce expressions in Boolean algebra and the integer modular ring, but there is no underlying theory that helps us in easily tackling both at the same time. Okay, on to de-affiscation. In every de-affiscation scheme, or in every de-affiscation tool we've seen,
we get to a point where we employ a technique called symbolic execution. So consider the handler here on the left. This is the NOR handler we've seen earlier. And we now want to reason about this handler. So what we are doing is we execute it, but not with concrete values,
but with symbolic values. This looks like this. So if we are just executing this move symbolically, we assign to Rcx the value of the memory cell indexed by the register RVP. We can continue to do so. We've got the second assignment.
Also, if we've got a NOT operation on Rcx, we just assign to Rcx the negation of Rcx, which is the same as the negation of the memory cell pointed to by the register RVP. And we go on. And what gets interesting is this case, where we've got a logical AND on Rcx and Rbx,
which is the same as the logical AND of the negation of the memory cell pointed to by RVP and the negation of the other memory cell. But because symbolic execution is basically a computer algebra system, it encodes some well-known identities from Boolean algebra, for example.
And in this case, it knows that this expression is equivalent to a NOR of both memory cells. We can then just continue with execution. And again, here, the symbolic execution engine recognizes the next NOR.
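The symbolic walkthrough above can be mimicked in a few lines of Python that build expression strings instead of concrete values. The register and memory names follow the slide; the final De Morgan step, which a real computer algebra system applies automatically, is checked explicitly here:

```python
# Symbolically "execute" the NOR handler: registers hold expression
# strings rather than concrete values. m0/m1 stand for the two memory
# cells indexed via RVP.
regs = {}
regs["rcx"] = "m0"                                 # mov rcx, [rvp]
regs["rbx"] = "m1"                                 # mov rbx, [rvp+8]
regs["rcx"] = f"~({regs['rcx']})"                  # not rcx
regs["rbx"] = f"~({regs['rbx']})"                  # not rbx
regs["rcx"] = f"({regs['rcx']} & {regs['rbx']})"   # and rcx, rbx

# A computer algebra system now recognizes De Morgan's law:
# ~a & ~b == ~(a | b), i.e. a NOR of the two memory cells.
for m0 in (0x00, 0x5A, 0xFF):
    for m1 in (0x00, 0xA5, 0xFF):
        lhs = eval(regs["rcx"], {"m0": m0, "m1": m1}) & 0xFF
        assert lhs == ~(m0 | m1) & 0xFF
```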
So this works rather well, and we see that the core semantics of our handler here is just a NOR operation on two memory cells, storing the result in another memory cell. So apparently this works fine for this handler. But what can we do, or what is the result, if we try to throw this whole concept of symbolic execution at a more obfuscated handler?
Here's a rather simple example of an obfuscated handler. You might be able to make out some parts, but let's just throw symbolic execution at it. We see that we get a ton of information here. But the problem is we don't really want a ton of information.
What we want is just the underlying semantics, which is rather simple: it's the NOR operation on two memory cells and the assignment to another memory cell. Also, let's try to throw this at mixed Boolean arithmetic. So we have this rather complicated mixed Boolean arithmetic expression, and we simply compile it into a binary program,
and then run symbolic execution on this trace. What we get is this semantics, and it didn't really fit on the slide, so it goes on and on. You might be able to make out the resulting value in RAX, and we get this super complicated expression.
And again, the underlying semantics here is rather simple. So we don't want this complicated expression; we just want the simpler semantics. Symbolic execution is nice because it allows us to capture the full semantics of the code we execute. Also, it's a computer algebra system, and as we've seen in the NOR example,
it allows some degree of simplification of the intermediate expressions. However, the usability decreases as the syntactic complexity of the underlying code increases. For example, we can introduce artificial complexity by just substituting instructions,
or just using opaque predicates or other schemes we've seen earlier. Or we can also increase the algebraic complexity by employing techniques like mixed Boolean arithmetic expressions. Okay, so it's obvious that we have a problem with syntactic complexity,
and we want to handle this somehow. So the interesting question here is: what if we could reason about the semantics of the code only, instead of having to fight with the syntax and keep improving our tools to cope with more and more complicated syntax?
This leads us to the topic of program synthesis, which Tim will tell you more about. Yeah, thank you. So we have seen some limitations of syntactic complexity. But remember: obfuscation is semantics-preserving. That means the obfuscated code has the same input-output behavior.
So why not just use the function as a black box and observe what it's doing? For instance, we might generate some random inputs, one, one, one, and observe three. We note that down, and we do this several times more.
So we store this one and this one and 20 others. And then we don't look at the code at all; we just look at its IO behavior. And then we learn a function that has the same behavior. So we might learn x plus y plus z.
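The sampling step can be sketched in a few lines. The black box here secretly computes x + y + z, matching the example above; in a real setup it would be an emulated code window rather than a Python function:

```python
import random

def black_box(x, y, z):
    """The function under analysis: we only observe its I/O behavior.
    (It secretly computes x + y + z modulo 2^32, as in the example.)"""
    return (x + y + z) & 0xFFFFFFFF

# Feed random inputs, record (inputs, output) pairs, never look at the code.
random.seed(0)
samples = [
    ((x, y, z), black_box(x, y, z))
    for x, y, z in (
        tuple(random.randrange(2**32) for _ in range(3)) for _ in range(20)
    )
]
# Synthesis then searches for an expression matching all 20 samples.
```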
And the goal of program synthesis is to automatically learn these things based on IO samples. So how do we do that? So we use a probabilistic approach. Basically, we have an optimization problem.
We have a surface, something like that, and this surface has some low points, some high points, and a global maximum at the top. The global maximum is a program that has exactly the same IO behavior as our black box. And we have a concrete value for each point on the surface.
So the closer we are to the global maximum, the higher our score. And in a probabilistic optimization problem, we basically start with a low score and improve until we find the global maximum.
How we do that is with an algorithm based on Monte Carlo tree search. Monte Carlo tree search is one of the main reasons why AlphaGo last year was able to beat world-class human Go players in computer Go.
So let's get practical and just synthesize A plus B modulo 8. We first somehow have to define how we generate programs. Well, we define a grammar. We have a non-terminal symbol U. Non-terminal means that we can replace it with other symbols.
So for instance, we replace U by U plus U, or by U times U, or by A or B. We say that A and B are our input variables; we cannot derive them any further. A terminal program is therefore something like A, B, A times B, A plus B, B plus B, and so on.
In contrast, an intermediate program is a program that contains at least one U, so we can derive it further. With that as a foundation, we do Monte Carlo tree search.
We start with a tree that has just the U as its root node. And then we randomly apply the rules of our grammar. So we derive A. A in this case is a terminal program, so it cannot be derived further.
So we can give it a score. Since we are on a probabilistic surface, we have to score things; how we calculate the score doesn't matter for now, we come to that later. We just give it a score. Then we derive the next thing, B, and also give it a score.
This time, we derive U times U. So we cannot give it a score, since we have an intermediate program. What do we do? We do something that is called a random playout: we apply the rules of the grammar randomly and replace the Us until we get a terminal program that we can evaluate.
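A random playout over the talk's grammar (U → U + U | U * U | a | b) might be sketched like this; the size cap that biases toward terminal rules is our own addition so that playouts always terminate:

```python
import random

# Grammar rules for the non-terminal U; parentheses keep precedence explicit.
RULES = ["(U+U)", "(U*U)", "a", "b"]

def random_playout(program, max_size=64):
    """Randomly derive until no non-terminal U remains."""
    while "U" in program:
        i = program.index("U")
        # Once the program grows large, only allow terminal rules,
        # so the playout is guaranteed to finish.
        rules = RULES if len(program) < max_size else RULES[2:]
        program = program[:i] + random.choice(rules) + program[i + 1:]
    return program

candidate = random_playout("(U*U)")
# e.g. "((a+b)*a)" -- a terminal program that can now be scored
```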
So we have something like A plus A times B plus A, and we give it a score. And then we do the same with U plus U. Here again, we have an intermediate program. We derive something, score it, and go back. Now we don't have any further rules to apply.
What we then do is we choose the best child node in a probabilistic manner. In this case, it is U times U. And do the same thing again. We derive some expression, do a playout, give it a score, go back.
Now, each score basically represents the average score of all the children's scores. So in this case, we had a new score here, so we just update this one. And we go back. Again, we choose the best node. We do a derivation step and so on.
Go back, update, choose the best node, play out, score, update, and so on. So now here, U plus A, we have a really high score. U plus B, we have a rather low score. And we want to synthesize A plus B.
This means that U plus B had a bad playout and that U plus A had a good playout because here the score is better. So we update this and go back. So we again choose this node, go back, and go back. So as you can see, we always explored this area more than that.
The reason is that U plus U seems to be way more promising than U times U. However, we have some probabilistic behavior, which means we sometimes also explore different paths, just to find out whether we might be missing something.
So in this case, we would normally choose U plus A again. But we just decide to choose that. We do a playout, we give it a score, and well, it wasn't really good. So because of that, the next step, we go back to U plus A.
Then we derive B plus A, and so we have a final program. We cannot derive it any further, so we can score directly. We give it a score of one. Why one? Well, because B plus A has exactly the same IO behavior as A plus B.
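Syntia uses Monte Carlo tree search for this search. As a stripped-down illustration of the same idea, even unguided random search over the grammar can synthesize a small expression from I/O samples; everything here is a toy (8-bit values, two inputs, `eval` as the interpreter), not the actual algorithm:

```python
import random

RULES = ["(U+U)", "(U*U)", "a", "b"]

def random_program(max_size=32):
    """Derive a random terminal program from the grammar."""
    prog = "U"
    while "U" in prog:
        i = prog.index("U")
        rules = RULES if len(prog) < max_size else RULES[2:]
        prog = prog[:i] + random.choice(rules) + prog[i + 1:]
    return prog

def oracle(a, b):           # the black box whose semantics we want to learn
    return (a + b) & 0xFF

random.seed(1)
samples = [(random.randrange(256), random.randrange(256)) for _ in range(16)]
target = [oracle(a, b) for a, b in samples]

found = None
for _ in range(100000):     # MCTS guides this search; brute randomness also
    prog = random_program() # works for tiny targets like a + b
    if [eval(prog, {"a": a, "b": b}) & 0xFF for a, b in samples] == target:
        found = prog        # e.g. "(a+b)" or "(b+a)"
        break
```

The point of the MCTS scoring machinery is precisely that this blind loop stops scaling once expressions get deeper; the guided search keeps the promising subtrees alive.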
So in other terms, we are finished with our program synthesis part. How do we score some values? Basically, for a random playout, we generate random inputs, query our black box, and observe the output.
So two plus two is four. And now we use the same inputs and query our intermediate program and observe the output. Then we calculate the similarity of these two and get a score. Similarity, I will come back to this on the next slide.
So we compare somehow the similarity. We don't do this only for one input pair; we do it for many. In this case, we have the same output, which means the similarity is one, because it's exactly the same.
And we do it several times more. And finally, our score for this node is basically the average score of all these similarities. How do we calculate the similarity? Well, we are operating in the bit vector space, so we compare the similarity of bit vectors.
In other terms, we can compare if they are in the same range. So we have a look at the trailing zeros or ones, and at the leading zeros or ones, and count how many are the same. Then we can still compare how many bits are different.
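One of these components, the bit-level agreement between two outputs, can be sketched as follows; the 64-bit width and the exact weighting are illustrative, not Syntia's actual formula:

```python
def hamming_similarity(a, b, width=64):
    """Fraction of bit positions in which two bit vectors agree."""
    mask = (1 << width) - 1
    differing = bin((a ^ b) & mask).count("1")
    return 1.0 - differing / width

# Identical values score 1.0; complementary values score 0.0.
# Values with similar bit patterns score in between, which gives the
# stochastic search a gradient to climb instead of a flat yes/no.
```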
This is the Hamming distance. We use that, for instance, for things like overflows in additions, or for XOR operations, AND operations, and so on. And then, for something like A plus B without overflow,
we can look at how close two values are numerically, so basically the numeric distance. We use these metrics to tackle the different behaviors of the bit vector space, such as overflows and things like that,
and we take the average of all these metrics. Okay, how do we use that as a core to synthesize obfuscated binary code? Basically, we have an execution path, perhaps obtained from an instruction trace, perhaps from static analysis, whatever you want,
and we emulate it and indirectly observe inputs and outputs. We say that everything is an input that we read before we write to it. So in this case, these are the two memory cells at the top there.
And everything that we write to last is an output, shown in red. So for instance, RBX is not modified again afterwards, so it is an output; the same for RCX, and so on. Then we treat this again as a black box, generate random inputs,
and feed them into our black box and observe the output, in this case RBX. We do this many times, something like 20 or 30 times, and then we synthesize this output. And we learn that RBX is the negation of M0, where M0 is, in this case, the first memory cell.
We do the same for RCX and learn that this is basically the NOR of these memory cells. And finally, we do it for this one, and there we also learn that it is a NOR. To conclude, we learn the semantics of the two NORs and the negation at the top.
What we don't learn in this handler is the pushf operation, which is flag-based, and the flags that are written back into the memory cell pointed to by RBP. Why don't we learn that? Because we don't care.
So basically, we designed our grammar in such a way that it ignores bit-level semantics and only captures higher-level semantics. Or in other terms, we don't consider flags as any sort of input. Okay, so we do random sampling over the code.
What if there is a branch, maybe a conditional one, and we feed in random inputs? We might trigger some different behavior, such that the emulator decides to take another path. What we do here, again, is just ignore the flags,
and we force the execution to take the path that we observed. So if we want to go to A, we just force it to go to A; if we want to go to B, we force it to go to B. So how do we implement something like this? Basically, you have lots of different options.
What you need is some code base with which you can execute code. For instance, you can emulate it, say with the Unicorn engine, or you can do it with dynamic binary instrumentation, as in Pin or DynamoRIO. What you also can do is translate it into an intermediate language,
symbolically execute it, and evaluate the symbolic expressions while feeding them with concrete inputs. But normally, this is much, much slower, especially for large expressions, as we've seen before. But in general, it is possible.
What we did is implement everything in our framework called Syntia. Basically, Syntia is a program synthesis framework for code deobfuscation. It is written in Python. It performs random I/O sampling for assembly code based on the Unicorn engine, and it implements Monte Carlo tree search based program synthesis
as its core. We published this under GPL version 2, and this is the link where you can get it. So we will just do a quick demo. First, I will show something about the program synthesis itself.
Basically, we define an oracle, a function that we treat as a black box and use for I/O sampling. And what we want is simply to synthesize X plus X plus Y plus Y. So we just run it,
and what you can see here is that we start with different nodes, try out different things, and always assign a higher reward to the better ones. And you also see that, in this case,
it somehow learned that there must be a lot of additions, and at some point, we get something like this. This is basically V1 plus V1, plus V2 plus V2. So this isn't the simplest form, but it is simple enough that we can easily simplify it
to something like two times the one input plus two times the other input. So we can run it another time, and we might take a completely different path, because it's all probabilistic. Perhaps one playout was better than another,
then we choose another path. It might take five seconds or up to one minute; it depends on the quality of our inputs, how we choose paths, and things like that. But in the end, we will usually reach our goal. If not, we just start again. The first run took nearly 20 seconds.
This time it took 24 seconds, and you see we again have a more or less unsimplified version of the expression. So it's okay. Now, how do we use this to synthesize obfuscated code?
Let's have a look at the source. The unobfuscated code at the top is the plain code. Basically, it takes five inputs, performs an addition and a multiplication,
and returns the value. Note that this is a function, which means our final value, the return value, will be stored in EAX. Then it is obfuscated with Tigress using MBAs, and we get something like this, which is a little bit longer. Then we compile that
and get an assembly listing that is about 760 lines. We don't symbolically execute it here, because it would take a really long time.
Instead, we just learn it. What we do is random input sampling. So we give it a code file, which is our instruction trace, we define the architecture, we take, say, 20 random I/O samples,
and we write them to an output file. You see, this took less than five seconds. If we look at the file, we see that we have different outputs.
Here, this is one output, and this is another output. We have 18 outputs in total, because we start counting at zero. We are only interested in EAX, so we will just synthesize that in a moment. And you see we have here our five memory inputs,
our parameters for the function, and also some registers that were treated as input. Okay. So what we then do is we just define in our sample synthesis. This is a wrapper about MCTS.
This reads basically the input file from before, and here I noted zero because we only want to synthesize the first output now. So we just start it.
It takes the sampling file as input and an output file. And again, this might take between a few seconds and one minute, it depends. So, we are ready.
We can have a look at the result. You see, this is the output. The best non-terminal is some integer that has not been derived further, plus a concrete variable. The best terminal program,
so the best program that cannot be derived any further, is a memory cell times a memory cell, plus a memory cell. And so this is our final expression. Basically, we learned that the whole semantics is the first memory read times the third memory read
plus the fifth memory read. So, okay, coming back to our slides. How can we use that to break virtual machine-based obfuscation? Quick reminder, the goal of VM-based obfuscation
was to introduce a new CPU scheme that you can't analyze easily with your tools. You have to manually reverse engineer all the handlers and things like that. And well, one simple thing we can do with that is just learn the semantics of arithmetic handlers,
such as whether the semantics is an ADD or a NOR or something like that. So Moritz discussed four different hardening techniques, and we will talk about how we can break all of these. The first one was obfuscating the individual VM handlers
with more complex obfuscation schemes. The second was duplicating handlers and transforming them so that they don't look the same. Third, we don't have a central VM dispatcher, so there is no fetch-decode-execute loop; instead, each handler inlines the computation of the next handler address,
and fourth, we don't have any explicit handler table. Okay. What you see here is a handler from Themida. It looks quite a bit more complex, and you don't easily see, not even with symbolic execution, what it is doing. Well, if we ask Syntia,
it says: well, the semantics is an addition of the 13th and the 14th memory reads, stored into a 64-bit value. Okay, that solves this problem. The next problem: duplicated VM handlers.
Assume we have a handler table where we can see each handler, or we have an instruction trace where we know where each handler is located. So, given we know where the handlers are, how can we learn duplicated VM handlers?
Well, at least we learn their simple semantics. We can learn, based on our grammar, whether it's an ADD, an XOR, a SUB, a shift left, and things like that. And of course, we find duplicates for free: because we learn the semantics, handlers with the same semantics are duplicates.
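Finding duplicates then reduces to grouping handlers by their synthesized expression; the handler addresses and expressions below are made-up illustrations, not real Syntia output:

```python
from collections import defaultdict

# Hypothetical result of synthesizing each handler's semantics:
# handler address -> learned (normalized) expression.
learned = {
    0x401000: "m0 + m1",
    0x401200: "m0 ^ m1",
    0x401400: "m0 + m1",     # different-looking code, same semantics
    0x401600: "~(m0 | m1)",  # a NOR handler
}

# Group handler addresses by learned semantics.
groups = defaultdict(list)
for addr, semantics in learned.items():
    groups[semantics].append(addr)

# Any group with more than one address is a set of duplicated handlers.
duplicates = {sem: addrs for sem, addrs in groups.items() if len(addrs) > 1}
```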
Okay. Coming back to our Themida handler and the missing central VM dispatcher. Have a look at the end of the handler. Let's zoom in. Basically, this, plus the code above, is the code that calculates the next handler address.
This calculation is a little too complicated for us, but that doesn't hurt our semantic approach, because we still learn the semantics of the handler itself; since the semantics of the next-address calculation is too complex, we simply don't learn it. And one other thing:
have a look at the jmp rbx. One advantage, or disadvantage, depending on how you see it, of inlining something like that is that we get a handy heuristic for finding the end of VM handlers.
Basically, they end with a return or with a jmp rbx. So if we split an instruction trace at indirect control flow transfers, we split it into the separate VM handlers. This helps us with the fourth hardening technique.
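The splitting heuristic can be sketched as follows; the trace representation as (mnemonic, operand) pairs is our own simplification:

```python
def split_trace(trace):
    """Split an instruction trace into VM handlers at indirect
    control-flow transfers (ret, or jmp to a register)."""
    handlers, current = [], []
    for mnemonic, operand in trace:
        current.append((mnemonic, operand))
        # "jmp rbx" is indirect; "jmp 0x401000" would be direct.
        indirect = mnemonic == "ret" or (
            mnemonic == "jmp" and not operand.startswith("0x"))
        if indirect:
            handlers.append(current)
            current = []
    if current:                # trailing instructions with no terminator
        handlers.append(current)
    return handlers

trace = [("mov", "rcx, [rbp]"), ("not", "rcx"), ("jmp", "rbx"),
         ("mov", "rbx, [rbp+8]"), ("add", "rax, rbx"), ("ret", "")]
```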
We don't have an explicit handler table, so we don't know by static analysis where each handler is. What we can do is just execute the code and observe the instruction trace. Then we have something like this. We apply the rule from above: we split everywhere we observe indirect control flow,
and we get something like this. As it turns out, each color is a different handler. Then we can take these single components and synthesize each one on its own. And in this case, we automatically
learned large portions of the instruction trace. And we get a semantic disassembler for the new architecture for free, if you want to see it that way. There is a lot more we could say; just try it on your own.
We also released our code, and furthermore, we released all of our samples. So feel free to use them and play around with them. To conclude, we talked about obfuscation techniques, mainly opaque predicates, VM obfuscation schemes and their hardening,
and mixed Boolean arithmetic. And of course, they can all be used together, and on the instruction trace level it all mainly looks the same for deobfuscation. Then we discussed symbolic execution for syntactic deobfuscation. We saw that it's really great,
but it has some drawbacks if the code is syntactically too complex. So we then asked ourselves: what about not looking at the code level, but instead looking at the semantic level, using the code as a black box to obtain I/O samples and learning from those? This is program synthesis.
And a final reminder, you find everything in our Git. Thank you very much.
All right. Yes, I was wondering... And please try to leave quietly. All right. Okay, do you still hear me? I was wondering, when one takes a look at these obfuscation techniques,
wouldn't it be possible for a closed-source operating system to just prevent any debugging of code? So say we could encrypt the code with some key that only the operating system has access to, and then only shovel it to a protected memory area. Well, what you might do
is that you execute it inside a virtual machine or something like that. Assume you can do that, and you can take memory snapshots and memory breakpoints at different points of execution. What you can then do is, if you know somewhere, in combination with reverse engineering,
that at this memory area now lies the decrypted code, because you know it was just decrypted since it is being used for execution now, you can take a memory snapshot, dump the code, and then feed it into something like this.
Also, to add to this, there is also a kind of non-technical argument to make here. Like the argument we had on the slides about outsourcing your code, this is a similar scenario: to have hard guarantees, you would want an HSM or some other hardware mechanism to enforce this,
which might not always be feasible. Imagine having a corporate environment where you need to supply USB dongles or something, which might be in violation of some corporate policy. We've been looking only at things we can do in the white box attack scenario where the attacker has everything he needs to analyze the code,
which is the least restrictive scenario here. Hello? I would like to ask: these techniques also seem to be very useful for
code optimization in compilers. Has any research been done in this direction? Yes, lots of research. There is one big project called STOKE, from the Stanford guys. They do stochastic program synthesis for superoptimization. And this was also
one of the inspirations for us to use a stochastic synthesis approach. Thank you. Do we have a question on the left also? Yes. Then you go. If I understood you correctly, you focus primarily on arithmetic functions
or arithmetic expressions. Can't you simply use techniques from numerical curve fitting? Well, that might be possible; we haven't looked into it. But we mainly looked at those
because we needed something to evaluate our approach against, so we designed our grammar to learn expressions like that. But we are free to choose a more complex grammar or learn much deeper expressions. It was just that what we wanted to
deobfuscate was rather easy from the semantic level here. But the approach is much more powerful. Thank you. I heard that the internet is also still awake and we have a question from the SignalAngel. Please, SignalAngel. Yes, we do. Have you tried synthesizing
intermediate representations? No. It is in general possible, and others did some work on that. There is no reason why you cannot do it with this approach. But it depends mainly on the grammar,
that is, on what you want to synthesize, and on your I/O pairs. So if you design the grammar for the I/O you want to synthesize, you can do it. Because we are moving so fast, I think we still have time for one question from microphone two in the back. Hi, does this work? So you have been using
kind of fancy words, but in the end I think what you are doing is essentially an optimized search for an expression that looks like it gives the same semantics. Now I am wondering: what makes you believe this is faster than just randomly trying expressions? Why do you think that the child nodes in this MCTS tree
are somehow better if the parent node is better? Well, basically we have a really large search space. For VM handlers, if you just have A plus B, the semantics is rather simple. But normally we have something like 20 to 50 inputs and several outputs, and we have not only
multiplication or something like that; we have a much, much larger search space. We have 8-bit input variables, we have 64-bit input variables, we have downcasts, upcasts, zero extends; the whole grammar has bit shifts, left shifts, arithmetic shifts, XOR, and also modulo and division,
and things like that. So we have lots of components that we need. For something like A plus B, pure random search is probably faster. But if you take A plus B plus 4 plus C, you are not that fast anymore. So we really have a large
search space, and if you use some kind of guided learning, it helps a lot. Final question, mic one. Hi. This looks like very interesting work to me, but my only question is: what if the
function you are trying to obtain, I mean, what if the similarity function does not respect what the code is doing? What if, for instance, it's not the case that the similarity increases the closer you get, and things like that? In short, how would you break your own system by violating the
assumptions of your similarity function? Well, of course, if you make it semantically harder, for instance, it won't work at some point anymore. For example, if you apply some form of local encryption or something like that, this might break, or rather will break,
for sure. But one of the observations we made is that this isn't done yet, which was a really interesting fact. Another main point is how you choose the boundaries of the window you analyze, where you start and where you end.
In this case, we looked at the handler level, but there's no reason not to combine several handlers, or to split handlers in half, and things like that. So there might also be ways to tackle this again by choosing the window boundaries differently. Thanks. Okay, those who didn't get a chance,
I'm sure you can approach the speakers next to the stage once they leave. Again, thank you very much for the interesting talk and let's conclude this with a round of applause.