
Lets break modern binary code obfuscation

Speech transcript
One of the many things I love Congress for is the broad programming. Normally this is more of a [unintelligible] and society stage, so I'm very excited that today there is also a security talk happening here. That's next: "Lets break modern binary code obfuscation". It is going to be translated into German as well, under the address streaming.c3lingo.org, which should also appear on the screen now. A wonderful thing. Tim Blazytko and Moritz Contag from Bochum are going to explain to us how you can deobfuscate code without looking at the code, only at its behavior. A warm welcome and a lot of applause!
Hi, welcome to our talk. Moritz and I are two PhD students and reverse engineers at Ruhr University Bochum in Germany. In our daily work we do reverse engineering and things like that, and as part of one of our academic projects we started to look into [unintelligible]. This is basically the more technical version of our academic talk that we presented at USENIX Security this year. In the first part, Moritz will talk about code obfuscation techniques and deobfuscation techniques, and I will join later with program synthesis and how to apply it.
All right, thank you for the introduction. OK, first things first.
Why would we want to obfuscate code in the first place? To set the scene, it's important to note that we cannot really prevent reverse engineering attempts; rather, we seek to complicate them. There are several reasons why you would want to do so. The main one is intellectual property and the protection thereof: if you have some super-secret algorithm which gives you a competitive advantage over your competition, you can protect it and at least keep a head start. Another reason to obfuscate code would be malicious payloads: you want to make them harder to detect, so that analysts are not easily able to create signatures for them and the malware lives longer without being picked up by an AV. The third and, I think, most interesting case is digital rights management, where software, especially triple-A games from larger game studios, is obfuscated in order to prevent cracking and illegal distribution of the game itself. Especially in the context of digital rights management, there is a very fitting quote which sets the scene as to what we can expect from code obfuscation. It is from Martin Slater of 2K Australia, shortly after the release of BioShock, and he said that they achieved their goal because they were uncracked for thirteen whole days. So they didn't even try to prevent cracking; they just made sure that the software would withstand cracking attempts at least for those first days, which is about the time in which a game makes most of its revenue.
Looking at how we protect software, there are different approaches. For example, we can take a look at the software that is used to analyze the program itself, like disassemblers and so on. One idea is just to abuse shortcomings of these tools: if you know that a tool chokes on some code, such as a specific instruction sequence, you just emit that and make the tool crash. Similarly, if you have some parser which relies on fields of the PE file, you can easily confuse it and render the tool unusable. Another approach that's very popular is to detect the environment the program runs in, and to check whether the application is running in an environment that tells us about the presence of a debugger. The operating system offers some capability for this, for example the BeingDebugged bit in the PEB, which is set if a debugger is attached. Similarly, there are other known tricks that abuse operating system internals to somehow detect the presence of a debugger and then just abort execution, to prevent any reverse engineering attempts. However, both techniques have the drawback that once you know how they work, they are easily fixed: you can just patch the tool, or you can circumvent the anti-debugging tricks by just patching the code.
Also, if you go to Google and search for something like "debugger detected", you get over 6 million results of people complaining that they cannot play the latest release of a game, because the anti-debugging technique, which might have worked reliably on earlier systems, doesn't really hold up on Windows 7. There are issues with false positives, so benign customers cannot play the game because the debugger detection was faulty.
That sets up some requirements we need for code obfuscation. First, we want the code to be semantics-preserving, so protecting the application should not change its observable behavior: we do not want the game to break only because we want to protect it. Second, we want to avoid external dependencies, at least in the context of what we discuss here. There is the option to outsource data, for instance onto the internet or onto other separate media, but we are most interested in techniques that protect the code only, in a white-box attack scenario where the attacker has everything he or she needs to attack the application at hand. Finally, and probably most importantly, we want to employ techniques that are way easier for us to deploy than for the reverse engineer to attack. The anti-debugging tricks we've seen, and also the shortcomings of tools which we can abuse, are more or less straightforward: once you know how they work, you can easily bypass them. We want techniques that are easy for us to deploy but very hard to attack for other parties.
Talking about code obfuscation techniques, one technique that is used in commercial protection engines is what is known as opaque predicates. Consider the control-flow graph (CFG) on the left side: an application with a linear control-flow graph. If we now insert opaque predicates, the program looks vastly more complex: there are more branches and more control flow that we have to check, and we have no clue that the underlying program is in fact very simple. Let's zoom in on one of those cases.
Here is a bit of code with a true branch and a false branch pointing to two different basic blocks. Just by looking at this, we might come to the conclusion that we don't know which one is going to be taken. However, it turns out that this is what we call an opaque predicate. In this case the true branch is always taken, regardless of what happens before the predicate; this is an opaque true predicate. On top of this is the block which constructs the predicate, based on the return value of the API call GetCurrentProcess. If you have worked with the Windows API before, you might know that GetCurrentProcess always returns a constant value, namely minus 1. So by branching on this predicate we make sure that we always take the left branch. The right branch may point to dead code or other junk code to confuse the reverse engineer, but at runtime we only ever take the left branch. Similarly, there are opaque false predicates, which just invert this condition and always follow the false branch, and there is a flavor called random opaque predicate, in which we branch upon a random value. This means that at runtime we potentially follow both branches. The challenge here is that we have to make sure that both paths that follow are semantically equivalent, because the program breaks otherwise. It is also a challenge to ensure that the attacker cannot easily detect that both paths are in fact semantically equivalent. OK, a recap on opaque predicates: obviously they increase the complexity of the application, and they can be built on top of hard problems. But what is most important is that they force the analyst to encode additional knowledge in his tools. The analyst has to extend his tool to know about the Windows API, about instructions, about arithmetic identities or some other hard problems, and all of this has to be encoded in the analysis tool to provide reasonable results.
That holds for static analysis. However, if we only look at concrete execution traces, so we just run the application and look at concrete values, we note that opaque predicates are essentially solved for free. This is good to keep in mind.
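The opaque true predicate and the remark about concrete traces can be illustrated with a minimal Python sketch. It rests on assumptions: the predicate used here (the product of two consecutive integers is even) is a textbook arithmetic identity picked for illustration, whereas the example in the talk relies on the constant return value of GetCurrentProcess, and all function names are hypothetical.

```python
import random

def real_work(x: int) -> int:
    """The original, unobfuscated basic block."""
    return x + 1

def junk(x: int) -> int:
    """Dead code that only exists to mislead static analysis."""
    return (x * 31337) ^ 0xDEAD

def obfuscated(x: int, secret: int) -> int:
    # Opaque true predicate: secret * (secret + 1) is a product of two
    # consecutive integers and therefore always even.
    if (secret * (secret + 1)) % 2 == 0:
        return real_work(x)
    return junk(x)  # never reached at runtime

def taken_branches(samples: int = 1000) -> set:
    """Record which outcome of the predicate concrete executions observe."""
    seen = set()
    for _ in range(samples):
        secret = random.randrange(-2**31, 2**31)
        seen.add((secret * (secret + 1)) % 2 == 0)
    return seen
```

A static tool has to know the identity to prune the junk branch, while `taken_branches()` only ever observes the true outcome, which is the sense in which concrete traces resolve opaque predicates without extra effort.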
Another interesting technique, one that I guess every major copy protection out there uses, is what we call virtual machines.
Take for example this native code on the left, some x86 code, and let's just assume that this code is precious intellectual property we want to protect. Obviously we can use common tools such as IDA to look at this code, reason about it and get to know what it does. So the idea here is to replace the code on the left with something common tools cannot understand. What we do is get creative: we just make up an entire instruction set, as seen on the right. The instructions are not known in this form by any existing architecture; they come with new registers, new encodings and so on. The native code and the new instruction codes are semantically equivalent. Then we just replace the native code by a call to what we call a virtual machine, also known as an interpreter, which is basically a CPU in software that runs the imaginary architecture we've thought up. If we now take tools like IDA or a debugger to analyze this code, it's not really viable anymore, because they don't know about the made-up architecture. But we still have to make sure that the transition from native code to virtual instructions goes seamlessly.
To this end, let's look at the components of a virtual machine. There are the VM entry and the VM exit, and what they do is perform the context switch from native to virtual context and back. The VM entry copies the native context, say registers and flags of the native architecture, to the virtual context, and the VM exit copies it back to the native context. Usually the mapping of native to virtual registers is one-to-one, which makes it a bit easier. Then there is the fetch-decode-execute loop, like in a traditional CPU. It fetches and decodes one instruction and advances the virtual instruction pointer. It then looks up the respective handler, which defines the virtual instruction, in what we call the handler table here on the right, and invokes this handler, which goes on to actually execute the instruction. As you have seen, the handler table is just a table of function pointers indexed by opcode, with one handler per virtual instruction. Each handler decodes its operands, operates on them and updates the VM context accordingly.
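The components described above (VM entry, fetch-decode-execute loop, handler table, VM exit) can be sketched as a toy interpreter. This is an illustrative sketch, not the design of any particular protector: the opcodes, the operand encoding, the four-register context and all names are made up for this example.

```python
MASK32 = 0xFFFFFFFF

class VMContext:
    def __init__(self, native_regs):
        self.regs = list(native_regs)  # VM entry: copy native registers 1:1
        self.vip = 0                   # virtual instruction pointer
        self.running = True

def h_load_imm(ctx, code):             # reg <- immediate
    reg, imm = code[ctx.vip], code[ctx.vip + 1]
    ctx.vip += 2
    ctx.regs[reg] = imm & MASK32

def h_add(ctx, code):                  # dst <- dst + src
    dst, src = code[ctx.vip], code[ctx.vip + 1]
    ctx.vip += 2
    ctx.regs[dst] = (ctx.regs[dst] + ctx.regs[src]) & MASK32

def h_nor(ctx, code):                  # dst <- ~(dst | src)
    dst, src = code[ctx.vip], code[ctx.vip + 1]
    ctx.vip += 2
    ctx.regs[dst] = ~(ctx.regs[dst] | ctx.regs[src]) & MASK32

def h_exit(ctx, code):                 # leave the interpreter loop
    ctx.running = False

def h_invalid(ctx, code):
    raise ValueError("undefined virtual opcode")

# Handler table: one entry per possible opcode byte.
HANDLER_TABLE = [h_invalid] * 256
HANDLER_TABLE[0x01] = h_load_imm
HANDLER_TABLE[0x02] = h_add
HANDLER_TABLE[0x03] = h_nor
HANDLER_TABLE[0xFF] = h_exit

def run_vm(bytecode, native_regs):
    ctx = VMContext(native_regs)           # VM entry
    while ctx.running:
        opcode = bytecode[ctx.vip]         # fetch
        ctx.vip += 1
        handler = HANDLER_TABLE[opcode]    # decode: table lookup by opcode
        handler(ctx, bytecode)             # execute
    return list(ctx.regs)                  # VM exit: copy context back
```

For instance, the bytecode `[0x01, 0, 40, 0x01, 1, 2, 0x02, 0, 1, 0xFF]` loads 40 and 2 into the first two virtual registers, adds them and exits. A disassembler for the native architecture would only ever see the interpreter, not this program.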
OK, so this is an example of an obfuscated version of a popular VM-based obfuscation, and we are able to make out these components here.
On top we see the VM entry, which we reach coming from native code; it just switches the context to the virtual context. Once the virtual context is initialized, we arrive at the VM dispatcher, which looks up the virtual instruction in the handler table, which in turn points to the individual handlers you see here below. Those handlers perform their semantics, update the virtual context and branch back to the VM dispatcher. Eventually we end up at the VM exit handler, which performs the context switch from the virtual context back to the native one.
OK, taking a closer look: one register holds the virtual instruction pointer and another one points to the VM context. We see that the dispatcher just takes one byte from memory, increases the instruction pointer, looks up the corresponding handler and jumps to it. In the handler shown here, we again see that it just reads from the virtual context, performs some semantics, writes the result back into the context and finally jumps back to the dispatcher to execute the next virtual instruction. OK, so this is rather simple and easily understandable.
How do we harden this whole concept?
Obviously, as we have seen, the handlers are simple: there are only a few instructions and they are easily understood. What we can do here is just apply traditional code obfuscation transformations to make analysis much harder, like substituting operations, inserting opaque predicates, inserting junk code and so on. On the right you see a slightly more complex assembly listing than before.
Another technique: imagine you only have four handlers and you want to make it more complicated, more work for the reverse engineer. What we do is just duplicate individual handlers to increase the workload for the attacker.
Because the table is indexed using one single byte, we get up to 256 entries we can populate. So we just duplicate existing handlers to populate the full table, and then again use traditional code obfuscation techniques to prevent the attacker from easily finding out that two handlers are in fact similar. Again, we want to increase the workload for the attacker, who now has to analyze up to 256 differently obfuscated versions of the handlers.
Another technique: normally you have the central dispatcher and multiple handlers which all branch back to the dispatcher, which executes the next instruction. What you can do here is get rid of the central dispatcher and just inline the dispatcher at the end of each handler. So we no longer have a central dispatcher; instead, every handler branches to the next handler.
We get rid of the central dispatcher because it essentially allows an attacker to more easily observe what happens inside the VM: it is like recording every handler that has been executed. As a fourth technique, we can also get rid of the handler table, because the explicit handler table reveals all the handlers: it points to the start of every single handler. What we can do instead of storing an explicit handler table is to encode the address of the next handler in the virtual instruction itself.
This might look a bit like this: you have the opcode and the operands, and then somewhere in the encoding the address of the next handler.
This has the effect that it essentially hides the starting location of handlers that have not been executed yet. With the table, you see where every handler lies and can analyze it statically, but with these indirect jumps going on, we can only observe those handlers that have occurred in a concrete execution trace.
Talking about mixed Boolean-arithmetic, this is another obfuscation technique, not so widely used in modern obfuscation yet. To give you an idea of how it works, just look at this expression in x, y and z.
It looks a bit involved, and you might be surprised when I tell you that there is a simpler version of this expression, which is a simple addition of the variables. You might guess that even for this more complex one it is again an addition of the variables; here it is the addition x + y + z. You might believe me that it is easy, if you are given the first term and the second term, to ask your favorite SMT solver and have it prove equivalence. What is interesting is: how do we get to the second term if we only have the first? You can try to apply arithmetic identities known from school, maybe Boolean identities, or even go so far as to draw a Karnaugh map. However, it becomes evident that this doesn't really help us here. In fact, what we are using here is the concept of mixed Boolean-arithmetic algebra, which gives us a ton of different operations: Boolean operators, comparisons and arithmetic operators. It includes both Boolean algebra and integer modular arithmetic. It is worth noting that techniques exist to simplify expressions that contain only Boolean operations or only arithmetic operations, but there is no underlying theory that helps us easily attack both at the same time.
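To make the idea of mixed Boolean-arithmetic concrete, here is a small sketch that hides additions behind two well-known identities, `x + y == (x ^ y) + 2 * (x & y)` and `x + y == (x | y) + (x & y)`. These identities are chosen for illustration and are not necessarily the expressions shown on the slide; the 64-bit width and the function names are assumptions as well.

```python
import random

MASK = 2**64 - 1  # model 64-bit registers

def add2(x: int, y: int) -> int:
    """Plain addition, the semantics we want to hide."""
    return (x + y) & MASK

def mba_add2(x: int, y: int) -> int:
    # x ^ y adds the bits without carries, 2 * (x & y) adds the carries.
    return ((x ^ y) + 2 * (x & y)) & MASK

def mba_add3(x: int, y: int, z: int) -> int:
    # Nest a second identity on top: (t | z) + (t & z) == t + z.
    t = mba_add2(x, y)
    return ((t | z) + (t & z)) & MASK

def agree_on_random_inputs(trials: int = 10_000) -> bool:
    """Random testing only; a proof would go through an SMT solver."""
    for _ in range(trials):
        x, y, z = (random.getrandbits(64) for _ in range(3))
        if mba_add3(x, y, z) != (x + y + z) & MASK:
            return False
    return True
```

Checking that two given expressions agree is the easy direction. Recovering `x + y + z` when only `mba_add3` is available is the hard one, because each identity mixes the Boolean and the arithmetic domain.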
OK, on to deobfuscation. With every deobfuscation approach we have seen, we get to the point where we employ a technique called symbolic execution.
Consider the handler on the left, which we have seen earlier; we now want to reason about its semantics. What we do is execute it, not with concrete values but with symbolic values. It looks like this: we execute this mov symbolically, so the register takes the value of the memory cell indexed by the base register. We can continue to do so for the second assignment. Then, if we apply a NOT operation to the register, it is assigned the negation of itself, which is the same as the negation of the memory cell pointed to by the base register. It gets interesting in this case, where we have a logical AND of both registers, which is the same as the negation of the first memory cell, logically ANDed with the negation of the other memory cell. A symbolic execution engine is basically a computer algebra system and knows some identities from the Boolean domain; for example, in this case it knows that this expression is equivalent to a NOR of both memory cells. We can just continue with the execution, and again the symbolic execution engine recognizes this.
At some point this finishes, and we see that the core semantics of our handler is just a NOR operation on two memory cells, storing the result in another memory cell. So apparently this works fine for this handler.
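The symbolic execution of such a handler can be sketched in a few lines. This is a deliberately tiny model under stated assumptions: expressions are nested tuples, the only simplification rule is the NOR identity mentioned above, and the register names, memory cell names and three-instruction vocabulary are hypothetical stand-ins for the handler on the slide.

```python
def simplify(expr):
    """Rewrite (~a & ~b) into nor(a, b); leave everything else untouched."""
    if isinstance(expr, tuple) and expr[0] == "and":
        _, lhs, rhs = expr
        if (isinstance(lhs, tuple) and lhs[0] == "not"
                and isinstance(rhs, tuple) and rhs[0] == "not"):
            return ("nor", lhs[1], rhs[1])
    return expr

def symbolic_execute(instructions):
    """Run the instructions on symbols instead of concrete values."""
    state = {}  # location -> symbolic expression

    def read(loc):
        # A location that was never written is a free input symbol.
        return state.get(loc, loc)

    for op, dst, *src in instructions:
        if op == "mov":
            value = read(src[0])
        elif op == "not":
            value = ("not", read(dst))
        elif op == "and":
            value = ("and", read(dst), read(src[0]))
        else:
            raise ValueError(f"unsupported instruction: {op}")
        state[dst] = simplify(value)
    return state

# Hypothetical NOR handler operating on two cells of the VM context.
HANDLER = [
    ("mov", "reg_a", "mem[ctx]"),
    ("mov", "reg_b", "mem[ctx+8]"),
    ("not", "reg_a"),
    ("not", "reg_b"),
    ("and", "reg_a", "reg_b"),
    ("mov", "mem[ctx+8]", "reg_a"),
]
```

Running `symbolic_execute(HANDLER)` leaves the expression `("nor", "mem[ctx]", "mem[ctx+8]")` in the target memory cell. The rule in `simplify` matches one exact syntactic shape, so any additional obfuscation between the NOT and the AND would already defeat it.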
But what is the result if we try to apply this whole concept of symbolic execution to more obfuscated code? Here is a rather simple example of an obfuscated handler. It might be a bit much to take in at some parts, but let's just apply symbolic execution to it.
We see a ton of information here, but the problem is that we don't want a ton of information. We just want the underlying semantics, which is rather simple: it is the NOR operation on two memory cells and the assignment to another memory cell. Much the same is true for mixed Boolean-arithmetic.
We take this rather complicated mixed Boolean-arithmetic expression, simply compile it into a program, run the program and then run symbolic execution on the resulting trace.
What we get is the semantics, and it didn't really fit on the slide, so it goes on and on. What we see here is the resulting value in RAX, which is this super complicated expression. And again, the underlying semantics is rather simple, so we do not want this complicated expression; we just want the simple semantics.
So symbolic execution is nice because it allows us to capture the full semantics of the code we execute. It is also a computer algebra system and, as we have seen in the examples, allows some degree of simplification of the intermediate expressions. However, its usability decreases at some point as the syntactic complexity of the underlying code increases. For example, we can introduce artificial complexity by substituting instructions or by using opaque predicates, all of those schemes we already know, and we can also increase the algebraic complexity by employing techniques like mixed Boolean-arithmetic expressions. OK, so it is obvious that we have a problem with syntactic complexity, and we want to handle this somehow. The interesting question is: what if we could reason about the semantics of the code itself, instead of having to fight the syntax by improving our tools to cope with ever more complicated syntax? This leads us to the topic of program synthesis.
Yes, thank you. So we have seen some limitations caused by syntactic complexity. What comes to my mind is that obfuscation is semantics-preserving, which means the obfuscated code has the same input/output behavior. So why not just use a function as a black box and observe what it is doing?
For instance, we might just generate some inputs, say 1, 1, 1, observe the output 3 and note that down, and then do this for a few more inputs. We don't look at the code at all; we just look at its I/O behavior. What we then learn is a function that has the same behavior, so we might learn x + y + z. The goal of program synthesis is to automatically learn these things based on I/O samples. So how do we do that?
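Treating the code as a black box and collecting I/O samples might look like the following sketch. The `black_box` function is a stand-in written for this example (in practice it would be the obfuscated code under analysis), and the 64-bit width, the helper names and the sample count are assumptions.

```python
import random

MASK = 2**64 - 1

def black_box(x: int, y: int, z: int) -> int:
    """Stand-in for obfuscated code; we only ever call it, never read it."""
    t = ((x ^ y) + 2 * (x & y)) & MASK
    return ((t | z) + (t & z)) & MASK

def sample_io(oracle, arity: int, count: int, rng: random.Random):
    """Query the oracle with random inputs and record (inputs, output) pairs."""
    samples = []
    for _ in range(count):
        inputs = tuple(rng.getrandbits(64) for _ in range(arity))
        samples.append((inputs, oracle(*inputs)))
    return samples

def consistent(candidate, samples) -> bool:
    """Does the candidate reproduce every observed output?"""
    return all(candidate(*inputs) == output for inputs, output in samples)
```

A candidate such as `lambda x, y, z: (x + y + z) & MASK` is consistent with every sample drawn from this black box. Agreement on samples is evidence rather than proof, so a synthesized result is a hypothesis about the semantics.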
We use a stochastic approach.
Basically, we have an optimization problem. We have a search space, something like this surface, where each point is a candidate program, and we have a global maximum up there. The global maximum is a program that has exactly the same I/O behavior as our black box. We have a score for each point on the surface, so the closer we get to the global maximum, the higher our score is. In stochastic optimization we basically start at a random point and just improve until we find the global maximum. The algorithm we use for that is based on Monte Carlo tree search. Monte Carlo tree search is one of the main reasons why AlphaGo was able to beat world-class human Go players in computer Go last year.
Let's get concrete and synthesize a + b mod 8. We first somehow have to define what a program is and how to derive it.
We define a grammar. We have a non-terminal symbol U; non-terminal means that you can replace it with other symbols. For instance, you can replace U by U + U, by U * U, by a or by b. We say that a and b are input variables; we cannot derive them any further. So then we have candidate programs, something such as a, b, a * b, a + b, b + b and so on. In contrast, an intermediate program is a program that contains at least one U, so we can derive it further. With that as the foundation, we do a Monte Carlo tree search. We start at a root node that holds just the expression U, and then we apply the rules of our grammar. First we derive a. In this case it is a terminal program, so it cannot be derived further and we can give it a score. How we calculate this score doesn't matter now, we come to that later; we just give it a score. Then we derive the next one, b, and also give it a score. This time we derive U * U. We cannot give it a score directly, since it is an intermediate program. What do we do? We do something that is called a random playout: we apply the rules of the grammar randomly and replace U until we get a terminal program that we can evaluate. So we get something such as b + a, and we give it a score. Then we do the same with U + U: again we have an intermediate program, we derive something, score it and go back. Now we don't have any further rules to apply. What you then do is choose the best child node in a stochastic manner, and do the same thing again: we derive some expression, do a random playout, give it a score and go back. Each score now basically represents the average score of the node's children. So in this case we had a new one, so we just update this one and go back again, choose the best node, do the iteration step and so on: go back, update, choose the best node, playout, score, update.
So now, here at U + a we have a really high score, at U + b we have a rather low score, and we want to synthesize a + b. This just means that U + b had a bad playout and U + a had a good playout. We update this and go back. As you can see, we have always explored this area more often; the reason is that U + U seems way more promising than U * U. However, we have some stochastic behavior, which means that we sometimes explore a different path, just to see whether we might be missing something. In this case we would normally go to U + a again, but we decided to go elsewhere; we do a playout and give it a score to see whether it was really good. After that, in the next step, we go back to U + a and derive b + a. Now we have a terminal program: we cannot derive it any further, so we can score it directly. We give this a score of 1. Why 1? Well, because b + a has exactly the same I/O behavior as a + b. In other terms, we have finished our synthesis task. OK.
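The walkthrough above can be condensed into a compact Monte Carlo tree search over the grammar U -> U + U | U * U | a | b. This is a simplified sketch and not the speakers' implementation: programs are nested tuples, the leftmost U is always expanded first, node selection uses the standard UCT formula with an arbitrarily chosen exploration constant of 1.4, and the score is just the fraction of exactly matching outputs instead of the similarity metrics used in the talk.

```python
import math
import random

MOD = 8  # the walkthrough synthesizes a + b mod 8
RULES = [("+", "U", "U"), ("*", "U", "U"), "a", "b"]

def has_hole(prog):
    """True for intermediate programs, i.e. those still containing a U."""
    if isinstance(prog, tuple):
        return has_hole(prog[1]) or has_hole(prog[2])
    return prog == "U"

def expand(prog, rule):
    """Replace the leftmost U by `rule`; returns (program, replaced?)."""
    if prog == "U":
        return rule, True
    if isinstance(prog, tuple):
        op, lhs, rhs = prog
        new_lhs, done = expand(lhs, rule)
        if done:
            return (op, new_lhs, rhs), True
        new_rhs, done = expand(rhs, rule)
        return (op, lhs, new_rhs), done
    return prog, False

def evaluate(prog, env):
    if isinstance(prog, str):
        return env[prog] % MOD
    op, lhs, rhs = prog
    left, right = evaluate(lhs, env), evaluate(rhs, env)
    return (left + right) % MOD if op == "+" else (left * right) % MOD

def playout(prog, rng, max_steps=6):
    """Random derivation; after max_steps only terminals are chosen."""
    steps = 0
    while has_hole(prog):
        rules = RULES if steps < max_steps else RULES[2:]
        prog, _ = expand(prog, rng.choice(rules))
        steps += 1
    return prog

def score(prog, samples):
    """Simplified score: fraction of samples reproduced exactly."""
    hits = sum(evaluate(prog, {"a": a, "b": b}) == out
               for (a, b), out in samples)
    return hits / len(samples)

class Node:
    def __init__(self, prog, parent=None):
        self.prog, self.parent = prog, parent
        self.kids, self.visits, self.total = [], 0, 0.0

    def uct(self):
        if self.visits == 0:
            return math.inf  # always try unvisited children first
        explore = math.sqrt(math.log(self.parent.visits) / self.visits)
        return self.total / self.visits + 1.4 * explore

def synthesize(samples, iterations=500, seed=0):
    rng = random.Random(seed)
    root, best = Node("U"), (-1.0, None)
    for _ in range(iterations):
        node = root
        while node.kids:                       # selection
            node = max(node.kids, key=Node.uct)
        if has_hole(node.prog):                # expansion
            node.kids = [Node(expand(node.prog, r)[0], node) for r in RULES]
            node = rng.choice(node.kids)
        candidate = playout(node.prog, rng)    # simulation (random playout)
        reward = score(candidate, samples)
        if reward > best[0]:
            best = (reward, candidate)
        if reward == 1.0:
            break                              # same I/O behavior on all samples
        while node is not None:                # backpropagation
            node.visits += 1
            node.total += reward
            node = node.parent
    return best
```

With samples such as `[((a, b), (a + b) % 8) for a in range(4) for b in range(4)]`, `synthesize` returns the best score seen together with the corresponding terminal program. Because the search is randomized, different seeds may take different paths or return a different but equivalent expression such as b + a.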
How do we calculate the score values? Basically, for a playout we take an input, query our black box with it and observe the output, so 2 + 2 is 4. Now we use the same input, evaluate our intermediate result on it and observe its output. Then we calculate the similarity of these two outputs and get a score; I will come to the similarity itself on the next slide. We don't do this for only one input pair but for many. So we do it for this one as well; in this case we have the same output, which means the similarity is 1, because they are the same. Finally, the score of this node is basically the average of all these similarities.
How do we calculate the similarity? Well, we are operating in the bit-vector space, so we compare the similarity of bit vectors. We can compare whether they are in the same range: we have a look at the trailing zeros and ones and at the leading zeros and ones, and compare how many of them are the same. Then we can compare how many bits are different; this is the Hamming distance. We use that, for instance, for things such as overflows in additions, or XOR operations, AND operations and the like. And if we have a + b or something like that without an overflow, we can have a look at how close the values are numerically, so basically the arithmetic distance. We use these metrics to tackle the different behaviors of the bit-vector space, such as overflows, and we again take the average of all of these metrics.
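The metrics listed above could be combined roughly as follows. This is a sketch under assumptions: the transcript names the ingredients (leading and trailing zeros and ones, Hamming distance, numeric distance, averaging) but not the exact formulas, so the normalization to the range 0 to 1, the equal weighting and the 64-bit width are choices made for this example.

```python
BITS = 64
MASK = (1 << BITS) - 1

def leading_zeros(x: int) -> int:
    return BITS - x.bit_length()

def trailing_zeros(x: int) -> int:
    return BITS if x == 0 else (x & -x).bit_length() - 1

def leading_ones(x: int) -> int:
    return leading_zeros(~x & MASK)

def trailing_ones(x: int) -> int:
    return trailing_zeros(~x & MASK)

def similarity(a: int, b: int) -> float:
    """Average of several bit-vector metrics, 1.0 meaning identical."""
    metrics = [
        1 - bin(a ^ b).count("1") / BITS,   # Hamming distance
        1 - abs(a - b) / MASK,              # arithmetic distance
    ]
    for count in (leading_zeros, trailing_zeros, leading_ones, trailing_ones):
        metrics.append(1 - abs(count(a) - count(b)) / BITS)
    return sum(metrics) / len(metrics)

def node_score(candidate_outputs, oracle_outputs) -> float:
    """Average similarity over all I/O samples, as described for a node."""
    pairs = list(zip(candidate_outputs, oracle_outputs))
    return sum(similarity(c, o) for c, o in pairs) / len(pairs)
```

Unlike an exact-match score, this gives partial credit: a candidate that is off by one bit or by a small numeric amount scores close to 1, which gives the search a gradient to follow.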
OK, how do we use that as the core to synthesize obfuscated binary code?
Basically, we have an execution trace, perhaps obtained from an instrumentation phase, perhaps from something else, whatever you want, and we somehow emulate its input/output behavior. We say that everything that we read before we write to it is an input, so in this case the two memory cells, and everything that we write to last is an output. For instance, this register is not modified anywhere afterwards, so it is an output, and the same goes for the other ones. Then we treat this again as the black box: we generate random inputs, feed them into our black box and observe the output, in this case the register. We do this many times, something like 20 or 30 times, and then synthesize this output. We then learn that this register is the negation of its input, which in this case is a memory cell.
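The rule for deriving inputs and outputs from a trace (read before written means input, written means output with its last value) can be sketched like this. The trace format, a list of read and write sets per instruction, and the location names are assumptions made for the example.

```python
def trace_io(trace):
    """trace: list of (reads, writes) pairs, one per executed instruction."""
    written, inputs, outputs = set(), [], []
    for reads, writes in trace:
        for loc in reads:
            # Read before any write: the value comes from outside the trace.
            if loc not in written and loc not in inputs:
                inputs.append(loc)
        for loc in writes:
            if loc not in written:
                outputs.append(loc)  # its final value is an output
            written.add(loc)
    return inputs, outputs

# Hypothetical NOR handler: two memory cells in, result back to memory.
NOR_TRACE = [
    (["mem0"], ["r1"]),         # mov r1, [mem0]
    (["mem1"], ["r2"]),         # mov r2, [mem1]
    (["r1"], ["r1"]),           # not r1
    (["r2"], ["r2"]),           # not r2
    (["r1", "r2"], ["r1"]),     # and r1, r2
    (["r1"], ["mem1"]),         # mov [mem1], r1
]
```

For `NOR_TRACE` this yields the inputs `mem0` and `mem1` and the outputs `r1`, `r2` and `mem1`. Each output is then synthesized separately from random samples of the inputs.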
We do the same for the next output and learn that it is basically the negation of the other memory cell, and finally we do it for this one and learn that it is the NOR of the two. So what we learn is the semantics of the NOR and of the negation operations. What we don't learn for this handler is the pushf operation, where the flags are pushed and then popped back into the memory pointed to by RBP. Why don't we learn that? Because we don't care. Basically, we designed the grammar in such a way that it just learns the high-level semantics and ignores the low-level semantics. In other terms, we don't consider flags as any sort of input. OK, what if there is a branch that is conditional on an input? Well, we might observe some different behavior depending on which path the code decides to take. What we do here, again, is just ignore the flag and force the execution to take the path that we want. So if we want to go to A, we just force it to go to A; if we want to go to B, we force it to go to B. So how do we implement something like this? Basically, you have a lot of different options. What you need is some code base with which you can somehow execute code. For instance, you can emulate it in a sandbox using the Unicorn engine, or you can do it with dynamic binary instrumentation, such as Pin or DynamoRIO. What you can also do is translate the code into an intermediate language, symbolically execute it and evaluate the symbolic expressions while feeding them with concrete inputs, but normally this is much, much slower.
What we did is implement everything in our framework, Syntia. It is written in Python; the random I/O sampling is basically done with the Unicorn engine, and it also implements the Monte Carlo tree search based program synthesis. The code will be published publicly, and this is the link where you can get it.
So we will just do a quick demo.
First I will show something about the synthesis itself. Basically, we define an I/O oracle, that is, the function that we use as the black box. Here we want to synthesize x + x + y + y. So we just start it, and what you can see is that it prints the different nodes as it tries out different things and always assigns them a score.
You also see that in this case it somehow learned that there must be a lot of additions, and at some point we get something like this.
This is basically input one plus input one plus input two plus input two. This is not the simplest form, but it is simple enough that we can easily simplify it to something like 2 times the first input plus 2 times the other.
We can do that another time, and it might take a completely different path, because it is all stochastic: perhaps one playout is better than the other. In the past it took anything from 5 seconds up to 1 minute; it basically depends on the quality of our inputs, on how we choose the path and things like that. But in the end we will mostly reach our goal; if not, we just start again. The first run took nearly 20 seconds, this one took 54 seconds, and you see that now we have a much smaller, less obfuscated expression that is already in simplified form.
it difficult the fall to synthesize obfuscated code OK so let's have a look it is thoughts this the
This is the function before obfuscation, the plain code: basically, it takes five inputs, just performs an addition and a multiplication and returns the value. Note that for this function our final value, the return value, will be stored in EAX. Then we obfuscated this with Tigress and MBAs, and we get something like that.
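As a small illustration of what mixed Boolean-arithmetic (MBA) rewriting does, the sketch below replaces an addition with the well-known equivalent identity x + y = (x ^ y) + 2 * (x & y) and checks on random 64-bit inputs that both versions agree. The function names and the choice of identities are assumptions for illustration; the obfuscated expression shown in the talk is not reproduced here.

```python
import random

MASK = (1 << 64) - 1  # model 64-bit values


def plain_add(x, y):
    """The readable original: a single addition."""
    return (x + y) & MASK


def mba_add(x, y):
    """Same result via the identity x + y == (x ^ y) + 2 * (x & y)."""
    return ((x ^ y) + 2 * (x & y)) & MASK


def mba_add_nested(x, y):
    """One more rewriting layer, using x ^ y == (x | y) - (x & y)."""
    xor = ((x | y) - (x & y)) & MASK
    return (xor + 2 * (x & y)) & MASK


def agree(f, g, trials=1000, seed=1):
    """Random testing: do f and g return the same value on all sampled inputs?"""
    rng = random.Random(seed)
    pairs = ((rng.getrandbits(64), rng.getrandbits(64)) for _ in range(trials))
    return all(f(a, b) == g(a, b) for a, b in pairs)


if __name__ == "__main__":
    print(agree(plain_add, mba_add), agree(plain_add, mba_add_nested))
```

Each rewriting layer makes the expression syntactically larger while its input/output behavior stays the same, which is why an approach that only observes behavior is not affected by the added syntax.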
So then we compile that and get some assembly listing.
The listing is about 760 lines long, so we don't symbolically execute it here, because it would be really, really a matter of time. Instead, what we do is we just randomly sample inputs. We basically give it the code file (this is our instruction trace), we define the architecture, we say, well, let's take 20 I/O samples, and we name an output file. You see, that took less than 5 seconds.

If we look at the file, what we see is that we have different outputs: this is one output, this is another output, and we basically have 17 or 18 outputs. We are only interested in EAX, so we just look at that one. You also see that we have the five memory inputs, the parameters of the function, and also some registers that were treated as inputs.

So, OK, what we then do is just define our synthesis setup. This is the part about MCTS, and this part basically reads the inputs and outputs from the sampling file. And, yeah, we only want to synthesize the first output now. So we just start it.
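The sampling step described above can be sketched as follows. In the talk the code is executed on top of the Unicorn engine; here a plain Python function stands in for the emulated code so that the example stays self-contained. The stand-in computation, the register names and the JSON layout are all hypothetical and are not Syntia's actual file format.

```python
import json
import random

MASK = (1 << 64) - 1  # model 64-bit values


def emulated_code(mem, regs):
    """Hypothetical stand-in for running the obfuscated snippet in an emulator.

    Returns every observable output, not only the one we care about.
    """
    result = (mem[0] + mem[1] * mem[2]) & MASK
    return {
        "eax": result & 0xFFFFFFFF,          # the output of interest
        "scratch": (regs["rbx"] ^ mem[3]) & MASK,  # an unrelated side effect
    }


def sample(n_samples=20, seed=0):
    """Run the code on random inputs and record inputs together with outputs."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_samples):
        mem = [rng.getrandbits(64) for _ in range(5)]
        regs = {"rbx": rng.getrandbits(64)}
        records.append({
            "inputs": {"mem": mem, "regs": regs},
            "outputs": emulated_code(mem, regs),
        })
    return records


def column(records, name):
    """Select a single output (for example only EAX) across all samples."""
    return [record["outputs"][name] for record in records]


if __name__ == "__main__":
    records = sample()
    serialized = json.dumps(records)  # this text could be written to a file
    print(len(serialized), column(records, "eax")[:3])
```

Keeping all outputs in the sampling file and selecting one of them afterwards means the code has to be executed only once, even if several outputs are synthesized later.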
It basically takes the sampling file as input and an output file, and this again might take between a few seconds and one minute, so it depends. So we can have a look at that. You see,
this is the output. The top non-terminal expression still contains some integer non-terminals that have not been completed any further. The top terminal expression, that is, the best program that has been found, is one memory input times another memory input plus a third memory input. So this is our final expression: basically, we learned that the whole semantics is the first memory input times the second memory input plus the fifth memory input. So, OK, coming back to
our topic: how can we use that to break virtual machine obfuscation? Keep in mind that the goal of VM-based obfuscation is to introduce a custom architecture that you cannot analyze easily, so that you have to manually reverse engineer the handlers and things like that. Well, one simple thing we can do with that is just learn the semantics of VM handlers, such as: the semantics is an add, an xor or something like that. So, as we discussed the hardening
techniques, we will now talk about how we can attack all of these. The first one was obfuscating individual VM handlers, which makes the handler code more complex. You then just duplicate handlers and somehow transform them so that they don't look the same. Then we don't have a central VM dispatcher in this picture, so we don't have a dispatcher loop; instead, each handler inlines the complete calculation of the next handler. And then we don't have an explicit handler table.

OK, what you see here is a VM handler. It looks quite a bit more complex, and you don't really easily see, even with symbolic execution, what it is doing. Well, if you apply program synthesis, the semantics is that it is an addition whose result is stored as a 64-bit value.

OK, so much for this problem. The other problem is duplicated VM handlers. Assume we have a handler table where we can see each handler, or we have an instruction trace in which we know where the handlers start. So, given that we know where the handlers are, how can we learn duplicated VM handlers? Well, we just learn their simple semantics: based on the arguments you can learn that it is an add, it is an xor, it is a sub, or some shift left, things like that. And of course we can then find duplicates, because the learned semantics are the same.

OK, so coming back to our point "no central VM dispatcher" in this picture: have a look at the end of the handler. The ending is basically the code that calculates the next handler. We don't learn this calculation; it is perhaps a little bit too complicated for our approach. But that doesn't hurt our semantic approach, because we still learn the semantics of the handler itself, and since the semantics of the next-handler calculation is too complex, we just don't learn it. One other thing: have a look at the jump at the end of the handler. One advantage, or disadvantage, depending on how you see it, of inlining something like
that is that we have a heuristic for how we can find the end of VM handlers: basically, at the end of a handler there is an indirect jump. So, practically, if we split the instruction trace at indirect control-flow transfers, we separate it into the individual VM handlers. This heuristic also helps us with the fourth part. We don't have an explicit handler table, so we don't know statically where the handlers lie. What we can do is just execute the code and record the instruction trace, and then we have something like that. We then just apply our heuristic from before, we split everything, we observe the control flow and we get something like that. As it turns out, each of these is a different handler. We can then again take the single components and synthesize each of them on its own. In this case, we automatically learn large portions of the instruction trace and obtain their semantics, and that way we can save you a lot of time.

Just try it on your own: we also released our code, and furthermore we released all of our samples, so just have a look at that and play around.
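The trace-splitting heuristic mentioned above can be sketched in a few lines. The trace format used here, tuples of (address, mnemonic, operand string), and the rule that an operand starting with `0x` is a direct target are simplifying assumptions for illustration; a real implementation would take this information from a disassembler.

```python
def is_indirect_transfer(mnemonic, operand):
    """Heuristic: ret, and any jmp/call whose target is not an immediate address."""
    if mnemonic == "ret":
        return True
    if mnemonic in ("jmp", "call"):
        # Assumption: direct targets are printed as immediates such as "0x401000";
        # register or memory operands count as indirect.
        return not operand.startswith("0x")
    return False


def split_trace(trace):
    """Cut an instruction trace into windows that end at indirect transfers."""
    windows, current = [], []
    for instruction in trace:
        current.append(instruction)
        _address, mnemonic, operand = instruction
        if is_indirect_transfer(mnemonic, operand):
            windows.append(current)
            current = []
    if current:  # trailing instructions without a closing transfer
        windows.append(current)
    return windows


def unique_window_starts(windows):
    """Windows starting at the same address are executions of the same code."""
    return sorted({window[0][0] for window in windows})


if __name__ == "__main__":
    # Made-up toy trace: two candidate handlers, the first one executed twice.
    trace = [
        (0x1000, "mov", "rax, [rbp]"),
        (0x1003, "add", "rax, [rbp+8]"),
        (0x1007, "jmp", "rcx"),
        (0x2000, "not", "rax"),
        (0x2003, "jmp", "0x2010"),
        (0x2010, "jmp", "[rdx]"),
        (0x1000, "mov", "rax, [rbp]"),
        (0x1003, "add", "rax, [rbp+8]"),
        (0x1007, "ret", ""),
    ]
    windows = split_trace(trace)
    print(len(windows), [hex(a) for a in unique_window_starts(windows)])
```

Each resulting window can then be sampled and synthesized on its own, without needing a handler table to tell where handlers begin.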
So, to conclude: we talked about obfuscation techniques, mainly opaque predicates, VM-based obfuscation with its hardening techniques, and MBA, which we added at the end. Of course, they can all be used together, and they all make life hard if we work on deobfuscation at the instruction-trace level. Then we discussed symbolic execution for syntactic simplification, with the conclusion that it is really great, but that it has some drawbacks if the code is syntactically too complex. So, on the other hand, we asked ourselves: what about not looking at the code level but instead looking at the semantic level, using the code as a black box to obtain I/O samples and learning from them? That is our approach. Thank you very much.

Thank you. We now have about 10 minutes for questions, and you know the drill: please line up at the microphones. OK, do we have questions from the audience here? Yes, microphone one.

I was looking at this and was wondering: when one takes a look at these obfuscation techniques, would it be possible for a closed-source operating system to just prevent any debugging of code? Say, we could encrypt the code with some key that only the operating system has, and only decrypt it into a protected memory area.

Well, what you might do is execute it inside a virtual machine or something like that. Assume you can do that and you can take memory snapshots at different points of the execution. What you can then do is find out, somehow, that the decrypted code now lies in this memory area. It could become difficult, but you know that the code has to be decrypted at some point because it is needed for the execution. So you can take the memory snapshot
and dump the code, and then feed it into something like our tool here.

To add on this, there is also a non-technical argument to make here [unintelligible]. To have hard guarantees, you might want some additional mechanism, such as a hardware mechanism, to enforce this, which might not always be feasible. Imagine a corporate environment where you would need to supply [unintelligible], which might be in violation of some corporate policy. So we are looking only at things we can do in a white-box attack scenario, where the attacker has everything he needs to analyze the code, which is the most non-restrictive scenario.

I would like to ask: these techniques also seem to be very useful for code optimization in compilers. Has any research been done in this direction?

Yeah, a lot. Basically, there is one big project called STOKE, from the Stanford guys. They do stochastic program synthesis for superoptimization, and this was also one of the bases for us to use stochastic program synthesis for deobfuscation.

Thank you. Do we have a question from the Internet? [unintelligible] Then you go.

So, if I understood you correctly, you focused primarily on functions or arithmetic expressions. Can you also use these techniques on non-arithmetic code?

Well, it might be possible; we haven't looked into that. We mainly looked at arithmetic because we needed something to start with, so we designed a grammar to learn something like that, but we are free to choose any more complex grammar with much more advanced expressions. This was just the obvious thing that we wanted to start with, because it is easy at the semantic level, but the approach is much more powerful than that.

Thank you. The Internet is also still awake, and we have a question
from the Signal Angel. Please, Signal Angel.

Yes: have you tried synthesizing intermediate representations?

No, but this is another possibility. Others have also done some work on that, and there is no reason why you could not do this with our approach. It basically depends mainly on the grammar and on what you want to synthesize. So if you design the grammar in such a way that it describes some IR, you can do it.

I think we still have time for one question. Number 2 in the back.

You know, you've been using kind of fancy words, but in the end I think what you're doing is essentially an optimized search for an expression that gives the same semantics.

Yeah.

And now I'm wondering: what is it that makes you believe that this is faster than just randomly trying expressions? Why do you think that the child nodes in the search tree are somehow better if the metric of the parent is better?

Well, basically we have a really large search space. If you have just a plus b, this is rather simple semantics. But normally we have something like 20 to 50 inputs and several outputs, and we have not only addition, multiplication and things like that. We have a much, much larger search space: we have 8-bit input variables up to 64-bit input variables, we have downcasts and extensions, we have shift left, we have arithmetic operations and xor, and also modulo and division and things like that. So we have a lot of components. For a small search space, with brute force or something like that, you are probably faster, but as soon as you take something bigger than a plus b, you have a large search space, and if you use some kind of guided learning, this helps a lot.

Final question, microphone one.

So, it looks to me like very interesting work, but my only
question is: what if the function you are trying to learn does not behave the way your similarity function assumes? For instance, what if it is not the case that the similarity increases the closer you get? An obfuscator could then defeat this by breaking the assumptions of your similarity function.

Well, of course. If you just make it semantically harder, then at some level it won't work anymore. What you can do, for instance, is apply some form of encryption or something like that; this might break it for sure. But one of the observations we made [unintelligible] is that one of the main things is where you draw the boundaries of the window that you analyze. In this case we had a look at the handler level, but there is no reason not to combine several handlers, or split handlers in half, and things like that. So there might also be ways, by choosing the window boundaries, to counter this protection again.

Thanks. [unintelligible] And again, thank you very much for the interesting talk.

Metadata

Formal metadata

Title Lets break modern binary code obfuscation
Subtitle A semantics based approach
Series title 34th Chaos Communication Congress
Author Blazytko, Tim
Contag, Moritz
License CC Attribution 4.0 International:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute and make it publicly available in unmodified or modified form, provided that you credit the author/rights holder in the manner they have specified.
DOI 10.5446/34944
Publisher Chaos Computer Club e.V.
Release year 2017
Language English

Content metadata

Subject area Computer science
Abstract Do you want to learn how modern binary code obfuscation and deobfuscation works? Did you ever encounter road-blocks where well-known deobfuscation techniques do not work? Do you want to see a novel deobfuscation method that learns the code's behavior without analyzing the code itself? Then come to our talk and we give you a step-by-step guide.
Keywords Security
