Merken

# GRAP: define and match graph patterns within binaries

#### Automatisierte Medienanalyse

## Diese automatischen Videoanalysen setzt das TIB|AV-Portal ein:

**Szenenerkennung**—

**Shot Boundary Detection**segmentiert das Video anhand von Bildmerkmalen. Ein daraus erzeugtes visuelles Inhaltsverzeichnis gibt einen schnellen Überblick über den Inhalt des Videos und bietet einen zielgenauen Zugriff.

**Texterkennung**–

**Intelligent Character Recognition**erfasst, indexiert und macht geschriebene Sprache (zum Beispiel Text auf Folien) durchsuchbar.

**Spracherkennung**–

**Speech to Text**notiert die gesprochene Sprache im Video in Form eines Transkripts, das durchsuchbar ist.

**Bilderkennung**–

**Visual Concept Detection**indexiert das Bewegtbild mit fachspezifischen und fächerübergreifenden visuellen Konzepten (zum Beispiel Landschaft, Fassadendetail, technische Zeichnung, Computeranimation oder Vorlesung).

**Verschlagwortung**–

**Named Entity Recognition**beschreibt die einzelnen Videosegmente mit semantisch verknüpften Sachbegriffen. Synonyme oder Unterbegriffe von eingegebenen Suchbegriffen können dadurch automatisch mitgesucht werden, was die Treffermenge erweitert.

Erkannte Entitäten

Sprachtranskript

00:00

i at Al Hack rapid mankind ruled on time and it the I'll arm wording thousand 17 we

00:21

so we're going to talk about and pattern matching

00:24

and other matching within and there is and the buttons and putting the define our graphs so we will see how we can lights and buttons as a graph and match them with inventories just a quick word about ourselves and so I am the only entity Mirandese that Airbus and an Jonathan through time also my way on the east but that extension sold through our

00:49

presentation we will talk about some arrest simple which is a backspace sexually that's more like a

00:54

where family i it was 1st seen in 2015 and their rights to answer so what we found in what is what we find in the space is small decryption function that we agree that encrypted configuration by suggestions is also always with 3 entries so what we see is that some of the decryption function is quite simple so it is more with in here we can see that on the left so you are simply a loop with because all of which uh there's an argument 11 then followed with a set that has as argument 25 In the backspace family we find many bottoms of this decryption some so With this scalar family what you want to do is 1st to detection so the detector whereas then to classify the violence based on the decrease from the gorithm that is used and then to be able to decrypt the configuration viable since we know which algorithm is being used 1 thing you can use to do that is to use up your eyes bytes at you realize so we're gonna expressions and bites and so if we look like the following year expression you save in order to detect absorbed with is then the sub you would say OK I need the bite 18 then they need any bytes can thinking about the name the debate 11 which is the argument here of the door even then they need the that is bait 80 then in debates then the by 25 which is the argument of the sub but this and the way of writing and matching as signature is that which is easy to read and so what we wanted to do is list let's signature that are based on the instructions that are within the binary and how they are linked to each other so they can of and signature that we want to do is to say for this impediment to say OK not find me absorb that has as argument to 11 and it is directly followed with a server that has has argument to 25 I didn't a quick overview of the abrupt projects so since we're working on graphs and we use that that language to move the graphs that is a simple text file is that we really see lots of examples later but it's quite simple twice the graphs in this language and graphical user and as is and then assigned onto that you can use the command line then it has spent many things through live wages the wasn't from graph from Python and we also developed and the that begins everything is open source and relay benign so graph has 2 main components the 1st 1 is the single that will take care binaries in the viewer that phase that is a graph representing the binary so it's and it's good respectively and space and get some the other part of what is a graph matching the library that can prosper that phase and then can do the graph matching is this part is more marketed in science at this point so a typical workflow for and on is with we to whites patterns so that's what we have here but in that that's so you we want a button for backspace the new ones when I expect they that exist so or you give it to the the some of that we give you a about and the backspace that that which is the sum of part uh as a graph and then he would give the pattern and the and left to would work matching neighboring this will give you matches and extracts sections that were interested in so backspace and I is a binary OK so is a sequence of bytes those bias can be interpreted in assembly language so the sequence of instructions so it would consider form the on the subroutine for the inca the decryption then where we can know create a graph of a series on the the flow of the execution of the instructions and in in the stand-alone tool we use a recursive function to create this graph then we all best for the disassembler in on captain the but for the IDA plug-in we use the idea engine photographs the so we can create the graph at the right we have in principle uh sorry if my French were um we have the graph of backspace OK at the right and we want to match the pattern on the left which is the decryption routine to do so we use the usual we can use the usual algorithm sold at the left we have the pattern that we have the this graph in the gold is just to search over the 1st cold so again we found the 1st goal then we're going to look to the children node there is an added so we get to look to the right we have that absorbs all going to try the all the time so we have to push so we got a search form of cold so OK we match a the same thing we have ahead so we could go to look to the push at the right the right children and old so we try the hand yeah we match that so again there is a cone yeah there's a compilable OK so the next and so on so we found the bush OK and so on so OK is pretty simple but you need to check every child of the node and it's pretty small on the graphs so since that the usually solution is pretty strong graphs undergrads we wanted to have a fast resolution so we wanted to have different algorithms so 1 thing we can say about cover for us is that there are not many graphs so the 1st property they have is that usually when you are uh when we do this and then you see that and in order to confer for graph has at most 2 children it happens when you have for instance a and the sorry a conditional jump when you have a 1 share that would be reached when the condition is fulfilled and 1 share that with be which when the condition is not fulfilled so that the 1st and priority we haven't seen before West another property is that the children are not symmetrical usually you we find that the 1 of the child that is and directed the fully version is the next address and you know the child that is a remote searching somewhere Singapore 1 so we cannot in road then so we really false that in all graphs you can add to invert the order children so what you need to see now is that once we say that the children are all the inner graphs the object's we manipulating and not really grass anymore and that will help us have a very fast matching the other thing we can say about patterns so I talked about before for was in general but but and unless this specific has we so audio it's easier to match them from the 1st known so we really falls that every pattern has a 1st node so the 1st node that is and I would no that's that's from this node to every other node of the graph can be reached on the example on the right that is all part and for backspace you have the 1st node and that which is the root node because from from it you can reach every OK so with this efficiently we will have 1st mentioned I've been light so decided we show you how we can represent the about and also under different forms so the pattern we have now around the world to what we see on the left so we have a correlation added compare the push and push and the children all the reaction Chen number 1 inch and number 2 in general ones here so there is with this kind of thing since we have a word to the nodes which is of course there is a unique way to number this graph with a depth-first search so that is the following if I think the root node so this accord I number this 1 1 then they take the 1st child of the caller ID I did here to the other and then to the native the Shiite animal 3 then they go back to the coarse every go to the genome but to remember it for and go to the genome or 1 number it's right so I can describe its and it takes way as this so that's equivalent to the bottom and that means OK I'm looking for a pattern for graph which has as a 1st know the corn then it has a 1st child if you take it to get to an ad then or I think again the fish and if you take it you get to compare then you go back to the 1st to the node numbered 1 before the corn so you go back here and say OK now you take the Chair number 2 when you get to push then dignified and the genome around and get to know the cost of this representation as a text and by the state's representation is equivalent to the bottom of condition of the work so once you have this was the question we need to ask is can be performed across all the

10:07

bottom within that this graph so we saw that the button on the left is a grant to the text representation I add some coral because it's may is going as it is and OK so now it's easier I need to find the for the initiative is to find uh would know that would be accord so I have candidates and the right here which is this 1 then I get to the you know and I get to the other part so in the end to see here if if there nodes that share number 1 that would be another so I think the chain number 1 year I get to also it doesn't match I have to go back so I have to find and although would know that could be a course so there is 1 here now I ask it is the trend in the 1 that is another yes I remember this 1 to like it there is the channel 1 that is the compare and yes it's worse than OK take me back to the change number to the node numbered 1 to back to the coarse so get their asking their child number 2 that's the push it works I say is there a general and that is if we should also words so this gives us a matching for all the button from the battle for that as well so that is the algorithm that we use so the difference with the previous algorithm is that now that children are ordered is we so that we know that we're looking for channel 1 or channel number 2 so you don't have these conditional and uh factor anymore you would only have but when known then for matchings it's much faster OK so I receptors algorithms now I we go back to the bottom so I talked about how we can match them but now I would say what opportunities do they look like so on the less you have or uh the bottom for backspace which was the decryption within and so we said that but on the other applied to specific fields of because we reuse like conditions reuse conditions of grows on our on arguments of instructions and address of this option so on so on the right here you have an an example of a button for a public space that matches exactly the 2 instructions on sub so what it says here so it's about finding it so as they were that's a adults keyword then we have the name of the patterns with the equipped with the guys on this and then it has we say it has no it's testing the 1st node is a with the the condition and the node is OK I want to match an infection that has or as it's of good and has as ultimate number 2 hidden I have another node that is the and the condition is I want to mention of good that is 7 and that as or argument number to 25 and then the party uh specifies it is so there is a need from node aid to the these it means they have a child from the baby that is that the but in a few more things about not options and Egyptians in oil and are part of the language of so I said that the but on graphs are supposed to other would know that is a node from which you can reach every other node so in order to specify this node you have to you can just say it uh Egyptian which is true that means this not the is the is the word you have the core uh feel that stays that there's a condition so we so we can have a conditional course for instance we have the get idea flag at midday the means to do with them what OK when you find a match for this note that I'm interested in this in this match so I want you to print it on wanted to keep it as sites like an excessive later the what is of the form of the funds for Chad options for edge of edges options sorry we so that shies on number of the the 1 or 2 so we have an option to specify the uninitiated such an 1 or genome so the example below is the same pattern as previously but with every option explicit and so we have still the condition we have a k this node is a word so that the 1st node of the pattern who are interested in this node and I want you to whom March the nodes with a so that the name of the node for the interested also in the notes so you have to market the and the Chen number of the edge was 1 because they are following each other that's sequential instructions the but is so back conditions conditions of can we be able to make conditions on instructions and they will be conditioned on the uh forward stranger than for select move IEEE aid is a x something for instance we have the condition and the address conditioned on the of God and the arguments of the number of arguments and text of the arguments so the the upstream fields we also have conditions on the number of incoming and the number of I agree it is a source so that means the number of others the and the number of children of a node the the the I have an example here the that said and looking for a function that is identical that so that there is a call to this node and so the code is in a and the construction encourages the and then there is no net between a and B a took so the number of the edges to because it's a it's not affirming assertions is a women's instruction and the condition we have an an and that the is that it has multiple incoming nodes that will help us find a function that are frequently called into code the OK so until now explain how we can match 1 the 2 1 infection that's what we really want to do next is to match 1 node to multiple instructions later when you want to match basic blocks sometimes you don't care how many insertion there is that is a good command to match a again this not needs much of what we think like so we can specify and minimum number of matches for another on the maximum number of matches and any works and sequential instructions with busy blacks so these are antigens that have 1 other online chat to think that words you have a busy good that we see here Nader so there is doubt to incoming nodes to a degree nodes and 93 insertions here to match this kind of things in graph this have to say again I want you to match an infection that means cone is true and you and you to match 1 or more instructions it is plus if I want to match the proportions instructions every why something like that like the and once insertion revealed good push and I want them to be repeated 3 times so by default graph we and takes the most matching instructions and when you will use this affair repetition but you have an option to change that you can say it is repeat is true so the 1st point was greedy repeatedly predictor most of them with lazy repeats you is that when the next missions was treaties which is as given example here so back to backs basis and then we had the decryption look so we're interested suspicion in that's on the 7 sections but we can say OK I want to room that looks a bit like that so we want to match 1 to 5 and officials we have 3 year but maybe there is less maybe there is more then we have the door then that the Sun than we have gain 1 to 5 instructions and then we have a condition engine that groups back to the beginning of the loop so but only do like that we have we have here on that's that representation on group on graphs 1 2 5 in infection then there's also 7 1 2 5 in this section and then the Commission engine in go up it's really where the button would look like that a user can do for instructions repeated that next 5 times and I want you to stop once the next condition is fulfilled so it would match for this node age we match moving Adam than here it's was stuff because the next mission and this would fit because we have absorbed than the rematch Gawronski we match 7 the we match again instructions so the compare moved into a step and the last 1 because that's when it says that the source instruction to begin with J and 2 after should run so that the case with the condition of from OK so back to a bike face symbols so I'm going to do do we have 100 the like this imposed I want to do that some of them and I then they want to detect the decryption algorithm that that is so that is the 1 we reverse that then I want to find variants of this division and with them and yeah we need to detect all the data from the poisons so do with for this so

19:21

as the I a symposium so a hundred

19:25

samples of x space and I have a

19:28

button so that the 1 we saw in the states and so we have a Lizzie repeats we have befriended XOR here then the Sun I want to match it against the the breaks the simple we know so as it has grabbed too much it's against simple this 1 so that the 1 we worked on the 1st 2 it is a single the the binary then it determines the patterns so there is

19:52

1 pattern that was wasn't the demographic which is a simple has 80 thousand nodes and it found when matching here so the matching is is described here so we have this option that match we

20:04

see the good on the said that working for but it so that flows for once a I wanted

20:10

written it backspace to see what happens would it take think away because um attitude needs to this as the somebody and 99 assembled takes around 20 seconds so my to receive a lot of matching that of text and then they would show you another option to have less less uh as their POS outputs and the more quiet 1 with 1 1 the 1 simple paraline so the for this area have that of texts so and I ask you to do it quietly so treated much quicker because we don't have to the the friends again I show you that the fisher of has the following areas adults phase which are the some of the versions of the banners the would so

20:59

now I launch it again but with a quite options so here I see all the atoms that all the and there is that's matched and all pattern but the 1 that so we find a lot of the sample that use the same direction with them that is the same but maybe there are more so I ru I read as graph to show me all the binary in so OK we see that's some samples match and some of them much so what they wanted to do next with find bonds to fund earns them in that way that which is here OK so here we have

21:35

the 2 groups going on there is it hard to school so what we see that it's very simple you ever before they the middle than another basic back we'll see how it is called so there is a function here that has a lot of calls to the Wiktionary algorithm is always to push is because the the other arguments then a call to the decryption function so I wanted to write about and to see and to detect this

22:02

so what they did but this pattern the so

22:08

I asked that we find something with 2 versions then the court then again to push is another course 4 times in the all points to a function that is that has the 1st big black then the look at then and then they got then return assertion and that what they say here that I'm interested in getting the new so I think it I did I said get a AG on the on the root node so I show you might troops lecture me that medical version because it's a is is kernels the

22:44

OK so we have here 2 preseason according to which is another call or point to the same function and we're interested in the here which is where the decryption algorithm happens the OK

22:57

so 92 matches against the mice impose the I have a home so I find here the 1st and within that we already know

23:14

it's exhausting 11 and 25 again the same here we have another 1 there is a sudden then there's always another constraints then another set then there is again the same 1 then when

23:26

we find that there are many more emotions see them through them or but what we

23:31

did next the what's right at some about for this these adequate environment so we have a there various more buttons so the 1st 1 is the 1 you so 1125 this 1 is the 1 we just sort of greedy and 20 and they were like 5 other ones of them and they so what we do now is we want to

23:53

match them against that experts and pose and so what we see here in that that wrote more and no samples matched and we know which algorithm was used so here and then that and that match with the loss of the resources a service also here there is another 1 and another 1 in Section so 1 more thing we wanted to know is that we have to have some and

24:22

match the and boundaries

24:26

so soon so I want to know where the match 1 simple thing in the 2 in this sort of at its worst like is we asked for uh stations 6 names and we see that it's you big so it's the 1st some symbols and seemed to be back to you peaks so what they do is I wanted to have a creates signature for these you peaks phase so I will look for our basic that loops that basic graphs that to themselves to have a patent for that's the real use now the the and this better so I find

25:13

to be the good to and we looked into whether

25:17

the ring but and wanted through them as signatures so I wrote a database nature which is which contains these troops invasive 1 speaks to so where they did next was to mention OK so there are lessons to be more simple than that but the debates and I always match the troops so I know how to do the guidance of backspacing and not

25:45

detects the phase spectrum peaks so I wanted to take them was recommended to the 2 signatures and I will mention the what is this a so we still have a few and measurement errors but most of them we see that there is a a violent of space and we know which algorithm is used all the opportunities OK but to

26:20

states so what we saw that is that we found 7 variants of debates so there was a the role the first one with the this or said 25 there was another 1 that we saw and then type of the atoms the we also found out that there are so many result but and want a smaller than the creek signature for this phase we match so what we did is the on off 100 the but this imposed he took around 20 seconds to the decimal them then to do the matching of 9 but only to the more than 2 seconds and then we see that there are like Simmons and Pfizer the properties is replaced the Huns uh the half of the samples that have the 1st button that we looked at and the other patterns are used sometimes and and the end there was the given uh in areas that we did not identify corrected the so what can

27:18

I come to simplify the creation of scripts the we create a Python bindings of the C + + library with weight and now with Python can give us some but we can all patterns and test graphs and can grab a button graphs and the graphs that can match but tense and past older results so here we have some examples of Python scripts so at the beginning we have as the imports

27:51

of a library then we have the will of the pattern that find the we re extracts those on the graph from the binary switch to the some find them with all the tests graph and after we much there the graph the about them In but there's graph the so here is an example of from 2 pattern we want to match uh 3 pushes so we use the match graph and now we can get old uh matching but patterns then we can select the pushes buttons and now we can go with this vicious patterns we can get all matching the instructions then can say OK if I want to get the the IDP so it give all instruction with the get IDP and then with like the 1st instructions so now we

28:53

have the instructions so we can request for some information like the addresses the all of the old constriction strings deal called the arguments and so on so OK we can match uh the decryption group in which so that we have a lot of Bush's and cold to the decryption strings so we want to match the 2 pushes and the cold today the decryption routine care so we have we want to check match the 1st which which of them and then the encrypted string and then the cold to the entry point of the function to be could be on both strings soul we have to know and decryption entry points we can match the during the address of the all the decryption loop so we have the address of the match there 1 so to get the entry point we know that the function is a very slow very early on tiny so we can say in we know also that the decryption entry point is before the decryption loop there so we can say we want to search an old between the decryption address and between the decryption address minus 30 and we want to match an old with at least 5 or more incoming so and that their own and

30:45

so this correct this the we have the imports yeah we get done the binaries would create the the graphs in then we have bowls but

31:04

tends to match the intra pants then too much the pushes before the call of the decryption function and we implement the old clients of the decryption algorithm so here you have subzone set said out and so on so the in my lord so we go through figure could description with a sample the so now we can decrypt bolstering so we have the sample we match the decree

31:54

also then we found did the address of this routine here we match the entry points with more than 1 of 5 or more than 5 incoming between the address of still

32:12

decryption minus 30 then the

32:15

address of the decryption the so we found the decryption of function the

32:21

standard parents so now we get to look

32:24

for the moment the pushes again

32:28

With the entry point of the function and then we matched 35 calls to this decryption function and so after this we take the strain bankrupt and string them lead a critic with the the algorithm and now we can see we have some registers and some fixed 5

32:51

things on the so now we can

32:55

but if and all samples

33:01

so here we have the 1st samples was strains

33:08

registries then we can find some big thing on all the

33:13

samples and so on so there we have all those strings of the configuration and so on of much space so here we have a tool but it

33:29

can be nice to have a plug to in I don't too much both the and those spot and so we have a nite uploading we convert the ITER graphs through no fall grapple then can much attention recently graph which is the Python bindings and then we can with GUI rolls balls matches and put some current and then we apply some filters to the so

33:58

match a single pattern for example of a fall s fall said keep we have 2 small groups for the initialization an if we create just 1 pattern for those 2 groups we have a lot of false positives so we get some fit the techniques that it's quite simple and abuse so on for example the 6 key so far as the form we have 2 groups fall were the 1st of these the 1st pattern in this and would be the 2nd pattern so if we don't have those loops in the function and it's not all functions so you can see at the right we have to to potential OK it's accepted but if uh in the EU we have just 1 part and it's rejected we have also the rates the and techniques so also quite simple but we have 2 hopes but if we have more than the more than 2 groups in the in the function so if we match uh for example 3 times the pattern 1 of the the function is rejected to because we just had the 1st but the 1st loop 1 time in the 2nd 1 9 2 then we have the threshold so sometimes with the the pattern can have some errors soul the at the right that we have the full patterns for 1 function so we match all pattern so OK it's good but in the middle of of the bottom number of thought answering that are not here so the function is also for example the function of decryption but we missed some some patents so we create treshold so here we have a threshold of 0 . 75 that means that at least 1 and 3 other form of uh patterns so in the middle it's accepted to that at the left we just have to add a force so you really we reject this function the last 1 is the overlapping so in for example again the FC false key we have the 2 groups who creates 1 pattern for the 1st and another for the Sun and if we match the same loop ways those 2 but and that means we met the same group and we don't this but in the example at the last we have a to match of the 2 matches of the 1st but then in 1 of the Sun and there is an overlapping between the 1st and the 2nd so if remove the pattern 1 of 4 the overlapping we have the honor of body the the matches the at the rights so we can say OK it's all function that this technique is not yet implemented in either graph so the creation of the rule fall the plugin is quite simple to solve for the SE fostered would create the uh the 1st loop so no 1 we give the the past to the adults time to the bottom down to give a name a decryption description of this button then we say OK a minimum but down is 1 so that mean I want to match 1 time uh this pattern almost then we fix the max pattern so we say I want at at next 1 time this pattern so we must match just 1 time the but the 1st so it the same things for the 2nd and then we traits of the function of that down so we give the 2 groups because we we must match those 2 of 2 loops in the function the city of st formed then we put the threshold that here is 1 so it means we must uh get the 2 uh um the to attempt to regulate the function then name the disk in the so demo OK

38:27

in all the in the the

38:36

the sessions for the family the it little the in FIL the fingerprint is to find the what dance so OK ranch here we can see we have the match some patterns so the we can think comment and then we have the decryption routine OK but sometimes the we don't know really where all the matched so we put some current so here means then it and then we have the current on the 2 instruction for the decryption look and we have some current on the in the entry point so that in here the we have good please the pushes for them the frames and if we don't know in bringing yeah them lunch again this can here we have the name of the function also with the number of instructions so is it's OK

40:06

and s if all the match or not efficient because the groups all very small and very generic so we have we still have many um false positives but we have a few false negative which is cool so do the cooling is appropriate content and we can try and maybe to to create all those signatures for all over cryptographic algorithm by chaos and so on and maybe you can put the filtering techniques that in some of key wrap because of its only in either grep so the perspective some mutation

40:56

of all tools all the arguments and structure of cold and so on all in of strings so sometimes we have to do some dirty hack to get uh 0 for example and you have the conditional jump so we want to match Ernesta chairman of strings that begin use J OK and we want the note to have 2 children because in the graph data insertion with children all polls and conditional jumps so will not also begin with C so that then then we have all the problem because in the sum the stand-alone tools and we use the captain and Zhang in in either we use the I'd engines so sometimes we for example the variables we have in Aida 25 h fall the in the the the the smaller value 25 in caps summary of all x 25 so again a dirty hacks to catch the both cases so often whom we catch 25 all the all x 25 so the solution is semantics the idea behind that is to say OK and I want to check if my the argument 1 is registered and all I have to check if the old cold is a conditional jumps and so on the same things for the for log 1 if we want to catch the intended the interviews with the value 25 not gravity to game there is another imitation of the how right bottoms that is and that we can we explain how we can do not repetition that there is 1 new mutation that that is the following year we can see the 1 with the repetition we can say I want the least number of matched instructions or I want the maximum number of matches which there is no in between so the problem is is this 1 we have gotten that say is that I want to match 1 to 4 instruction then name want to match absorb more than 1 too much of course and then I say OK this alpha instructions so it's 1 2 5 anything then the law than the core and so there is a lazy we been here we can develop a true or false if we put true between we mean that you want to dig the least number of instructions to you if you put false move to take the maximum number insertions so we'll see how to match the spot on on the on the binary year uh graph that would have a push in the push then or then another goes and then of course so with the foreign with and the current that was that if we put lazy be to shrewd the matching for the any know them with step once it's which is absorbed so for any it with they push push then interest up then or node we meds absorb than the core node which way to much of it or it doesn't work so there is no match so it doesn't work was later repeats at true if it would later repeated false so any any node would try to match the maximum or instruction so the problem is it is greedy so the any know that we match push push all because it matches for insertions and then we get to the door node that finds a corner so doesn't match the naughty there so there is no way for all to now to match this a binary thing with this pattern is counterintuitive because as with regular expressions would like to say OK I don't care if any takes 1 2 3 of all for the instructions I would like it is the more so of course the solution may be also for granted to would be to try every year number of repetitions of 1 2 3 4 and of course attributes Larosa lead to watch watch out for and but number forms OK we have a lot of more idea that you would like to implement in the buttons 1 thing that is limited also with the current version is that for basic goods when you have a instructions on this the condition and basic blocks for now we apply to every node and that you would like to say is I want of is graph that has at least ones or insertion off when you get graph that has had his to the insertion 1 compare instruction that again a defined as a maybe we'll also work on that data so we said that the children are numbered so you have to specify when you do it is if you you looking but can number 1 or trend but sometimes we we want to say OK I don't care so it think and the for for instance for condition and and you could say this could be the Chad for when the conditions which all 4 when the condition is fulfilled I don't care so we would like to add an option that would be generators of a question mark and so they go with them to try the both options then as that and expand you would like to bit of atoms in that paragraph which would be able to for to say OK are 5 patterns and won the match if we find the 3 of them that's enough or we would like to say OK after battles B 1 and B to find new binary that has 1 match over the 1 that is directly followed by a match of the 2 that is there is 1 institution of the 1 that has as a child in this section of the 2 that we would like to do some puzzling may that's the last point is that for now we constructed the grass manually with the writing the text buys it for its it's OK when you have a the when you use streets its course quite fast but it would still be 0 to be able to select nodes in either up and then Kij exports and rule generator nicer that with the uh the all the nodes analogy that you want it to be much faster integrate patterns OK so this will end or uh all talk the conclusion is that the graph is at the same time a solution to the test by the meanings we have name that begins to fall guru was to create a tool that can make your own graph buttons that there are is its rights and easy to understand we believe it is used for for the diction on intimate and is we show that the space and his words the is a free open source so you can find information and this address as in perspectives for the future amid before that the 2 that we would like to add some but features we talked about processing the semantics information from instructions that we talked about condition and basic Rex we would like to be able to create buttons and directly with an that and then you would like to have many more examples of work untutored those we will want to play with a yes and on bikers on command vectors so as we end

47:59

also works thank you for listening and we try to answer questions if thank you for the talk verdict of uh following if you had thought about using it since can have uh capstan and I don't know I think I haven't quite it but uh could you just steps timing error and that by that anniversary 2 steps in Nader yeah but you we thought about it but we have not decided gets hot enough to put into the system the

00:00

Binärdaten

Graph

Mustervergleich

Vorlesung/Konferenz

Plug in

Wort <Informatik>

Graphentheorie

Hacker

Computeranimation

00:46

Gewichtete Summe

Polynom

Parser

Raum-Zeit

Computeranimation

Matching <Graphentheorie>

Algorithmus

Softwaretest

Mustersprache

Teilgraph

Kontrollstruktur

Phasenumwandlung

Korrelationsfunktion

Koroutine

Kategorie <Mathematik>

Malware

Disassembler

Menge

Rechter Winkel

Einheit <Mathematik>

Wurzel <Mathematik>

Konditionszahl

Server

Graphentheorie

Instantiierung

Folge <Mathematik>

Kontrollstruktur

Tiefensuche

Äquivalenzklasse

Nummerung

Viewer

Überlagerung <Mathematik>

Chiffrierung

Open Source

Bildschirmmaske

Knotenmenge

Programmbibliothek

Konfigurationsraum

Drucksondierung

Algorithmus

Open Source

Verzweigendes Programm

Elektronische Publikation

Datensatz

Schnelltaste

Wort <Informatik>

Punkt

Gemeinsamer Speicher

Formale Sprache

Adressraum

Selbstrepräsentation

Versionsverwaltung

Familie <Mathematik>

Binärcode

Marketinginformationssystem

Analysis

Eins

Arithmetischer Ausdruck

Minimum

Wurzel <Mathematik>

Bildauflösung

Umwandlungsenthalpie

Graphentheorie

Lineares Funktional

Parametersystem

Assembler

Plug in

Reihe

Nummerung

Elektronische Unterschrift

Knotenmenge

Systemaufruf

Variable

Mustersprache

Chiffrierung

Isomorphismus

Projektive Ebene

Garbentheorie

Disassembler

Aggregatzustand

Kombinatorische Gruppentheorie

Assembler

Graph

Koroutine

Zusammenhängender Graph

Binärcode

Elektronische Publikation

Graph

Matching <Graphentheorie>

Konfigurationsraum

Bildauflösung

Mailing-Liste

Objekt <Kategorie>

Formale Sprache

Mereologie

GRASS <Programm>

Skalarprodukt

Vollständigkeit

10:05

Bit

Einfügungsdämpfung

Punkt

Gemeinsamer Speicher

Extrempunkt

Selbstrepräsentation

Formale Sprache

Adressraum

Gruppenkeim

Computeranimation

Matching <Graphentheorie>

Algorithmus

Fahne <Mathematik>

Trennschärfe <Statistik>

Mustersprache

Minimum

Konditionszahl

Default

Folge <Mathematik>

Konstruktor <Informatik>

Parametersystem

Lineares Funktional

Güte der Anpassung

Stichprobe

Systemaufruf

Nummerung

p-Block

Systemaufruf

Teilbarkeit

Arithmetisches Mittel

Mustersprache

Datenfeld

Verkettung <Informatik>

Chiffrierung

Genetische Programmierung

Twitter <Softwareplattform>

Wurzel <Mathematik>

Einheit <Mathematik>

Rechter Winkel

Konditionszahl

Grundsätze ordnungsmäßiger Datenverarbeitung

Garbentheorie

p-Block

Graphentheorie

Instantiierung

Folge <Mathematik>

Web Site

Subtraktion

Gerichteter Graph

Gewicht <Mathematik>

Befehlscode

Mathematisierung

Nummerung

Code

Chiffrierung

Graph

Loop

Knotenmenge

Bildschirmmaske

Adressraum

Demo <Programm>

Drucksondierung

Algorithmus

Graph

Matching <Graphentheorie>

Open Source

Default

Netzwerktopologie

Zeichenkette

Minimalgrad

Körper <Physik>

Loop

Basisvektor

Mereologie

Wort <Informatik>

Speicherabzug

Skalarprodukt

19:18

Gerichteter Graph

Matching <Graphentheorie>

Stichprobe

Disjunktion <Logik>

Knotenmenge

Visuelles System

Raum-Zeit

Computeranimation

Netzwerktopologie

Graph

Chiffrierung

Knotenmenge

Softwaretest

Stichprobenumfang

Mustersprache

Konditionszahl

Kontrollstruktur

Vorlesung/Konferenz

Eindeutigkeit

20:03

Zwei

Versionsverwaltung

Stichprobe

Euler-Winkel

Knotenmenge

Computeranimation

Netzwerktopologie

Graph

Softwaretest

F-Test

Flächeninhalt

Eindeutigkeit

Phasenumwandlung

Funktion <Mathematik>

20:58

Lineares Funktional

Parametersystem

Dualitätstheorie

Graph

Stichprobe

Gruppenkeim

Systemaufruf

Ordinalzahl

Binärcode

Computeranimation

Quantisierung <Physik>

Richtung

Chiffrierung

Algorithmus

Mustersprache

Stichprobenumfang

Elektronischer Fingerabdruck

22:00

Lineares Funktional

Punkt

Gerichteter Graph

Versionsverwaltung

Stichprobe

Mathematisierung

Visuelles System

Computeranimation

Kernel <Informatik>

Chiffrierung

Knotenmenge

Loop

Mustersprache

Wurzel <Mathematik>

22:42

Inklusion <Mathematik>

Nebenbedingung

Punkt

Kontrollstruktur

Matching <Graphentheorie>

Stichprobe

Systemaufruf

Knotenmenge

Computeranimation

Netzwerktopologie

Graph

Chiffrierung

Algorithmus

Softwaretest

Loop

Eindeutigkeit

23:23

Einfügungsdämpfung

Gerichteter Graph

Matching <Graphentheorie>

Kontrollstruktur

Stichprobe

Visuelles System

Knotenmenge

Systemaufruf

Computeranimation

Eins

Chiffrierung

Graph

Dienst <Informatik>

Softwaretest

Algorithmus

Loop

Stichprobenumfang

Garbentheorie

Greedy-Algorithmus

Programmierumgebung

24:21

Graph

Loop

Randwert

Last

Elektronische Publikation

Genetische Programmierung

Matching <Graphentheorie>

Arbeitsplatzcomputer

Stichprobe

Content <Internet>

Graphentheorie

Dateiformat

Elektronische Unterschrift

Codierung

Computeranimation

25:12

Gerichteter Graph

Messfehler

Natürliche Zahl

Datenhaltung

Stichprobe

Benutzeroberfläche

Elektronische Unterschrift

Punktspektrum

Knotenmenge

Raum-Zeit

Computeranimation

Netzwerktopologie

Chiffrierung

Graph

Softwaretest

Algorithmus

Loop

Einheit <Mathematik>

Konditionszahl

Eindeutigkeit

Phasenumwandlung

26:20

Resultante

Gewicht <Mathematik>

Kontrollstruktur

Parser

Ordinalzahl

Assembler

Computeranimation

Graph

Chiffrierung

Last

Mustersprache

Datentyp

Stichprobenumfang

Programmbibliothek

Skript <Programm>

Phasenumwandlung

Graphentheorie

Softwaretest

Schnelltaste

Matching <Graphentheorie>

Kategorie <Mathematik>

Zwei

Stichprobe

Elektronische Unterschrift

Mustersprache

Schnelltaste

Flächeninhalt

Loop

Disassembler

Graphentheorie

Aggregatzustand

27:50

Punkt

Gerichteter Graph

Befehlscode

Adressraum

Gruppenkeim

Parser

Information

Computeranimation

Chiffrierung

Loop

Graph

Last

Adressraum

Mustersprache

Koroutine

Konditionszahl

Programmbibliothek

Softwaretest

Graphentheorie

Koroutine

Algorithmus

Lineares Funktional

Parametersystem

Elektronische Publikation

Matching <Graphentheorie>

Graph

Konfigurationsraum

Knotenmenge

Systemaufruf

Zeichenkette

Mustersprache

Chiffrierung

Schnelltaste

Information

Disassembler

Zeichenkette

30:43

Lineares Funktional

Fehlermeldung

Gerichteter Graph

Systemaufruf

Information

Visuelles System

Binärcode

Hochdruck

Computeranimation

Mustersprache

Chiffrierung

Deskriptive Statistik

Client

Chiffrierung

Algorithmus

Funktion <Mathematik>

Menge

Adressraum

Stichprobenumfang

Disassembler

Graphentheorie

Figurierte Zahl

31:52

Algorithmus

Lineares Funktional

Punkt

Gerichteter Graph

Prozess <Informatik>

Momentenproblem

Adressraum

Stichprobe

Systemaufruf

Systemaufruf

Computeranimation

Zeichenkette

Physikalisches System

Mustersprache

Chiffrierung

Chiffrierung

Algorithmus

Funktion <Mathematik>

Adressraum

Vererbungshierarchie

32:50

Binärcode

Internetworking

Schnelltaste

Firefox <Programm>

Filter <Stochastik>

Prozess <Informatik>

Graph

Plug in

Stichprobe

Browser

Umsetzung <Informatik>

Raum-Zeit

Systemaufruf

Computeranimation

Zeichenkette

Graph

Mustersprache

Physikalisches System

Funktion <Mathematik>

Parametersystem

Stichprobenumfang

Graphentheorie

Gleitendes Mittel

Konfigurationsraum

Konfigurationsdatenbank

Zeichenkette

33:56

Korrelationsfunktion

Ortsoperator

Extrempunkt

Gruppenkeim

Computeranimation

Loop

Deskriptive Statistik

Bildschirmmaske

Mini-Disc

Mustersprache

Minimum

Schwellwertverfahren

Inklusion <Mathematik>

Lineares Funktional

Schwellwertverfahren

Graph

Matching <Graphentheorie>

Kryptologie

Einfache Genauigkeit

Nummerung

Schlussregel

Plug in

Bitrate

Schlussregel

Mustersprache

Chiffrierung

Funktion <Mathematik>

Forcing

Loop

Rechter Winkel

Mereologie

Bitrate

Zeitzone

Schlüsselverwaltung

Fehlermeldung

38:34

Korrelationsfunktion

Lineares Funktional

Punkt

Ortsoperator

Matching <Graphentheorie>

Gruppenkeim

Familie <Mathematik>

Negative Zahl

Nummerung

Elektronische Unterschrift

Computeranimation

Mustersprache

Algorithmus

Chiffrierung

Verschlingung

Perspektive

Kryptologie

Generizität

Mustersprache

Unordnung

Elektronischer Fingerabdruck

MIDI <Musikelektronik>

Inhalt <Mathematik>

40:53

Bit

Einfügungsdämpfung

Gewichtete Summe

Punkt

Extrempunkt

Adressraum

Versionsverwaltung

Parser

Binärcode

Gesetz <Physik>

Analysis

Raum-Zeit

Computeranimation

Formale Semantik

Matching <Graphentheorie>

Mustersprache

Minimum

Konditionszahl

Hacker

Softwaretest

Schreiben <Datenverarbeitung>

Parametersystem

Perspektive

Plug in

Kryptologie

Güte der Anpassung

Nummerung

p-Block

Kugelkappe

Arithmetisches Mittel

Mustersprache

Generator <Informatik>

Twitter <Softwareplattform>

Konditionszahl

Garbentheorie

Information

p-Block

Fehlermeldung

Instantiierung

Zeichenkette

Gravitation

Formale Semantik

Gerichteter Graph

Befehlscode

Regulärer Ausdruck

Ordinalzahl

Open Source

Knotenmenge

Bildschirmmaske

Variable

Spieltheorie

Perspektive

Jensen-Maß

Datenstruktur

Disjunktion <Logik>

Meta-Tag

Attributierte Grammatik

Algorithmus

Matching <Graphentheorie>

Graph

Open Source

Multiplikationssatz

Verzweigendes Programm

Schlussregel

Physikalisches System

Vektorraum

Zeichenkette

Regulärer Ausdruck

Schnelltaste

Körper <Physik>

Zustand

Speicherabzug

Wort <Informatik>

GRASS <Programm>

Skalarprodukt

Bitrate

### Metadaten

#### Formale Metadaten

Titel | GRAP: define and match graph patterns within binaries |

Serientitel | REcon 2017 Brussels Hacking Conference |

Teil | 10 |

Anzahl der Teile | 20 |

Autor |
Thierry, Aurelien Thieuleux, Jonathan |

Lizenz |
CC-Namensnennung 4.0 International: Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen. |

DOI | 10.5446/32387 |

Herausgeber | REcon |

Erscheinungsjahr | 2017 |

Sprache | Englisch |

Produktionsort | Brüssel |

#### Inhaltliche Metadaten

Fachgebiet | Informatik |

Abstract | Disassembled binary code can be turned into a graph of instructions linked by possible execution flow (Control Flow Graph). Based on academic research on malware detection through graph matching and facing large numbers of similar files to analyze, we aim to provide accurate results to an analyst working on malware families. Our approach is a YARA-like detection tool: GRAP matches user-defined graph patterns against the CFG of a given code. GRAP is a standalone tool that takes patterns and binary files, uses a Capstone-based disassembler to obtain the CFGs from the binaries, then matches the patterns against them. Patterns are user-defined graphs with instruction conditions (“opcode is xor and arg1 is eax”) and repetition conditions (3 identical instructions, basic blocks…). The algorithm solves a simplified version of the subgraph isomorphism problem, allowing the matching to be very quick. It can be used to find generic patterns such as loops and to write signatures to detect malware variants. We also developed a plugin giving IDA the capabilities to detect and browse matches directly within the GUI. Python bindings are available to create scripts based on GRAP and extract valuable information (addresses, instructions) from matched parts. In this talk, we will introduce the algorithms used and then focus on practical use cases: detect common patterns (from the command line or within IDA), create a malware pattern, and extract information from matched instructions. The tool and the plugin will be released under an open source license. |