Merken
GRAP: define and match graph patterns within binaries
Automatisierte Medienanalyse
Diese automatischen Videoanalysen setzt das TIBAVPortal ein:
Szenenerkennung — Shot Boundary Detection segmentiert das Video anhand von Bildmerkmalen. Ein daraus erzeugtes visuelles Inhaltsverzeichnis gibt einen schnellen Überblick über den Inhalt des Videos und bietet einen zielgenauen Zugriff.
Texterkennung – Intelligent Character Recognition erfasst, indexiert und macht geschriebene Sprache (zum Beispiel Text auf Folien) durchsuchbar.
Spracherkennung – Speech to Text notiert die gesprochene Sprache im Video in Form eines Transkripts, das durchsuchbar ist.
Bilderkennung – Visual Concept Detection indexiert das Bewegtbild mit fachspezifischen und fächerübergreifenden visuellen Konzepten (zum Beispiel Landschaft, Fassadendetail, technische Zeichnung, Computeranimation oder Vorlesung).
Verschlagwortung – Named Entity Recognition beschreibt die einzelnen Videosegmente mit semantisch verknüpften Sachbegriffen. Synonyme oder Unterbegriffe von eingegebenen Suchbegriffen können dadurch automatisch mitgesucht werden, was die Treffermenge erweitert.
Erkannte Entitäten
Sprachtranskript
00:00
i at Al Hack rapid mankind ruled on time and it the I'll arm wording thousand 17 we
00:21
so we're going to talk about and pattern matching
00:24
and other matching within and there is and the buttons and putting the define our graphs so we will see how we can lights and buttons as a graph and match them with inventories just a quick word about ourselves and so I am the only entity Mirandese that Airbus and an Jonathan through time also my way on the east but that extension sold through our
00:49
presentation we will talk about some arrest simple which is a backspace sexually that's more like a
00:54
where family i it was 1st seen in 2015 and their rights to answer so what we found in what is what we find in the space is small decryption function that we agree that encrypted configuration by suggestions is also always with 3 entries so what we see is that some of the decryption function is quite simple so it is more with in here we can see that on the left so you are simply a loop with because all of which uh there's an argument 11 then followed with a set that has as argument 25 In the backspace family we find many bottoms of this decryption some so With this scalar family what you want to do is 1st to detection so the detector whereas then to classify the violence based on the decrease from the gorithm that is used and then to be able to decrypt the configuration viable since we know which algorithm is being used 1 thing you can use to do that is to use up your eyes bytes at you realize so we're gonna expressions and bites and so if we look like the following year expression you save in order to detect absorbed with is then the sub you would say OK I need the bite 18 then they need any bytes can thinking about the name the debate 11 which is the argument here of the door even then they need the that is bait 80 then in debates then the by 25 which is the argument of the sub but this and the way of writing and matching as signature is that which is easy to read and so what we wanted to do is list let's signature that are based on the instructions that are within the binary and how they are linked to each other so they can of and signature that we want to do is to say for this impediment to say OK not find me absorb that has as argument to 11 and it is directly followed with a server that has has argument to 25 I didn't a quick overview of the abrupt projects so since we're working on graphs and we use that that language to move the graphs that is a simple text file is that we really see lots of examples later but it's quite simple twice the graphs in this language and graphical user and as is and then assigned onto that you can use the command line then it has spent many things through live wages the wasn't from graph from Python and we also developed and the that begins everything is open source and relay benign so graph has 2 main components the 1st 1 is the single that will take care binaries in the viewer that phase that is a graph representing the binary so it's and it's good respectively and space and get some the other part of what is a graph matching the library that can prosper that phase and then can do the graph matching is this part is more marketed in science at this point so a typical workflow for and on is with we to whites patterns so that's what we have here but in that that's so you we want a button for backspace the new ones when I expect they that exist so or you give it to the the some of that we give you a about and the backspace that that which is the sum of part uh as a graph and then he would give the pattern and the and left to would work matching neighboring this will give you matches and extracts sections that were interested in so backspace and I is a binary OK so is a sequence of bytes those bias can be interpreted in assembly language so the sequence of instructions so it would consider form the on the subroutine for the inca the decryption then where we can know create a graph of a series on the the flow of the execution of the instructions and in in the standalone tool we use a recursive function to create this graph then we all best for the disassembler in on captain the but for the IDA plugin we use the idea engine photographs the so we can create the graph at the right we have in principle uh sorry if my French were um we have the graph of backspace OK at the right and we want to match the pattern on the left which is the decryption routine to do so we use the usual we can use the usual algorithm sold at the left we have the pattern that we have the this graph in the gold is just to search over the 1st cold so again we found the 1st goal then we're going to look to the children node there is an added so we get to look to the right we have that absorbs all going to try the all the time so we have to push so we got a search form of cold so OK we match a the same thing we have ahead so we could go to look to the push at the right the right children and old so we try the hand yeah we match that so again there is a cone yeah there's a compilable OK so the next and so on so we found the bush OK and so on so OK is pretty simple but you need to check every child of the node and it's pretty small on the graphs so since that the usually solution is pretty strong graphs undergrads we wanted to have a fast resolution so we wanted to have different algorithms so 1 thing we can say about cover for us is that there are not many graphs so the 1st property they have is that usually when you are uh when we do this and then you see that and in order to confer for graph has at most 2 children it happens when you have for instance a and the sorry a conditional jump when you have a 1 share that would be reached when the condition is fulfilled and 1 share that with be which when the condition is not fulfilled so that the 1st and priority we haven't seen before West another property is that the children are not symmetrical usually you we find that the 1 of the child that is and directed the fully version is the next address and you know the child that is a remote searching somewhere Singapore 1 so we cannot in road then so we really false that in all graphs you can add to invert the order children so what you need to see now is that once we say that the children are all the inner graphs the object's we manipulating and not really grass anymore and that will help us have a very fast matching the other thing we can say about patterns so I talked about before for was in general but but and unless this specific has we so audio it's easier to match them from the 1st known so we really falls that every pattern has a 1st node so the 1st node that is and I would no that's that's from this node to every other node of the graph can be reached on the example on the right that is all part and for backspace you have the 1st node and that which is the root node because from from it you can reach every OK so with this efficiently we will have 1st mentioned I've been light so decided we show you how we can represent the about and also under different forms so the pattern we have now around the world to what we see on the left so we have a correlation added compare the push and push and the children all the reaction Chen number 1 inch and number 2 in general ones here so there is with this kind of thing since we have a word to the nodes which is of course there is a unique way to number this graph with a depthfirst search so that is the following if I think the root node so this accord I number this 1 1 then they take the 1st child of the caller ID I did here to the other and then to the native the Shiite animal 3 then they go back to the coarse every go to the genome but to remember it for and go to the genome or 1 number it's right so I can describe its and it takes way as this so that's equivalent to the bottom and that means OK I'm looking for a pattern for graph which has as a 1st know the corn then it has a 1st child if you take it to get to an ad then or I think again the fish and if you take it you get to compare then you go back to the 1st to the node numbered 1 before the corn so you go back here and say OK now you take the Chair number 2 when you get to push then dignified and the genome around and get to know the cost of this representation as a text and by the state's representation is equivalent to the bottom of condition of the work so once you have this was the question we need to ask is can be performed across all the
10:07
bottom within that this graph so we saw that the button on the left is a grant to the text representation I add some coral because it's may is going as it is and OK so now it's easier I need to find the for the initiative is to find uh would know that would be accord so I have candidates and the right here which is this 1 then I get to the you know and I get to the other part so in the end to see here if if there nodes that share number 1 that would be another so I think the chain number 1 year I get to also it doesn't match I have to go back so I have to find and although would know that could be a course so there is 1 here now I ask it is the trend in the 1 that is another yes I remember this 1 to like it there is the channel 1 that is the compare and yes it's worse than OK take me back to the change number to the node numbered 1 to back to the coarse so get their asking their child number 2 that's the push it works I say is there a general and that is if we should also words so this gives us a matching for all the button from the battle for that as well so that is the algorithm that we use so the difference with the previous algorithm is that now that children are ordered is we so that we know that we're looking for channel 1 or channel number 2 so you don't have these conditional and uh factor anymore you would only have but when known then for matchings it's much faster OK so I receptors algorithms now I we go back to the bottom so I talked about how we can match them but now I would say what opportunities do they look like so on the less you have or uh the bottom for backspace which was the decryption within and so we said that but on the other applied to specific fields of because we reuse like conditions reuse conditions of grows on our on arguments of instructions and address of this option so on so on the right here you have an an example of a button for a public space that matches exactly the 2 instructions on sub so what it says here so it's about finding it so as they were that's a adults keyword then we have the name of the patterns with the equipped with the guys on this and then it has we say it has no it's testing the 1st node is a with the the condition and the node is OK I want to match an infection that has or as it's of good and has as ultimate number 2 hidden I have another node that is the and the condition is I want to mention of good that is 7 and that as or argument number to 25 and then the party uh specifies it is so there is a need from node aid to the these it means they have a child from the baby that is that the but in a few more things about not options and Egyptians in oil and are part of the language of so I said that the but on graphs are supposed to other would know that is a node from which you can reach every other node so in order to specify this node you have to you can just say it uh Egyptian which is true that means this not the is the is the word you have the core uh feel that stays that there's a condition so we so we can have a conditional course for instance we have the get idea flag at midday the means to do with them what OK when you find a match for this note that I'm interested in this in this match so I want you to print it on wanted to keep it as sites like an excessive later the what is of the form of the funds for Chad options for edge of edges options sorry we so that shies on number of the the 1 or 2 so we have an option to specify the uninitiated such an 1 or genome so the example below is the same pattern as previously but with every option explicit and so we have still the condition we have a k this node is a word so that the 1st node of the pattern who are interested in this node and I want you to whom March the nodes with a so that the name of the node for the interested also in the notes so you have to market the and the Chen number of the edge was 1 because they are following each other that's sequential instructions the but is so back conditions conditions of can we be able to make conditions on instructions and they will be conditioned on the uh forward stranger than for select move IEEE aid is a x something for instance we have the condition and the address conditioned on the of God and the arguments of the number of arguments and text of the arguments so the the upstream fields we also have conditions on the number of incoming and the number of I agree it is a source so that means the number of others the and the number of children of a node the the the I have an example here the that said and looking for a function that is identical that so that there is a call to this node and so the code is in a and the construction encourages the and then there is no net between a and B a took so the number of the edges to because it's a it's not affirming assertions is a women's instruction and the condition we have an an and that the is that it has multiple incoming nodes that will help us find a function that are frequently called into code the OK so until now explain how we can match 1 the 2 1 infection that's what we really want to do next is to match 1 node to multiple instructions later when you want to match basic blocks sometimes you don't care how many insertion there is that is a good command to match a again this not needs much of what we think like so we can specify and minimum number of matches for another on the maximum number of matches and any works and sequential instructions with busy blacks so these are antigens that have 1 other online chat to think that words you have a busy good that we see here Nader so there is doubt to incoming nodes to a degree nodes and 93 insertions here to match this kind of things in graph this have to say again I want you to match an infection that means cone is true and you and you to match 1 or more instructions it is plus if I want to match the proportions instructions every why something like that like the and once insertion revealed good push and I want them to be repeated 3 times so by default graph we and takes the most matching instructions and when you will use this affair repetition but you have an option to change that you can say it is repeat is true so the 1st point was greedy repeatedly predictor most of them with lazy repeats you is that when the next missions was treaties which is as given example here so back to backs basis and then we had the decryption look so we're interested suspicion in that's on the 7 sections but we can say OK I want to room that looks a bit like that so we want to match 1 to 5 and officials we have 3 year but maybe there is less maybe there is more then we have the door then that the Sun than we have gain 1 to 5 instructions and then we have a condition engine that groups back to the beginning of the loop so but only do like that we have we have here on that's that representation on group on graphs 1 2 5 in infection then there's also 7 1 2 5 in this section and then the Commission engine in go up it's really where the button would look like that a user can do for instructions repeated that next 5 times and I want you to stop once the next condition is fulfilled so it would match for this node age we match moving Adam than here it's was stuff because the next mission and this would fit because we have absorbed than the rematch Gawronski we match 7 the we match again instructions so the compare moved into a step and the last 1 because that's when it says that the source instruction to begin with J and 2 after should run so that the case with the condition of from OK so back to a bike face symbols so I'm going to do do we have 100 the like this imposed I want to do that some of them and I then they want to detect the decryption algorithm that that is so that is the 1 we reverse that then I want to find variants of this division and with them and yeah we need to detect all the data from the poisons so do with for this so
19:21
as the I a symposium so a hundred
19:25
samples of x space and I have a
19:28
button so that the 1 we saw in the states and so we have a Lizzie repeats we have befriended XOR here then the Sun I want to match it against the the breaks the simple we know so as it has grabbed too much it's against simple this 1 so that the 1 we worked on the 1st 2 it is a single the the binary then it determines the patterns so there is
19:52
1 pattern that was wasn't the demographic which is a simple has 80 thousand nodes and it found when matching here so the matching is is described here so we have this option that match we
20:04
see the good on the said that working for but it so that flows for once a I wanted
20:10
written it backspace to see what happens would it take think away because um attitude needs to this as the somebody and 99 assembled takes around 20 seconds so my to receive a lot of matching that of text and then they would show you another option to have less less uh as their POS outputs and the more quiet 1 with 1 1 the 1 simple paraline so the for this area have that of texts so and I ask you to do it quietly so treated much quicker because we don't have to the the friends again I show you that the fisher of has the following areas adults phase which are the some of the versions of the banners the would so
20:59
now I launch it again but with a quite options so here I see all the atoms that all the and there is that's matched and all pattern but the 1 that so we find a lot of the sample that use the same direction with them that is the same but maybe there are more so I ru I read as graph to show me all the binary in so OK we see that's some samples match and some of them much so what they wanted to do next with find bonds to fund earns them in that way that which is here OK so here we have
21:35
the 2 groups going on there is it hard to school so what we see that it's very simple you ever before they the middle than another basic back we'll see how it is called so there is a function here that has a lot of calls to the Wiktionary algorithm is always to push is because the the other arguments then a call to the decryption function so I wanted to write about and to see and to detect this
22:02
so what they did but this pattern the so
22:08
I asked that we find something with 2 versions then the court then again to push is another course 4 times in the all points to a function that is that has the 1st big black then the look at then and then they got then return assertion and that what they say here that I'm interested in getting the new so I think it I did I said get a AG on the on the root node so I show you might troops lecture me that medical version because it's a is is kernels the
22:44
OK so we have here 2 preseason according to which is another call or point to the same function and we're interested in the here which is where the decryption algorithm happens the OK
22:57
so 92 matches against the mice impose the I have a home so I find here the 1st and within that we already know
23:14
it's exhausting 11 and 25 again the same here we have another 1 there is a sudden then there's always another constraints then another set then there is again the same 1 then when
23:26
we find that there are many more emotions see them through them or but what we
23:31
did next the what's right at some about for this these adequate environment so we have a there various more buttons so the 1st 1 is the 1 you so 1125 this 1 is the 1 we just sort of greedy and 20 and they were like 5 other ones of them and they so what we do now is we want to
23:53
match them against that experts and pose and so what we see here in that that wrote more and no samples matched and we know which algorithm was used so here and then that and that match with the loss of the resources a service also here there is another 1 and another 1 in Section so 1 more thing we wanted to know is that we have to have some and
24:22
match the and boundaries
24:26
so soon so I want to know where the match 1 simple thing in the 2 in this sort of at its worst like is we asked for uh stations 6 names and we see that it's you big so it's the 1st some symbols and seemed to be back to you peaks so what they do is I wanted to have a creates signature for these you peaks phase so I will look for our basic that loops that basic graphs that to themselves to have a patent for that's the real use now the the and this better so I find
25:13
to be the good to and we looked into whether
25:17
the ring but and wanted through them as signatures so I wrote a database nature which is which contains these troops invasive 1 speaks to so where they did next was to mention OK so there are lessons to be more simple than that but the debates and I always match the troops so I know how to do the guidance of backspacing and not
25:45
detects the phase spectrum peaks so I wanted to take them was recommended to the 2 signatures and I will mention the what is this a so we still have a few and measurement errors but most of them we see that there is a a violent of space and we know which algorithm is used all the opportunities OK but to
26:20
states so what we saw that is that we found 7 variants of debates so there was a the role the first one with the this or said 25 there was another 1 that we saw and then type of the atoms the we also found out that there are so many result but and want a smaller than the creek signature for this phase we match so what we did is the on off 100 the but this imposed he took around 20 seconds to the decimal them then to do the matching of 9 but only to the more than 2 seconds and then we see that there are like Simmons and Pfizer the properties is replaced the Huns uh the half of the samples that have the 1st button that we looked at and the other patterns are used sometimes and and the end there was the given uh in areas that we did not identify corrected the so what can
27:18
I come to simplify the creation of scripts the we create a Python bindings of the C + + library with weight and now with Python can give us some but we can all patterns and test graphs and can grab a button graphs and the graphs that can match but tense and past older results so here we have some examples of Python scripts so at the beginning we have as the imports
27:51
of a library then we have the will of the pattern that find the we re extracts those on the graph from the binary switch to the some find them with all the tests graph and after we much there the graph the about them In but there's graph the so here is an example of from 2 pattern we want to match uh 3 pushes so we use the match graph and now we can get old uh matching but patterns then we can select the pushes buttons and now we can go with this vicious patterns we can get all matching the instructions then can say OK if I want to get the the IDP so it give all instruction with the get IDP and then with like the 1st instructions so now we
28:53
have the instructions so we can request for some information like the addresses the all of the old constriction strings deal called the arguments and so on so OK we can match uh the decryption group in which so that we have a lot of Bush's and cold to the decryption strings so we want to match the 2 pushes and the cold today the decryption routine care so we have we want to check match the 1st which which of them and then the encrypted string and then the cold to the entry point of the function to be could be on both strings soul we have to know and decryption entry points we can match the during the address of the all the decryption loop so we have the address of the match there 1 so to get the entry point we know that the function is a very slow very early on tiny so we can say in we know also that the decryption entry point is before the decryption loop there so we can say we want to search an old between the decryption address and between the decryption address minus 30 and we want to match an old with at least 5 or more incoming so and that their own and
30:45
so this correct this the we have the imports yeah we get done the binaries would create the the graphs in then we have bowls but
31:04
tends to match the intra pants then too much the pushes before the call of the decryption function and we implement the old clients of the decryption algorithm so here you have subzone set said out and so on so the in my lord so we go through figure could description with a sample the so now we can decrypt bolstering so we have the sample we match the decree
31:54
also then we found did the address of this routine here we match the entry points with more than 1 of 5 or more than 5 incoming between the address of still
32:12
decryption minus 30 then the
32:15
address of the decryption the so we found the decryption of function the
32:21
standard parents so now we get to look
32:24
for the moment the pushes again
32:28
With the entry point of the function and then we matched 35 calls to this decryption function and so after this we take the strain bankrupt and string them lead a critic with the the algorithm and now we can see we have some registers and some fixed 5
32:51
things on the so now we can
32:55
but if and all samples
33:01
so here we have the 1st samples was strains
33:08
registries then we can find some big thing on all the
33:13
samples and so on so there we have all those strings of the configuration and so on of much space so here we have a tool but it
33:29
can be nice to have a plug to in I don't too much both the and those spot and so we have a nite uploading we convert the ITER graphs through no fall grapple then can much attention recently graph which is the Python bindings and then we can with GUI rolls balls matches and put some current and then we apply some filters to the so
33:58
match a single pattern for example of a fall s fall said keep we have 2 small groups for the initialization an if we create just 1 pattern for those 2 groups we have a lot of false positives so we get some fit the techniques that it's quite simple and abuse so on for example the 6 key so far as the form we have 2 groups fall were the 1st of these the 1st pattern in this and would be the 2nd pattern so if we don't have those loops in the function and it's not all functions so you can see at the right we have to to potential OK it's accepted but if uh in the EU we have just 1 part and it's rejected we have also the rates the and techniques so also quite simple but we have 2 hopes but if we have more than the more than 2 groups in the in the function so if we match uh for example 3 times the pattern 1 of the the function is rejected to because we just had the 1st but the 1st loop 1 time in the 2nd 1 9 2 then we have the threshold so sometimes with the the pattern can have some errors soul the at the right that we have the full patterns for 1 function so we match all pattern so OK it's good but in the middle of of the bottom number of thought answering that are not here so the function is also for example the function of decryption but we missed some some patents so we create treshold so here we have a threshold of 0 . 75 that means that at least 1 and 3 other form of uh patterns so in the middle it's accepted to that at the left we just have to add a force so you really we reject this function the last 1 is the overlapping so in for example again the FC false key we have the 2 groups who creates 1 pattern for the 1st and another for the Sun and if we match the same loop ways those 2 but and that means we met the same group and we don't this but in the example at the last we have a to match of the 2 matches of the 1st but then in 1 of the Sun and there is an overlapping between the 1st and the 2nd so if remove the pattern 1 of 4 the overlapping we have the honor of body the the matches the at the rights so we can say OK it's all function that this technique is not yet implemented in either graph so the creation of the rule fall the plugin is quite simple to solve for the SE fostered would create the uh the 1st loop so no 1 we give the the past to the adults time to the bottom down to give a name a decryption description of this button then we say OK a minimum but down is 1 so that mean I want to match 1 time uh this pattern almost then we fix the max pattern so we say I want at at next 1 time this pattern so we must match just 1 time the but the 1st so it the same things for the 2nd and then we traits of the function of that down so we give the 2 groups because we we must match those 2 of 2 loops in the function the city of st formed then we put the threshold that here is 1 so it means we must uh get the 2 uh um the to attempt to regulate the function then name the disk in the so demo OK
38:27
in all the in the the
38:36
the sessions for the family the it little the in FIL the fingerprint is to find the what dance so OK ranch here we can see we have the match some patterns so the we can think comment and then we have the decryption routine OK but sometimes the we don't know really where all the matched so we put some current so here means then it and then we have the current on the 2 instruction for the decryption look and we have some current on the in the entry point so that in here the we have good please the pushes for them the frames and if we don't know in bringing yeah them lunch again this can here we have the name of the function also with the number of instructions so is it's OK
40:06
and s if all the match or not efficient because the groups all very small and very generic so we have we still have many um false positives but we have a few false negative which is cool so do the cooling is appropriate content and we can try and maybe to to create all those signatures for all over cryptographic algorithm by chaos and so on and maybe you can put the filtering techniques that in some of key wrap because of its only in either grep so the perspective some mutation
40:56
of all tools all the arguments and structure of cold and so on all in of strings so sometimes we have to do some dirty hack to get uh 0 for example and you have the conditional jump so we want to match Ernesta chairman of strings that begin use J OK and we want the note to have 2 children because in the graph data insertion with children all polls and conditional jumps so will not also begin with C so that then then we have all the problem because in the sum the standalone tools and we use the captain and Zhang in in either we use the I'd engines so sometimes we for example the variables we have in Aida 25 h fall the in the the the the smaller value 25 in caps summary of all x 25 so again a dirty hacks to catch the both cases so often whom we catch 25 all the all x 25 so the solution is semantics the idea behind that is to say OK and I want to check if my the argument 1 is registered and all I have to check if the old cold is a conditional jumps and so on the same things for the for log 1 if we want to catch the intended the interviews with the value 25 not gravity to game there is another imitation of the how right bottoms that is and that we can we explain how we can do not repetition that there is 1 new mutation that that is the following year we can see the 1 with the repetition we can say I want the least number of matched instructions or I want the maximum number of matches which there is no in between so the problem is is this 1 we have gotten that say is that I want to match 1 to 4 instruction then name want to match absorb more than 1 too much of course and then I say OK this alpha instructions so it's 1 2 5 anything then the law than the core and so there is a lazy we been here we can develop a true or false if we put true between we mean that you want to dig the least number of instructions to you if you put false move to take the maximum number insertions so we'll see how to match the spot on on the on the binary year uh graph that would have a push in the push then or then another goes and then of course so with the foreign with and the current that was that if we put lazy be to shrewd the matching for the any know them with step once it's which is absorbed so for any it with they push push then interest up then or node we meds absorb than the core node which way to much of it or it doesn't work so there is no match so it doesn't work was later repeats at true if it would later repeated false so any any node would try to match the maximum or instruction so the problem is it is greedy so the any know that we match push push all because it matches for insertions and then we get to the door node that finds a corner so doesn't match the naughty there so there is no way for all to now to match this a binary thing with this pattern is counterintuitive because as with regular expressions would like to say OK I don't care if any takes 1 2 3 of all for the instructions I would like it is the more so of course the solution may be also for granted to would be to try every year number of repetitions of 1 2 3 4 and of course attributes Larosa lead to watch watch out for and but number forms OK we have a lot of more idea that you would like to implement in the buttons 1 thing that is limited also with the current version is that for basic goods when you have a instructions on this the condition and basic blocks for now we apply to every node and that you would like to say is I want of is graph that has at least ones or insertion off when you get graph that has had his to the insertion 1 compare instruction that again a defined as a maybe we'll also work on that data so we said that the children are numbered so you have to specify when you do it is if you you looking but can number 1 or trend but sometimes we we want to say OK I don't care so it think and the for for instance for condition and and you could say this could be the Chad for when the conditions which all 4 when the condition is fulfilled I don't care so we would like to add an option that would be generators of a question mark and so they go with them to try the both options then as that and expand you would like to bit of atoms in that paragraph which would be able to for to say OK are 5 patterns and won the match if we find the 3 of them that's enough or we would like to say OK after battles B 1 and B to find new binary that has 1 match over the 1 that is directly followed by a match of the 2 that is there is 1 institution of the 1 that has as a child in this section of the 2 that we would like to do some puzzling may that's the last point is that for now we constructed the grass manually with the writing the text buys it for its it's OK when you have a the when you use streets its course quite fast but it would still be 0 to be able to select nodes in either up and then Kij exports and rule generator nicer that with the uh the all the nodes analogy that you want it to be much faster integrate patterns OK so this will end or uh all talk the conclusion is that the graph is at the same time a solution to the test by the meanings we have name that begins to fall guru was to create a tool that can make your own graph buttons that there are is its rights and easy to understand we believe it is used for for the diction on intimate and is we show that the space and his words the is a free open source so you can find information and this address as in perspectives for the future amid before that the 2 that we would like to add some but features we talked about processing the semantics information from instructions that we talked about condition and basic Rex we would like to be able to create buttons and directly with an that and then you would like to have many more examples of work untutored those we will want to play with a yes and on bikers on command vectors so as we end
47:59
also works thank you for listening and we try to answer questions if thank you for the talk verdict of uh following if you had thought about using it since can have uh capstan and I don't know I think I haven't quite it but uh could you just steps timing error and that by that anniversary 2 steps in Nader yeah but you we thought about it but we have not decided gets hot enough to put into the system the
00:00
Binärdaten
Graph
Mustervergleich
Plug in
Wort <Informatik>
Graphentheorie
Hacker
Computeranimation
00:46
Gewichtete Summe
Polynom
Parser
RaumZeit
Computeranimation
Matching <Graphentheorie>
Algorithmus
Softwaretest
Mustersprache
Teilgraph
Kontrollstruktur
Phasenumwandlung
Korrelationsfunktion
Koroutine
Kategorie <Mathematik>
Malware
Disassembler
Menge
Rechter Winkel
Einheit <Mathematik>
Wurzel <Mathematik>
Konditionszahl
Server
Graphentheorie
Instantiierung
Folge <Mathematik>
Kontrollstruktur
Tiefensuche
Äquivalenzklasse
Nummerung
Viewer
Überlagerung <Mathematik>
Chiffrierung
Open Source
Bildschirmmaske
Knotenmenge
Programmbibliothek
Konfigurationsraum
Drucksondierung
Algorithmus
Open Source
Verzweigendes Programm
Elektronische Publikation
Datensatz
Schnelltaste
Wort <Informatik>
Punkt
Gemeinsamer Speicher
Adressraum
Formale Sprache
Selbstrepräsentation
Versionsverwaltung
Familie <Mathematik>
Binärcode
Marketinginformationssystem
Analysis
Eins
Arithmetischer Ausdruck
Minimum
Wurzel <Mathematik>
Bildauflösung
Umwandlungsenthalpie
Graphentheorie
Lineares Funktional
Parametersystem
Assembler
Plug in
Reihe
Nummerung
Elektronische Unterschrift
Knotenmenge
Systemaufruf
Variable
Mustersprache
Chiffrierung
Isomorphismus
Garbentheorie
Projektive Ebene
Disassembler
Aggregatzustand
Kombinatorische Gruppentheorie
Assembler
Graph
Koroutine
Zusammenhängender Graph
Binärcode
Elektronische Publikation
Matching <Graphentheorie>
Graph
Konfigurationsraum
Bildauflösung
MailingListe
Objekt <Kategorie>
Formale Sprache
Mereologie
GRASS <Programm>
Skalarprodukt
Vollständigkeit
10:05
Bit
Einfügungsdämpfung
Punkt
Gemeinsamer Speicher
Extrempunkt
Formale Sprache
Selbstrepräsentation
Adressraum
Gruppenkeim
Computeranimation
Matching <Graphentheorie>
Algorithmus
Fahne <Mathematik>
Trennschärfe <Statistik>
Mustersprache
Minimum
Konditionszahl
Default
Folge <Mathematik>
Konstruktor <Informatik>
Parametersystem
Lineares Funktional
Güte der Anpassung
Stichprobe
Systemaufruf
Nummerung
pBlock
Systemaufruf
Teilbarkeit
Arithmetisches Mittel
Mustersprache
Datenfeld
Chiffrierung
Verkettung <Informatik>
Genetische Programmierung
Twitter <Softwareplattform>
Wurzel <Mathematik>
Einheit <Mathematik>
Rechter Winkel
Konditionszahl
Grundsätze ordnungsmäßiger Datenverarbeitung
Garbentheorie
pBlock
Graphentheorie
Instantiierung
Subtraktion
Web Site
Folge <Mathematik>
Gerichteter Graph
Gewicht <Mathematik>
Befehlscode
Mathematisierung
Nummerung
Code
Chiffrierung
Graph
Loop
Knotenmenge
Bildschirmmaske
Adressraum
Demo <Programm>
Drucksondierung
Algorithmus
Graph
Matching <Graphentheorie>
Open Source
Default
Netzwerktopologie
Zeichenkette
Minimalgrad
Körper <Physik>
Loop
Basisvektor
Mereologie
Wort <Informatik>
Speicherabzug
Skalarprodukt
19:18
Gerichteter Graph
Matching <Graphentheorie>
Stichprobe
Disjunktion <Logik>
Knotenmenge
Visuelles System
RaumZeit
Computeranimation
Netzwerktopologie
Graph
Chiffrierung
Knotenmenge
Softwaretest
Stichprobenumfang
Mustersprache
Konditionszahl
Kontrollstruktur
Eindeutigkeit
20:03
Zwei
Versionsverwaltung
Stichprobe
EulerWinkel
Knotenmenge
Computeranimation
Netzwerktopologie
Graph
Softwaretest
FTest
Flächeninhalt
Eindeutigkeit
Phasenumwandlung
Funktion <Mathematik>
20:58
Lineares Funktional
Parametersystem
Dualitätstheorie
Graph
Stichprobe
Gruppenkeim
Systemaufruf
Ordinalzahl
Binärcode
Computeranimation
Quantisierung <Physik>
Richtung
Algorithmus
Chiffrierung
Mustersprache
Stichprobenumfang
Elektronischer Fingerabdruck
22:00
Lineares Funktional
Punkt
Gerichteter Graph
Versionsverwaltung
Stichprobe
Mathematisierung
Visuelles System
Computeranimation
Kernel <Informatik>
Chiffrierung
Knotenmenge
Loop
Mustersprache
Wurzel <Mathematik>
22:42
Inklusion <Mathematik>
Nebenbedingung
Punkt
Kontrollstruktur
Matching <Graphentheorie>
Stichprobe
Systemaufruf
Knotenmenge
Computeranimation
Netzwerktopologie
Graph
Chiffrierung
Algorithmus
Softwaretest
Loop
Eindeutigkeit
23:23
Einfügungsdämpfung
Gerichteter Graph
Matching <Graphentheorie>
Kontrollstruktur
Stichprobe
Visuelles System
Knotenmenge
Systemaufruf
Computeranimation
Eins
Chiffrierung
Graph
Dienst <Informatik>
Softwaretest
Algorithmus
Loop
Stichprobenumfang
Garbentheorie
GreedyAlgorithmus
Programmierumgebung
24:21
Graph
Loop
Randwert
Last
Elektronische Publikation
Genetische Programmierung
Matching <Graphentheorie>
Arbeitsplatzcomputer
Stichprobe
Content <Internet>
Graphentheorie
Dateiformat
Elektronische Unterschrift
Codierung
Computeranimation
25:12
Gerichteter Graph
Messfehler
Datenhaltung
Natürliche Zahl
Stichprobe
Benutzeroberfläche
Elektronische Unterschrift
Punktspektrum
Knotenmenge
RaumZeit
Computeranimation
Netzwerktopologie
Chiffrierung
Graph
Algorithmus
Softwaretest
Loop
Einheit <Mathematik>
Konditionszahl
Eindeutigkeit
Phasenumwandlung
26:20
Resultante
Gewicht <Mathematik>
Kontrollstruktur
Parser
Ordinalzahl
Assembler
Computeranimation
Graph
Chiffrierung
Last
Stichprobenumfang
Mustersprache
Datentyp
Programmbibliothek
Skript <Programm>
Phasenumwandlung
Graphentheorie
Softwaretest
Schnelltaste
Matching <Graphentheorie>
Kategorie <Mathematik>
Zwei
Stichprobe
Elektronische Unterschrift
Mustersprache
Schnelltaste
Flächeninhalt
Loop
Disassembler
Graphentheorie
Aggregatzustand
27:50
Punkt
Gerichteter Graph
Befehlscode
Adressraum
Gruppenkeim
Parser
Information
Computeranimation
Chiffrierung
Loop
Graph
Last
Adressraum
Mustersprache
Koroutine
Konditionszahl
Programmbibliothek
Softwaretest
Graphentheorie
Koroutine
Algorithmus
Lineares Funktional
Parametersystem
Elektronische Publikation
Matching <Graphentheorie>
Graph
Konfigurationsraum
Knotenmenge
Systemaufruf
Zeichenkette
Mustersprache
Chiffrierung
Schnelltaste
Information
Disassembler
Zeichenkette
30:43
Lineares Funktional
Fehlermeldung
Gerichteter Graph
Systemaufruf
Information
Visuelles System
Binärcode
Hochdruck
Computeranimation
Mustersprache
Chiffrierung
Deskriptive Statistik
Client
Chiffrierung
Algorithmus
Funktion <Mathematik>
Menge
Adressraum
Stichprobenumfang
Disassembler
Graphentheorie
Figurierte Zahl
31:52
Algorithmus
Lineares Funktional
Punkt
Gerichteter Graph
Prozess <Informatik>
Momentenproblem
Adressraum
Stichprobe
Systemaufruf
Systemaufruf
Computeranimation
Zeichenkette
Physikalisches System
Chiffrierung
Mustersprache
Chiffrierung
Algorithmus
Funktion <Mathematik>
Adressraum
Vererbungshierarchie
32:50
Binärcode
Internetworking
Schnelltaste
Firefox <Programm>
Filter <Stochastik>
Prozess <Informatik>
Graph
Plug in
Stichprobe
Browser
Umsetzung <Informatik>
RaumZeit
Systemaufruf
Computeranimation
Zeichenkette
Graph
Mustersprache
Physikalisches System
Funktion <Mathematik>
Parametersystem
Stichprobenumfang
Gleitendes Mittel
Graphentheorie
Konfigurationsraum
Konfigurationsdatenbank
Zeichenkette
33:56
Korrelationsfunktion
Ortsoperator
Extrempunkt
Gruppenkeim
Computeranimation
Deskriptive Statistik
Loop
Bildschirmmaske
MiniDisc
Mustersprache
Minimum
Schwellwertverfahren
Inklusion <Mathematik>
Lineares Funktional
Schwellwertverfahren
Matching <Graphentheorie>
Graph
Kryptologie
Einfache Genauigkeit
Schlussregel
Nummerung
Plug in
Bitrate
Schlussregel
Mustersprache
Chiffrierung
Funktion <Mathematik>
Forcing
Loop
Rechter Winkel
Mereologie
Bitrate
Zeitzone
Schlüsselverwaltung
Fehlermeldung
38:34
Korrelationsfunktion
Lineares Funktional
Punkt
Ortsoperator
Matching <Graphentheorie>
Familie <Mathematik>
Gruppenkeim
Negative Zahl
Nummerung
Elektronische Unterschrift
Computeranimation
Mustersprache
Algorithmus
Chiffrierung
Verschlingung
Kryptologie
Perspektive
Generizität
Mustersprache
Unordnung
Elektronischer Fingerabdruck
MIDI <Musikelektronik>
Inhalt <Mathematik>
40:53
Bit
Einfügungsdämpfung
Punkt
Gewichtete Summe
Extrempunkt
Adressraum
Versionsverwaltung
Parser
Gesetz <Physik>
Binärcode
Analysis
RaumZeit
Computeranimation
Formale Semantik
Matching <Graphentheorie>
Mustersprache
Minimum
Konditionszahl
Hacker
Softwaretest
Schreiben <Datenverarbeitung>
Parametersystem
Perspektive
Plug in
Kryptologie
Güte der Anpassung
Nummerung
pBlock
Kugelkappe
Arithmetisches Mittel
Mustersprache
Generator <Informatik>
Twitter <Softwareplattform>
Konditionszahl
Garbentheorie
Information
pBlock
Fehlermeldung
Zeichenkette
Instantiierung
Gravitation
Formale Semantik
Gerichteter Graph
Befehlscode
Regulärer Ausdruck
Ordinalzahl
Open Source
Bildschirmmaske
Variable
Knotenmenge
Spieltheorie
Perspektive
JensenMaß
Datenstruktur
Disjunktion <Logik>
MetaTag
Attributierte Grammatik
Algorithmus
Matching <Graphentheorie>
Graph
Open Source
Multiplikationssatz
Verzweigendes Programm
Schlussregel
Physikalisches System
Vektorraum
Zeichenkette
Regulärer Ausdruck
Schnelltaste
Körper <Physik>
Zustand
Speicherabzug
GRASS <Programm>
Wort <Informatik>
Skalarprodukt
Bitrate
Metadaten
Formale Metadaten
Titel  GRAP: define and match graph patterns within binaries 
Serientitel  REcon 2017 Brussels Hacking Conference 
Teil  10 
Anzahl der Teile  20 
Autor 
Thierry, Aurelien

Lizenz 
CCNamensnennung 4.0 International: Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen. 
DOI  10.5446/32387 
Herausgeber  REcon 
Erscheinungsjahr  2017 
Sprache  Englisch 
Produktionsort  Brüssel 
Inhaltliche Metadaten
Fachgebiet  Informatik 
Abstract  Disassembled binary code can be turned into a graph of instructions linked by possible execution flow (Control Flow Graph). Based on academic research on malware detection through graph matching and facing large numbers of similar files to analyze, we aim to provide accurate results to an analyst working on malware families. Our approach is a YARAlike detection tool: GRAP matches userdefined graph patterns against the CFG of a given code. GRAP is a standalone tool that takes patterns and binary files, uses a Capstonebased disassembler to obtain the CFGs from the binaries, then matches the patterns against them. Patterns are userdefined graphs with instruction conditions (“opcode is xor and arg1 is eax”) and repetition conditions (3 identical instructions, basic blocks…). The algorithm solves a simplified version of the subgraph isomorphism problem, allowing the matching to be very quick. It can be used to find generic patterns such as loops and to write signatures to detect malware variants. We also developed a plugin giving IDA the capabilities to detect and browse matches directly within the GUI. Python bindings are available to create scripts based on GRAP and extract valuable information (addresses, instructions) from matched parts. In this talk, we will introduce the algorithms used and then focus on practical use cases: detect common patterns (from the command line or within IDA), create a malware pattern, and extract information from matched instructions. The tool and the plugin will be released under an open source license. 