GRAP: define and match graph patterns within binaries
Formal Metadata
Title 
GRAP: define and match graph patterns within binaries

Title of Series  
Part Number 
10

Number of Parts 
20

Author 

License 
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2017

Language 
English

Production Place 
Brüssel

Content Metadata
Subject Area  
Abstract 
Disassembled binary code can be turned into a graph of instructions linked by possible execution flow (Control Flow Graph). Based on academic research on malware detection through graph matching and facing large numbers of similar files to analyze, we aim to provide accurate results to an analyst working on malware families. Our approach is a YARAlike detection tool: GRAP matches userdefined graph patterns against the CFG of a given code. GRAP is a standalone tool that takes patterns and binary files, uses a Capstonebased disassembler to obtain the CFGs from the binaries, then matches the patterns against them. Patterns are userdefined graphs with instruction conditions (“opcode is xor and arg1 is eax”) and repetition conditions (3 identical instructions, basic blocks…). The algorithm solves a simplified version of the subgraph isomorphism problem, allowing the matching to be very quick. It can be used to find generic patterns such as loops and to write signatures to detect malware variants. We also developed a plugin giving IDA the capabilities to detect and browse matches directly within the GUI. Python bindings are available to create scripts based on GRAP and extract valuable information (addresses, instructions) from matched parts. In this talk, we will introduce the algorithms used and then focus on practical use cases: detect common patterns (from the command line or within IDA), create a malware pattern, and extract information from matched instructions. The tool and the plugin will be released under an open source license.

00:00
Graph theory
Word
Arm
Pattern matching
Computer animation
Hacker (term)
Multiplication sign
Plugin (computing)
00:46
Presentation of a group
Greatest element
State of matter
Multiplication sign
Coroutine
1 (number)
Sheaf (mathematics)
Set (mathematics)
Parameter (computer programming)
Mereology
Encryption
Series (mathematics)
Covering space
Programming language
Algorithm
Binary code
Shared memory
Electronic mailing list
Instance (computer science)
Sequence
Electronic signature
Category of being
Configuration space
Right angle
Pattern language
Summierbarkeit
File viewer
Speicheradresse
Point (geometry)
Server (computing)
Functional (mathematics)
Computer file
Open source
Cone penetration test
Image resolution
Connectivity (graph theory)
Control flow
Branch (computer science)
Number
Revision control
Latent heat
Crosscorrelation
Root
Software design pattern
Representation (politics)
Address space
Condition number
Form (programming)
Graph (mathematics)
Matching (graph theory)
Assembly language
Projective plane
Expression
Depthfirst search
Grass (card game)
Equivalence relation
Graph theory
Word
Maize
Computer animation
Object (grammar)
Family
Disassembler
Library (computing)
10:05
Group action
Greatest element
Code
Multiplication sign
Genetic programming
Sheaf (mathematics)
Insertion loss
Parameter (computer programming)
Mereology
Mathematics
Different (Kate Ryan album)
Core dump
Encryption
Flag
God
Programming language
Algorithm
Block (periodic table)
Constructor (objectoriented programming)
Shared memory
Maxima and minima
Special unitary group
Bit
Instance (computer science)
Sequence
Degree (graph theory)
Arithmetic mean
Chain
Website
Right angle
Pattern language
Speicheradresse
Directed graph
Point (geometry)
Functional (mathematics)
Cone penetration test
Divisor
Control flow
Field (computer science)
Number
Twitter
Goodness of fit
Software design pattern
Representation (politics)
Selectivity (electronic)
Form (programming)
Condition number
Default (computer science)
Graph (mathematics)
Matching (graph theory)
Weight
Basis <Mathematik>
System call
Graph theory
Word
Computer animation
Personal digital assistant
19:18
Exclusive or
Matching (graph theory)
Computer animation
Software design pattern
Sampling (statistics)
Control flow
Special unitary group
Pattern language
Address space
20:03
Area
Revision control
Fisher's exact test
Computer animation
Euler angles
Function (mathematics)
2 (number)
20:58
Functional (mathematics)
Algorithm
Group action
Graph (mathematics)
Direction (geometry)
Binary code
Sampling (statistics)
Parameter (computer programming)
System call
Computer animation
Atomic number
Encryption
Pattern language
22:00
Revision control
Point (geometry)
Functional (mathematics)
Kernel (computing)
Computer animation
Root
Multiplication sign
Pattern language
22:42
Point (geometry)
Algorithm
Matching (graph theory)
Constraint (mathematics)
Computer animation
Encryption
System call
23:23
Greedy algorithm
Algorithm
Service (economics)
Matching (graph theory)
Computer animation
Integrated development environment
1 (number)
Sheaf (mathematics)
Sampling (statistics)
Insertion loss
24:21
Graph theory
Matching (graph theory)
Computer animation
Workstation <Musikinstrument>
Genetic programming
Boundary value problem
Control flow
Electronic signature
25:12
Algorithm
Computer animation
Natural number
Observational error
Database
Address space
Spectrum (functional analysis)
Electronic signature
26:20
Area
Scripting language
Matching (graph theory)
State of matter
Weight
Keyboard shortcut
Sampling (statistics)
2 (number)
Electronic signature
Graph theory
Type theory
Category of being
Computer animation
Atomic number
Software design pattern
Software testing
Resultant
Library (computing)
27:50
Point (geometry)
Functional (mathematics)
Group action
Graph (mathematics)
Matching (graph theory)
Information
Coroutine
Control flow
Parameter (computer programming)
Computer animation
String (computer science)
Software design pattern
Encryption
Pattern language
Software testing
Speicheradresse
Address space
Library (computing)
30:43
Graph theory
Functional (mathematics)
Algorithm
Computer animation
Binary code
Encryption
Sampling (statistics)
Set (mathematics)
Figurate number
Client (computing)
System call
Descriptive statistics
31:52
Point (geometry)
Functional (mathematics)
Algorithm
Computer animation
Inheritance (objectoriented programming)
Moment (mathematics)
Encryption
System call
Speicheradresse
32:50
Windows Registry
Filter <Stochastik>
Graph theory
Graph (mathematics)
Computer animation
String (computer science)
Keyboard shortcut
Sampling (statistics)
Configuration space
Moving average
Address space
33:56
Functional (mathematics)
Group action
Greatest element
Multiplication sign
Control flow
Mereology
Rule of inference
Thresholding (image processing)
Number
Bit rate
Singleprecision floatingpoint format
Software design pattern
Encryption
Error message
Descriptive statistics
Position operator
Plugin (computing)
Form (programming)
Graph (mathematics)
Matching (graph theory)
Key (cryptography)
Forcing (mathematics)
Maxima and minima
Special unitary group
Computer animation
MiniDisc
Right angle
Pattern language
38:34
Point (geometry)
Functional (mathematics)
Group action
Algorithm
Matching (graph theory)
Content (media)
Chaos (cosmogony)
Cryptography
Perspective (visual)
Number
Electronic signature
Computer animation
Software design pattern
Encryption
Family
Position operator
Fingerprint
40:53
Greatest element
Multiplication sign
Sheaf (mathematics)
Insertion loss
Parameter (computer programming)
Semantics (computer science)
Perspective (visual)
Spherical cap
Atomic number
Core dump
Error message
Physical system
Electric generator
Block (periodic table)
Binary code
Maxima and minima
Bit
Instance (computer science)
Regulärer Ausdruck <Textverarbeitung>
Variable (mathematics)
Arithmetic mean
Vector space
Pattern language
Summierbarkeit
Speicheradresse
Point (geometry)
Open source
Branch (computer science)
Rule of inference
Number
Twitter
Attribute grammar
Revision control
Goodness of fit
Hacker (term)
String (computer science)
Software design pattern
Software testing
Data structure
Address space
Condition number
Alpha (investment)
Form (programming)
Matching (graph theory)
Graph (mathematics)
Information
Physical law
Grass (card game)
Word
Computer animation
Personal digital assistant
Gravitation
Game theory
00:00
i at Al Hack rapid mankind ruled on time and it the I'll arm wording thousand 17 we
00:21
so we're going to talk about and pattern matching
00:24
and other matching within and there is and the buttons and putting the define our graphs so we will see how we can lights and buttons as a graph and match them with inventories just a quick word about ourselves and so I am the only entity Mirandese that Airbus and an Jonathan through time also my way on the east but that extension sold through our
00:49
presentation we will talk about some arrest simple which is a backspace sexually that's more like a
00:54
where family i it was 1st seen in 2015 and their rights to answer so what we found in what is what we find in the space is small decryption function that we agree that encrypted configuration by suggestions is also always with 3 entries so what we see is that some of the decryption function is quite simple so it is more with in here we can see that on the left so you are simply a loop with because all of which uh there's an argument 11 then followed with a set that has as argument 25 In the backspace family we find many bottoms of this decryption some so With this scalar family what you want to do is 1st to detection so the detector whereas then to classify the violence based on the decrease from the gorithm that is used and then to be able to decrypt the configuration viable since we know which algorithm is being used 1 thing you can use to do that is to use up your eyes bytes at you realize so we're gonna expressions and bites and so if we look like the following year expression you save in order to detect absorbed with is then the sub you would say OK I need the bite 18 then they need any bytes can thinking about the name the debate 11 which is the argument here of the door even then they need the that is bait 80 then in debates then the by 25 which is the argument of the sub but this and the way of writing and matching as signature is that which is easy to read and so what we wanted to do is list let's signature that are based on the instructions that are within the binary and how they are linked to each other so they can of and signature that we want to do is to say for this impediment to say OK not find me absorb that has as argument to 11 and it is directly followed with a server that has has argument to 25 I didn't a quick overview of the abrupt projects so since we're working on graphs and we use that that language to move the graphs that is a simple text file is that we really see lots of examples later but it's quite simple twice the graphs in this language and graphical user and as is and then assigned onto that you can use the command line then it has spent many things through live wages the wasn't from graph from Python and we also developed and the that begins everything is open source and relay benign so graph has 2 main components the 1st 1 is the single that will take care binaries in the viewer that phase that is a graph representing the binary so it's and it's good respectively and space and get some the other part of what is a graph matching the library that can prosper that phase and then can do the graph matching is this part is more marketed in science at this point so a typical workflow for and on is with we to whites patterns so that's what we have here but in that that's so you we want a button for backspace the new ones when I expect they that exist so or you give it to the the some of that we give you a about and the backspace that that which is the sum of part uh as a graph and then he would give the pattern and the and left to would work matching neighboring this will give you matches and extracts sections that were interested in so backspace and I is a binary OK so is a sequence of bytes those bias can be interpreted in assembly language so the sequence of instructions so it would consider form the on the subroutine for the inca the decryption then where we can know create a graph of a series on the the flow of the execution of the instructions and in in the standalone tool we use a recursive function to create this graph then we all best for the disassembler in on captain the but for the IDA plugin we use the idea engine photographs the so we can create the graph at the right we have in principle uh sorry if my French were um we have the graph of backspace OK at the right and we want to match the pattern on the left which is the decryption routine to do so we use the usual we can use the usual algorithm sold at the left we have the pattern that we have the this graph in the gold is just to search over the 1st cold so again we found the 1st goal then we're going to look to the children node there is an added so we get to look to the right we have that absorbs all going to try the all the time so we have to push so we got a search form of cold so OK we match a the same thing we have ahead so we could go to look to the push at the right the right children and old so we try the hand yeah we match that so again there is a cone yeah there's a compilable OK so the next and so on so we found the bush OK and so on so OK is pretty simple but you need to check every child of the node and it's pretty small on the graphs so since that the usually solution is pretty strong graphs undergrads we wanted to have a fast resolution so we wanted to have different algorithms so 1 thing we can say about cover for us is that there are not many graphs so the 1st property they have is that usually when you are uh when we do this and then you see that and in order to confer for graph has at most 2 children it happens when you have for instance a and the sorry a conditional jump when you have a 1 share that would be reached when the condition is fulfilled and 1 share that with be which when the condition is not fulfilled so that the 1st and priority we haven't seen before West another property is that the children are not symmetrical usually you we find that the 1 of the child that is and directed the fully version is the next address and you know the child that is a remote searching somewhere Singapore 1 so we cannot in road then so we really false that in all graphs you can add to invert the order children so what you need to see now is that once we say that the children are all the inner graphs the object's we manipulating and not really grass anymore and that will help us have a very fast matching the other thing we can say about patterns so I talked about before for was in general but but and unless this specific has we so audio it's easier to match them from the 1st known so we really falls that every pattern has a 1st node so the 1st node that is and I would no that's that's from this node to every other node of the graph can be reached on the example on the right that is all part and for backspace you have the 1st node and that which is the root node because from from it you can reach every OK so with this efficiently we will have 1st mentioned I've been light so decided we show you how we can represent the about and also under different forms so the pattern we have now around the world to what we see on the left so we have a correlation added compare the push and push and the children all the reaction Chen number 1 inch and number 2 in general ones here so there is with this kind of thing since we have a word to the nodes which is of course there is a unique way to number this graph with a depthfirst search so that is the following if I think the root node so this accord I number this 1 1 then they take the 1st child of the caller ID I did here to the other and then to the native the Shiite animal 3 then they go back to the coarse every go to the genome but to remember it for and go to the genome or 1 number it's right so I can describe its and it takes way as this so that's equivalent to the bottom and that means OK I'm looking for a pattern for graph which has as a 1st know the corn then it has a 1st child if you take it to get to an ad then or I think again the fish and if you take it you get to compare then you go back to the 1st to the node numbered 1 before the corn so you go back here and say OK now you take the Chair number 2 when you get to push then dignified and the genome around and get to know the cost of this representation as a text and by the state's representation is equivalent to the bottom of condition of the work so once you have this was the question we need to ask is can be performed across all the
10:07
bottom within that this graph so we saw that the button on the left is a grant to the text representation I add some coral because it's may is going as it is and OK so now it's easier I need to find the for the initiative is to find uh would know that would be accord so I have candidates and the right here which is this 1 then I get to the you know and I get to the other part so in the end to see here if if there nodes that share number 1 that would be another so I think the chain number 1 year I get to also it doesn't match I have to go back so I have to find and although would know that could be a course so there is 1 here now I ask it is the trend in the 1 that is another yes I remember this 1 to like it there is the channel 1 that is the compare and yes it's worse than OK take me back to the change number to the node numbered 1 to back to the coarse so get their asking their child number 2 that's the push it works I say is there a general and that is if we should also words so this gives us a matching for all the button from the battle for that as well so that is the algorithm that we use so the difference with the previous algorithm is that now that children are ordered is we so that we know that we're looking for channel 1 or channel number 2 so you don't have these conditional and uh factor anymore you would only have but when known then for matchings it's much faster OK so I receptors algorithms now I we go back to the bottom so I talked about how we can match them but now I would say what opportunities do they look like so on the less you have or uh the bottom for backspace which was the decryption within and so we said that but on the other applied to specific fields of because we reuse like conditions reuse conditions of grows on our on arguments of instructions and address of this option so on so on the right here you have an an example of a button for a public space that matches exactly the 2 instructions on sub so what it says here so it's about finding it so as they were that's a adults keyword then we have the name of the patterns with the equipped with the guys on this and then it has we say it has no it's testing the 1st node is a with the the condition and the node is OK I want to match an infection that has or as it's of good and has as ultimate number 2 hidden I have another node that is the and the condition is I want to mention of good that is 7 and that as or argument number to 25 and then the party uh specifies it is so there is a need from node aid to the these it means they have a child from the baby that is that the but in a few more things about not options and Egyptians in oil and are part of the language of so I said that the but on graphs are supposed to other would know that is a node from which you can reach every other node so in order to specify this node you have to you can just say it uh Egyptian which is true that means this not the is the is the word you have the core uh feel that stays that there's a condition so we so we can have a conditional course for instance we have the get idea flag at midday the means to do with them what OK when you find a match for this note that I'm interested in this in this match so I want you to print it on wanted to keep it as sites like an excessive later the what is of the form of the funds for Chad options for edge of edges options sorry we so that shies on number of the the 1 or 2 so we have an option to specify the uninitiated such an 1 or genome so the example below is the same pattern as previously but with every option explicit and so we have still the condition we have a k this node is a word so that the 1st node of the pattern who are interested in this node and I want you to whom March the nodes with a so that the name of the node for the interested also in the notes so you have to market the and the Chen number of the edge was 1 because they are following each other that's sequential instructions the but is so back conditions conditions of can we be able to make conditions on instructions and they will be conditioned on the uh forward stranger than for select move IEEE aid is a x something for instance we have the condition and the address conditioned on the of God and the arguments of the number of arguments and text of the arguments so the the upstream fields we also have conditions on the number of incoming and the number of I agree it is a source so that means the number of others the and the number of children of a node the the the I have an example here the that said and looking for a function that is identical that so that there is a call to this node and so the code is in a and the construction encourages the and then there is no net between a and B a took so the number of the edges to because it's a it's not affirming assertions is a women's instruction and the condition we have an an and that the is that it has multiple incoming nodes that will help us find a function that are frequently called into code the OK so until now explain how we can match 1 the 2 1 infection that's what we really want to do next is to match 1 node to multiple instructions later when you want to match basic blocks sometimes you don't care how many insertion there is that is a good command to match a again this not needs much of what we think like so we can specify and minimum number of matches for another on the maximum number of matches and any works and sequential instructions with busy blacks so these are antigens that have 1 other online chat to think that words you have a busy good that we see here Nader so there is doubt to incoming nodes to a degree nodes and 93 insertions here to match this kind of things in graph this have to say again I want you to match an infection that means cone is true and you and you to match 1 or more instructions it is plus if I want to match the proportions instructions every why something like that like the and once insertion revealed good push and I want them to be repeated 3 times so by default graph we and takes the most matching instructions and when you will use this affair repetition but you have an option to change that you can say it is repeat is true so the 1st point was greedy repeatedly predictor most of them with lazy repeats you is that when the next missions was treaties which is as given example here so back to backs basis and then we had the decryption look so we're interested suspicion in that's on the 7 sections but we can say OK I want to room that looks a bit like that so we want to match 1 to 5 and officials we have 3 year but maybe there is less maybe there is more then we have the door then that the Sun than we have gain 1 to 5 instructions and then we have a condition engine that groups back to the beginning of the loop so but only do like that we have we have here on that's that representation on group on graphs 1 2 5 in infection then there's also 7 1 2 5 in this section and then the Commission engine in go up it's really where the button would look like that a user can do for instructions repeated that next 5 times and I want you to stop once the next condition is fulfilled so it would match for this node age we match moving Adam than here it's was stuff because the next mission and this would fit because we have absorbed than the rematch Gawronski we match 7 the we match again instructions so the compare moved into a step and the last 1 because that's when it says that the source instruction to begin with J and 2 after should run so that the case with the condition of from OK so back to a bike face symbols so I'm going to do do we have 100 the like this imposed I want to do that some of them and I then they want to detect the decryption algorithm that that is so that is the 1 we reverse that then I want to find variants of this division and with them and yeah we need to detect all the data from the poisons so do with for this so
19:21
as the I a symposium so a hundred
19:25
samples of x space and I have a
19:28
button so that the 1 we saw in the states and so we have a Lizzie repeats we have befriended XOR here then the Sun I want to match it against the the breaks the simple we know so as it has grabbed too much it's against simple this 1 so that the 1 we worked on the 1st 2 it is a single the the binary then it determines the patterns so there is
19:52
1 pattern that was wasn't the demographic which is a simple has 80 thousand nodes and it found when matching here so the matching is is described here so we have this option that match we
20:04
see the good on the said that working for but it so that flows for once a I wanted
20:10
written it backspace to see what happens would it take think away because um attitude needs to this as the somebody and 99 assembled takes around 20 seconds so my to receive a lot of matching that of text and then they would show you another option to have less less uh as their POS outputs and the more quiet 1 with 1 1 the 1 simple paraline so the for this area have that of texts so and I ask you to do it quietly so treated much quicker because we don't have to the the friends again I show you that the fisher of has the following areas adults phase which are the some of the versions of the banners the would so
20:59
now I launch it again but with a quite options so here I see all the atoms that all the and there is that's matched and all pattern but the 1 that so we find a lot of the sample that use the same direction with them that is the same but maybe there are more so I ru I read as graph to show me all the binary in so OK we see that's some samples match and some of them much so what they wanted to do next with find bonds to fund earns them in that way that which is here OK so here we have
21:35
the 2 groups going on there is it hard to school so what we see that it's very simple you ever before they the middle than another basic back we'll see how it is called so there is a function here that has a lot of calls to the Wiktionary algorithm is always to push is because the the other arguments then a call to the decryption function so I wanted to write about and to see and to detect this
22:02
so what they did but this pattern the so
22:08
I asked that we find something with 2 versions then the court then again to push is another course 4 times in the all points to a function that is that has the 1st big black then the look at then and then they got then return assertion and that what they say here that I'm interested in getting the new so I think it I did I said get a AG on the on the root node so I show you might troops lecture me that medical version because it's a is is kernels the
22:44
OK so we have here 2 preseason according to which is another call or point to the same function and we're interested in the here which is where the decryption algorithm happens the OK
22:57
so 92 matches against the mice impose the I have a home so I find here the 1st and within that we already know
23:14
it's exhausting 11 and 25 again the same here we have another 1 there is a sudden then there's always another constraints then another set then there is again the same 1 then when
23:26
we find that there are many more emotions see them through them or but what we
23:31
did next the what's right at some about for this these adequate environment so we have a there various more buttons so the 1st 1 is the 1 you so 1125 this 1 is the 1 we just sort of greedy and 20 and they were like 5 other ones of them and they so what we do now is we want to
23:53
match them against that experts and pose and so what we see here in that that wrote more and no samples matched and we know which algorithm was used so here and then that and that match with the loss of the resources a service also here there is another 1 and another 1 in Section so 1 more thing we wanted to know is that we have to have some and
24:22
match the and boundaries
24:26
so soon so I want to know where the match 1 simple thing in the 2 in this sort of at its worst like is we asked for uh stations 6 names and we see that it's you big so it's the 1st some symbols and seemed to be back to you peaks so what they do is I wanted to have a creates signature for these you peaks phase so I will look for our basic that loops that basic graphs that to themselves to have a patent for that's the real use now the the and this better so I find
25:13
to be the good to and we looked into whether
25:17
the ring but and wanted through them as signatures so I wrote a database nature which is which contains these troops invasive 1 speaks to so where they did next was to mention OK so there are lessons to be more simple than that but the debates and I always match the troops so I know how to do the guidance of backspacing and not
25:45
detects the phase spectrum peaks so I wanted to take them was recommended to the 2 signatures and I will mention the what is this a so we still have a few and measurement errors but most of them we see that there is a a violent of space and we know which algorithm is used all the opportunities OK but to
26:20
states so what we saw that is that we found 7 variants of debates so there was a the role the first one with the this or said 25 there was another 1 that we saw and then type of the atoms the we also found out that there are so many result but and want a smaller than the creek signature for this phase we match so what we did is the on off 100 the but this imposed he took around 20 seconds to the decimal them then to do the matching of 9 but only to the more than 2 seconds and then we see that there are like Simmons and Pfizer the properties is replaced the Huns uh the half of the samples that have the 1st button that we looked at and the other patterns are used sometimes and and the end there was the given uh in areas that we did not identify corrected the so what can
27:18
I come to simplify the creation of scripts the we create a Python bindings of the C + + library with weight and now with Python can give us some but we can all patterns and test graphs and can grab a button graphs and the graphs that can match but tense and past older results so here we have some examples of Python scripts so at the beginning we have as the imports
27:51
of a library then we have the will of the pattern that find the we re extracts those on the graph from the binary switch to the some find them with all the tests graph and after we much there the graph the about them In but there's graph the so here is an example of from 2 pattern we want to match uh 3 pushes so we use the match graph and now we can get old uh matching but patterns then we can select the pushes buttons and now we can go with this vicious patterns we can get all matching the instructions then can say OK if I want to get the the IDP so it give all instruction with the get IDP and then with like the 1st instructions so now we
28:53
have the instructions so we can request for some information like the addresses the all of the old constriction strings deal called the arguments and so on so OK we can match uh the decryption group in which so that we have a lot of Bush's and cold to the decryption strings so we want to match the 2 pushes and the cold today the decryption routine care so we have we want to check match the 1st which which of them and then the encrypted string and then the cold to the entry point of the function to be could be on both strings soul we have to know and decryption entry points we can match the during the address of the all the decryption loop so we have the address of the match there 1 so to get the entry point we know that the function is a very slow very early on tiny so we can say in we know also that the decryption entry point is before the decryption loop there so we can say we want to search an old between the decryption address and between the decryption address minus 30 and we want to match an old with at least 5 or more incoming so and that their own and
30:45
so this correct this the we have the imports yeah we get done the binaries would create the the graphs in then we have bowls but
31:04
tends to match the intra pants then too much the pushes before the call of the decryption function and we implement the old clients of the decryption algorithm so here you have subzone set said out and so on so the in my lord so we go through figure could description with a sample the so now we can decrypt bolstering so we have the sample we match the decree
31:54
also then we found did the address of this routine here we match the entry points with more than 1 of 5 or more than 5 incoming between the address of still
32:12
decryption minus 30 then the
32:15
address of the decryption the so we found the decryption of function the
32:21
standard parents so now we get to look
32:24
for the moment the pushes again
32:28
With the entry point of the function and then we matched 35 calls to this decryption function and so after this we take the strain bankrupt and string them lead a critic with the the algorithm and now we can see we have some registers and some fixed 5
32:51
things on the so now we can
32:55
but if and all samples
33:01
so here we have the 1st samples was strains
33:08
registries then we can find some big thing on all the
33:13
samples and so on so there we have all those strings of the configuration and so on of much space so here we have a tool but it
33:29
can be nice to have a plug to in I don't too much both the and those spot and so we have a nite uploading we convert the ITER graphs through no fall grapple then can much attention recently graph which is the Python bindings and then we can with GUI rolls balls matches and put some current and then we apply some filters to the so
33:58
match a single pattern for example of a fall s fall said keep we have 2 small groups for the initialization an if we create just 1 pattern for those 2 groups we have a lot of false positives so we get some fit the techniques that it's quite simple and abuse so on for example the 6 key so far as the form we have 2 groups fall were the 1st of these the 1st pattern in this and would be the 2nd pattern so if we don't have those loops in the function and it's not all functions so you can see at the right we have to to potential OK it's accepted but if uh in the EU we have just 1 part and it's rejected we have also the rates the and techniques so also quite simple but we have 2 hopes but if we have more than the more than 2 groups in the in the function so if we match uh for example 3 times the pattern 1 of the the function is rejected to because we just had the 1st but the 1st loop 1 time in the 2nd 1 9 2 then we have the threshold so sometimes with the the pattern can have some errors soul the at the right that we have the full patterns for 1 function so we match all pattern so OK it's good but in the middle of of the bottom number of thought answering that are not here so the function is also for example the function of decryption but we missed some some patents so we create treshold so here we have a threshold of 0 . 75 that means that at least 1 and 3 other form of uh patterns so in the middle it's accepted to that at the left we just have to add a force so you really we reject this function the last 1 is the overlapping so in for example again the FC false key we have the 2 groups who creates 1 pattern for the 1st and another for the Sun and if we match the same loop ways those 2 but and that means we met the same group and we don't this but in the example at the last we have a to match of the 2 matches of the 1st but then in 1 of the Sun and there is an overlapping between the 1st and the 2nd so if remove the pattern 1 of 4 the overlapping we have the honor of body the the matches the at the rights so we can say OK it's all function that this technique is not yet implemented in either graph so the creation of the rule fall the plugin is quite simple to solve for the SE fostered would create the uh the 1st loop so no 1 we give the the past to the adults time to the bottom down to give a name a decryption description of this button then we say OK a minimum but down is 1 so that mean I want to match 1 time uh this pattern almost then we fix the max pattern so we say I want at at next 1 time this pattern so we must match just 1 time the but the 1st so it the same things for the 2nd and then we traits of the function of that down so we give the 2 groups because we we must match those 2 of 2 loops in the function the city of st formed then we put the threshold that here is 1 so it means we must uh get the 2 uh um the to attempt to regulate the function then name the disk in the so demo OK
38:27
in all the in the the
38:36
the sessions for the family the it little the in FIL the fingerprint is to find the what dance so OK ranch here we can see we have the match some patterns so the we can think comment and then we have the decryption routine OK but sometimes the we don't know really where all the matched so we put some current so here means then it and then we have the current on the 2 instruction for the decryption look and we have some current on the in the entry point so that in here the we have good please the pushes for them the frames and if we don't know in bringing yeah them lunch again this can here we have the name of the function also with the number of instructions so is it's OK
40:06
and s if all the match or not efficient because the groups all very small and very generic so we have we still have many um false positives but we have a few false negative which is cool so do the cooling is appropriate content and we can try and maybe to to create all those signatures for all over cryptographic algorithm by chaos and so on and maybe you can put the filtering techniques that in some of key wrap because of its only in either grep so the perspective some mutation
40:56
of all tools all the arguments and structure of cold and so on all in of strings so sometimes we have to do some dirty hack to get uh 0 for example and you have the conditional jump so we want to match Ernesta chairman of strings that begin use J OK and we want the note to have 2 children because in the graph data insertion with children all polls and conditional jumps so will not also begin with C so that then then we have all the problem because in the sum the standalone tools and we use the captain and Zhang in in either we use the I'd engines so sometimes we for example the variables we have in Aida 25 h fall the in the the the the smaller value 25 in caps summary of all x 25 so again a dirty hacks to catch the both cases so often whom we catch 25 all the all x 25 so the solution is semantics the idea behind that is to say OK and I want to check if my the argument 1 is registered and all I have to check if the old cold is a conditional jumps and so on the same things for the for log 1 if we want to catch the intended the interviews with the value 25 not gravity to game there is another imitation of the how right bottoms that is and that we can we explain how we can do not repetition that there is 1 new mutation that that is the following year we can see the 1 with the repetition we can say I want the least number of matched instructions or I want the maximum number of matches which there is no in between so the problem is is this 1 we have gotten that say is that I want to match 1 to 4 instruction then name want to match absorb more than 1 too much of course and then I say OK this alpha instructions so it's 1 2 5 anything then the law than the core and so there is a lazy we been here we can develop a true or false if we put true between we mean that you want to dig the least number of instructions to you if you put false move to take the maximum number insertions so we'll see how to match the spot on on the on the binary year uh graph that would have a push in the push then or then another goes and then of course so with the foreign with and the current that was that if we put lazy be to shrewd the matching for the any know them with step once it's which is absorbed so for any it with they push push then interest up then or node we meds absorb than the core node which way to much of it or it doesn't work so there is no match so it doesn't work was later repeats at true if it would later repeated false so any any node would try to match the maximum or instruction so the problem is it is greedy so the any know that we match push push all because it matches for insertions and then we get to the door node that finds a corner so doesn't match the naughty there so there is no way for all to now to match this a binary thing with this pattern is counterintuitive because as with regular expressions would like to say OK I don't care if any takes 1 2 3 of all for the instructions I would like it is the more so of course the solution may be also for granted to would be to try every year number of repetitions of 1 2 3 4 and of course attributes Larosa lead to watch watch out for and but number forms OK we have a lot of more idea that you would like to implement in the buttons 1 thing that is limited also with the current version is that for basic goods when you have a instructions on this the condition and basic blocks for now we apply to every node and that you would like to say is I want of is graph that has at least ones or insertion off when you get graph that has had his to the insertion 1 compare instruction that again a defined as a maybe we'll also work on that data so we said that the children are numbered so you have to specify when you do it is if you you looking but can number 1 or trend but sometimes we we want to say OK I don't care so it think and the for for instance for condition and and you could say this could be the Chad for when the conditions which all 4 when the condition is fulfilled I don't care so we would like to add an option that would be generators of a question mark and so they go with them to try the both options then as that and expand you would like to bit of atoms in that paragraph which would be able to for to say OK are 5 patterns and won the match if we find the 3 of them that's enough or we would like to say OK after battles B 1 and B to find new binary that has 1 match over the 1 that is directly followed by a match of the 2 that is there is 1 institution of the 1 that has as a child in this section of the 2 that we would like to do some puzzling may that's the last point is that for now we constructed the grass manually with the writing the text buys it for its it's OK when you have a the when you use streets its course quite fast but it would still be 0 to be able to select nodes in either up and then Kij exports and rule generator nicer that with the uh the all the nodes analogy that you want it to be much faster integrate patterns OK so this will end or uh all talk the conclusion is that the graph is at the same time a solution to the test by the meanings we have name that begins to fall guru was to create a tool that can make your own graph buttons that there are is its rights and easy to understand we believe it is used for for the diction on intimate and is we show that the space and his words the is a free open source so you can find information and this address as in perspectives for the future amid before that the 2 that we would like to add some but features we talked about processing the semantics information from instructions that we talked about condition and basic Rex we would like to be able to create buttons and directly with an that and then you would like to have many more examples of work untutored those we will want to play with a yes and on bikers on command vectors so as we end
47:59
also works thank you for listening and we try to answer questions if thank you for the talk verdict of uh following if you had thought about using it since can have uh capstan and I don't know I think I haven't quite it but uh could you just steps timing error and that by that anniversary 2 steps in Nader yeah but you we thought about it but we have not decided gets hot enough to put into the system the