
0x3E9 Ways to DIE


Formal metadata

Title
0x3E9 Ways to DIE
Subtitle
Introducing Dynamic IDA Enrichment framework (a.k.a DIE)
Series title
Part
10
Number of parts
18
Author
License
CC Attribution 3.0 Unported:
You may use and modify the work or content for any legal purpose, and reproduce, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Year of publication
Language

Content metadata

Subject area
Genre
Abstract
Over the years, many attempts have been made to combine static and dynamic analysis results. Some were good, others were bad; the fact remains that the two approaches are still mostly separate, as most analysis tools focus on only one of them. For many years, this lack of integration, and the mental passing of data between static and dynamic tools, has caused a lot of frustration among researchers. This was the main motivation for creating DIE. DIE is a new Hex-Rays IDA plugin that crosses the static-dynamic gap directly in the native IDA GUI. It gives the researcher access to runtime values from within his standard disassembler screen. Unlike previous projects with similar goals, DIE takes a different approach by using an extensive plugin framework that allows the community to constantly add logic in order to better analyze and optimize the retrieved runtime values. With a click of a button, everything is accessible to the researcher: he can inspect handles passed to a function, analyze injected code or runtime strings, enumerate dynamic structures, follow indirect function calls and more (and the list keeps growing). All of this happens without the researcher ever leaving his comfortable disassembler screen. Even better, as DIE is tightly coupled with IDA, it supports basically any architecture, data type or signature supported by IDA. DIE currently has a small but well-respected community of contributors. Since the alpha version, DIE users have been able to cut their research time by 20%-40%. As complex reverse engineering tasks may take several weeks or even several months to complete, DIE has already proved to be a valuable resource and a prominent part of the researcher's toolkit. My talk introduces DIE to the research community for the very first time. I explain the basic idea behind DIE, describe its architecture, and show live examples of how to use its extensive plugin framework to speed up the research process.
The talk includes *live examples* which have been carefully selected from real research projects in various security fields and demonstrate how DIE can be used to speed up bypassing software protections, unpack malware, and super-quickly locate malware de-obfuscation functions.
Transcript: English (automatically generated)
Hi guys, so my presentation is called 1,001 ways to DIE, and just a bit about me: my name is Yaniv, and I work as a security researcher at Check Point Software Technologies. In my day job I mainly do malware research and vulnerability research, which practically means that I spend most of my days staring at assembly code, binary code, hex dumps and all kinds of other fun things like that. I haven't quite found any to-dos inside a binary file yet, but I have something to look forward to anyway. During all the years I've been doing reverse engineering, there's one problem that keeps popping up again and again, and it really irritates me.
I have to deal with it at least tens, maybe even hundreds of times a day, and I'm sure you all share the same problem and run into it as well. The problem is this: when we do reverse engineering, we basically have two main methods, right? We have static analysis, and for static analysis we just open our binary file in some kind of disassembler, whether it's IDA, Hopper, radare or whatever you prefer, and we actually have a lot of information in there. Those tools have evolved really well over the years.

Personally, if I could, I would never leave my IDA screen; I would do my entire reverse engineering session inside IDA, because I feel so comfortable with it. However, that's very rarely the case. Most of the time we have to resort to the help of debuggers, because static analysis tools have an inherent problem: they are static, and they lack all the dynamic data. How many times have you faced this? You're doing some static analysis, you get to some function call, and you just wish you knew the value of the argument passed to this function. Theoretically you could trace it back yourself, but that would take a lot of time, and it would be much easier to just look at the value in a debugger. So you go to another screen, open a debugger, load the binary, place the right breakpoints, hit those breakpoints and watch the raw values, and then all of a sudden you realize you're looking at some kind of struct. So you go back to IDA and define this struct, then back to the debugger, and a lot of information goes back and forth between the static tools and the dynamic tools. All of it is being passed mentally, through your head, and that is very error prone.
As smart and as good as we think we are, we can make mistakes here, and those mistakes may only be revealed much later in the reversing session, with catastrophic results. The other thing is that it takes a lot of time and it's really frustrating. For me, it takes all the fun out of reverse engineering, because this is exactly the kind of data I would like to have right inside my IDA screen. So it seems obvious to me that there should be some kind of solution that bridges this gap and brings all the dynamic data right into our IDA screens.

Now, this is not a new problem, and I'm not the first one who has come up with it or with some kind of solution. There have been numerous solutions in the past, and I like to divide them into two main categories. The first category is standalone tools, and those tools do a great job, but since they are standalone, I have to leave my disassembler screen to use them, and then again I have to pass the data mentally back and forth. So although they are really cool, they just don't solve this specific problem; to me they are more like next-generation debuggers. The tools I'm talking about are IDA plugins that know how to bring dynamic data right inside your IDA screen.
We're just going to briefly go over two of the best solutions I could find and discuss the advantages and disadvantages of each. The first one is called ida-splode. It was developed in 2014 by Zach Riggle, and for its dynamic framework it uses Intel Pin, which, if you don't know it, is a binary instrumentation framework. The way it works is really cool: you run your binary inside Pin with a provided Pintool, you get a trace file, and you later load that trace into IDA, where ida-splode takes all this data and presents it to you. That's really nice.

It has very extensive tracing, since it uses Pin, and Pin lets you place callback functions at a lot of interesting places during execution, so you can see all memory reads, function calls, function returns and branch instructions. Another thing it gives you is branch statistics: whenever you stand on a jump instruction, you see how many times the branch was taken and how many times it wasn't. That's really insightful and really helps you out.

However, there are some cons. One is that all the data, as you've seen, is stored inside your IDB file. Theoretically this could be fixed, but it isn't fixed yet. Personally, I don't like other people touching my IDB file, let alone other plugins, and when so much data is stored as comments inside my IDB, I can't find my way around it anymore, so I just throw the plugin away immediately. Another con is that it only works on Intel-based architectures, since Pin is designed by Intel. And as far as I could see, the tool is designed for Windows, so if you want to work on Linux or some other OS, you would need to modify it accordingly; that's not currently supported.
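The branch statistics mentioned above boil down to counting, for each conditional branch, how often it was taken and how often it fell through. A minimal sketch, assuming a hypothetical trace made of `(branch_address, taken)` pairs; this is not ida-splode's actual trace format or code:

```python
from collections import Counter, defaultdict


def branch_statistics(trace):
    """Count how often each conditional branch was taken and not taken.

    `trace` is an iterable of (branch_address, taken) pairs. This record
    format is hypothetical; a real Pintool would define its own.
    """
    stats = defaultdict(Counter)
    for address, taken in trace:
        stats[address]["taken" if taken else "not_taken"] += 1
    return {address: dict(counts) for address, counts in stats.items()}


# Hypothetical trace: the branch at 0x401010 ran three times, 0x401050 once.
trace = [(0x401010, True), (0x401010, True), (0x401010, False), (0x401050, False)]
for address, counts in sorted(branch_statistics(trace).items()):
    print(f"{address:#x}: taken={counts.get('taken', 0)} "
          f"not_taken={counts.get('not_taken', 0)}")
```

A plugin would then attach these counts to the jump instruction at each address, which is the per-branch view described above.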
The other tool was actually presented here last year, if I'm not mistaken. It's called funcap, and it was written in 2013 by a guy whose name I'm not sure how to pronounce, Andrzej Dereszowski; I hope I got it right, sorry if I didn't. For its dynamic framework it uses the IDA debugging API, and that's really cool, because you never have to leave your IDA screen: you just run a debugger, and whichever debugger IDA supports is also supported by funcap. What it does is place breakpoints, using the debugger, on function calls and function returns, and when those breakpoints are hit it grabs the context and tries to read the argument values there.

So that's what it looks like. It's a very intuitive, really simple solution. It's not as extensive as the other one, because it only traces function calls and function returns, but I think that just seeing function calls and returns already gives you great insight into what the binary is doing, and it's a good optimization over full tracing, because if we actually wanted to trace everything it would take forever. It also takes a step further: it looks at those arguments to see whether they might be ASCII or Unicode strings, and if they are, it places the strings inside your IDB. So all of a sudden, inside your IDB, you see all kinds of registry keys and IP addresses, and that's a really nice thing to have.

The cons are, first, that it doesn't follow new threads. If your executable creates a new thread, funcap is completely blind to it, and that can be a real problem, since almost every binary we work with does that. Second, argument offsets are calculated manually: it looks at the value of ESP or RSP, depending on the architecture, looks at how many arguments there should be, works out the slots in four-byte steps and says, okay, this should be the offset of the first argument, this is the second, this is the third. That works in a lot of cases, but of course it also fails in a lot of cases, and it doesn't seem like the right thing to do, because all of this data is already stored inside IDA. So why do I have to calculate it manually?
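For comparison, the manual offset calculation described above amounts to something like the following sketch. It assumes a 32-bit stack-based calling convention (cdecl/stdcall) with word-sized arguments and the return address at `[esp]` on function entry; the function name is hypothetical and this is not funcap's code:

```python
def naive_stack_arg_addresses(sp, arg_count, word_size=4):
    """Guess argument addresses the 'manual' way at function entry.

    Assumes every argument is word-sized and passed on the stack, with the
    return address at [sp]. This holds for 32-bit cdecl/stdcall, but not for
    register-based conventions such as __fastcall or the x64 ABIs.
    """
    # Skip the return address, then step one word per argument.
    return [sp + word_size * (i + 1) for i in range(arg_count)]


addrs = naive_stack_arg_addresses(0x0019FF40, 3)
print([hex(a) for a in addrs])  # ['0x19ff44', '0x19ff48', '0x19ff4c']
```

Under __fastcall the first two arguments travel in ECX and EDX, and under the Microsoft x64 convention the first four integer arguments travel in RCX, RDX, R8 and R9, so the same arithmetic points at the wrong slots. Reading the argument locations IDA already knows avoids that guesswork.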
That makes it not really usable. So I sat down and thought: these are obviously great tools, so why aren't a lot of people using them on a day-to-day basis? They really are there to solve a very important problem. I tried to think of the reasons, and the first one I came up with is that, as you've seen, all the data they bring from the debugger, or whatever dynamic framework they use, is stored inside the IDB. In funcap's case it's also stored in text files, but it's still not indexed, and I can't do any cross-references between values. That matters, because there's a whole layer of logic I could apply to those raw values, and with these tools I simply can't, since all I have is text search, which is really not enough. Another reason is that the entry level for adding any custom functionality is relatively high.
Look at funcap: it does string parsing, but if I wanted any other kind of parsing, that would be really difficult. I would need to go over the code and do it myself, which might take hours, days or even weeks, because the tool wasn't really designed for that. Everybody has their own parsing requirements, and it seems reasonable that there should be something that lets me add mine. Now, the most important point I could think of is that neither of these tools, nor any other tool I could find, has any notion of value types. What I mean by that: funcap places breakpoints on function calls and then brings you the raw argument value. But if this value is, for example, a struct, what do I get? I only get the first member of the struct, and that is completely unusable to me, because I actually want to see the entire struct. So I have to open my debugger again to see the struct, and then why did I need this plugin in the first place?
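To make the value-type problem concrete, here is a small illustration with Python's `struct` module. The structure and its values are made up; the point is only the difference between reading one raw word at the argument's location and decoding the whole type:

```python
import struct

# Hypothetical 32-bit structure passed by value:
#   struct Request { uint32_t pid; uint32_t flags; uint32_t size; };
# laid out little-endian in the debuggee's memory.
memory = struct.pack("<III", 1234, 0x10, 64)

# What a tracer that records a single raw word per argument effectively shows:
first_word = struct.unpack_from("<I", memory, 0)[0]

# What the researcher actually wants: every member, decoded with its type.
pid, flags, size = struct.unpack_from("<III", memory, 0)

print(first_word)             # 1234
print(pid, hex(flags), size)  # 1234 0x10 64
```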
In my opinion, at least, these types are the whole point of building this kind of plugin. So, I hope you're prepared to DIE. DIE is the plugin I designed to solve this problem, and it's called DIE because it's a Dynamic IDA Enrichment framework. It takes a very similar approach to funcap: it collects context from function calls and function returns, and once those raw values come back, I try to parse them into something human-readable. That's the best definition I can find for it. The other thing is that I really tried to design a smart interaction between static and dynamic data, so you can move from static data to dynamic data and back, over and over again, and that turns out to be very useful. The last point is that I wanted to use as much IDA API magic as I could.
I don't want to calculate anything myself: not the arguments, not their types, not their positions, nothing. I want all the data stored in IDA to be available to me and used inside my tool. There are some implementation challenges here, and we'll briefly go over them. The first one: okay, you want to collect function arguments and types and everything, but how exactly do you do it?
The IDA API is not exactly the easiest thing to read or use. Next, once we've managed to get those raw values, how do we parse them? Which parsers should we write? What's important and what isn't? And another point: let's say we can somehow deal with strings, booleans and other easy types like that, but what do we do when we encounter structs, unions, references, or whatever other complex types we have? Let's quickly answer those one by one. For function arguments, I had hours of fun with the IDA API; I really recommend it to everyone. Not really. Anyway, digging into it, I found an object called tinfo_t, which is a type-info object, and it holds a ridiculous amount of information about types. This is just a small screenshot of all its functionality, and it's really cool.
Digging even deeper into this tinfo_t object, I found the objects func_type_data_t, funcarg_t and argloc_t, which give me all the data about function arguments. I have absolutely everything in here: IDA tells me how many arguments a function has, and for a specific argument it tells me whether it was passed on the stack and, if so, at what offset, or whether it was passed in a register and, if so, in which register. It tells me the type and the name of the argument, and it also tells me whether the argument's type is guessed.
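The per-argument information listed above (location, type, name, guessed flag) is all a tracer needs to read raw values at a breakpoint. The sketch below models it outside IDA: `ArgInfo` and `read_raw_argument` are hypothetical names, not IDA's API (in IDAPython the corresponding data lives in func_type_data_t, funcarg_t and argloc_t), and the register and stack context is simulated:

```python
import struct
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ArgInfo:
    """Hypothetical per-argument record: name, type, location, guessed flag."""
    name: str
    type_name: str
    stack_offset: Optional[int] = None  # offset into the stack snapshot below
    register: Optional[str] = None      # register name if passed in a register
    guessed: bool = False               # True when IDA could not determine the type


def read_raw_argument(arg: ArgInfo, registers: Dict[str, int],
                      stack: bytes, word_size: int = 4) -> int:
    """Fetch an argument's raw value from a captured (simulated) context."""
    if arg.register is not None:
        return registers[arg.register]
    if arg.stack_offset is not None:
        fmt = "<I" if word_size == 4 else "<Q"
        return struct.unpack_from(fmt, stack, arg.stack_offset)[0]
    raise ValueError(f"no location recorded for argument {arg.name!r}")


# The stack snapshot is taken at function entry, so offset 0 holds the
# return address and the first stack argument sits at offset 4.
args: List[ArgInfo] = [
    ArgInfo("this", "Widget *", register="ecx"),
    ArgInfo("count", "int", stack_offset=4),
]
registers = {"ecx": 0x00403000}
stack = struct.pack("<II", 0x00401234, 1)
values = [read_raw_argument(a, registers, stack) for a in args]
print([hex(v) for v in values])  # ['0x403000', '0x1']
```

The example follows the MSVC __thiscall convention, where `this` arrives in ECX and the remaining arguments on the stack. The `guessed` flag is carried along untouched so that the value parsers can treat such arguments differently.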
If IDA doesn't know the type, it simply raises a flag, and that's really cool, because I can use it inside my plugin. The next question is: now that I have the raw values, how do I parse them? I have my own parsers, the ones I need for my work, but it's impossible to think of every scenario. So it makes much more sense to build a plugin framework where anyone can write their own plugin, plug it into the framework and start parsing their own values. It works something like this: you write your parsing logic here, and your value parser gets two input arguments, the argument type and the actual value. Remember that this runs at runtime, so the parser has actual access to all of memory at that point of execution. The output is a human-readable value (you decide what counts as human-readable) and some kind of score; I'll get to the score in a second.
Let's look at an example. Say we have a boolean value: IDA tells us it's a boolean, and the raw value is one. Then we just output "true" with a score of ten, because we're sure this is a boolean. But if IDA doesn't know the type, if it's guessed, we can still say: hey, this is one, so if it were a boolean, the value would be "true"; however, I'm guessing, so the score is a bit lower. You can set the scores however you like; it depends on your logic. That's basically value parsers. Now for complex data types: value parsers work really well for strings or boolean values, but what do we do when we get to structs, unions and references? That's pretty simple: we just break them up until we get to the simple types. For structs and unions we can use the same tinfo_t object I showed you before to enumerate all their members, and the same goes for arrays and for references.
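A minimal sketch of the "break it up until we reach simple types" idea, outside IDA. The `SimpleType`, `StructType`, `ArrayType` and `PointerType` classes are hypothetical stand-ins for the member, element and pointed-to type information that tinfo_t provides; memory is simulated as a flat byte buffer, and pointers are assumed to be 32-bit. Each leaf it yields is what would then be handed to the value parsers:

```python
import struct
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union


@dataclass
class SimpleType:
    name: str  # e.g. "uint32"
    fmt: str   # struct-module format used to read it, e.g. "<I"


@dataclass
class StructType:
    members: List[Tuple[str, int, "TypeNode"]]  # (member name, offset, member type)


@dataclass
class ArrayType:
    element: "TypeNode"
    count: int
    stride: int  # size of one element in bytes


@dataclass
class PointerType:
    target: "TypeNode"  # pointers are assumed to be 32-bit little-endian


TypeNode = Union[SimpleType, StructType, ArrayType, PointerType]


def flatten(t: TypeNode, memory: bytes, address: int,
            path: str = "arg") -> Iterator[Tuple[str, str, int]]:
    """Recursively break a value into (path, simple type name, raw value) leaves."""
    if isinstance(t, SimpleType):
        yield path, t.name, struct.unpack_from(t.fmt, memory, address)[0]
    elif isinstance(t, StructType):  # a union would list all members at offset 0
        for name, offset, member_type in t.members:
            yield from flatten(member_type, memory, address + offset, f"{path}.{name}")
    elif isinstance(t, ArrayType):
        for i in range(t.count):
            yield from flatten(t.element, memory, address + i * t.stride, f"{path}[{i}]")
    elif isinstance(t, PointerType):
        target = struct.unpack_from("<I", memory, address)[0]  # dereference
        yield from flatten(t.target, memory, target, f"(*{path})")


u32 = SimpleType("uint32", "<I")
u8 = SimpleType("uint8", "<B")
point = StructType([("x", 0, u32), ("y", 4, u32)])
shape = StructType([
    ("id", 0, u32),
    ("corners", 4, ArrayType(point, count=2, stride=8)),
    ("tag", 20, PointerType(u8)),
])

memory = bytearray(32)                            # simulated flat memory
struct.pack_into("<I", memory, 0, 7)              # shape.id
struct.pack_into("<IIII", memory, 4, 1, 2, 3, 4)  # shape.corners
struct.pack_into("<I", memory, 20, 24)            # shape.tag points to address 24
memory[24] = 0x41

for leaf in flatten(shape, bytes(memory), 0):
    print(leaf)
```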
We actually have a function that deals with all the dereferencing, so we don't have to do anything ourselves; everything is already done within the IDA API. That's cool.

I prepared this illustration to show you exactly how DIE works. This is our IDA, and inside IDA we have the disassembler and the debugger, and on top of both we have the IDA API, which lets us access everything, as you've seen. Now we have DIE, which is a plugin for IDA. It has the DIE core, which holds all the logic, and two very important parts: one is the value parsers, the plugin framework I was talking about, and the other is the DIE DB, where all the parsed values are stored.

This is our example code. It's very simple: there are two function calls, one to func1 and the other to an unknown function. func1 gets two input arguments, a string with the value "str1" and the value 1. The unknown function gets just one input argument, again a string with the value "str1". The difference between the functions is that for func1, IDA knows all the types (it has the symbols or derived them somehow), while about the unknown function IDA knows nothing.

Let's see how it works. When you first start DIE, the disassembler looks for call instructions inside the function you're currently standing on, we place breakpoints on those calls, and then we run the debugger. The debugger reaches the first call, we issue a step-in command, and we step into the function. Once we've stepped in, we start collecting the function arguments. Using the disassembler we take the first argument, which is char *a, and pass it to the parser plugin framework. Since we know the type, and the parsers know the type, the string parser is picked and used on this value; its output is "str1", and "str1" is stored in the database. The second argument is an integer, and it too is passed to the value parsers. However, we don't have a parser for integers, so we simply store the raw value in the DIE DB. After that, there's one more thing we do: walk through this function again looking for call instructions.

Now we continue debugging and get to strcmp. strcmp is a library function, and with library functions you can either keep doing this again and again inside them (on Windows you'd probably end up somewhere like ntdll) or not; it's configurable, and you choose. If you don't, DIE just goes one step into strcmp, gives you all the arguments it got, and steps out. Either way, let's say we got out of strcmp, continued execution and stepped out of func1. Once we're back in the main function, there's one more thing to do, which is to parse the return value. Again, we know it's a boolean, so we pass it to the value parsers and enter "true" into the DIE DB.

Now we get to the unknown function. We step into it, and in this case you see the arguments shown as plain int; that's IDA's way of telling us it doesn't know what the types are. But we have a better way to tell, because we look at the tinfo_t object and actually see the flag. So when we pass this argument to the value parsers, they know it's a guessed type.
So they have no way of picking the correct plugin; instead they try all the plugins, and each plugin that can make a guess about the value does so. In this case we first store the raw value in the DIE DB, and then the string parser says: hey, that could be a string, so let's store that too. Now it's up to you to choose which one is correct: you look at the function call, decide this can't be a raw value, it must be a string, and set the type to string. That's basically how DIE works.

Now let's talk a bit about the value parsers. DIE will only be as good as its value parsers.
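The dispatch described in this walkthrough (a known type goes to the matching parser; a guessed type keeps its raw value and lets every parser try; a known type with no parser falls back to the raw value) can be sketched as follows. Everything here is hypothetical: the function names, the `(text, score)` tuple, the 0 to 10 score scale and the lower scores for guessed types are illustrative choices, not DIE's actual plugin API, and memory is simulated as a dictionary:

```python
from typing import Callable, Dict, List, Optional, Tuple

Memory = Dict[int, bytes]           # simulated memory: address -> raw bytes there
Result = Optional[Tuple[str, int]]  # (human-readable value, score) or None


def bool_parser(raw: int, memory: Memory, guessed: bool) -> Result:
    if raw not in (0, 1):
        return None
    # Full score when IDA supplied the type, a lower one when we are guessing.
    return ("true" if raw else "false"), (5 if guessed else 10)


def string_parser(raw: int, memory: Memory, guessed: bool) -> Result:
    data = memory.get(raw)
    if data is None or b"\0" not in data:
        return None
    text = data.split(b"\0", 1)[0]
    if not text or not all(0x20 <= c < 0x7F for c in text):
        return None
    return text.decode("ascii"), (7 if guessed else 10)


Parser = Callable[[int, Memory, bool], Result]
PARSERS_BY_TYPE: Dict[str, List[Parser]] = {"bool": [bool_parser], "char *": [string_parser]}
ALL_PARSERS: List[Parser] = [bool_parser, string_parser]


def parse_value(type_name: str, raw: int, memory: Memory,
                guessed: bool) -> List[Tuple[str, int]]:
    """Return the interpretations to store for one argument or return value."""
    if guessed:
        # Unknown type: keep the raw value and let every parser try its luck.
        results, candidates = [(hex(raw), 0)], ALL_PARSERS
    else:
        results, candidates = [], PARSERS_BY_TYPE.get(type_name, [])
    for parser in candidates:
        result = parser(raw, memory, guessed)
        if result is not None:
            results.append(result)
    # Known type without a matching parser (e.g. int): fall back to the raw value.
    return results or [(hex(raw), 0)]


memory = {0x403000: b"str1\0"}
print(parse_value("char *", 0x403000, memory, guessed=False))  # [('str1', 10)]
print(parse_value("int", 1, memory, guessed=False))            # [('0x1', 0)]
print(parse_value("bool", 1, memory, guessed=False))           # [('true', 10)]
print(parse_value("int", 0x403000, memory, guessed=True))      # [('0x403000', 0), ('str1', 7)]
```

Keeping the raw value alongside every guess mirrors the walkthrough: the researcher, not the parser, makes the final call and sets the type.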
The more value parsers there are, the more values get parsed, and the better it gets. What you have to understand is that what I've done until now, together with the small community that uses DIE, is write parsers for our specific needs: whenever we came across something we needed to parse, we wrote a parser for it. So let me show you what we have so far. Let's start with the simple value parsers. There's the string parser, which you've seen before; here we actually use IDC's GetString, which helps us parse ASCII strings, Unicode strings, Pascal strings and a whole bunch of other string types that I really don't know anything about, but we do parse them.
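Inside IDA, DIE delegates this work to IDC's GetString. As a rough, IDA-free stand-in, the heuristic below tries the three encodings named above on a raw memory buffer; the minimum length, the printable-text rule and the one-byte Pascal length prefix are assumptions of this sketch:

```python
from typing import Optional, Tuple


def _is_ascii_text(data: bytes) -> bool:
    return bool(data) and all(0x20 <= b < 0x7F for b in data)


def _utf16_text(data: bytes) -> Optional[str]:
    try:
        text = data.decode("utf-16-le")
    except UnicodeDecodeError:
        return None
    return text if text.isprintable() else None


def detect_string(buf: bytes, min_len: int = 3) -> Optional[Tuple[str, str]]:
    """Return (kind, text) for the first encoding that fits buf, else None."""
    # 1. NUL-terminated ASCII (a C string).
    end = buf.find(b"\0")
    if end >= min_len and _is_ascii_text(buf[:end]):
        return "c", buf[:end].decode("ascii")
    # 2. NUL-terminated UTF-16LE (a wide string): stop at the first
    #    2-byte-aligned terminator and decode what precedes it.
    for i in range(0, len(buf) - 1, 2):
        if buf[i:i + 2] == b"\0\0":
            if i // 2 >= min_len:
                text = _utf16_text(buf[:i])
                if text:
                    return "utf-16", text
            break
    # 3. Pascal string: a length byte followed by that many ASCII characters.
    if buf and min_len <= buf[0] < len(buf) and _is_ascii_text(buf[1:1 + buf[0]]):
        return "pascal", buf[1:1 + buf[0]].decode("ascii")
    return None


print(detect_string(b"str1\0junk"))                               # ('c', 'str1')
print(detect_string("kernel32".encode("utf-16-le") + b"\0\0"))    # ('utf-16', 'kernel32')
print(detect_string(bytes([4]) + b"test"))                        # ('pascal', 'test')
print(detect_string(b"\x01\x02\x03\x00"))                         # None
```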
There's the boolean parser, as you've seen: one is true, zero is false, pretty easy. Another one is the function parser: when we get a raw value, this parser checks whether it points to the beginning of a function, and if it does, it gives you the function name. So when a function is passed as an argument to another function, instead of the raw value we actually see the function name, which is really useful. Then there's the module parser, which does the same thing for modules: if kernel32, for example, is passed as an argument to a function, we see the string kernel32 and know it's a module, which makes things really easy to work with.
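Both parsers are essentially exact-match lookups of the raw value against known start addresses. A minimal sketch with hypothetical lookup tables; inside IDA the function table would come from the database (for example via `ida_funcs.get_func`) and the module table from the debugger's module list:

```python
from typing import Dict, Optional, Tuple

Result = Optional[Tuple[str, int]]  # (human-readable value, score)


def function_parser(raw: int, function_starts: Dict[int, str]) -> Result:
    """Name the function if raw is exactly the start address of a known function."""
    name = function_starts.get(raw)
    return (name, 10) if name is not None else None


def module_parser(raw: int, module_bases: Dict[int, str]) -> Result:
    """Name the module if raw is a module base address (on Windows, an HMODULE)."""
    name = module_bases.get(raw)
    return (name, 10) if name is not None else None


# Hypothetical lookup tables for illustration.
function_starts = {0x401000: "sub_401000", 0x401200: "WorkerThread"}
module_bases = {0x76000000: "kernel32.dll"}

print(function_parser(0x401200, function_starts))  # ('WorkerThread', 10)
print(function_parser(0x401204, function_starts))  # None
print(module_parser(0x76000000, module_bases))     # ('kernel32.dll', 10)
```

Matching only exact start addresses is deliberate: a value pointing into the middle of a function is more likely data or a return address than a function pointer.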
A bit more advanced is the handle parser. We wrote this parser for Windows handles. It currently works only with the local debugger, although it could theoretically work with remote ones too. It takes the raw value and uses DuplicateHandle to try to duplicate it; if that succeeds, it uses NtQueryObject to get the handle's type and name. Think about it: it's really cool.
I can actually see all the runtime handles, so when a function is passed a handle, I can see its type and its name. That's super useful. Another good example of a parser is std::string. I came across a malware sample that uses std::string (I don't know why; it's the first one I've seen), and I wasn't very familiar with the implementation of std::string, which is basic_string. For those of you who are also unfamiliar with it, this is roughly how basic_string works: at offset 4 of this complex object there's a union that holds three members, among them a buffer and a pointer, and the string is either stored directly in the buffer or stored elsewhere with the pointer referencing it. So offset 4 holds either the string itself or a pointer to it. Why is that? I'm not really sure if it's genius or stupid.
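The layout just described, and the parser logic the talk builds on it (string at offset 4? otherwise a pointer to a string? and, if offset 4 is a string, make sure offset 0 isn't text as well), can be sketched as a heuristic over simulated memory. The offset 4 comes from the talk and reflects the build the speaker looked at; other compilers and builds place the union elsewhere (libstdc++, for instance, keeps its data pointer at offset 0). The 16-byte inline buffer, 32-bit pointers and the printable-ASCII test are assumptions. Like any heuristic it can misfire, for example on a pointer whose bytes happen to look like a short string:

```python
import struct
from typing import Optional

UNION_OFFSET = 4  # offset of the buffer/pointer union, as described in the talk
INLINE_SIZE = 16  # assumed size of the inline (small-string) buffer
MAX_TEXT = 256    # how far to scan for a terminating NUL elsewhere


def read_c_string(mem: bytes, addr: int, max_len: int) -> Optional[str]:
    """Return a non-empty printable NUL-terminated ASCII string at addr, or None."""
    chunk = mem[addr:addr + max_len]
    end = chunk.find(b"\0")
    if end <= 0 or not all(0x20 <= b < 0x7F for b in chunk[:end]):
        return None
    return chunk[:end].decode("ascii")


def parse_basic_string(mem: bytes, obj: int) -> Optional[str]:
    """Heuristically decode a basic_string object located at address obj."""
    # Case 1: a short string stored inline in the union.
    inline = read_c_string(mem, obj + UNION_OFFSET, INLINE_SIZE)
    if inline is not None:
        # If offset 0 is text as well, we are probably looking at the middle
        # of some longer string rather than at a basic_string object.
        if read_c_string(mem, obj, MAX_TEXT) is not None:
            return None
        return inline
    # Case 2: the union holds a (32-bit, little-endian) pointer to the text.
    pointer = struct.unpack_from("<I", mem, obj + UNION_OFFSET)[0]
    return read_c_string(mem, pointer, MAX_TEXT)


mem = bytearray(0x200)                        # simulated flat memory
mem[0x14:0x1A] = b"hello\0"                   # object at 0x10: inline text
struct.pack_into("<I", mem, 0x44, 0x100)      # object at 0x40: pointer to 0x100
long_text = b"a string longer than sixteen bytes\0"
mem[0x100:0x100 + len(long_text)] = long_text
mem[0x180:0x18A] = b"key=value\0"             # plain text, not an object

print(parse_basic_string(bytes(mem), 0x10))   # hello
print(parse_basic_string(bytes(mem), 0x40))   # a string longer than sixteen bytes
print(parse_basic_string(bytes(mem), 0x180))  # None
```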
Anyway, that's how it works and where we're debugging this it really makes our life hard because whenever we get a basic string We have to go to this offset and check is this ring? Hey, is it a reference and it takes a lot of time So writing a parser for this is actually pretty easy. We don't need to know anything of almost anything about the implementation We just go to this offset for and see hey, it is a string cool
If not, is this a reference to a string cool, then we have a basic string One more thing we have to note is that you know If offset for is a string then we might also want to check that offset zero is not a string We're just looking at the substring turn But if we just implement this logic, then we have a parser for basic strings and all of a sudden we can see all
runtime executed basic strings so That's basically it and I'm kind of finished talking and I have three demos for you of die I chose these demos really really carefully and they are from you know different worlds in the reverse engineering So let's just go to the first one
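The handle and std::string parsers described above can be sketched in Python. This is a minimal, hypothetical sketch, not DIE's actual parser code: the handle parser takes the DuplicateHandle and NtQueryObject steps as injected callables so the decision logic runs without a Windows debuggee, and the string parser reads from a flat memory snapshot (`Memory`, a made-up helper) of a 32-bit target. The 4-byte offset of the buffer/pointer union is taken from the talk; in practice the std::string layout depends on the compiler version and build configuration.

```python
import struct
from typing import Callable, Optional, Tuple

# ---- Handle parser ------------------------------------------------------
# In a real plugin `duplicate` would wrap DuplicateHandle and `query` would
# wrap NtQueryObject; injecting them here is an assumption of this sketch.
def parse_handle(raw: int,
                 duplicate: Callable[[int], Optional[int]],
                 query: Callable[[int], Tuple[str, str]]) -> Optional[str]:
    local = duplicate(raw)            # None: not a valid handle in the target
    if local is None:
        return None
    type_name, object_name = query(local)
    return f"{type_name}: {object_name}" if object_name else type_name


# ---- basic_string parser ------------------------------------------------
PTR_SIZE = 4      # 32-bit target assumed
STR_OFFSET = 4    # offset of the buffer/pointer union, as described in the talk


class Memory:
    """A flat snapshot of debuggee memory starting at `base` (hypothetical)."""

    def __init__(self, base: int, data: bytes):
        self.base, self.data = base, data

    def read(self, addr: int, size: int) -> Optional[bytes]:
        off = addr - self.base
        if off < 0 or off + size > len(self.data):
            return None
        return self.data[off:off + size]

    def read_ptr(self, addr: int) -> Optional[int]:
        raw = self.read(addr, PTR_SIZE)
        return None if raw is None else struct.unpack("<I", raw)[0]


def c_string_at(mem: Memory, addr: int, max_len: int = 256) -> Optional[str]:
    """Printable, NUL-terminated, non-empty ASCII at addr, else None."""
    out = bytearray()
    for i in range(max_len):
        b = mem.read(addr + i, 1)
        if b is None or (b[0] != 0 and not 0x20 <= b[0] < 0x7F):
            return None
        if b[0] == 0:
            return out.decode("ascii") if out else None
        out += b
    return None   # no terminator within max_len


def parse_basic_string(mem: Memory, obj: int) -> Optional[str]:
    inline = c_string_at(mem, obj + STR_OFFSET)
    if inline is not None:
        # If offset 0 is text too, we are probably in the middle of a longer string.
        return None if c_string_at(mem, obj) is not None else inline
    ptr = mem.read_ptr(obj + STR_OFFSET)   # otherwise the union may hold a pointer
    return None if ptr is None else c_string_at(mem, ptr)
```

The parsers return None instead of raising when a value doesn't fit, so several parsers can be tried on the same raw value.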
So the first one actually comes from another research project that I'm working on. During this project I came across this application. It's an 810 firmware upgrade utility, and it has some kind of password protection, very stupid password protection, and we need to bypass it, and we need to do it quickly. The thing is that when you open this in IDA, it's really ugly C++ code.

I actually did it without DIE, and I'm pretty experienced, and it took me about 15 minutes to get to the point where I could bypass this protection. Fifteen minutes. Now look at how quickly I can do it with DIE. So,
let's pray to the demo gods that everything will work. Yeah, so this is the utility. It looks something like this. Can you see it well?
Yeah, that's the password. So I enter some kind of password here and hit OK, and yes, of course, it's an invalid password, and that's what we have to bypass. So let's load it up in IDA. Okay, cool, it's loaded, and now DIE is also loaded. The first thing you notice when you load IDA with the DIE plugin is this nice skull ASCII art, which is really one of the best features we have.

Then you will notice this little menu here, which is DIE. Open it up and you will see all kinds of options: loading a previously stored DIE DB, saving your DIE DB, settings, and about. We're quickly going to go over all of them. In the settings you actually see the option to step into system libraries, which I told you about before, and two other options that I'm going to show you in the next example. All of the others control what gets parsed while debugging: whether you want to parse arrays, containers (which are structs and unions), references, arguments, and so on.

The two values at the top of the screen are used for optimization. Maximal function calls means that when DIE encounters a function that has been executed more than this number of times, it will stop monitoring it. That's really useful in cases like UI code, where functions are executed thousands of times and you don't really care about the values, so you just exclude them this way. The other one is the maximal dereference depth, because we have to stop somewhere when dereferencing stuff; otherwise we would probably never stop.
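The two optimization settings can be sketched as a per-function call counter and a bounded pointer walk. The names `CallThrottle` and `deref_chain` are hypothetical, not DIE's implementation; `read_ptr` stands in for reading a pointer from debuggee memory and returns None when the value isn't a readable pointer.

```python
from collections import Counter
from typing import Callable, List, Optional


class CallThrottle:
    """Stop monitoring a function after it has been seen max_calls times."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.seen: Counter = Counter()

    def should_monitor(self, func_ea: int) -> bool:
        self.seen[func_ea] += 1
        return self.seen[func_ea] <= self.max_calls


def deref_chain(value: int, read_ptr: Callable[[int], Optional[int]],
                max_steps: int) -> List[int]:
    """Follow pointers from `value`, giving up after max_steps dereferences."""
    chain = [value]
    for _ in range(max_steps):
        nxt = read_ptr(chain[-1])
        if nxt is None:          # not a readable pointer: stop here
            break
        chain.append(nxt)
    return chain


# A self-referencing pointer would loop forever without the step limit.
print(deref_chain(0x10, {0x10: 0x10}.get, max_steps=3))  # [16, 16, 16, 16]
```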
So that's basically it, it's pretty simple. This is our program, and we have three options to start DIE. The first is to debug a custom scope, which means we can choose to debug, say, from this function to that function. We can actually choose functions, and that will be the only scope being debugged.

The second is to debug the entire code, which means that before we even start execution, we browse through the entire code and place breakpoints on each call instruction. That can be a bit slower. And the third is the one in the example I showed you, which is go from current location.
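Placing a breakpoint on every call instruction, either across the whole program or within a chosen scope, amounts to filtering the instruction stream for calls. In this sketch the instruction list is a plain `(address, mnemonic)` list standing in for walking the IDB (in IDAPython one would iterate with something like `idautils.Heads` and set breakpoints with `idc.add_bpt`), and the custom scope is simplified to an address range; both are assumptions.

```python
from typing import Iterable, List, Optional, Tuple

Insn = Tuple[int, str]   # (address, mnemonic): a stand-in for walking IDA's instructions


def call_sites(insns: Iterable[Insn],
               start: Optional[int] = None,
               end: Optional[int] = None) -> List[int]:
    """Addresses of call instructions, optionally limited to [start, end)."""
    return [ea for ea, mnem in insns
            if mnem == "call"
            and (start is None or ea >= start)
            and (end is None or ea < end)]


code = [(0x401000, "push"), (0x401001, "call"), (0x402000, "call"), (0x403000, "call")]
print([hex(ea) for ea in call_sites(code)])                      # ['0x401001', '0x402000', '0x403000']
print([hex(ea) for ea in call_sites(code, 0x401000, 0x403000)])  # ['0x401001', '0x402000']
```

Breaking on every call up front is what makes the whole-program mode slower than starting from the current location.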
So in this case we just want to start from WinMain, because we don't care about all the CRT stuff and everything that's going on before WinMain. So let's just stand on WinMain here and hit Alt+F, which is the shortcut, and it will start. This takes a while, but remember that it took me about 15 minutes before. So,
it's loading up, hopefully. Okay, cool, we got this screen. This is the password prompt, so let's just enter any password, let's say "recon 2015". It's a bit slow, because don't forget it's now actually tracing everything in the background. So that's it, "recon 2015". Let's hit OK here and see what happens. Maybe it's the password? Yeah, no, it's not.

So that's it: now the application stops, I'm thrown out, and everything returns to normal. DIE has two main views: one is the function view and the other is the value view.
So let's start by looking at the function view. The function view gives you this nice screen showing all the functions executed at runtime. All of these functions have actually been executed, and this is the number of times they've actually been seen. Also, if you have multiple threads, you can see them here. In this case we just have a single thread, but you can split the view by thread or see all the threads. So let's take a look at the first one. The first one is an MFC Afx function, and it's been called right here.

This is the point in the code where it's been called. So if we double-click that, we can see that, hey, that's where it's been called. Cool. When we open it, we see occurrence zero, because for each point in the code where there's a call instruction, the function could be called several times, for instance if it's inside some kind of loop.
For this instance we just have one occurrence, and when we open it up we can actually see... let me go here, sorry... we can actually see all the arguments passed to this function. This is a string, its name is lpString1, and these are the call values, the values that were present when the function was called. The other column shows all the values present when the function returned, so if something changed, you can see the change. That's pretty much it for this function.

Now, for this function we did know the symbols; as you see, we know all the names and everything. But take the other function right here: you see all the values are written as int, because we don't really know them, and DIE places this ugly question mark right next to them, which means that these are actually guessed. I mean, the value parsers actually guessed those values, and look, we can see everything. It makes sense that it guesses a string value, a boolean value, another string value, and the return value down there is TRUE, so it's a boolean. So it really helps us a lot.

Just a few more nice features before I go on. For instance, if you stand on this occurrence and right-click it, you can actually see a call graph showing you the exact path of execution that led to this function, which can come in really handy sometimes. Let's browse down a bit until we get to... where is it?
There it is: WSAStartup, a very well-known function. Let's open it up. We see this function has three elements: a WORD, an LPWSADATA, and then an int return value. This LPWSADATA is actually a complex data type, a pointer, and if we open it up, we dereference it and find that we hit a struct.

When we open that struct up, we actually see all the members of the struct and all their values. So we just took a look at all of this struct data at runtime, and that's really cool. That could be really, really helpful, in my opinion.
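Going back to the values marked with a question mark: when a function has no known prototype, the raw values have to be guessed. One plausible way to do that, shown here as a hypothetical sketch rather than DIE's actual mechanism, is to let each value parser return a confidence score and keep the best one, falling back to the raw integer when nothing fits. The scores and the dictionary standing in for debuggee memory are made up.

```python
from typing import Callable, Dict, List, Optional, Tuple

Guess = Optional[Tuple[int, str]]     # (confidence score, description) or None


def guess_bool(raw: int) -> Guess:
    return (5, "BOOL " + ("TRUE" if raw else "FALSE")) if raw in (0, 1) else None


def make_string_guesser(strings: Dict[int, str]) -> Callable[[int], Guess]:
    """`strings` stands in for reading a C string from debuggee memory."""
    def guess(raw: int) -> Guess:
        s = strings.get(raw)
        return (8, f'char* "{s}"') if s is not None else None
    return guess


def best_guess(raw: int, guessers: List[Callable[[int], Guess]]) -> str:
    candidates = [g for g in (f(raw) for f in guessers) if g is not None]
    if not candidates:
        return f"int {raw:#x}"              # nothing fits: show the raw value
    _, description = max(candidates)        # highest score wins
    return "? " + description               # "?" marks a guessed type, as in the UI
```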
Okay, so that was the function view. You have a couple more options. Show CFG will show you the entire CFG; it's totally clickable, so it helps if you want to go into one particular function. And there's another option to mark and unmark the execution flow,

so we can actually mark the functions that were executed, which can come in handy, and of course we can take the marking off if we want. The other view I've been talking about is called the value view. The value view gives you a totally different view of the execution: you don't see the functions, you see the parsed values.
So those are all the parsed values encountered during the execution, and you can filter them. For example, let's see all the modules: there's been only one module here, and you see its raw value was this one, okay. Instead of just showing you the raw value, we actually see the module name, so this can be really helpful. The same goes for functions; here, for example, we have a lot of functions. And the same goes for strings. Now, for strings: if you remember, it was quite a long time ago, but I entered "recon 2015" as the password. So let's see if we can find "recon 2015" here.

If you see it, tell me. There it is, so easy: "recon 2015". So we actually see that value, because we entered it at runtime. What we can do now is just double-click this value, and boom, we hit the function where that value was actually passed. Let's look at which function it is. Hey, it's strcmp. And what is the other argument being passed here? Well, it's "8 and 20150620", which might be the password, right? So let's just look at this. Oh, here's the strcmp, you see, and now I actually have the password.
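The value-view lookup used to find the password boils down to indexing parsed values by type and then grouping values by call site. The `ParsedValue` record and helper names below are hypothetical, and the sample password string is a placeholder.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ParsedValue:
    type_name: str   # e.g. "string", "module", "function"
    text: str        # the parsed representation
    function: str    # function that received the value
    call_ea: int     # call site, so sibling arguments can be grouped


def by_type(values: List[ParsedValue]) -> Dict[str, List[ParsedValue]]:
    index: Dict[str, List[ParsedValue]] = defaultdict(list)
    for v in values:
        index[v.type_name].append(v)
    return index


def find_string(values: List[ParsedValue], needle: str) -> List[ParsedValue]:
    return [v for v in by_type(values)["string"] if v.text == needle]


def sibling_args(values: List[ParsedValue], hit: ParsedValue) -> List[str]:
    """Other values passed at the same call site as `hit`."""
    return [v.text for v in values if v.call_ea == hit.call_ea and v is not hit]
```

Searching for the string typed at runtime leads to the call site, and the sibling argument at that call site is the value being compared against.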
So, do you remember it? Because I keep forgetting the password. What was it? "8 and 20150620". So let's just enter "8 and 20150620", and boom, we just bypassed the password protection.
It took me about two minutes instead of 15. Think about how useful this can be. So that was my first example. I have two more. No, no, back to the presentation.
The second one. Okay, so the second one is actually a feature I didn't talk to you about. I consider it a side feature of DIE. The thing is that when we're reversing C++ code, for example, we get a lot of indirect calls and vtables and all the other kinds of stuff that make our life really hard.

Now, when DIE works and steps into a function, it makes sense that we can look back at where we came from and ask: is there an xref from there? If not, we can just add it, because we know we just came from there. By doing just this, we resolve every indirect call that was executed, because each of them now has an xref to the place it actually called. Another thing we do is that when we step into a function, we check whether this function is actually defined, and if it's not, we try to define it. So whenever we have some kind of undefined blobs inside our IDB, simply by using DIE, without even using any value parsers,

we can get a more complete code analysis. Not complete, but more complete than it was before. So let's see how we do that. The application, as I said, or maybe didn't say, is

the 7-Zip CLI. I've been doing a bit of work with it, and it's somewhere over here. Yep. So let's open it up in IDA. Does it show the skull?
Okay, cool. Can you see this well? Okay. So here I've set up a few pointers to show you what's going on with this application. We have a couple of undefined functions and a couple of indirect calls. Just to show you: let's stand on this undefined function.

It really is a function, but it isn't defined, and we don't have any xref to it, nothing at all. We could define it ourselves, but let's not do that. We have a couple more of those here: another one, and another one. Cool. Now, we have a few indirect calls here.

Let's stand on this indirect call, for example, and see: we don't have any xrefs, of course. And another one here, let's stand here. Sorry. Yeah, again no xrefs, nothing, absolutely nothing. So we just want to quickly solve these issues.

So we stand here and go to the DIE settings. I told you before that I would talk about the two other features. One is new function analysis, which will try to define the function. The other is add xrefs if they are missing. So just by checking those two options and running DIE, let's see what happens.
So the debugger has now started; it's taking a while. Mm-hmm. Okay, cool, that's it, it's done. Now let's look at what happened.
So the undefined function: yeah, it's now defined, see, and all its xrefs are here. Everything is cool.
Let's look at the other ones: another undefined function, now defined, and another undefined function, also defined. Now, as for the indirect calls, let's go to the first indirect call. Where is it? Yeah, here it is. Let's stand on it and press Ctrl+J.

Hey, now we have an xref, an xref to this function. And when we stand on this function and follow the xref back, hey, we now have an xref to the indirect call, which is pretty useful, I think. The same goes for this indirect call: hey, here we go, an xref, cool. And the other one is

somewhere over here; I think it's this one. Yes, this one, cool. Yeah, pretty impressive, no? Thank you, thank you.
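The two options used in this demo, new function analysis and adding missing xrefs, can be modeled as a check performed on every step-into. This toy model keeps functions and xrefs in Python sets; in a real plugin the marked lines would call into IDA (for example `ida_funcs.add_func` and `ida_xref.add_cref` in recent IDAPython versions), which is an assumption about how one would wire it up, not a description of DIE's code.

```python
from typing import Dict, List, Set


class XrefRecovery:
    """What happens on each step-into, in a toy model of the IDB (hypothetical)."""

    def __init__(self, functions: Set[int], xrefs: Dict[int, Set[int]]):
        self.functions = functions   # addresses already defined as functions
        self.xrefs = xrefs           # target address -> call sites referencing it

    def on_step_into(self, call_site: int, target: int,
                     new_function_analysis: bool = True,
                     add_missing_xrefs: bool = True) -> List[str]:
        changes = []
        if new_function_analysis and target not in self.functions:
            self.functions.add(target)      # in IDAPython, e.g. ida_funcs.add_func(target)
            changes.append("function")
        if add_missing_xrefs and call_site not in self.xrefs.get(target, set()):
            self.xrefs.setdefault(target, set()).add(call_site)  # e.g. ida_xref.add_cref(...)
            changes.append("xref")
        return changes
```

Because the call site and the target are both known at the moment of stepping in, an indirect call needs no static resolution: the runtime transition itself is the evidence for the xref.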
Okay, so now this is where the demo gods really hit me, because I need my IDA setup to run this demo, and it seems it just crashed last night. So I do have a video of it, and I can try using another computer.
Shift, would you like to try and hook it up? I hope it works; if not, I'll show you the video. Anyway, just a few words about this. Well, you didn't think I'd let you off without analyzing some Trojan or malware, did you? So this is going to be...

yeah, take it off... So this is going to be the Explosive Trojan, a Trojan from a malware campaign that we've just recently uncovered. The thing here is that once we have the DIE DB, that's not the end of it. There's a whole lot of logic we can implement on the database itself, so we can just...

It works! I can't believe it, it didn't work before. Cool, cool. Okay, so there's a whole lot of logic we can implement on the database itself; just think about the uses in automated tools and stuff like that. Now, let me see what I did here. I don't know; let's start it again.
Wow, that's really small. Okay. So I just have the IDB file here, and I'm not going to run the malware on my friend's machine. So what I'm going to do here... this is really small, huh?
So I'm going to show you how I load a previously stored DIE DB. I just go here, choose load DIE DB, and pick my DIE DB file, which is named accordingly. It takes a while, and then everything is loaded.
Can you guys in the back see the output window? Do you see it well enough? No? I'll try to zoom in as much as I can. So, was it loaded? Yeah, okay. Let me just move it up a bit.
Okay. So that's what you see when you load the DIE DB: it says 449 functions were loaded, along with the parsed values, the threads, and so on. Like almost every piece of malware, this one has some kind of deobfuscation function, and our mission here is to find that deobfuscation function

using nothing but DIE, not even a bit of static analysis. So let's see how we do that. We've previously run DIE and just loaded the DIE DB, and we do have an API that lets you access all the DIE DB data.

That's really, really small. So what I do now is load up this API; here it's imported into IDA. Next, I give it a command to load all the functions stored inside the DIE DB. So how many functions do I have?
Let's take a look: len of functions. As it said before, we now have 449 executed functions, which is far too many. Since we're looking for the deobfuscation function, we want to rule all the library functions out of this list, because obviously the deobfuscation function won't be a library function. So let's just...

where's my cursor? Okay, let's take those out. Now I give it a command, and now I have the non-library functions, which is the same data set without the library functions. Again, len of functions: now I have only 158 functions. Still too many. So

what I want to do now is keep only the functions that return strings, because the deobfuscation function must return a string, right? So, hitting

this... okay, now we have those functions, and we're down to only 17. Hey, that's great, right? But let's filter a bit more and say: I want to see only the top 10 of these, the ones that were called the most, because the deobfuscation function will be called a lot during execution, right?
So now we have that filtered, and the last step is to sort them by the number of xrefs they have, because whether or not every call site was executed, the deobfuscation function must have a lot of xrefs coming from all over the code.
So now we have everything sorted here. Where is it? Okay, let's iterate through this and print it out. Cool. Now we're left with only these functions. One of them must be the deobfuscation function, at least we hope so.
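The filtering pipeline from the demo (drop library functions, keep functions that return strings, take the ten most-called ones, then sort by number of xrefs) can be written as a few list operations. The DIE DB API calls themselves aren't shown in the talk, so `FuncRecord` and `deobfuscator_candidates` are hypothetical stand-ins for the stored data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FuncRecord:
    """One executed function as a DIE-DB-like store might summarize it (hypothetical)."""
    name: str
    is_library: bool
    returns_string: bool
    call_count: int
    xref_count: int


def deobfuscator_candidates(funcs: List[FuncRecord], top_n: int = 10) -> List[FuncRecord]:
    no_lib = [f for f in funcs if not f.is_library]              # 449 -> 158 in the demo
    str_ret = [f for f in no_lib if f.returns_string]            # 158 -> 17 in the demo
    most_called = sorted(str_ret, key=lambda f: f.call_count, reverse=True)[:top_n]
    return sorted(most_called, key=lambda f: f.xref_count, reverse=True)
```

Each filter encodes one property a deobfuscation routine is expected to have, so the order of the steps only affects efficiency, except for the top-N cut, which has to come after the other filters.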
So let's take a look at the first one. I hope I can find it. Let's open up the function view here. Yeah, that's a problem. Put it up here. No, up here. Cool.
Whoa, this is really small. Even I can't see it. Just a second, I'll zoom in. Okay, so this is the function we found, the first one that was returned. You see it down here. Let's look at its occurrences and see what it actually returned. Let me open up just a few of those.
So yeah, this doesn't really look like a deobfuscation function, right? This question-mark value is the actual return value, and it can't be the deobfuscation function, because it just returns gibberish. So that's not it. If we actually go to this function's implementation in my actual

IDB file, we can see that this is a get-substring function, which matches all of the criteria I've just described, so it makes sense that it would come first. But let's look at the second one. Let me just load it up in the function view. Function view... oh, cool.
So it's supposed to be somewhere in here, the one ending in D97. Looking it up. Yeah, here it is. Cool. Let's open up some of the occurrences: one occurrence, two, three,

four, five. Can you see it? The inputs look like Base64, the wrong way round, and the outputs are readable strings: those are the deobfuscated strings.
So here we just found them. Something that would take me at least half an hour in a reversing session, I just did now, practically automatically. That's really, really useful when doing malware research, I think. And that's my third demo. I hope you liked it.
Let me just try to switch back to my presentation. Just a few more slides left.
Where is it? I just threw it down here. No...
Okay, so I got to do the demos; it was really fun. So, a few things that we still have to do in

DIE. One of them is handling thunk functions. Thunk functions actually break all our logic, because with a thunk function, when we step into it, we just have a series of instructions that end in some kind of jump instruction. So we never place the breakpoint at the call instruction, and DIE is completely blind to whatever is going on there.
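For the thunk functions IDA doesn't flag (it marks the ones it recognizes with the `FUNC_THUNK` function flag), one simple heuristic is to look for a body that ends in an unconditional `jmp` with no `call` or `ret` before it, and then follow the jump target. This is a hypothetical sketch operating on a `(mnemonic, operand)` list, not a fix DIE implements.

```python
from typing import List, Optional, Tuple

Insn = Tuple[str, str]   # (mnemonic, operand text) of a function body


def is_thunk(body: List[Insn]) -> bool:
    """Heuristic: ends in an unconditional jmp, with no call or ret before it."""
    if not body or body[-1][0] != "jmp":
        return False
    return all(mnem not in ("call", "ret") for mnem, _ in body[:-1])


def thunk_target(body: List[Insn]) -> Optional[str]:
    """Where the thunk really goes, i.e. the function whose call we'd want to trace."""
    return body[-1][1] if is_thunk(body) else None
```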
There are ways for IDA to recognize thunk functions, and it actually flags them, but it doesn't recognize all of them, so we have to find some kind of solution for that. Keep that in mind when using DIE. Another thing is complex function parsers. What we do now is just take the function arguments and parse them one by one,
but that's not necessarily enough. There are cases where we would like to take the value of one argument and use it as input for parsing another argument. For example, if we have some kind of size argument, we want to pass that size to the parser of another argument,

and that has to be implemented in some kind of complex function parser. The other thing is a better GUI. Well, I'm not really your genius GUI guy. I did the best I could, and I really struggled with this Qt thing, let me tell you, but it's working. If somebody is really good with Qt and wants to design a new GUI, please feel free to approach me and tell me about it.
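The complex function parser idea mentioned above, where one argument's value (a size) drives how another argument (a buffer) is parsed, might look like this. `parse_sized_buffer` and the dictionary-backed `read_memory` are hypothetical; the `WriteFile` parameter names are used only as a familiar example.

```python
from typing import Callable, Dict, Optional


def parse_sized_buffer(args: Dict[str, int],
                       read: Callable[[int, int], Optional[bytes]],
                       buf_arg: str, size_arg: str,
                       max_size: int = 4096) -> Optional[bytes]:
    """Use one argument's value (a size) to parse another (a buffer pointer)."""
    ptr, size = args.get(buf_arg), args.get(size_arg)
    if ptr is None or size is None or not 0 <= size <= max_size:
        return None          # missing argument or implausible size: don't guess
    return read(ptr, size)


# Stand-in for debuggee memory: one buffer at a made-up address.
memory = {0x2000: b"hello world"}


def read_memory(addr: int, n: int) -> Optional[bytes]:
    data = memory.get(addr)
    return None if data is None or n > len(data) else data[:n]


# e.g. WriteFile(hFile, lpBuffer, nNumberOfBytesToWrite, ...): read exactly
# nNumberOfBytesToWrite bytes from lpBuffer instead of guessing a terminator.
print(parse_sized_buffer({"lpBuffer": 0x2000, "nNumberOfBytesToWrite": 5},
                         read_memory, "lpBuffer", "nNumberOfBytesToWrite"))  # b'hello'
```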
Another thing is a much better database, because I'm also not your database guy, and I just did what I could. Another is to solve some very dramatic crashes that we're experiencing here and there. We actually have a case open with Hex-Rays about that, and we really hope we can fix it.
If anyone can replicate this issue or come up with good examples of crashes, please let me know, and we'll pass them on to Hex-Rays. Otherwise, you've seen that it's working; there are just crashes here and there. So it looks cool. Can you have it? Yes, you can.
DIE is a completely open-source tool, and you can find it on my GitHub page, which was just made public today. Write it down, write it down. And really, the most important point: if you like it, please contribute. Write your own parsers,
upload them to the GitHub page, and enrich the community. Just one more thing I have to say before I finish: Sark. This is a framework made by my colleague Tamir Bahar, and it makes life much easier when working with IDAPython. I recommend you all try it,

because IDAPython, and the IDA API in general, can be quite complex at times. So it's really cool; I really recommend it. That's it, guys. Any questions? Thank you.
Hey, very cool. Just one quick question on this. Where are you? I don't know if I should just come up there; it's kind of hectic at the moment. In my experience, IDA has a hard time with functions that pass arguments via registers, right? Like fastcall; it won't pick those up a lot of the time.

Since DIE does dynamic analysis, is it easy for it to find those and resolve them? No, I take all the values I can from IDA, so if IDA fucks up, DIE fucks up too. Thank you, that's it.
Hi, I have a question for you. I've seen your entire presentation, and I was wondering: when you did the last demo, did you run the tool in a virtual machine or not? Yes, I did. Okay. That means you have to install IDA in the infected test virtual machine, right?
You could use the IDA remote debugger. That could work; it just takes a lot of time. Usually I just like to install my own IDA on the infected machine, but it's not necessary at all. Okay, thank you. Any more? No? Thanks, guys.