Harnessing Intel Processor Trace on Windows for fuzzing and dynamic analysis

Speech transcript
Before beginning, a brief introduction of ourselves. I'm Andrea Allievi. I'm a security researcher at Microsoft, and I worked for two years for Talos. My specialties are Windows internals and low-level reverse engineering, and I have worked on products for several companies. I was the original designer of the first UEFI bootkit, in 2012, and of the first PatchGuard 8.1 bypass, presented in 2014, and I am one of the designers of the Windows PT driver that we are going to present.
And I'm Richard Johnson. I am the research technical lead for Cisco Talos. We have a team that does vulnerability research, and I help guide those efforts. You may have seen some of our vulnerabilities in the past year. We focus on technologies mostly for finding bugs. As was mentioned, my talks at REcon over the last couple of years have focused on high-performance fuzzing: engineering technologies that give us feedback-driven fuzzing, and optimizing the different parts of the process to make it as fast as possible. About two years ago I came across a new architectural feature in Intel CPUs called Intel Processor Trace. Basically it is a hardware-supported mechanism for doing code coverage. I made a prototype for this in 2015 that worked on Linux, where there was experimental support in some open source drivers. After evaluating it, I realized that this would be a great thing to bring to the Windows operating system, and that the Intel support for it was not going to be suitable for what we were looking for. Last year at REcon we had the very first version of the Windows driver working about an hour before we got on stage, and it was not yet doing any decoding. So we will talk about the last six months of development, which have brought all kinds of support to the driver.
Andrea is going to give you the introduction to the technical details of the driver, the low-level implementation and some demos, and then I'll address how it is applied to fuzzing and finding bugs.
OK, speaking about Processor Trace. Intel Processor Trace is a new feature of the latest Intel Skylake CPUs. It is very useful because it can trace whatever your CPU is going to execute, and it has some big benefits, especially for code coverage. For example, it helps if you would like to understand what a piece of software is doing, for malware analysis or for whatever else. Because we do not have a lot of time, I would like to be quite fast in describing how it is implemented in the CPU and concentrate on the new features of our driver.
Basically, going fast: Intel Processor Trace is discovered with the CPUID instruction, so you can do it even in user mode. You need to issue CPUID twice. The first query tells you whether the CPU supports Intel Processor Trace. The second one is needed to detect which features of Processor Trace are available, because different CPUs can implement different features. Processor Trace was implemented for the first time in the Broadwell architecture, but there it was limited. In Skylake there is full support and you can trace whatever you would like. Skylake CPUs have been available for some time now. OK.
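The two CPUID queries described above can be sketched as follows. Python cannot execute CPUID itself, so this sketch assumes the raw register values have already been obtained (for example from a small native helper); the function names are illustrative and are not taken from the driver. The bit positions follow the Intel SDM: leaf 07H (sub-leaf 0) EBX bit 25 reports Intel PT, and leaf 14H (sub-leaf 0) enumerates its capabilities.

```python
# Sketch: interpreting CPUID output for Intel Processor Trace.
# Register values are passed in as plain integers; names are illustrative.

def has_intel_pt(leaf7_ebx: int) -> bool:
    """CPUID.(EAX=07H,ECX=0):EBX bit 25 is set when Intel PT is present."""
    return bool((leaf7_ebx >> 25) & 1)


def pt_capabilities(leaf14_ebx: int, leaf14_ecx: int) -> dict:
    """Decode a few capability bits from CPUID.(EAX=14H,ECX=0)."""
    return {
        "cr3_filtering": bool(leaf14_ebx & (1 << 0)),
        "ip_filtering": bool(leaf14_ebx & (1 << 2)),
        "topa_output": bool(leaf14_ecx & (1 << 0)),
        "topa_multiple_entries": bool(leaf14_ecx & (1 << 1)),
        "single_range_output": bool(leaf14_ecx & (1 << 2)),
    }
```

A caller would check `has_intel_pt` first and only then look at the capability bits, mirroring the two-step detection described in the talk.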
Here I would like to show the code to detect Intel Processor Trace. As you can see it is quite easy and there is nothing special in it.
Let's speak about why you would use this. It is implemented entirely in hardware. One of the basic things we can say is that it is not detectable by software, I mean by user-mode software. Another important thing is that you can trace whatever you want, even an SMM handler, even hypervisor code. The only thing you cannot trace is SGX secure container code, because SGX by design should work in an isolated environment.
OK, let's go quite fast through how Processor Trace works. You can filter the trace in three different ways. The first is by current privilege level, meaning you can differentiate between kernel-mode software and user-mode software. The second filtering mode is by PML4 page table: you ask the processor to trace only when CR3 holds a given physical address, and in that way you can trace a single process. The last filtering mode is by instruction pointer: you set a start point and an end point and ask the processor to trace only that window of code.
The output logging is done directly to physical memory, which is why we need a driver to manage it. The logging can be implemented in two ways. The first is single range, which is circular, meaning the trace is always written to the same region of memory. The second type is the table of physical addresses, also known as ToPA.
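To make the three filtering modes concrete, here is a minimal sketch of how the control register could be composed. It only computes the values a kernel driver would write with WRMSR; it does not touch hardware, and `plan_msr_writes` is a hypothetical helper, not the driver's API. The MSR numbers and bit positions are the ones documented in the Intel SDM for IA32_RTIT_CTL and related registers.

```python
# Sketch: composing IA32_RTIT_CTL for privilege-level, CR3 and IP-range filtering.
IA32_RTIT_CTL = 0x570
IA32_RTIT_CR3_MATCH = 0x572
IA32_RTIT_ADDR0_A = 0x580
IA32_RTIT_ADDR0_B = 0x581

TRACE_EN = 1 << 0     # start tracing
OS = 1 << 2           # trace CPL 0
USER = 1 << 3         # trace CPL > 0
CR3_FILTER = 1 << 7   # trace only when CR3 matches IA32_RTIT_CR3_MATCH
TOPA = 1 << 8         # use table-of-physical-addresses output
BRANCH_EN = 1 << 13   # emit branch (TNT/TIP/FUP) packets
ADDR0_CFG_SHIFT = 32  # ADDR0_CFG field, bits 35:32
ADDR_CFG_FILTER = 1   # 1 = trace only inside [ADDR0_A, ADDR0_B]


def plan_msr_writes(trace_user=True, trace_kernel=False, cr3=None,
                    ip_range=None, use_topa=False):
    """Return the (msr, value) writes in order; TraceEn is set last."""
    writes = []
    ctl = BRANCH_EN
    if trace_kernel:
        ctl |= OS
    if trace_user:
        ctl |= USER
    if cr3 is not None:
        ctl |= CR3_FILTER
        writes.append((IA32_RTIT_CR3_MATCH, cr3 & ~0x1F))  # bits 4:0 are reserved
    if ip_range is not None:
        start, end = ip_range
        ctl |= ADDR_CFG_FILTER << ADDR0_CFG_SHIFT
        writes.append((IA32_RTIT_ADDR0_A, start))
        writes.append((IA32_RTIT_ADDR0_B, end))
    if use_topa:
        ctl |= TOPA
    writes.append((IA32_RTIT_CTL, ctl | TRACE_EN))
    return writes
```

The output base and mask registers are left out here because they depend on the output scheme (single range or ToPA).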
Let's look quickly at how we implemented single range. You allocate a contiguous physical memory buffer and then program two model-specific registers: the output base and the output mask. Then you start the trace by setting the TraceEn flag in the control register, and the buffer is filled automatically by the CPU.
The table of physical addresses is a smarter implementation of the output, because you do not need one contiguous region of physical memory: you create a table listing the physical memory regions you want the trace written to. It is very flexible, because you can even set up a PMI interrupt that is raised by the CPU when a chosen part of the buffer has been filled by the log, and then you can stop, you can resume, you can do whatever you want.
OK, there are different kinds of packets used to log the execution. There are timing packets, which we are not interested in here, but the very interesting ones are the branch packets: taken/not-taken (TNT), target IP (TIP) and flow update packets (FUP). Those are the most important, because with them you can trace the execution of software and follow its execution flow. Yes, it's a big diagram. I refer you to the Intel manual if you would like to understand the meaning of all the packets. As I said, the ones we are interested in are the branch packets: taken/not-taken, target IP and FUP.
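The table of physical addresses mentioned in this part of the talk can be illustrated with a short sketch that encodes ToPA entries. This is only an illustration of the entry layout documented in the Intel SDM (END in bit 0, INT in bit 2, STOP in bit 4, size in bits 9:6, base address from bit 12 up); `topa_entry` and `build_circular_topa` are hypothetical helpers, and the physical addresses in the usage are made up.

```python
# Sketch: encoding entries of a Table of Physical Addresses (ToPA).
END = 1 << 0    # entry points to the next table instead of an output region
INT = 1 << 2    # raise a PMI when this region has been filled
STOP = 1 << 4   # stop tracing when this region has been filled


def topa_entry(phys_addr, size_bytes=4096, interrupt=False, stop=False, end=False):
    """Encode one 64-bit ToPA entry."""
    if phys_addr & 0xFFF:
        raise ValueError("address must be 4 KB aligned")
    if end:
        return phys_addr | END
    if size_bytes < 4096:
        raise ValueError("region must be at least 4 KB")
    order = (size_bytes // 4096).bit_length() - 1   # size = 4 KB << order
    if size_bytes != 4096 << order or order > 15:
        raise ValueError("size must be 4 KB times a power of two, up to 128 MB")
    if phys_addr % size_bytes:
        raise ValueError("region must be aligned to its size")
    entry = phys_addr | (order << 6)
    if interrupt:
        entry |= INT
    if stop:
        entry |= STOP
    return entry


def build_circular_topa(table_phys, regions):
    """Build a table whose last region requests a PMI and whose END entry loops back."""
    last = len(regions) - 1
    entries = [topa_entry(addr, size, interrupt=(i == last))
               for i, (addr, size) in enumerate(regions)]
    entries.append(topa_entry(table_phys, end=True))
    return entries
```

Placing the INT bit on the last region corresponds to requesting the interrupt "at the end of the buffer"; a driver could equally set it earlier to get notified before the buffer wraps.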
OK, let's speak about the driver implementation. We decided to write this driver to be able to use Intel Processor Trace from every Windows operating system. At the time of this presentation the driver is quite stable, at version 0.5. It supports all the combinations of filtering modes and output modes. A new feature of this release is that it supports multiple processors, and it also supports kernel-mode tracing, so you can use Intel Processor Trace to trace even kernel-mode software without any problem.
While developing the driver we had a lot of problems. One of the biggest was handling the PMI interrupt, because there was no documentation on how to do that. Another was multi-processor support, because you have to manage, and enable Processor Trace on, each processor individually.
OK, let's speak quickly about the PMI interrupt. The PMI interrupt is raised by Processor Trace when the buffer is full. To get it, we program the table of physical addresses to request the PMI at the end of the buffer. How do we manage it? When the PMI is raised, we suspend the traced process, deal with the physical memory and then resume. One thing is important here: as you probably know, on x86 an interrupt routine runs at a very high IRQL, and this is quite a problem, because at that level you can do almost nothing, not even touch a user-mode buffer. We found a way to map the physical memory of the buffer directly into a user-mode buffer. We map the buffer only in user mode and not in kernel mode. This is important, because with a very big buffer you would run into address space limits, which is not a problem in user mode on 64-bit systems.
OK, here is where things get interesting, because in version 0.5 we have been able to support multiple processors and multi-threaded applications. We implemented it so that each CPU has its own buffer associated with it, mapped in user mode, and a PMI interrupt is raised when that buffer is full. There is a problem, though: if every PMI is delivered in the same way, the user-mode application is not fast enough to detect in real time which CPU has its buffer full. So we changed the implementation and moved part of it into the user-mode code: the user application spawns one thread for each CPU, and each thread waits for the PMI and runs a callback function. In that way, when a CPU has its buffer full, the code is executed in exactly the right user-mode callback.
We have tested this in a multi-processor environment. If we do not decode the binary log to human-readable text at run time, the performance is really good; I mean, we cannot see any slowdown in the traced application. So we have overcome this problem.
The only problem we still have is managing multi-threaded applications, because, as you know, from the CPU's point of view a thread does not exist. The CPU simply executes some code; it does not know whether that code belongs to one thread or to another.
OK. Consider a process that spawns another process, for example a complex application that launches another process or that is multi-threaded. This could be a problem for our tracing.
One workaround is to use the paging information packets to tell the processes apart, and use Processor Trace without filtering by CR3. But there is a big drawback: the size of the log is huge, because in that way you trace every process running in user mode, and this is a problem.
The second way to overcome this is to use process and thread creation callbacks in kernel mode and trace only one process at a time. The solution is simple and it works, but sometimes it is not sufficient, because some complex components, for example Microsoft Word, require interaction from the parent process, and this is a problem.
We are researching a new way to do this now. Originally we wanted to enable Processor Trace per thread, a thread being a software construct that is known only to the Windows kernel. Our original idea was to intercept the thread context-switch code, save all the model-specific registers used by Processor Trace at that point, and restore them when the context is switched back to the original thread. While doing this, I was saving the model-specific registers manually.
Then someone pointed out to me another very cool instruction that is not well known in the research community but is very useful. Do you remember the old PUSHAD instruction in 32-bit x86? Basically it pushes all the general-purpose registers onto the stack with a single instruction. In the 64-bit environment something like that does not exist anymore. But there is a newer instruction called XSAVE, also in the AMD64 instruction set, that saves some of the extended registers, the ones that belong to specific architectural features. I found that the XSAVE instruction can save the MMX, SSE and AVX registers, including AVX-512. The cool feature of XSAVE is that it can also save the registers that belong to Processor Trace and to the new Memory Protection Extensions. This is very cool, because with only one instruction we can save all the registers that belong to Processor Trace in a very fast manner, without any problem.
If you open the manual, you find that using this instruction is a bit complex, because there is an XSAVE area structure, and to select what to save you have to set an extended control register, which is done with another instruction, XSETBV.
But to be able to save the model-specific registers directly you have to use yet another instruction, XSAVES, where the S means supervisor, and which can be used only in kernel mode.
It was very funny, because once I had implemented this I found that the context-switch code in recent Windows versions already implements XSAVES support, but only for user-mode state. Our intention was to find a way to intercept KiSwapContext, the kernel function that Windows uses to switch context from one thread to another. If we hook there and set these bits, we can save and restore the registers that belong to Processor Trace directly at every context switch. In that way we can implement tracing by thread, from a completely software point of view.
But we found some problems. As you probably know, you cannot patch or hook a kernel-mode function, with an inline hook or whatever, because otherwise PatchGuard will crash your system. So this way is not feasible on a production system. On a debug system you can do it, because PatchGuard does not run there and it is not a problem, but it is not feasible on production systems.
The second solution that we found is the use of ETW. If you check the documentation, there is a way to intercept context switches, but we are still doing research on it, because the APIs are very complex, and we are trying to work out whether we can use ETW, in a legitimate way, to implement tracing by thread.
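As a small illustration of why XSAVES (and not plain XSAVE) is the instruction that can cover Processor Trace state: per the Intel SDM, the PT state is XSAVE state component 8, a supervisor component enabled through the IA32_XSS MSR rather than XCR0, and XSAVES saves the components in (XCR0 OR IA32_XSS) AND the instruction mask passed in EDX:EAX. The helper names below are illustrative only.

```python
# Sketch: which state components an XSAVES invocation would save.
IA32_XSS = 0xDA0          # MSR that enables supervisor state components
XFEATURE_X87 = 1 << 0
XFEATURE_SSE = 1 << 1
XFEATURE_AVX = 1 << 2
XFEATURE_PT = 1 << 8      # Intel Processor Trace state (supervisor component)


def xsaves_rfbm(xcr0: int, ia32_xss: int, instruction_mask: int) -> int:
    """Requested-feature bitmap: (XCR0 | IA32_XSS) & EDX:EAX."""
    return (xcr0 | ia32_xss) & instruction_mask


def pt_state_saved(xcr0: int, ia32_xss: int, instruction_mask: int) -> bool:
    """True when the PT registers are part of what XSAVES will save."""
    return bool(xsaves_rfbm(xcr0, ia32_xss, instruction_mask) & XFEATURE_PT)
```

For example, with `xcr0 = 0x7` and IA32_XSS left at zero, the PT component is never saved regardless of the mask; once bit 8 is set in IA32_XSS and in the instruction mask, it is.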
OK. Another feature of this release is that the driver now fully supports kernel-mode tracing. We implemented a kernel API that you can import into your own driver, and you can use it to decide what to trace, how to trace, and to do whatever you want from a kernel driver. But we were not happy with this, because we also wanted to drive kernel tracing from a user-mode application. So we created some IOCTLs with which you can communicate with our driver and ask it to do kernel tracing directly from user mode. To get there we had to overcome a lot of security problems, but now, from a user-mode application, you can even trace kernel code. In this way you are able, for example, to trace the loading or unloading of a kernel module, or, if you are studying some IOCTL or whatever, you can trace only the IOCTL code, because, as you know, an IOCTL is executed in kernel mode in the context of the calling process, synchronously, when a user-mode application issues the call.
Let's take a quick look at how to use the driver. As you can see, the code is quite simple. First of all you have to grab a handle to the device, which is quite easy; the device name is WindowsIntelPtDev. After that you have to fill a data structure, the PT user request, and then you use DeviceIoControl to send the IOCTL to the device, specifying the user request. After that the trace starts. You decide when to stop it, again through an IOCTL of our driver, and this is very important.
If you close the application without stopping the trace, the processor is still tracing, and this could lead to a problem if you try to unload our driver, because the processor state has not been cleared. The driver is able to detect this and overcome it, but it is good practice to stop the trace explicitly.
For multi-processor tracing, you can spawn user-mode threads without any problem, and from each user-mode thread you have to call DeviceIoControl and send the IOCTL that registers the PMI routine to the device. Then you wait in a loop. The only peculiar thing is that the wait function takes an alertable flag, and you have to request an alertable wait every time. After that you get the trace data and can decode it without any problem.
OK, now it's time for the demo. I don't know how much time we have, so we have to be very fast. As you can see here, this is the code of a very simple application that only asks the user some questions. Let's try to run it now.
The trace target process is our simple application, as you can see here. OK, just a moment. How many CPUs? At the beginning let's do only one CPU. You are asking me to increase the font size.
While he is setting this up, I'll recap a little bit what you are about to see. With the driver we presented last year in June, all we had shown was the raw capture of the binary trace.
Those packets, taken/not-taken, target IP and timing, were all we had available, so we had to figure out how to decode this. Andrea is going to show you tracing in different modes and visualizing the results in IDA.
OK, let's go. At the beginning, only one processor. For the next question, three is fine. OK, the application has finished.
Let's exit and open the executable in IDA. OK, here is the code. Our goal is to trace the software, and we are developing an IDA plugin that does this for us. Let's feed it the text log. Wait a moment. As you can see, this is exactly the code that was run.
So we use a library from Intel called libipt. It is open source and it provides the decoding of the binary trace to a text file, which is what we parse in the plugin for right now. Later we will integrate the decoding into the plugin.
I would also like to show you the same thing in a multi-processor environment.
How many processors? Four. And this time eight, something different. As you can see in the summary, each processor has its own buffer, with a different number of packets.
Let's look at the dumps. As you can see there are different binary files and text files for the four processors. Let me increase the font size, just a moment. As you can see, this is processor number 1. It has all the packets: TIP, taken/not-taken, and packet generation enable and disable, because the Windows context switch moves execution to a different thread. If we are not lucky, some of the logs are nearly empty, because Windows executed the thread on that CPU only for a small period of time. This is not the case here, because luckily the context switches spread the work across the four different CPUs. But sometimes it happens that the log of one or more processors is empty.
These text files are the output of our implementation. One thing I want to point out: when you read these text files, you can see that for indirect branches and return addresses we actually get the full 64-bit target address. But the TNT packets only indicate whether or not a conditional branch was taken, so they store a single bit that determines whether you took the true or the false branch. You have to recover the rest later on, disassembling as you go, to work out what the target addresses were for those conditional branches.
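The single-bit encoding described above can be sketched in a few lines. This assumes the short TNT packet layout from the Intel SDM (bit 0 is zero, the highest set bit is a stop marker, and the bits below it are the branch outcomes from oldest to newest). The control-flow graph here is a hypothetical stand-in for the information a disassembler would provide; it is not how libipt or the IDA plugin represent it.

```python
# Sketch: decoding a short TNT packet and replaying it against a known CFG.

def decode_short_tnt(byte: int) -> list:
    """Return the taken/not-taken outcomes (oldest first) of a short TNT byte."""
    if byte & 1 or byte < 4:
        raise ValueError("not a short TNT packet")
    payload = byte >> 1
    nbits = payload.bit_length() - 1          # highest set bit is the stop bit
    return [bool((payload >> i) & 1) for i in range(nbits - 1, -1, -1)]


def follow_conditionals(start, cfg, tnt_bits):
    """Walk conditional branches: cfg maps block -> (taken_target, fallthrough)."""
    path = [start]
    current = start
    for taken in tnt_bits:
        taken_target, fallthrough = cfg[current]
        current = taken_target if taken else fallthrough
        path.append(current)
    return path
```

With a toy graph such as `{"A": ("B", "C"), "C": ("D", "E")}`, the byte `0b00001010` decodes to not-taken then taken, which replays as the path A, C, D. The trace itself never names C or D; they come from the disassembly.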
If there is time, I would like to show one more demo, one that uses kernel tracing directly from user mode. For this I have chosen the ACPI driver, not for a specific reason; it was randomly chosen, and I found that the system calls its interfaces a lot of times. Let's try it.
OK, let's say yes: acpi.sys. Number of processors: just one this time. OK, now it is tracing. I do something, for example some movement on my PC, and then at some point we say stop.
This time we were quite lucky, because as you can see processor number 1 has collected some packets. Let's see what those packets are. OK, as you can see there are not a lot of them, but something has been taken. Let's use our plugin to see what was traced. The ACPI code is unfortunately not easy to follow, but that's all right. OK, the plugin has run. You can see now that the driver entry has not been highlighted, because it was already executed when you switched on your system. But this piece of code that is highlighted is special: it could be an IOCTL handler or whatever. Let's go there.
Yes, these are IOCTLs. Even blindly, without knowing anything about the interface of the driver, we can probably say that this is an IOCTL handler, because the code has been executed a lot of times, and that means the CPU has run this function many times. You can see the pattern of the branches, whether each one is taken or not taken, and you can see all the branches that were traced. That is the demo of kernel tracing from user mode.
If you develop your own driver, you can even trace the driver entry or the unload routine. I cannot show that now, first because we do not have time, and second because of how my computer is set up: for that you cannot use our signed binary, but if you write your own code and sign your own driver, you can do whatever you would like. OK, let's go back to the slides. OK, yes.
So, as a recap: the driver now supports kernel-mode and user-mode tracing. You can filter based upon CR3, so a single whole process; you can trace the entire kernel space; or you can isolate ranges of contiguous IPs, up to 4 different ranges. I'm now going to demonstrate this in a practical real-world scenario inside of a fuzzer. We have a fast tracing engine, and you'll see some performance numbers roll by here, but in the manuals they are targeting 5 to 15 percent trace overhead for the entire system. So per core you should be able to trace both kernel-mode and user-mode operations with only 5 to 15% overhead, and I'll be able to show you how that works.
So how do we use this for vulnerability discovery? Who here is familiar with American Fuzzy Lop, has used it? Do we have people who do fuzzing in the crowd? OK, a good portion. In the last few years we've seen an evolutionary jump in fuzzing technology. Basically we've gone from using dumb fuzzing or grammar-based fuzzing, which was unable to determine whether or not the samples being generated were useful, to applying a new technology, or rather re-engineering an older technology into something performant enough to be used, that we call evolutionary fuzzing. We take the idea of dumb fuzzing mutation and combine it with the ability to collect a feedback signal using code coverage, and then we assess the fitness of each new randomly generated input against the entire lifetime of the fuzzing cycle. We look at the code coverage information to determine whether the newly generated input actually gets us to a different part of the code, and if it does, we introduce it into our pool of samples and continue to mutate and evolve those as we go. Over time we are refining our set of inputs and building a corpus in which each input exercises a slightly different part of the code.
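The feedback loop just described (mutate, run, keep the input if it reaches new code) fits in a few lines. This is a toy sketch under stated assumptions: `toy_target` is a made-up stand-in for a traced program that reports which blocks it reached, and the single-byte mutator is far simpler than what a real fuzzer such as AFL uses.

```python
# Sketch: a minimal coverage-guided (evolutionary) fuzzing loop.
import random


def toy_target(data: bytes) -> set:
    """Hypothetical target: returns the ids of the 'basic blocks' it reached."""
    cov = {0}
    if len(data) > 0 and data[0] == ord("F"):
        cov.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            cov.add(2)
            if len(data) > 2 and data[2] == ord("Z"):
                cov.add(3)
    return cov


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Dumb mutation: overwrite one random byte (input must be non-empty)."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, iterations: int, rng: random.Random):
    """Keep any mutated input that reaches a block not seen before."""
    corpus = [seed]
    seen = set(toy_target(seed))
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        cov = toy_target(candidate)
        if not cov <= seen:          # new coverage: the input is "fit"
            corpus.append(candidate)
            seen |= cov
    return corpus, seen
```

Calling `fuzz(b"AAAA", 20000, random.Random(1))` returns the evolved corpus and the set of blocks reached. Because each kept input is one mutation closer to the nested checks, the loop tends to climb them one byte at a time instead of having to guess all bytes at once.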
In effect, what we've seen over the last 3 or 4 years that this has been available is that it highly optimizes your compute time compared with dumb fuzzing. The last couple of talks I've given have focused on this technology, so I encourage you to go look at the previous slides, which are on the REcon website or on my website, moflow.org.
Through researching this, the result is that the main thing we need to deploy this technology effectively is a fast tracing engine, and of course that was the inspiration to look into Intel Processor Trace, because the promise of 15% overhead against closed-source binary software is pretty incredible compared with the technologies we had available before.
Hardware tracing is not new. Since the P4 there has been the ability to do hardware tracing, with a mechanism called Branch Trace Store (BTS), which works in a similar fashion but was not designed in an optimized way: it polluted your cache and things like that, so we saw massive slowdown. Then you have another option called Last Branch Record, which is only 32 registers in modern processors, and those only give you the last 32 branches, so you either have to interrupt every 32 branches to read them out, or you have to write a driver that flushes them out somewhere else and does other things. So while hardware tracing is not new, this is the first time it has been designed to be highly performant. Up until Intel Processor Trace, it was actually faster to use software-based tracing, doing dynamic binary instrumentation with DynamoRIO or Pin or something like that.
So now we have this fast tracing engine, and that's great, but we also need fast logging, which is something we get out of the design of AFL. AFL uses a bloom filter that allows you to quickly look up whether or not you have already seen the same code coverage.
So instead of parsing a text file or a binary file that is just a list of basic block addresses, one after another, we parse the trace in real time and fill the bloom filter, so you can just check whether one 64 KB region of RAM is identical to another 64 KB region instead of comparing each address.
Through the research, there have been other attempts at evolutionary fuzzing. Starting about 2004 or 2005, Jared DeMott did some of his PhD research on this and created something called the Evolutionary Fuzzing System, but it was based upon the BTS recorder or debugger breakpoints, so the tracing was quite slow. It was also overengineered, trying to incorporate too much of the research from the evolutionary, biology side of things. So the key is to have this fast tracing engine and efficient logging, and to keep the analysis to a minimum.
In 2013 Michal Zalewski, who has contributed a lot of great stuff to our industry, produced the first performant open-source evolutionary fuzzer, called American Fuzzy Lop. It uses a pretty comprehensive list of fuzzing strategies, whether it's bit flipping, byte flipping, word flipping and so on, crossovers and various types of mutation. It originally got block coverage via a post-process of the GCC compilation: it would compile your code to assembly, annotate the assembly to add callback hooks at every basic block entry point, and then assemble the modified source. That's how you got code coverage. Then, as I mentioned, it takes those edge transitions, basically shifts and XORs them together, and increments an offset into this bloom filter or byte map, so it is able to track whether or not you have seen this edge before.
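The shift-and-XOR edge tracking can be sketched as below. It follows the scheme described in AFL's own technical notes (index = current block id XOR previous id, with the previous id shifted right by one so that A to B and B to A land in different slots), using a 64 KB map. The class and function names are illustrative, and block ids here are arbitrary integers rather than AFL's compile-time random values.

```python
# Sketch: AFL-style edge coverage map with shift-and-XOR hashing.
MAP_SIZE = 1 << 16  # 64 KB


class EdgeMap:
    def __init__(self):
        self.counts = bytearray(MAP_SIZE)
        self.prev = 0

    def visit(self, block_id: int) -> None:
        """Record the edge from the previous block to block_id."""
        idx = (block_id ^ self.prev) % MAP_SIZE
        self.counts[idx] = (self.counts[idx] + 1) & 0xFF   # 8-bit hit counter
        self.prev = block_id >> 1   # shift so that A->B differs from B->A


def trace_edges(block_ids) -> bytearray:
    """Build the edge map for one execution, given the blocks in order."""
    m = EdgeMap()
    for block in block_ids:
        m.visit(block)
    return m.counts


def has_new_coverage(trace: bytearray, seen: bytearray) -> bool:
    """Merge trace into seen; return True if any edge slot was hit for the first time."""
    new = False
    for i, count in enumerate(trace):
        if count and not seen[i]:
            seen[i] = 1
            new = True
    return new
```

With a hardware trace as the source, the `block_ids` sequence would come from the decoded branch targets instead of compiler-inserted callbacks; the map itself stays the same.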
Now, you can't get back what those addresses were originally, because it's simply an offset into this mapping, but you can very quickly look up "have I been here before?", and that's all we care about, since we want to keep it simple. Of course, it was written on top of POSIX APIs, so it wasn't Windows-compatible out of the gate. The benefits are that it tracks edge transitions and not just block entry, it uses the Bloom filter, and it has a fork server built into it. Basically, after your process is initialized, it waits until all your libraries are loaded and all the linking is done, and then once you get to the parsing code it forks, so you skip all the initialization time, which is an optimization. And then, very importantly, he introduced persistent mode fuzzing, which is an in-memory type of fuzzing where you're not exiting and recreating a process every time. You give it a pointer to a function and the number of arguments of that function and say: OK, once you exit this section of code, start over again and take new inputs as the inputs to this function. That also reduces the amount of code you're tracing and executing down to the minimum, as an optimization. Also importantly, you can use this to build a corpus against open-source software, since it's very fast, and then feed those inputs into your pipeline for maybe slower or more heavyweight analysis or other types of fuzzing. The way it does the tracing is that every block gets a unique ID, and the edges are indexed in the map by a hash that is a simple shift and XOR, and then we increment the map. This was great, and I was looking into how to optimize this and bring it to the Windows platform. Obviously we can use the compile-time approach in some cases, but it's not an option for the majority of the software we're targeting, so we needed something that could do binary targeting, and this seemed ideal versus the other options of using Pin, DynamoRIO and so on. So last summer I started looking at this.
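The persistent-mode idea described above (call the target function repeatedly inside one process, and restart only after a fixed number of iterations) can be modelled as a plain loop. This is a simplified sketch: `run_persistent` and its parameters are hypothetical names for illustration, and a real fuzzer would restore stack and arguments and restart an actual process where this code only counts a "start".

```python
from itertools import islice


def run_persistent(target, inputs, max_iterations=1000):
    """Run `target` on every input, modelling one process start per batch.

    Returns (executions, process_starts). A process start stands in for
    the expensive part: loading, linking and (for a DBI back end)
    re-instrumenting the program.
    """
    it = iter(inputs)
    executions = 0
    process_starts = 0
    while True:
        # One "process lifetime" handles at most max_iterations inputs.
        batch = list(islice(it, max_iterations))
        if not batch:
            break
        process_starts += 1
        for data in batch:
            target(data)
            executions += 1
    return executions, process_starts
```

With `max_iterations=1` this degenerates into one process per input, which is the slow case the talk contrasts with; larger values amortize the start-up cost across many executions.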
Around the same time as my talk on Intel PT at REcon last year, Ivan Fratric from Google, who I think is part of Project Zero or Google Security, released WinAFL, which is a port of Michal Zalewski's AFL to Windows using DynamoRIO as a back end. If you're not familiar with Pin and DynamoRIO, they're basically loaders for your program: as you visit each new basic block, they copy it into a code cache that allows you to modify it in real time. Valgrind works the same way, if you're familiar with that. So DynamoRIO was used as the back end, and it's really cool, it works. It's the first thing where you can just download it right now, start fuzzing Windows GDI, and within five minutes it's working beautifully. The biggest thing that allows it to be performant is that it uses this persistent mode, where it doesn't restart the process. These loaders, like Pin and DynamoRIO, have to disassemble your program to instrument it. I did some experimentation with forking on Windows for previous talks, and it turned out to be just a royal pain. Using persistent mode, you don't have to reinitialize the process and you don't have to disassemble and instrument it again every time, because you're reusing that code cache. So WinAFL turned out to be pretty well engineered. Basically, you tell it how many iterations to persist, say a thousand iterations, and then it goes ahead and exits and restarts the process, so if we have any memory leaks or we're not quite cleaning up properly, it handles that through the delayed restart. Persistence is key, because restarting under the instrumentation engine is costly every time. If we were to restart on every run, we would get 2 executions per second on this GDI+ demo that I'll show you, and if we persist 100 times before restarting we get 72 executions per second, and so on. It reaches its ceiling somewhere around a thousand or so iterations. So we have now integrated
our PT driver into WinAFL as an alternative tracing engine. This brings some problems, because, and this is the reason I showed you the text version of that dump, we don't have all the addresses in the log files, so we have to recover some of them along the way. We also don't have persistent mode working quite yet, unfortunately. I have done some experimentation here, it's around the corner, and by the next time we present, at Hack in the Box, it will be available. I'm building my tooling on top of Alex Ionescu's great work, presented at REcon, on the Application Verifier hooking system. I'm using the IP filtering mode, so you can specify up to four DLLs that you want to trace, or four address ranges in the process that you want to trace. The current status is that we accurately decode the full trace, doing disassembly online and using a cache of the control flow graph recovered so far. I first look to see whether we've already resolved the upcoming conditional branch; if so, that's a quick index lookup, and if not, we have to disassemble forward to determine the targets of the conditional branch and store them in our structure. The edge source and destination are recorded as expected, so we're not reduced to basic blocks, we actually record the edges. Currently we're tracing per process, so we're doing this iteratively rather than in persistent mode. In order to determine the performance of the tracing mechanism, I first made a dummy looping benchmark: basically it was just creating a process and waiting for it to exit, so we got a kind of maximum bound on how many iterations we'd be able to execute with this sample. In this case I found that we could get 85 executions per second without doing any tracing, where we're just generating the fuzzed input and running it, and we're not parsing the log file either. Once we enabled tracing, that was reduced to 72 executions per second.
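The decoding strategy described above (look up an already-resolved conditional branch in a cache, and disassemble forward only on a miss) can be sketched as follows. This is a toy model, not the speakers' decoder: it handles only taken/not-taken decisions for conditional branches, the kind of information Intel PT records as TNT bits, and the `resolve_branch` callback stands in for a real disassembler. The names `EdgeDecoder` and `resolve_branch` are assumptions made for this example.

```python
class EdgeDecoder:
    """Turn taken/not-taken bits into (source, destination) edges."""

    def __init__(self, resolve_branch):
        # Slow path: given a block address, return the pair
        # (taken_target, fallthrough_target). In a real decoder this
        # would disassemble forward to the next conditional branch.
        self.resolve_branch = resolve_branch
        self.cache = {}   # block address -> (taken_target, fallthrough)
        self.misses = 0   # number of times the slow path was needed

    def targets(self, addr):
        hit = self.cache.get(addr)
        if hit is None:
            self.misses += 1
            hit = self.resolve_branch(addr)
            self.cache[addr] = hit
        return hit

    def decode(self, start, tnt_bits):
        """Follow the trace from `start`, recording every edge taken."""
        edges = []
        cur = start
        for taken in tnt_bits:
            taken_target, fallthrough = self.targets(cur)
            nxt = taken_target if taken else fallthrough
            edges.append((cur, nxt))
            cur = nxt
        return edges
```

Because the cache persists across traces, repeated executions of the same code pay the disassembly cost only once per branch, which is what makes decoding iterative runs affordable.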
Going from 85 to 72 is right in the sweet spot of the 15% overhead that Intel promises, at least for this particular sample. Parsing the log file was an additional 22 percent overhead, so now we're down to 55 executions per second. Now I'll demo what this all means for you.
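The benchmark figures quoted above can be checked with a little arithmetic. The sketch below uses only the numbers from the talk (85, 72 and 55 executions per second, and the 15% and 22% overheads); the function names are made up for illustration. The quoted percentages are rounded: the measured drop from 85 to 72 is about 15.3%, and taking 22% off 72 gives roughly 56 rather than exactly 55.

```python
def relative_overhead(before, after):
    """Fraction of throughput lost when going from `before` to `after`."""
    return (before - after) / before


def apply_overhead(rate, overhead):
    """Throughput left after losing `overhead` (a fraction) of `rate`."""
    return rate * (1.0 - overhead)


baseline = 85       # executions per second, no tracing
with_tracing = 72   # Intel PT tracing enabled
with_decoding = 55  # tracing plus parsing of the log

tracing_cost = relative_overhead(baseline, with_tracing)
decoding_cost = relative_overhead(with_tracing, with_decoding)
print(f"tracing overhead:  {tracing_cost:.1%}")
print(f"decoding overhead: {decoding_cost:.1%}")
```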
For comparison, I want to show you fuzzing GDI+. This is the example that comes with WinAFL out of the gate: you just pass it image files and it uses Windows to render them, without rendering to the screen. This is a live demo of that working. Currently it is using DynamoRIO and persistent mode with the maximum number of iterations possible, and we see that it's getting 127 executions per second. The lighting is not good here, unfortunately, but hopefully you can see it a little bit; let me increase the font quickly. What we're looking at here is this number specifically: we get about 126 executions per second using DynamoRIO. Now let's see how our Windows PT driver performs in comparison. OK, so we're seeing the 15% overhead for tracing and then the additional 22 percent overhead for decoding. This will creep up a little bit, but we're only getting about 40-something executions per second. That's a little bit disappointing; we were hoping to see it a little bit faster. Keep in mind, again, that this is just iterative tracing, we're not doing in-memory fuzzing, so once we get to in-memory fuzzing this number should increase significantly. However, this is not the end of the story. I was originally doing all my testing against their setup with the GDI+ wrapper and, as you can see in my command line here, this is tracing only the WindowsCodecs and GDI+ DLLs in the process. So I have another demo in order to compare the performance, and this time we'll trace libpng using WinAFL. This is just libpng statically compiled into a small harness that loads a PNG file, buffers it, and passes it through libpng. And we see that the performance becomes quite abysmal, actually. This is with DynamoRIO, and we're seeing only half an execution, 0.5 executions per second.
This is quite slow, obviously not where we want to be. This is because, since it is statically compiled, all the code involved with libpng, including the compression and things like that, is included here, which causes some issues in the DynamoRIO back end. And this is the fun part: we have a constant overhead when we use Intel PT. So using Intel PT, we can see that we're back up, and it will slowly creep up to about 55 or 60 executions per second. So instead of only half an execution per second, this is about 100 times faster than the DynamoRIO back end, and it's doing code coverage and finding new paths and so on. So we've got a hundred-times performance increase, depending on your target application, and I did not specifically choose this target; it was random, I just had this lying around. So that's my demo of this integration into WinAFL. Thank you. So, just some closing remarks now.
All the code that we're releasing is already open source or is going to be open source. We're on GitHub, at github.com/intelpt, real simple. The driver is already there, and actually a pull request went in last night, so the latest version is on GitHub or will be merged today. The WinAFL work needs a little cleanup; I was up until 3:30 last night making my final preparations, so that will be up next week. Then we just have a few more things that we need to address. Obviously, we see that code coverage is finally being harnessed to make our fuzzing better, and we can load this information into IDA too, to aid our analysis of a crash, or of malware, or whatever it might be. Using a hardware-supported tracing engine, we don't have the issues that you have with software-based instrumentation and hooking engines. One of our future plans is to get this into a hypervisor so that we can trace the guest from inside the hypervisor, and then the tracing will be basically unobservable, and you won't be able to disable it. Beyond that, there are capabilities for deploying Intel PT to trace things like SGX mode and SMM, and those are future areas of work. My goal is of course to get this fully supported with persistent mode and everything as well. One thing we need to do as part of that is to finish the ETW-based thread context-switch awareness, because we need to separate these logs out into per-thread instances. Otherwise you have to use the timing information to determine where the synchronization is between the threads, and that slows down the parser a lot. So our goal is to get the logs individualized before you do your parsing, so that you only have to parse the thread you care about, the one doing your file or network I/O. And now over to Andrea for the conclusions. I just wanted to say that the new beta that we are going to release has,
for example, the KSaver feature, and hopefully you can test that. In the end it's quite a cool set of features.
So you can get this code, and you can reach us on Twitter. Thank you very much.
I think we might have a minute for questions; we're just on time.
Thanks. Can you trace inside a virtual machine? So currently there aren't any hypervisors that expose this virtualized. For the other hardware tracing modes, the hypervisor has to virtualize the support, correct, and in some cases this does exist, for example in VMware using the BTS mechanism. But Intel PT is rather new, so they don't have it available yet. So we are going to have to modify either Xen or KVM. We've actually been talking, even just today, about perhaps being able to trace the entirety of the hypervisor and then later on pull out only the user-mode processes or kernel threads that you're interested in. So not currently, but we will absolutely continue working on this, and I know that it will hopefully be applicable to Cuckoo Sandbox as well. Thanks; feel free to grab us afterwards. Thank you very much for your attention.

Metadata

Formal Metadata

Title Harnessing Intel Processor Trace on Windows for fuzzing and dynamic analysis
Series title REcon 2017 Brussels Hacking Conference
Part 16
Number of parts 20
Author Allievi, Andrea
Johnson, Richard
License CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI 10.5446/32382
Publisher REcon
Release year 2017
Language English
Production place Brussels

Content Metadata

Subject area Computer Science
Abstract This talk will explore Intel Processor Trace, the new hardware branch tracing feature included in Intel Skylake processors. We will explain the design of Intel Processor Trace and detail how the current generation implementation works, including the various filtering modes and output configurations. This year we designed and developed the first open-source Intel PT driver for the Microsoft Windows operating system. We will discuss the architecture of the driver and the large number of low-level programming hurdles we had to overcome throughout its development to program the PMU, including registering Performance Monitoring Interrupts (PMI), locating the Local Vector Table (LVT), and managing physical memory. We will also introduce the new features of the latest version, such as IP filtering and multi-processor support. We will demonstrate the usage of Intel PT in Windows environments for diagnostic and debugging purposes, showing a “tracing” demo and our new IDA plugin, which is able to decode the trace data and apply it directly to the visual assembly graph. Finally, we discuss how we’ve harnessed this branch tracing engine for guided fuzzing. We have added the Intel PT tracing mode as an engine for targeting Windows binaries in the widely used evolutionary fuzzer, American Fuzzy Lop. This fuzzer is capable of using random mutation fuzzing with a code coverage feedback loop to explore new areas. Using our new Intel PT driver for Windows, we provide the fastest hardware-supported engine for targeting binaries with evolutionary fuzzing. In addition, we have added new functionality to AFL for guided fuzzing, which allows users to specify targeted areas on a program control flow graph that are of interest. This can be combined with static analysis results or known-vulnerable locations to help automate the creation of trigger inputs to reproduce a vulnerability without the limits of symbolic execution.
To keep performance as the highest priority, we have also created new methods for efficiently encoding weighted graphs into an efficiently comparable bytemap.
