
Harnessing Intel Processor Trace on Windows for fuzzing and dynamic analysis

Video in TIB AV-Portal: Harnessing Intel Processor Trace on Windows for fuzzing and dynamic analysis

Formal Metadata

Harnessing Intel Processor Trace on Windows for fuzzing and dynamic analysis
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Subject Area
This talk will explore Intel Processor Trace, the new hardware branch tracing feature included in Intel Skylake processors. We will explain the design of Intel Processor Trace and detail how the current generation implementation works, including the various filtering modes and output configurations. This year we designed and developed the first open-source Intel PT driver for the Microsoft Windows operating system. We will discuss the architecture of the driver and the large number of low-level programming hurdles we had to overcome throughout its development to program the PMU, including registering Performance Monitoring Interrupts (PMI), locating the Local Vector Table (LVT), and managing physical memory. We will also introduce the new features of the latest version, such as IP filtering and multi-processor support. We will demonstrate the usage of Intel PT in Windows environments for diagnostic and debugging purposes, showing a “tracing” demo and our new IDA plugin, which is able to decode the trace data and apply it directly to the visual assembly graph. Finally, we discuss how we’ve harnessed this branch tracing engine for guided fuzzing. We have added the Intel PT tracing mode as an engine for targeting Windows binaries in the widely used evolutionary fuzzer, American Fuzzy Lop. This fuzzer is capable of using random mutation fuzzing with a code coverage feedback loop to explore new areas. Using our new Intel PT driver for Windows, we provide the fastest hardware-supported engine for targeting binaries with evolutionary fuzzing. In addition, we have added new functionality to AFL for guided fuzzing, which allows users to specify targeted areas on a program control flow graph that are of interest. This can be combined with static analysis results or known-vulnerable locations to help automate the creation of trigger inputs to reproduce a vulnerability without the limits of symbolic execution.
To keep performance as the highest priority, we have also created new methods for efficiently encoding weighted graphs into an efficiently comparable bytemap.
Before we begin, a brief introduction. I'm Andrea Allievi, a security researcher at Microsoft now; before that I worked for two years for Talos. My specialties are Windows internals and low-level reverse engineering, and I have worked for other companies in the past. I am the designer of the first UEFI bootkit, in 2012, and of the first PatchGuard 8.1 bypass, presented in 2014, and I am one of the designers of the Windows PT driver that we are going to present. And I'm Richard Johnson. I am the research technical lead for Cisco Talos; we have a team that does vulnerability research and I help guide those efforts. You may have seen some of our vulnerabilities in the past year. We focus on technologies mostly for finding bugs. As was mentioned, my talks at Recon for the last couple of years have focused on high-performance fuzzing and on engineering technologies that give us feedback-driven fuzzing, optimizing the different parts of the process to make it as fast as possible. About two years ago I came across this new architectural feature in Intel CPUs called Intel Processor Trace. Basically it is a hardware-supported mechanism for doing code coverage. I made a prototype for this in 2015 that worked on Linux, where there was experimental support in some open source drivers. After evaluating it, I realized that this would be a great thing to bring to the Windows operating system, and that the Intel support for it was not going to be suitable for what we were looking for. Last year at Recon we had the very first version of the Windows driver working about an hour before we got on stage, and it could capture a trace but not decode it. So we will talk about the last six months of development, which have brought all kinds of support to the driver. Andrea is going to give you the introduction to the technical details of the driver, the low-level implementation and some demos, and then I'll address how this is applied to fuzzing and finding bugs.
OK, speaking about Processor Trace: Intel Processor Trace is a new feature of the latest Skylake CPUs. It is very useful because it can trace whatever your CPU is going to execute, and it brings some big benefits, especially for code coverage. For example, if you would like to understand what a piece of software is doing, for malware analysis or for whatever reason, it is very valuable. Because we do not have a lot of time, I would like to go quite fast over how the feature is implemented in the CPU and concentrate on the new features of our driver. Basically, going fast: support for Intel Processor Trace is discovered with the CPUID instruction, so you can do this even in user mode. You need two CPUID queries. The first one tells you whether Intel Processor Trace is supported; the second one is needed to find out which Processor Trace features are available, because different CPUs can implement different features. Processor Trace was implemented for the first time in the Broadwell architecture, but there it was limited. On Skylake there is full support and you can trace whatever you like.
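The two-step CPUID check described above can be sketched as follows. This is a minimal illustration, not the detection code shown in the talk: Python cannot execute CPUID itself, so the register values are passed in as plain integers, and the function names are hypothetical. The bit positions are quoted from memory of the Intel SDM (leaf 07H for the support bit, leaf 14H for the capability bits) and should be checked against the manual before being relied upon.

```python
# Sketch only: the caller is assumed to have obtained the CPUID register
# values by some other means (for example a small native helper).

PT_SUPPORTED_BIT = 25  # CPUID.(EAX=07H, ECX=0):EBX[25], per the Intel SDM


def pt_supported(leaf7_ebx):
    """First query: is Intel Processor Trace present at all?"""
    return bool((leaf7_ebx >> PT_SUPPORTED_BIT) & 1)


def pt_capabilities(leaf14_ebx, leaf14_ecx):
    """Second query: which features does this CPU implement?

    Inputs are EBX and ECX from CPUID.(EAX=14H, ECX=0).
    """
    return {
        "cr3_filtering": bool(leaf14_ebx & (1 << 0)),
        "ip_filtering": bool(leaf14_ebx & (1 << 2)),
        "topa_output": bool(leaf14_ecx & (1 << 0)),
        "single_range_output": bool(leaf14_ecx & (1 << 2)),
    }


if __name__ == "__main__":
    # Made-up register values, used only to exercise the helpers.
    print(pt_supported(1 << 25))
    print(pt_capabilities(0b101, 0b101))
```

Keeping the bit tests separate from the CPUID call itself means the same helpers could be fed values gathered in user mode or in a driver.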
Here I would like to show the code to detect Intel Processor Trace. As you can see it is quite easy, there is nothing special. Let's speak about why you would use it. It is very fast, because it is implemented entirely in hardware. One of the basic things we can say about it is that it is not detectable by software, I mean by user-mode software. Another important thing is that you can trace whatever you want, even an SMM handler, even hypervisor code or whatever. The only thing that you can't trace is an SGX secure container, because SGX by design should work in an isolated environment. OK, quite fast, how does Processor Trace work? It works with three filtering modes, so you can trace using three different kinds of filter. The first is by current privilege level: I mean that you can differentiate between kernel software and user-mode software. The second filtering mode is by PML4 page table: you ask the processor to restrict the trace to one CR3 physical address, and in that way you can trace only a single process. The last filtering mode is by instruction pointer: you can set a start point and an end point and ask the processor to trace only that window of code, and this is very good. The output logging is done directly to physical memory, which is why we need a driver to manage it. The logging can be implemented in two ways. The first one is single range, which is used as a circular buffer: the trace is always written to the same place in memory. The second type is the table of physical addresses, also known as ToPA. OK, going quite fast: to implement single range you should allocate a contiguous memory buffer and then set two model-specific registers, the output base and the output mask. Then, to start the trace, you set the TraceEn flag in the control register, and the buffer is automatically filled by the CPU.
The table of physical addresses is a more flexible implementation of the output, because you are not limited to one contiguous region of physical memory: you create a table in which you specify exactly where in memory to write. It is very powerful, because you can even set up a PMI interrupt that is raised by the CPU when part of the buffer has been filled, and then you can stop, you can resume, you can do whatever you want. OK, there are different kinds of packets used to log the execution of a process. We are not interested in all of them; the ones that are very interesting to us are the branch packets, meaning the taken/not-taken packets and the target IP packets. Those are the most useful, because with them you can trace the execution of a piece of software and follow it. Here is a bigger diagram. I refer you to the Intel manual if you would like to understand the meaning of all the packets; as I said, the ones that we are interested in are the branch packets, taken/not-taken and target IP. OK, let's speak about the Intel PT driver implementation. We decided to write this driver to be able to use Processor Trace from the Windows operating system.
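To make the filtering modes and the TraceEn step above more concrete, here is a small sketch that composes a value for the trace control register. It is an illustration under stated assumptions, not the driver's code: the function name and parameters are hypothetical, nothing is actually written to an MSR, and the bit positions are quoted from memory of the IA32_RTIT_CTL layout in the Intel SDM, so they should be verified there.

```python
# Bit layout of IA32_RTIT_CTL (MSR 0x570) as recalled from the Intel SDM.
TRACE_EN = 1 << 0     # start tracing
OS = 1 << 2           # trace privilege level 0 (kernel)
USER = 1 << 3         # trace privilege levels above 0 (user mode)
CR3_FILTER = 1 << 7   # trace only while CR3 matches IA32_RTIT_CR3_MATCH
TOPA = 1 << 8         # table-of-physical-addresses output instead of single range
BRANCH_EN = 1 << 13   # generate branch packets (TNT, TIP, ...)
ADDR0_CFG_SHIFT = 32  # ADDR0_CFG field, bits 35:32
ADDR_CFG_FILTER = 1   # 1 = trace only inside [ADDR0_A, ADDR0_B]


def build_rtit_ctl(user=True, kernel=False, cr3_filter=False,
                   ip_filter=False, use_topa=False):
    """Return a control-register value for the chosen filters (hypothetical helper)."""
    value = TRACE_EN | BRANCH_EN
    if kernel:
        value |= OS
    if user:
        value |= USER
    if cr3_filter:
        value |= CR3_FILTER
    if ip_filter:
        value |= ADDR_CFG_FILTER << ADDR0_CFG_SHIFT
    if use_topa:
        value |= TOPA
    return value


if __name__ == "__main__":
    # User-mode-only tracing of one process, single-range output.
    print(hex(build_rtit_ctl(user=True, cr3_filter=True)))
```

In a real driver the output base and mask registers, the CR3 match value, and the address-range registers would be programmed before this value is written with the TraceEn bit set.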
The driver is quite stable. Version 0.5 supports all the filtering mode combinations and output modes. The new features of this release are that it supports multiple processors and that it supports kernel-mode code tracing, meaning that you can use Processor Trace to trace even kernel software without any problem. While developing this driver we had a lot of problems. One of the biggest was the PMI interrupt handling, because there was no documentation on how to do it. Another was dealing with multiple processors, because you have to manage Processor Trace and enable it on each processor. OK, let's speak quickly about the PMI interrupt. The PMI interrupt is raised by Processor Trace when the buffer is full. To get that, we program the table of physical addresses so that the CPU raises the PMI interrupt request at the end of the buffer. How are we able to manage that? When the PMI interrupt is raised, we suspend the target process, deal with the physical memory, and then resume. This was somewhat tricky, because as you probably know, on x86 the PMI interrupt is handled at a very high IRQL, and this is quite a problem: at that level you can do almost nothing, not even access a user-mode buffer. We found a way to directly map the physical memory of the trace buffer into a user-mode buffer.
We map the buffer only in user mode and not in kernel mode. This is important because if you use a very big buffer you have the problem of fitting it into the address space, although that is not a problem on 64-bit systems. OK, here is where things get interesting, because in version 0.5 we have been able to support multiple processors and multithreaded applications. As we have implemented it, each CPU has its own buffer associated with it, mapped in user mode, and we signal from the PMI interrupt handler when that buffer is full. But there is a problem: if PMI interrupts arrive close together, the user-mode application is not fast enough and is not able to detect in real time which CPU has the full buffer. So we changed the implementation of the user-mode code: our user application spawns one thread for each CPU, and each thread waits in an alertable state on a callback function. In that way, when a CPU has its buffer full, the callback is executed in exactly the right user-mode thread. We have tested this in a multi-processor environment. If we do not decode the binary log into human-readable text in real time, the performance is really good; I mean, we cannot see any slowdown in the traced application. So we have overcome this problem. The only problem that we still have is managing multithreaded applications, because from the CPU's point of view a thread does not exist: the CPU simply executes some code.
The CPU does not know whether that code belongs to one thread or to another one. OK. A process can also spawn another process: complex software may start another process or be multithreaded, and this could be a problem for our tracing. One way to deal with it is to use the paging information packets to separate the log of each process, and to ask Processor Trace to trace everything without filtering. But there we have a big drawback, because the size of the log is huge: in that way you trace all the processes, in user mode or in kernel mode, and this is a problem. The second way to overcome this could be to use the process and thread creation callbacks to get notified, and then trace only one process at a time. The solution is simple and it works, but sometimes it is not satisfactory, because some malware, and some complex components such as Microsoft Word, have their code driven from another process, and this is a problem. We are now researching a new way to do this, because originally we wanted to enable Processor Trace per thread, and a thread is a construct that is known only in software: only the Windows kernel knows about threads. Our original idea was to intercept the thread context-switch code, save all the model-specific registers used by Processor Trace when one thread is switched out, and then restore them when the context is switched back to the original thread. While I was doing this I was manually saving the model-specific registers one by one.
Then I was pointed to another very cool instruction, one that is not well known by the research community but is very useful. Do you remember the old PUSHAD instruction? Basically, it pushes all the general-purpose registers that are in use onto the user-mode stack directly, using one instruction. Now, in a 64-bit environment, that instruction does not exist anymore. But there is a newer instruction called XSAVE, available in the AMD64 instruction set, that saves the extended registers, meaning the registers that belong to a specific processor feature. For example, the XSAVE instruction can save the MMX, SSE and AVX registers. The cool feature of XSAVE is that you can also save the registers that belong to Processor Trace and to the new Memory Protection Extensions. This is very cool because, using only one instruction, we can save all the registers that belong to Processor Trace directly, in a very fast manner, without any problem. If you open the Intel manual you find that using this instruction is a bit complex, because there is an XSAVE area structure to manage, and to say what to save you have to set which extended components are in use.
But to be able to save the model-specific registers directly you have to use another instruction: you have to use XSAVES, where the S means supervisor, and it can be executed only in kernel mode. It was very funny, because when I had implemented this I found that the standard Windows context switch already implements XSAVE, but only for the user-mode state. Our intention was to find a way to intercept the routine that the Windows kernel uses to perform the context switch from one thread to another thread. If we are able to hook it, we can save all the registers that belong to Processor Trace directly and then restore them, and in that way we can implement tracing by thread, completely from a software point of view. But we found some problems, because as you probably know you cannot touch kernel code or hook any kernel-mode function in an unofficial way, because otherwise PatchGuard will crash your system. So we found that this way is not feasible on a production system. I mean, on a debug system or in a test environment you can do that and it is not a problem, but it is not what we want. The second solution that we found is the usage of ETW. If you check the documentation, there is a way to be notified of the context switch. We are still doing some research here, because the APIs are very complex, and we are trying to work out whether we can legitimately use ETW for this.
That would give us tracing by thread. OK, another feature of the new release is that the driver now fully supports kernel-mode tracing. We have implemented a kernel-mode API that you can import into your own driver, and you can use it to decide what to trace, decide how to trace, and do whatever you want from a kernel driver. But we were not happy with this, because we also wanted to drive kernel tracing from a user-mode application. So we created some IOCTLs with which you can communicate with our driver and ask it to do kernel tracing directly from user mode. We had to overcome a lot of security problems, but now you can do that: from a user-mode application you can even trace kernel code. In this way we are able, for example, to trace the loading or unloading of a kernel module. Or, if you are studying some IOCTL of an antivirus or whatever, you can trace only the IOCTL code, because as you know the IOCTL is executed in kernel mode in the context of the calling process, synchronously, when a user-mode application calls it. Now some quick words about how to use the driver. As you can see the code is quite simple. First of all you have to grab a handle to the device; it is quite easy, the device name is WindowsIntelPtDev. After that you have to fill a structure named PT_USER_REQ, that is the user request, and then you have to call DeviceIoControl to send the IOCTL to the device, specifying the user request. After the trace starts, you can decide when to stop it through our driver, and this is very important, because if you close the application without doing that, the processor is still tracing something, and this could be a problem if you try to unload our driver, because the processor state has not been cleared. Now the driver is able to detect this and to overcome it, but it is good practice to stop the trace yourself.
For multiple processors, you can spawn one user-mode thread per CPU without any problem. From each user-mode thread you have to call DeviceIoControl again and send the register-PMI-routine IOCTL to the device, and then you wait. The only peculiarity is that the wait must be alertable: you need the Ex variant of the wait function, and the alertable parameter has to be set to TRUE every time. After the CPU fills its buffer we get the trace and we decode it without any problem. OK, now it is time for the demo. I do not know how much time we have, so we have to be very fast. OK, I have prepared this for you. As you can see here, this is the code of a very simple application; it only asks the user some questions. Let's try to run it.
The trace target process is our simple application, as you can see here. OK, just a moment. How many CPUs? At the beginning let's use only one CPU. You are asking me to increase the font size. While he is setting this up, I'll recap a little bit. With the driver we presented last year in June, all we had shown was the raw capture of the binary trace.
So those packets, taken/not-taken, target IP and timing, were all we had available, and we had to figure out how to decode them. He is going to show you tracing in different modes and visualizing the result in IDA. OK, let's go. At the beginning, only one processor. OK, the application has ended.
Let's exit and open the executable in IDA. OK, here is the code. Our goal is to trace the software, and we are developing an IDA plugin that does this for us. Let's feed it the text log. Wait a moment. This is exactly the code that did run.
and uh so we use a library from Intel called live ITT it's open source and it provides the decoding of the binary traced to a text file that were personal plugin for right now we have where will integrate the decoding with the plot and again I wish I would like even to show you what
the same what using with the processor environment that's the way is the part of the process
How many processors? Four. As you can see in the summary, each processor has its own buffer with a different number of packets. Look at the
dumps: as you can see they are different, binary files and text files for the processors. Now let me increase the font size, just a moment. As you can see, this is number one. It has all the packets: the target IP, the taken/not-taken, and the packet generation enable and disable, because of the context switches of Windows. Sometimes, if we are not lucky, there are even some logs that are quite empty, because the thread was executed on that CPU only for a small period of time. This is not the case here, because luckily the context switches moved the thread across different CPUs, but
sometimes it happens that even the log of one of the processors is empty. That is our implementation. On the text files again, one thing I want to point out: when you read these text files you can see that for indirect branches and return addresses we actually get the full 64-bit target address. But if you look at the TNT packets, which indicate whether or not a conditional branch was taken, they only store a single bit that determines whether you took the true or the false branch. So you have to recover that later on and disassemble in real time to recover what the target addresses were for those conditional branches. If we have time, I would like to show one more thing.
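The point about taken/not-taken bits can be illustrated with a minimal Python sketch. It is not the libipt API and not the plugin's code: the control-flow table, the addresses, and the function name `replay_tnt` are all made up for illustration, and it assumes every block ends in a conditional branch, whereas a real decoder has to disassemble forward to find the next branch.

```python
# Toy control-flow table standing in for real disassembly. All addresses are
# invented for illustration; each conditional branch maps to its two targets.
TOY_CFG = {
    0x1000: {"taken": 0x1020, "not_taken": 0x1008},
    0x1008: {"taken": 0x1000, "not_taken": 0x1010},
    0x1020: {"taken": 0x1040, "not_taken": 0x1028},
}


def replay_tnt(start, tnt_bits, cfg):
    """Rebuild (source, destination) edges from one taken/not-taken bit per branch."""
    edges = []
    current = start
    for bit in tnt_bits:
        if current not in cfg:
            break  # we left the region this toy table knows about
        target = cfg[current]["taken" if bit else "not_taken"]
        edges.append((current, target))
        current = target
    return edges


edges = replay_tnt(0x1000, [0, 1, 1, 0], TOY_CFG)
for src, dst in edges:
    print(f"{src:#x} -> {dst:#x}")
```

The trace alone carries only the four bits; the addresses come entirely from the table, which is why the decoder needs access to the binary being traced.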
This is the mode that uses kernel tracing directly from user mode. For this I have chosen the ACPI driver, not for a specific reason; it was randomly chosen, and I have found that the system calls its interfaces a lot of times. Let's try to do it.
OK if if let's say
Yes, acpi.sys. The number of processors: just one this time. OK, now I'm tracing; I do something and then stop.
I do, for example, some movement on my PC, and then at some point in time we say stop. This
time we were quite lucky because, as you can see, processor number one has captured some packets. Let's see what
are those pockets OK as you can see it and that is not so a lot of a lot of time uh reduce a lot of things to reduce the the about something he's as being taken let's try to use it however blogging tool be able to face what's because the the the code of NCBI unfortunately ad nauseam of because he's on and on and and fight the areas of new or following the standard but that's right OK the bloody Morgan you can see if you can see now because the guy entities and never been quoted has been already quality in the in when you have a switch it on your system but the PC that or that is out of line qualities that the spatial this could be any use again or whatever that's like to go yes these are
I use again you can see just a moment that we the these are the should be blindly without knowing anything about the interface of the driver we can probably say that these any idea you yet because the code is executed a lot of lines because for the color of the countries that have and it means that the CPU as it's good is a functional defines is again see that our pattern-based the holiday branches if is the optimal use if it's not can and you can see that is the branch and all the branches of the place and that is the medium of of today and get the from user models from gamma motif you develop your by that you had even been in year would to trace the driver and 3 or and I don't know the routine I can show that I now because for us we don't have time in 2nd I my computer is regionally analogies environment because for doing that of course that we use we can't use our Sun Yat library but expensive or the if you do that in IEEE write your comment and you signed your pen the diver you and you can that you can do whatever you would like the OK let's go to the back
to the slides. OK, yes.
So, as a recap: the driver now supports kernel-mode and user-mode tracing. You can filter based on CR3, so a single whole process; you can trace the entire kernel space; or you can isolate ranges of contiguous instruction pointers, up to four different ranges. So I am now going to demonstrate this in a practical, real-world scenario inside of a fuzzer. We have a fast tracing engine, and you will see some performance numbers here, but in the manuals they are targeting 5 to 15 percent trace overhead for the entire system. So per core you should be able to trace both kernel-mode and user-mode operations with only 5 to 15 percent overhead, and I will actually be able to show you how that works in a fuzzer. So how do we use this for vulnerability discovery? Who here is familiar with American Fuzzy Lop or has used it? Do we have people who do fuzzing in the crowd? OK, a good portion. In the last few years we have seen an evolutionary jump in fuzzing technology. Basically we have gone from dumb fuzzing or grammar-based fuzzing, which was unable to determine whether or not the samples being generated were useful, to applying a new technology, or re-engineering an older technology into something that is performant enough to be used, which we call evolutionary fuzzing. We take the idea of dumb fuzzing mutation and combine it with the ability to collect a feedback signal using code coverage, and then we assess the fitness of each new randomly generated input against the entire lifetime of your fuzzing cycle. Basically, we can look at this code coverage information to determine whether a newly generated input actually gets us to a different part of the code, and if it does, we introduce it into our pool of samples and continue to mutate and evolve those as we go. So over time we are refining our set of inputs and building a corpus, and each one of those
inputs exercises a slightly different part of the code. In effect, what we have seen over the last three or four years is that this has become available, and it highly optimizes your compute time compared to dumb fuzzing. The last couple of talks I have given focused on this technology, so I encourage you to go look at the previous slide decks, which are on the REcon website or my website. Basically, through researching this, the result is that the main thing we need to effectively deploy this technology is a fast tracing engine, and of course that was the inspiration to look into Intel Processor Trace, because the promise of 15 percent overhead against closed-source binary software is pretty incredible compared to the technologies we had available before. Hardware tracing is not new. Since the Pentium 4 there has been the ability to do hardware tracing with a mechanism called Branch Trace Store (BTS), which works in a similar fashion but was not designed in a way that was optimized: it wrote to physical RAM, it polluted your cache, and things like that, so we saw a massive slowdown. Then you have another option, called Last Branch Record (LBR), which is only 32 registers in modern processors, and those only give you the last 32 branches. So you have to interrupt every 32 branches to save them, or you have to write a driver that flushes them out, and do other things. So while hardware tracing is not new, this is designed for the first time to be highly performant. Up until Intel Processor Trace it was actually faster to use software-based tracers, such as dynamic binary instrumentation using DynamoRIO or Pin or something like that. So now we have this fast tracing engine, which is great, but we also need fast logging, which is something we get out of the design of AFL. AFL uses a bloom filter that allows you to quickly look up whether or not you have already seen the same code coverage.
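The coverage-guided feedback loop described above (mutate, run, keep the input if it reaches new code) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_target` is a hypothetical stand-in for a traced program that reports which blocks it hit, and the single-byte mutation is far simpler than the strategies a real fuzzer such as AFL uses.

```python
import random


def toy_target(data: bytes) -> set:
    """Hypothetical stand-in for a traced program: returns the 'blocks' it hit."""
    covered = {"entry"}
    if len(data) > 0 and data[0] == ord("P"):
        covered.add("magic_1")
        if len(data) > 1 and data[1] == ord("T"):
            covered.add("magic_2")
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Dumb mutation: overwrite one random byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def evolve(seed: bytes, iterations: int, rng: random.Random):
    corpus = [seed]
    seen = set(toy_target(seed))
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        coverage = toy_target(candidate)
        if not coverage <= seen:      # fitness test: did this input reach new code?
            seen |= coverage
            corpus.append(candidate)  # keep it so later mutations build on it
    return corpus, seen


corpus, seen = evolve(b"\x00\x00\x00\x00", 20000, random.Random(1))
print(len(corpus), sorted(seen))
```

The important design point is that an input is kept only when it adds coverage, so the corpus stays small and each member exercises something the others do not.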
So instead of parsing a text file or a binary file that is just a list of addresses of basic blocks in sequence, we actually parse the trace in real time and fill the bloom filter, so you can just check whether one 64 KB region of RAM is identical to another 64 KB region instead of comparing each address. Through some of the research, there have been other attempts at evolutionary fuzzing. Starting about 2004 or 2005, Jared DeMott did some of his PhD research on this and released something called the Evolutionary Fuzzing System, but it was based on the BTS recorder or debugger breakpoints, which is quite slow for tracing, and it was also over-engineered, trying to incorporate too much of the research from the evolutionary biology side of things. So the key is to have this fast tracing engine and efficient logging, and to keep the analysis to a minimum. In 2013 Michal Zalewski, who has contributed a lot of great stuff to our industry, produced the first performant open-source evolutionary fuzzer, called American Fuzzy Lop. It uses a pretty comprehensive list of fuzzing strategies, whether that is bit flipping, byte flipping, word flipping and so on, crossovers, and various types of mutation. It originally used block coverage via a post-processing step in the GCC compilation: it would compile your code to assembly and then annotate the assembly to add callback hooks at every basic block entry point, and then assemble the modified source code. That is how you got code coverage. Then, as I mentioned, it takes those edge transitions, basically shifts and XORs them together, and then increments offsets into this bloom filter or byte map, so it is basically able to track whether or not you have seen this edge
before. Now, you can't get back out what those addresses were originally, because it is simply an offset into this map, but you can very quickly look up "have I been here before?", and that is all we care about as far as keeping it simple. Of course it was written on top of the POSIX API, so it wasn't Windows-compatible out of the gate. The benefits are that it tracks edge transitions and not just block entry, and it uses the bloom filter. It has a fork server built into it: basically, after your process is initialized, it waits until all your libraries are loaded and all the linking is done, and then once you get to the parsing code it forks, so you skip all the initialization time, which is an optimization. And then, very importantly, he introduced persistent-mode fuzzing, which is an in-memory type of fuzzing where you are not exiting and recreating a process every time. You give it a pointer to a function and the number of arguments of the function and say: once you exit this section of code, start over again and take new inputs as inputs to this function. That also reduces the amount of code being traced and executed down to the minimum, as an optimization. And importantly, you can use this to build a corpus on open-source software. It is very fast, so you can use those inputs in your pipeline for slower or more heavyweight analysis or other types of fuzzing. The way it does the tracing is that every block gets a unique ID, the edges are indexed into that byte map by a hash that is a shift and XOR, and then we increment the map. This was great. I was looking into this and wanted to optimize it and bring it to the Windows platform. Obviously we can't use that kind of instrumentation for the majority of the software we are targeting, so we needed something that could do binary targeting, and so this seemed ideal versus the other options of using Pin or DynamoRIO and so on. So last summer I started looking at this.
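The shift-and-XOR edge map described above can be sketched as follows. The indexing scheme (current ID XOR previous ID, with the previous ID shifted right by one) follows AFL's published design; the `block_id` function is a simplifying assumption made here for illustration, since AFL assigns random IDs at compile time, and the helper names are hypothetical.

```python
MAP_SIZE = 1 << 16  # 64 KB map, the size mentioned in the talk


def block_id(address):
    """Hypothetical block ID derived from the address (AFL uses random IDs)."""
    return (address >> 4) & (MAP_SIZE - 1)


def trace_to_map(block_addresses):
    """Fold a sequence of visited blocks into a per-run edge hit-count map."""
    coverage = bytearray(MAP_SIZE)
    prev = 0
    for address in block_addresses:
        cur = block_id(address)
        index = cur ^ prev                              # one slot per (prev, cur) edge
        coverage[index] = (coverage[index] + 1) & 0xFF  # 8-bit counter, wraps around
        prev = cur >> 1                                 # shift so A->B differs from B->A
    return coverage


def merge_if_new(global_map, run_map):
    """Mark newly seen edges in the global map; return True if any were new."""
    found_new = False
    for i, hits in enumerate(run_map):
        if hits and not global_map[i]:
            global_map[i] = 1
            found_new = True
    return found_new


global_map = bytearray(MAP_SIZE)
first = merge_if_new(global_map, trace_to_map([0x1000, 0x2000]))
repeat = merge_if_new(global_map, trace_to_map([0x1000, 0x2000]))
reverse = merge_if_new(global_map, trace_to_map([0x2000, 0x1000]))
print(first, repeat, reverse)  # True False True
```

Because the map stores only hashed offsets, the original addresses cannot be read back out of it, and two different edges can collide; the trade-off is a constant-size structure that answers "have I been here before?" very cheaply.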
Around the same time as our talk last year, Ivan Fratric from Google, who is part of Project Zero at Google security, released WinAFL, which is a port of Michal Zalewski's AFL to Windows using DynamoRIO as a back end. If you are familiar with Pin or DynamoRIO, they are basically loaders for your program, and as you visit each new basic block they build a code cache that allows you to modify it in real time. Valgrind works the same way, if you are familiar with that. So it was used as a back end, and it is really cool; it works. It was the first thing where you could just go download it right now and start fuzzing Windows GDI within five minutes. Beautiful. The biggest thing that allowed it to be performant is that it uses this persistent mode where it doesn't restart the process. These loaders like Pin and DynamoRIO have to disassemble your program to instrument it. I did some experimentation on trying to do forking in Windows in previous talks, and it turned out to be just a world of pain. Using persistent mode, you don't have to reinitialize the process and you don't have to redo the disassembly every time, because you are reusing that code cache. So WinAFL turned out to be pretty well engineered. Basically you can tell it how many iterations to persist, for example do a thousand iterations and then go ahead and exit and restart the process, so that if we have any memory leaks or are not quite cleaning up properly, we handle that through the delayed restart. Persistence is key because of the cost every time you load this. If we were to restart every time, we would get two executions per second on this GDI+ demo I am going to show you; if we persisted 100 times before restarting, 72 executions per second; and so on. It reaches its ceiling somewhere around a thousand or so iterations. So we have now integrated
our PT driver into WinAFL as an alternative tracing engine. This brings some problems. The reason I showed you the text version of the dump was that we don't have all the addresses in the log files, so we have to recover some of those along the way. We also don't have persistent mode working quite yet, unfortunately. I have done some experimentation here; it is around the corner, and by the next time we present, at Hack in the Box, this will be available. I am building my tooling on top of Alex Ionescu's great work from last year's REcon on the Application Verifier hooking system. Basically, I am using the IP filtering mode, so you can specify up to four DLLs that you want to trace, or four modules or address ranges in the process that you want to trace. The current status is that we do accurately decode the full trace, doing the disassembly online and using a cache of the control flow graph as it is recovered. I first look to see whether we have already resolved this upcoming conditional branch; if so, that is a quick index lookup. If not, I have to disassemble forward to determine the targets of the conditional branch and store those in the structure. The edge source and destination are recorded as expected, so we are not reduced to basic blocks; we actually do the edges. Currently we are just tracing per process, so we are doing this iteratively rather than in persistent mode. In order to determine the performance of the tracing mechanism, I first made a dummy looping benchmark. Basically it was just creating a process and waiting for it to exit, so we got a kind of maximum bound on how many iterations we will be able to execute with the sample. In this case I found that we could get 85 executions per second without doing any tracing, just generating the fuzzed input and running it, without parsing the log file or anything like that. Once we enabled tracing, that was reduced to 72 executions per second,
which is right at that sweet spot of 15 percent overhead that Intel promises, for this particular sample. Parsing the log file was an additional 22 percent overhead, so now we are down to 55 executions per second. I will demo what this all means for you here.
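As a quick check of the arithmetic behind these figures, the following Python sketch recomputes the overheads from the three throughput numbers quoted in the talk (85, 72, and 55 executions per second). The helper name `slowdown` is made up for illustration, and which baseline the quoted 22 percent refers to is not stated, so both readings are shown.

```python
baseline = 85.0  # executions/second, no tracing (figure quoted in the talk)
traced = 72.0    # executions/second with Intel PT tracing enabled
decoded = 55.0   # executions/second with tracing plus log parsing


def slowdown(before, after):
    """Percentage of throughput lost going from `before` to `after`."""
    return (before - after) / before * 100.0


tracing_cost = slowdown(baseline, traced)                    # about 15.3 %
decode_vs_baseline = (traced - decoded) / baseline * 100.0   # 20.0 % of the untraced rate
decode_vs_traced = slowdown(traced, decoded)                 # about 23.6 % of the traced rate
total_cost = slowdown(baseline, decoded)                     # about 35.3 %

print(f"tracing: {tracing_cost:.1f}%")
print(f"decoding: {decode_vs_baseline:.1f}% of baseline, {decode_vs_traced:.1f}% of traced rate")
print(f"total: {total_cost:.1f}%")
```

The quoted 22 percent for parsing sits between the two ways of measuring it (20.0 percent relative to the untraced rate, 23.6 percent relative to the traced rate), which is consistent with a rounded, spoken figure.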
To compare, I want to show you fuzzing GDI+. This is the experiment that comes with WinAFL out of the gate: you just pass it image files and it uses Windows to render them, without rendering to the screen. This is a live demo of that working. Currently this is using DynamoRIO and persistent mode with the maximum number of iterations possible, and we see that it is getting 127 executions per second. The lighting is not good here, unfortunately, but hopefully you can see it a little bit; I can increase the font quickly here. What we are looking at is this number specifically: we get 126 executions per second using DynamoRIO. So let's see how the Windows PT driver performs in comparison. OK, so what we are seeing is the overhead of 15 percent for tracing and then the additional overhead of 22 percent for decoding. This will creep up a little bit, but we are only getting about 40-something executions per second. That is a little bit disappointing; we were hoping to see something a little bit faster. You have to keep in mind, again, that this is just iterative tracing; we are not doing in-memory fuzzing. Once we get to in-memory fuzzing, this number will increase significantly. However, this is not the end of the story. Originally I was doing all my testing against this setup with the GDI+ wrapper, and as you can see in my command line here, this is tracing only the Windows GDI+ DLL in the process. So I have another demo in order to compare the performance, and this time we will trace libpng using WinAFL. This is just libpng statically compiled into a small harness that loads a PNG file, parses it, and passes it through libpng. We see that the performance becomes quite abysmal, actually. This is with DynamoRIO, and we are seeing only about half an execution, 0.5 executions per second. So this is quite slow; obviously it is not where we want to be. This is because, since this is statically compiled, all the code involved with libpng, including the compression code and things like that, is included here, which causes some issues in the DynamoRIO back end. And this is the fun part: we have a constant overhead when we use Intel PT. So using Intel PT we can see that we are back up, and it will slowly creep up to about 55 to 60 executions per second. So instead of being only half an execution per second, this is 100 times faster than the DynamoRIO back end, and it is doing code coverage tracing and finding new paths and so on. So we have got a hundred-times performance increase, depending on your target application, and I did not specifically choose this one; it was chosen randomly, I had this fuzzing harness around. So that is my demo of the integration into WinAFL. Thank you. And so, just some closing remarks now.
All the code that we have written is already open source or is going to be open source. We are on GitHub under /intelpt, real simple. The driver is already there, and actually a pull request was submitted last night, so the latest version is online and will be merged. The WinAFL work needs a little cleanup; I was up until 3:30 last night making my final preparations, so that will be up next week. Then we just have a few more things that we need to address. But obviously we see that code coverage is finally being harnessed to make our fuzzing better. We can load this information into IDA to aid our analysis of a crash or of malware or whatever it might be, and using the hardware-supported tracing engine we don't have the issues that you have with other software-based instrumentation and hooking engines. One of our future plans is to get this into a hypervisor so that we can trace the guest from inside the hypervisor, and then the tracing will be basically unobservable and you won't be able to disable it. In that vein, there are capabilities for deploying Intel PT to trace things like SGX mode and SMM, and those are future areas of work. My goal is of course also to get this fully supported with persistent mode and everything as well. One thing we need to do as part of that is to finish the ETW-based thread context-switch awareness, because we need to separate these logs out per thread. Otherwise you have to use the timing information to determine where the synchronization is between the threads, and that slows down the parser a lot. So our goal is to get the logs individualized before you do your parsing, so you only have to parse the thread that you care about, the one doing your file or network I/O and so on. And I just wanted to say that the new beta that we are going to release has, for
example the KSaver feature and fidelity you can test data that at the end it's quite a pool of features
OK, so you can get this code, and you can reach us on Twitter. Thank you very much.
I think we might have a minute for questions, just on time.
Thanks. Can you trace inside a virtual machine? So currently there are no hypervisors that expose this in virtualized form. For example, with the other hardware tracing modes the hypervisor has to virtualize the support, and that does exist, for example in VMware using the BTS mechanism. But Intel PT is rather new, so they don't have that support available yet. So we are going to have to modify either Xen or KVM, and we have actually been talking just today about perhaps being able to trace the entirety of the hypervisor and then later on pull out only the user-mode processes or kernel threads that you are interested in. So not currently, but we will absolutely continue working on this, and that will hopefully be applicable to Cuckoo Sandbox and so on. Thanks. Feel free to grab us. Thank you very much for your attention.