Physical Memory Forensics for Files and Cache

Video in TIB AV-Portal: Physical Memory Forensics for Files and Cache

Formal Metadata

Physical Memory Forensics for Files and Cache
Title of Series
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Physical memory forensics has gained a lot of traction over the past five or six years. While it will never eliminate the need for disk forensics, memory analysis has proven its efficacy during incident response and more traditional forensic investigations. Previously, memory forensics, although useful, focused on a process' address space in the form of Virtual Address Descriptors (VADs) but ignored other rich sources of information. In the past, some techniques of process reconstitution have been auspicious at best and erroneous at worst. This presentation will build upon lessons learned and propose more thorough ways to reconstruct process contents, and therefore a process' address space. By using the methods presented, it will be possible to further reduce the data you care about in an incident response or forensic investigation and to better apply the traditional computer security techniques such as reverse engineering, hash matching, and byte pattern or signature matching such as those provided by ClamAV and VxClass.
Welcome. This is Physical Memory Forensics for Files and Cache. I'm Justin Murdock, and to my right is Jamie Butler.
Just a bit of background information: Jamie Butler is the director of research and development at Mandiant. He's focused mainly on host analysis and operating systems research. I'm a computer science major at the Rochester Institute of Technology, but I'm currently on co-op at Mandiant, working on their enterprise product as a software developer.

So we're going to speak today about physical memory forensics, and this is the layout of the talk. First we're going to go over traditional forensic methods as background, and then move on specifically to memory forensics. Given that background, we're going to speak about the issues in existing tools right now: a lot of them are missing important information and often misattribute data to executables. Most memory forensics also deals with files, so we're going to speak about memory-mapped files, reconstituting binaries and data files, and specifically the role that cache plays in this process. Then we're going to talk about possible applications of our new techniques and show you a couple of demos. Then we're going to speak about our new tool that uses these techniques, which we're going to be releasing pretty soon, and wrap up with some further work that needs to be done in the area.

So, traditional forensics, as a broad overview: a host has two large sources of information for forensics, the disk and memory. Lately memory has become a great way to triage a host in, say, a forensics investigation. The reasons for this: the average size of disks is growing extremely fast, so most hard drives out there come with at least 250 gigabytes, and I know people in this room probably have way more storage space than that. So searching through that whole image, or just dumping a full copy of the hard drive, is getting to be a much longer process, and memory can really help you speed up what you're looking for. It's relatively small comparatively, so you can scan the whole space pretty quickly. Also, for intruders to get their code running on a system, they have to load it into memory, and in almost all cases out there they aren't covering their tracks; they aren't cloaking their memory footprint, because it's really just too much work in most cases. Also, many of the artifacts that the kernel needs to load the program into memory can be used to gain a lot more information about the executable.

So specifically for memory forensics: memory is divided into two basic sections, userland and kernel memory. This talk is going to focus on userland memory, again because most attacks, most intrusions, focus on userland memory. It's easier to get execution and it's more resilient to coding errors. Basically, if you're developing an attack that runs in the kernel and you've got a couple of bugs in your program, all of a sudden you crash the system, so it becomes very costly to develop these kinds of attacks.

Memory forensics traditionally focuses on recovering all the binaries out of memory, so all the executables and DLLs. This is one of the main focuses of any investigation, and most of these tools rely on virtual address descriptors, or VADs. These describe the process's address space in memory, and they're made up of these objects. As you can see, each has a pointer to a left and right child, since VADs are usually kept in a tree structure, and each VAD also contains the starting address and the size of the memory region, along with a pointer to the control area. Here's a representation of a typical VAD tree. You can see it starts with a VAD root, and each one of those VADs contains information about the virtual addresses of the process. So traditionally you would scan physical memory for an EPROCESS block, which lets you know that there's a process at that location. From there you get the directory table base, or DTB, in the EPROCESS, and this will
help you translate from virtual to physical addresses. From there you locate the root of the VAD tree and step through the tree, translating the virtual addresses to physical: you usually start with the starting address, take the size, and grab all the data in between. Some other tools also utilize information from the PE headers to reconstruct the executable, using their knowledge of the different sections inside. An alternate approach is to just use the DTB to brute-force the whole address space: it starts at a beginning address and goes all the way through to the end. This has its limitations. On a 32-bit system this pretty much works, because you've got an upper bound of about 4 gigabytes, but on 64-bit the size can be enormous. This also leads to some misattribution of the data, because the virtual addresses could translate globally, not specifically for that process. And this leads us to the problems in the existing tools being used right now. With that, I'm going to hand it off to Jamie Butler.
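The traditional VAD walk described above can be sketched roughly as follows. This is a minimal illustration, not the speakers' tool: `VadNode`, `walk_vad_tree`, `translate`, and `acquire_region` are hypothetical names, the page table is a toy dictionary standing in for real DTB-based translation, and region sizes are assumed to be multiples of the 4 KB page size. Pages that fail to translate are zero-filled and counted, which is the gap the rest of the talk sets out to close.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional, Tuple

PAGE_SIZE = 0x1000  # 4 KB pages (assumption for this sketch)


@dataclass
class VadNode:
    """Hypothetical, simplified VAD; real nodes hold more fields and vary by OS version."""
    start: int                          # starting virtual address of the region
    size: int                           # size of the region in bytes
    left: Optional["VadNode"] = None    # left child in the VAD tree
    right: Optional["VadNode"] = None   # right child in the VAD tree


def walk_vad_tree(node: Optional[VadNode]) -> Iterator[Tuple[int, int]]:
    """In-order walk yielding (start, size) for every region in the tree."""
    if node is None:
        return
    yield from walk_vad_tree(node.left)
    yield (node.start, node.size)
    yield from walk_vad_tree(node.right)


def translate(page_table: Dict[int, int], vaddr: int) -> Optional[int]:
    """Toy stand-in for DTB translation: virtual page number -> physical page base."""
    base = page_table.get(vaddr // PAGE_SIZE)
    if base is None:
        return None  # page does not translate in this process context
    return base + (vaddr % PAGE_SIZE)


def acquire_region(page_table: Dict[int, int], physical_memory: bytes,
                   start: int, size: int) -> Tuple[bytes, int]:
    """Read a region page by page; untranslatable pages are zero-filled and counted."""
    out = bytearray()
    missing = 0
    for vaddr in range(start, start + size, PAGE_SIZE):
        paddr = translate(page_table, vaddr)
        if paddr is None:
            out += bytes(PAGE_SIZE)
            missing += 1
        else:
            out += physical_memory[paddr:paddr + PAGE_SIZE]
    return bytes(out), missing
```

Calling `acquire_region` for each `(start, size)` pair produced by `walk_vad_tree` gives the per-process dump that the traditional tools produce; the `missing` count shows how much of a region could not be recovered through the process's own page tables.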
So as Justin mentioned, the traditional approach in the current memory forensic tools is really to take a virtual address, its base and a size (even if that virtual start is zero and the size is 4 GB, you're going to brute-force across the whole thing), and do the translation in the context of the process that you're trying to analyze. Every process, as the previous slide showed, has a directory table base, and that is used for the virtual-to-physical translation, and that should tell you what's in the process context. Well, what we found in our research is that there's a lot of data that's actually missing if you do this. For instance, the first thing that we encountered is when we're trying to reconstruct a process: the attacker is usually injecting code, in the form of an injected DLL or whatever, into the process address space. So in order to analyze that and detect it, we need to be able to translate the code for that DLL. Well, the operating system, the Windows loader, is going to load the DLL as a memory-mapped file. Memory-mapped files are stored in a special way, because basically the OS doesn't want to waste space. What do we mean by that? Well, on a Windows host, every user-mode process most likely has the DLL ntdll.dll mapped into its address space. If the OS did not use memory-mapped files, or share files across the address spaces of all processes, then that DLL would have to be replicated for every single process that loads it. Obviously that would waste a lot of physical memory, and this design was thought up probably back in the 16-bit Windows versions, when there wasn't a lot of memory to waste in the first place. Plus it's also just more efficient; in today's world, where we're greener, let's not waste memory. So these memory-mapped files are shared across all processes, even if they're only used once. Because of this, they may be in your
process address space, but the address that they represent may not translate in your page table entries. I won't go into the depths of how you do virtual-to-physical translation; if you want to learn more about that, there are slides on the internet and so forth, and you can google those. But basically the page table entry sits in the very last table structure you'll reach during virtual-to-physical address translation, and when you go to read it to find out where the physical page is, lo and behold, it's all zeros. So that doesn't tell you anything. That meant typically we just had to ignore that region, because we couldn't get access to it. Here's an example taking the Honeynet Project Challenge 3, if you're familiar with that; there was a memory image that came out about a year and a half ago. I acquired these files out of the memory image, and I just chose files at random. You see the file size there in the first column, and then the bytes acquired with the traditional approach, so that's using the VADs: a starting virtual address, an ending virtual address, and the DTB of the process to acquire it. That's how much of the file we could acquire. Then we used a different technique that we're going to cover in the last half of this presentation, which uses what are called file objects, which represent the memory-mapped files. So for instance, with the ace DLL we got 70 percent of it using the traditional method, but we were able to increase that to 93 percent using this more accurate method of the file objects. Also, we'll talk in a moment about how this number may go up to actually 100 percent if you're running on a live system rather than on a memory image, because you have access to the disk. The second
problem that we ran into: we come from a background where we build products and tools to do incident response. In that context, and probably in some of your more traditional forensic investigations, determining exactly which process is infected is important. Knowing the whole host is infected is perhaps interesting, but then the next question you're going to ask is, well, which processes, or which artifacts within that host, are infected? Because they may lead to things like user accounts that have been compromised. Also, processes have a creation time, so they may tell you roughly the time of the infection, if the attackers were creating new processes and so forth. So attribution is important to us. One of the issues with the traditional approaches, especially with the brute-forcing method, is that if you brute-force over the global address space, as Justin alluded to, there are areas of the address space that are global for every process. Basically, when you cross from virtual addresses in userland into virtual addresses in kernel land, most kernel addresses will translate appropriately in every context's address space. The reason this works: you can read Mark Russinovich's Windows Internals books and take those to bed at night, and they will keep you warm, but basically if you do that over a couple of years you'll figure out that the kernel addresses appear global because Microsoft wanted to save entries within the cache, the actual cache that's on the chip, like your L2 cache. They want to save cache lines, so they have these global addresses and save time on context switching, so they don't have to flush the cache every time, and so on and so forth. So kernel addresses are basically global. Here is a graphic that someone else stole from the Windows Internals books, that we stole from that person off the internet, because I don't like to draw. This is a virtual representation of a
32-bit system, and basically all I want to show here is that you'll see there at 0xC1000000 is the beginning of the system cache. That is a cache that the operating system is keeping, which is different from the L1 and L2 caches that the CPU is keeping. This cache is going to be used for things like file I/O. If you request to read a page of a file, the operating system is going to assume that you're probably going to want to read more than one page, so it's going to read ahead, and that read-ahead it's going to cache for you. It's assuming locality of reference, so all your future reads statistically should be relatively close to where you're currently reading in typical programs. Those go into the cache at that virtual address. Well, if we acquire memory, or acquire a process, and we're brute-forcing, basically anything in the cache will appear to be in every single process. So that's really bad for attribution.

So let's talk about ways to make this process better. We're going to use file objects, and file objects can represent a number of different things. They can be, for instance, memory-mapped files, which we touched on, which will include DLLs and EXEs. They also include data files, which may not be mapped into memory but are in the cache in different places, like a Word document or PDF, a registry hive, or web history. We've seen restore points; on Windows XP those were nice, restore points that we can actually see after the attacker attacked, so we could see what was happening on the system and what they installed because of the restore points. So the VADs are still interesting to us, but we're going to utilize a little bit more of the data that they make available. A VAD describes a range of memory that the file occupies, and if it's a memory-mapped file, or if a file object represents that region of memory, then the VAD will have what's called a control area. If you're very familiar with WinDbg, you'll be used to these structures that we're talking about. The control
area will have a pointer back to the file object. So we're finding VADs in memory because we found the EPROCESS block; once we find the VAD, we parse it to find its control area; once we get to the control area, we parse it to find its file object. Now, file objects contain some useful data, including the device name; that would be things like HarddiskVolume1, as you see in this example. You can then translate that for the OS in question; for your host that's probably C: or whatnot. They also contain the file name itself. And then the thing that we're going to talk about today is a table of three pointers, depending on where the file data is actually backed. The three things we're going to look at are image section objects, data section objects, and the shared cache map. Here's a graphical representation of a file object. It contains a pointer called section object pointers, and the section object pointers structure only has three members; WinDbg will tell you all this. So, image section objects are very interesting to us in forensics and IR because they represent binaries that are loaded in memory. If a binary is loaded, the Windows loader is going to create a section object for that binary. But this pointer, this image section object
thing, is not actually a pointer to a structure called an image section object; that doesn't exist. It's actually a pointer to a control area, so yet another control area. That control area will have a pointer to what's called a segment object. This segment object can be used for sanity checking. If you were just scanning through memory, you may find some artifacts that are no longer in use and aren't actually usable, and if you try to parse them in certain code you'll probably crash. So we'll use this for sanity checking. The segment object will contain a segment size and the total number of PTEs represented by that segment. I'll talk about PTEs in a moment, but basically as a first check you can compare: the segment size should be equal to the total number of PTEs times the page size. The page size in Windows can vary slightly, but for the most part you can assume that it's 4 KB, or 0x1000. The things that you're going to want to be parsing, if you're trying to dig the binary data out of memory, are the subsections. Subsections represent the individual pieces of a file. How many of you have ever looked at a PE, a portable executable, in PEview or LordPE or some other viewer? Basically what you'll see in there is that the PE file is broken into a bunch of sections. There's a code section, usually called .text, there's a data section, there's a resource section, and there are some different things like a relocation section. All these sections have a relative virtual address within the PE, and they also have permissions. So once this thing loads into memory, since each section within the PE file probably has different permissions, there has to be a subsection object to represent every single section within the PE. We couldn't find a pointer that would show us where the subsection object was. So what we found is that if you just
stare at the hex for long enough, the math starts to come out at you. Basically, the subsections all seem to line up at the very end of the control area that we just found. So although the segment object was way away in virtual memory, the subsection object was immediately following the control area. That was nice for us. Basically, for every version of the OS, you can just determine how large the control area is, add that to the base of the control area, and then cast that as a subsection object. The subsections contain an array of prototype PTEs. This is the crux of the data that we have to parse in order to get the binary out of memory. The prototype PTEs contain the physical address of each memory page in physical memory. We'll have a graphic in a moment that may be a little more useful for you, but I'm not sure. Basically, though, if the prototype PTE contains the virtual address of the subsection object that it's within, then all bets are off: this page of memory is on disk. And when I say on disk, I don't mean the page file. You've probably heard arguments back and forth about whether you can acquire the page file and use that for memory forensics too. I personally believe you can't; you can only use the page file at runtime. But even if you were able to take offline page files and marry those up with offline memory images, you wouldn't be able to access the data that's represented here, because memory-mapped files mean exactly that: they're memory mapped. If they're paged to disk, they don't go to the page file; they are represented by themselves on disk. So if you want to get that data, you have to go to the location of ntdll.dll on disk, at sector 15 or whatnot, and read the data. So the page file is not useful in this case. Something else that's within this subsection object is the number of full sectors and the number of prototype PTEs for the subsection. So again, going back to disk, when we're looking
at the PE on disk, the disk sectors are 512 bytes, right, and the page alignment in the Windows OS is 4 KB, so there's a little bit of a fix-up that you have to do. You have to take into account the total number of PTEs that this subsection represents, but also the total number of full sectors that it represents, and the structure contains these numbers. You use that in order to parse it correctly, because if you were just reading PTEs blindly, you would get more data than is actually within that subsection, and the file in memory wouldn't line up like it does on disk. So we're going to use these full sectors to our advantage. Basically, by keeping track of where we are when we walk these sectors in memory, 512 bytes at a time, we will know the offset that we need to read if the file is paged to disk. Using that, we can get all the data if we're running on a live system. Another thing of interest is that there's probably more than one subsection, especially if it's a PE file, so there will be a pointer to the next subsection, and you basically just chase the chain. So this is just a nice graphic of
what that looks like conceptually in memory, with our array for each subsection. Now, data section objects: they
represent data files in memory. I'm not sure which types of files load themselves as data section objects; I know that PDFs don't, but Word documents do. So if you're loading Microsoft Office documents, they're going to look like this in memory. The data section object structure is actually exactly like an image section object, except that a few sanity checks don't work in this case. Since they're the same, and data section objects and image section objects point to the same structures, we can surmise that data access for a Word document would be just as fast as any data or code access for an image section object. So Word documents are going to perform as well as DLLs, because their structures are the same. It's a shortcut to get to the data. It's not relying upon the cache, which is somewhat subjective: the OS decides how to load the cache, and when to flush it, based upon utilization on the system and the resources available. If you have a data section object, that's not the case; all your structures are right there and you have quick, immediate access to the data. I thought that was kind of cool; maybe Adobe should change how they load their files.

So we've covered image section objects and we've covered data section objects. The last thing we'll touch on here is the shared cache map. The shared cache map is used to represent the file data in the cache. If all other bets are off, if you don't have an image section object and you don't have a data section object, then you should go parse the cache structures in order to get as much data as is available. Again, that may be nothing, or it may be more or less the whole file; it really all depends. The shared cache map structure is actually defined by Microsoft in WinDbg. It contains the file size and the amount of valid data within the file within the cache. The valid data should never be larger than the file cache,
or rather the file size, because if that happens, then you're actually looking at uninitialized data. It also contains an array of pointers to VACBs. VACBs are virtual address control blocks; these are the structures that define where the data is in virtual memory in the cache. If the file is one megabyte or less, Microsoft decided to put in a performance improvement by embedding an array of four VACBs right into the shared cache map itself. So there's no reason to chase anything; they've just embedded it right there, which works really nicely for them, because often the things in the cache are less than a megabyte. You're probably going to have a lot of web images and so forth in the cache, and they're going to be less than a megabyte in size, so let's keep it simple. Each VACB, I should mention, represents 256 kilobytes, so if you have four of those, obviously you can represent a one-megabyte file. If it's larger than a megabyte, though, we have to go to an array of pointers. What that looks like is a nested structure. If it's one level deep, basically there's one array that points to the VACBs themselves, and it can represent 32 megabytes. I forget the number, but I believe each array has 128 entries. If you read Mark Russinovich and David Solomon's book on Windows internals, chapter 13, I believe, in the newest version, is all about the cache, and they'll tell you how to calculate the depth of this tree. So it's arrays of arrays of arrays of arrays. It gets really fun; I recommend recursion. Anyway, the largest file that you can have on a Windows OS is 2 to the 63rd power, and because of the number of entries in each array, and so on, and the number of blocks that they can represent, your tree can be no deeper than seven levels. Here's what the shared cache map would look like conceptually. The bottom one is just two levels deep. If we parse this, we'll get all the file data that is present in the
cache. So let's talk about some of the applications of this technology. In the past, most of the tools, I believe all the tools that are freely available out there, parse the VAD tree in order to do process reconstitution. But since we care about some data that's in the file system or in the file cache, we should probably also parse the handle table. So we're going to parse the handle table, we're going to parse the VAD tree, and then once we get there, we're going to parse the file objects that we find. The Windows registry hives are a nice example of how you can use this. If you analyze the System process, basically it has a handle to every registry hive that's in memory, so you can get all the data there. Let's say you're going to acquire this to your local hard drive or to your analysis station so you can do parsing with your traditional registry forensic tools. You would want to acquire the System process out of memory; once you have that, it'll
literally write the individual files, by name, to the local hard drive with the content of those hives. Also, PDFs, as I mentioned, are found in cache. We haven't had a ton of time to research all the things we can get out of cache. If you're on a Windows XP system you can get the restore points, because it has a handle to a file that is basically keeping the data for the restore points, so you can get that out of memory in some cases. But, you know, the caveat for the cache is that all bets are off; it's hit or miss whether it will be useful.
The way we're going to use this most likely, in some of the free tools that we're releasing, is to do data reduction. So we have hashing. You know, we beat up on the AV industry all the time about hashing and how it sucks, and that's nice, and I've done it myself. However, hashing can also be useful. I mean, we have over two decades of data, so let's start to use it to our advantage to make the problem simpler. Memory forensics is all about data reduction. We went from a 250-gigabyte hard drive, and now we're down to a 4-gigabyte memory image, probably. Now, in that memory image, on the Vista, no, the Windows 7 machine I looked at, we're going to find over 4,000 files in memory, right, if you don't take into account eliminating duplicates. Everything is going to have a handle or a VAD to ntdll.dll, everything's going to have a VAD to kernel32.dll, and so on and so forth. So now we went from 250 gigabytes that we care about, maybe, at first, down to 4,000 files, and then we do a data reduction to get rid of redundancy and now we're down to about 1,500 files. Well, now the problem's starting to get a little bit more manageable in finite time. If we could use whitelisting and hashing to eliminate the other components of the operating system that we don't care about, that number gets a lot more manageable. There have been some efforts in the past, not my research, but others who've done fuzzy hashing and different
things like that in order to try to make the hash that's found in memory match the hash on disk. The problem with these fuzzy hashing techniques is we don't have the data going back years, because they just take subsections of files, and most whitelisting technologies and so forth use the full hash of the file, so the comparison is difficult to make. You know, it may have benefit in your organization: if you had a gold master that you were deploying across the enterprise, you may be able to do data reduction with fuzzy hashing. However, we found that most people don't have this, and so we went to the image section object and we parsed it. On a live system, that means we can get access to the file system as well if there's a page that's not there in memory, and by utilizing this data we can make the hash that we find in memory match the hash that's on disk. We call this MemD5. It's a cute name one of our co-workers came up with, and it will be released in a free tool I'll talk about in a moment.
Another application for this better process reconstitution, or binary acquisition, what have you, is that there are a lot of tools out there that are starting to use, you know, byte patterns of malware and things like that. There was a tool developed by zynamics, which was acquired by Google, unfortunately, and they killed the project or took it internal; you can't buy it anymore. It's called VxClass. It was kind of cool; I liked it enough that I convinced my company to buy it. VxClass would generate byte-pattern signatures for classes of malware, so families of malware. What VxClass leveraged was that if you had a lot of malware samples, they were reusing code. And if you're doing IR, you have a limited amount of resources, so you probably don't have a large malware team and whatnot, and you don't want to focus all your efforts looking at the last 50 variations of Zeus bot;
you just want to throw it into some automated system and have it spit out "this is Zeus," right? So that's what they did with VxClass. Someone else at the company, one of his PhD ideas, I think, was around generating the commonality between all these things as a byte pattern. So you have a hundred samples and they all say "I'm Zeus"; well, let's generate one pattern that matches all 100 samples. We could use this to search memory, and it was fun, but we got some false negatives. By utilizing the image section objects and so forth, we no longer get the false negatives, so it's more useful to us.
Also, how many people have ever used ClamAV? So, a large number here; they've heard of it, they've used it. Well, ClamAV also has this concept of a byte-signature match for malware. I think the last time I downloaded ClamAV they had released 40 signatures that they classify as byte patterns. I didn't look at what the database is and what those signatures represent, but there are 40 signatures in there that now we'll be able to use in memory analysis. So we're trying to take the tools that we already have and apply them in a triage scenario.
So we have a few minutes left here. I'm
just going to cover some demos quickly. I apologize: my laptop died right before I came to Black Hat, so I had a fast machine and now I have a really crappy one. So anyway, basically I won't run the acquisitions on this hard drive, because it would take about six minutes
per sample. But if you have
Memoryze you could do this at home once we release the new version of the tool. Here we're going to run a process acquisition, pull the binaries out of memory, and then we're going to look at what we got.
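The VACB arithmetic quoted earlier in the talk can be worked through in a few lines. This is only a sketch built from the figures the speaker gives (256 KB per VACB, four VACBs embedded in the shared cache map for files up to 1 MB, 128 entries per pointer array); the constant and function names are made up for illustration and are not taken from any Windows header.

```python
VACB_BYTES = 256 * 1024      # data mapped by one VACB (figure from the talk)
EMBEDDED_VACBS = 4           # VACBs embedded directly in the shared cache map
ENTRIES_PER_ARRAY = 128      # entries per pointer array (figure from the talk)


def vacb_index_levels(file_size: int) -> int:
    """Return how many levels of pointer arrays a file of file_size bytes needs.

    0 means the four embedded VACBs are enough (file of 1 MB or less).
    """
    if file_size <= EMBEDDED_VACBS * VACB_BYTES:
        return 0
    levels = 1
    capacity = ENTRIES_PER_ARRAY * VACB_BYTES  # one level covers 32 MB
    while capacity < file_size:
        capacity *= ENTRIES_PER_ARRAY          # each extra level multiplies by 128
        levels += 1
    return levels


if __name__ == "__main__":
    for size in (1 << 20, 32 << 20, (32 << 20) + 1, 1 << 63):
        print(size, vacb_index_levels(size))
```

Under these assumptions a 2^63-byte file needs seven levels, which agrees with the maximum depth stated in the talk.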
So the first example was the registry hive. If I acquire the System process, I have a bunch of data here along with the sizes. We're pulling out individual files and we're giving them a name. It's encoded so that we can write it all to the local file system, so you'll see the encoding of things like C, colon, slash. If we couldn't determine a name, say it didn't have a name or it didn't have a control area associated with it, we'll just give you the range that we found it in and so forth, but if it did have a name we'll write that out. You know, there are some files where we couldn't find any data: we could find a file object, we could find its name, but it just didn't have any data that we could carve out of memory, so those will be 0 in size. But what I want to show you is loading this into a registry viewer, because again this is the
System process. So we're going to open it up in the registry viewer, and since this is
coming from the cache, there may be some pages that are not there, right? But if we encounter one of those in the cache example, we'll just write a page of, basically, no-ops, because we need to keep the linear order of things in the file so that it can be parsed by tools like this. This is AccessData's Registry Viewer. If you see here, I can drill down into things and I can see when the key was created, so I can do all my traditional forensics on it. There's no guesswork, there's no carving hives out of memory and guessing where they are and stuff like that; it just works. Another thing I'll show you is, here I'm
going to have acquired, basically, the... so,
from the Honeynet challenge, the Honeynet Challenge 3, there was a memory image that had, it was basically, an
infection through the web; I think it was Zeus. When I acquired the Firefox process I was able to get the web cache information, the URL history information. So I load this up into one of our free tools, called Web Historian, and there wasn't much data here because they were just demonstrating an attack and seeing what you can find. But this last entry here contains the exploit code; it's pdf.php. I can see the URL, I can see the access time, like when the user clicked to go there, and stuff like that, because it's all in the web history log that Firefox is keeping, and I just acquired Firefox out of memory. So I can use my traditional tools.
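The padding step mentioned in the registry demo (writing a filler page wherever a cached page is missing, so offsets inside the file stay correct for ordinary tools) can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: the 4096-byte page size, the zero filler byte, and the `rebuild_file` name and its `pages` mapping are all hypothetical choices made for the example.

```python
from typing import Dict

PAGE_SIZE = 4096  # assumed page size for the example


def rebuild_file(pages: Dict[int, bytes], file_size: int, fill: int = 0x00) -> bytes:
    """Lay recovered pages out in file order, padding pages that were not found.

    pages maps a page index within the file to the bytes recovered for it.
    fill is the filler byte used for missing pages (zero is an assumption).
    """
    filler = bytes([fill])
    out = bytearray()
    page_count = (file_size + PAGE_SIZE - 1) // PAGE_SIZE
    for index in range(page_count):
        page = pages.get(index)
        if page is None:
            page = filler * PAGE_SIZE          # keep the linear layout intact
        # Truncate or pad so every page occupies exactly PAGE_SIZE bytes.
        out += page[:PAGE_SIZE].ljust(PAGE_SIZE, filler)
    return bytes(out[:file_size])              # trim the tail to the file size
```

Because every page occupies its original offset, a parser that follows offsets inside the file (a registry viewer, for instance) still lands in the right place even when some pages were not resident.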
Also, I acquired the Microsoft Word
process out of memory. For this memory image, I asked my wife if she would load up a Word document and then take an image of her laptop, which is running Windows Vista. Here you see we were able to acquire the Word template and an actual Word document. If we try to open that Word document, it's going to
complain. The Word document was written in Word 2010 and this is Microsoft Office 2007, so it's going to complain about the content. It's a little corrupted, probably also because we're coming from cache. We're just going to go ahead and say OK. It asks, do you really trust this? We're going to say yes, and here's the
Word document for this presentation. It's a 13-page white paper and it is all there. I'm not really sure if this has any applicability to IR and forensics, which is my day job, but I thought it was kind of sexy cool, so what the hell. I guess for all you e-discovery
types, you might like it. Then the last thing I'll show you before I go: this
is the UI for our memory analysis tool, which is also free. The UI is open source; it's called Audit Viewer. We're going to be releasing a new version where you can say "filter out the known MD5s," so it's going to compute the MemD5 and compare that to a known set. Here, if you had something like Bit9, you could whitelist, right? You could even use their service, and this will just fire a bunch of crap at Bit9. I don't know if that's legal, but if you have a password and user account I'm sure they'll probably let you do it. We're going to say cancel because we don't have a Bit9 account for today's demonstration. It's still running... OK,
finished. You can also use the NSRL from NIST and so on to build your known whitelist. And then we're going to say
"filter on trust," and there we've reduced...
oh, by the way, this is every single process on the system, because I double-clicked the root node here. So this is every process, and these are the things I have to care about now as an incident responder triaging this host. We went from 1,500 down to, I don't know, maybe 30 or something like that, so it's a big time saver. And that's about the end of my time. There are a few caveats,
places we need to continue the work, basically dealing with... the tool will be
available in the coming weeks; you can check the blog. These slides will be on the web, and the white paper will be on the Black Hat site. But dealing with ASLR, because it changes the address space a little bit when programs load: we have all the data we need to actually reverse that, so we can fix it up, run the fixed-up data through our hash, and get the same hash values. Also, there's something called the security directory that's on some files; it's certificate info, and that won't be present. There are no artifacts representing that on Windows 7 or Windows 2008; in previous versions there were artifacts. But again, we have the PE header in memory, so we can detect that it exists, we can go to disk, we can run it through our hash, and MemD5 will work again. So that's really the end. I'll be available in the Q&A room after this. Thank you.
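The data reduction shown in the Audit Viewer demo (drop duplicates, then drop anything whose hash is on a known-good list) can be sketched in a few lines. This is a simplified illustration, not MemD5 itself: it hashes whatever bytes it is given and does none of the reconstruction the talk describes (filling missing pages from disk, reversing ASLR changes, restoring the security directory). The `reduce_files` name and the shape of its inputs are assumptions for the example; in practice the known-good set might come from a source such as the NSRL.

```python
import hashlib
from typing import Dict, Iterable, Set, Tuple


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def reduce_files(files: Iterable[Tuple[str, bytes]],
                 known_good: Set[str]) -> Dict[str, str]:
    """Return {md5: first name seen} for files worth a closer look.

    files yields (name, content) pairs for binaries recovered from memory.
    known_good is a set of MD5 hex digests to filter out (the whitelist).
    """
    remaining: Dict[str, str] = {}
    for name, content in files:
        digest = md5_hex(content)
        if digest in known_good:
            continue                        # trusted file, nothing to triage
        remaining.setdefault(digest, name)  # keep one entry per unique hash
    return remaining
```

The value of the whitelist step depends entirely on the in-memory hash matching the on-disk hash, which is the problem the MemD5 work in the talk is aimed at.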