
Fast Forensics Using Simple Statistics and Cool Tools

Video in TIB AV-Portal: Fast Forensics Using Simple Statistics and Cool Tools

Formal Metadata

Fast Forensics Using Simple Statistics and Cool Tools
What's all the FFUSS about?
Title of Series
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Ever been attacked by malicious code leaving unknown files all over your computer? Trying to figure out if a file is encrypted or just compressed? Is the file really something else? Is there hidden data? Are you short on time? This talk leads you through file identification and analysis using some custom FREE tools that apply statistics and visualization to answer these questions and more. You can often identify files by their statistical picture, and I am going to show you how. We can find some hidden data (steganalysis), easily determine if an executable file is packed or obfuscated, find appended data, figure out if the file is really what it purports to be, and even aid in reversing XOR encryption. The final proof-of-concept program allows you to statistically identify (i.e., no magic numbers or header information used) some file types autonomously for an entire hard drive. The Windows-based tools (mostly math, so adaptable to Linux) and source code are free! John Ortiz is currently a senior computer engineering consultant for Harris/Crucial Security Inc., working as a reverse exploit engineer. In this position, he develops and analyzes vulnerabilities and exploits for various software. Prior to working at Crucial, he spent 5 years at SRA International and 5 years at General Dynamics developing various defense-related software, researching data hiding techniques, and analyzing malware. In a second role, Mr. Ortiz developed and teaches a Steganography course for the University of Texas at San Antonio (UTSA). It covers a broad spectrum of data hiding techniques in both the spatial and transform domains, including least significant bit, discrete cosine transform, echo hiding, hiding in executables, and hiding in network protocols. For the course, Mr. Ortiz developed several steganographic programs for testing and analysis. Mr. Ortiz holds two master's degrees from the Air Force Institute of Technology, one in Electrical Engineering and one in Computer Engineering, and a BSEE from Rose-Hulman Institute of Technology.
All right, how's everybody doing today? It's great to be here, thank you. Well, I developed this presentation, and I hope that you all find the various tools on here interesting, most of all fun, and as useful as I have over the years. Let's see, I have to try that now. All right, there we go. Anybody know what that's a picture of on the screen? Just raise your hands, I'm just kind of curious. One person, all right. And the audio did not work; I'll try that one more time. Otherwise, that's what Solitaire sounds like when you play it as a sound file. We'll get back to that.

All right, so what can us defenders do? I'm sure all of you have had malware attack your system before. It does stuff, right? It drops files, it changes registry keys, things like that, and you want to know what happened. Another useful topic we'll talk about today is file content type identification. Just because a file has an extension, that doesn't mean that's what it is, but we'll look a little deeper than just looking at the magic numbers and so forth. A little bit of steganalysis, a little bit of reversing XOR encryption, and there are just lots of uses for these various tools. There are some stego tools, which are all on the CD, I checked this morning, and then there are a couple of analysis tools you'll get to see.

Attackers have lots of tools, right? Packers, base encoders, crypto, compressors, wrappers (oh, I wrote that one, that's just for fun), and various stego tools. There are a lot of stego tools out there. Defenders, of course, have hex editors and strings; you all know what those are. Footprint is one of the tools I wrote; it helps identify what malware did to your system. It takes a snapshot of files, registry keys, processes and services, and then you can take a later snapshot and compare them. Write Bitmap Histogram, which is a terrible title, but I can't think of anything else to call it, takes an image of the file, and I also take some basic statistics. You can learn a lot from just a
few basic statistics. And then the Statistical Analyzer kind of combines the two and automates them. I don't have much time to talk about that one, but it's there, and if you want more information, my contact information is at the end of the presentation.

A little bit about me: instead of counting sheep, I counted powers of two. I learned how to program at 14. I did have a couple of Atari games published; you haven't heard of them, it wasn't Pac-Man or anything. I joined the Air Force, got my degrees, and now I'm an engineer at Harris and a part-time instructor at UTSA.

All right, Wrap. This is just a small utility. Basically, it'll take any file you want and wrap it in a bitmap header or a WAV file header, various types. It's got a few options there, so it's very simple, and it's good for demos. Here's the stego LSB tool. This one hides in the least significant bits, which is common. This one has five bits
hidden, so at first glance, if you didn't see the original, you might not notice, though the original of course looks a little better. But think about that: five out of eight bits, that's a lot of data. This one hides in JPEGs; that's also on the CD. That's my dog. He texted me last night, he's lonely. You can't see anything, though; you can't see any artifacts between the original and that one. So you've got that tool on your CD
as well. I will show you how you can detect that, though.

Malware effects: as I mentioned, malware does lots of different things to your system. Also, sometimes you install these programs, trial programs or maybe programs you don't want anymore, and you uninstall them. Did the uninstall get rid of all the registry keys? Did it get rid of all the files? Footprint can help you find out. It takes a snapshot of the system and stores it in a big log file. It can sort it by size as well, and by date, too. The date part isn't working at this time, I've got to get that one done, but if anybody wants to upgrade to the version with date, you can email me. That's good for finding files that were just recently installed. Often malware will drop a bunch of files, maybe in different places, but they all have the same date, so this can identify stuff that's just been dropped on the system. Or if you like getting videos that they don't let you download, or pictures where you click on them and you can't "Save picture as", go to your content folder and use this tool, and it will find it. Regular Windows browsing and searching doesn't search the content folder, but if you get there and use this tool (well, you don't have to get there, you can run it from the C drive), it will find all those files and list them out. Then you can find that video and simply copy it from that folder to wherever you want to use it.

Footprint can compare the two different log files. Okay, here are all the files, but what has changed? Same thing with registry keys, same thing with processes and services. This is just a sample output. I'm going to go through real quick just how it shows you if a file was deleted or a file was added. It's a very textual type of program. Here's what it looks like when a file is modified. It just creates this big log file, or a small one if there haven't been many changes since the last footprint.

All right, file type characteristics. Malware often disguises itself; it may have stuff packed in executables, or
encrypted. This can help detect that. Okay, the Write Bitmap Histogram tool will do a few things. It can create a bitmap image, as you saw in the beginning: that was a bitmap image of Solitaire, which one person recognized, probably as an executable. The chart on the right was a histogram; that's a typical histogram for an executable. Before discussing the tools, I've got to do a little bit of math. There are a lot of slides, I meant to mention that, so once we get through the math and you understand a little bit about the tool and its uses, if we don't get finished, it'll be easy for you to figure out on your own.

All right, who's heard of entropy before and knows what it is? Just curious. Okay, looks like about half or so. All right, very good. And what about a histogram? Tell me what a histogram is. Okay, about the same, same people. All right. Usually with computers we consider bytes, of course, 0 to 255. The maximum entropy is the log base 2 of the total number of symbols, so log base 2 of 256 different symbols is 8, and the maximum entropy for a file can be 8. If that file is base32 encoded, the maximum entropy is going to be 5. I don't have an example in the slides this time, but you can actually tell whether the base32 encoding has encoded an encrypted file or a text file just by using this tool. And of course for base64, that's a little quiz: two to the sixth is 64, so it's 6. We've got a gold star for one person. All right, very good.

All right, a little bit of statistics here. P is the probability. The log is often abbreviated lg to mean log base 2, and that simply asks: 2 to what power equals x? So log base 2 of 256 is 8, log base 2 of 4 is 2, log base 2 of 8 is 3, and so on. We can estimate the probability in a file by counting. You take a file, count how many zero bytes, count how many one bytes, count how many two bytes, and so on, and that's the histogram. That's the count; the frequency distribution of each byte is
another way of putting it. So given that count and the total number of bytes, we can compute the probability for each byte. If 0 appeared 25 times out of 100, we could say the probability is 0.25. And then we can plug into this nice nifty formula here, which looks complicated, but it's really just a for loop that multiplies the probability times the log base 2 of the probability and adds it all up. You'll get a negative number out of that, which is why the formula negates the sum; I'll skip the log derivations for today. You add it up and you get an entropy value H; H is the entropy.

Encrypted files have the greatest entropy, compressed files are next, and so on. Every file type generally has some characteristic range of entropy. 24-bit bitmaps, I've found, are very varied, but executable files and text files are kind of in a range, and compressed and encrypted files are in a very narrow range, so you can identify a lot just by the entropy. Bottom line: the higher the entropy, the more uncertainty. That's what you want in an encrypted file, right? You don't want the opponent, the attacker, to figure out what you've encrypted; you don't want them to have any kind of information about what the symbols are. Compression removes pattern, and once you remove pattern, you get a randomized-looking file, but it's not as random as an encrypted file. English text I've found to be around 4.5, 4.6, 4.3; it's in a very narrow range, so you can identify that immediately. Now of course it does depend on having sufficient data. For very small files, the entropy values are going to be skewed. I've found that in practice around 4 KB is where it starts to get reasonably accurate, and of course the more you have, the more accurate it gets.

So, histogram. I've kind of talked about that already. On the left side of the chart is the count of zeros, and on the right side of the chart is the count of 255s. The darker lines are at intervals of 16 values, so at 16 the line is a little darker, at 32 the line's a
little darker, just to kind of break it up a little bit. Many file types, I discovered, have unique histogram characteristics, so you can identify them very quickly in many cases.

All right, so how do you identify a file? You've got this new file; what is it? Well, you look at the extension, but that doesn't mean anything, right? You look at the magic number; that may mean something, if it's not disguised as something else. We can apply visualization, which is what this tool does. It will also do "audioization", which is kind of a very strange word, but it's actually out there. And then statistics. So here's what you check on the file: what's in it? Does it match its extension? Does it have unusual data? Does it have hidden data? Does it have appended data? Is part of it compressed? We can tell a lot.

All right, that's just a command line for using the histogram tool. All the tools have usage functions, so I'm sure you can figure them out. Here's a text file. On the left is what it looks like as an image. You can see it's very dark, because text is all below 128, so it's all the darker shades of gray. On the right you can see the histogram. What character is this, do you think? Space, that's right. Space is the most common character in text, followed by the e and the t. These are lowercase; uppercase is kind of hard to see in this one, there weren't very many uppercase letters. You can notice the pairing of carriage return and line feed; those are the same size there. And this is the text output of the program, which gives you the exact numbers, so you can see the exact counts. Sometimes that's useful, because of course
the visual one is scaled, right? So you can't necessarily see the difference between a few values on a large histogram.

Here's HTML. You see that it has some textual characteristics, but it also has a lot of pairings. HTML has all the tags with the angle brackets and so forth, and you can see that here. C source code, Java code, that stuff shows up the same way: you get lots of pairings. So you can distinguish between text and C++ and that type of thing to a certain degree.

Here's a bitmap. This is a bitmap of a bitmap, by the way; it just kind of gets out of synchronization there. But you can see that the histogram is smooth. That's the characteristic of a natural bitmap: all of them are smooth. If they're not fairly smooth, then something's going on. Now, a grayscale bitmap is very spiky, just like that, as is an 8-bit color bitmap. We don't know where the spikes are, which values are the most common, but they all look spiky. And of course, as some of you know, 8-bit grayscale and 8-bit color are the same in terms of the file content; it's just the palette that's different.

Speech: all 8-bit WAV files that are natural WAV files will look like this. That's because waves oscillate about the central axis, so you get the most values in the middle, and as you go out toward the edges you get fewer values. Music is a little fuller than speech, but you still get the central spike. 16-bit speech is a little tough to notice at first, but you get (where's my cursor?) a U shape; it's kind of a very open U there, because there's very little in the upper extremities. These are the upper extremities, and there are very few samples up there. But when you get music like that, you get a fuller U shape. If it doesn't have a U shape, it's not 16-bit audio. Now, you can take anything, just like I did with the Solitaire program, and wrap it up in a WAV file header, and you will not get this histogram. It will
not look like this. Natural audio will always have a U shape, or a pointed shape if it's 8-bit.

JPEG: this one has a lot of zeros. You see it's pretty uniform over here, and fairly flat over there. That's characteristic of JPEG. Some are more spiky than others, but they all have a reasonably uniform distribution across the top.

PE files typically have large numbers of zeros and large numbers of FF, and then various values in between. The thing about the PE file that's very characteristic is that it has different sections. It looks like a text section here, which is the actual code, and then various sections in here of different data types. They all have kind of a striped look.

Encrypted: I use a program called AxCrypt, which is available as a free download; it's been out there a few years. In the image you can't really tell the difference from the JPEG, but in the histogram you can: it's very, very flat, and it gets flatter as the file gets larger.

All right, so file type identification: that's kind of the overview of some of the things you would be looking for when you use the tools. So here's this one. Can you tell compressed from encrypted just by looking at the picture of the file? Not really. But from the histogram, and especially the entropy value (this is the entropy calculation over here), it's easy to tell. Entropy 7.999974: that's the encrypted file. So unless the files are pretty small, you can use this to distinguish between compressed and encrypted. And even if they're fairly small, the entropy for the encrypted file will go down, but the entropy for the compressed file will go down further.

Packed or not packed? Here's an executable. Is it packed? You can't tell by looking in a hex editor, but here it's looking pretty smooth to me. It still has a large number of zeros, which probably throws the entropy down a little, but it looks fairly uniform there, so I'm going to say that's more than likely packed, unless it's just
full of compressed data; maybe you have an executable that's just full of a bunch of JPEGs as resources. Packed versus not packed: quite a difference, right? Now this is thrown off a little bit because of the large number of zeros, because all of this has to be scaled; it's just spiky. But you can see the different patterns going throughout there. I used this to examine a ROM one time, and there was one area that was just all white. It was a firmware download, and that was the RAM area in the firmware, where it's blank. Here's the zoomed-in histogram, with the zeros going off the scale, and now you can see it looks kind of like an executable. However, you can see a little bit of uniformity down there at the bottom, so I would say that maybe this has some packed data in it, but the whole thing is not packed.

All right, so histograms and entropy aren't always effective. This is the full-color bitmap that you saw earlier in black and white; you see how it's fairly smooth. Let's see if we're hiding something: data appended to the end of the file. The statistics don't really tell you a lot about it. However, if you look at the histogram, you can see that, right? It's kind of unusual for a 24-bit bitmap to have these kinds of spikes in there. Some of that just comes from experience; I've done this on hundreds and hundreds of bitmaps and looked at them over the years while preparing several other talks and so forth. Here is the bitmap shown as a picture, and you can see some data hiding at the end, because that part has a different characteristic. So that can reveal something.

Are we using steganography? LSB steganography hides in the least significant bits. It's very difficult to see if the number of bits is less than four. There are some cover images where you can see it, but others where you can't, and sometimes, even at four bits in a normal picture, you can't even tell.
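The byte histogram and the entropy formula described in the talk, H = -sum(p * log2(p)), together with the effect of hiding random data in the low bits, can be sketched in a few lines of Python. This is a minimal illustration and not the speaker's Windows tools: the function names `byte_histogram`, `shannon_entropy` and `embed_random_lsb` are hypothetical, and the "cover" is a flat run of identical bytes rather than a real bitmap.

```python
import math
import random
from collections import Counter


def byte_histogram(data: bytes) -> list:
    """Count how often each byte value 0..255 occurs in data."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: H = -sum(p * log2(p)) over the byte values seen."""
    if not data:
        return 0.0
    total = len(data)
    entropy = 0.0
    for count in byte_histogram(data):
        if count:  # unused byte values contribute nothing to the sum
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def embed_random_lsb(cover: bytes, nbits: int, seed: int = 0) -> bytes:
    """Overwrite the nbits low bits of every cover byte with pseudo-random bits.

    Assumption for illustration: random payload, every byte used, no header.
    """
    if not 1 <= nbits <= 8:
        raise ValueError("nbits must be between 1 and 8")
    rng = random.Random(seed)
    low_mask = (1 << nbits) - 1
    return bytes((b & ~low_mask & 0xFF) | rng.getrandbits(nbits) for b in cover)


if __name__ == "__main__":
    print(shannon_entropy(bytes(range(256))))  # 256 equally likely symbols -> 8.0
    print(shannon_entropy(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"))  # 32 symbols -> 5.0
    flat_cover = bytes([128]) * 4096  # toy cover, far smoother than any real image
    for nbits in (1, 2, 4, 5):
        stego = embed_random_lsb(flat_cover, nbits)
        print(nbits, round(shannon_entropy(stego), 3))
```

On this toy cover the entropy rises by roughly one bit for every low bit that is overwritten, which is an exaggerated version of the change the histograms show: a real image starts out with much higher entropy, so the effect there is subtler and shows up mainly as lost smoothness in the histogram.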
Five bits is where you can really start to tell. So what about the histogram? Well, of course it helps, otherwise I wouldn't bring it up, right? All right, so here's the original. You can see a fairly smooth histogram there, entropy 7.55. And then we go to one bit of randomized data. Tough to tell on that one, right? That would not raise my alert flag, looking at that histogram; it's a little spiky, but not too much. Now we go to two bits, three bits, four bits, and it's getting easy to tell. The picture, however: can you tell by the picture? I think on this particular image the background is a little blurred, so you've got some smoothness, and you can actually tell from the picture a little bit. Go back to three and look at the green background, then go to four, and you can see little bits of discoloration there. But in the foreground, where there's lots of detail, you don't really see that. However, the histogram is clear: this is not a histogram of a 24-bit bitmap, and neither is this one. That one would raise my suspicion, even with two bits. Let's see if we have five in there. Oh yeah, it becomes obvious at that point, right? You can even tell by the picture, although if I didn't know what a honeybee looked like, I might think that's what it looks like. And then six bits, and then seven bits. Anyone want to guess what kind of data we're hiding? Obviously it's randomized data, right? Because it's very flat over here. If we were hiding text data, we'd get kind of a text look to the histogram at this point. That's seven bits, and then eight bits. That's right, now you don't have a bitmap at all.

How about JPEG, does this work? Well, here's my favorite pet, Mandy. She actually looks
kind of annoyed, like, "Why are you taking this picture?" And that's the histogram of the JPEG. See, the entropy is fairly high, but it's not like an encrypted file. Here's Mandy with a little over 146,000 bytes of hidden data, and she still looks annoyed, but you can't really tell. I can't tell, even if I flip between the
two of those. The entropy is a bit higher, though, 7.97, so that gets a little higher, but still, you might find JPEGs with that much entropy. Well, how about an image of the JPEG? That doesn't work. However, if we decompose the JPEG into its DCT coefficients and then take a histogram of those, which is where we're hiding, then it's quite obvious. On this side the positive and negative values match closely, which is normal for a JPEG. They don't match exactly; I know these look like they're exact, but if you look at the raw numbers they're not exact, though they're generally close. This is going to be, let's see, plus one, two, three, four, five, six, seven, so it's like plus 8 and minus 8 or something. But generally, for the lower values, the plus coefficients match the minus coefficients: 1 and negative 1, 2 and negative 2, and so forth. When you start messing with those and hiding in them, they don't match anymore. And it's easy to figure out why. If you are hiding something in a negative 1 and you change the least significant bit, what does that number become? Nope. Nope. If you have a negative 1 and you change the least significant bit, you get negative 2, right? If you have a 1 and you change the least significant bit, it becomes a 0. So even if you change 1 and negative 1 evenly, they're off balance. So that one takes a little more work. There are some stego programs that try to balance those out; the one you have will produce a histogram like this, it doesn't try to balance them. No, no, I meant the tool, yeah: the tool that I provided on the DEF CON CD for hiding in JPEGs will do the hiding that you've seen here, but it doesn't do anything to balance the DCT histograms.

Okay, well, we might actually have time for demos. I didn't think I'd get through it this fast, because there are so many slides. People said, "You're not going to get through 75, 80 slides in 45 minutes." Well, there are lots of pictures, but you guys are
just smart, right? Am I going too fast? Okay. All right, well, then we'll try some demos.

All right, so reversing XOR. I didn't put more about this in because I didn't know I'd get through quite this fast. Something XORed with itself is zero, so you've got to understand how XOR works. I don't think I wrote it up here, but the reason XOR is so popular in cryptography is that when you XOR something with a key, you get a ciphertext, and when you XOR that ciphertext with the same key, you get back the original. So XORing twice retrieves the original. Now notice this; it's an interesting property of XOR, because a lot of malware will use just basic XOR encryption to kind of hide stuff. Something XORed with a space just changes the case of a letter. If you have an uppercase letter and you XOR it with hex 20, it becomes a lowercase letter; if you have a lowercase letter and you XOR it with hex 20, it becomes an uppercase letter. And of course the space is typically the most common character in an English-language file. XORing with a single character doesn't even change the entropy. If you use a one-byte key and XOR the whole thing with that same key, the entropy stays the same; the values just get moved around.

All right, so here is a text file that's been XORed with some character. You get the same kind of characteristic spikes grouped together. You get a brighter visual, because now all these values are on the upper side of the byte values instead of the lower side, but it still looks like a permuted English text histogram, so that can be revealed. This one kind of looks like an executable; the image does. It looks fairly uniform in here, so entropy 7.2 suggests some type of compression or encryption, or maybe weak encryption. That's another thing I should point out: for encryption to have the 7.999 entropy, it has to be good encryption. If you use weak encryption,
then you get the same effect as if it were compressed. I discovered that once when a client brought us some stuff and said, "Here, tell us what you can tell about these network packets." We said okay, and we did a lot of examination on it, but one of the things I came back with was, "Well, it looks like you're using some kind of weak encryption." He's like, "How do you know?" Like this.

Oh, I should back up. Knowing that the first two bytes in an executable are MZ, and that zero is prevalent, can also help you a little bit. Excuse me. So in the target file we found that the first two bytes were D and N. Looking at the textual histogram, we found C, A, N and D were much more prevalent than others, so we can start guessing: one of those might be a space, one of those might be an e, one of those might be a t. I kind of just did some hand-waving here, because I didn't think I would even get to these slides, so there's a little bit more to it than that. But the point is that you can use the entropy, the histograms and the visual tool to help reverse XOR encryption.

Statistical Analyzer: this one takes the Footprint program and combines it with the histogram tool to automate the process. You set it loose on a directory, and it will iterate through all the subdirectories and run something like ten different statistics, not just entropy. It'll create histograms, it'll create bitmap images if you want, and then it will compare the results to a baseline. It's one of those programs with a training phase. It compares to a baseline and then spits out any anomalies you have: this one says it's a JPEG but doesn't look like a JPEG, it has low entropy; or this one says it's text but it's got high entropy; that kind of thing. I presented on that particular tool a few years ago, so I didn't include the whole presentation on it here, but if anybody is interested, my contact information is at the end of the slide presentation.

All right, I hope you learned something useful. Looks
like we do have some time so I can do a few a few demos for you there's my contact information and you can email me if you want here's some blogs at Harris that are relevant this one's written by someone else
reversing XOR encryption so if you're interested in that there's go and then this one is what I added to it and then
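As a rough illustration of the XOR discussion earlier in this section, here is a minimal Python sketch of the two clues mentioned: a known prefix (`MZ` at the start of an executable) and the prevalence of zero bytes. It assumes a short repeating XOR key, which the talk does not actually specify; the function names, the two-byte key and the sample data are all made up for illustration and are not the tool from the talk.

```python
from collections import Counter
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (the same call encrypts and decrypts)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def key_from_known_prefix(cipher: bytes, known: bytes) -> bytes:
    """Known plaintext: key[i] = cipher[i] ^ plain[i] for the known prefix."""
    return bytes(c ^ p for c, p in zip(cipher, known))


def key_from_zero_bias(cipher: bytes, key_len: int) -> bytes:
    """Guess each key byte as the most common ciphertext byte at that key
    position, assuming the most common plaintext byte is 0x00 (k ^ 0 == k)."""
    key = bytearray()
    for i in range(key_len):
        column = cipher[i::key_len]          # bytes encrypted with key[i]
        key.append(Counter(column).most_common(1)[0][0])
    return bytes(key)


# Hypothetical sample: a fake "executable" that starts with MZ and is mostly zeros.
plain = b"MZ" + bytes(60) + b"This program cannot be run in DOS mode." + bytes(200)
secret_key = b"\x5a\xa7"                     # made-up two-byte key
cipher = xor_bytes(plain, secret_key)

prefix_key = key_from_known_prefix(cipher, b"MZ")
bias_key = key_from_zero_bias(cipher, key_len=2)
recovered = xor_bytes(cipher, prefix_key)
```

Both guesses agree on this sample because the key is exactly as long as the known prefix and the plaintext is dominated by zeros. With a longer key or text-heavy data, the histogram guess would target a space or `e` instead of zero, which is the guessing game described above.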
Here are some irrelevant blogs at Harris that I wrote. That's actually a serious article. They wouldn't let me publish one that was bad, right? All right, and I do want to thank Mr. Greg Conti. He's presented at Black Hat before, and back in 2005 he kind of gave me the idea of the whole visualization concept, and that's where a lot of this stuff was born. All right, so does anybody want to see some demos? Let me find my screen
again here. All right, so what do you want to see? Do you want to see the steg program? Which demos? Any preferences? No? I know, it's okay. There we go, yeah, it's on the screen. All right, let me find my favorites. Let's see what we have. My favorites are the stego tools, really, so I like to do those as well. All right, so I'll pull out the plot tool, and where's that Steg JPEG? Okay. And then I just need a little media file here. Let me find some JPEGs; I'll try to pull out some of the more interesting ones. Oh yeah, this is one of my favorite JPEGs to hide in. I had actually heard on the news about a device, sold by a particular company which I will not mention, that was supposed to detect porn on your computer or something. So I decided to... let me see if I can move this one over there. It doesn't want to. Okay, I've got to figure out how to... I apologize. Yeah, we can duplicate the display; let's do that, that might work better. Thank you, appreciate it.
Okay, so this was supposed to... I put this thing on my laptop, and it divided the pictures into three categories: suspicious, highly suspicious, or not suspicious. This was the one that was the most highly suspicious. Now, if I can find that... no, that's not what I'm looking for. All right, now let me find my thing here. Okay, I just need to get to this directory. Okay, so here is... let me change that prompt so it's nice and short. So here's the Steg JPEG one, and I can just show you that in action. It has a number of different features to play with. I mean, it can just take randomized input, created from a pseudo-random number generator, so you can add that. There are several parameters here; typically you just keep -a and -u at, like, four to eight, and quality pretty high. That will get you the best hiding. Let's see, we'll try to hide this one in there: 75K and 209K. All right, so we'll try to hide the flower in the baboon there.

So to do that, tell it -hide, and then we need a cover file, which is the JPG. Oh, and it'll take either a bitmap or a JPG as a cover file, and then it will convert that to a JPEG on the output. And then the message file is the flower. It may not fit, but we'll give it a try here: 100 quality, and I'll just go for the max. The message file can be any arbitrary file, anything. It doesn't care; it reads it as a stream of bits and hides it.

For questions, please come to the microphone. Okay, I'm sorry. Yeah, the question was what kind of files can you hide; does it have to be a picture file? No, it can hide any arbitrary file, and the Steg LSB program is the same way. It can hide any arbitrary file; you have to give it the dash LSB option, which is in the usage, because it has a special demo mode. All right, so here's what it said: it said our storage capacity was about 146,000 or 147,000 bytes, and the message size was 75,966. Okay, so we can look at the resulting file
here. You can't tell anything from that: a pretty baboon. I think I took that picture in Africa.
All right, so you can't see anything. Now of course, no steganography is complete without extraction, right? Because you can always say, "Yeah, it's hidden in there." All right, so let's see what we can do with that. So we've got a stego file, and that's going to be the hid file, and then it should pick up the quality. Okay, but these parameters, the -a and the -u, have to match. And then let's see... oh, and I need to tell it to extract. The command line is very archaic; this would be much better with a GUI. So if anybody likes developing Windows GUIs and wants to develop one, great, send it to me, I'd appreciate it. Okay, so that's a good sign, right? It found the same message size. Okay, and let's see, where is it? It extracted it to this.

Now, how did it know the file name? Well, in addition to the file data, I put in the size, and I went ahead and stored a null-terminated file name. So the first four bytes are the size, then a null-terminated file name, and then the rest is data. I'm just going to add a .jpg extension onto the end of this one and you'll be able to see. All right, and there's the flower picture.
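The layout just described (a four-byte size, then a null-terminated file name, then the file data) can be sketched in Python as follows. This is only an illustration: the byte order of the size field, the encoding of the name, whether the size counts only the data, and the function names are all assumptions, not the actual format used by the tool.

```python
import struct
from typing import Tuple


def pack_payload(file_name: str, data: bytes) -> bytes:
    """Build: 4-byte size + null-terminated file name + file data.

    Little-endian size and a UTF-8 name are assumptions for this sketch.
    """
    name = file_name.encode("utf-8")
    if b"\x00" in name:
        raise ValueError("file name must not contain a null byte")
    return struct.pack("<I", len(data)) + name + b"\x00" + data


def unpack_payload(blob: bytes) -> Tuple[str, bytes]:
    """Recover (file_name, data) from the start of an extracted bit stream."""
    (size,) = struct.unpack_from("<I", blob, 0)
    end_of_name = blob.index(b"\x00", 4)     # first null after the size field
    name = blob[4:end_of_name].decode("utf-8")
    start = end_of_name + 1
    data = blob[start:start + size]
    if len(data) != size:
        raise ValueError("blob is shorter than the recorded size")
    return name, data


# Hypothetical message file; the trailing bytes stand in for leftover cover bits.
payload = pack_payload("message.bin", b"\xff\xd8\xff\xe0 example bytes")
name, data = unpack_payload(payload + b"leftover cover bits")
```

Storing the size up front is what lets the extractor stop at the right byte and ignore whatever follows in the cover, which is why the recovered file can match the original byte for byte.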
You know what, I should have shown you the original. Right there, there's the original. I could have extracted anything.
All right, now let's just do, you know, let's do the wbh thing here. There's wbh, right? This one is a simple command-line tool also. So we'll put the hid JPG file there, and the dash B option creates the image of it. So there it is. By the way, this bit entropy: I tried that out with zeros and ones, and it doesn't work; it doesn't tell you anything.

Yes, it's exactly the same, byte for byte. Okay, exactly the same, yep. Oh, thank you. The question was, does the recovered file have the exact same checksum as the original? And the answer is yes, because the original file is stored in there exactly, byte for byte. There's no loss. Thank you.

So the bit entropy here, that was kind of experimental; that didn't work out too well. But we see 7.97 with the hidden data, so we've taken compressed data and hidden it in another compressed file. We can look at the textual histogram, just for grins, there, so you can scroll down and see the exact counts and the exact distribution and so forth. Okay, and then, where is it, the histogram, the bitmap histogram. Okay, it's very uniform here except for a lot of zeros. If you want, we can use the tool to get a closer look at this. So I'll use what's called the zoom feature, and I'll run the same thing again, except this time I'll use dash Z and five. Okay, and now we have one.
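For anyone who wants to reproduce this kind of number outside the tool, here is a minimal Python sketch of a byte histogram and the Shannon entropy computed from it, in bits per byte. On that scale 8.0 is the maximum, and compressed or encrypted data lands close to it, like the 7.97 figure quoted in the demo. The function names and sample data are illustrative assumptions; this is not the source code of the tool being demonstrated.

```python
import math
import os
from collections import Counter


def byte_histogram(data: bytes) -> list:
    """Return 256 counts, one per possible byte value."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: -sum(p * log2(p)) over byte values present."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total)
                for c in byte_histogram(data) if c)


def textual_histogram(data: bytes, top: int = 5) -> str:
    """List the most common byte values with exact counts and percentages."""
    hist = byte_histogram(data)
    ranked = sorted(range(256), key=lambda v: hist[v], reverse=True)[:top]
    return "\n".join(f"0x{v:02x} {hist[v]:>8} {100 * hist[v] / len(data):6.2f}%"
                     for v in ranked if hist[v])


uniform = bytes(range(256)) * 16          # every byte value equally often
text = b"the quick brown fox jumps over the lazy dog " * 100
random_like = os.urandom(65536)           # stands in for compressed/encrypted data

print(f"uniform: {shannon_entropy(uniform):.2f}")
print(f"text:    {shannon_entropy(text):.2f}")
print(f"random:  {shannon_entropy(random_like):.2f}")
print(textual_histogram(text))
```

A perfectly flat histogram gives the maximum entropy, plain English text sits well below it, and a spike at one value (such as the many zeros visible in the bitmap histogram) pulls the figure down.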
Let me stretch this out a little bit. With Z 5, that's what that is. Oops, I over-zoomed a little bit; let's try three. So very easily I can do three and then go to the three, and now you can get a closer look at the area. In fact, that would actually be a good one to use on an executable, right? Because it's not quite as uniform. So we could do wbh of itself, all right, and then take a look at that one. Here's the histogram of wbh at zoom 1, so 0 and 255 kind of mask out some of those. So we could use the zoom feature, and that's the one at 3, and now you can see that a little closer. Okay, the image, where did we put it? That one was the baboon. This one here, that's the image of the JPEG with the hidden data. You can't really tell anything from that, so it's not like it is with the
bitmap, where you see the file behind it or whatever. Do I have time? Am I good? Good, we have the technology. So these two folders here, by the way, are on your disk, so there are some tools that I just included that you can also download for free, just like I did. So here's the Steg LSB one, and we'll take that back over to the demo area here. Let me clear out some of these other things just to make room. Okay, so now I have Steg LSB in there. Now all I need are some bitmaps, so let me grab some of those. Okay, these are good. Let's see here. Oh, there we go.

All right, now the only thing is, this picture-in-picture steganography is really not useful; it's only good for playing with, right? Because everything has to be the exact same size, since I'm going to take the upper four bits of one picture and stuff them in the bottom four bits of the other picture. Okay, so let's see here. That one is 36 17:11, no. Let's see, I have to find the right one; I need about 768 by 512. Nope. I may not have the right size there; these are all different sizes, apparently. Well, I might have to just do LSB on that one for now. I'll show you how to do it, but it's going to come back and tell me that the files aren't the right matching size. I'm just trying to think where... I know I have files on here somewhere that I can try. My backup disk here, yeah, I could do that, but that's the master. They have to match exactly. So I'm sure that I have one... you know, maybe not. I really didn't expect to get done as fast as I did, but I appreciate your patience with this. I went to the wrong... nope, that's going to be the same thing as what I have here. Well, since we've just got a couple of minutes, let me just show the files, and then we can meet later on. Yeah, you had a question?
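Since the picture-in-picture demo could not be run for lack of matching files, here is a minimal Python sketch of the idea as described: the upper four bits of one picture go into the bottom four bits of the other, which is why the two must be exactly the same size. It works on raw pixel bytes only; reading and writing real bitmap files is left out, and the function names and tiny sample buffers are assumptions for illustration, not the code of the LSB tool from the demo.

```python
def embed_upper_nibble(cover: bytes, hidden: bytes) -> bytes:
    """Keep the cover's upper 4 bits and put the hidden picture's upper
    4 bits into the cover's lower 4 bits. Sizes must match exactly."""
    if len(cover) != len(hidden):
        raise ValueError("cover and hidden pixel data must match exactly in size")
    return bytes((c & 0xF0) | (h >> 4) for c, h in zip(cover, hidden))


def extract_hidden(stego: bytes) -> bytes:
    """Move the lower 4 bits back to the top. The hidden picture's own
    lower 4 bits were never stored, so this is only an approximation."""
    return bytes((s & 0x0F) << 4 for s in stego)


cover_pixels = bytes([200, 13, 255, 96])     # made-up pixel bytes
hidden_pixels = bytes([17, 240, 128, 7])
stego_pixels = embed_upper_nibble(cover_pixels, hidden_pixels)
recovered_pixels = extract_hidden(stego_pixels)
```

Each cover byte changes by at most 15 and each recovered byte is off by at most 15, so both pictures stay recognizable. Unlike the arbitrary-file mode, this variant is lossy for the hidden picture.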
Well, if it was, I'd mess with it; it wouldn't have anything hidden. Okay, right. Oh, no. The question is, if I do a transcoding, like when you post to Facebook or Flickr, a lot of times they'll change it for you, change the quality or shrink it or whatever. You can't recover your data, but can you tell something was hidden? No, because all the coefficients get scrambled up again and made anew.

Yeah, you have a question? "Yes." Sure, what's your question?

"So in your example you used two images that had no relationship to each other. Let's say that you're talking about somebody like a journalist in a conflict area, and they have pictures like the street before the demonstration and the street after, so there's a lot of overlap. Is it possible to use steganography to essentially reuse the cover image as part of the steganography, so you just kind of encode the deltas and take advantage of the original image where there are exact or close similarities?"

You can certainly encode a delta. There might not be any reason to. When you take two different pictures, the way the camera is going to encode it, the way the light hits, everything's going to change. Very likely you're going to have mathematically very different images anyway. So you could hide both of those in two different images, or hide both in the same image, and you can't really tell that they're related in any way. "Okay, so you can't take advantage of any common runs of the raw bytes?" In a JPEG there won't be much in common. In a bitmap, if they were both recorded as bitmaps, like 24-bit bitmaps, then you would probably be able to find some commonality. I did not cover the math behind the JPEG in this presentation. It's easy once you know how to use it, but it looks very complex: you multiply that summation by cosines and stuff, and it gets to be very complex, and so everything is
interdependent. You basically have an 8x8 matrix, and you run it all through a bunch of math at different frequencies; it's kind of like a frequency correlation of your image. So you just change one little value in there and, since it's all interdependent, it goes this way and this way and all that, so it gets to be completely different when you have a JPEG. With a bitmap, yeah, there could be a lot of similarity: you've got a blue sky, it's very similar. But with JPEG, if you're off by one, it's not. Yes?

All right, well, we're out of time for today. I don't have that tool on the disk; it's a little bit more complex to use, you have to take a baseline first and stuff. But if you email me, I'll see about how I can get it to you or whatever. I don't have a problem with giving it out; it's just fairly complex to use. Yeah, oh sure, I can do that, it's fairly easy. It's just
Daigo at SAT XOR are calm. And you do have time for one more question? Okay. "You mentioned wanting a GUI; is this open source?" You have some of the source code on the disk. I just don't remember exactly which source code I put on there. I know I put the wrapper source code and the wbh one. They're all written by me, yeah. But if the source code for some of these programs is not there and you want it, I'll give it to you; email me. Yep. All right, well, thank you very much, I appreciate