
I am a packer and so can you


Formal Metadata

Title
I am a packer and so can you
Series title
Number of parts
109
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt, and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose, as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Publication year
Language

Content Metadata

Subject area
Genre
Abstract
Automating packer and compiler/toolchain detection can be tricky at best and downright frustrating at worst. The majority of existing solutions are old, closed source, or not cross-platform. Originally, a method of packer identification that leveraged some text analysis algorithms was presented. The goal is to create a method to identify compilers and packers based on the structural changes they leave behind in PE files. This iteration builds upon previous work of using assembly mnemonics for packer detection and grouping. New features and analysis are covered for identification and clustering of PE files. Speaker bio: Mike Sconzo has been around the security industry for quite some time, and is interested in creating and implementing new methods of detecting unknown and suspicious network activity, as well as different approaches to file/malware analysis. This includes looking for protocol anomalies, patterns of network traffic, and various forms of static and dynamic file analysis. He works on reversing malware, tool creation for analysis, and threat intelligence. Currently, a lot of his time is spent doing data exploration and tinkering with statistical analysis and machine learning.
Transcript: English (automatically generated)
So hopefully everybody will be entertained. I know everybody thinks, you know what I really want to do at 5 p.m. is go to a talk that involves math. So hopefully this will ‑‑ excellent. Came for the math. Stay for the mustache. So that's me.
"I am a packer and so can you." I'm going to attempt to keep this to the 45-ish minute mark so I can do some Q&A. Hopefully I'll get some hard questions. Hopefully I'll get some good questions. All right. Now we're on to the agenda. Do a little
bit of an intro, talk about the product ‑‑ not the product, the project. A little bit about me because why wouldn't I? I'm up here. Give everybody a little refresher, kind of techniques, talking a little bit about the PE format since that's mostly what we're going to be focused on today. We're going to look at the data and pull out our magnifying glass and look at ones and zeros, do a little bit of math.
We're going to then look at the solution and then finally we're going to look at the results. So the most important part, me. What do I do? Currently threat research at Bit9 Carbon Black. Those are my hobbies, static analysis, machine learning. Anybody else from Texas? There we go. If you're in Austin, I will totally buy you a beer. I run a
little project and a website called secrepo.com. If you guys are looking for various security data, I try to keep a somewhat updated and curated list. So everything from, kind of, Bro logs and Snort logs to other projects that have way more information than I could possibly host. You can follow
me on Twitter, @sooshie, and then finally I'm a sometimes, occasionally contributing member to the project. Thanks, Alex. And feel free to tweet about this and use the hashtag #securebecausemath, because we are going to be talking about math. All right. So what's the main problem here? I'm
sure a lot of people are familiar with the idea of detecting compilers and packers and encryptors and all sorts of other stuff, right? There are some good tools. Some of the tools are really old. So I'm going to pick on PEiD here. PEiD was written in 2005, so in essence it's 10-year-old technology. Maybe there's a more interesting way or a better way to manage this problem. So really the goal was set out as: can we do something new and different? So we've got some goals. We've got some great projects out there like PEiD and some of the other ones. Yeah, they might be a
little old, but there's probably some validity. However, for this we're going to try and adopt kind of a zero trust towards them. In other words, if somebody as an analyst says, oh yeah, this PEiD signature is verifiably correct, then great. We, being myself or anybody else in this room, can create a signature and directly translate it into this new language. The other one is easy-to-create signatures. So looking at PEiD and some of the other associated tools, you've got to live in a hex editor, right? You've got to maybe open up IDA and find the exact pattern that you're looking for. It requires a certain bar to entry. So the idea here is: can this really be distilled down to something anybody can get value out of, right? So let's make it easy, and we're going to talk a little bit about the signatures as well. Cross-platform. So running PEiD on a Mac itself, that's not going to happen. There are a couple of solutions that attempt to let you run PEiD signatures on Linux or on a Mac. They're really good, but they're not as full-featured as actually using PEiD on Windows. So that's kind of a negative there. The other thing, once again, right, simple to extend and understand. So in my opinion, what I'm going to start with here is kind of this base notion, this idea,
present some data and say, look, I'm pretty sure this mostly works. And then hopefully somebody, multiple somebody's in this room or elsewhere will go, wow, that guy wasn't really dumb. He was only mildly dumb. And instead here's a couple of enhancements, right? And the other thing that I really wanted to get out of this was this idea of fuzzy
matching. So if you've got something like PEiD or another signature-based language, generally it's: the signature hit or it didn't. So instead I want to introduce a notion of, well, part of the signature hit, and this is about how much of the signature hit. So in other words, when I use this, or when anybody else uses this for
signature management, you can kind of figure out where your overlapping signatures lie and you can maybe be a little bit more effective out of the gate. So with this, we're going to jump in, just an easy refresher, talk a little bit about the terms. When I say certain words, what I mean
might be different from what other people said, so I want to make sure to do basic level setting. Talk a little bit about the PE file structure. I'm sure most of you in this room go home and dream about the PE headers. Probably not everybody does. All right. So this is a very simplified look of the PE
file structure, right? You've got kind of this DOS stub at the beginning. You've got these other various headers, some of which are optional, some of which are only generated by certain compilers. You have this notion of sections, right? Some sections contain the code and some contain data and so forth and so on. This idea of resources, so if you ever look
at, you know, an executable's icon, it's generally stored in the resource section. So there are many, many different parts. This is one of my favorite graphs, and I apologize if you can't see it all that well. These are all the header values that you can have in a PE file. Now keep in mind, not all of them are required to exist. Not all
of them are required to be filled out in an entirely accurate way, but this is what you can deal with. So there's a lot of things to mess with. They're color coded. So really as far as the PE format itself and the header structure, this is what we're going to care about today. The three
basic things that I decided, and whether I'm correct or not, that's fine, but three basic features out of the PE header that I said these can be kind of interesting and these should generally vary enough from compiler to compiler or decrypter to decrypter that they should be useful features in
kind of doing this type of analysis. The other one is the number of sections. So things like UPX and a lot of other packers, right, maybe they jam the entire executable, and we'll get a little bit more into this in a second, into one section and then just have their little tiny data section.
So when I use the phrase tool chain, right, what I'm talking about is the set of tools used to develop software. So you have things like IDEs and linkers and compilers and all that kind of stuff, and each one of these actually leaves a relatively unique fingerprint upon the binary that it creates. Now, once again, you can manually go in and change these; not a lot of people do. So for this, when I talk about tool chain, we're actually going to talk about the build environment, so GCC versus Visual C++. So packers, what are they? Packers are generally this program within a program. When I want to pack a binary, what I'll do is take the original executable, kind of smush it down and ram
it somewhere inside this new packed executable. So generally you want to do that to evade AV, right, make analysts' lives harder, because who doesn't love stepping through a debugger trying to figure out how do I get the unpacked version of this in memory? Because this is just ridiculous. So at least if you know, you can
identify what packer, if it's similar to anything you've seen before, right, you know what steps you have to go through or maybe you know what tool to pull out of your toolbox in order to do the unpacking. So there are really two parts to a packer. You get the packer executable that you run on the original file. This is the thing that actually does the
compression or the obfuscation and creates this new executable. And then you get the unpacker. The unpacker is generally this little stub that comes out in the new program; when this new executable is run, the stub is generally the first thing that is executed, and it goes through and, you know, unpacks the original binary and goes, oh, okay, now I'm going to run this. So really, when I talk about packer detection in this context, I'm actually going to be referring to the unpacker stub. So unpackers, how do they work? What you really want to do is take control of the address of the entry point, right? So when a Windows PE file is loaded, where should it go and begin executing code? You want that to now point to your stub. And then once you unpack it, right, so maybe you decrypt it or maybe you deobfuscate it or whatever it is, you find the packed data, you restore it, you get this little in-memory image. You've got to do a couple
relocation fixes because it's not the Windows loader doing the actual loading for execution. You have to mimic some of that. And then you jump into the original program and keep going. All right. So now on to the popular kids. So these are kind of the three, in my opinion, and there's probably several more tools that when people do compiler detection
or do packer or crypter detection, this is what they're talking about. So PEiD, I mentioned that one before. It's nice, the signature language is pretty good, it's been around forever, and in my opinion it's kind of the de facto standard. Yara has its own signature language, and there are several projects that will allow you to take PEiD rule sets and convert them to Yara rules so you can update your analyst tools, but you're still using this limited idea of what it is you're looking at, or this harder way to describe data. And then this last one, RDG Packer Detector: I actually really like their slogan. All right. So now we're going to dig into data.
And who doesn't love data? And honestly, if you're going to talk about math and if you're going to talk about doing any type of analysis, if you don't use data and you don't understand your data, right, it's really, really hard to get good results. And a lot of times, data is really
ugly, right? It's not this beautiful end result. It's this nasty thing you have to slog through and dissect and understand. So this is the data that I used in my testing setup. So I went and I found and I Googled and I threw together 3,977 unique PEiD signatures. That's a lot of PEiD signatures, right? So that alone kind of got me thinking maybe we can address the signature management problem. We've got some file sets, various sizes, right? We've got smaller ones that I understood, that I could pull apart and go, oh, okay, I get it. Yeah, these two are right and this technique seems to be working. And then we have this
giant random sample at the bottom, right? So 411,000 files. Because everybody loves big data, and this wouldn't be a math talk unless I used the phrase big data. So there you go. So that was kind of the end-all: after I felt comfortable with the technique and comfortable with the tool, that's what I ran it over to verify, and I did some spot checking with that giant data set. We'll talk about that as well. So let's get into some of the data analysis, right? So for this, there's a handful of slides. We'll go through them. We're going to talk about the basic exploration of the Zeus data set. So if I go back a slide, I think, yeah, there we go. So 6,700 samples, roughly, is what these slides
are based off of. There we go. Okay. So the first thing I did was, all right, what happens if I run PEiD on these 6,700 files? Well, it turns out PEiD signatures don't match 4,600 of them. Really disappointing. So you get some other ones. So
this different UPX and another UPX version and Microsoft Visual Basic and Armadillo Packer, which I'm sure just by looking at the numbers you could probably make a relatively educated guess that maybe Microsoft Visual Basic 5.0 and 6.0 and Armadillo Packer are really, really closely
related. So that's kind of those numbers and what they look like in visual format. It's a bar chart; you don't have to worry about the numbers. That really tall line is the 4,600. This is kind of another way to visualize it, just to get into the idea that creating signatures is hard. It's not trivial. So having an easier way to do it would be great, because then that really big giant blue box, or bluish-purplish box (and I apologize for not using grayscale), would get smaller, and you'd get more things that you can actually label and understand. Okay. Cool. So this graph, in my
opinion, is what science looks like. Right? You show this to somebody and they're going to go, that dude up there totally did science. So this is simply a correlation matrix. And the idea being is you take all of these PEiD signatures, and for files that had multiple PEiD signatures flagged, you want to
see which signatures flagged, right, with a high correlation or flagged, when one flagged the other one was very, very likely to flag. So the diagonal is basically the signature correlating with itself, which makes sense, right, because every time a signature fires, it's going to be observed. So
with this you kind of want to pull out the little black dots. And while this one is kind of hard to view, we can zoom in on one little snippet of the graph. So this is kind of that upper left-hand corner. And you can see that there are a couple of signatures that are highly, highly correlated, right? So there's a lot of signature overlap. There could be signature overlap, right, in your environments. There's obviously signature overlap on
the stuff I downloaded from the Internet, right? So every time, you know, one of these ASPack signatures flagged, the other one did. And so with that, you kind of get a feel for, oh, this is where I'm lacking, or this is maybe where I have some duplication. So that's where we've got, we understand what PEiD looks like in a sample dataset. So
now looking at maybe some of the other features that we can use in addition to the header features that may allow us to definitively say or say with a very high probability that we're looking at a, you know, a specific packer or a specific compiler. So we can use PDB strings. I love it
when any type of malware author, or any author in general, includes a PDB string, because sometimes it's like hitting gold. Sometimes they're awesome, right? And they're like, oh yeah, by the way, we're using this crypter called crypto evolution; it's our Visual C++ project. Sometimes, you know, it's kind of random garbage. It doesn't really give you anything. It's important to keep in mind that these are just text, right? So there's no reason why you
can't create your own, say, as a misinformation strategy, right? So now, I kind of mentioned this linker version. You've got these major and minor linker versions. What do they look like in the sample set? So this is just kind of breaking it down. So if you've got the first one, right, linker 2.5: 2,000 of them. So while you can group, you know, this Zeus sample set or many other sample sets just by looking at the linker versions and their counts, it still really doesn't tell you the whole story. So we looked at the number of sections. And you can kind of see a relatively similar distribution,
right? You've got a couple of really, really big groups of files, right, that might indicate a specific campaign or something like that within the Zeus data set. And you kind of have this longer tail. So another thing we really wanted to look at, assembly mnemonics. So I think these are kind of cool. So the idea here is, right, when an
executable runs, there's code. And that code, those bytes, can be translated into a mnemonic. And all the mnemonic is, is simply: instead of, right, the byte representation for add, it just prints out the word add. And it's easier for me and a lot of other people to understand. So the idea is maybe we can use assembly mnemonics to help understand exactly what it is we're
looking at. And Johnny 5 is alive, but in order to get assembly mnemonics, you must disassemble. So sorry, Johnny. For this, Capstone engine was used. I don't know if anybody has played with Capstone engine; if you're looking for a free and really awesome disassembler, it's great. I love it. It runs on multiple architectures, there are bindings for multiple languages, and it's super easy to use. The reason I call this out specifically is I'm sure a lot of you have noticed that every single time you run a different disassembler on an executable or some code, you will get different results, right? So really you only get consistency within a disassembly engine. So if you were to write your own or use one of the other disassembly libraries, the technique itself would still work, and that's totally cool, right? I'm not completely pimping Capstone engine. I like it a lot. But the point is just to be consistent with this type of stuff.
I then had what I thought was a really bright idea: I was going to look at the correlation between assembly mnemonics, right? So every time an add appears, how likely is it that a mov or maybe a call or a jump also appears? Yeah, that was an awful idea. So we moved on. So now let's get into some of the math, right? Because how do you not love math? Math is so fantastic. So going back to the assembly mnemonics, right? These mnemonics describe the program behavior, and that's what we're looking to capture: what exactly is this unpacker doing, or how exactly does the executable get set up, right? Because it's generally compiler-specific, or in the case of a packer or cryptor, right, they have to know what to undo so they can then run whatever code they want. So we want to capture this program behavior, and that's what we're doing with the assembly mnemonics. So
how can we look at these various assembly mnemonics? We looked at correlation. Correlation doesn't really take order into account. You saw the correlation matrix; it looked ridiculous, right? So imagine looking at that for 400,000 samples. It's going to be some massive gray blob, and you're going to go blind and be sad. So, you know, there's this notion of distance or similarity. That fuzzy idea is: if I have a signature, I want to know how close what I'm looking at is to the signature, right? How similar are two things, this idea of similarity. So we'll talk a little bit about Jaccard distance. Jaccard's awesome. It's cool. However, it doesn't take order into account. The idea being that with assembly, right, it executes in order. It doesn't jump around. I mean, there's, you know, flow control and all that kind of stuff, but generally, if you see an add, a mov, and an xor, they'll be executed in that order and not, you know, mov, xor, add, or vice versa. So while Jaccard's great and it might be useful, order I thought was pretty important to take into account. So there's this idea of Levenshtein distance. It's another cool distance metric: the number of edits determines the distance, and position is important. So let's look at one of the examples of Jaccard distance. Here we have two seemingly random sets of assembly mnemonics, right? We can say the leftmost is the one at the address of entry point. So this is where the executable will start, and then it moves from left to right. And you can see there are various ones. So the easy way to view computing Jaccard distance is to take the total number of shared elements, divide it by the total number of unique elements, and that's your distance. So in this case, it's mov and push, which is 2, divided by the union of the sets, right, which is 8, and you get 0.25. So as far as set membership is concerned, these two things have a distance of 0.25. And while that's okay, it just didn't quite feel right.
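That shared-over-unique calculation is easy to sketch in Python. The mnemonic lists below are made up for illustration (they are not the slide's actual sequences), and note that the talk's "shared over unique" convention is really the Jaccard similarity; the classic Jaccard distance is one minus it:

```python
def jaccard(a, b):
    """Shared elements divided by total unique elements; order is ignored."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

# Two made-up mnemonic sequences sharing only mov and push,
# with 8 unique mnemonics between them:
left  = ["mov", "push", "add", "sub", "inc"]
right = ["mov", "push", "xor", "cmp", "jmp"]
print(jaccard(left, right))  # 0.25
```

Because everything goes through `set()`, repeated mnemonics and their ordering contribute nothing, which is exactly the weakness discussed next.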
order. So how many things have to change to make one into the other? So this kind of fit the domain a little bit better. So once again, kind of just doing a quick compare, you know, looking at if they're different. So, right, there's one difference and then they're not different and so forth and so on. So basically seven changes are necessary
to make one set into the other set. Therefore, we get a distance of 7. So kind of what we were talking about before. But code is executed in order, there may be branches. I really didn't want to build any type of, you know, flow graph or any of that kind of stuff. I wanted to keep it simple and understandable and efficient. So in theory, the
assumption was, what I worked with was the assembly mnemonics to the left should be more than the assembly mnemonics on the right, right, because it will execute starting on the left and finish somewhere off the right. And if there's a jump in there, maybe you want to care about it, but maybe you don't really want to care about this stuff after as much as the fact that there was a jump. So there were a bunch of testing and metrics where
I tried to figure out where the cutoff was, how many assembly mnemonics were required, so forth and so on, and we'll get into that. We also had to take into account how big the stub is, and if you don't know what you're looking at, then you don't know what you're looking at, and some of these questions are really hard to answer. So we turned to tapered Levenshtein. And this I think is a really, really cool algorithm. So basically the
idea is it's position dependent like regular Levenshtein, except, right, any edit to the left will have a higher weight than an edit to the right, which kind of makes sense. So this is kind of a way to capture: now we care more about the things that are executed first, in case there's something
like a branch or a jump, right, and now we have a language, these assembly mnemonics, to kind of capture program behavior. So we can put those two together, and the way you basically calculate this for every single position is one minus the position of the thing you're looking at, right, divided by the length of the set. So in this case
there's ten things in the set. So the first thing, right, requires one full edit. The second thing requires zero edits. And the third thing requires, right, .8 of an edit. So you kind of go on and now you have a distance of 3.5. So to me this was great because it said, yeah, these things are separate and different, but there
might be some sort of similarities. The nice thing you can also do with Levenshtein is you can actually use it as a similarity calculation, right? So if you want to use it as a similarity, it says basically those two sets are 65% similar. So this is how the idea of
similarity is saying, oh, we get this fuzzy hashing, this idea of similarity, mixed into the algorithm. All right. So now that we've made it kind of through this great refresher (everybody loves PE files and their headers and all the various values) and we have an idea of the
features we're going to look at, right, what we're going to use. We're going to use the major linker version, the minor linker version, these various assembly mnemonics, right, number of sections. We have some really fancy sounding algorithms that are actually really simple to understand, which is great. We have a way to do fuzzy matching. Awesome. So what do we do? Well,
first step, gather samples. We already talked about the data sets, so you know there are well over 411,000 samples that we dealt with. So the second thing was, right, let's get PEiD, kind of this industry standard, this thing that I'm very comfortable with, I've used a lot in the past. Let's see what it looks like for everything. Then from there, for
each of the executables, we're going to disassemble them, right, because we need the assembly mnemonics, and in this case we wound up using the first 30 assembly mnemonics. We need the header features. We'll talk a little bit about clustering so you can kind of understand which PE files are similar based on these three features, right, assembly mnemonics and the various
header features. Then when I ran this across all the data sets, my threshold was this 90% similar. So I felt that if an executable's signature and a signature that I was matching against were not at least 90% similar, that wasn't good enough to call it an actual match. So one of
the things that I started off using was banded minhash. It's a similarity comparison optimization, because I didn't feel like doing big O, right, of N squared comparisons, especially on 400,000 things. However, the implementation of banded minhash that I was using was broken, so I wound up doing a lot of comparisons, but luckily not by hand. All right.
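The banded minhash idea mentioned above can be sketched in a few lines. This is a toy illustration, not the (broken) implementation referenced in the talk, and the hash count and band sizes are made-up parameters:

```python
import hashlib

def minhash_signature(items, num_hashes=8):
    """MinHash sketch: for each of k seeded hash functions, keep the
    minimum hash value over the set's elements. Similar sets tend to
    produce signatures that agree in many positions."""
    return [
        min(int(hashlib.md5(f"{seed}:{x}".encode()).hexdigest(), 16)
            for x in items)
        for seed in range(num_hashes)
    ]

def band_keys(sig, bands=4):
    """Banding: split the signature into bands. Two files that share
    any whole band land in the same bucket and become a candidate
    pair, so you only compare candidates instead of all O(n^2) pairs."""
    rows = len(sig) // bands
    return [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)]

# The same mnemonic set, regardless of order, buckets identically:
a = minhash_signature(["mov", "push", "call", "ret"])
b = minhash_signature(["ret", "call", "push", "mov"])
print(band_keys(a) == band_keys(b))  # True
```

The point of the banding step is purely to prune: files that share no band are never compared at all, which is what makes a 400,000-sample run tractable when it works.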
Then we created signatures so we could test and verify. So one of the things that I kind of want to talk about briefly is: why signatures, right? So everybody, we live in this great age where it's like, oh, my God, security data science, we have to do supervised machine learning with random forests, or run unsupervised, and if we're not
using DBSCAN or k-means with k optimization, you're like, no. Sometimes it's overkill, right? So one of the nice things about signatures in this case is we can use them to capture this domain specific language, but me or anybody else, we don't have to worry about model drift. So after you create this awesome
machine learning model that might have great accuracy, what happens when you get new data and you go to train it, right? That accuracy begins to drift, the model gets out of whack, so to speak, and you keep going through this large process, right? This is one of the issues with operationalizing machine learning. Also the model
will vary based on training source, right? So if I trained it against only my APT1 set, well, then it would be really good at probably finding things labeled APT1, but it would be worse about trying to determine which packer or which cryptor is what, right? And likely everybody else will have different data than me, so it really
wasn't a good fit. Kind of that last bullet is really where I was going: simple, right? You want to play. You want to do things. You want to tinker. Sometimes machine learning is fun to tinker with. Sometimes you really just want to get something done. So here's kind of what the signature language itself looks like. So really, really simple. It's kind of highlighted
to show you the signature, and I'm going to go into a demo in a second, but so the signature for Microsoft Visual Basic is the top line, and then the parts where it matched on the file, so you can kind of see those blue highlighted regions. There's quite a few. The ones on the left, there's a really long run, right? So you get the similarity of .902, right,
because it required roughly 2.9 tapered edits. So in my opinion, I think that accurately captured, yeah, this signature is pretty similar to the file, and I feel pretty confident that this file matches my signature. All right. Now let's move
into a demo. Oh, God. I'm going to minimize that real quick. We should be good. I think I
broke everything. That's phenomenal. Yeah,
seriously. All right. This is what I get for doing, trying to do like an honest to God demo, huh? Maybe if I don't do it in full screen.
We'll figure it out. There we go. It just hates full screen. So I'm going to ‑‑ Asian guy showed up. Awesome. Went from friendly math talk to clan rally quickly. I apologize. So I scripted this out because I was kind of a chicken as well. I didn't
want to type commands, and quite frankly, watching me type commands is boring. So I'll direct your attention to the top kind of small box and walk through the demo. This really is ‑‑ you know
what? You're going to get back up here. Just sit there for like two minutes. Got to want it. That's
all right. If it doesn't work, I have slides, but I thought a demo would be way more entertaining for everybody. Third time's a charm. Nope. Third time
was not a charm. When in doubt, try a different port. Oh. Maybe that's awesome if that was the
case. All right. I really didn't want to like try and lean over. You know what? Screw it. I'm going to unplug one more time. We're just going to go back to the presentation. If anybody wants to actually see a demo, I promise ‑‑ I literally promise it
works. I swear. I don't promise ‑‑ oh, no, no. I was trying to make sure of that. I think we're good. All right. We're good? All right. Screw you,
demo gods. Now you get completely unreadable slides. So I'll try to describe what's going on. There's two phases to this. One, there's the signature generation phase, and that simply says run this one script on a binary (that I can't even show on a computer; that's what I get for trying to do a demo) and generate the
signature. And all the signature is going to be is a simple list of assembly mnemonics and then give you this major, minor linker version and then as well as the number of sections. And then all you have to do, if you're not giving a demo, is run this other
script, mmpes.py (if you can see it), on the signature, and you can do all sorts of things. You can give it a threshold. So if your idea of similarity is different than mine, if you say I want to know everything that's 50% similar, you can do that. You can give it this crazy verbose mode where it says, all right, here's the signature that I have, and here's what I'm matching against. You can
do that in case you really want to interrogate everything. It also tells you when the major and minor linker versions match or when they don't match or when the number of sections don't match. It tells you how many edits you have and then the actual similarity. So this is actually between two APT1 samples. And you can kind of see the
signature generated on the two files in this directory. The first one really didn't match all that well. It had this .844; it required roughly 4.5 edits. But then this other file matched exactly. So all 30 assembly mnemonics were perfectly in order. Both the major and minor linker numbers matched as well as
the number of sections. And here's kind of a better description of the rule that you guys might be able to see. All right. Apologies for the demo. So let's look at some of the data sets. We'll start with the APT1. So for here, this is kind of
describing the clusters. So in other words, the like things grouped with other like things. And it's two bar charts superimposed on one another, which is why you get the color variations. Once again, apologies for not doing it in grayscale. So that very far one on the right, the idea is PEiD found, in that yellow thing,
and said this many things are similar. And then kind of that green bar is the assembly mnemonic comparison saying this many things were similar. So kind of the cool thing, even with having zero trust in the labels of something like PEiD, right, you get kind of
this anticipated view. You expect a lot of things to kind of fall into a few buckets and then you get this really long tail that as an analyst is always a pain in the ass to deal with. So one of the other ways you can represent this is kind of these neat-looking bubble graphs. It's not really science unless you have sweet graphs. So this is just clustered on
assembly mnemonics. So once again kind of representing what you can see: this one really large cluster and kind of these other ones. But the signature language and this work revolved around a couple other features. So what do they look like? All right. So the darker blue is the actual group. So in
this case, that big orange one is the big dark blue one. And then within that one cluster, right, based only on assembly mnemonic similarity, you have kind of these three subclusters based on number of sections. So this is kind of interesting. There's maybe a little bit of variation. So maybe somebody used a slightly different version of something,
so forth and so on. Likewise with linker versions, I thought this was kind of neat: there's very little deviation for linker versions in this example set when used as a subclustering. So then this is kind of a two-dimensional view of a three-dimensional set of
features. So once again kind of that dark blue is the assembly mnemonic circle. Then you've got these various subcircles. Kind of the one on the lower right-hand corner. You can see the cluster and then you can see one cluster that was actually based off of number of sections. Then you can see two subclusters in that. And then
everything else only had that one cluster. So it's kind of cool. So let's look at Zeus. Much bigger data set, much bigger graphs. Much more science. So this is what Zeus looked like. And once again, like the earlier teaser, you get this massive, massive, massive PEiD unknown label. But the clustering actually breaks it up. So
this one and the stacked one, you can see the assembly mnemonic clustering on that yellow bar kind of in that blown up window is a little bit more manageable. And you kind of get this slightly more gentle sloping curve. But you get a
lot of bubbles. So either the end result is I shouldn't do anything in D3 or you should never D3 while you're high. Because both scenarios end badly. So once again, what
does it look like if we subcluster on a number of sections versus the initial cluster on assembly mnemonics? You get more circles. What if we do it on linker version? You get these crazy sub spirals. Things look so bizarre. This was for me kind of enjoyable because it was a really neat
exploration of Zeus and kind of a way to visualize this entire data set. And then when you subcluster on both, you just kind of want to go home and cry. It's never very good. All right. So I mentioned that I did something on 411,000 files, which was awesome. So let's talk about them. All right.
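A run like that boils down to pairwise comparisons of mnemonic streams. Here's a minimal sketch of the tapered Levenshtein described earlier, where an edit at (zero-based) position i costs 1 - i/n so left-hand edits weigh more, plus the 90% similarity cutoff. The exact weighting and tie-breaking in the actual tool may differ, and the mnemonic streams below are made up for illustration:

```python
def tapered_levenshtein(a, b):
    """Position-weighted edit distance: an edit at (0-based) position i
    costs 1 - i/n, so edits near the entry point count for more than
    edits further into the mnemonic stream."""
    m, n = len(a), len(b)
    size = max(m, n, 1)
    w = lambda i: 1.0 - i / size  # weight of an edit at position i
    prev = [0.0] * (n + 1)
    for j in range(1, n + 1):
        prev[j] = prev[j - 1] + w(j - 1)
    for i in range(1, m + 1):
        cur = [prev[0] + w(i - 1)] + [0.0] * n
        for j in range(1, n + 1):
            pos = max(i, j) - 1
            sub = 0.0 if a[i - 1] == b[j - 1] else w(pos)
            cur[j] = min(prev[j] + w(pos),      # delete
                         cur[j - 1] + w(pos),   # insert
                         prev[j - 1] + sub)     # match / substitute
        prev = cur
    return prev[n]

def similar_enough(a, b, threshold=0.90):
    """Similarity = 1 - distance/length; call it a match at >= 90%."""
    sim = 1.0 - tapered_levenshtein(a, b) / max(len(a), len(b), 1)
    return sim >= threshold

# Two 10-mnemonic streams differing only at positions 0 and 2:
a = ["push", "mov", "call", "add", "xor", "sub", "inc", "dec", "cmp", "ret"]
b = ["nop",  "mov", "lea",  "add", "xor", "sub", "inc", "dec", "cmp", "ret"]
# distance = 1.0 + 0.8 = 1.8, similarity = 0.82, below the 90% bar
print(similar_enough(a, b))  # False
```

Note how the same two edits would cost a flat 2 under plain Levenshtein; here the one at the entry point costs a full edit while the later one costs only 0.8, which is exactly the left-weighting the talk argues for.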
This is just the assembly mnemonic graph. So you can see there are tons of clusters based on, similarity based on assembly mnemonics. This is awful to read. So one of the fun facts about this is roughly 5800 out of these
411,000 files are not 90% similar to any other file in this entire corpus. I thought that was really cool and really surprising. So this might be some polymorphic stuff, right? It might be various cryptors. Who knows? But it
was cool. 5800 things is way too many for me to actually dig through. So we'll kind of skip through some of these. Everybody loves spirals, and I really wanted to leave 15 minutes for questions. So don't D3, or I shouldn't D3. I actually broke D3 on one of these. My adjacency matrix was
too big and it just wouldn't work. So once again, subclustered on number of sections, and this is the one that I broke. So this is where D3 just simply said, I give up. Or: you're doing it really wrong. And it might very well be that I was doing it really wrong, but it cried. So there were a couple really,
really cool things that popped out of this relatively large data set. Like Google Chrome. There were 97 Google Chrome instances, right, hashes in this 411,000 and they all matched the same signature. Right? They all had this kind of same assembly mnemonic string. So they're very
consistent with their builds at Google. So if anybody is in the room from Google, thanks. Appreciate that. Right? They're very consistent with what linkers they have, what linkers they use. So out of the 97, kind of the take home is, 94 of those 97 have matching linker versions, matching
number of sections and assembly mnemonics within 90%, right, this .9 distance. So this is kind of cool. And then it really wouldn't be a talk about packers if we didn't talk about UPX. Because somebody was going to ask about it. So this was kind of cool. This was kind of telling. I dug into
UPX some in the past, but this actually forced me to do a little bit more digging. So I kind of cheated and I said, all right, what if I do this really, really naively and just look for the string, right, UPX0, UPX1, or UPX! in the file, and say it's probably UPX, right, because once again
I didn't want to test any prior solution and I wanted to really see how this kind of stuff stacked up. So with the assembly mnemonics, right, just from doing that simple thing, it got 65 different groups, and I thought, shit, now I'm going to be laughed off stage. However,
there's some pretty cool results in here. So you can see in the table there's this group label and there's this count. So that's group label or the cluster label is just the arbitrary number that I assigned to it, this group. So you can kind of see once again you get this neat little slope. And I was like, all right, so maybe there's some variations of UPX. Maybe I'm much smarter than I thought I
was and I can do UPX version detection with this. Maybe my head's going to explode, or maybe I failed miserably. So I looked at how the algorithm stacked up against PEiD. While I didn't trust the PEiD results fully, it was neat to say either me and every random person that I pulled
signatures from on the internet were making the same mistakes, or maybe we were totally on to something. So kind of the cool thing was: here's the numbers, and it looks like maybe I was on to something after all. There's also kind of that none label. I dug through that a little bit to see what was going on and whether this algorithm was completely failing. It turns out
there's a bunch of packers that basically wrap UPX, which I really hadn't had much exposure to. So I thought that was awesome. So I learned a whole bunch there, these kinds of variations. So, right, let's go through the recap. The idea was easy-to-generate signatures. Had I had a working
demo, you would have literally seen me type one command and the signature would have appeared out of nowhere. It would have been awesome. But I can show you later, I'm happy to. It's not a lot of math. It's all written in Python, because Python is the new old Ruby. It's cross platform.
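For the curious, the header half of a signature (number of sections plus major/minor linker version) can be pulled straight out of a PE with nothing but stdlib struct offsets. This is my own sketch for illustration, not the released tool's code, which presumably uses a proper PE parser:

```python
import struct

def pe_header_features(data: bytes):
    """Read the header features the signatures lean on:
    NumberOfSections from the COFF header, and Major/MinorLinkerVersion
    from the start of the optional header."""
    # e_lfanew at offset 0x3C points at the "PE\0\0" signature.
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    # COFF header follows the 4-byte signature; NumberOfSections is
    # the 2-byte field right after Machine, i.e. at +6 overall.
    num_sections = struct.unpack_from("<H", data, e_lfanew + 6)[0]
    # Optional header starts 24 bytes past "PE\0\0"; the linker
    # versions sit right after its 2-byte Magic field.
    major, minor = struct.unpack_from("<BB", data, e_lfanew + 26)
    return {"num_sections": num_sections,
            "linker_major": major, "linker_minor": minor}

# Build a minimal synthetic header to exercise it (not a real binary):
hdr = bytearray(0x100)
hdr[0:2] = b"MZ"
struct.pack_into("<I", hdr, 0x3C, 0x80)      # e_lfanew
hdr[0x80:0x84] = b"PE\x00\x00"
struct.pack_into("<H", hdr, 0x86, 3)         # NumberOfSections = 3
hdr[0x80 + 26], hdr[0x80 + 27] = 6, 0        # linker version 6.0
print(pe_header_features(bytes(hdr)))
# -> {'num_sections': 3, 'linker_major': 6, 'linker_minor': 0}
```

Pair those three values with the first 30 disassembled mnemonics and you have the whole feature set the talk describes.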
It met that requirement. It's mostly easy to understand. It involves a little bit of math, but hopefully not too bad even for 5 o'clock on a Friday. And probably most important for me is: it works. So even though the promised demo didn't work, I'm going to release it online. The guys at work were more than happy to say, yeah, you
can totally release this tool and sample signatures for people to play with and use. That's the URL where it and these slides will live, the updated slides because the old ones are on the CD. So feel free to take a picture of it or you can ping me on Twitter. However, it's not up there yet because I'm a slacker so it will probably get done next week. And last but not least, if anybody has any
questions, I'm more than happy to answer them. Yeah. So the question is: once you have all of this data, what's the
action? And that's actually a really good question. So aside from why did I do it (because I love messing with things), it's important in my opinion for any analysis to drive an action. And the action is to understand what you're looking at as a malware analyst or somebody looking for extra context. So if I can kind of help solve part of the
signature management problem, and you can get this idea of fuzzy matching out of signatures and whatnot, and you have fairly accurate signatures with very low lift, right, when you're at your home organization and you go, man, I've got this piece of malware that I've never seen, and you grab 3,900 signatures off the Internet, right, you can go, oh, right, here's a technique that uses
these types of signatures that works that tells me how similar it is to some of these other things that other people have seen. So it kind of helps give you a starting point for analysis. Any more? Okay. Honestly, I haven't looked at
much, so I don't really know if I have a good opinion on it. Sorry. Any more? I mean, it would be awesome. Would you believe it? Oh, if anyone is using it, I haven't run into it. So the question was: have I run into anybody
actually putting the packer information into the packed files? My answer is no, because I didn't run into it in any of my sample sets. However, even at 410 or 411,000 binaries, given the number of executables that everybody
talks about, right, that's still a relatively small sample set, so it's nowhere near everything. Any more? Like a question? Yep. So: does this apply to protectors as well, or am I using packer in a broad sense? Yeah. When I say
packer, I mean protectors, cryptors, the whole gamut: the whole idea that it's going to obfuscate some intellectual property or something in a binary and make it hard for someone to get the juicy bits. Any more? Man, is my math that much on point that everybody fell asleep and nobody has questions on math? All right. Cool. So I'll
be around if anybody has questions. One more. How do I make this mustache happen? I think it is genetics. It is math. This is what happens when you do too much math. You
wear super classy shoes. So I actually had a really long beard at one point in time, and my wife hated my long beard because I told her I was going for wizard length. So I said,
you know, if I can't have a long beard, I'm going to have a long mustache. Now I sleep on the couch. Too much D3. Exactly. All right. Any more questions? Nobody? All right. Cool. Thanks for coming. I appreciate it.