Windows Offender: Reverse Engineering Windows Defender's Antivirus Emulator

Video in TIB AV-Portal: Windows Offender: Reverse Engineering Windows Defender's Antivirus Emulator

Formal Metadata

Windows Offender: Reverse Engineering Windows Defender's Antivirus Emulator
Title of Series
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Windows Defender Antivirus's mpengine.dll implements the core of Defender's functionality in an enormous ~11 MB, 30,000+ function DLL. In this presentation, we'll look at Defender's emulator for analysis of potentially malicious Windows binaries on the endpoint. To the best of my knowledge, there has never been a conference talk or publication on reverse engineering any antivirus binary emulator before. We'll cover a range of topics including emulator internals—machine code to intermediate language translation and execution; memory management; Windows API emulation; NT kernel emulation; file system and registry emulation; integration with Defender's antivirus features; the virtual environment; etc.—building custom tooling for instrumenting the emulator; tricks that binaries can use to evade or subvert analysis; and attack surface within the emulator. Attendees will leave with an understanding of how modern antivirus software conducts emulation-based dynamic analysis on the endpoint, and how attackers might go about subverting or attacking these systems. I'll publish code for a binary for exploring the emulator from within, patches that I developed for instrumenting Defender built on top of Tavis Ormandy's loadlibrary project, and IDA scripts to help with analyzing mpengine.dll and Defender's "VDLLs"
Hi, so my name is Alexei Bulazel. I'm here to present my research on reverse engineering Windows Defender's antivirus emulator. A little about me before we get started: I am a security researcher at ForAllSecure. You may know the company from their victory at the Cyber Grand Challenge two years ago at DEF CON 24 with the Mayhem CRS. I also do firmware reverse engineering and cyber policy at River Loop Security, and I'm a very proud alumnus of RPI and RPISEC. They're playing over in the CTF right now, and I want to say good luck, guys. This is my first time speaking at DEF CON, so it's great to be here. This work is my personal research and reflects my own views, not those of my employer or anyone else I have previously worked for.

Before we get started, I do want to say this presentation is a deeply technical look at reverse engineering Windows Defender's binary emulator and, as far as I know, the first conference talk to really look at reverse engineering the antivirus emulator for any AV product. It's not an evaluation of Windows Defender. I'm not going to tell you whether this is a good product that you should use in your network or not, and I'm not going to tell you whether it catches viruses effectively relative to other AVs or anything like that. Also, this talk does not address Windows Defender ATP or any other technology under the Windows Defender name. This is about Windows Defender Antivirus, the traditional endpoint AV product.

So, an outline of this talk: I'm going to go through an introduction, then talk about my tooling and process, how I did what I did, then reverse engineering, the real meat of the presentation, then a bit on vulnerability research, and then we'll conclude.

So why look at Windows Defender Antivirus? This is Microsoft's built-in AV product. It is installed by default on all Windows systems, and on Windows 10 it runs by default, which means that over 50 percent of Windows 10 systems have Windows Defender Antivirus running. The Defender name now seems to cover a variety of mitigations and security
controls built into Microsoft OSes, so you have, you know, Control Flow Guard, EMET, ATP; all these different things now get lumped under Windows Defender Device Guard, Windows Defender Application Guard, Windows Defender Exploit Guard, and so forth. Again, here we're focused on Windows Defender Antivirus.

It also runs unsandboxed as NT AUTHORITY\SYSTEM, meaning that if you found a vulnerability inside Defender that would give you initial RCE, exploiting it would also give you a privesc up to SYSTEM, and you'd be running inside an AV process, so the AV would be unlikely to catch you doing anything malicious, because it's not going to flag itself for, say, writing a file, injecting into another process, and so forth.

It's also surprisingly easy for attackers to reach. I've not tried this myself, but friends of mine at Google Project Zero have told me that you could send an executable to someone who has a Gmail account open, and if they have that Gmail open in a background tab, the Chrome browser will cache the downloaded file that just hit the inbox. That will hit a minifilter driver on the Windows OS, and then the file that's written to disk will be passed off to Defender to be scanned. So you can actually reach this in a remote fashion, even though you would think this is a traditional host-based protection system.
My motivation came from this tweet from Tavis Ormandy at Google Project Zero, who about a year ago found some vulnerabilities in Defender's JavaScript engine with Natalie Silvanovich, also of Project Zero. I had a background in reverse engineering antivirus software: I did some work we called AVLeak with Jeremy Blackthorne, who is here in the audience, a couple of years ago, and presented that at Black Hat and WOOT. But I had never actually analyzed Windows Defender, and I always wanted to, and I also had this interest in JavaScript engines. So I took on Defender and looked at the JavaScript engine for about four months and presented that work, and then moved on to reverse engineering the Windows emulator, which I'm here to talk about today.

So our target is mpengine.dll. This is the main DLL that provides Windows Defender's scanning functionality. It's a very large binary, about 12 megabytes. And again, this is not the part of Defender that's, say, doing hooking for system calls or filtering disk writes. This is the main scanning engine: you give it a buffer of data and it says this is malicious or it's not malicious. That's its purpose. Inside mpengine are a variety of scanning engines. I'm focusing today on the Windows binary emulator, which is one of many scanning engines. Before we go into my work on the
Windows binary engine, I just want to quickly recap what I did reverse engineering the JavaScript engine. This bit.ly link there will take you to that presentation, and this was presented at
REcon Brussels in Brussels, Belgium, back in February. So Windows Defender has a JavaScript engine that's used for analysis of potentially malicious JavaScript code. I reversed it from binary. I used a
custom loader and shell for dynamic experimentation, with help from Rolf Rolles, so thanks, Rolf. Throughout the JavaScript engine I found AV instrumentation callbacks that inform the heuristic antivirus portion of Defender about actions that the potentially malicious JavaScript is taking, which it uses to determine whether this is malicious JavaScript or not, say, for example, an exploit. I also found that the developers seem to prioritize security at the cost of performance, so the JavaScript engine is very pared down: it doesn't have JITing or many of the other features and optimizations that make modern JavaScript engines fast. On the other hand, I found it to be relatively secure and the attack surface to be relatively pared down. You'll see some common themes like that throughout this presentation today. As far as
related and prior work goes, there's really only a handful of prior publications on reverse engineering antivirus software at all, let alone the emulators within them. There is of course the work I mentioned, AVLeak, which I did with some collaborators at RPI, some of whom are here. There's also book work from Joxean Koret touching on this, there's Tavis Ormandy's work at Google Project Zero, and there actually are some talks from the AV industry itself, such as Mihai Chiriac's talk from, I believe this was Hack.lu, I think ten years ago, as an AV industry developer talking about how BitDefender's emulator works. But really there's not been a lot of offensive work, or work from people who don't work in the AV industry, looking at these systems. I'd also mention that patents are a great source of sort of open source intelligence about how AVs work. Chris Domas called this out in his presentation looking at patents on x86 processors; similarly, you can find a lot of patents that describe undocumented functionality within AVs or how these particularly complex mechanisms work. All right, moving
into background on emulation itself. So there's this traditional AV model, and I think a lot of people have this idea about how AVs work, which is that they scan files and look for known malware signatures such as file hashes, sequences of bytes, or file traits. They might have some heuristics about, say, imports, or they recognize a static MD5 hash, or they recognize a particular snippet of code that's known to be associated with a given malware family. But this is really an outdated model; even 15 to 20 years ago this was outdated, because malware could evade these hard-coded signatures with packed code, by creating novel binaries: packing, obfuscation, and you heard a lot about polymorphic viruses back in the early 2000s. So the solution that, again, 15 to
20 years ago the AV industry came up with was runtime dynamic analysis on the endpoint through emulation: actually running these unknown binaries in a virtualized environment and looking for signatures there. This technology goes by a number of names. You may hear it called sandboxing, heuristic analysis, dynamic analysis, detonation, virtualization, and so forth. At the end of the day it's all emulation, and that's what we're talking about today. So, an overview of emulators
in general. You begin by loading a potentially malicious unknown binary that you can't identify with other, less expensive analyses such as hashing or heuristics based on imports. You're going to run the binary in an emulated environment, so you're going to have a CPU emulator for the particular architecture of the binary, generally x86. Throughout it running you're going to collect observations, and you'll terminate it at some point based on a limit such as the length of time it has run, the number of instructions executed, the number of API calls, the amount of memory the malware has used, and so forth. Throughout this you're collecting heuristic observations about the malware's behavior that inform detection. You might also look for things like this: if the malware calls CreateFile and writes a known malware signature with CreateFile, you'd hook that implementation, and on every CreateFile you would look for, say, a known malware signature or a known malware hash at that point.

Moving into talking about tooling and process, how I did what I did: reverse engineering-wise, I used pretty standard industry tools like IDA, and BinDiff for patch analysis. So as Google Project Zero was discovering some vulnerabilities, I was able to diff updates of the DLL and find what had changed and how Microsoft tried to mitigate vulnerabilities inside Defender. I found that overall there are about 30,000 functions across this massive 12 megabyte DLL. This is enormous, probably one of the largest binaries I've ever taken on reversing. Obviously people look at firmware images that are much larger, but this is really absolutely monolithic for a single Windows DLL. What does make this job a lot easier is that Microsoft publishes PDBs, basically debug databases that have symbols and sometimes type information for the binaries.
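
As a rough illustration of the emulator overview earlier in this segment (and emphatically not Defender's actual implementation), the following Python sketch models a static signature check that a packed sample evades, next to a toy emulation loop with termination limits and a hooked file-write call that scans the buffer at run time. Everything in it is an assumption invented for the example: the `Emulator` class, the three-opcode instruction set, the signature bytes, the limits, and the observation strings. SHA-256 stands in for the hash a real engine might use.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical signature data, invented for this example.
KNOWN_BAD_PATTERN = b"EVIL_PAYLOAD"
KNOWN_BAD_HASHES = {hashlib.sha256(KNOWN_BAD_PATTERN).hexdigest()}


def static_scan(data: bytes) -> bool:
    """Traditional model: match a whole-buffer hash or a known byte sequence."""
    return (hashlib.sha256(data).hexdigest() in KNOWN_BAD_HASHES
            or KNOWN_BAD_PATTERN in data)


def xor(data: bytes, key: int) -> bytes:
    """Stand-in for a packer: XOR every byte with a one-byte key."""
    return bytes(b ^ key for b in data)


@dataclass
class Emulator:
    """Toy emulator: runs made-up instructions and never touches the real disk."""
    max_instructions: int = 10_000
    max_api_calls: int = 100
    executed: int = 0
    api_calls: int = 0
    buffer: bytes = b""
    observations: list = field(default_factory=list)

    def hooked_write_file(self, path: str) -> None:
        # Emulated API call: record the behavior and scan what would be written.
        self.api_calls += 1
        self.observations.append(f"write:{path}")
        if static_scan(self.buffer):
            self.observations.append("known-signature-written")

    def run(self, program) -> list:
        for op, arg in program:
            # Terminate once the instruction or API-call budget is exhausted.
            if (self.executed >= self.max_instructions
                    or self.api_calls >= self.max_api_calls):
                self.observations.append("limit-reached")
                break
            self.executed += 1
            if op == "load":
                self.buffer = arg
            elif op == "xor":
                self.buffer = xor(self.buffer, arg)
            elif op == "write_file":
                self.hooked_write_file(arg)
            # Any other opcode (for example "nop") only consumes budget.
        return self.observations


# A "packed" sample: the payload only appears in memory at run time.
packed = xor(KNOWN_BAD_PATTERN, 0x5A)
sample = [("load", packed), ("xor", 0x5A), ("write_file", "C:\\dropped.bin")]

print(static_scan(packed))     # False: the packed bytes match no signature
print(Emulator().run(sample))  # the hooked write sees the unpacked payload
```

The point of the sketch is only the shape of the approach: the static check sees the packed bytes and finds nothing, while the emulated run observes the unpacked buffer at the moment of the hooked call and stops on its own once a budget is used up.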
AVs are generally harder to look at than traditional software, and dynamic analysis does require some work on the part of the user or the reverse engineer. In Defender's case, it's a protected process, meaning that even if you're SYSTEM or admin on your local system you cannot attach to the process to debug it. Even if you have SeDebugPrivilege or anything like that, you still can't attach; it's protected by the OS. The solution to this is to go into a kernel debugger and, for example, debug an entire VM and then attach to the process from the kernel, but that's very expensive and just annoying to do. Introspection is also challenging: even if you can, say, pause at a breakpoint, actually understanding what's going on in the emulator state can be difficult, even though you have a debugger running. Scanning on demand can be difficult to trigger: if you want to scan a binary you might have to go into a GUI, click a couple of buttons, select something, choose it. It's a pain to do that. You want an automated command-line interface to just say scan this file, scan that file, scan any other file. And code reachability may be configuration or heuristics dependent, meaning that local settings about, say, how aggressive the scanning is or what time limits you allow the scanner can all get in the way of effective scanning. The solution to all of this is a custom loader for these AV binaries, and it was nice that I was able to start with some work that Tavis Ormandy at Google Project Zero did on building his own custom harness for Defender, which I've extended extensively. So first off I'm
going to talk a little bit about Tavis's existing work, which he called loadlibrary. Tavis built a PE loader for Linux, so this is able to take a Windows DLL on Linux, load it up, and actually run it. This is not a full replacement for something like Wine or any other Windows emulation; this is just enough to get Windows Defender itself running and shim out the Windows system calls that Defender will be making to Linux implementations. So, talking through
how Tavis's tool works (the link here will take you to the GitHub project): we begin with a Linux binary, just a standard userland binary, and it's going to load and resolve imports for mpengine.dll. This is just the process of taking the DLL and relocating it into memory, the standard DLL loading process, putting it in a read/write/execute memory buffer there on Linux. Then, in the IAT, the import address table, you go through and shim out the implementations of various Windows APIs with Linux replacements. So, for example, CreateFile is replaced by a call to open or fopen, and WriteFile is replaced by a call to fwrite. Inside this engine you have an emulator, and for now just remember that there's a table called g_syscalls, which is a table of function pointers to various emulations of Windows API functions. On the outside we have our malware binary; here we have the standard MZ header on the binary. We're going to call a function exported by Defender called __rsignal, and this is the main entry point to Defender. We give it a buffer of data and it's going to come back with a malware classification. We then go through a process of selecting a scanning engine: Defender may do some initial analyses with things like static hashes. If those fail and it can't determine whether this is a malicious binary or not, it's ultimately going to read it into the emulator. The emulator will run, make its determination whether this is a malicious binary or not, and come back with a virus identification, or it might say this is just benign. So, quick
demo: I'm going to show you scanning with
mpclient. This is Tavis Ormandy's unmodified harness for Windows Defender.
So here we're scanning the EICAR test file, an industry-standard test file for any AV. We scan the file and it comes back and says we found the EICAR test file. So that's kind of a demo: we're
actually taking this Windows code,
running it here on Linux, and seeing what happens when we scan a binary. In
addition to using this harness from Tavis, I did some dynamic analysis with customized code coverage tools developed by Markus Gaasedelen of Ret2 Systems, a fellow RPISEC alumnus as well. Markus made a tool called Lighthouse that lets you run a binary under DynamoRIO or Pin, collect coverage information, and visualize that in IDA Pro. You can see here in this control flow graph that the blue basic blocks are those that have been hit during a given scan, and I found this an extremely powerful and useful tool when I was doing my reverse engineering. I did find it interesting to see Halvar Flake, just about a month or two ago, give a keynote at SSTIC where he was talking about challenges of introspectability with binaries: how it can be very difficult to introspect, analyze, and debug binaries, and how ultimately that's a hindrance to security. Halvar explicitly called out the challenges of analyzing Windows Defender as one example of this, where because Defender is a privileged process on Windows you can't analyze it under a tool like Pin or DynamoRIO. Of course, since we're running on Linux, we sidestep the whole issue of the protected process and we can actually run and visualize coverage. Okay, now
moving into the meat of the presentation, talking about reverse engineering the emulator itself. First off I'm going to talk about startup of the engine, then we're moving to CPU emulation, instrumentation, and then the Windows environment and emulation. So first off, the first thing that has to happen when we want to emulate a given binary is that we have to load it in, initialize the emulator, and get everything started up. We're going to call the __rsignal function, which provides this entry point to Defender's scanning. We give it this buffer of data to be scanned and classified, and it will return the malware classification. These results are actually cached as well; there's lots of stuff going on in the back end that we don't really care about. We ultimately care about just going into the emulator itself. So the emulator has to be
initialized: we have to allocate memory for execution, we have to initialize various C++ objects that are involved in the emulation, and we have to set up the process itself and various subsystems in Defender. For example, we have to create an object manager instance, we have to set up the virtual file system, and so forth. We're going to load the binary that's to be analyzed, resolve its imports and things like that, and then initialize virtual DLLs in this emulated process memory space. These are akin to the real DLLs on a real Windows system that provide Windows API functionality. Throughout this process Defender is collecting heuristic observations about the binary, and you can see these on the right side here, for example things like a suspicious PE section size. These might inform some heuristic classifications in Defender: because there's a suspicious section size, maybe this is malware. It will also be doing things like, in the bottom right, some import resolution, resolving api-ms names from the API set DLLs. And here in the bottom left I have this example of when we're setting up a name for the binary to be emulated: you can see that if the binary is a Windows executable it'll be called myapp.exe. This is something you could use to write evasive malware that says: if my name is myapp.exe I won't run, because I know that I'm running inside Defender. And indeed, if you Google this string you will find malware binaries online that explicitly look for the name myapp.exe and choose not to run if they see it. After startup and initialization, we're going to move into talking about CPU emulation. Technically what Defender does is not so much emulation as it is dynamic translation. This is akin to what QEMU, the Quick Emulator, does, which is basically taking assembly code of a given architecture, lifting it up into an IL, or intermediate representation, and then taking that IL and dumping it out with a JIT engine into executable code. Defender supports a number of 
architectures. You can see them here in the enum on the right, ranging from x86 in three different flavors up to ARM and even VMProtect: they can take VMProtect opcodes, lift those into IL, and dump that out to sanitized x86 to be run and analyzed, as well as ARM. Now, this subsystem is incredibly complicated and not really a primary focus of my research, but I'll give you a brief overview over the next few slides.
We begin with the architecture-to-IL lifting process, which lives in these giant functions, architecture_to_IL. You can see an example from the x86-to-IL translator: just an absolutely massive, ugly switch statement with thousands of cases; IDA gets super slow when you load this in. Basically what they're doing here is grabbing a byte of x86 opcode, looking at it, determining what it is, and then emitting the corresponding Windows Defender intermediate representation for that operation. You can see an example here in the bottom right, where all push instructions lift to 13 in the Windows Defender IL.
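The lifting step described above can be pictured with a toy lifter. This is a minimal sketch in Python, not Defender's code: the only value taken from the talk is that push lifts to IL opcode 13 (treated here as hex 0x13, which is an assumption), and every other name and IL number is made up for illustration. The x86 encodings used (0x50 to 0x57 for push r32, 0x68 for push imm32, 0x6A for push imm8, 0x90 for nop) are the standard ones.

```python
# Toy opcode-to-IL lifter: read one x86 opcode byte at a time and emit an IL tuple.
IL_PUSH = 0x13  # "push lifts to 13" per the talk; hex vs. decimal is an assumption
IL_NOP = 0x00   # hypothetical IL opcode, not taken from Defender


def lift(code: bytes) -> list:
    """Translate a byte string of x86 machine code into a list of IL tuples."""
    il = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0x50 <= op <= 0x57:      # push r32: register index is in the low 3 bits
            il.append((IL_PUSH, "reg", op & 7))
            i += 1
        elif op == 0x68:            # push imm32 (little-endian immediate)
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            il.append((IL_PUSH, "imm", imm))
            i += 5
        elif op == 0x6A:            # push imm8, sign-extended to 32 bits
            imm = int.from_bytes(code[i + 1:i + 2], "little", signed=True)
            il.append((IL_PUSH, "imm", imm & 0xFFFFFFFF))
            i += 2
        elif op == 0x90:            # nop
            il.append((IL_NOP,))
            i += 1
        else:
            raise NotImplementedError(f"opcode {op:#04x} not handled in this sketch")
    return il


# push eax; push 0x01234567; nop
lifted = lift(bytes.fromhex("50686745230190"))
```

All three push encodings collapse to the same IL opcode, which is the point of lifting: later stages only have to understand one push.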
There's also, after we lift to IL, an IL emulator that runs in software, so we can actually run binaries in software. I never observed this being used during my research; I did some code coverage analysis and never saw it being hit. My intuition is that this is so that they can support analysis of x86 binaries on non-x86 hosts. So, for example, if you are running Windows Defender on Windows for ARM, you don't have to have an IL-to-ARM JIT engine; you can just run it in software. Now, as far
as the IL-to-x86 translation goes, we're taking IL code and translating it a basic block at a time, similar to the way QEMU does things. I did observe this JIT being used during my research. Defender will actually handle unique instructions that it can't handle with the JIT through software-based emulation: if it can't JIT an instruction, it'll actually generate a call directly into a function that does that. We'll show that on the next slide, but you can see here, circled in red on the left, the opcodes being constructed: they're constructing a mov of an immediate and then a call to that immediate, calling directly into a function handling a particularly unique architectural instruction or event. Over here on the right you can see the LEA opcode actually being emitted. The LEA opcode in x86 is 8D, so as you're dumping out from the LEA IL instruction down to x86, you emit 8D and then you XOR that with a register-to-register value to create a valid x86 instruction. Microsoft actually documented this in a 2005 Virus Bulletin paper called "Defeating Polymorphism: Beyond Emulation", and it's definitely worth checking out. It's really remarkable that Microsoft was experimenting with this technology almost 15 years ago. ILs are so hot right now; everyone's playing with ILs for things like Binary Ninja or various program analyses, but Microsoft was doing this on the endpoint, on your computer, your grandma's computer, everyone's computer, 15 years ago. They were lifting up to ILs, dumping them out, doing analyses on them. It's very impressive, I found. So then we have these architecture-specific escape handlers for these unique architectural events that we can't emulate with the JIT engine. You can look at this offline and see an exact listing of some of these enums, and an example of
one of these functions would be this software-based emulation of the x86 CPUID instruction. This is an instruction that provides unique information about a given x86 CPU, and here it's emulated in software. As shown here, I wrote a malware binary that does CPUID with this argument, hex 8001, and when we run this binary inside Defender's analysis engine we get this code coverage: we see it bounce off the block where that same immediate is compared, and then we go down the true branch, because the immediate in our code matches the immediate here in software, and then they emulate CPUID by setting register state accordingly.
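As a rough picture of what such an escape handler does, here is a minimal Python sketch of a software CPUID: compare the requested leaf in EAX against known immediates and set the register state for the matching branch. The register-dictionary layout and the returned values are assumptions for illustration; they are not the values Defender's emulator reports. The only real-world fact used is that on real hardware leaf 0 returns the vendor string split across EBX, EDX, and ECX.

```python
# Toy software emulation of CPUID: dispatch on the leaf in EAX, set registers.
VENDOR = b"GenuineIntel"  # illustrative vendor string, not Defender's actual answer


def emulate_cpuid(regs: dict) -> dict:
    """Update a register dict in place the way a software CPUID handler might."""
    leaf = regs["eax"]
    if leaf == 0x0:
        # Leaf 0: highest basic leaf in EAX, vendor string in EBX, EDX, ECX order.
        regs["eax"] = 0x1  # hypothetical "highest supported leaf"
        regs["ebx"] = int.from_bytes(VENDOR[0:4], "little")
        regs["edx"] = int.from_bytes(VENDOR[4:8], "little")
        regs["ecx"] = int.from_bytes(VENDOR[8:12], "little")
    else:
        # Unhandled leaves return zeros in this sketch.
        regs["eax"] = regs["ebx"] = regs["ecx"] = regs["edx"] = 0
    return regs


def vendor_string(regs: dict) -> bytes:
    """Reassemble the vendor string the way guest code would after CPUID leaf 0."""
    return b"".join(regs[r].to_bytes(4, "little") for r in ("ebx", "edx", "ecx"))


state = emulate_cpuid({"eax": 0, "ebx": 0, "ecx": 0, "edx": 0})
```

The coverage pattern from the talk corresponds to the `if leaf == ...` comparison: the block is only hit when the guest's immediate matches the one the handler checks for.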
All right, moving into talking about my instrumentation, which was a big enabler for the rest of my research. So
the problem with analyzing Windows Defender, again, as I said, is that there's very little introspection. It's very difficult to tell what's going on inside of it; all you really get out of it is a virus identification. Now, you could exploit virus identifications as sort of a side channel to extract information from inside the engine, and indeed that's what I did with the AVLeak project a couple of years ago: exploiting virus identifications as a side channel to get information about what's going on inside various AV emulators. But this is really slow and inefficient, so a smarter technique is to go in and give ourselves a malware's-eye view of what's going on inside the engine. mpengine.dll has various functions that are invoked when various Windows APIs are called by malware running inside of it, and we can hook those emulation functions and provide our own implementations. So we can create a one- or two-way I/O path to share information with the outside and also in turn inform the malware binary inside about what actions we want it to take. Let me give you a diagram of that. This is the original loadlibrary diagram I showed you; this is Tavis Ormandy's tool in its unmodified state, this is how it works. I went in and hooked the g_syscalls table. This is the table of about 120 functions providing emulations for various Windows APIs. I hooked it and replaced those implementations with my own implementations of various common functions like OutputDebugStringA or WinExec, so when these functions are now called by our malware binary inside the engine, our functions are invoked instead.
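The hooking idea can be modeled as swapping one entry in a table of function pointers. This is a minimal sketch in Python, assuming each table entry pairs an identifying hash with a handler; the `SyscallEntry` layout, the hash values, and the handler names are all hypothetical and do not reflect the real g_syscalls structure, where the swap is done by patching pointers inside the loaded DLL buffer.

```python
# Toy model of hooking a table of API emulation handlers.
from dataclasses import dataclass
from typing import Callable


@dataclass
class SyscallEntry:
    api_hash: int      # hypothetical identifier for the emulated API
    handler: Callable  # function invoked when the guest calls that API


def native_output_debug_string(ctx: dict) -> None:
    """Stand-in for the engine's own emulation, which is basically a no-op."""
    return None


def install_hook(table: list, api_hash: int, replacement: Callable) -> Callable:
    """Replace the handler for api_hash and return the original one."""
    for entry in table:
        if entry.api_hash == api_hash:
            original = entry.handler
            entry.handler = replacement
            return original
    raise KeyError(f"no table entry with hash {api_hash:#x}")


captured = []  # messages the guest binary posts to the outside world


def hooked_output_debug_string(ctx: dict) -> None:
    """Our replacement: record the guest's string instead of discarding it."""
    captured.append(ctx["message"])


HASH_ODS = 0x11223344  # placeholder hash, not a real Defender value
table = [SyscallEntry(HASH_ODS, native_output_debug_string)]
original = install_hook(table, HASH_ODS, hooked_output_debug_string)

# The "engine" dispatches through the table, so it now reaches our hook.
table[0].handler({"message": "hello from inside the emulator"})
```

Returning the original handler keeps the option of calling through to the engine's own emulation after the hook has done its logging.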
So here's an example of our OutputDebugStringA hook and the process we have to go through, which is resolving the relative offsets of these functions and then setting hooks in the read/write/execute DLL buffer in our Linux process. So what this looks
like is this: here in the top right we have our IDA Pro disassembly, or decompilation rather, of Windows Defender's emulation of OutputDebugStringA. It's basically a no-op; all it does is retrieve a single parameter off the virtual stack and then bump the tick count, so it bumps the time a little bit in the emulator. Here in the center of the screen I have my reimplementation of this function, and we're going to walk through it step by step. First off we have our declaration: this takes a void pointer. pe_vars_t is a massive, about half-megabyte structure passed to all these Windows API emulations. We don't want to write out an exact definition of that structure, so we just take a void pointer and say we're not going to worry about it, it's just a pointer. Then we have this local to hold parameters to the function. The function has parameters passed to it in the virtualized, emulated environment, and we want to interact with those, so we have to make some space for them. We're going to use a function internal to Defender to pull one parameter off the virtual stack, so we're looking at the virtual ESP and EBP state in this virtual memory space and then pulling off the four-byte value that was there. I'm actually calling back into Defender from my hook function to do that. Then I'm calling a function, GetString, that's going to translate a virtual address inside the emulator to a real address that we can interact with locally, and now we can just print that string to standard out. So this sounds like a lot, but let me show
you a quick demo of it in action. So here
I have a malware binary that's going to say hello DEF CON: when we run it, it calls OutputDebugStringA with "hello DEF CON".
We're now going to scan that binary inside my hooked and modified version of Tavis's loadlibrary tool, and you'll see here it says hello DEF CON. Now, going
back to Visual Studio, we're going to add a new line: "this is a live demo". Of course this is a pre-recorded video, because the DEF CON organizers this year wanted us to do pre-recorded videos, but I was doing this live. I just rebuilt the
binary, and here, scanning it again, it's now going to say hello DEF CON and then also "this is a live demo". So what's
happening is that inside the emulator our malware binary is calling this function, and because we've hooked the implementation of the OutputDebugStringA emulation in Defender, our function is being called instead. We're going to run it one more time, I believe with some more information. You can see here we have a
richer debug output, and we can see things like the exact address that was passed
to it from the virtual memory space. So
this was a big enabler for the rest of my research: the fact that I had this sort of window into what's going on inside the emulator. I can have my malware binary inside take observations and then post them out to the outside world. As far as my malware binary goes, I call it myapp.exe; again, that's the name of all binaries running inside Defender's engine. It does this I/O communication with OutputDebugStringA and some other functions. On the right side you'll see a list of factors that I found could impede emulation and the ways I got around them. I had to really massage the linker, optimizations, and imports in order to get binaries that were consistently emulated by Defender, and I'll be releasing some code at the end of this talk that will have a very simple Visual Studio project that I found I was able to get consistently emulated when scanned with loadlibrary.
Finally, as far as the reverse engineering goes, moving into the Windows emulation and the Windows environment, I think the most interesting part of this presentation. I'm going to start off by talking about the user mode environment, so this is the emulation of a fake Windows user mode. In Windows Defender
there is a virtual file system. As any real system would have a file system and files that malware might look at, Defender virtualizes one. There are about 1,500 files on the virtual file system, and you'll see a variety of things in there. Mostly it's fake executables that are there for malware binaries to, for example, infect, or do different things to, that could be indicators that they are in fact malicious binaries. So I'll do a
quick demo dumping the file system. Again,
using that mechanism that I showed you of posting data out with OutputDebugStringA, we're able to enumerate the entire file system and dump it in just a few seconds. I did here actually use a slightly more sophisticated hook,
where I was using WinExec, and I'll show some examples in my backup slides. It's not as simple as just OutputDebugStringA, but you can see here that in just a second or two we dumped the entire virtual file system from inside Windows Defender. We had our binary go inside there, enumerate all the files that it could see, and then dump them out. After we dump them out we see that
there are about 1,500 of them in this virtual file system,
and you'll see things like this: the word "goat" repeated thousands of times over in a file called aaa_TouchMeNot_.txt. My intuition is that this file is right there on the C drive so that a malware binary might read that file in and, say, send it over the network or encrypt it, or do something that's an indicator that it is indeed malware. So maybe if you touch it, that might be an indicator that you're malicious. The reason it has "goat" repeated thousands of times over is presumably that it is a goat file. That's sort of an AV industry term for a sacrificial file, like a sacrificial goat, that you can let get infected or changed or encrypted by malware in order to have the malware show its true intent. So that was an interesting artifact. Again, this is also something that you could write malware around that says: if I see the word "goat" thousands of times over in a file called aaa_TouchMeNot, I know I'm running inside Defender, therefore I'm not going to run amok or do anything malicious. We'll see fake config
files; you can see that these were very clearly written by a real human, with comments like "blah blah" and generic SQL queries. We have a virtual registry that has thousands of entries, and enumerating the whole registry and dumping that out we'll see things like this. For example, there's a registry entry for World of Warcraft. Presumably there's malware that looks for a World of Warcraft registry entry and touches it, so if we saw a call to, say, RegOpenKey on World of Warcraft, that might be an indicator of potential malicious intent. We'll see various fake processes running on the system. These are not real processes; it's just that when you call the function to enumerate all processes, it'll give you this fake listing, and highlighted at the bottom in yellow there is our process, myapp.exe. Quick demo of that, dumping the process listing, again using
this same mechanism that I developed. So there you can see, in real time, it took less than a second and we dumped the entire process listing. All right, back to the
presentation. In addition to this
environment, we have Windows user mode code that runs to provide emulations of various Windows API functions, and there
are generally two types of Windows API emulations, akin to Windows API functions on a real Windows system: there are those that stay in user mode, which stay in the emulator, and those that resolve to a syscall, which here is like a trap to a native emulation in Defender. Symbols indicate that these emulated virtual DLLs in the emulator environment are called VDLLs, and
because they are simply DLLs, once we have a file system dump we can just go reverse those DLLs by throwing them in IDA; they're standard Windows PE files. When we look at them, they're definitely not the real implementations of things like kernel32 that you would see on a real system. So
we'll see things like this: in kernel32, if we call GetUserName it'll return a hard-coded string of JohnDoe. This is again something we could use to create evasive malware that says: if I see the user name JohnDoe, I'm not going to run. We'll see a computer name of HAL9000, ostensibly an Arthur C. Clarke, 2001: A Space Odyssey reference. So again, you could write malware that looks for HAL9000 and knows it's running inside Defender. We'll also see very simple implementations of functions like RtlGetCurrentPeb: all that function needs to do is go grab a value from the memory segment at FS:18, and they actually support memory segmentation at the architectural level, so they can just do that actual instruction inside the emulator. Or we'll see complex functions like RtlSetSaclSecurityDescriptor that are just NOPed out; they just return zero. And
more functions are just stubbed out, returning zero, negative one, and so forth, or just triggering interrupts. So lots of complex functions are not fully emulated by Defender. We'll also see
things like this, and get more unique strings and identifiers that tell us we're running inside Defender, like these German IP addresses and references to German websites. Maybe a German programmer developed this particular DLL emulation. So that covers some of the
user mode code. They're very simple emulations, those that just return hard-coded names like JohnDoe or HAL9000. How about the user-kernel privilege boundary, and how do we get into more complex emulations, such as those requiring access to a virtual file system? These functions are implemented
with a hypercall-like instruction called apicall. This is of course not a real x86 instruction; it has the opcode 0F FF F0 followed by a 4-byte immediate describing the particular function to be invoked. When this instruction is hit on the virtual CPU, it's going to generate a call into a native mpengine.dll function that provides emulation of these unique functions. These are complex functions that modify system state or may require particularly complex handling. So in CopyFileWWorker we have an apicall to kernel32!CopyFileWWorker: the virtual CPU sees that instruction, generates a call directly into this emulation of that function, and then it's emulated there in software in mpengine.dll. This is great attack surface: if you found any vulnerabilities in these native emulation functions, you could use them to break out of the emulator and infect the native host. This disassembly here is provided by an IDA processor module, and I'll have an article coming out in PoC||GTFO issue 19 describing exactly how this IDA processor extension module works. So once we have these apicall
instructions running, they're going to trigger a call to a function that looks at the g_syscalls table, which is a big table of these function pointers and hashes. It's going to look for the 4-byte immediate that was given in the apicall instruction and then dispatch to the appropriate function that matches up with it. So, kind of a workflow of what
this looks like inside the emulator: here we have kernel32's OutputDebugStringA. It's going to do things like log the number of times it was called, so if it's called more than nine hundred times that might trigger some unique behavior, but ultimately it's going to resolve down to this function, the apicall to kernel32!OutputDebugStringA, which is then going to
use the apicall instruction; you can see the bytes 0F FF F0 followed by the 4-byte immediate for this function. The virtual CPU is going to see that instruction, the hypercall mechanism kicks in, and we basically transition into native emulation, out of this managed dynamic translation context, and we hit the native emulation for OutputDebugStringA. Of course this is what we hooked, and I showed how I had our own OutputDebugStringA implementation that I was using to post information out of the emulator.
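The workflow above, decoding the apicall instruction and dispatching on its immediate, can be sketched as follows. This is a minimal Python model under stated assumptions: the opcode bytes 0F FF F0 come from the talk, but reading the immediate as little-endian, the dictionary-based table, the hash value, and the handler are all illustrative choices, not Defender's implementation.

```python
# Toy apicall dispatcher: decode "0F FF F0 imm32" and call the matching handler.
APICALL_OPCODE = bytes([0x0F, 0xFF, 0xF0])


def decode_apicall(code: bytes, offset: int = 0):
    """Return the 4-byte immediate if an apicall is encoded at offset, else None."""
    end = offset + len(APICALL_OPCODE) + 4
    if len(code) < end or code[offset:offset + 3] != APICALL_OPCODE:
        return None
    # Little-endian immediate is an assumption, following the usual x86 convention.
    return int.from_bytes(code[offset + 3:end], "little")


def dispatch(code: bytes, table: dict, ctx: dict):
    """Look up the immediate in the handler table and run the native emulation."""
    imm = decode_apicall(code)
    if imm is None:
        raise ValueError("not an apicall instruction")
    handler = table.get(imm)
    if handler is None:
        raise LookupError(f"no native emulation registered for {imm:#010x}")
    return handler(ctx)


def native_output_debug_string(ctx: dict) -> str:
    """Hypothetical native handler; returns a marker so the call can be observed."""
    return "native:" + ctx["message"]


HASH_ODS = 0x11223344  # placeholder immediate, not a real Defender hash
handlers = {HASH_ODS: native_output_debug_string}

stub = APICALL_OPCODE + HASH_ODS.to_bytes(4, "little")
result = dispatch(stub, handlers, {"message": "hi"})
```

Because every native emulation is reached through this one lookup, replacing a table entry (as in the hooking approach) redirects every guest call to that API.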
Enumerating the emulated functions that have native emulations, these are them. The yellow functions are those that are not found on a real Windows system; they're specific to Defender, for example for debug
functionality or unique backdoor management. Here's more of them, including a number of VFS functions, which are for low-level access to the virtual file system. So all these native emulation
functions take a pe_vars_t, a very large, half-megabyte structure containing everything about a given emulation context. Then we have templated parameter functions that are used to retrieve parameters to the function from the emulated stack, and then programmatic APIs for manipulating return values, register state, the CPU tick count or time; all that sort of stuff can be programmatically managed through manipulations of the pe_vars_t structure.
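To make the parameter-retrieval step concrete, here is a minimal Python sketch of a handler pulling one argument off an emulated stack and translating a guest pointer into a string, as the OutputDebugStringA hook does. The `VirtualMemory` class, the flat single-region layout, and the assumption that the return address sits at [ESP] with 4-byte parameters above it are all illustrative; Defender's real helpers and the pe_vars_t layout are not reproduced here.

```python
# Toy emulated address space plus helpers to read stack parameters and strings.
class VirtualMemory:
    def __init__(self, base: int, size: int):
        self.base = base
        self.data = bytearray(size)

    def _offset(self, va: int, length: int) -> int:
        """Translate a guest virtual address to an index into the host buffer."""
        off = va - self.base
        if off < 0 or off + length > len(self.data):
            raise ValueError(f"virtual address {va:#x} is not mapped")
        return off

    def write(self, va: int, payload: bytes) -> None:
        off = self._offset(va, len(payload))
        self.data[off:off + len(payload)] = payload

    def read_u32(self, va: int) -> int:
        off = self._offset(va, 4)
        return int.from_bytes(self.data[off:off + 4], "little")

    def read_cstring(self, va: int, limit: int = 256) -> bytes:
        """Read a NUL-terminated string, stopping after limit bytes at most."""
        start = self._offset(va, 1)
        stop = min(start + limit, len(self.data))
        end = self.data.find(b"\x00", start, stop)
        return bytes(self.data[start:stop if end == -1 else end])


def get_params(mem: VirtualMemory, esp: int, count: int) -> list:
    """Assume [esp] holds the return address and 4-byte parameters follow it."""
    return [mem.read_u32(esp + 4 + 4 * i) for i in range(count)]


def hooked_output_debug_string(mem: VirtualMemory, esp: int) -> str:
    (string_ptr,) = get_params(mem, esp, 1)
    return mem.read_cstring(string_ptr).decode("ascii", "replace")


# Build a tiny guest state: a string, a return address, and one pushed argument.
mem = VirtualMemory(base=0x10000, size=0x1000)
mem.write(0x10100, b"hello DEF CON\x00")
esp = 0x10800
mem.write(esp, (0x00401000).to_bytes(4, "little"))    # fake return address
mem.write(esp + 4, (0x10100).to_bytes(4, "little"))   # pointer parameter
message = hooked_output_debug_string(mem, esp)
```

The bounds check in `_offset` matters in this kind of code: a handler that trusts guest-supplied addresses or lengths is exactly where emulator escapes come from.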
Virtual memory can be interacted with through an API similar to that found in many emulation engines, such as Unicorn Engine, where we can map virtual memory into our real memory space and manipulate it there, and there are
wrapper functions for common operations like reading a single byte and writing a single DWORD, or reading or writing wide strings or regular char*s. These all have these utility functions wrapped around them to make them easier for developers. Moving into kernel
internals: we've talked about the user mode code, and we've talked about how the user mode code gets into kernel mode, or the native emulations. Let's look at how those native emulations are themselves implemented.
So the Windows kernel provides a number of facilities to any binary. This is ntoskrnl.exe and all associated drivers, and these are really the core of the Windows OS, or the NT kernel. Examples include the object manager, process management, file system access, the registry through registry hives, and synchronization primitives for IPC. First
off we're going to talk about the object manager. This is an essential part of the Windows executive that provides management for handles, so any time you are opening a file, a socket, and so forth, it's going to go through the object manager. Defender supports five types of objects with its object manager. These are
file, thread, event, mutant (which is essentially a mutex), and semaphore, and these are stored in the big object manager map here in mpengine.dll. They're stored in memory as C++ objects, and they all inherit from a common parent class, the object manager object. We then have subclasses like the file object or mutant object, and you can see I've made the font a little larger for the traits unique to those particular C++ objects, such as the file handle member in the file object or the wait count variable for a mutex, since various processes can wait on a given mutex. C++ RTTI is used to cast between these subclasses and their parent class when they're retrieved, and the object manager
can be interacted with programmatically by these various functions. So if we open a mutant, they're going to grab that object and then mess with it. If we open a file object, it actually calls a function on the object manager to get a file object, which will first check the type and then explicitly use RTTI to cast to a file object, and fail if the retrieved handle is not indeed a file handle. We'll also see things like the
pseudo handle for the current process being hard-coded as hex 1234; again, a trait of the emulator we could use to write evasive malware, based on seeing that our own handle is 1234. We have a virtual file system that
provides emulation of and access to a file system, and this is accessed through the standard ntdll NtWriteFile, NtCreateFile, and so forth APIs, as well as these lower-level VFS functions, which
provide sort of a backdoor, unsanitized access to the file system emulation. Finally, moving into talking about AV instrumentation, so the heuristics and analyses the AV is doing throughout the runtime. There are some internal
functions that are exposed through the hypercall apicall interface, and I've summarized them here, and we're going to look at a few of these. First off,
MpReportEvent, which is used to communicate information about malware binary actions to Defender's heuristic detection engine. These are in some of these user mode emulations, such as GetUserName or GetComputerName. Those don't require trapping into a full native emulation, and it would increase the attack surface greatly if they all did, but we do want to inform Defender that the given function was called. So if GetSystemDirectory is called, it'll report event 12331, or if you create a process and you do it suspended, it'll report hex 3018, but it'll also say "create suspended", just basically noting that a process like that was created. MpReportEvent
can be called in more cases, as you can see here; these are just more examples. This is called thousands of times throughout these VDLLs, and a more concrete example
of how this might play into AV identification of potentially malicious binaries is here, where we see that if we call TerminateProcess on a PID in the 700 range (and you'll note that all these various AV processes are in the 700 range), it'll trigger a call to MpReportEvent 12349, but it'll also say "AV". So if you try to terminate an AV process, that's probably a good indicator you're malicious. NtControlChannel is sort of
a backdoor interface for administering the engine. This is something Tavis Ormandy hit, and I went in here and reverse engineered the 32 switch-case options of this function and show you what they all do. These let you do things like rewrite microcode, manipulate register state, all sorts of stuff. It's great attack surface, and definitely something that shouldn't be open to malware binaries running inside the emulator. We're going to conclude by talking
about vulnerability research, starting by trying to understand some prior vulnerabilities discovered by Tavis Ormandy at Google Project Zero. Tavis discovered this apicall instruction that I talked about, and he was able to call directly into native emulations of functions, rather than passing through their apicall stubs, by just generating the apicall instructions on the fly, as you can see here. Then Tavis was hitting internal debug functions like NtControlChannel, which when you give it option hex 12
goes to rewrite microcode, and this code here lets the user specify the count in a tight loop. And with the user-specified count, we only have, I think, a thousand elements allocated for the new microcode information, but the user can give
and say two thousand, and we have a linear buffer overflow. Microsoft patched this by adding a check that the count is no greater than 1000, and if it is, it returns zero; it doesn't run.
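The overflow and its fix described above can be modeled in a few lines. This is a minimal sketch under stated assumptions, not Defender's code: the names `MAX_ENTRIES`, `rewrite_microcode_unchecked`, and `rewrite_microcode_patched` are hypothetical, the capacity of 1000 is the figure recalled in the talk, and a flat Python list stands in for the heap so that writes past the table are visible.

```python
MAX_ENTRIES = 1000   # capacity recalled in the talk ("I think a thousand")
GUARD = 0xAA         # marker for unrelated memory that follows the table


def make_heap(adjacent=16):
    # The first MAX_ENTRIES slots model the microcode table; the rest
    # models whatever data happens to sit after it on the heap.
    return [0] * MAX_ENTRIES + [GUARD] * adjacent


def rewrite_microcode_unchecked(heap, entries):
    # Pre-patch behavior as described: a tight loop driven by the
    # caller-supplied count, never compared against the capacity.
    for i, value in enumerate(entries):
        if i >= len(heap):   # only keeps this toy model from raising
            break
        heap[i] = value
    return 1


def rewrite_microcode_patched(heap, entries):
    # Post-patch behavior as described: a count above the capacity is
    # rejected and zero is returned without copying anything.
    if len(entries) > MAX_ENTRIES:
        return 0
    for i, value in enumerate(entries):
        heap[i] = value
    return 1


def corrupted_slots(heap):
    # Number of adjacent slots that no longer hold the guard value.
    return sum(1 for v in heap[MAX_ENTRIES:] if v != GUARD)
```

Calling the unchecked version with more than `MAX_ENTRIES` values overwrites the guard region, which is the linear overflow; the patched version leaves the heap untouched for the same input.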
Tavis also looked at the virtual file system, and by calling directly into these unsanitized functions to access the virtual file system, he was able to basically get a linear heap read and write primitive by creating a file with these, you know, strange sizes, and this sequence of calls could crash the engine with an out-of-bounds write. Now I
looked at the mitigations that Microsoft put in for the abuse of the apicall instruction. Primarily, Tavis himself was generating the apicall instruction on the fly from the malware's .text section. Then Microsoft added a check that says: is the call to the apicall instruction coming from a VDLL page? If it's not, it's going to deny the user the ability to invoke a native emulation function. This means that these apicall instructions can only be invoked from code pages that are associated with a given VDLL; they cannot be called from our binary. In fact, if you call them, it'll do MpSetAttribute, which will basically set a heuristic that you tried to call the apicall instruction from your .text section. This is really, really weird, and probably a strong indicator of malicious intent. And I
found that I could bypass this mitigation by simply finding the apicall stubs in memory in our VDLLs, which we can reverse engineer, and then I could just bounce off the apicall instruction and hit these interfaces with my own controlled arguments. Since this is not good, I did report this to Microsoft, and they told me this is not a trust boundary, kind of a classic Microsoft response to a lot of vulnerability disclosures: that's not quite a trust boundary, you know, unless you actually found an actual vulnerability, like an actual buffer overflow in there. The fact that there's this logical flaw, that I can hit internal debug interfaces and do things like stop emulation right then and there or change microcode in the emulator, that's evidently not a vulnerability according to Microsoft. So an example of
a bypass here, doing something pretty benign: we're going to hit OutputDebugStringA. So I found in kernel32 the offset of OutputDebugStringA, and I can resolve that address and then treat that as a function pointer and just bounce off the emulation, and when this runs we hit OutputDebugStringA. Now, more maliciously, we can hit NtControlChannel, that internal debug interface left in by developers to maybe debug or administer the engine, and we can set our own heuristics. For example, if we set virut_body_found, it'll trigger an immediate malware detection. So a
quick demo of that. So in this video you
can see we're calling OutputDebugStringA in the legitimate way, and then calling it with our OutputDebugStringA abuse through this unintended interface, kind of left there in the VDLL code page. Once we compile and run this binary, we'll also hit NtControlChannel as well, and we're going to use NtControlChannel to check the exact version number of the engine. This was done in the February
2018 build of the engine. So with our kind of ret-to-apicall technique, we run this binary and we'll see we hit OutputDebugStringA the normal way, then
through the apicall with kind of the bypass for Microsoft's mitigation, so we
have a controlled argument going in there, and we also show that we can hit NtControlChannel with a controlled argument as well.
Now, again, the implications of this
are that we can hit these internal debug interfaces with attacker-controlled arguments, which is probably not a good idea.
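The mitigation and the bypass described above come down to which address the check looks at. The following is a toy model under stated assumptions, not the engine's logic: the class and attribute names, the page addresses, and the attribute string are all hypothetical. It only illustrates that a check on where the apicall instruction lives says nothing about who jumped to it or which arguments were supplied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    start: int
    size: int
    is_vdll: bool

    def contains(self, addr):
        return self.start <= addr < self.start + self.size


class EmulatorModel:
    def __init__(self, pages):
        self.pages = pages
        self.attributes = set()   # heuristics recorded against the sample
        self.native_calls = []    # (function, argument) pairs that ran

    def _page_for(self, addr):
        for page in self.pages:
            if page.contains(addr):
                return page
        return None

    def apicall(self, instr_addr, func, arg):
        # Mitigation as described in the talk: the apicall instruction
        # itself must sit on a page that belongs to a VDLL.
        page = self._page_for(instr_addr)
        if page is None or not page.is_vdll:
            # Hypothetical attribute name standing in for MpSetAttribute.
            self.attributes.add("apicall_from_non_vdll_page")
            return False
        self.native_calls.append((func, arg))
        return True


# Hypothetical layout: one VDLL code page and the sample's own .text page.
VDLL_STUB = 0x7C800100
MALWARE_TEXT = 0x00401000
emu = EmulatorModel([
    Page(0x7C800000, 0x1000, True),
    Page(MALWARE_TEXT, 0x1000, False),
])

# apicall generated on the fly in the sample's own code: denied and flagged.
blocked = emu.apicall(MALWARE_TEXT + 0x10, "NtControlChannel", 0x12)

# Jumping to an apicall instruction that already exists in a VDLL stub:
# the instruction address passes the check, the argument is the caller's.
bounced = emu.apicall(VDLL_STUB, "OutputDebugStringA", "attacker string")
```

In this model the first call is refused and leaves a heuristic behind, while the second reaches the native function with a caller-chosen argument, which mirrors the behavior shown in the demo.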
Finally, I want to talk a bit about fuzzing. So I was able to then fuzz
emulated APIs, basically working out some more complex mechanisms to allow our channel to be a two-way I/O channel, not just an output channel. I took MWR Labs' OS X kernel fuzzer,
which generated random values to fuzz the OS X kernel, and I folded that in with my code. I'm generating random values each time, and then I push those into
the emulator, and I was able to do things like fuzz NtWriteFile and actually
reproduce Tavis's crash, but in a unique
way that got around the sanitization
that NtWriteFile normally does. I reproduced his crash in VFS_Write, but through NtWriteFile, without having to abuse the apicall instruction. You can see in this demo here we're going to do that: we're going to resolve the address of NtWriteFile and then fuzz that here with the params. This is a more complex interface that I have for passing information in and out of the emulator, and basically on the outside of the emulator we're generating fuzz input to give to the inside of it, and we're calling NtWriteFile with those fuzzed parameters and seeing what happens. So running this, you're going to see it just run for quite a while; it's just going to keep running. In my experience it took about seven minutes, running single-threaded at around 8,000 system calls per second, to reproduce Tavis's crash. Again, this is not a smart fuzzer, so there's no AFL, there's no code coverage information; it's just dumb random values thrown at Windows Defender in order to fuzz it.
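The "dumb" approach described above, random values with no coverage feedback, can be sketched as follows. Everything here is an assumption for illustration: `toy_write_file` is a hypothetical stand-in target rather than NtWriteFile or Defender's VFS code, `EmulatedCrash` models the engine faulting, and the value mix in `random_param` is a generic choice, not the generator from the MWR Labs fuzzer.

```python
import random


class EmulatedCrash(Exception):
    """Stands in for the engine faulting on an out-of-bounds write."""


FILE_SIZE = 4096   # hypothetical size of the virtual file's backing buffer


def toy_write_file(offset, length):
    # Hypothetical target. Obviously bad offsets and lengths are rejected,
    # but a write that starts inside the buffer and runs past its end
    # models the out-of-bounds write and raises.
    if offset < 0 or length < 0 or offset >= FILE_SIZE:
        return -1
    if offset + length > FILE_SIZE:
        raise EmulatedCrash(f"write of {length} bytes at offset {offset}")
    return length


def random_param(rng):
    # Mix a few boundary values in with uniformly random ones.
    interesting = [0, 1, FILE_SIZE - 1, FILE_SIZE, 0x7FFFFFFF, -1]
    if rng.random() < 0.25:
        return rng.choice(interesting)
    return rng.randrange(-16, 2 * FILE_SIZE)


def fuzz(target, iterations, seed=0):
    # No coverage feedback and no corpus: draw parameters, call the
    # target, and stop at the first crash.
    rng = random.Random(seed)
    for i in range(iterations):
        args = (random_param(rng), random_param(rng))
        try:
            target(*args)
        except EmulatedCrash:
            return i, args   # iteration index and the crashing input
    return None
```

Seeding the generator makes a crashing input reproducible, which matters when the only record of a crash is the parameters that triggered it.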
There's our demo, and moving into the conclusion: we covered tooling and
instrumentation, CPU emulation basics for x86 binaries, and a bit on vulnerability research and fuzzing for Windows Defender. We didn't cover a whole lot of other stuff, for example x64 emulation, ARM emulation, VMProtect emulation, and 16-bit emulation. There is a full DOS emulator aside from the Win32, modern Windows system emulator; there's 16-bit emulation built into Defender, a really interesting attack surface as well, and probably not as well looked at as the 32-bit one. We didn't look at the threading model, how you can do multithreading for binaries inside emulators; that's always a source of problems for AV emulators at large. We also weren't looking at analysis for .NET binaries; we were primarily looking at Windows PE binaries that are just compiled x86 code. Also inside mpengine we have unpackers,
parsers, a JavaScript engine, which you can see in my REcon Brussels talk, other
scanning engines, and a .NET engine. Now I
want to say that people love to talk about AVs and what they can and can't do, or what they may or may not be vulnerable to, but there's not a lot of ground truth about AVs in the public, and I think there should be more. I think they're a really fascinating target to analyze, and a lot of fun. I think this is much more interesting, to me at least, than looking at malware: actually seeing how malware gets caught and gets detected. You also learn a whole lot about, say, NT kernel internals and object managers and things like that; it gives you an impetus to look at all these different technologies. A lot of claims about AV vulnerabilities and how they may or may not be vulnerable are based on Tavis Ormandy's work and a bit on Joxean's work, but there's really not a whole lot out there. I really like this tweet from Joxean, where he said if you google antivirus internals, all you find is me, him, and then Tavis Ormandy. I would say, if you like this sort of work, definitely grab a copy of his book; it's an awesome book and really underappreciated by people, just some really incredible work that went into that.
We're releasing some code later; here's my GitHub. I'll also tweet about this, so you don't have to take a picture of the slide. I'll be sharing some of the harnesses that I built and an IDA disassembler for the apicall instruction. I'll also be publishing an article in PoC||GTFO issue 19 describing more of the technical details of some of these technologies. And that concludes the presentation. I'll have a whole lot more slides being released online after this; this is only about 50% of the material that I prepared for today. My JavaScript slides are there as well, at that bitly link. I want to thank all my friends, Tavis, Mark, Markus, Joxean, and then numerous friends who helped me edit this presentation and get here to DEF CON. Hit me up on Twitter if you have any questions; I have open DMs. Thanks very much. [Applause]