Windows Offender: Reverse Engineering Windows Defender's Antivirus Emulator
Formal Metadata
Number of Parts | 322
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/39670 (DOI)
Transcript: English (auto-generated)
00:00
Hi, so my name is Alexei Bulazel and I'm here to present my research on reverse engineering Windows Defender's antivirus emulator. A little about me before we get started: I am a security researcher at ForAllSecure. You may know the company from their victory at the Cyber Grand Challenge two years ago at DEF CON 24 with the Mayhem CRS. I also do firmware reverse engineering and cyber policy at River Loop Security.
00:23
And I'm a very proud alumnus of RPI and RPISEC. They're playing over in the CTF right now and I want to say good luck, guys. And this is my first time speaking at DEF CON, so it's great to be here. This work is my personal research and reflects my own views, not those of my employers or anyone else I've previously worked for.
00:41
Before I get started, I do want to say this presentation is a deeply technical look at reverse engineering Windows Defender's binary emulator. And as far as I know, it's the first conference talk to really look at reverse engineering the antivirus emulator for any AV product. It's not an evaluation of Windows Defender. I'm not going to tell you whether this is a good product you should use in your network or not. I'm not going to tell you whether it catches viruses effectively relative to other AVs
01:04
or anything like that. And also this talk does not address Windows Defender ATP or any other technology under the Windows Defender name. This is about Windows Defender Antivirus, the traditional endpoint AV product. So, as an outline of this talk, I'm going to go through an introduction, then talk about my tooling and process, how I did what I did, then reverse engineering and
01:24
the real meat of the presentation, a bit on vulnerability research, and then we'll conclude. So why look at Windows Defender Antivirus? This is Microsoft's built-in AV product that is installed by default on all Windows systems. On Windows 10, it runs by default, which means that over 50% of Windows 10 systems
01:41
have Windows Defender Antivirus running. The Defender name now seems to cover a variety of mitigations and security controls built into Microsoft's OS, so you have Control Flow Guard, EMET, ATP; all these different things now get lumped under Windows Defender Device Guard, Windows Defender Application Guard, Windows Defender Exploit Guard, and so forth. Again, here we're focused on Windows Defender Antivirus.
02:03
And it also runs unsandboxed as NT AUTHORITY\SYSTEM, meaning if you found a vulnerability inside Defender, that would give you initial RCE. If you could exploit that, it would also give you a privesc up to SYSTEM, and you'd be running inside an AV process. So the AV would be unlikely to catch you doing anything malicious, because it's not going to flag itself for, say, doing something malicious, writing a file, injecting into another
02:22
process, and so forth. It's also surprisingly easy for attackers to reach. I've not tried this myself, but friends of mine at Google Project Zero have told me that you could send an executable to someone who has a Gmail account open. And if they have that Gmail open in a background tab, the Chrome browser will cache the downloaded file that just hits the inbox, that'll hit a minifilter driver
02:43
on the Windows OS, and then the file that's written to disk will be passed off to Defender to be scanned. So you can actually reach this in a remote fashion, even though you would think this is a traditional host-based protection system. My motivation came from this tweet from Tavis Ormandy at Google Project Zero, who about a year ago found some vulnerabilities in Defender's JavaScript engine with Natalie
03:03
Silvanovich, also of Project Zero. I had a background in reverse engineering antivirus software; I did some work we called AVLeak with Jeremy Blackthorne, who's here in the audience, a couple of years ago, and presented that at Black Hat and WOOT. But I never actually analyzed Windows Defender, and I always wanted to, and I also had this interest
03:21
in JavaScript engines, so I took on Defender and looked at the JavaScript engine for about four months, then presented that work and moved on to reverse engineering the Windows emulator, which I'm here to talk about today. So our target is mpengine.dll. This is the main DLL that provides Windows Defender's scanning functionality.
03:40
It's a very large binary, about 12 megabytes, and again, this is not the part of Defender that's, say, doing hooking for system calls or filtering, you know, the disk writes. This is the main scanning engine: you take a buffer of data and you say this is malicious or it's not malicious, that's its purpose. And inside mpengine.dll are a variety of scanning engines.
04:00
I'm focusing today on the Windows binary emulator, which is one of many scanning engines. Before we go into my work on the Windows binary emulator, I just want to quickly recap what I did reverse engineering the JavaScript engine. This bit.ly link there will take you to that presentation, and this was presented at REcon Brussels in Brussels, Belgium back in February.
04:20
So Windows Defender has a JavaScript engine that's used for analysis of potentially malicious JavaScript code, and I reversed it from binary. I used a custom loader and shell for dynamic experimentation with help from Rolf Rolles, so thanks, Rolf. Throughout the JavaScript engine, I found AV instrumentation callbacks that inform the heuristic antivirus portion of Defender about actions that the potentially
04:43
malicious JavaScript is taking, which it uses to determine whether this is malicious JavaScript or not, say for example an exploit. And I also found that developers seem to prioritize security at the cost of performance. So the JavaScript engine is very pared down, stripped down, and doesn't have JITing or many other features and optimizations that make modern JavaScript engines fast.
05:02
On the other hand, I found it to be relatively secure and the attack surface to be relatively pared down. You'll see some common themes like that throughout this presentation today. As far as related and prior work goes, there's really only a handful of prior publications on reverse engineering antivirus software at all, let alone the emulators within them.
05:21
There is, of course, the work I mentioned, AVLeak, which I did with some collaborators at RPI, some of whom are here. There's also work from Joxean Koret touching on this. There's Tavis Ormandy's work at Google Project Zero. And there actually are some talks from the AV industry itself, such as Mihai Chiriac's talk from, I believe this was Hack.lu, I think 10 years ago,
05:44
as an AV industry developer, talking about how Bitdefender's emulator works. But really there's not been a lot of offensive work or work from people who don't work in the AV industry looking at these systems. I'd also mention that patents are a great source of open source intelligence about how AVs work. Chris Domas called that out in his presentation, looking at patents on x86 processors.
06:04
Similarly, you can find a lot of patents that describe undocumented functionality within AVs or how these particularly complex mechanisms work. All right, moving into a background on emulation itself. So, there's this traditional AV model, and I think a lot of people have this idea about how AVs may work, which is that they scan files and look for known malware signatures,
06:24
such as file hashes, sequences of bytes, or file traits. And they might have some heuristics about, say, imports, or they recognize a static MD5 hash, or they recognize a particular snippet of code that's known to be associated with a given malware family. But this is really an outdated model. Even 15 or 20 years ago this was outdated
06:43
because malware could evade these hard-coded signatures by creating novel binaries with, you know, packing and obfuscation. You heard a lot about polymorphic viruses back in the early 2000s. So, the solution that, again, 15 to 20 years ago the AV industry came up with was runtime dynamic analysis on the endpoint through emulation.
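To make the limitation of the traditional model concrete, here is a minimal Python sketch of a static signature scanner and a toy single-byte XOR "packer". The signature database, the sample bytes, and the function names are all hypothetical and chosen only for illustration; nothing here reflects how Defender actually stores or matches signatures.

```python
import hashlib

# Hypothetical signature database: one known-bad file hash and one byte pattern.
KNOWN_BAD_SHA256 = {hashlib.sha256(b"EVIL_PAYLOAD_V1").hexdigest()}
KNOWN_BAD_PATTERNS = [b"EVIL_PAYLOAD"]


def static_scan(data: bytes) -> bool:
    """Return True if the buffer matches a known hash or contains a known byte pattern."""
    if hashlib.sha256(data).hexdigest() in KNOWN_BAD_SHA256:
        return True
    return any(pattern in data for pattern in KNOWN_BAD_PATTERNS)


def xor_pack(data: bytes, key: int) -> bytes:
    """Toy 'packer': XOR every byte with a single-byte key (applying it twice unpacks)."""
    return bytes(b ^ key for b in data)


original = b"EVIL_PAYLOAD_V1"
packed = xor_pack(original, 0x5A)

print(static_scan(original))                # the plain sample matches a signature
print(static_scan(packed))                  # the packed sample matches nothing
print(static_scan(xor_pack(packed, 0x5A)))  # once unpacked, the signature matches again
```

The packed bytes only become recognizable after the unpacking step has run, which is the motivation for executing the sample in an emulator before looking for signatures.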
07:02
So, actually running these unknown binaries in a virtualized environment and looking for signatures there. This technology goes by a number of names. You might hear it called sandboxing, heuristic analysis, dynamic analysis, detonation, virtualization, and so forth. At the end of the day, it's all emulation, and that's what we're talking about today. So, an overview of emulators in general.
07:21
You begin by loading a potentially malicious unknown binary that you can't identify with less expensive analyses, such as hashing or heuristics based on imports. You're going to then run the binary in an emulated environment. So, you're going to have a CPU emulator for the particular architecture of the binary, generally x86. You're going to run that in this emulator,
07:41
and throughout running, you're going to collect these observations, and you'll terminate it at some point, based on, say, the length of time it has run, the number of instructions that have been executed, the number of API calls, the amount of memory the malware has used, and so forth. And throughout this, you're collecting heuristic observations about the malware's behavior that inform detections. You might also look for things like if the malware calls CreateFile
08:00
and writes a known malware signature with CreateFile: you'd hook that implementation, and when CreateFile is called, you would look for, say, a known malware signature or a known malware hash at that point. Moving into talking about tooling and process, how I did what I did. Reverse engineering-wise, I used pretty standard industry tools, like IDA and BinDiff for patch analysis.
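The general emulation loop described above (run the sample, stop at a resource limit, and inspect data at hooked API calls) can be sketched with a toy interpreter. This is only an illustration under stated assumptions: the five-opcode instruction set, the limits, the signature bytes, and the hook are all invented here, and a real emulator interprets x86 and emulates a far larger API surface.

```python
from dataclasses import dataclass, field

# Hypothetical limits and signature, chosen only for illustration.
MAX_INSTRUCTIONS = 1000
MAX_API_CALLS = 16
KNOWN_BAD_BYTES = b"EVIL_PAYLOAD"


@dataclass
class ScanResult:
    stop_reason: str = "finished"
    observations: list = field(default_factory=list)

    @property
    def malicious(self) -> bool:
        return bool(self.observations)


def hook_create_file(args, result):
    """Emulated CreateFile-style API: nothing touches the real disk.

    The hook inspects the data being written and records an observation on a match.
    """
    path, data = args
    if KNOWN_BAD_BYTES in data:
        result.observations.append(f"known signature written to {path}")


API_HOOKS = {"CreateFile": hook_create_file}


def emulate(program) -> ScanResult:
    """Run a toy program given as a list of (opcode, operand) tuples."""
    result = ScanResult()
    key = 0
    buffer = b""
    executed = api_calls = pc = 0
    while pc < len(program):
        if executed >= MAX_INSTRUCTIONS:
            result.stop_reason = "instruction limit"
            break
        op, operand = program[pc]
        executed += 1
        pc += 1
        if op == "load":      # load a data blob into the working buffer
            buffer = operand
        elif op == "setkey":  # set the single-byte decryption key
            key = operand
        elif op == "xor":     # decrypt the working buffer with the current key
            buffer = bytes(b ^ key for b in buffer)
        elif op == "jmp":     # unconditional jump to an instruction index
            pc = operand
        elif op == "call":    # API call, dispatched to a hook if one exists
            if api_calls >= MAX_API_CALLS:
                result.stop_reason = "API call limit"
                break
            api_calls += 1
            hook = API_HOOKS.get(operand)
            if hook:
                hook(("C:\\dropped.exe", buffer), result)
    return result


# A packed dropper: the payload only appears in memory after the xor step runs.
packed = bytes(b ^ 0x5A for b in b"EVIL_PAYLOAD_V1")
dropper = [("load", packed), ("setkey", 0x5A), ("xor", None), ("call", "CreateFile")]

# A sample that spins forever: the emulator gives up at the instruction limit.
spinner = [("jmp", 0)]

print(emulate(dropper))
print(emulate(spinner))
```

The dropper is flagged because the hook sees the decrypted buffer at the moment of the write, while the spinning sample ends with a stop reason and no observations, which is the trade-off such limits impose.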
08:23
So, as Google Project Zero was discovering some vulnerabilities, I was able to diff updates of the DLL and find what had changed, how Microsoft tried to mitigate vulnerabilities inside Defender. I found that, overall, there are about 30,000 functions across this massive 12-megabyte DLL, so this is enormous. Probably one of the largest binaries I've ever taken on reversing.
08:43
Obviously, people look at firmwares that are much larger, but this is really absolutely monolithic for a single Windows DLL. What does make this job a lot easier is that Microsoft publishes PDBs, and that's basically debug databases that have symbols and sometimes type information for the binaries.
09:00
Dynamic analysis-wise, AVs are generally harder to look at than traditional software, and dynamic analysis does require some work on the part of the user or the reverse engineer. In Defender's case, it's a protected process, meaning that even if you're SYSTEM or admin on your local system, you cannot attach to the process to debug it. Even if you have SeDebugPrivilege or anything like that,
09:20
you still can't attach. It's protected by the OS. The solution to this is to go into a kernel debugger and, for example, debug an entire VM and then attach to the process from the kernel, but that's very expensive and just annoying to do. So, introspection is also challenging. Even if you can, say, pause at a breakpoint, actually understanding what's going on in the emulator state
09:41
can be difficult with a debugger, even though you have a debugger running. Scanning on demand can be difficult to trigger. If you want to scan a binary, you might have to go into a GUI, click a couple of buttons, select something, choose it. You know, it's a pain to do that. You want an automated command line interface: just say scan this file, scan that file, scan the other file. And code reachability may be configuration
10:00
or heuristics dependent, meaning that local settings about, say, how aggressive the scanning is, or what time limits you allow the scanner to have, can all get in the way of effective scanning. The solution is to build a custom loader for these AV binaries. And it was nice that I was able to start with some work that Tavis Ormandy did on building his own custom harness
10:20
for Defender, which I then extended extensively. So first off, I'm going to talk a little bit about Tavis' existing work, which he called loadlibrary. So Tavis built a PE loader for Linux. So this is able to take a Windows DLL on Linux and load it up and then actually run it. This is not a full replacement for something like Wine or any other Windows emulation.
10:41
This is just enough to get Windows Defender itself running, shimming out the system calls that Defender would make on Windows to Linux implementations. So talking through how Tavis' tool works, and the link here will take you to the GitHub project, we begin with a Linux binary, just a standard userland binary, and it's going to load and resolve imports
11:02
for MPEngine.dll. So this is just the process of taking the DLL, relocating it in memory, doing the standard DLL loading process, and putting it in a read-write-execute memory buffer there on Linux. Then for the IAT, the import address table, you're going to go through and shim out the implementations of various Windows APIs with Linux replacements. So for example
11:21
CreateFile is replaced by a call to open or fopen, and, say, WriteFile is replaced by a call to fwrite. Inside this engine you have an emulator, and for now just remember that there's a table called g_syscalls, which is a table of function pointers to various emulations of Windows API functions.
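The import shimming step can be pictured with a small Python model. This is a sketch under stated assumptions, not loadlibrary's actual code: `build_iat`, `make_stub`, `host_write`, and the shim table are hypothetical names, and the model simply maps each imported name to a host-side callable, falling back to a stub that logs the call and returns 0.

```python
import io

def host_write(stream, data):
    # Host-side replacement for a WriteFile-style import (hypothetical shim).
    return stream.write(data)

def make_stub(dll, name, log):
    # Fallback for imports nobody has shimmed: record the call and return 0.
    def stub(*args):
        log.append(f"{dll}!{name}")
        return 0
    return stub

def build_iat(imports, shims, log):
    """Resolve every (dll, name) import to a shim if one exists, else a stub."""
    iat = {}
    for dll, name in imports:
        iat[(dll, name)] = shims.get((dll, name)) or make_stub(dll, name, log)
    return iat

SHIMS = {("kernel32.dll", "WriteFile"): host_write}

log = []
iat = build_iat(
    [("kernel32.dll", "WriteFile"), ("kernel32.dll", "GetTickCount")],
    SHIMS,
    log,
)

buf = io.BytesIO()
written = iat[("kernel32.dll", "WriteFile")](buf, b"hi")  # routed to host_write
ticks = iat[("kernel32.dll", "GetTickCount")]()           # routed to the stub
print(written, ticks, log)
```

A real loader patches pointers inside the mapped image rather than filling a dictionary, but the lookup-then-replace idea is the same.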
11:40
And on the outside we have our malware binary; here we have the standard MZ header on the binary. We're going to call a function exported by Defender called __rsignal, and this is the main entry point to Defender. We give it a buffer of data and it's going to come back with a malware classification. We then go through a process of selecting a scanning engine, so Defender may do some
12:01
initial analyses with things like static hashes. If those fail and it can't determine whether this is a malicious binary or not, they're ultimately going to route it into the emulator. The emulator will run, make its determination whether this is a malicious binary or not, and then come back with a virus identification, or it might say this is just benign. So a quick demo, I'm going to show you
12:21
scanning with mpclient, this is Tavis Ormandy's unmodified harness for Windows Defender. So here we're scanning the EICAR test file, this is an industry standard test file for any AV.
12:42
And we see we scan the file and it comes back and says we found eicar.com, so that's kind of a demo: we're actually taking this Windows code, running it here on Linux, and seeing what happens when we scan a binary. In addition to using this harness from Tavis, I did some dynamic analysis with customized code coverage tools developed by
13:01
Markus Gaasedelen of Ret2 Systems, a fellow RPISEC alumnus as well. And Markus made a tool called Lighthouse that lets you scan a binary or run a binary under DynamoRIO or Pin, collect coverage information, and then visualize that in IDA Pro. So you can see here in this control flow graph the blue basic blocks are those that have been hit during a given scan. And I found this to be an extremely powerful and useful
13:22
tool when I was doing my reverse engineering. I did find it interesting to see Halvar Flake, just about a month or two ago, give a keynote at SSTIC where he was talking about challenges of introspectability with malware, or binaries, and how it can be very difficult to introspect and analyze and debug binaries, and how ultimately that's a hindrance to security.
13:41
And Halvar explicitly called out the challenges of analyzing Windows Defender as one example of this, where because Defender is in a protected process on Windows you can't analyze it under a tool like Pin or DynamoRIO. Of course, we're running on Linux, so we sidestep the whole issue of the protected process, and we can actually run and visualize coverage. Okay, now moving into
14:00
the meat of the presentation, talking about reverse engineering the emulator itself. First off, I'm going to talk about startup of the engine, then we're going to move into CPU emulation, instrumentation, and then the Windows environment and emulation. So first off, the first thing that has to happen when we want to emulate a given binary is we have to load it in and initialize the emulator and get everything started up.
14:21
So we're going to call the __rsignal function, which provides this entry point to Defender's scanning, and we give it this buffer of data to be scanned and classified, and it will return the malware classification. So these results are actually going to be cached as well. There's lots of stuff going on in the backend that we don't really care about; we ultimately care about just going into the emulator itself. So the emulator has to be initialized:
14:42
we have to allocate memory for execution, and we have to initialize various C++ objects that are involved in the emulation process itself and various subsystems within Defender. For example, for the object manager we have to create an object manager instance, we have to set up the virtual file system, and so forth. We're going to load the binary that's to be analyzed, resolve its imports
15:00
and things like that, and then initialize virtual DLLs in this emulated process memory space. These are akin to the real DLLs in our real Windows system that provide Windows API functionality. Throughout this process Defender is collecting heuristic observations about the binary, and you can see these on the right side here, for example things like PEA Suspicious Section Size,
15:22
so these might inform some heuristic classifications in Defender: because there's a suspicious section size, maybe this is malware. We'll also be doing things like, in the bottom right, you can see some MinWin resolution, resolving api-ms, some of the API set DLLs. And here in the bottom left I have this example of when we're setting up
15:41
a name for the binary to be emulated. You can see that if the binary is a Windows executable it'll be called myapp.exe. This is something you could use to write evasive malware that says, if my name is myapp.exe I won't run, I know that I'm running inside Defender. And indeed if you Google this string you will find malware binaries online that explicitly look for the name myapp.exe and choose not to run
16:01
if they see it. After startup and initialization we're going to move into talking about CPU emulation. So technically what Defender does is not so much emulation as it is dynamic translation. This is akin to what QEMU, the Quick Emulator, does, which is basically taking assembly code of a given language, lifting it up into an IL, or an intermediate
16:21
representation, and then taking that IL and dumping it out with a JIT engine into executable code. So Defender supports a number of architectures, you can see here in the enum on the right, ranging from x86 of three different flavors to ARM and even VMProtect, so they can take VMProtect opcodes, lift those into IL, and dump them out into
16:42
sanitized x86 to be run and analyzed, as well as ARM. Now this subsystem is incredibly complicated and not really a primary focus of my research, but I'll give you a brief overview of it in the next few slides. We begin with the architecture-to-IL lifting process, which are these giant
17:00
functions that are architecture_to_IL. You can see an example from the x86-to-IL translator: just an absolutely massive, ugly switch case, thousands of switch cases. It gets super slow when you load this in. And basically what they're doing here is grabbing a byte of opcode from an x86 instruction, looking at that,
17:20
determining what it is, and then emitting the corresponding Windows Defender engine representation for that binary operation. You can see an example here in the bottom right, where all push instructions lift to 13 in the Windows Defender IL. Also, after we lift to IL, there is an IL emulator
17:41
that runs in software, so we can actually run binaries in software. I never observed this being run during my research; I did some code coverage analysis and never saw this being hit. My intuition is that this is so that we can support analysis of x86 binaries on non-x86 hosts. So for example if you're running Windows Defender on Windows for ARM
18:01
you don't have to have an IL-to-ARM JIT engine, you can just run it in software. Now as far as the IL-to-x86 JIT translation, we're taking IL code and then translating a basic block at a time, similar to the way QEMU does things, and I did observe this JIT being used during my research. Defender will actually handle unique
18:22
instructions that it can't handle with the JIT through software-bound emulation, so if it can't JIT an instruction out it will actually generate a call directly into a function that does that, and we're going to show that in the next slide. But you can see here, circled in red on the left, the opcodes being constructed: they're actually constructing a mov of an immediate and then a call to that immediate, calling directly into a function handling
18:41
a particularly unique architectural instruction or event. Over here on the right you can see the LEA opcode actually being emitted. The opcode in x86 is 8D, so as you're dumping out from the LEA IL instruction down to x86 you do 8D and then you XOR that with a register
19:00
to register and value to create a valid x86 instruction. Microsoft actually documented this in 2005 at Virus Bulletin with a paper called Defeating Polymorphism: Beyond Emulation, and it's definitely worth checking out. It's really remarkable that Microsoft was experimenting with this technology almost 15 years ago. ILs are so hot right now, everyone's playing with ILs for things like
19:22
Binary Ninja or various program analyses, but Microsoft was doing this on the endpoint, you know, on your computer, your grandma's computer, everyone's computer. 15 years ago they were lifting up to ILs, JITing them out, doing analyses on them. It's very impressive, I found. So then we have these architecture-specific escape handlers for these unique architectural events that we can't
19:41
emulate with the JIT engine. You can look at this offline and see an exact listing of some of these enums. An example of one of these functions would be this software-bound emulation of the x86 CPUID instruction, so this is an instruction that provides unique information about a given x86 CPU, and here it's emulated in software
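Emulating an instruction like CPUID in software comes down to comparing the requested leaf and then setting register state. Here is a minimal Python sketch of that idea. It is not Defender's handler: the leaf table, the `emulate_cpuid` name, the choice of vendor string, the maximum-leaf value of 1, and the all-zero answer for unknown leaves are assumptions made for illustration. The only hard fact used is the architectural layout of leaf 0, which returns the 12-byte vendor string in EBX, EDX, ECX order.

```python
def vendor_registers(vendor: bytes):
    """Split a 12-byte vendor string into EBX, EDX, ECX (leaf 0 layout)."""
    if len(vendor) != 12:
        raise ValueError("vendor string must be exactly 12 bytes")
    ebx = int.from_bytes(vendor[0:4], "little")
    edx = int.from_bytes(vendor[4:8], "little")
    ecx = int.from_bytes(vendor[8:12], "little")
    return ebx, edx, ecx

def leaf_0():
    ebx, edx, ecx = vendor_registers(b"GenuineIntel")
    # EAX = highest supported basic leaf; 1 is an arbitrary illustrative value.
    return {"eax": 1, "ebx": ebx, "ecx": ecx, "edx": edx}

# Hypothetical leaf table: requested leaf -> function producing register values.
CPUID_LEAVES = {0: leaf_0}

def emulate_cpuid(regs):
    """Software handler: compare the requested leaf, then set register state."""
    handler = CPUID_LEAVES.get(regs["eax"])
    if handler is None:
        # Assumption for this sketch: unknown leaves read back as zero.
        regs.update({"eax": 0, "ebx": 0, "ecx": 0, "edx": 0})
    else:
        regs.update(handler())
    return regs

regs = emulate_cpuid({"eax": 0, "ebx": 0, "ecx": 0, "edx": 0})
print({name: hex(value) for name, value in regs.items()})
```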
20:01
so I've shown here, I wrote a malware binary that does CPUID with this argument, hex 80001, and when we run this binary inside Defender's analysis engine we'll get this code coverage, and we'll actually see that it will bounce off the block where that same immediate is compared, and then we go down the true branch because the immediate that our code was using
20:22
matches up with the immediate here in software, and then they can emulate CPUID by setting register state accordingly. All right, moving into talking about my instrumentation, which is a big enabler for the rest of my research. So the problem with analyzing Windows Defender, again, as I said, is that there's very little introspection. It's very difficult to tell what's going on inside of it; all you really get out of it is
20:41
virus identification. Now you could exploit virus identification as sort of a side channel to extract information about the inside of the engine, and indeed that's what I did with the AVLeak project a couple of years ago: exploiting malware identification as a side channel to get information about what's going on inside various AV emulators. But this is really slow and inefficient, so a smarter technique is
21:02
to go in and sort of give us a malware's-eye view of what's going on inside the engine. So MPEngine.dll has various functions that are invoked when various Windows APIs are called by malware running inside of it, and we can then hook those emulation functions and provide our own implementations, so we can create a one- or two-way IO path to share information
21:22
with the outside and also in turn inform the malware binary inside about what actions we want it to take. So let me give you a diagram of that. This is the original loadlibrary diagram I showed you, this is Tavis Ormandy's tool, kind of in an unmodified state, this is how it works. And I went in and I hooked the g_syscalls table, this is the table of
21:42
about 120 functions providing emulations for various Windows APIs. I hooked it and replaced those implementations with my own implementations of various common functions like OutputDebugStringA or WinExec, so when these functions are now called by our malware binary inside the engine, our functions are invoked instead. So here's an example of our OutputDebugStringA hook and the
22:02
process we have to take on, which is resolving the relative offsets of these functions and then setting hooks in the read-write-execute DLL buffer in our Linux process. So what this looks like is this: here in the top right we have our IDA Pro disassembly, or decompilation rather, of Windows Defender's emulation of OutputDebugStringA, basically a no-op:
22:22
all it does is retrieve a single parameter off the virtual stack and then bump the tick count, so it bumps the time a little bit in the emulator. Here in the center of the screen I have my re-implementation of this function, so we're going to walk through this step by step. First off we have our declaration, so this takes a void pointer. pe_vars_t is a mass
22:40
of about half megabyte large structure passed to all these Windows API emulations. We don't need to know an exact definition of that structure, so we just take a void pointer and say we're not going to worry about it, it's just a pointer. Then we have this local thing to hold parameters to the function, so the function has parameters passed to it in the virtualized emulated environment
23:00
and we want to interact with those, so we have to make some space for them. We're going to use a function internal to Defender to pull off one parameter from the virtual stack, so we're going in, you know, looking at the virtual ESP and EBP state in this virtual memory space and then pulling off the 4-byte value that was there. I'm actually calling back into Defender from my hook function to do that.
23:22
Then I'm calling a function, get string, that's going to translate a virtual address inside the emulator to a real address that we can interact with locally, and now we can just print that string to standard out. So this sounds like a lot, but let me show you a quick demo of it in action. So here I have a malware binary that's going to say hello DEFCON when we run it; it just calls
23:42
OutputDebugString on hello DEFCON. We're now going to scan that binary inside my hooked and modified version of Tavis' loadlibrary tool, and you'll see here it says hello DEFCON. Now going back to Visual Studio, we're going to add a new line, this is a live demo. Of course this is a pre-recorded video, because the DEFCON organizers this year wanted us to do pre-recorded videos, but I was doing this live.
24:02
I just rebuilt the binary, and here, scanning it again, it's now going to say hello DEFCON and then also this is live demo. So this is what's happening inside the emulator: our malware binary is calling this function, and because we've hooked the implementation of the OutputDebugString emulation in Defender, our function is being called instead.
24:22
We're going to run it one more time, I believe with some more information. You can see here we have a richer debug output, and we can see things like the exact addresses passed to it from the virtual memory space. So this is a big enabler for the rest of my research, the fact that I had this sort of window into what's going on inside the emulator: I can have my malware binary inside take observations and then post them out to the outside world.
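The hooking walk-through above (swap an entry in the syscall table, pull one parameter off the virtual stack, translate the virtual address, print the string) can be modeled end to end in Python. This is a toy under stated assumptions, not the real hook, which is native code patched into the loaded DLL: `ToyEmulator`, its flat memory layout, the base address, and the assumed stack layout (return address at ESP, first argument at ESP+4) are all hypothetical.

```python
import struct

def native_output_debug_string(emu):
    # Mirrors the behavior described in the talk: read one parameter,
    # bump the tick count, and otherwise do nothing.
    emu.get_param(0)
    emu.tick_count += 1

captured = []

def hooked_output_debug_string(emu):
    # Replacement handler: translate the virtual pointer and post the text out.
    text = emu.get_string(emu.get_param(0))
    captured.append(text)
    print(text)

class ToyEmulator:
    BASE = 0x400000  # hypothetical virtual base address

    def __init__(self):
        self.memory = bytearray(0x1000)
        self.esp = self.BASE + 0x800
        self.tick_count = 0
        self.syscalls = {"OutputDebugStringA": native_output_debug_string}

    def write(self, vaddr, data):
        off = vaddr - self.BASE
        self.memory[off:off + len(data)] = data

    def read_u32(self, vaddr):
        return struct.unpack_from("<I", self.memory, vaddr - self.BASE)[0]

    def get_param(self, index):
        # Assumed layout: return address at ESP, arguments just above it.
        return self.read_u32(self.esp + 4 * (index + 1))

    def get_string(self, vaddr):
        off = vaddr - self.BASE
        end = self.memory.index(0, off)  # find the NUL terminator
        return self.memory[off:end].decode("ascii")

    def call(self, name):
        return self.syscalls[name](self)

emu = ToyEmulator()
string_addr = ToyEmulator.BASE + 0x100
emu.write(string_addr, b"hello DEFCON\x00")
emu.write(emu.esp + 4, struct.pack("<I", string_addr))  # push the argument

emu.syscalls["OutputDebugStringA"] = hooked_output_debug_string  # set the hook
emu.call("OutputDebugStringA")
```

Because the hook replaces the table entry outright, the original no-op never runs and the tick count stays untouched in this model.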
24:43
As far as my malware binary goes, call it myapp.exe, again that's the name of all binaries running inside Defender's engine. It does this IO communication with OutputDebugStringA and some other functions. On the right side you'll see a list of factors that I found could impede emulation and the ways I get around them. So I had to really massage the linker, optimizations,
25:02
imports in order to get binaries that were consistently emulated by Defender, and I'll be releasing some code at the end of this talk that will have a very simple Visual Studio project that I found I was able to get consistently emulated when scanned with loadlibrary. Finally, as far as the reverse engineering goes, moving into
25:20
the Windows emulation and the Windows environment, I think the most interesting part of this presentation. I'm going to start off by talking about the user mode environment, so this is the emulation of a fake Windows user mode. So in Windows Defender there is a virtual file system; as any real system would have a file system and files that malware might look at, Defender
25:40
virtualizes one. There are about fifteen hundred files on the virtual file system, and you'll see a variety of things in there. Mostly it's fake executables that are there for malware binaries to, for example, infect, or, you know, do different things to that could be indicators that they are in fact malicious binaries. So I'll do a quick demo of dumping the file system.
26:00
Again using that mechanism that I showed you of posting data out with OutputDebugStringA, we're able to enumerate the entire file system and dump it in just a few seconds. I did here actually use a slightly more sophisticated hook, where I was doing WinExec, and I'll show some examples in my backup slides; it's not as simple as just OutputDebugStringA-ing them. But you can see here in just a second or two we dump
26:21
the entire virtual file system from inside Windows Defender. We had a malware binary go inside there, enumerate all the files that it could see, and then dump them out. And after we dump them out we see that there's about fifteen hundred of them in this virtual file system, and you'll see things like this: the word GOAT repeated thousands of times over in a file called AAA TouchMeNot.exe.
26:42
My intuition is that this binary is right there on the C drive, and it's there so that a malware binary might read that file in and, say, send it over the network or encrypt it or do something that's an indicator that we are indeed malware. So maybe if you touch it, that might be an indicator that you're malicious. The reason it has the GOAT
27:00
GOAT, the word GOAT pasted thousands of times over, presumably, is that it is a goat file. That's sort of an AV industry term for a sacrificial file, like a sacrificial goat, that you can let get infected or changed or encrypted by malware in order to have the malware kind of show its true intent. So that was an interesting artifact. Again, this is also something that you could write malware around that says, if I see the word GOAT thousands of times over in a file
27:22
called AAA TouchMeNot, I know I'm running inside Defender, therefore I'm not going to run, I'm not going to do anything malicious. We'll see fake config files; you can see that these are very clearly written by a real human, with comments like blah blah and generic SQL queries. We have a virtual registry that has thousands of entries, and enumerating the whole registry and dumping that out we'll see things like
27:42
this. So for example there's a registry entry for World of Warcraft; presumably there's malware that maybe looks for the World of Warcraft registry entry and touches it, so if we saw a call to, say, RegOpenKey on World of Warcraft, that might be an indicator of potential malicious intent. We'll see various other fake processes running on the system,
28:02
and these are not real processes; it's just that when you call, you know, the callback function to enumerate all processes, it'll give you this fake listing, and highlighted at the bottom in yellow there is our process, myapp.exe. Quick demo of that, dumping the process listing, again using this same mechanism that I developed.
28:28
So there you can see, in real time, it just took less than a second, we dumped the entire process listing. All right, back to the presentation. In addition to this environment we have Windows user mode code
28:40
that runs to provide emulations of various Windows API functions, and there are generally two types of Windows API emulations, akin to those Windows API functions in the real Windows system: there are those that stay in user mode, which are ones that stay in the emulator, and those that resolve into a syscall, just like a trap to a native emulation here in Defender. Symbols indicate
29:02
that these emulated virtual DLLs that are in the emulator environment are called VDLLs, and because they are simply DLLs, once we have a file system dump we can just go reverse those DLLs by throwing them in IDA, and they're standard Windows PE files. When we look at them, they're definitely not the real things like the kernel32 that you would see in a real system,
29:22
so we'll see things like this: in kernel32, if we call GetUserName it will return a hard-coded string of John Doe. This is again something we could use to create evasive malware that says, if I see the username John Doe I'm not going to run. We'll see a computer named HAL9000, ostensibly an Arthur C. Clarke Space Odyssey
29:42
2001 reference, so again you could write malware that looks for HAL9000 to know you're running inside Defender. We'll also see very simple implementations of functions like RtlGetCurrentPeb: all that function needs to do is go grab a memory segment at FS:18, and they actually support memory segmentation at the architectural level, so they can just do that
30:01
actual instruction inside the emulator. Or we'll see complex functions like RtlSetSaclSecurityDescriptor just knocked out, they just return 0, and more functions just stubbed out: 0, negative 1, and so forth, or they're just triggering an interrupt. So lots of complex functions are not fully emulated by Defender. We'll also see things like this,
30:21
again more unique strings and identifiers that let us know we're running inside Defender, like these German IP addresses and references to German websites; maybe a German programmer developed this particular DLL emulation. So that covers some of the user mode code and the very simple emulations, those that just return hard-coded names like John Doe or HAL9000.
30:41
How about the user-kernel privilege boundary, and how do we get into more complex emulations, such as those requiring access to a virtual file system? These functions are implemented with a hypercall-like instruction called apicall. This is of course not a real x86 instruction; it has the opcode 0F FF F0 and then a 4-byte
31:01
immediate describing the particular function to be invoked. When this instruction is hit in the virtual CPU it's going to generate a call into a native MPEngine.dll function that provides emulation of these unique functions. So these are complex functions that modify system state or may require particularly complex handling, and so
31:20
in CopyFileWorker we have an apicall to kernel32 CopyFileWorker: the virtual CPU sees that instruction, generates a call directly into this emulation of that function, and then it's emulated there in software in MPEngine.dll. This is great attack surface: if you found any vulnerabilities in these native emulation functions you could use these to break out of the
31:40
emulator and infect the native host. This disassembly here is provided by an IDA processor module, and I'll have an article coming out in PoC||GTFO issue 19 describing exactly how this IDA processor extension module works. So once we have these apicall instructions running, they're going to trigger a call to a function that looks at the g_syscalls
32:00
table, which is a big table of these function pointers and these hashes. It's going to look for the 4-byte immediate that was given in the apicall instruction and then dispatch to the appropriate function that matches up with it. So, kind of a workflow of what this looks like inside the emulator: here we have kernel32 OutputDebugStringA. It's going to do things like log the number of times it was called, so if it's called more than 900
32:22
times that might trigger some unique behavior, but ultimately it's going to resolve down into this function, apicall kernel32 OutputDebugStringA, which is then going to use the apicall instruction, you can see the 0F FF F0 BB 14 80 B2. It's going to see that instruction, and then the hypercall is going to step in and basically transition us into native emulation, out of
32:42
this managed dynamic translation context, and we're going to hit the native emulation for OutputDebugStringA. Of course this is what we hooked when we had our own OutputDebugStringA implementation that I was using to post information out of the emulator. Enumerating the emulated functions that have native emulations, these are them. The yellow functions are those that are not found
33:02
on real Windows systems, so they're specific to Defender, for example for debug functionality or unique backdoor management. Here's more of them, including a number of VFS functions, which are for low-level access to the virtual file system. So all these native emulation functions take a pe_vars_t, a very large, half-megabyte structure containing everything about
33:21
a given emulation context, and then we have templated parameter functions that are used to retrieve parameters to the function from the emulated stack, and then programmatic APIs for manipulating return values, register state, the CPU tick count or time. All that sort of stuff can be programmatically managed through manipulations of the pe_vars_t structure.
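The apicall path described earlier, a three-byte opcode followed by a 4-byte immediate that selects a native handler taking the emulation context, can be sketched as a decoder plus a lookup table. This is an illustrative model only: the dictionary standing in for g_syscalls, the `ctx` dictionary standing in for the large context structure, and the function names are hypothetical. The opcode bytes and the example immediate are the ones read out in the talk (0F FF F0 BB 14 80 B2).

```python
import struct

APICALL_PREFIX = bytes([0x0F, 0xFF, 0xF0])
APICALL_LENGTH = 7  # 3 opcode bytes + 4-byte immediate

def decode_apicall(code, offset=0):
    """Return the 32-bit immediate if an apicall starts at offset, else None."""
    if len(code) - offset < APICALL_LENGTH:
        return None
    if code[offset:offset + 3] != APICALL_PREFIX:
        return None
    return struct.unpack_from("<I", code, offset + 3)[0]

def native_output_debug_string_a(ctx):
    # Stand-in for a native emulation routine that receives the context.
    ctx["log"].append("kernel32!OutputDebugStringA")
    return 0

# Hypothetical model of the dispatch table: immediate -> native handler.
G_SYSCALLS = {0xB28014BB: native_output_debug_string_a}

def step(code, offset, ctx):
    """Execute one apicall at offset and return the next offset, or None."""
    imm = decode_apicall(code, offset)
    if imm is None:
        return None
    handler = G_SYSCALLS.get(imm)
    if handler is None:
        raise LookupError(f"no native emulation for {imm:#010x}")
    handler(ctx)
    return offset + APICALL_LENGTH

ctx = {"log": []}
code = bytes.fromhex("0ffff0bb1480b2")
next_offset = step(code, 0, ctx)
print(hex(decode_apicall(code)), next_offset, ctx["log"])
```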
33:41
Virtual memory can be interacted with through an API similar to that found in many emulation engines, such as Unicorn Engine, where we can memory map virtual memory into our real memory space and manipulate it there. And there are wrapper functions for common operations like reading a single byte, writing a single DWORD, reading or writing wide strings or regular char stars. These all
34:02
have kind of these utility functions wrapped around them to make them easier for developers. Moving into kernel internals: so we've talked about the user mode code, we've talked about how the user mode code gets into kernel mode, or the native emulations. Let's look at how those native emulations are themselves implemented. So the Windows kernel provides a number of facilities
34:22
to any binary. This is ntoskrnl.exe and associated drivers, and these are really the core of the Windows OS, or the NT kernel. These include examples like the object manager, process management, file system access, the registry through registry hives, and synchronization primitives for IPC.
34:40
First off we're going to talk about the object manager. This is an essential part of the Windows executive that provides management for handles, so anytime you are opening a file, a socket, and so forth, it's going to go through the object manager. And Defender supports five types of objects with its object manager, so these are file, thread, event, mutant,
35:01
which is a synonym for mutex, and semaphore. And these are stored in the big object manager map here in mpengine.dll. They're stored in memory as C++ objects, and they all inherit from a common parent class, object manager object. We then have subclasses like file object or mutant object, and you can see I've made the font a little larger for the unique
35:21
traits of those particular C++ objects, such as the m file handle thing in the file object or the wait count variable for a mutex, if various processes can wait on a given mutex. C++ RTTI is used to cast between these subclasses and their parent class when they're retrieved,
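The object manager layout just described, a handle map of objects sharing one parent class, with a checked cast on retrieval, can be approximated in Python, where `isinstance` plays the role that RTTI-based casting plays in C++. The class and method names below are hypothetical stand-ins chosen for readability, not the symbols from the binary, and the handle numbering is an arbitrary choice for the sketch.

```python
class ObjectManagerObject:
    """Common parent for every object tracked by the toy object manager."""

class FileObject(ObjectManagerObject):
    def __init__(self, path):
        self.path = path

class MutantObject(ObjectManagerObject):
    def __init__(self):
        self.wait_count = 0

class ObjectManager:
    def __init__(self):
        self._objects = {}
        self._next_handle = 4  # arbitrary starting handle for this sketch

    def insert(self, obj):
        handle = self._next_handle
        self._objects[handle] = obj
        self._next_handle += 4
        return handle

    def get_file_object(self, handle):
        # Checked downcast: succeed only if the handle maps to a FileObject.
        obj = self._objects.get(handle)
        return obj if isinstance(obj, FileObject) else None

manager = ObjectManager()
file_handle = manager.insert(FileObject("C:\\example.txt"))
mutant_handle = manager.insert(MutantObject())

print(manager.get_file_object(file_handle).path)
print(manager.get_file_object(mutant_handle))  # wrong type, so the lookup fails
```

Failing closed on a type mismatch is the design point: a handle of the wrong kind is treated the same as a handle that does not exist.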
35:42
and the object manager can be interacted with programmatically by these various functions. So if we open a mutant they're going to grab that object and then mess with it. If we open a file object, it's actually called object manager get file object, which will first check the type and then explicitly use RTTI to cast to a file object
36:01
and fail if the retrieved handle is not indeed a file handle. We'll also see things like the pseudo handle for the current process being emulated as hex 1234, again a trait of the emulator we could use to write evasive malware, based on seeing that our own handle is 1234. We have a virtual file system that provides emulation and access to
36:21
a file system, and this is accessed through the standard ntdll NtWriteFile, NtCreateFile, and so forth APIs, as well as these lower-level VFS functions, which provide sort of a backdoor, unsanitized access to the file system emulation. Finally, moving into talking about AV instrumentation, so all the heuristics
36:41
and analyses the AV is doing throughout the run time. So there are some internal functions that are exposed through the hypercall apicall interface, and I've summarized them here, and we're going to look at a few of these. First off, MpReportEvent, which is used to communicate information about malware binary actions to Defender's heuristic detection engine. So these are in some of these user mode emulations, such as
37:02
GetUserName or GetComputerName. Those don't require trapping into a full native emulation, and that would increase the attack surface greatly if they all did, but we do want to inform Defender that the given function was called. So if GetSystemDirectory is called it'll report event 1233 one, or if you create a process and you do it suspended it'll do hex
37:21
3018, but it'll say create suspended, specifically noting that a process like that was created. MpReportEvent can be called in more cases, you can see here just more examples. This is called thousands of times throughout these VDLLs, and a more concrete example of how this might play into AV identification of potentially malicious binaries
37:42
is here where we see that if we call TerminateProcess on a PID in the 700 range which you'll note that all these various AV processes are in the 700 range it'll trigger a call to MpReportEvent 12349 but it'll also say AV so if you try to call TerminateProcess on an AV that's probably a good indicator you're
38:01
NtControlChannel is sort of a backdoor interface for administering the engine this is something Tavis Ormandy hit and I went here and reverse engineered the 32 switch case options of this function and showed you what they all do so these do things like rewrite microcode, manipulate register state all sorts of stuff great attack surface and definitely something that shouldn't be open
38:21
to malware binaries running inside the emulator we're going to conclude by talking about vulnerability research we'll start off by trying to understand some prior vulnerabilities discovered by Tavis Ormandy at Google Project Zero so Tavis discovered this API call instruction that I talked about and he was able to call directly into native emulations of functions
38:42
rather than passing through their API call stubs by just generating the API call instructions on the fly as you can see here and then Tavis was hitting internal debug functions like NtControlChannel which when you give it option hex 12 goes to rewrite microcode and this code here lets the user specify the count in a tight
39:01
loop and with the user specified count we only have I think 1000 elements allocated for the new microcode information but the user can give say 2000 and we have a linear buffer overflow Microsoft patched this by adding a check that the count is no greater than 1000 and if it is it returns zero and doesn't run Tavis also looked at the virtual file system
39:21
and by calling directly into these unsanitized functions to access the virtual file system was able to basically get a linear heap read and write primitive by creating a file with these strange sizes and this sequence of calls could crash the engine with an out of bounds write
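The rewrite microcode overflow and its fix, as described a moment ago, can be sketched in a few lines. This is only a toy model in Python, not the engine's code: the function names, the success return value, and the 16 slots of "adjacent" data are assumptions made for illustration, and the table size of 1000 is the figure recalled in the talk.

```python
MAX_ENTRIES = 1000  # table size as recalled in the talk ("I think 1000"); an assumption
ADJACENT = "adjacent-heap-data"  # hypothetical marker for neighbouring heap contents


def make_heap():
    # Slots 0..999 model the microcode table; the 16 slots after it model
    # whatever happens to sit next to the table on the heap.
    return [0] * MAX_ENTRIES + [ADJACENT] * 16


def rewrite_microcode_unpatched(heap, entries, count):
    # The caller-supplied count is trusted, so the copy runs past the table.
    for i in range(count):
        heap[i] = entries[i]
    return 1  # success value is an assumption


def rewrite_microcode_patched(heap, entries, count):
    # The fix described in the talk: a count above the table size returns zero
    # and nothing is copied.
    if count > MAX_ENTRIES:
        return 0
    for i in range(count):
        heap[i] = entries[i]
    return 1


entries = [0x41] * 1010

heap_before_patch = make_heap()
rewrite_microcode_unpatched(heap_before_patch, entries, 1010)
overflowed = heap_before_patch[MAX_ENTRIES] != ADJACENT

heap_after_patch = make_heap()
patched_result = rewrite_microcode_patched(heap_after_patch, entries, 1010)
```

In the unpatched model the slot just past the table is overwritten; in the patched model the call is refused and the neighbouring data is left alone.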
39:41
now I looked at the mitigations that Microsoft put in for the abuse of the API call instruction the issue was primarily that Tavis himself was generating the API call instruction on the fly from the malware's .text section so Microsoft added a check that says is the call to the API call instruction coming from a VDLL page and if it's not it's going to deny the user the ability
40:01
to invoke a native emulation function this means that these API call instructions can only be invoked from code pages that are associated with a given VDLL they cannot be called from the malware binary and in fact if you call them it'll do MpSetAttribute which will basically set a heuristic that you tried to call the API call instruction from your .text section this is really really weird
40:21
probably a strong indicator of malicious intent and I found that I could bypass this mitigation by simply finding the API call stubs in memory in our VDLLs which I can reverse engineer and then I can just bounce off the API call instruction and hit these interfaces with my own controlled arguments this is not good I did report this to Microsoft
40:41
and they told me this is not a trust boundary kind of a classic Microsoft response to a lot of vulnerability disclosures but that's not quite a trust boundary unless you actually found an actual vulnerability like actual buffer overflow in there the fact that there's this logical flaw that I can hit internal debug interfaces and do things like stop emulation right then and there or change microcode in the emulator
41:01
that's evidently not a vulnerability according to Microsoft so an example of a bypass here doing something pretty benign we're just going to hit OutputDebugStringA so I found in kernel32 the offset of OutputDebugStringA and I can resolve that address and then treat that as a function pointer and just bounce off this emulation and when this runs we hit OutputDebugStringA
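To make the mitigation and the bypass concrete, here is a small Python model of the check as the talk describes it. Everything in it is hypothetical: the addresses, the page size, the attribute string, and the `Emulator` class are made up for illustration and are not taken from the engine. The only idea it carries over is that the check looks at which page the API call instruction lives on, not at who chose the arguments.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 0x1000  # assumed page granularity for the model


@dataclass
class Emulator:
    vdll_pages: set = field(default_factory=set)    # pages that belong to VDLLs
    attributes: list = field(default_factory=list)  # heuristics set on the sample
    native_calls: list = field(default_factory=list)

    def apicall(self, instr_addr, native_name, arg):
        # Mitigation as described: is the apicall instruction on a VDLL page?
        page = instr_addr & ~(PAGE_SIZE - 1)
        if page not in self.vdll_pages:
            # Hypothetical attribute name; the talk only says a heuristic is set.
            self.attributes.append("apicall_from_non_vdll_page")
            return False
        # The check passes; the arguments are whatever the caller supplied.
        self.native_calls.append((native_name, arg))
        return True


emu = Emulator(vdll_pages={0x7C800000})

# 1. Malware emits its own apicall instruction in its .text section: denied.
direct = emu.apicall(0x00401020, "NtControlChannel", 0x12)

# 2. Malware jumps to an existing stub inside a VDLL page with its own
#    argument: the instruction address passes the check.
bounced = emu.apicall(0x7C800123, "OutputDebugStringA", "attacker string")
```

The second call succeeds because the page test says nothing about where the arguments came from, which is the gap the bounce technique relies on.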
41:21
now more maliciously we can sort of hit NtControlChannel again that internal debug interface left in by developers maybe to debug or administer the engine and we can set our own heuristics like for example if we call a viral body found it will trigger immediate malware detection so a quick demo of that
41:41
so in this video you can see we're calling OutputDebugStringA in the legitimate way and then calling it with our OutputDebugStringA abuse through this unintended interface kind of left there in the VDLL code page once we compile and run this binary and we'll also hit NtControlChannel as well and we're
42:02
going to use NtControlChannel to check the exact version number of the engine and this was done in the February 2018 build of the engine so with our kind of ret2apicall technique we run this binary and we'll see we hit OutputDebugStringA the normal way then through the API call with kind of the bypass for Microsoft's mitigation so we have a controlled argument going into there and we also show that we can hit NtControlChannel with a controlled
42:22
argument as well now again the implication of this is we can hit these internal debug interfaces with attacker controlled arguments probably not a good idea finally I want to talk a bit about fuzzing so I was able to then fuzz emulated APIs basically working out some more complex mechanisms to allow our channel to be
42:42
a two way IO channel not just an output channel I took MWR Labs' OS X kernel fuzzer which generates random values to fuzz the OS X kernel and I folded that in with my code generating random values each time and then I pass those into the emulator and I was able to do things like fuzz NtWriteFile and actually reproduce Tavis's crash but in
43:02
a unique way that got around the sanitization that NtWriteFile normally does I reproduced his crash in VFS write but through NtWriteFile without having to abuse the API call instruction you can see in this demo here we're going to do that we're going to resolve the address of NtWriteFile and then fuzz that and this whole mechanism here with the params this is a more complex
43:22
interface that I have for passing information in and out of the emulator and basically outside of the emulator we're generating fuzz input to give to the inside of it and we're calling NtWriteFile with those fuzz parameters and seeing what happens
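The shape of that loop, generate random parameters outside, call the target with them, and watch for a crash, can be sketched in Python. This is a stand-in only: `toy_write`, its capacity, and the value ranges are invented for illustration, and a crash is modelled as an exception instead of a real engine fault. It does not reproduce the actual harness or the MWR Labs generator.

```python
import random

CAPACITY = 2 ** 16  # size of the toy file buffer (hypothetical)


class EngineCrash(Exception):
    """Models the engine crashing on an out-of-bounds write."""


def toy_write(offset, length):
    # Stand-in target: a write that ends past the buffer counts as a crash.
    if offset + length > CAPACITY:
        raise EngineCrash(f"write of {length} bytes at offset {offset}")
    return length


def dumb_fuzz(target, rng, max_calls):
    # No coverage feedback and no corpus: fresh random values on every call.
    for call_number in range(1, max_calls + 1):
        params = (rng.randrange(2 ** 17), rng.randrange(2 ** 17))
        try:
            target(*params)
        except EngineCrash:
            return call_number, params  # first crashing input
    return None


found = dumb_fuzz(toy_write, random.Random(1234), 100_000)
```

A fixed seed keeps the run repeatable, so a crashing input can be replayed against the target afterwards.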
43:42
so running this you're going to see it just run for quite a while it's just going to keep running in my experience it took about seven minutes running single threaded at around 8,000 system calls per second to reproduce Tavis's crash again this is not a smart fuzzer there's no AFL there's no code coverage information it's just throwing dumb random values at Windows Defender in order to fuzz it
44:05
there's our demo and moving into the conclusion we covered tooling and instrumentation CPU emulation basics for x86 binaries and a bit of vulnerability research and fuzzing for Windows Defender we didn't cover a whole lot of other stuff for example x86, x64
44:21
emulation ARM emulation VMProtect emulation the 16 bit emulation there is a full DOS emulator aside from the Win32 modern Windows system emulator there's a 16 bit emulation built into Defender really interesting attack surface as well probably not as well looked at as the 32 bit one we didn't look at the threading model how you could do
44:41
multi-threading for binaries inside emulators that's always a source of problems for AV emulators at large so worth looking at there's also analysis for .NET binaries we were primarily looking at Windows PE binaries that are just compiled x86 code also inside mpengine we have unpackers, parsers, a JavaScript engine which you can see in my REcon Brussels talk other scanning engines
45:01
and a .NET engine now I want to say that people love to talk about AVs and what they can and can't do where they may or may not be vulnerable but there's not a lot of ground truth about AVs in the public and I think there should be more I think they're a really fascinating target to analyze I think they're a lot of fun I think this is much more interesting to me at least than looking at malware actually seeing how malware gets caught and mitigated
45:22
and detected and you also learn a whole lot about say NT kernel internals and object managers and things like that it gives you an impetus to look at all these different technologies a lot of claims about AV vulnerabilities and how they may or may not be vulnerable are based on Tavis Ormandy's work and a bit on Joxean's work but there's really not a whole lot out there
45:40
I really like this tweet from Joxean where he said if you Google antivirus internals all you find is me, him and then Tavis Ormandy I would say if you like this sort of work definitely grab a copy of his book, it's an awesome book and really under appreciated by people just some really incredible work that went into that I'll be releasing some code later, here's my GitHub I'll also tweet about this so you don't have to take
46:02
a picture of the slide but I'll be sharing some of the harnesses that I built and an IDA disassembler for the API call instruction I'll also be publishing an article in PoC||GTFO issue 19 describing more of this, some of the more technical details of some of these technologies and that concludes the presentation I'll have a whole lot more slides being released online after this, this is only about 50% of the material
46:22
that I prepared for today my JavaScript slides are available there at that bit.ly link again I want to thank all my friends Tavis, Marcus and then numerous friends who helped me edit this presentation and get it here at DEF CON hit me up on Twitter if you have any questions I have open DMs, thanks very much