
Fuzzing Malware For Fun & Profit. Applying Coverage-Guided Fuzzing to Find Bugs in Modern Malware


Formal Metadata

Title
Fuzzing Malware For Fun & Profit. Applying Coverage-Guided Fuzzing to Find Bugs in Modern Malware
Number of Parts
322
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Practice shows that even the most secure software written by the best engineers contains bugs. Malware is no exception. In most cases, malware authors do not follow secure software development best practices, which opens up an interesting attack scenario that can be used to stop or slow down the spread of malware, defend against DDoS attacks, and take control of C&Cs and botnets. Several previous studies have demonstrated that such bugs exist and can be exploited. Coverage-guided fuzzing is a reasonable way to find them. This talk aims to answer the following two questions: Can we defend against malware by exploiting bugs in it? How can we use fuzzing to find those bugs automatically? The author will show how we can apply coverage-guided fuzzing to automatically find bugs in sophisticated malicious samples such as the Mirai botnet, which was used to conduct one of the most destructive DDoS attacks in history, and various banking trojans. A new cross-platform tool implemented on top of WinAFL will be released and a set of 0day vulnerabilities will be presented. Do you want to see how a small addition to an HTTP response can stop a large-scale DDoS attack or how smart bit flipping can cause RCE in a sophisticated banking trojan? If the answer is yes, this is definitely your talk.
Transcript: English (auto-generated)
So, hi, everyone. My name is Maxim Shudrak. Friends call me Max. I'm a researcher. This is the last DEF CON talk this year, which is really sad, but we're going to talk about malware fuzzing for the next 45 minutes, which is cool. So let's go. First of all, I'd like to thank the DEF CON goons who gave me a chance to speak on this stage. This
is a real honor for me, and this is the first DEF CON in my life ever. So I'm giving a talk and this is my first DEF CON. I'm really excited and happy to be here. Originally, I'm from Russia. I did more research in 2016. I had the chance to work on malware analysis in Israel for two years, and since 2018, I've been living in the Bay Area, so I'm
with you guys. As you can see, my background combines experience in malware analysis and fuzzing. So one day, I got an idea. Why not try to search for bugs in malware? Sounds like a crazy idea, but my good friend Jonathan Brossard, who actually inspired me
to present this talk, was excited. So this presentation is logically divided into three parts. In the first part, I'm going to explain why, where, and how we can search for bugs in malware, and why coverage-guided fuzzing is the best technique to search
for these bugs. I'll cover what kind of problems we have to address to be able to find bugs in malware. And of course, I'm planning to show a demo of my fuzzer implemented on top of WinAFL. In the last part, I'll show you several cool zero-days I found in different malicious samples and explain future directions of work. So before I actually
started fuzzing binaries, I decided to find and look over some leaked malware source code files, just to understand whether this idea makes sense and whether it was worth spending time on it. And guess what? Right in one of the first source code files, I found this
comment in Russian, which can be translated in the following way. So I was really laughing for a couple of minutes. I thought, okay, looks like this idea makes sense, and I'm going to find some bugs in this sample and probably in others. By the way, it's a problem. So when they write malware, they have to do a lot of complex
things like initial infection, payload delivery, and, most importantly for us, communication with the C&C server. There are a lot of things that can potentially go wrong here. So an ideal place for us to search for bugs would be some complex
parser of commands from the C&C or some complex file format parser. While some samples use very trivial algorithms to communicate with the C&C, there are a lot of samples that support really complex communication protocols implemented from scratch. Despite this complexity, bad guys are rarely interested in
implementing secure code for many reasons, such as lack of time or expertise. So in most cases, we will not see things like ASLR or any other anti-exploitation techniques, which is actually good for us. Sometimes the code is so badly
written that the malware doesn't work if the environment has changed slightly. This tweet actually explains it a lot. And of course searching for bugs in bad guys' code can't be boring. So I hope you think the same. So hacking back in general is a pretty well-known research topic. There were a bunch of great talks at
DEF CON presented in the past. I can safely guess this idea has lived in the hacker community for decades. I just listed two cool DEF CON talks presented last year. But what about fuzzing malware? Well, there are far
fewer publications in this field. Actually, there is no systematic research on this topic at all. I found several research papers published by academia, but the main goal of that research was to find and trigger malicious code paths hidden in malware using fuzzing, which is a bit of the opposite. In
this talk, I am focusing on bug hunting and how we can use these bugs to defend against malware. So, legal issues. It's less relevant for this talk, but anyway, I want to say that hacking back is in a very, very deep gray zone.
There are a lot of questions about attack attribution, the scope of an attack, and a lot of other things. But no one can stop us from searching for bugs in malware. So it's obvious. Cool. We now understand our motivation and the legal aspects. Let's say we found some bug in malware. What kind of benefits we
can get from that? Let's imagine that we found some memory corruption bug that leads to a crash in some sample that is spreading around the planet. Such a bug actually might be quite useful. I guess many of you remember the famous kill switch found in WannaCry, which significantly helped to slow down the spreading of this sample. If you place one file with
a special name in one specific folder, WannaCry will not infect your machine. Of course, they left the kill switch on purpose. But if we can automatically or semi-automatically find some memory corruption, for example, in some complex file parser, we don't even need such
gifts from them. We can just place this file in our operating system, and the malware will not infect this machine. Besides that, if we can somehow trigger such a bug remotely, we would be able to do a lot of other cool things. Like we can stop malware from spreading on our network or
slow it down or shut down existing agents just by modifying network packets coming to and from the C&C. It's especially cool if you can do that against a botnet that is trying to perform a DDoS attack against us. For example, if the bots have some vulnerability in the
victim's response parser, we just need to send our exploit back to the bots, and it will cause a crash. Later in the demo, I'll show that it's actually possible. Well, it would be really great if you could trigger remote code execution here. Of course, we could take control over the botnet or shut down existing agents or track down the botnet
owners and do a lot of other cool things here. And, of course, our sweet dream bug is remote code execution on the C&C. In this case, we have god mode and can do everything. But in my opinion, it's less likely today because most C&Cs are written in memory-safe
languages like Python, PHP, Go, or any other. I don't see any reason to write it in C or C++. Okay. How can we search for these bugs? Today, fuzzing is the most efficient technique to search for bugs in memory-unsafe languages. Actually, fuzzing is very important for software security in general. Top tech
companies and huge open source projects that have integrated fuzzing into their development life cycle all report that this technique improves the security of their products. Linus Torvalds recently said that fuzzing actually improves the security of the Linux kernel, which is really cool. Okay. What's fuzzing?
Fuzzing is actually a very simple technique. You provide potentially invalid or malformed input to your program and monitor your program for crashes. Nothing hard. So, you start your fuzzer. The fuzzer generates input and sends this input into your program. All you need is to sit and pray that it will find some cool crash for you.
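As a rough illustration of the loop just described (generate input, feed it to the program, watch for crashes), here is a minimal Python sketch. It is not taken from the talk: `parse_record` is a hypothetical toy target with a deliberate bug, a Python exception stands in for a native crash, and the mutation strategy is the simplest one that works.

```python
import random

def parse_record(data: bytes) -> int:
    """Hypothetical target: a tiny length-prefixed parser with a deliberate bug."""
    if len(data) < 2:
        return 0
    length = data[0]
    payload = data[1:]
    # Bug: trusts the length byte and may index past the end of the payload.
    return payload[length - 1] if length else 0

def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one to three random positions of the seed with random bytes."""
    buf = bytearray(seed)
    for _ in range(rng.randint(1, 3)):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(target, seed: bytes, iterations: int, rng: random.Random):
    """Feed mutated inputs to the target and collect the ones that raise."""
    crashes = []
    for _ in range(iterations):
        case = mutate(seed, rng)
        try:
            target(case)
        except Exception as exc:  # an exception plays the role of a crash
            crashes.append((case, type(exc).__name__))
    return crashes

# The seed is a valid record: length byte 3 followed by a 3-byte payload.
crashes = fuzz(parse_record, b"\x03abc", 2000, random.Random(0))
print(len(crashes), "crashing inputs found")
```

Every crashing input here has a corrupted length byte, which is the kind of shallow bug that blind mutation finds quickly.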
This picture actually precisely explains my feelings when my fuzzer reports a new unique crash. I'm usually very happy, like really happy. Okay. What is coverage-guided fuzzing? Many years ago, when fuzzing was dumb and blind, fuzzers considered the program as a black box to which we
sent our test cases. It usually worked pretty well for trivial bugs located not deep in the code. People wanted to find more complex bugs deeper in the code, so they decided to instrument the program under test at the compilation stage and provide this coverage back to the fuzzer to improve test case generation. The best example of such a fuzzer is American Fuzzy Lop,
or AFL. So, during coverage-guided fuzzing, if we manage to find a test case that triggers a new code path in our program, the fuzzer saves this test case and then performs subsequent mutations on top of this new finding. The same for the next code path
and for the next, and this way we can touch much more code deeper in the program and find more bugs. In theory, of course, a dumb, blind fuzzer can at some point also find these paths, but it can take a lot of time to find them. So the best example is this code. In the case of a coverage-guided fuzzer, it's
going to take about several minutes to find this null pointer dereference, but in the case of a dumb, blind fuzzer, it can take years to find the same problem. So you see why a coverage-guided fuzzer is a really powerful tool. It can be so effective.
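The code shown on the slide is not part of this transcript, so the following Python sketch uses a stand-in: a hypothetical target where four nested byte comparisons guard a simulated crash. The instrumentation is done by hand (each branch adds an ID to a set), whereas AFL injects it at compile time. The fuzzer keeps any input that reaches a new branch and mutates from it, which is the feedback loop described above.

```python
import random

def target(data: bytes, cov: set) -> None:
    """Hypothetical target: nested checks guard a crash; each branch reports an ID."""
    cov.add(0)
    if len(data) > 0 and data[0] == ord("F"):
        cov.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            cov.add(2)
            if len(data) > 2 and data[2] == ord("Z"):
                cov.add(3)
                if len(data) > 3 and data[3] == ord("Z"):
                    raise RuntimeError("simulated null pointer dereference")

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random position with a random byte."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def coverage_guided(seed: bytes, max_iters: int, rng: random.Random):
    """Return (iteration, input) for the first crash, or (None, None)."""
    corpus = [seed]
    seen = set()
    target(seed, seen)  # record the coverage of the (non-crashing) seed
    for i in range(1, max_iters + 1):
        case = mutate(rng.choice(corpus), rng)
        cov = set()
        try:
            target(case, cov)
        except RuntimeError:
            return i, case
        if not cov.issubset(seen):  # new branch reached: keep this input
            seen |= cov
            corpus.append(case)
    return None, None

iterations, crash_input = coverage_guided(b"AAAA", 200_000, random.Random(1))
print("crash after", iterations, "iterations with input", crash_input)
```

With this feedback, each of the four bytes is discovered separately, so the crash is typically reached after on the order of ten thousand single-byte mutations. A blind fuzzer drawing four random bytes at once would need about 2^32 attempts on average to hit the same input.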
Okay. Today, there are two state-of-the-art fuzzers: AFL and libFuzzer. There are a lot of AFL forks implemented on top of AFL. So, for example, I really recommend the kernel fuzzer kAFL, and, what is more important for us, there is WinAFL, a port
of AFL for Windows binaries. Sorry. So AFL injects instrumentation routines during the compilation step. So the resulting binary will have this __afl_maybe_log function
injected in each basic block. In the case of malware, we have one tiny problem. We don't have source code. So I guess it's not a surprise, right? Actually, we have even more problems. Malware usually unpacks and executes code, the most important part of the malware, dynamically
at run time. In this case, source code instrumentation is useless. We have to find some way to provide coverage of such dynamically unpacked and executed code paths back to our fuzzer. We can try some other tools and techniques of automatic malware unpacking. But in my opinion, it's less
scalable and works only for a specific subset of samples. Besides that, if you want to search for bugs in C&C communication, we have to encrypt our test cases the same way as the malware does. So we have a lot of problems here. But thank God, WinAFL
doesn't need source code for fuzzing binaries. So we don't need the source code of our sample. Instrumentation is implemented on top of a dynamic binary instrumentation framework. What is dynamic binary instrumentation? I'll call it DBI. DBI is a technique for analyzing the
behavior of a binary application at run time through the injection of instrumentation code. I just want to give you a basic idea of how it works. Let's say we have our DBI engine and the binary we want to instrument. At step one, DynamoRIO launches this binary suspended, injects the instrumentation library, hooks the
entry point to redirect control flow into the instrumentation library, and resumes execution. So at this point, it looks like traditional, classic DLL injection and control flow hijacking. But at step four, the magic starts. DynamoRIO
takes the first basic block and copies it into a special place called the code cache. Then it transforms this basic block to inject the instrumentation routines and instructions specified by the user, and then executes it in this special code cache. The most challenging part is to make this
execution transparent to the instrumented binary. And DynamoRIO knows how to do this. It's a really complicated task, and they're doing pretty well. So then we take the next basic block, copy it into the code cache, perform the transformation, inject our instrumentation routines, execute it, take the next one, and so on until we reach the exit point of our program. So you can
see, this way, we can instrument everything that is executed on our CPU. So we had three challenges: lack of source code, obfuscation, and encryption. WinAFL plus DynamoRIO solves the first problem and actually creates a new one. WinAFL supports only
file-based fuzzing. So we can't actually fuzz network traffic parsers, which is a very serious limitation for us. To address this problem, I decided to implement a patch on top of WinAFL and call it netAFL. Suppose we
have our fuzzer and our malware instrumented by DynamoRIO in memory. Let's assume our sample sends some request to the C&C. Instead of actually sending it to the C&C, we redirect this request to our fuzzer. Our fuzzer generates a new test case, encrypts this test case if necessary, and then sends this response
back to our sample. Then we update our coverage bitmap triggered by this test case, estimate code coverage, and provide this coverage back to our fuzzer. The fuzzer generates a new test case, restarts our sample or the
target function in our sample, and repeats all previous steps. It sounds like a somewhat complex scheme, but it's actually pretty easy to use. All you need is to specify the IP address, the port to listen on, and a seed file. That's it. The fuzzer will do all the rest for you. If you need to encrypt your
test cases before sending them back to the sample, you can provide a path to your custom encryption library. netAFL will load this library and use its function to encrypt test cases. If you don't like the default C&C, you can define your own one, if, for example, you need to implement some
really complex communication logic with your target. Okay. Let's see how it actually works. So in
this virtual machine, I have a release build of netAFL. But before actually starting our fuzzer, let me explain the code we are trying to analyze. So it's Dexter, version 2, malware designed to steal credit and debit card information from point-of-sale terminals. So in
this function, HTTP main, they have initialization. So they generate the agent string. Then they open a connection with the C&C using standard Windows API functions. So they will use a POST request to send commands to
the C&C. Then it gets some information about the victim's machine and then actually sends this data via standard Windows API functions. If they successfully send this data, they call this function, get cookie. So let's look at this function. In this function, they receive
commands coming back from the C&C via browser cookies. So this function, InternetGetCookie, is used for this purpose. So it's a very trivial function. You just call it and you get your command, you get your cookies, in the P command variable. So then they
perform some parsing of this command and then, let's go back, if they manage to obtain it and the command starts with a dollar sign, they are going to execute this command. So this way they implement communication with
the C&C. So as you can see, we need to implement some non-trivial communication logic with our C&C. So we need to implement our custom C&C library. Okay. It's actually not hard. All we need is to define two
functions. So this is our custom C&C. We need to define the first one, which receives the server port to listen on. Then we have the standard initialization of sockets, and then we call listen on this port. And we have to implement the second function, CNC run. It
receives a test case from netAFL for each generated test case that netAFL wants to send to our sample. So in this case, we have to accept the connection, receive the data coming from our malware, and generate our response. We have to tell our malware that
everything is okay: we received your request, this is your response, and you can get your cookies. Then we generate the cookie. There is a nice function from Microsoft, InternetSetCookie. We generate this cookie on top of the data provided by netAFL. And that's
it. So this way we can compile this binary and provide its path to our fuzzer. Okay. This is the command to run netAFL. It looks like a pretty long command, but actually it's not hard to understand. The first
argument defines the path to our custom C&C library. The second parameter is the port to listen on. The third and fourth parameters are the standard AFL in and out directories, and in the in directory we should have a seed file. -D specifies the
path to our DynamoRIO binaries. -t is the timeout. Then we have internal WinAFL arguments. You can find a detailed explanation on GitHub. And then we have a really important argument, fuzz_iterations. This argument
tells netAFL how many iterations should pass before actually restarting the whole target sample. So this parameter can directly affect your stability. So this is very important. In my case, I chose 5000. It
works pretty well for me. And the last parameter is the path to our malware. Okay. Let's run. So it's initialization. Everything loaded successfully. Initialized. And in a few seconds we will see the standard AFL screen. Okay. Here it is. It's not
related. So as you can see, our fuzzer stats look pretty healthy. We already found like two, six paths. We have coverage. Our stability looks pretty good:
97%. Execution speed is growing, but it's still very slow because we are running in a very slow virtual machine with no parallelization. But if we leave our fuzzer like this for a couple of hours, we can find bugs. So if you leave the fuzzer for four
hours with no parallelization, it will easily find several bugs in our sample. Okay. Great. Let's see what I managed to find in malware. The first malware I selected for
my experiment was Mirai. Mirai is malware that targets IoT devices and uses them as part of a botnet for large-scale DDoS attacks. This malware was used in some of the largest and most disruptive DDoS attacks in history,
which caused major Internet platforms and services to be unavailable for a large number of users in different regions of the world. In 2017, the source code of Mirai was leaked, and different Mirai-like botnets adapted this code and are still operating in the wild. The fun fact about Mirai is that it looks like the author even followed some
security practices and used the memory debugging tool Electric Fence to search for heap overflows and use-after-free bugs, which is a bit unusual for malware. Mirai's DDoS capabilities are based on HTTP flood and several low-level
network attacks. The most interesting part for us in terms of exploitation would be this HTTP response parser. Mirai needs to parse the HTTP response coming back from victims to be able to perform the HTTP flood attack. This parser has about 830 lines of code and hundreds of potentially
dangerous operations with memory, pointers, and strings, so this is a wonderful target for our fuzzer. As a seed file, I decided to use this very basic HTTP response. I ran my fuzzer for 24 hours and managed to find 43 unique
crashes, which were all caused by a single bug in the relative URL handler. Execution speed was around 1,000 executions per second, which is pretty good, and the fuzzer managed to find approximately 430 unique paths, which is also pretty good. What was the root of this bug? If our HTTP
response contains a relative URL, this branch is triggered. In the case of an incorrect relative URL, the variable ii always equals a negative value, which causes a memory violation and a crash. This is a logical error. So the author forgot to set this
variable to zero. They use it earlier in the code and forgot to set it in the case of relative URLs. So this is one example of a test case that caused a crash. So if you see that a Mirai or Mirai-like botnet is trying to attack you, you can just answer the bots with this HTTP response and
they will all crash. So let me show how it actually works. Okay. So in this virtual machine, in the right terminal we have our
HTTP server, our victim, implemented in Python. So it's just a simple HTTP server. In the left part we have a debug build of Mirai. A debug build because we want to see
what's actually going on, and in the debug build they print a lot of information, which is useful for the demo. Okay. Let's start our server. So I guess it's started and running. Before actually starting Mirai, let me explain one thing. So before actually starting the HTTP flood attack, they
need to connect with the C&C. I was too lazy to deploy an actual C&C server, so I decided to implement the C&C in this Python script and just respond with random data to our Mirai sample. And surprisingly for me, it worked.
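The stand-in C&C described here can be sketched roughly as follows. This is a minimal illustration, not the script used in the demo: the handler name, the reply size, and the helper functions are all assumptions, and the only idea taken from the talk is "answer every connection with random data".

```python
import os
import socket
import socketserver
import threading

REPLY_SIZE = 64  # assumed value; the talk does not say how much data was sent


class RandomReplyHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: answer whatever the bot sends with random bytes."""

    def handle(self):
        self.request.settimeout(2.0)
        try:
            self.request.recv(4096)  # read and ignore the bot's message
        except socket.timeout:
            pass  # the bot sent nothing; reply anyway
        self.request.sendall(os.urandom(REPLY_SIZE))


def start_fake_cnc(host="127.0.0.1", port=0):
    """Start the fake C&C in a background thread (port 0 picks a free port)."""
    server = socketserver.ThreadingTCPServer((host, port), RandomReplyHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(address):
    """Connect the way a bot would and return the full reply."""
    with socket.create_connection(address, timeout=2.0) as sock:
        sock.sendall(b"hello")
        data = b""
        while len(data) < REPLY_SIZE:
            chunk = sock.recv(REPLY_SIZE - len(data))
            if not chunk:
                break
            data += chunk
        return data
```

Calling `start_fake_cnc()` and then `probe(server.server_address)` returns `REPLY_SIZE` random bytes. Binding to the loopback address keeps the sketch confined to an isolated analysis machine.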
So after dozens of attempts it actually starts the HTTP flood. Which was good. So it was a really easy solution for me. So to be clear, my Python server is now doing two jobs. The first one is to be the HTTP response server
victim. And the second one is to be a C&C for our Mirai, to answer that everything is okay. I'm up. Let's start. So okay. Let's run. So you can see it starts in debug mode. It attempts to connect with the C&C. And after dozens of attempts it will receive some
meaningful data and actually start the HTTP flood. Okay. Here it is. So it's starting the HTTP flood. It's sending HTTP requests to our Python server, and the Python server is going to answer with the malicious HTTP response I showed on the previous slide. Okay. So here it is. We have
a segmentation fault and a crash. So profit. This way we can stop Mirai from attacking us. Okay. Thank you. There are more bugs in different samples. So I already
presented this sample earlier: this is Dexter version two. The first version of Dexter was one of the first known botnets that targeted point-of-sale terminals. As I already said in the demo, Dexter
communicates with the C&C over the HTTP protocol via POST requests and receives commands in the response cookies. Actually, in the case of Dexter, you don't need a fuzzer at all. The malware code is so badly written that you can just answer with a long command and it will crash. And
actually it's remote code execution. You can find it, actually. We see this URL. It has 255 bytes. We can send a command longer than these 255 bytes and it will cause a stack overflow, and we can exploit it. And as I said before, there are no anti-exploitation techniques.
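The missing length check can be sketched as below. This is a Python model of a fixed-size buffer, not Dexter's code: `BUFFER_SIZE` reuses the 255 bytes mentioned in the talk, while `copy_command` and its behavior are hypothetical names chosen for illustration.

```python
BUFFER_SIZE = 255  # buffer size mentioned in the talk


def copy_command(command: bytes, buffer_size: int = BUFFER_SIZE) -> bytearray:
    """Copy a command into a fixed-size buffer, rejecting oversized input.

    Hypothetical model: an unchecked C-style copy would write past the end
    of the buffer here instead of raising an error.
    """
    if len(command) >= buffer_size:  # keep one byte for the terminating NUL
        raise ValueError(
            f"command is {len(command)} bytes, buffer holds {buffer_size - 1}"
        )
    buffer = bytearray(buffer_size)  # zero-filled, so the copy is NUL-terminated
    buffer[: len(command)] = command
    return buffer
```

A command of 300 bytes is rejected by this check, whereas the code described in the talk copies it anyway and overwrites the stack.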
It's like the 90s. We can easily cause remote code execution. So a really old-school bug. TinyNuke. TinyNuke is a Zeus-style banking trojan designed to perform man-in-the-browser attacks using web injects. It has a function, load Web Injects. The name of this function
I guess is pretty self-explanatory. It is designed to ask the C&C server to provide a payload to be injected into the victim's browser, and this response is represented in JSON format. It's then passed into a custom JSON parser. This JSON parser is actually very, very
complex. They implemented it from scratch. So it's a very good target for us. This is an example of the seed file I used to feed NetAFL, and after 24 hours of fuzzing, I had pretty good results. Three unique paths, 800 executions per second, and four unique crashes. The
root of these crashes is this function. It causes infinite recursion in case of a very long response which contains only opening brackets. So in my case, it was about 7,000. So it will cause a very, very deep recursion, a stack overflow, and it finally will trigger
a crash in our target. So last example, KINS. KINS is a banking trojan fully based on the leaked Zeus source code with some minor technical improvements. So all bugs I found in KINS also relate to Zeus. This sample has been used to attack financial institutions in
Europe. It has an HTTP response parser. So I decided to use this simple HTTP response as a seed. Again, the trojan receives payloads and commands from the C&C over the HTTP protocol. The HTTP response is then parsed by two complex routines, analyze HTTP response and analyze HTTP
response body. And again, it's a very interesting target for us. 24 hours of fuzzing gave me 22 unique crashes, which were triggered by one problem. This time, it was a bit more complex. So the function get mime header is used
to extract the value of Content-Length from the HTTP response. But if the string doesn't contain a value of Content-Length at all, it causes an integer overflow and, as a result, the function will return a negative value. So this get mime header will
return a negative value. But then copy X is called with this value returned by get mime header. So in my case, it was always negative three. It will try to overwrite the whole memory layout of our sample. So if we try to copy with negative three, it means that we are
trying to overwrite the whole memory layout. It's probably exploitable. We can sometimes control this negative value. But I'm not 100% sure. So this is an example of a crash case. We just need to send Content-Length with a blank value. Okay. Let's discuss challenges and
issues. First of all, of course, in my case, I want to say that I skipped the hardest part: reverse engineering and searching for the target function. I took source code to reduce the time spent searching for a target. Of course, for real-world samples, you need to perform
an initial reverse engineering of your sample and find the target function you want to analyze. Sometimes it can take a lot of time. Secondly, of course, there are bugs in WinAFL and NetAFL, especially when you deal with highly obfuscated samples. Third, you need to
find a test case. So you need a seed file. Sometimes it might be really difficult, especially if the C&C is down or you don't know what kind of file the malware wants. So it might be challenging. Then we have a problem with encryption. So sometimes the encryption algorithm is very complex. And I know it might be really painful to
perform reverse engineering of this algorithm. In this case, you can try to patch the program just to disable encryption altogether. It worked for me sometimes. And stability. Stability plays a critical role when you're doing coverage-guided fuzzing. So if your
coverage is always different when you send the same test case to your target, it means that your target is unstable and your fuzzer doesn't understand what's actually going on. So it's sending one file and it causes different code paths. It's strange, and usually it means that you waste your
coverage-guided fuzzer. So you have to pay attention to this. I also found this tool very useful when you want to find the target routine in your sample. I have implemented this tool on top of DynamoRIO. It's basically a tracer for Windows, but
transparent to the instrumented program. So it will trace all API calls in the malware and write this information to a file. So it's less detectable than a standard API call tracer. So I hope, guys, it will work for you. You can try it. It's open source.
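The stability problem mentioned a moment ago (the same test case producing different coverage) can be illustrated with a small sketch. This is a simplified model and not how WinAFL computes its stability figure internally: `measure_stability`, the toy targets, and the representation of coverage as a set of edge IDs are all assumptions made for the example.

```python
import itertools


def measure_stability(run_target, test_case, runs=8):
    """Replay one test case and return the share of edges hit in every run.

    `run_target` is a hypothetical callable returning the set of coverage
    edges observed for one execution. 1.0 means fully stable coverage.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    traces = [frozenset(run_target(test_case)) for _ in range(runs)]
    seen = frozenset().union(*traces)         # edges hit in at least one run
    if not seen:
        return 1.0
    stable = frozenset.intersection(*traces)  # edges hit in every run
    return len(stable) / len(seen)


def deterministic_target(data):
    """Toy target: coverage depends only on the input."""
    return {1, 2, 3} if data.startswith(b"HTTP") else {1}


_calls = itertools.count()


def flaky_target(data):
    """Toy target: every second run hits an extra edge, e.g. a timer thread."""
    edges = {1, 2, 3}
    if next(_calls) % 2:
        edges.add(99)
    return edges
```

With these toy targets, `measure_stability(deterministic_target, b"HTTP/1.1 200 OK\r\n\r\n")` gives 1.0, while `flaky_target` scores 0.75 because one of its four observed edges appears only in some runs. A low score on a real target suggests the fuzzer cannot tell whether a new path came from the mutated input or from noise.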
In future, I guess it would be really cool if we could somehow find the target function for our fuzzer automatically. Also, it would be great to increase stability. I implemented some heuristics to make it more stable. It worked for me. I hope
it will work for you. And it would be great to have a code coverage visualization tool for your fuzzer. I know there are some of them, but it would be great to adapt one for NetAFL. And, of course, improve stability. Okay. So I hope I convinced you that bugs in
malware might be useful and you can really find these bugs using fuzzing. Of course, NetAFL can and should be used to find bugs in network-based applications. So it's a general-purpose fuzzer. I designed it to be general purpose. You can use it to find bugs in benign software. So I
recently found one CVE in a network-based application for Windows. So you can try it. And I'm also currently merging this project with WinAFL just to reduce the number of different projects on this planet. So I guess I'm going to finish this work in two or three
weeks. So in future you will not need NetAFL. Everything will be merged into the original WinAFL branch. And soon, probably in September, I'm going to release NetAFL for Linux. I hope in September. Thank you for your attention.