Fuzzing Malware For Fun & Profit. Applying Coverage-Guided Fuzzing to Find Bugs in Modern Malware
Formal Metadata
Number of Parts | 322
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/39713 (DOI)
Transcript: English(auto-generated)
00:00
So, hi, everyone. My name is Maxim Shudrak. Friends call me Max. I'm a researcher. This is the last DEF CON talk this year, which is really sad, but we're going to talk about malware fuzzing for the next 45 minutes, which is cool. So let's go. First of all, I'd like to thank the DEF CON goons who gave me the chance to speak on this stage. This
00:21
is a real honor for me, and this is the first DEF CON of my life. So I'm speaking at my very first DEF CON. I'm really excited and happy to be here. Originally, I'm from Russia. I did more research in 2016. I had the chance to work on malware analysis in Israel for two years, and since 2018 I've been living in the Bay Area, so I'm
00:46
with you guys. As you can see, my background combines malware analysis and fuzzing. So one day I got an idea: why not search for bugs in malware? It sounds like a crazy idea, but my good friend Jonathan Brostert, who actually inspired me
01:04
to present this talk, was excited about it. This presentation is divided into three parts. In the first part, I'm going to explain why, where, and how we can search for bugs in malware, and why coverage-guided fuzzing is the best technique to search
01:21
for these bugs. Then I'll cover the problems we have to address to be able to find bugs in malware, and of course, I'm planning to show a demo of my fuzzer implemented on top of WinAFL. In the last part, I'll show you several cool zero-days I found in different malicious samples and explain future directions of work. So before I actually
01:43
start fuzzing binaries, I decided to find and look over some leaked malware source code, just to make sure this idea made sense and was worth spending time on. And guess what? In one of the very first source code files, I found this
02:01
comment in Russian, which can be translated as follows. I was laughing for a couple of minutes. I thought, okay, it looks like this idea makes sense, and I'm going to find some bugs in this sample and probably in others. By the way, it's a problem. When bad guys write malware, they have to do a lot of complex
02:23
things like initial infection, payload delivery, and, most importantly for us, communication with the CNC server. A lot of things can potentially go wrong here. So an ideal place for us to search for bugs would be some complex
02:40
parser of commands from the CNC or some complex file format parser. While some samples use very trivial algorithms to communicate with the CNC, many samples support really complex communication protocols implemented from scratch. Despite this complexity, bad guys are rarely interested in
03:05
writing secure code, for many reasons, such as lack of time or expertise. So in most cases we will not see things like ASLR or any other anti-exploitation techniques, which is actually good for us. Sometimes the code is so badly
03:22
written that the malware doesn't work if the environment changes slightly. This tweet explains it well. And of course, searching for bugs in bad guys' code can never be boring. I hope you think the same. Hacking back in general is a pretty well-known research topic. There were a bunch of great talks at
03:43
DEF CON in the past. I can safely guess this idea has been around in the hacker community for decades. I just listed two cool DEF CON talks presented last year. But what about fuzzing malware? Well, there are far,
04:00
far fewer publications in this field. Actually, there is no systematic research on this topic at all. I found several research papers published by academia, but their main goal was to find and trigger malicious code paths hidden in malware using fuzzing, which is a somewhat opposite goal. In
04:24
this talk, I am focusing on bug hunting and how we can use these bugs to defend against malware. Now, legal issues. They're less relevant for this talk, but anyway, I want to say that hacking back is in a very, very deep gray zone.
04:40
There are a lot of questions about attack attribution, the scope of the attack, and many other things. But no one can stop us from searching for bugs in malware, that's obvious. Cool. Now we understand our motivation and the legal aspects. Let's say we found some bug in malware. What kind of benefits can we
05:01
get from that? Let's imagine that we found a memory corruption bug that leads to a crash in some sample that is spreading around the planet. Such a bug might actually be quite useful. I guess many of you remember the famous kill switch found in WannaCry, which significantly helped slow down the spread of this sample. If you place one file with
05:23
a special name in one specific folder, WannaCry will not infect your machine. Of course, they left that kill switch on purpose. But if we can automatically or semi-automatically find some memory corruption, for example in some complex file parser, we don't even need such
05:41
gifts from them. We can just place this file in our operating system, and the malware will not infect this machine. Besides that, if we can somehow trigger such a bug remotely, we would be able to do a lot of other cool things. For example, we can stop malware from spreading on our network or
06:03
slow it down, or shut down existing agents just by modifying network packets coming to and from the CNC. It's especially cool if you can do that against botnets that are trying to perform a DDoS attack against us. For example, if bots have some vulnerability in the
06:21
victim's response parser, we just need to send our exploit back to the bots, and it will cause a crash. Later in the demo, I'll show that it's actually possible. It would be really great if we could trigger remote code execution here. Then we could take control over the botnet, shut down existing agents, track down botnet
06:42
owners, and do a lot of other cool things. And, of course, our dream bug is remote code execution on the CNC. In this case, we have god mode and can do everything. But in my opinion, it's less likely today, because most CNCs are written in memory-safe
07:00
languages like Python, PHP, or Go. I don't see any reason to write them in C or C++. Okay. How can we search for these bugs? Today, fuzzing is the most efficient technique to search for bugs in memory-unsafe languages. Actually, fuzzing is very important for software security in general. Top tech
07:23
companies and huge open source projects that integrated fuzzing into their development life cycle all report that this technique improves the security of their products. Linus Torvalds recently said that fuzzing actually improves the security of the Linux kernel, which is really cool. Okay. What's fuzzing?
07:42
Fuzzing is actually a very simple technique. You provide potentially invalid or malformed input to your program and monitor the program for crashes. Nothing hard. You start your fuzzer, the fuzzer generates inputs and sends them into your program, and all you need to do is sit and pray that it finds some cool crash for you.
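As a rough illustration of the loop just described (generate input, run the program, watch for crashes), here is a minimal blind mutation fuzzer sketched in Python. It is not code from the talk: the target `parse_command`, its length-prefix bug, and the mutation strategy are hypothetical, and an uncaught Python exception stands in for a real crash such as a segfault or access violation.

```python
import random


def parse_command(data: bytes) -> bytes:
    """Hypothetical toy target: a length-prefixed command parser with a bug.

    Byte 0 declares the payload length; the parser trusts it and reads the
    last payload byte without checking how many bytes actually arrived.
    """
    if not data:
        return b""
    length = data[0]
    body = data[1:1 + length]
    if length == 0:
        return b""
    return bytes([body[length - 1]])  # IndexError when the declared length lies


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite 1 to 4 random positions with random byte values."""
    buf = bytearray(data)
    for _ in range(rng.randint(1, 4)):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def blind_fuzz(target, seed: bytes, iterations: int, rng_seed: int = 0):
    """Dumb, blind fuzzing: mutate the seed, run the target, record crashes."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        case = mutate(seed, rng)
        try:
            target(case)
        except Exception as exc:  # an uncaught exception stands in for a crash
            crashes.append((case, type(exc).__name__))
    return crashes


if __name__ == "__main__":
    found = blind_fuzz(parse_command, seed=b"\x03abc", iterations=1000)
    print(f"{len(found)} crashing inputs")
    if found:
        print(f"example: {found[0][0]!r} -> {found[0][1]}")
```

The fuzzer knows nothing about the parser's structure; it only sees whether each run crashed, which is exactly the black-box model that coverage guidance improves on.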
08:02
This picture precisely explains my feelings when my fuzzer reports a new unique crash. I'm usually very happy, like really happy. Okay. What is coverage-guided fuzzing? Many years ago, when fuzzers were dumb and blind, a fuzzer considered the program a black box to which we
08:21
sent our test cases. It usually worked pretty well for trivial bugs located not deep in the code. People wanted to find more complex bugs deeper in the code, so they decided to instrument the program under test at the compilation stage and feed the resulting coverage back into the fuzzer to improve test case generation. The best example of such a fuzzer is American Fuzzy
08:43
Lop, or AFL. During coverage-guided fuzzing, if we manage to find a test case that triggers a new code path in our program, the fuzzer saves this test case and then performs subsequent mutations on top of this new finding. The same goes for the next code path
09:01
and the next one, and this way we can reach much more code deeper in the program and find more bugs. In theory, of course, a dumb blind fuzzer can also find these paths at some point, but it can take a lot of time. The best example is this code. A coverage-guided fuzzer is
09:23
going to take several minutes to find this null pointer dereference, but a dumb blind fuzzer can take years to find the same problem. So you see why coverage-guided fuzzing is a really powerful tool. It can be so effective.
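The code from the slide is not reproduced in the transcript, so the following is a hypothetical stand-in in the same spirit: a bug hidden behind four nested byte comparisons. A blind fuzzer that draws four random bytes hits `FUZZ` with probability 1/2^32 per attempt, while the coverage-guided loop below keeps every mutant that reaches a new branch and mutates it further, so it only has to guess one byte at a time. Coverage here is simply a set of branch labels recorded by the target itself; real AFL instrumentation instead records edge hits in a shared bitmap. All names are illustrative.

```python
import random


class Crash(Exception):
    """Stands in for a real crash such as an access violation."""


def target(data: bytes, trace: set) -> None:
    """Hypothetical target: the bug hides behind four nested byte checks.

    Every branch taken is added to `trace`, playing the role of the coverage
    that instrumentation reports back to the fuzzer.
    """
    trace.add("entry")
    if len(data) < 4:
        return
    trace.add("len_ok")
    if data[0] == ord("F"):
        trace.add("F")
        if data[1] == ord("U"):
            trace.add("FU")
            if data[2] == ord("Z"):
                trace.add("FUZ")
                if data[3] == ord("Z"):
                    raise Crash("simulated NULL pointer dereference")


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one random byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def coverage_guided_fuzz(seed: bytes, max_iters: int, rng_seed: int = 1):
    """Keep any mutant that reaches new coverage and mutate it further."""
    rng = random.Random(rng_seed)
    corpus = [seed]
    coverage: set = set()
    target(seed, coverage)  # baseline coverage of the seed input
    for i in range(1, max_iters + 1):
        case = mutate(rng.choice(corpus), rng)
        trace: set = set()
        try:
            target(case, trace)
        except Crash:
            return case, i, corpus
        if not trace <= coverage:  # this input reached a branch not seen before
            coverage |= trace
            corpus.append(case)
    return None, max_iters, corpus


if __name__ == "__main__":
    crash, iters, corpus = coverage_guided_fuzz(b"AAAA", max_iters=200_000)
    print(f"crash input {crash!r} after {iters} executions, corpus {corpus}")
```

Under this toy model each stage needs one correct byte at one of four positions from one of the saved corpus entries, so the whole search typically takes on the order of ten thousand executions rather than billions.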
09:41
Okay. Today, there are two state-of-the-art fuzzers: AFL and libFuzzer. There are a lot of AFL forks implemented on top of AFL. For example, I really recommend the kernel fuzzer kAFL, and, more importantly for us, there is WinAFL, a port
10:01
of AFL for Windows binaries. Sorry. AFL injects instrumentation routines during the compilation step, so the resulting binary will have this __afl_maybe_log function
10:21
injected in each basic block. In the case of malware, we have one tiny problem: we don't have source code. I guess that's not a surprise, right? Actually, we have even more problems. Malware usually unpacks and executes its code, the most important part of the malware, dynamically
10:42
at run time. In this case, source code instrumentation is useless. We have to find some way to feed coverage of such dynamically unpacked and executed code paths back to our fuzzer. We could try other tools and techniques for automatic malware unpacking, but in my opinion, they are less
11:02
scalable and work only for a specific subset of samples. Besides that, if we want to search for bugs in CNC communication, we have to encrypt our test cases the same way the malware does. So we have a lot of problems here. But thank God, WinAFL
11:22
doesn't need source code to fuzz binaries, so we don't need the source code of our sample. Instrumentation is implemented on top of a dynamic binary instrumentation framework. What is dynamic binary instrumentation? I'll call it DBI. DBI is a technique for analyzing the
11:42
behavior of a binary application at run time through the injection of instrumentation code. I just want to give you a basic idea of how it works. Let's say we have our DBI engine and a binary we want to instrument. At step one, DynamoRIO launches this binary suspended, injects the instrumentation library, hooks it
12:03
to redirect control flow into the instrumentation library, and resumes execution. At this point, it looks like classic DLL injection and control flow hijacking. But at step four, the magic starts. DynamoRIO
12:20
takes the first basic block and copies it into a special place called the code cache. Then it transforms this basic block to inject the instrumentation routines, the instructions specified by the user, and then executes it in the code cache. The most challenging part is to make this
12:41
execution transparent to the instrumented binary, and DynamoRIO knows how to do this. It's a really complicated task, and they do it pretty well. Then we take the next basic block, copy it into the code cache, transform it, inject our instrumentation routines, execute it, take the next one, and so on until we reach the exit point of our program. So you can
13:03
see, this way we can instrument everything that executes on our CPU. So we had three challenges: lack of source code, obfuscation, and encryption. WinAFL plus DynamoRIO solves the first problem and actually creates a new one: WinAFL supports only
13:23
file-based fuzzing, so we can't fuzz network traffic parsers, which is a very serious limitation for us. To address this problem, I decided to implement a patch on top of WinAFL and call it NetAFL. Suppose we
13:41
have our fuzzer and our malware instrumented by DynamoRIO in memory. Let's assume our sample sends some request to the CNC. Instead of actually sending it to the CNC, we redirect this request to our fuzzer. Our fuzzer generates a new test case, encrypts it if necessary, and then sends this response
14:04
back to our sample. Then we update the coverage bitmap for the paths triggered by this test case, estimate code coverage, and feed this coverage back to our fuzzer. The fuzzer generates a new test case, restarts our sample or
14:22
the target function in our sample, and repeats all the previous steps. It sounds like a somewhat complex scheme, but it's actually pretty easy to use. All you need is to specify an IP address, a port to listen on, and a seed file. That's it. The fuzzer will do all the rest for you. If you need to encrypt your
14:41
test cases before sending them back to your sample, you can provide a path to your custom encryption library. NetAFL will load this library and use its function to encrypt test cases. If you don't like the default CNC, you can define your own, for example, if you need to implement some
15:02
really complex communication logic with your target. Okay. Let's see how it actually works. So in
15:40
this virtual machine, I have a release build of NetAFL. But before actually starting our fuzzer, let me explain the code we are trying to analyze. It's Dexter version 2, malware designed to steal credit and debit card information from point-of-sale terminals. So in
16:03
this function, HTTP main, they have initialization. They generate a user agent string. Then they open a connection to the CNC using standard Windows API functions. They use POST requests to send commands to
16:22
the CNC. Then it gets some information about the victim's machine and sends this data via standard Windows API functions. If they successfully send this data, they call this function, get cookie. Let's look at this function. In this function, they receive
16:42
commands coming back from the CNC via browser cookies. This function, InternetGetCookie, is used for this purpose. It's a very trivial function: you just call it, and you get your command, your cookies, in the P command variable. Then they
17:01
perform some parsing of this command, and then, let's go back. If they manage to obtain it and the command starts with a dollar sign, they execute this command. This is how they implement communication with
17:21
the CNC. As you can see, we need to implement some non-trivial communication logic in our CNC, so we need to implement our own custom CNC library. Okay. It's actually not hard. All we need is to define two
17:43
functions. So this is our custom CNC. First we define the CNC initialization function, which receives the server port to listen on. Then we have standard initialization of the sockets, and then we call listen on this port. And we have to implement the second function, CNC run. It
18:02
receives a test case from NetAFL for each generated test case that NetAFL wants to send to our sample. In this function, we have to accept the connection, receive the data coming from our malware, and generate our response. We have to tell our malware that
18:24
everything is okay: we received your request, this is your response, and you can get your cookies. Then we generate the cookie. There is a nice function from Microsoft, InternetSetCookie. We generate this cookie from the data provided by NetAFL. And that's
18:43
it. Now we can compile this binary and provide its path to our fuzzer. Okay. This is the command to run NetAFL. It looks like a pretty long command, but it's actually not hard to understand. The first
19:03
argument defines the path to our custom CNC library. The second parameter is the port to listen on. The third and fourth parameters are the standard AFL in and out directories, and the in directory should contain the seed file. Minus D specifies
19:25
the path to our DynamoRIO binaries. Minus t is the timeout. Then we have internal WinAFL arguments; you can find a detailed explanation on GitHub. And then we have a really important argument, fuzz iterations. This argument
19:45
tells NetAFL how many iterations should pass before it actually restarts the whole target sample. This parameter can directly affect your stability, so it is very important. In my case, I chose 5000. It
20:01
works pretty well for me. And the last parameter is the path to our malware. Okay. Let's run. It's initializing. Everything loaded successfully. Initialized. In a few seconds we will see the standard AFL screen. Okay. Here it is. It's not
20:26
related. As you can see, our fuzzer stats look pretty healthy. We have already found something like two, six paths. We have coverage. Our stability looks pretty good.
20:42
97%. Execution speed is growing, but it's still very slow because we are running in a very slow virtual machine with no parallelization. But if we leave our fuzzer like this for a couple of hours, we can find bugs. If you leave our fuzzer running for four
21:12
hours with no parallelization, it will easily find several bugs in our sample. Okay. Great. Let's see what I managed to find in malware. The first malware I selected for
21:24
my experiment was Mirai. Mirai is malware that targets IoT devices and uses them as part of a botnet for large-scale DDoS attacks. This malware was used in some of the largest and most disruptive DDoS attacks in history,
21:40
which caused major Internet platforms and services to be unavailable to large numbers of users in different regions of the world. In 2017, the source code of Mirai was leaked, and different Mirai-like botnets adapted this code and are still operating in the wild. A fun fact about Mirai is that its authors apparently even followed some
22:03
security practices and used the memory debugging tool Electric Fence to search for heap overflows and use-after-free bugs, which is a bit unusual for malware. Mirai's DDoS capabilities are based on HTTP flood and several low-level
22:21
network attacks. The most interesting part for us in terms of exploitation is this HTTP response parser. Mirai needs to parse HTTP responses coming back from victims to perform the HTTP flood attack. This parser has about 830 lines of code and hundreds of potentially
22:42
dangerous operations with memory, pointers, and strings, so it is a wonderful target for our fuzzer. As a seed file, I decided to use this very basic HTTP response. I ran my fuzzer for 24 hours and managed to find 43 unique
23:01
crashes, all caused by a single bug in the relative URL handler. Execution speed was around 1,000 executions per second, which is pretty good, and the fuzzer found approximately 430 unique paths, which is also pretty good. What was the root cause of this bug? If our HTTP
23:21
response contains a relative URL, this branch is triggered. In the case of an incorrect relative URL, the variable ii always ends up with a negative value, which causes a memory access violation and a crash. This is a logic error. The authors forgot to set this
23:42
variable to zero: they use it earlier in the code and forgot to reset it in the case of relative URLs. This is one example of a test case that caused a crash. So if you see that Mirai or a Mirai-like botnet is trying to attack you, you can just answer the bots with this HTTP response, and
24:03
they will all crash. So let me show how it actually works. Okay. So in this virtual machine in the right terminal we have our in the right terminal we have our
24:28
HTTP server, our victim implemented in Python. So it's just a simple HTTP server. In the left part we have debug build of Mirai. Debug build because we want to see
24:42
what's actually going on and in debug build they're printing a lot of information which is useful for demo. Okay. Let's start our server. So I guess it's start and running. Before actually start Mirai, let me explain one thing. So before actually start HTTP flat attack, they
25:05
need to connect to the CNC. I was too lazy to deploy an actual CNC server, so I implemented a fake CNC in this Python script that just responds with random data to our Mirai sample. And, to my surprise, it worked.
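A rough sketch of a script that plays both roles described here: a fake CNC that answers the bot with random bytes, and a victim HTTP server that answers every request with a fixed raw response. The ports, class names, and helper functions are hypothetical, the response is a benign placeholder rather than the crashing test case shown on the slide, and how the debug build is pointed at the script is not covered here.

```python
import os
import socket
import socketserver
import threading

# Hypothetical ports: not the ones used in the demo.
CNC_PORT = 8023
HTTP_PORT = 8080

# Placeholder response. Replace it with the crashing test case produced by the
# fuzzer; this benign stand-in is NOT the payload from the talk.
RAW_RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"


class Server(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True


class FakeCncHandler(socketserver.BaseRequestHandler):
    """Fake CNC: answers every message from the bot with random bytes."""

    def handle(self):
        while self.request.recv(4096):
            self.request.sendall(os.urandom(64))


class VictimHandler(socketserver.BaseRequestHandler):
    """Victim web server: reads one request and replies with RAW_RESPONSE."""

    def handle(self):
        self.request.recv(65536)  # contents of the flood request are ignored
        self.request.sendall(RAW_RESPONSE)


def serve(port, handler):
    """Start a threaded TCP server in a background thread and return it."""
    server = Server(("0.0.0.0", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port, request=b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n"):
    """Tiny client for checking locally what the victim server sends back."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall(request)
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    serve(CNC_PORT, FakeCncHandler)
    serve(HTTP_PORT, VictimHandler)
    print(f"fake CNC on port {CNC_PORT}, victim HTTP server on port {HTTP_PORT}")
    threading.Event().wait()  # keep running until interrupted (Ctrl+C)
```

Keeping both roles in one process mirrors the "two jobs" setup: nothing about the bot needs to change except where it connects.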
25:22
After dozens of attempts it actually starts the HTTP flood, which was good. It was a really easy solution for me. So to be clear, my Python script is now doing two jobs. The first is to act as the victim's HTTP
25:40
server. The second is to act as a CNC for our Mirai and answer that everything is okay: I'm up, let's start. Okay, let's run it. You can see it starts in debug mode. It attempts to connect to the CNC, and after dozens of attempts it will receive some
26:00
meaningful data and actually start the HTTP flood. Okay. Here it is. It's starting the HTTP flood. It's sending HTTP requests to our Python server, and the Python server answers with the malicious HTTP response I showed on the previous slide. Okay. Here it is. We have
26:22
a segmentation fault and a crash. So, profit. This way we can stop Mirai from attacking us. Okay. Thank you. There are more bugs in other samples. I already
26:47
presented this sample earlier, when I showed that this is Dexter version two. The first version of Dexter was one of the first known botnets that targeted point-of-sale terminals. As I already said in the demo, Dexter
27:01
communicates with its CNC over HTTP via POST requests and receives commands in the response cookies. Actually, in the case of Dexter, you don't need a fuzzer at all. The malware code is so badly written that you can just answer with a long command and it will crash. And
27:22
it is actually a remote code execution bug. Here we see the URL buffer: it has 255 bytes. We can send a command longer than 255 bytes, it will cause a stack overflow, and we can exploit it. And as I said before, there are no anti-exploitation techniques.
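A minimal sketch of a response that delivers an overlong command in a cookie, as a way to test the 255-byte buffer against a sample in a lab. Only the delivery channel (response cookies) and the buffer size come from the talk; the cookie name `cmd`, the filler byte, and `build_probe_response` are hypothetical, and if the sample encodes its commands, the probe would need the same encoding.

```python
URL_BUFFER_SIZE = 255  # size of Dexter's stack buffer, as stated in the talk


def build_probe_response(cookie_name=b"cmd", length=URL_BUFFER_SIZE + 64, filler=b"A"):
    """Build an HTTP response whose cookie carries a command longer than the buffer.

    cookie_name is a hypothetical placeholder: the real cookie name and any
    encoding Dexter applies to its commands are assumptions left to the reader.
    """
    if length <= URL_BUFFER_SIZE:
        raise ValueError(f"length must exceed {URL_BUFFER_SIZE} bytes to overflow the buffer")
    command = filler * length
    return b"".join([
        b"HTTP/1.1 200 OK\r\n",
        b"Set-Cookie: " + cookie_name + b"=" + command + b"\r\n",
        b"Content-Length: 0\r\n",
        b"\r\n",
    ])


if __name__ == "__main__":
    response = build_probe_response()
    print(len(response), "bytes in total")
```

Such a response can be served by any small HTTP server impersonating the CNC.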
27:41
It's like the 90s. We can easily achieve remote code execution. A really old-school bug. TinyNuke. TinyNuke is a Zeus-style banking trojan designed to perform man-in-the-browser attacks using web injects. It has a function called load web injects. The name of this function
28:01
is pretty self-explanatory, I guess. It asks the CNC server for a payload to be injected into the victim's browser; the response is in JSON format and is then passed to the malware's custom JSON parser. This JSON parser is actually very, very
28:21
complex. They implemented it from scratch, so it's a very good target for us. This is the example seed file I fed to NetAFL, and after 24 hours of fuzzing I had pretty good results: three unique paths, 800 executions per second, and four unique crashes. The
28:41
root cause of these crashes is this function. It recurses without bound on a very long response that contains only opening brackets; in my case, about 7,000 of them. This causes a very, very deep recursion and a stack overflow, and finally triggers
29:01
a crash in our target. The last example is KINS. KINS is a banking trojan fully based on the leaked Zeus source code, with some minor technical improvements, so all the bugs I found in KINS are also relevant to Zeus. This sample has been used to attack financial institutions in
29:22
Europe. It has an HTTP response parser, so I used this simple HTTP response as a seed. Again, the trojan receives payloads and commands from its CNC over HTTP. The HTTP response is then parsed by two complex routines, analyze HTTP response and analyze HTTP
29:45
response body. Again, it's a very interesting target for us. 24 hours of fuzzing gave me 22 unique crashes, all triggered by one problem. This time it was a bit more complex. The function get MIME header is used
30:04
to extract the value of Content-Length from the HTTP response. But if the string doesn't contain a Content-Length value at all, it causes an integer overflow, and as a result the function returns a negative value. So get MIME header will
30:23
return a negative value, and then a memory copy routine is called with the value returned by get MIME header. In my case it was always negative three. It will try to overwrite the whole memory layout of our sample: if we try to copy with a size of negative three, it means that we are
30:42
trying to overwrite the whole memory layout. It's probably exploitable. We can sometimes control this negative value, but I'm not 100% sure. This is an example crash case: we just need to send a Content-Length header with a blank value. Okay. Let's discuss challenges and
31:03
issues. First of all, I want to say that in my case I skipped the hardest part: reverse engineering and searching for the target function. I used the source code to reduce the time spent searching for a target. For real-world samples, of course, you need to perform
31:21
initial reverse engineering of your sample and find the target function you want to analyze, which can sometimes take a lot of time. Second, there are bugs in WinAFL and NetAFL, especially when you deal with highly obfuscated samples. Third, you need to
31:41
find a test case: you need a seed file. Sometimes this is really difficult, especially if the CNC is down or you don't know what kind of file the malware expects. Then there is the problem of encryption. Sometimes the encryption algorithm is very complex, and I know it can be really painful to
32:03
reverse engineer this algorithm. In this case, you can try to patch the program to disable encryption entirely; it sometimes worked for me. And finally, stability. Stability plays a critical role in coverage-guided fuzzing. If your
32:21
coverage is different every time you send the same test case to your target, your target is unstable and your fuzzer doesn't really understand what's going on: it sends one file and gets different code paths. That's strange, and it usually means that you are wasting your
32:41
coverage-guided fuzzer, so you have to pay attention to this. I also found this tool very useful for finding the target routine in your sample. I implemented it on top of DynamoRIO. It's basically an ltrace for Windows, but
33:03
transparent to the instrumented program. It traces all API calls in the malware and writes this information to a file, so it's less detectable than a standard API call tracer. I hope, guys, it will work for you. You can try it; it's open source.
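One way to check stability outside the fuzzer is to replay a single test case several times and compare the coverage of each run. The Python sketch below is an approximation, not AFL's exact metric (which is computed over its coverage bitmap): it records line-to-line edges with `sys.settrace` and reports the share of edges seen in every run among all edges seen in any run. The functions `trace_edges`, `stability`, `deterministic`, and `flaky` are hypothetical names, and the two targets are toy stand-ins for a parser.

```python
import random
import sys
from collections import Counter


def trace_edges(func, data):
    """Run func(data) once and return the set of (previous line, line) edges."""
    edges = set()
    prev = None

    def tracer(frame, event, arg):
        nonlocal prev
        if event == "line":
            here = (frame.f_code.co_filename, frame.f_lineno)
            edges.add((prev, here))
            prev = here
        return tracer  # keep receiving line events for this frame

    sys.settrace(tracer)
    try:
        func(data)
    finally:
        sys.settrace(None)
    return edges


def stability(func, data, runs=8):
    """Fraction of edges hit in every run, out of edges hit in any run."""
    counts = Counter()
    for _ in range(runs):
        counts.update(trace_edges(func, data))
    stable = sum(1 for hits in counts.values() if hits == runs)
    return stable / len(counts) if counts else 1.0


def deterministic(data):
    # The path depends only on the input, so every replay covers the same edges.
    if data.startswith(b"HTTP/"):
        return len(data)
    return 0


def flaky(data):
    # The branch depends on a random value, not on the input: the same
    # test case takes different paths from run to run.
    if random.random() < 0.5:
        return data.upper()
    return data.lower()


if __name__ == "__main__":
    print("deterministic:", stability(deterministic, b"HTTP/1.1 200 OK"))
    print("flaky:", stability(flaky, b"payload", runs=20))
```

A value well below 1.0 for one fixed input is the symptom described above: the fuzzer sees new paths that have nothing to do with the test case.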
33:21
In the future, I guess it would be really cool if we could somehow find the target function for our fuzzer automatically. It would also be great to increase stability. I implemented some heuristics to make it more stable; they worked for me. I hope
33:42
they will work for you. It would also be great to have a code coverage visualization tool for the fuzzer. I know some exist, but it would be great to adapt one for NetAFL. And, of course, to improve stability. Okay. So I hope I convinced you that bugs in
34:01
malware can be useful, and that you really can find these bugs using fuzzing. Of course, NetAFL can and should be used to find bugs in network-based applications: I designed it to be a general-purpose fuzzer, so you can use it to find bugs in benign software as well. I
34:22
recently found a CVE in a network-based application for Windows, so you can try it. I'm also currently merging this project into WinAFL, just to reduce the number of different projects on this planet. I guess I'll finish this work in two or three
34:42
weeks, so in the future you will not need NetAFL: everything will be merged into the original WinAFL branch. And soon, hopefully in September, I'm going to release NetAFL for Linux. Thank you for your attention.