Break your BSD kernel
Formal Metadata
Title | Break your BSD kernel
Number of Parts | 490
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/46989 (DOI)
Transcript: English (auto-generated)
00:05
Welcome, everyone. My name is Maciej Gorowski. For the last couple of months I have been working on fuzzing the NetBSD kernel. However, this knowledge also applies to fuzzing other kernels. It shouldn't be hard to get this working on FreeBSD as well.
00:22
I was also doing some work to get this running on FreeBSD, so you can also apply this to fuzz your FreeBSD kernel. So, the outline of the presentation: first of all, what we are trying to achieve. This talk also assumes that you don't have a lot of knowledge about fuzzing and breaking the kernel.
00:45
So this will give you a pretty good overview of what you can do, what techniques are available, and some background knowledge about how these things work under the hood. We'll talk about coverage as a kind of feedback that the fuzzer uses.
01:05
I will also mention sanitizers a little bit, but I won't go too deep into them because this is also a very wide topic. However, yesterday there was a very good talk about sanitizers in the LLVM devroom, so if you missed it and you want to learn more, you can watch the recording.
01:24
I'll also show how to do the basic setup for the fuzzer, and as a demo, we'll try to run fuzzing on FFS, on the NetBSD kernel in a virtual machine. So hopefully we may find something interesting.
01:46
Sure. Okay, so why are we trying to break things? Why I'm doing this is essentially the need to have multiple different ways
02:04
to improve the quality of software, and as we all know, the kernel is a very critical part of the operating system. If you break the kernel, you have a serious issue, and there is no known
02:20
silver bullet to solve all the issues and ensure software quality. So usually what people do is take multiple parallel actions to improve code quality and software quality in general, because we know we will always have some bugs or some issues, but on the other hand we would like to have software of reasonable quality,
02:43
something that won't break very easily. So we can make a list of things that we can do to improve our software, starting from stuff like getting code reviews, applying best practices, and so on and so forth. This list is far from complete, and it could probably take a couple more slides
03:03
just to list the different techniques that we know, but I think one of the most important and most interesting (or, some other people may say, boring) things is testing the software. For me, testing is the time when software
03:23
meets the truth. So if you just do the reviews, if you are just analyzing your code without testing, without running the software, of course, it's not good. On the other hand, testing is also a very wide area. There are a lot of resources on how people do testing,
03:43
but I think in our context it's fair to say that fuzzing is also a kind of testing for our software. So first of all, what is fuzzing? Usually we call it fuzzing when we test software which expects some input, and
04:05
as a testing technique we try to give our program some strange, unexpected input and observe how it behaves for this particular input. So the simplest
04:23
fuzzer that can be written is, for example: let's say we have some binary that we'd like to fuzz, which is called fuzz binary, and this binary takes as input 1000 kilobytes of raw bytes. So what we can do is just write a very dummy fuzzer in a couple of
04:45
bash lines, or even in one very long line if you like: just get some random input, save it somewhere or pipe it to the binary, and then run the binary. And the funny thing is, if the program was not written with validation in mind, in that way
05:04
you can already break some programs, which is very funny. But the good thing would be to think about how we can improve our dummy fuzzer. So what can we do to make the fuzzing smarter, and
05:22
one thing that usually comes to mind is how we can generate the input in a more intelligent, smarter way. So one of the techniques is mutation-based fuzzing. Mutation-based fuzzing essentially assumes that we can come up with some strategies that will be
05:44
smarter, or have some logical sequence, instead of just getting random bits, and based on those strategies we can mutate the input and provide this input to the application. By definition, mutation-based fuzzing doesn't really care about the state of the application, because
06:05
we just care about the strategy. So we have some strategy to flip certain bits, or the strategy can also be aware of the grammar. So for example, if you are fuzzing HTTP requests, you can take some payload which follows the HTML standard and then play with different tags.
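The difference between the dummy random fuzzer and a mutation strategy can be sketched in a few lines of Python. This is only an illustration, assuming a single bit-flip strategy; the function names and the seed request are hypothetical and are not taken from any real fuzzer.

```python
import random


def random_input(size, rng):
    """Dummy strategy: every byte is random, so no structure survives."""
    return bytes(rng.randrange(256) for _ in range(size))


def flip_bit(data, bit_index):
    """Mutation strategy: copy the input and flip exactly one bit."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def mutate(seed, rng):
    """Flip one randomly chosen bit of a non-empty seed."""
    return flip_bit(seed, rng.randrange(len(seed) * 8))


if __name__ == "__main__":
    rng = random.Random(0)
    seed = b"GET /index.html HTTP/1.1\r\n\r\n"   # hypothetical valid seed
    print(mutate(seed, rng))               # differs from the seed in one bit
    print(random_input(len(seed), rng))    # unrelated random bytes
```

The mutated input keeps almost all of the structure of the seed, which is why it has a better chance of getting past early format checks than purely random bytes.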
06:27
You don't have to just modify raw bytes. So in that way, we are able to find interesting inputs much more easily than in a purely random way. And also, if you use random
06:42
input, and your program expects certain tags or a certain header in the input format, it usually just stops on the first couple of checks that make sure that your input has, for example, a certain header or a certain pattern at the very beginning.
07:03
Another way to improve the fuzzer is introducing a feedback loop. In the feedback loop, we essentially get some feedback from the running application about the application state. And this state can be measured in different ways. The most popular one is code coverage.
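A coverage feedback loop can be sketched as follows. This is a toy model in Python, not the talk's setup: the target, the `MAGIC` pattern and the branch numbering are all hypothetical, and the only point is that an input is kept when it reaches coverage that was not seen before.

```python
import random

MAGIC = b"FUZZ"   # hypothetical pattern the toy target checks for


def target(data):
    """Toy program under test: returns the set of branch ids it executed.

    Each matching leading byte of MAGIC unlocks one more branch.
    """
    covered = {0}
    for i, expected in enumerate(MAGIC):
        if i >= len(data) or data[i] != expected:
            break
        covered.add(i + 1)
    return covered


def mutate(data, rng):
    """Replace one random byte of a non-empty input with a random value."""
    out = bytearray(data)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)


def fuzz(seed, iterations, rng):
    """Feedback loop: keep an input only if it reaches new coverage."""
    corpus = [seed]
    seen = set(target(seed))
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        coverage = target(candidate)
        if not coverage <= seen:      # something new was executed
            seen |= coverage
            corpus.append(candidate)
    return corpus, seen
```

Because partially matching inputs are kept in the corpus, the loop can build up the pattern byte by byte instead of having to guess all four bytes at once.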
07:23
But we can also think about other things like timing, CPU resources, and stuff like how the application behaves with the environment. And a funny thing: when I was doing those slides, I just put timing because, first of all, timing is known from
07:45
cryptographic software. So for example, if you have some cryptographic software which has different timing for different requests based on the execution path, you obviously have a security issue. But I was thinking, yes, that's a good example. But on the other hand, I haven't seen that many
08:02
timing-based fuzzers. And yesterday, when I was waiting for my FOSDEM t-shirt, I met my colleague. He's working for the Tor Project, and we started talking, and I told him, oh, yeah, I'm doing a talk in the BSD devroom about fuzzing. And he was like, oh really? I have a friend who was actually doing fuzzing based on mutation, and
08:21
he used the execution time of the program in the firmware as feedback. And based on this execution time, he was doing the modifications. And in this way, he was able to break some popular, maintained electronic device. So I was like, oh really? So that means this really works.
08:41
So yeah, I will also try to dive into this timing way of measuring the feedback from the application. So it was pretty funny. Also, from the testing side, we can think about our fuzzing in terms of the application that we fuzz.
09:03
For example, in a similar way to how we do testing, we have white-box testing and black-box testing. If we don't have, for example, all the information, but we have some information, we can also say we have gray-box testing. And so this feedback loop also
09:22
depends on what kind of fuzzing you are doing. In our example, we will be doing white-box testing, because obviously we have a kernel that we can compile, we can instrument the code, and then we can monitor the state of the application based on that information. So, coverage tracking. As I said, that's one of the main techniques for getting feedback
09:44
from the application, and that is how many fuzzers work. The main coverage trace is the PC trace, the program counter trace. It tells us about the execution path, and a good way to start
10:03
understanding this format is essentially how it is stored in our program. So whenever we run the program, or whenever we have a kernel, because the kernel is also a kind of program, we need some array, some memory location where we will be putting the sequence of program counters. So in our example, we have
10:22
an array with 100 entries, but it can be much bigger. And when the program is executing, this can be done, for example, per thread or per process, or if you are running the kernel, it can also be something per kernel thread, but you can also think about some global
10:43
array. For example, if you don't care about the threads, but you care about the execution path, because what can happen if you are fuzzing the networking is that you put the request on the networking queue, and then another thread will get this request from the queue. So if you are tracing per thread, then you will just see the path where you put the packet
11:03
on the queue, instead of seeing the other thread, which can be a different one, that will be processing your request. So how does this work exactly? As I said, this is compile-time instrumentation, so it is inserted by the compiler. So on the right, we have a very simple program,
11:24
which has main, which calls foo, which calls bar. So what will happen during the compilation phase is that our compiler will put some magic instrumentation at the very beginning of all functions, and it will execute when we are executing the program and won't interfere with our program.
11:42
But what is this magic instrumentation? I took a code listing from the NetBSD kernel, and it's a little bit modified just to give you a brief idea of what is going on. So every time we hit the instrumentation, we call this function, instrument code, and this function needs to get the
12:06
memory that we reserved for storing the program counters. Obviously, we get the index, because we need to write another entry. We also need to do stuff like bounds checking, because depending on
12:21
your intention, you may sometimes overflow, and if you do, the question for you is what you would like to do. But the most important part is that we just get the PC of the function from which we were called, using a compiler macro. So if we run our very simple program, we will end up with an array of three entries: main, foo, bar. So it's very straightforward.
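The PC trace described above can be simulated in user space. The sketch below is a Python stand-in, not the NetBSD listing from the slides: the `PcTrace` class and the explicit `record` calls are hypothetical, function names replace real program counter values, and dropping entries when the buffer is full is just one possible answer to the overflow question.

```python
class PcTrace:
    """Fixed-size buffer that stands in for the memory reserved for PCs."""

    def __init__(self, size=100):
        self.entries = [None] * size
        self.count = 0

    def record(self, pc):
        """Stand-in for the compiler-inserted hook at function entry."""
        if self.count >= len(self.entries):   # bounds check: drop when full
            return
        self.entries[self.count] = pc
        self.count += 1

    def snapshot(self):
        """Return only the entries that were actually written."""
        return self.entries[:self.count]


def bar(trace):
    trace.record("bar")


def foo(trace):
    trace.record("foo")
    bar(trace)


def main(trace):
    trace.record("main")
    foo(trace)


if __name__ == "__main__":
    trace = PcTrace()
    main(trace)
    print(trace.snapshot())   # ['main', 'foo', 'bar']
```

In the real kernel the hook is inserted by the compiler and the recorded value is an address, but the buffer, the index and the bounds check play the same roles.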
12:46
And we don't only have the PC trace, we also have a couple of other traces. We have CMP, DIV, and GEP. What are they? The CMP trace is used to
13:01
instrument the comparison instructions. DIV is for every time you do a division. GEP is for manipulating array indexes. So the compiler has some understanding of those, and when it compiles the code, it will put the instrumentation
13:21
before every such instruction is performed. So then you can have a better understanding. Why do you have different types? You can imagine that, depending on your program, you may be doing some mathematical operations, like you can
13:41
compare the arguments, or you have some graph or some tree that you are traversing. So the PC counter alone doesn't give you full information, because you may always see the same path. However, the path is not really the same, because you are getting different arguments. Or if you are manipulating a lot of array indexes, you may also see the same function called over and over again, but each time with different arguments.
14:07
So here I present the CMP trace. It's a little bit different from the PC trace, but it is also something that you have in all BSD kernels, as far as I know. And the other ones are similar in terms of format.
14:23
So instead of just one piece of information, you have the arguments of the operation. So we have the numbers that we are comparing. The type tells you about the size of the arguments: whether it's 8 bits, 16, 32, 64, and so on.
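A CMP trace entry holds the type, the two compared arguments and the PC. The Python sketch below packs and unpacks such four-value records. The concrete layout is an assumption modeled on the Linux kcov comparison format (64-bit words, a leading record count, bit 0 of the type marking a constant operand, the next two bits encoding the operand size); the NetBSD layout should be checked against its own documentation.

```python
import struct

WORD = struct.Struct("<Q")        # one 64-bit little-endian word
RECORD = struct.Struct("<QQQQ")   # type, arg1, arg2, pc


def encode(records):
    """Build a buffer: a record count followed by four-word records."""
    buf = WORD.pack(len(records))
    for cmp_type, arg1, arg2, pc in records:
        buf += RECORD.pack(cmp_type, arg1, arg2, pc)
    return buf


def decode(buf):
    """Read the records back and expand the type word."""
    (count,) = WORD.unpack_from(buf, 0)
    decoded = []
    for i in range(count):
        offset = WORD.size + i * RECORD.size
        cmp_type, arg1, arg2, pc = RECORD.unpack_from(buf, offset)
        decoded.append({
            "bits": 8 << ((cmp_type >> 1) & 3),   # assumed size encoding
            "const": bool(cmp_type & 1),          # assumed constant-operand flag
            "arg1": arg1,
            "arg2": arg2,
            "pc": pc,
        })
    return decoded
```

With the compared values available, a fuzzer can see how far an input was from satisfying a comparison, which the PC trace alone cannot show.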
14:43
And you also have the PC. So you have a sequence of those sets of four values inside your array. And as I said, everything depends on your kernel, on the code that you're fuzzing. So it depends on which part of the kernel, for example, you are trying to fuzz. It's always good to think,
15:03
okay, is the PC trace the only one that I would like to see in my fuzzer? Other important tools are sanitizers. As I said, I won't go into how they actually work, but I'm trying to convince you that they are very useful. And the reason why they are very useful is that when you hit an issue in your code,
15:25
the code may not exit after the invalid operation is performed. For example, you can have memory corruption, but this memory corruption won't show up easily. So in that case, you can run the fuzzer and not get any crash even if you corrupted some data.
15:43
So let's look at the sanitizers. Here we have three kernel sanitizers: address sanitizer, leak sanitizer and memory sanitizer. They are available in NetBSD. You can also have the undefined behavior sanitizer. There is the thread sanitizer. I don't know too much about the thread sanitizer at the moment, but
16:05
I am also looking into it. So a very simple one to start with is KASAN, because it detects things like out-of-bounds accesses on the heap and stack, and other
16:20
common mistakes in software. The downside is that you cannot have all of them at once, because some of them are mutually exclusive. So for example, by definition you cannot have ASan and MSan, the address sanitizer and the memory sanitizer, together, because they rely on the same data.
16:41
So they overlap each other. So in LLVM, it's specified that you cannot use both. And on the other hand, they also cause some slowdown, because they are compile-time instrumentation and they introduce additional instructions. But as I said,
17:00
they allow you to find bugs more easily and faster. So I think it's a big advantage to actually run your fuzzer with the address sanitizer or any other sanitizer. It depends on what kind of bugs you are expecting. Let's go to the fuzzer.
17:22
The one that I have been using for a couple of months is American Fuzzy Lop. I think everyone knows American Fuzzy Lop; it is very popular. It has a very good record of found bugs. Some people also claim it's pretty old at this point. But I think as a starting point it's very good because, first of all, it's very easy to use. It's rock solid.
17:43
But on the other hand, it wasn't designed to fuzz kernels, just user-space programs. You cannot use it directly to fuzz your kernel. You need some modifications.
18:00
So what are those modifications? We can think, first of all, about what you have on the kernel side, what you have on the AFL side, and what format AFL requires. So Unix systems such as Linux, FreeBSD, NetBSD, OpenBSD, and others expose the coverage using
18:21
a coverage device, /dev/kcov. Using this device you can get, for example, the PC trace or the CMP trace. The way you access this data is that you configure the device using ioctls and then you memory-map it. Then you run your program, and after you finish, your mapped memory should contain
18:45
those arrays that I discussed previously. On the other hand, AFL uses its own specific format, where AFL focuses on pairs of program counters. It stores
19:00
unique PC pairs, which means that every time we hit another instrumentation point, which means we have another PC, we remember the old PC, or zero if we just started, and then we take the old PC and the current PC, XOR them together, which is very easy,
19:23
make sure that we don't overflow, and then increment by one in our map, which gives the fuzzer a hint: this pair of two program counters just happened. So then the fuzzer can analyze this array and say, okay, so I got another pair that I didn't see before, or maybe
19:45
I always get the same pairs of program counters, which means I need to do something different. So it's very easy to convert the PC trace to the AFL trace because they look similar. But on the other hand, we don't even have to do that.
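Converting a PC trace into an AFL-style map of PC pairs can be sketched like this. The Python below follows the scheme described in AFL's documentation (index = current XOR previous, then the previous value is shifted right by one so that the direction of a transition matters); folding the PC with a modulo and saturating the counter at 255 are simplifications chosen for this sketch, and the function name is hypothetical.

```python
MAP_SIZE = 1 << 16   # 64 KiB, the default AFL map size


def pc_trace_to_afl_map(pcs, map_size=MAP_SIZE):
    """Fold a sequence of PCs into a byte map indexed by (prev, cur) pairs.

    map_size must be a power of two so that the XOR stays inside the map.
    """
    bitmap = bytearray(map_size)
    prev = 0                              # zero when we have just started
    for pc in pcs:
        cur = pc % map_size               # keep the index inside the map
        idx = cur ^ prev                  # one slot per (prev, cur) pair
        if bitmap[idx] < 255:             # saturate instead of wrapping
            bitmap[idx] += 1
        prev = cur >> 1                   # shift so A->B differs from B->A
    return bitmap
```

For example, the traces `[0x10, 0x20]` and `[0x20, 0x10]` visit the same two locations but set different slots in the map, so the fuzzer can tell the two execution orders apart.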
20:01
Because it's just a different way of storing the data, we can do this on the fly, so we can store two values and this map, and this is everything that AFL needs to work. In order to do that, we did some work and modified the NetBSD kcov a little bit,
20:24
and essentially we allow you to plug in another kernel module, which will hook into the coverage functions that are already there. By doing that, you don't have to manage the default resources, but you can still use them if you like. And on the other hand, you can do things like,
20:44
as I showed, converting the coverage from one format to another. You can think about any other type of fuzzer that you could use, and you don't have to copy all the kcov code and then manage file descriptors, make sure that threads are opened and
21:03
closed, all of those things that you would need to copy from kcov. You can just focus on the coverage functions and the map, that's what you really need, and then you can leverage what is already written. Okay, so how does this look? First of all, when I think about fuzzing,
21:26
I first of all think about what the input is and what the output is. So here we have my setup for fuzzing the FFS mount. So my input is my file system image, and my output will be the result of
21:43
the mount, or a kernel crash. This sequence also requires a couple of other things, like a wrapper and, as I said earlier, a plugin for kcov, and it works like this: we have a fuzzer,
22:04
the fuzzer creates the file system image, and we need a wrapper that will prepare the file system image to be mounted, but will also execute the code that we need, so it will execute the mount. Also, we do not trace the wrapper; it is just a thing that
22:22
helps us trigger the execution path in the kernel. And then we do the mount; mount calls the syscall, VFS, the file system layer, and on all of those layers we get the coverage data. This coverage data is then transformed by our AFL plugin, and it is exposed to the
22:45
fuzzer as the map of pairs that I showed before. Then on every run the fuzzer will generate the input, run the input, get the output from the shared memory, and see, okay, based on this feedback, what strategy should we change, or did we find any interesting things?
23:06
And after the feedback phase it just performs the operation over and over again until we hit something interesting. I would also like to talk a little bit more about the wrapper itself, because it's also interesting. So the way I usually write those wrappers
23:27
is that I write them in shell first, focusing on what kind of operations I'm doing, and then after I'm done with the shell script I need to translate it to some other, better performing language. I usually just use C,
23:41
but you can use C++ or anything else that can access raw interfaces like system calls, because you don't want to run on top of libraries, someone else's libraries. You just want to have the simplest, or maybe not even the simplest, but the shortest code that you can write, and
24:03
it is also well documented for every fuzzer that performance is key. Even if your fuzzer is very smart, makes very good guesses about different inputs and analyzes the output very well, it can still perform much worse than a much simpler fuzzer which is just faster,
24:22
because in order to find some bugs you may need to run 10,000 iterations, a million iterations, maybe even a billion iterations if you have some very well-tested software. So in that regard you need something that performs the operation very quickly.
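The effect of execution speed on those iteration counts is simple arithmetic. The Python sketch below uses hypothetical rates of 10 and 1000 executions per second, which are illustrative numbers and not measurements from the talk.

```python
def hours_needed(iterations, execs_per_second):
    """Wall-clock hours to run a number of fuzzing iterations at a fixed rate."""
    return iterations / execs_per_second / 3600


if __name__ == "__main__":
    for rate in (10, 1000):   # hypothetical executions per second
        for iterations in (10_000, 1_000_000, 1_000_000_000):
            print(rate, iterations, round(hours_needed(iterations, rate), 2))
```

At 10 executions per second a million iterations take about 28 hours, while at 1000 per second they take about 17 minutes, and a billion iterations at 1000 per second still need more than 11 days.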
24:42
So remember, performance is key: use raw interfaces. Another thing that is useful when we are fuzzing is being able to see what is going on inside the fuzzer. For example, the problem that I had was that
25:03
when I ran my first fuzzer, it was not very slow, but on the other hand it was very ineffective. So I was thinking, okay, what's going on inside this fuzzer, how can I debug that? So what I did was run kcov on my fuzzing wrapper and monitor what kind
25:24
of execution path is taken by the fuzzer, and I realized, for example, that in NetBSD we have a much more verbose kernel and we get a lot of operations from virtual memory. Even if you are doing a mount you see mostly virtual memory operations, which means you are actually not fuzzing the mount, you are
25:45
fuzzing stuff related to the management of pages. So this is also a bit tricky, and in this way you can take the input and understand what is actually going on under the hood. Another thing that you can also do before you start, or if you would
26:07
like to tune something, is a coverage benchmark, a fuzzer benchmark. For example, if you want to understand whether your wrapper is fast or slow, you can put some
26:20
very dummy code inside the kernel. What I did was create a simple character device which just gets some input and compares it with a pattern, and if the pattern matches, you win the lottery, so your kernel crashes; if it doesn't, you just run over and over again.
26:44
But then, because this code is very simple, I can see how well the operations in my wrapper perform, and on the other hand you can also compare this kind of check with user space, which also gives you some intuition about how well you are doing.
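The user-space side of such a benchmark can be sketched in Python. Everything here is a stand-in: `device_write` imitates the pattern-matching character device, `SimulatedPanic` replaces the kernel crash, and the pattern itself is hypothetical. The measured rate only gives a user-space baseline to compare a kernel wrapper against.

```python
import time

PATTERN = b"BSD!"   # hypothetical pattern that triggers the simulated crash


class SimulatedPanic(Exception):
    """Stands in for the kernel crash when the pattern matches."""


def device_write(data):
    """Imitates the dummy device: 'crash' if the input starts with PATTERN."""
    if data[:len(PATTERN)] == PATTERN:
        raise SimulatedPanic("pattern matched")
    return len(data)


def measure_exec_rate(inputs):
    """Run every input once; return (executions per second, crash count)."""
    start = time.perf_counter()
    crashes = 0
    for data in inputs:
        try:
            device_write(data)
        except SimulatedPanic:
            crashes += 1
    elapsed = time.perf_counter() - start
    rate = len(inputs) / elapsed if elapsed > 0 else float("inf")
    return rate, crashes
```

Because the target does almost no work, the measured rate is dominated by the harness overhead, which is the quantity this kind of benchmark is meant to expose.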
27:08
So to do the local setup of the fuzzer we also need to have some initial corpus. You can create this initial corpus from, for example, just raw zeros.
27:24
It's very well documented that AFL is able to reproduce different formats, so for example if you are fuzzing JPEG or some other images, AFL is able to reproduce those images. But on the other hand, if you are running a little bit slower because you are in the
27:43
kernel space, and you don't want to spend the initial CPU cycles just to figure out what the header is or what the magic number for the file system inside the image is, you can just create one and provide it as raw initial data for the fuzzer.
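The two kinds of seed can be told apart with a small helper. The Python sketch below writes an all-zero seed and reports the offset of the first non-zero byte in a file; the helper names and file names are hypothetical. A real file system seed would be created with the system's own tools (for example newfs), which this sketch does not attempt to imitate.

```python
import os
import tempfile


def write_zero_seed(path, size):
    """Create a seed file that contains only zero bytes."""
    with open(path, "wb") as f:
        f.write(bytes(size))


def first_nonzero_offset(path, chunk_size=65536):
    """Return the offset of the first non-zero byte, or None if all zero."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None
            stripped = chunk.lstrip(b"\x00")
            if stripped:
                return offset + len(chunk) - len(stripped)
            offset += len(chunk)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as corpus_dir:
        seed = os.path.join(corpus_dir, "zero.img")   # hypothetical file name
        write_zero_seed(seed, 64 * 1024)
        print(first_nonzero_offset(seed))             # None: nothing but zeros
```

Running `first_nonzero_offset` on a prepared image is a quick way to confirm that the seed really contains file system structures rather than only zeros.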
28:08
So this also helps a lot with speeding up the fuzzing, and once we have that we can run our fuzzer with the option -k, which was added to AFL by guys from Oracle when they
28:23
were also fuzzing file systems. So essentially I reused their work and integrated it with NetBSD and FreeBSD to make it work for both. Then you also need to specify the wrapper. The important thing is that in this case you are
28:45
fuzzing the same binary that you're running on, so do not shoot yourself in the foot, because if your kernel crashes then your data may disappear; you may not even see the latest output in AFL. So you need to make sure that you know what you are doing.
29:05
So if, for example, you are fuzzing something that is unstable and you will be finding a lot of bugs, maybe it's good to keep the input and output somewhere outside of your kernel, so you can mount
29:21
it on NFS, for example. It will be slower, but on the other hand you will have fewer issues, and you can then run this process remotely, because if you don't, then every time you crash you can lose the data that you are looking for. But on the other hand, if you have well-tested software and you will be happy just to find a couple of bugs, or maybe even one bug,
29:44
you can just run this natively on your kernel. What I've done, for example, was connect the debugger to the kernel and run AFL; if anything crashes I can just get the output from the debugger and see what's going on. I can also get crashes sometimes where, I realized, depending on
30:04
the path that crashes, I may still have some problems with getting the output. The ideal scenario is that you get the output, which is the image that you were fuzzing, and then you can take this image and check: okay, does this image
30:23
always break my kernel, or was this something left over from some earlier testing? So usually it's very nice if you have output that is a payload that you can just mount and it will break your kernel. Okay, so how many iterations does it take to find a bug when fuzzing FFS? So what
30:46
we'll be doing: we'll just run the fuzzer on the same virtual machine. And that's actually another issue that I found at the very beginning, but hopefully we can find something interesting.
31:02
Okay, okay. So I need to get... I don't know how big it can be, because then the fuzzer also needs to... okay, so we have here
31:20
the fuzzer, it's the modified version. Inside the in directory we have a file, which is our prepared one. It is not zero, so the name is a little bit misleading, but we can see that this is zero. Okay, I was fuzzing this anyway, and actually this is zero, so I was fuzzing this before, and I think
31:51
yes, thanks. So it's not zero; you can see this is a legit file system image.
32:04
So we have the offset, which is, I think, 8k, and we have some magic from the headers of FFS. I won't go too much into deeper details, but we see the structure is there; it's not
32:21
just a zero file. And so, the way we run it, we can have this... okay, so I won't run this yet, because I usually also need the absolute path of the wrapper.
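The check made a moment ago, that the seed carries FFS header magic near the 8k offset instead of being a file of zeros, could be automated along these lines. The byte offsets and magic values below are assumptions based on the common little-endian FFS/UFS layouts (the same locations the file(1) magic database probes); they should be verified against fs_magic in the target system's own headers before being relied on.

```python
import struct

# (superblock offset + offset of the magic field, expected magic value)
# Assumed values for little-endian images; verify against your headers.
FFS_MAGIC_LOCATIONS = {
    "FFSv1": (8192 + 1372, 0x00011954),
    "FFSv2": (65536 + 1372, 0x19540119),
}


def detect_ffs(image):
    """Return "FFSv1" or "FFSv2" if a known magic is found, else None."""
    for name, (offset, magic) in FFS_MAGIC_LOCATIONS.items():
        if len(image) >= offset + 4:
            (value,) = struct.unpack_from("<I", image, offset)
            if value == magic:
                return name
    return None


def is_all_zero(image):
    """True if the buffer contains nothing but zero bytes."""
    return not any(image)
```

A seed for which detect_ffs returns None and is_all_zero returns True is the raw-zeros case; anything detect_ffs recognises at least has a plausible superblock.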
32:48
So we run afl-fuzz -k, -i for the input, -o for the output. I don't remember exactly, but I think this will definitely work once we get the wrapper itself. Okay, on the other side,
33:08
this is running under gdb, so we have a debugger connected, because this is just a virtual machine, and this is the debugger that is debugging the kernel. So we can stop it at any time, we can see the trace,
33:22
we can retrace, and we can also see that, because we stopped, the virtual machine just froze. You can just continue, and then our virtual machine is alive again. Okay, let's run this thing. Oh, we already got something. So this ran the dry
33:49
so it was running the dry run at the very beginning. A dry run just does some simple modifications, and what you can see here
34:06
is... yeah, usually it should also show you the AFL console, but because we hit something, probably because I was testing this before, we have some old history, so it was also running something that was found before. But we can go here, and this is already running and getting the core
34:27
dump from the kernel. Okay, but we have the operation that crashed: we have vfs reclaim, which
34:42
was called on the vnode that was reclaimed from the vcache. So that's kind of the moment when you start debugging the kernel, and I won't be doing this right now, because it requires some time, and also, as a presenter, I'm not in a great position right now to
35:03
start debugging that. But this is something that broke our kernel, so this might be related to the data from the vnodes. We can get this as a core, or we can start debugging this in gdb.
35:29
Okay, I think we can just go on. Okay, so, conclusion. Fuzzing the kernel is another way; as I said earlier,
35:43
we need a lot of different ways to test our kernel, and this is something we can just do to make sure that the quality of our drivers and our file systems is better. Also, if you'd like to start searching for some bugs, this is a very easy way to start. And we have
36:04
recently had a lot of good work on improving the things that allow us to run the fuzzing: for example, gathering coverage in NetBSD, and in the other BSDs in general, has improved. Also the sanitizers, in particular in NetBSD, have become much better over time. It's very easy to start, as you see;
36:25
it doesn't require much knowledge, so I think in this half hour I gave you almost everything that you need to start your adventure with fuzzing. And you should also run the sanitizers; by the way, this was not run with a sanitizer, because we didn't have any sanitizer output.
36:44
So, future work. I would also like to have some discussion with other people. I saw Andrew's presentation from FreeBSD; he's also doing something with AFL, so maybe we can come up with some similar way that we fuzz the BSDs. I saw that previously on Linux they introduced another
37:05
device, it was /dev/afl, which the community didn't like, and it was not upstreamed in the end. So I think if we try to get this onto a common interface, that will be beneficial for everyone, because then people from security projects can just, you know,
37:23
fuzz our kernel in the same way. I also need to spend more time on improving the remote fuzzing, because it didn't scale very well if you want to find many bugs and just continue with fuzzing and fixing the bugs. The resources: so, our NetBSD blog; there's also a paper,
37:46
the Oracle paper, yes, from 2016; my collection of mount wrappers; and the Clang documentation, which was a very useful source of information. And I will be happy to take any questions, if any. Well, thank you very much. Thank
38:13
you one group just we can list all the questions and the answers for remote
38:39
Please repeat this one. Okay, so the question was: can you use AFL for remote fuzzing, not
38:45
for the fuzzing that is presented. The answer is yes, but you need to do additional setup, because, as I said, AFL uses the input and output directories, so to have them remote you need to mount them. You need to run this as a process, and then you need
39:05
also some way to send back to the fork server the information that a crash happened, because your process just stopped. And the thing that I'm currently looking at is how to efficiently get the kernel dump, because then you have the dumping process,
39:23
so this location for the dump also needs to be mounted on something like NFS again, or some other protocol. I mean, NFS is the simplest one, but you can potentially come up with some other good ideas. And I don't know how hard it will be to get the AFL console to analyze those
39:43
crashes, because when you're fuzzing user space, like a user-space library or a user-space binary, you see those unique crashes, for example. So AFL is going through the core dump of the process and then saying: okay, this one occurred before, so it's not unique, it's
40:01
like something similar; or this one is unique because it's something different. And then AFL also takes strategies from those crashes to improve. So, as I said, that part about analyzing the core dump from the kernel is an unknown for me; other things can be done easily.
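Since sorting kernel crashes into "seen before" and "new" is the open part here, one possible approach is to bucket panics by the topmost frames of their backtrace. This is purely an assumption for illustration: it is not something the talk or AFL provides, and the frame names in the usage note are invented. The sketch expects backtrace lines of the form function+0xoffset, as a debugger might print them.

```python
import hashlib


def crash_bucket(backtrace, depth=3):
    """Return a short, stable ID built from the top `depth` frames.

    Offsets such as "+0x1c" are dropped so that two crashes in the
    same functions land in the same bucket.
    """
    frames = [line.split("+")[0].strip() for line in backtrace if line.strip()]
    key = "|".join(frames[:depth])
    return hashlib.sha1(key.encode()).hexdigest()[:12]


def is_new_crash(backtrace, seen, depth=3):
    """Record the crash in `seen` and report whether its bucket is new."""
    bucket = crash_bucket(backtrace, depth)
    if bucket in seen:
        return False
    seen.add(bucket)
    return True
```

For example, two backtraces that both start with panic, vnode_op_a, reclaim_b (hypothetical names) would count as one crash even if the offsets and deeper frames differ. A small depth groups aggressively, while a larger one splits crashes that merely share a panic path.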
40:24
Go ahead. Okay, maybe someone wants to leave the room before the second question. Okay.