
Fuzzing Device Models in Rust: Common Pitfalls


Formal Metadata

Title
Fuzzing Device Models in Rust: Common Pitfalls
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
As of October 2022, we run a 15-minute fuzzing session for all pull requests submitted to the Virtio device implementation in rust-vmm. But implementing the fuzz targets was not smooth sailing. In this talk, we go over the challenges of implementing fuzzing for Virtio devices and how to overcome them. Being one of the most critical components in a virtualization stack, the Virtio device model is a great target for fuzzing. Fuzzing is a security-focused testing technique through which you can discover vulnerabilities in code that deals with untrusted input. When fuzzing Virtio devices, one of the biggest challenges is mocking the device-driver communication. This talk will show how we approached this in rust-vmm, and why fuzzing should be taken into consideration from the early development stages.
Transcript: English (auto-generated)
Hello everyone, my name is Andreea and today I'm going to talk about fuzzing in rust-vmm. This presentation is not about fuzzing itself, but rather how we failed at it. So before I start with the big pitfalls of fuzzing, I will quickly go over fuzzing itself.
I hope some of you already know about it, because I don't have a lot of time. So fuzzing is basically an automated testing technique. The idea is to just send random inputs to a program and see how it behaves.
And how it works is that you typically use a tool called a fuzzer that is going to generate random input for you. Then you call some functions with that random input, and the fuzzer is going to record some findings.
And if it finds any interesting input files, it's going to write them to the corpus. Findings in this case can be crashes, can be hangs, but can also be timeouts.
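To make that concrete, here is a minimal sketch of what such a fuzz target looks like with cargo-fuzz and libFuzzer; the parse function is a hypothetical stand-in for the library code under test.

    // A minimal cargo-fuzz target: libFuzzer hands the closure random bytes,
    // and any panic, crash, or timeout is recorded as a finding.
    #![no_main]
    use libfuzzer_sys::fuzz_target;

    fuzz_target!(|data: &[u8]| {
        // `parse` is a hypothetical stand-in for the code under test;
        // it must return errors instead of panicking on bad input.
        let _ = parse(data);
    });

    fn parse(data: &[u8]) -> Result<(), ()> {
        if data.len() < 4 {
            return Err(());
        }
        Ok(())
    }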
Now, when you first start fuzzing, you typically begin from an empty corpus. But as you run fuzzing, you're going to generate some interesting inputs, which is helpful because in the next runs you can just reuse those inputs instead of starting from scratch. This helps with finding interesting things faster. So in rust-vmm, we implemented fuzzing for vm-virtio.
We have three fuzz targets: one for the virtio queue, one for the serialization of the virtio queue, and one for vsock. For vsock, in the rust-vmm project we only have an implementation for the packet, so that's what we fuzz. During fuzzing we discovered three crashes, and only one of them is triggerable by a potentially malicious driver.
And what we have right now is that we are able to run fuzzing for all pull requests submitted to the rust-vmm vm-virtio repository. The fuzzing is implemented using libFuzzer. And besides the fuzzing that is happening in rust-vmm itself, the folks from Cloud Hypervisor are also running fuzzing.
And they also discovered a timeout in one of our fuzz targets. So this actually brings me to our first pitfall. So, what is it? It should actually be... yeah, okay, so let me fix it afterwards.
It's a pitfall, and that one is on me. The first pitfall is that you actually have to set a timeout that is appropriate for what you're fuzzing.
Because the default, for example, for the fuzzer that we are using is actually 20 minutes. And since we are just working with virtio queues, there's nothing that can possibly take 20 minutes to process. So we adjusted the timeout to 60 seconds in our case, and this was something that was recommended by the folks from Cloud Hypervisor.
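For reference — assuming a cargo-fuzz setup, and with a hypothetical target name — the libFuzzer timeout can be lowered from the command line like this:

    cargo fuzz run virtio_queue -- -timeout=60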
Now, how we run fuzzing in rust-vmm is at the library level. The advantage of this is that it's easy to set up, and it really matters that it's easy to set up.
People are like: oh, but you're running fuzzing at the library level, so you don't have the kernel, it's so easy, so simple. And it's like: yeah, it's great — easy is a good thing. It's also a good thing because you can run it on almost any host. You just have to have a fuzzer installed and the repository, and then you can just run fuzzing.
And it also runs in user space. There are also disadvantages, of course. The first one is that you cannot cover your whole setup, so there are going to be some things that are not fuzzed.
And because you are fuzzing in user space, you may need to mock some components, like the driver, and this tends to be a bit complicated. You can also find false positives. The idea with false positives is that you will find crashes that otherwise would not be triggered by a driver,
because maybe you have some other check in place. I would say that it's still important to fix these as well, because you never know how you're going to change your code and how it might end up actually triggering those issues in the future. Now, for the mocking of the driver, how it works — it's very simplified here,
but the idea is that the driver writes something in memory, and then the device reads what the driver wrote in memory and does stuff with the data. The part that we want to fuzz in rust-vmm, and the part that is implemented in rust-vmm, is the device side.
And what we need to mock is the driver side of the communication. In rust-vmm, we started this mocking of the driver from the beginning, because we needed it anyway to run some unit tests. We needed it for other kinds of testing as well. So we had an initial mock interface from the beginning.
And when we wanted to do fuzzing, we just evolved the mock driver to support that as well. Okay, so at a high level, how it happens right now in rust-vmm is that we parse the random bytes, and we initialize the mock driver with the data that was parsed from the fuzzer input.
So at a high level, it ends up with some descriptors and some queue functions that have some random input that they need to process. Then we create the queue and call these functions with the random input.
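Here is a compilable sketch of that flow. Every name in it (QueueOp, MockDriver, the byte layout) is a hypothetical stand-in for the real vm-virtio fuzzing code, just to show the shape of one fuzz iteration.

    // Decode the fuzzer's random bytes into queue operations, then drive
    // a (stubbed) queue with them.
    enum QueueOp {
        Pop,
        AddUsed(u16, u32),
    }

    struct MockDriver {
        ops: Vec<QueueOp>,
    }

    impl MockDriver {
        // Parse the raw fuzz input into a sequence of operations.
        fn from_bytes(data: &[u8]) -> Self {
            let ops = data
                .chunks(3)
                .map(|c| match c[0] % 2 {
                    0 => QueueOp::Pop,
                    _ => QueueOp::AddUsed(
                        c.get(1).copied().unwrap_or(0) as u16,
                        c.get(2).copied().unwrap_or(0) as u32,
                    ),
                })
                .collect();
            MockDriver { ops }
        }

        // Call the queue functions with the decoded (random) arguments.
        fn run(&self) {
            for op in &self.ops {
                match op {
                    QueueOp::Pop => { /* queue.pop_descriptor_chain(..) */ }
                    QueueOp::AddUsed(_idx, _len) => { /* queue.add_used(..) */ }
                }
            }
        }
    }

    fn main() {
        MockDriver::from_bytes(&[0, 1, 2, 3, 4, 5]).run();
    }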
And the second big pitfall is that if you are trying to do fuzzing and you only start when the project is already mature, it's going to be difficult. You might find it very complicated to retrofit it. Now, I know that it's not necessarily viable to start fuzzing the moment you start a project.
But what you can do instead is keep fuzzing in the back of your head. Then, when you create some mock objects or some unit tests, you can think about how you could also use them for fuzzing.
Which is what we did, but not very well. One of the crashes that we actually found was the mock driver itself crashing on invalid input. So we had to adapt it to return errors: even though it was just test code, it couldn't just crash on invalid input anymore.
So the idea is to return errors at the level where you want to be fuzzing, so that they can be processed at higher levels instead of crashing the fuzzer.
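A minimal sketch of that change, with a hypothetical validation helper standing in for the real mock-driver code:

    // Before, the mock driver would panic on invalid input, which the
    // fuzzer reports as a crash even though no real device code failed.
    // After, it returns an error that the fuzz target can simply discard.
    #[derive(Debug)]
    enum MockError {
        InvalidDescriptorLen,
    }

    // Hypothetical validation helper inside the mock driver.
    fn check_descriptor_len(len: u32) -> Result<(), MockError> {
        if len == 0 {
            return Err(MockError::InvalidDescriptorLen); // was: panic!(..)
        }
        Ok(())
    }

    fn main() {
        // The fuzz target ignores inputs that the mock driver rejects.
        if check_descriptor_len(0).is_err() {
            return;
        }
    }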
And now for structure-aware fuzzing. Without structure-aware fuzzing, the fuzzer is going to generate some random bytes, and then you have to interpret them as the structures that your library needs. With structure-aware fuzzing it's really nice, because there are tools that are basically going to
interpret the random bytes as the structures that you actually need. So it's super nice. What it does is significantly reduce the code that you need to write. In Rust, the crate for this is called arbitrary. Now, we had to change it, unfortunately.
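As an illustration of why this reduces code, here is a sketch of structure-aware fuzzing with the arbitrary crate; the input structure is a hypothetical example, not the real vm-virtio one.

    // With arbitrary, the raw fuzz bytes are decoded into a typed
    // structure for you, so the target receives ready-made input.
    #![no_main]
    use arbitrary::Arbitrary;
    use libfuzzer_sys::fuzz_target;

    #[derive(Debug, Arbitrary)]
    struct DescriptorSpec {
        addr: u64,
        len: u32,
        flags: u16,
    }

    fuzz_target!(|spec: DescriptorSpec| {
        // The fuzz target works on the structure instead of raw bytes.
        let _ = spec;
    });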
But before we did that, we had only 270 lines of code, and now we have around 740 lines of code for the fuzzer. Unfortunately, arbitrary came with some problems, and that's why we had to replace it. The most important one is that it's not reproducible.
So you can't reuse the corpus that you had in previous runs, which was a big problem for us. Because basically what happens is that arbitrary introduces some randomness,
and that basically means that you cannot reuse the corpus from previous runs. The big pitfall here is that we assumed we could do incremental improvements to the fuzzer,
and we didn't check that what we wanted to implement could actually be implemented with arbitrary. A better approach would have been to make sure we could reuse the corpus that we generated. Okay, and now about when fuzzing actually fails.
So we had a PR in vm-virtio. At this point we were already running fuzzing for pull requests, and there was a PR that was actually introducing an overflow. The overflow here is that adding the packet header size to the packet length can overflow, because the packet length is set by the driver.
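A small illustration of this class of bug; the header size below is a hypothetical constant, not the real packet layout. With a driver-controlled length, the addition has to be checked.

    // Adding a driver-controlled length to a fixed header size can wrap
    // around; checked_add surfaces the overflow instead of wrapping.
    const PKT_HDR_SIZE: u32 = 44; // hypothetical header size

    fn total_len(pkt_len: u32) -> Option<u32> {
        PKT_HDR_SIZE.checked_add(pkt_len)
    }

    fn main() {
        assert_eq!(total_len(100), Some(144));
        assert_eq!(total_len(u32::MAX), None); // would have overflowed
    }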
This bug I actually found during code review. It was a bit unexpected, because I was hoping that the fuzzer was going to find it, which was not the case. So after some time I realized that running fuzzing for just 15 minutes might not actually be enough.
This bug was indeed triggered by fuzzing, but only after running it for longer than the 15-minute session. So how we fixed that is that we added an optional fuzzing session that runs for 24 hours.
This one is to be started manually by the rust-vmm maintainers, and should only be started for pull requests that actually impact the fuzzed implementation. This is because we are consuming a lot of resources when doing fuzzing,
and also you don't want to block all pull requests for 24 hours. Typically the rust-vmm CI takes minutes to execute, so blocking it for one day would not be reasonable for all pull requests. So the pitfall here was not fuzzing for long enough,
and we had to find a way to not block pull requests, but at the same time provide more fuzzing. Now, coverage for Rust. In Rust you can actually get coverage information by running llvm-cov on top of the fuzz targets, but you only get line coverage.
So basically this was the starting point of the presentation. I was thinking I was going to come here and show you how great it is to run fuzzing for 15 minutes, and then for more minutes, and then with a corpus, and all these really extravagant things.
And so we ended up running fuzzing for 15 minutes, which left some missed regions and gave a coverage of around 82%. So I was like: well, that's okay, that's good. So then let's just run with some minimal corpus as well — this is a corpus that we generated from unit tests.
Let's just feed it to the fuzzer and see how this changes. There was actually no change. So I was like: okay, not bad, not bad. Let's just run it for two weeks. So what do you think is going to happen now?
So at this point I was like: I have to change my presentation, because it's not what I expected. But instead I learned something: you can't actually use coverage to decide when to stop fuzzing. What you can do instead is use the coverage information to see what parts of your code are not actually covered.
And yeah, well, that's about it, actually. This is a summary of the pitfalls that we ran into. And I think now we have time for questions.
Did you look at how the fuzzer works and what areas were not covered, and try to figure out why those areas weren't covered? Yeah, so the question was whether we looked at how the fuzzer works and which areas were not covered. Yes, we did, and I have a slide for that.
Thanks for the question. Okay, so actually I have two slides for that. There were some functions that we were not calling on purpose, because in virtio-queue, for example, we have some functions that are just iterating over descriptor chains and then doing something with the data.
And at the virtio-queue level you can't do anything with the data. So it was like: okay, this needs to be fuzzed at a higher level, at the device implementation level, so we're not going to call these functions. Which is a bit hilarious, because that's where Cloud Hypervisor actually found the timeout problem,
which we were not able to reproduce with virtio-queue, but still. So we actually added this one function that we thought shouldn't be called during fuzzing, and then I reran the fuzzing, and yeah, it's a bit better, but it's still not great. And then I looked into what — well, actually, you can't see it very well.
Yeah, so I looked into what is actually not covered — you can't really see it there, so you have to trust me. These are actually errors:
the printing of errors to files. Since in the fuzzer we are not actually initializing the logger, these paths cannot be triggered by fuzzing. So there's a lot of error printing to files that is not happening during fuzzing.
Yeah? What steps have you taken to make sure that the areas which clearly aren't covered by the fuzzing are covered somewhere else? I didn't understand the question. The areas that aren't covered by the fuzzing — what steps have you taken to make sure they are still covered?
Are they covered, for example, by unit tests? The question was what measures we are taking in order to make sure that the code that was covered before is still going to get covered in the next iterations. None, actually.
So right now we're not doing anything. This whole coverage thing is just something that I did for the presentation, and it's not automated in any way. This is actually a good point for future investigation, to make sure that we keep covering code. What would help as well is making sure that new functions that we are adding to the code
are also covered. So it's a great point. Yeah? You were talking about structure-aware fuzzing, and you mentioned that you cannot reuse the corpus. Can you explain a bit more about that?
Yeah. Okay, so the question was about structure-aware fuzzing and the fact that you cannot reuse the corpus. Let me see if I actually still have it here. Okay, so the idea is that what we were using, which is arbitrary,
when it was taking the input from the fuzzer, was also adding some randomness to it. So basically, every time the fuzzer wrote a corpus file, some randomness was introduced, and when the same input gets read again, it's not interpreted the same way.
So where does the randomness come from? Where does the randomness come from — this is just how arbitrary decided to implement it. There is actually an issue in arbitrary showing that they are aware of the problem, but it doesn't seem like they are fixing it, for some reason.
So what we ended up doing is some custom serialization instead, with crates that are also very well known in the Rust ecosystem. It's not much more difficult than arbitrary, and it doesn't introduce randomness.
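A sketch of such a deterministic decoder — the choice of serde plus bincode here is an assumption for illustration, not necessarily what rust-vmm uses, and the input structure is hypothetical:

    // Deterministic (de)serialization: the same corpus file always decodes
    // to the same input, so saved corpora stay valid across fuzzing runs.
    use serde::{Deserialize, Serialize};

    #[derive(Serialize, Deserialize, Debug)]
    struct FuzzInput {
        descriptors: Vec<(u64, u32)>, // hypothetical (addr, len) pairs
    }

    fn decode(data: &[u8]) -> Option<FuzzInput> {
        bincode::deserialize(data).ok()
    }

    fn main() {
        let input = FuzzInput { descriptors: vec![(0x1000, 64)] };
        let bytes = bincode::serialize(&input).unwrap();
        assert!(decode(&bytes).is_some());
    }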
Yeah. When you discover a bug with this fuzzing, does it get transformed into a unit test afterwards? The question is whether, when we discover a bug, it gets transformed into a unit test. Yeah — the way we fix these kinds of problems is that we always add a regression test for them,
just to make sure they don't get reintroduced. There was another question: I was wondering about the computational requirements — so, how many cores are you using? How many cores we are using? Did you parallelize it? Yeah. Okay, so when we ran this for two weeks, we actually used 96 cores.
For the runs on pull requests, I don't know exactly how many cores we get — I think one, but I don't know. But for the long run we've been running on 96 cores. There was another one.
One minute. What kind of test case does it shrink the input to — does it try to break it into smaller steps instead of keeping one very long case? Oh, this is — you know what, let's discuss it afterwards. Thanks.
Thanks.