Re-Targetable Grammar-Based Test Case Generation
Formal Metadata

Title: Re-Targetable Grammar-Based Test Case Generation
Number of Parts: 322
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/39701 (DOI)
Language: English
Transcript: English (auto-generated)
00:00
First up, hopefully you're all dealing with your hangovers well enough right now. We're going to start with Joe, who is going to talk about fuzzing. Yeah? Good stuff. Let's give our first speaker a big round of applause. Thank you.
00:24
Sorry.
00:42
My name is Joe Rosner, and this is Re-Targetable Grammar-Based Test Case Generation. So I want to start off with a story about how we got to where we are today. Ultimately, the lesson of the story is that testing parsers is really hard.
01:04
About four years ago, at my current company, I started implementing a SQL runtime and parser for a handful of different dialects of SQL. The goal of this was to evaluate SQL queries and determine whether or not they were dangerous.
01:27
In doing this, we built out mostly black box implementations of the vast majority of the SQL dialects out there. All told, it was something like 35,000 lines of ANTLR definition.
01:43
This is a grammar definition, where you define what makes up a language; you pass it into a program, and it will generate a parser for you that can recognize that language. Most of this was done by looking at poorly written documentation.
02:01
It's not documentation about how the system itself works, but documentation for a developer trying to learn how to write SQL queries and what's valid. Typically, they're incomplete. They don't have the information you need. You're kind of fumbling around trying to figure out just what it is that's correct
02:25
and what makes it correct. Things will be contradictory, and you never really know what the answer is. In a lot of cases, these are closed source systems. DB2, Oracle, MS SQL, so you can't just go to the source code for them.
02:46
We used ANTLR, but most of these tools, like Bison or Yacc, will use an LR-style parser or a recursive descent parser, which is vastly different from what ANTLR uses.
03:04
ANTLR uses an ALL(*) parser, which is a top-down parser. The way the actual algorithm works to recognize a language from a grammar is going to be different. That leads to all sorts of weird cases where you might have two grammars that are identical in terms of definition,
03:24
but they're going to be recognized differently because of the algorithm that's being used. Most of the time, these lack public test cases because they're closed source software. The vendors don't care if you can verify that what they did is correct.
03:43
They just provide their software and wait for you to complain, and then they fix it, maybe. This issue that we ran into is really what gave birth to this idea and where it came from, but the security use cases for it are what vastly expanded the scope
04:01
and further drove the usefulness that I've found and why this has come to be. There's a few different problem areas in parsing when you're trying to verify whether your parser is correct. We're going to talk about the three of them here.
04:21
The first is overfit and underfit implementations. The goal of a parser is to recognize a language. The question of overfit and underfit is, am I correctly accepting and correctly rejecting sentences that are part of that language? If I have the reference implementation and it accepts something that mine doesn't,
04:43
I'm now underfit. I do not recognize the same language that the reference implementation does. If it doesn't recognize something that I do, then I'm overfit, and I still don't recognize the same language that it does.
05:00
This is a problem that is exacerbated by poor documentation and not having a firm understanding of what is correct and the internals of how the parser works, because without understanding exactly what the algorithm is that it's going through
05:23
and what it's using to determine whether or not the sentence is correct, it's really hard to replicate that same behavior and get to the point where you have something that is going to represent the same language. Without having public grammars and without knowing about the algorithm,
05:43
you're limited to reverse engineering and experimentation with the reference, trying out things, trying to discover what sort of behavior it's using, and basically just trial and error until you can get to that point where you really understand,
06:01
OK, this is how I need to define this section of the language, and go on from there. As I've already said, there are differences in parsing algorithms. If you've ever tried to reverse engineer one of these generated parsers, it's basically impossible. It's a giant state machine built out of gotos, essentially, where you're constantly mutating that state,
06:23
and it's a really bad time. You shouldn't do it. In these different parsing algorithms, they typically have different behavior for a handful of different properties, things like ambiguity. Are two different production rules going to be interpreted differently?
06:40
Do you have left recursion? Are you going to continually repeat a specific rule? And precedence: how are you expressing precedence for operators? These are all things that every single parser generator you use is going to handle differently. Taking a grammar from one to another is going to behave very differently in all of these areas.
07:05
And the worst part about all of this is that universally proving two context-free grammars are equivalent is undecidable. For the general case, this problem is not decidable, and you cannot write an algorithm that solves it.
07:21
Now, we can slim down this problem, and we can take a subset of it, and we can maybe do it for that, but in the general case, this is not something that we can do. The next problem you worry about is tree generation flaws. So once you've recognized a sentence in a language,
07:41
typically your goal is to create a parse tree, or an abstract syntax tree. And basically what this does is it shows the syntactic relationship between various components or tokens of that sentence. So it has some kind of root node, which is the top point of the tree,
08:01
and going down there, this provides context for what a specific token is and how it relates to every other token in that sentence. And the whole process of building trees and working with trees is super complicated as you move from generator to generator,
08:23
runtime to runtime. Most parsers will require manual tree construction, and this is something that has gotten better recently in newer tools like ANTLR, but typically you'll have a production rule which it will try to recognize,
08:41
and when it does, you'll then run a block of code. And that code is responsible for building up a node and then inserting it into the rest of the nodes at the right point to build that representation. And this is typically the point where you're going to see things like if you recognize a number,
09:00
you're going to parse that into a numeric representation for the language you're in. So the number 1234 gets read as decimal and converted into an integer in the language you're building this in. And we'll run into issues with that specifically shortly. Here's an example of this.
09:22
This is LALRPOP, which is a parser generator for Rust. And basically on the left-hand side of the `=>` arrow, you have the different production rules. So this is what it's trying to match, and this is what it's trying to recognize. And on the right side, you have the action that's being called.
09:40
So for each one of these going down, these are all alternates, so it's going to try to match. Oh. Sorry about that. I'm trying to juggle two laptops so I can get speaker notes, and it's not going so well. So these are all production rules.
10:01
And each one of these is an alternate, so it's going to try to match one of these. And everything on the left of the arrow is going to be a rule. Everything on the right is an action. So once it recognizes it, it calls the action, and that is what builds up the tree as it goes down or up, depending on the algorithm that's being used and whether it's top-down or bottom-up.
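To make that concrete, here is a minimal LALRPOP-style sketch of that rule/action shape (an illustrative grammar, not the one on the slide): the pattern to the left of each `=>` is matched, and the Rust action to the right builds the result.

```
use std::str::FromStr;

grammar;

// Each alternate is a production rule; the action after `=>` runs on a match.
pub Expr: i64 = {
    <l:Expr> "+" <r:Term> => l + r, // alternate 1: left-recursive addition
    Term,                           // alternate 2: fall through to a Term
};

Term: i64 = {
    // Match digits, then convert them to an integer inside the action,
    // just like the "1234 read as decimal" example above.
    <s:r"[0-9]+"> => i64::from_str(s).unwrap(),
};
```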
10:23
The third problem is unsafe and incorrect use of validated input. So once you've recognized a sentence and you think you understand what it's doing, what are you doing with it? Typically, recognition isn't the only step. It's typically the first step,
10:41
and many more steps you will take to use that information. And so most parsers will say, yes, this is a sentence, but don't necessarily say, yes, this is safe, or the information here is okay to use. So the only goal of a parser is to ensure
11:00
that a sentence is a member of that language. And so once you've recognized it, you have numbers, you have strings. What do you do with those strings later? And essentially what this comes down to is if you can get past the parser with something that is syntactically valid but is still something dangerous,
11:21
maybe down the line you copy this to a buffer somewhere. It's just back to any other kind of typical vulnerability exploitation at that point. Smashing the stack, logic flaws. And it's that simple. Your goal is to get past that point. It's the gatekeeper.
11:43
One reason why this is especially bad is that defining grammars is really hard when you have complex languages. So you'll have places where you might have opaque tokens, where you say: I need to read in a character set, for instance, like in a regex, but I'm just going to read that in as a string, anything between two square brackets.
12:02
And then later on I'm going to go ahead and figure out what that actually does. And this is super common. You'll see it all over real parsers. So you might recognize this as a sentence in the language, but this small subsection of that, you're not going to understand what it's actually doing or whether or not it has some bad value in it.
12:24
So these opaque tokens are kind of rough. And this ultimately leads to parsers not really having a lot of semantic information about the language they're recognizing. And there's just not really a good solution to this in many cases because of the tooling and because of the complexity of the languages.
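A hedged Rust sketch of that opaque-token pattern (a hypothetical lexer, for illustration): everything between the square brackets is captured as a raw string, and nothing checks what is inside.

```rust
// Grab everything between '[' and ']' as a raw string and defer any
// interpretation of its contents until later (or never).
fn lex_char_class(input: &str) -> Option<(&str, &str)> {
    let rest = input.strip_prefix('[')?;
    let end = rest.find(']')?;             // no validation of what's inside
    Some((&rest[..end], &rest[end + 1..])) // (opaque token, remaining input)
}

fn main() {
    // The parser "accepts" the character class without understanding it.
    assert_eq!(lex_char_class("[a-z0-9]rest"), Some(("a-z0-9", "rest")));
}
```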
12:40
And if you subscribe to the LangSec concept, we should be working to reduce the complexity to make this an easier problem so that some of these issues can go away. So how do we test this? It's great that we've identified the problem areas, but how do we do better? How do we ensure that our parser that we're building
13:01
is going to be a faithful representation of another reference implementation? Typically it comes down to just getting more test cases and getting to the point where you feel comfortable that your parser is recognizing the same set of sentences that another is.
13:21
And to do this, you need a shitload of sentences, like a massive amount. Having 100, 1,000, like 10,000 just really isn't enough, especially for large languages. So writing them by hand just really isn't a viable option. There's no way you're going to have enough variety
13:42
and enough targeted sentences that can cover that. It's just not realistic, unless you have people just cranking these out all day for hours. So something that we ended up doing was crawling the web and looking at Stack Overflow and just pulling out really bad examples,
14:02
going to open source projects and grabbing them out of there. And this is great because you can get a lot up until a certain point, but you have no guarantee that you're exercising different areas of it. You don't know really what your breadth is, how deep you're going, and there's not really good code coverage tools for this.
14:23
So you're kind of just assuming that you have all of this coverage. But at the end of the day, you don't really know. And even within a specific tree, like the individual values for tokens, just variation in that one space may have significant impact depending on what the application is doing with that value afterward.
14:44
So the need to generate test cases at a much higher rate is ultimately what I arrived at. And so the way to do this typically is using a fuzzer. And there's a few different styles of fuzzing that are pretty common today.
15:01
And all of these sort of had issues with the style of testing that we were trying to accomplish. So I'm going to walk through why that is and where these types of fuzzing are good for and how we can get to a point where we have something that is more targeted for this
15:21
and maybe be used for these more traditional types of fuzzing to augment them and make them more effective in what they do. So one of the most common ones you'll see today is AFL. This is like the hot fuzzer right now. And basically what this does is it focuses on path exploration
15:42
and code coverage to make sure that you are targeting as much of the application as possible. This isn't aware of any semantics or syntax. It's not implemented in a dumb, bad way, but it is dumb in terms of how it tries to explore.
16:03
So it instruments the application and watches as it goes through it. It will take some kind of seed value and mutate it. And it determines whether or not that mutation allows it to explore new areas. And if it does, it saves that and continues down that path.
16:23
And if it doesn't, it throws it away as something that's not interesting. And then it repeats this process billions of times over and over until you stop it, basically, and bugs fall out. And so this can take a lot of time because it's using random mutation
16:41
and it's not very targeted and its whole goal is to just explore. So you might spend a lot of time in uninteresting code paths. And there are some ways to help guide it and cut off certain areas in some of these types of fuzzers. But for the most part, it's a very random process.
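In code, the loop being described looks roughly like this (a toy sketch, nowhere near AFL's actual implementation; `run` is a hypothetical harness that returns a hash of the coverage an input reached):

```rust
use std::collections::HashSet;

fn fuzz_loop(mut corpus: Vec<Vec<u8>>, run: impl Fn(&[u8]) -> u64) {
    let mut seen: HashSet<u64> = HashSet::new(); // coverage signatures so far
    let mut rng: u64 = 0x5eed;
    loop {
        for i in 0..corpus.len() {
            let mut input = corpus[i].clone();
            if !input.is_empty() {
                rng = rng.wrapping_mul(6364136223846793005).wrapping_add(1); // tiny LCG
                let pos = (rng as usize) % input.len();
                input[pos] ^= (rng >> 32) as u8; // random byte mutation
            }
            let coverage = run(&input);
            if seen.insert(coverage) {
                corpus.push(input); // new path: interesting, keep exploring from it
            }                       // otherwise: throw it away as uninteresting
        }
    }
}
```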
17:06
And in the case of trying to evaluate the specific types of tests that we want to, whether two parsers are implemented the same, it's not immediately clear how to build a really good test harness using something like AFL to do that.
17:22
The way that it determines whether or not something is interesting is by hooking the SIGABRT and SIGKILL signals and then waiting for the program to essentially crash or terminate. And so for the semantic information that we want,
about whether something's underfit or overfit, why did this test case succeed or fail, and how does it compare to this other thing, it's not really immediately clear how you would encode that information into something like a test harness for this. And there's a great blog post showing some of the use cases where AFL actually can be used in cases like this.
18:05
So basically the link at the bottom here walks through taking the string hello and passing mutations of that into an image viewer and basically getting from the point of the string hello to valid JPEGs.
18:20
And I have a few quotes that I want to share from this and basically just kind of showing where this is helpful and where this isn't helpful for this type of task. So the first image hit after about six hours on an eight-core system. And this is actually pretty good and pretty surprising. Going from nothing but the string hello to something that is a syntactically valid JPEG is pretty good.
18:46
I mean, this is not crazy hardware. It's something that anyone probably in this room has access to. And anyone who's a real adversary trying to find bugs, this is no problem to get access to.
19:01
Certain types of automatically executed checks with a large search space may pose an insurmountable obstacle to the fuzzer. And this is ultimately the real crux of the problem: if you have something like this in your application, where you're comparing some section of a sentence against a very specific value, a tool like AFL is going to do a really poor job
19:22
of figuring out what it takes to get past this check. Because it's using random mutation, it doesn't know what it needs to do to get to the string hacked by pigs here. Other types of fuzzers do a really good job of this, and we'll talk about those in a second. But this is a super common pattern you're going to see
19:43
in cases like shotgun parsers, binary file formats. And so things like checksums, magic numbers, offsets into files, these are all things that you don't want to spend time really exercising a lot of that, because you want to get past that typically to the point where something interesting will happen.
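For example, a check of the kind being described might look like this (the magic value is made up, not from any real format):

```rust
// Random mutation almost never synthesizes these 8 exact bytes, so a dumb
// fuzzer stalls here instead of reaching the interesting code behind it.
fn parse(buf: &[u8]) -> Result<(), &'static str> {
    if buf.len() < 8 || &buf[..8] != b"MAGIC123" {
        return Err("bad magic"); // AFL spends its time bouncing off this check
    }
    // ... checksum and offset fields would be checked next ...
    Ok(()) // only now does the real parsing begin
}
```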
20:02
I mean, there are cases where just the code here might be interesting, but you really want to be able to get past that. In practical terms, this means that afl-fuzz won't have as much luck inventing PNG files or non-trivial HTML documents from scratch.
20:20
And the reason is because these are very, very syntactic and complex file formats. There are aspects of it where things need to be in the right place, in the right order, and simply moving things around won't produce usable input. And so this is one of the areas that I'm really hoping to improve
20:41
by releasing this code today. The next type of fuzzing we're talking about is instrumentation and solving. And if you followed any of the Cyber Grand Challenge (CGC) stuff, a lot of the solutions that were built for that used this approach as part of their solution. And like AFL, these also will focus on path exploration and code coverage,
21:03
but they do so in a different way. Rather than trying random mutations, they convert the problem into something that they can execute some amount of, and when they get to a point where there's some kind of input or something that is non-deterministic, they convert that over to a symbolic value.
21:23
And then you use something like a satisfiability solver to identify what it will take to move past that branching point. Generate me a value that will take each path and will allow me to explore those. So this is how you would solve that problem of very specific strings
21:42
or checksums or anything like that where you need to know how to get past this step. It will provide something that will solve that problem for you and then generate you an input that matches that. This still doesn't care about syntax or semantics. It still has no real concept of what any of that is or what the language is, even for that matter.
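As a toy illustration of that idea (a real tool would hand the path constraint to an SMT solver; here the "solver" just enumerates one symbolic byte's domain):

```rust
// Treat the branch condition as a constraint over a symbolic byte and ask
// for a concrete value that takes the branch.
fn solve_branch(cond: impl Fn(u8) -> bool) -> Option<u8> {
    (0..=255u8).find(|&b| cond(b)) // brute force stands in for the SMT solver
}

fn main() {
    // Suppose the program branches on `input[0] * 3 + 7 == 151`.
    let byte = solve_branch(|b| b.wrapping_mul(3).wrapping_add(7) == 151);
    println!("take the branch with input byte: {:?}", byte); // Some(48)
}
```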
22:01
It's just trying to figure out how to get deeper into the program or to a point that you want in that program. Even with these, they're still not typically easy to build test harnesses for. A lot of this is still automated, but you could probably hack a lot more interesting stuff into it if you wanted. A great example of this is KLEE, which is open source.
22:25
There's the Mayhem paper, which is really great, and you should go read it if you haven't. And then the last big type that we'll be talking about today is grammar-based. This isn't a new concept.
22:40
It's something that has existed, and there are a handful of projects out there that do it. But there's a whole bunch of problems with these approaches in terms of what we were looking for. What these tools do is take a grammar file and generate input that is syntactically correct based on that grammar file. Some of the problems with this are that
23:02
they typically provide their own grammar language. So when you define a grammar, you have to write it in their language. And if you have 10, 15,000 lines of grammar for a specific language, do you really want to go and translate all of that definition over to another language? You're doing it by hand, typically, which means that you might make mistakes.
23:21
You might define something that's not the same. And just the semantics of that language may not translate very well to the new language for this fuzzer. These are mostly targeted toward regular and context-free languages. And this is fine. A lot of languages do fall into that category.
23:42
But for more complex languages, things like really complex binary file formats, context-sensitive languages, these are just not applicable, and they're not going to help you. They're typically not byte-oriented, so you're kind of limited to a lot of text-based languages.
24:01
And so you're limited in what you can do with these and the use cases you can have for them. Mozilla's published Dharma, which is their tool. I believe it's primarily used for testing out recognition of things like CSS, HTML, JavaScript, and ensuring that things don't break as they test their browser.
24:23
But these are the main types you'll see, and we're going to expand on the grammar-based one shortly. And last, I've been talking a lot about traditional test case generation and just how you will go about actually testing and using what you generate as your input.
24:45
And so, I've already said there's a lot of inflexibility with being able to design complex test cases and using those in a meaningful way to compare two different implementations or ask questions about how closely one matches to another
25:01
or what's being done with that. And you're kind of limited just to does the application crash and identifying a few different things within it. You also don't get a lot of flexibility in defining a feedback loop, and this kind of falls into the prior discussion. You can't really feed in more than does this explore more
25:20
or did this do something interesting? And so being able to have more flexibility around that is something that's really nice. And so, there are many tools out there that do some parts of this, but we can do better. Ideally, we want something that's more flexible,
25:41
something that we can really program our own test harnesses, something that we can do whatever we want. We can design a test that does whatever we want for any case that we might have. We want to be able to use the grammars we've already written. Why should we have to translate these, especially by hand, to something that isn't used by almost anyone?
26:01
And it isn't going to provide any meaningful value to us other than our test generation. It should be expressive enough for everything that most of the other fuzzers out there do right now, covering context-free and regular, but we also want to support context-sensitive. We want to do text and binary. We want the ability to provide value
26:22
for any kind of language that we can think about and might want to test. Being embeddable from any language that we might want to use. I don't want to have to go out into C specifically and write this. If I'm building something in Python or Ruby, I want to be able to just call this directly.
26:41
If I have functionality around the parser generator that I'm using in that language, I want to be able to do it directly in there. I don't want to have to go and build a whole separate ecosystem to bring this over to somewhere else. And we should focus on as much code reuse as possible. If I have a whole bunch of components built,
27:01
I don't want to reinvent the wheel every single time I want to design a new test or I want to do something new with this. The project I'm sharing today isn't the first. It's not the most feature complete or even the most novel solution to this, but it is more flexible and it is more extensible
27:22
and it is something that can get to the point where it is most of those things. I want to share with you SynFuzz. I'm releasing it today and basically this is a platform for building test generation and test harnesses and fuzzers.
27:42
It's organized like this. At the center, you have an intermediate representation, which is a bunch of data structures and some helper functions that allow you to construct complex grammars. Outside of that, you have your various front ends for your different parser generators and combinator libraries: things like ANTLR, Ragel, EBNF, and Bison.
28:04
Essentially, what these front ends do is take a grammar file, parse it, and translate it into the intermediate representation. The test harnesses are reusable components that you design, and these can be as simple or as complex as you want.
28:23
You have the full availability of any programming language you want to use and you can design a test to do whatever you want with these. You consume the intermediate representation. You give it a starting rule that is the root of your tree and it will start generating you parse trees and then from the parse trees will generate you sentences based on that.
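For a sense of what such an intermediate representation might look like, here is an illustrative sketch (the actual SynFuzz types may differ):

```rust
// A hedged sketch of a grammar IR: a tree of combinators that a generator
// can walk from a root rule to produce sentences.
enum Node {
    Literal(Vec<u8>),                // a fixed byte sequence (byte-oriented)
    Choice(Vec<Node>),               // alternates: pick one
    Seq(Vec<Node>),                  // a sequence: all of them, in order
    Repeat(Box<Node>, usize, usize), // quantification: min..max occurrences
    Not(Box<Node>),                  // negation (generate something that doesn't match)
    RuleRef(String),                 // reference to another production rule
}
```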
28:45
Ideally, the only piece that you as an end user will have to build is the test harness. The reason being that it likely is going to be something that's non-trivial. There probably could be very reasonable components for this and that would make the job easier for you,
29:02
but that is ideally the only piece that needs to be built for you. ANTLR is currently the only front end that I've written, but I'm planning to expand pretty heavily out from there. This is a list of a handful of the features that you get.
29:24
Like most combinator libraries, you're going to get a lot of logical operations as well as quantification, and ways of generating specific types of values. Right now, everything is byte-oriented, so it has Unicode support,
29:41
but it will output to a byte stream. Using the framework is something like this. The core is written in Rust, but the plan is to release C bindings that can be brought into basically any language. Essentially, the way this works is you bring in a grammar file and you read it,
30:02
you pass it into a generateRules function, and it will take that grammar and parse that into all of the rules that are expressed by that grammar. You then provide it a root node and tell it to generate. Then you spit out the generated string and do whatever you want with it.
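Putting those steps together, a usage sketch might look like this (the function names approximate the ones just mentioned and are stand-ins, not the real SynFuzz signatures):

```rust
use std::fs;

// Hypothetical stand-ins for the library calls described in the talk.
struct Rules;
fn generate_rules(_grammar: &str) -> Rules { unimplemented!() }
fn generate(_rules: &Rules, _root: &str) -> Vec<u8> { unimplemented!() }

fn main() {
    let grammar = fs::read_to_string("MySQLParser.g4").expect("read grammar"); // an ANTLR grammar
    let rules = generate_rules(&grammar);    // parse the grammar into its rules
    let sentence = generate(&rules, "root"); // generate from the given root rule
    println!("{}", String::from_utf8_lossy(&sentence)); // then do whatever you want with it
}
```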
30:24
Any communication is available with this, whether you want to use sockets, pipes, files. It builds cross-platform and can likely target most embedded platforms, which is cool. If you wanted to do anything with this, pretty much anywhere you can.
30:44
I have a quick demo. If you'd like a live demo, you can catch me afterward, and I can show you more. Essentially, what this does is, you'll see here, I basically have a while loop.
31:01
That is really hard to read. I'm sorry. Is any of that legible? Cool. I'm not allowed to plug my laptop in, but I will talk you through it,
31:21
and if you'd like to see it, I can show you after. Basically, the way that it works is, I have a while loop here, and it's calling the program. What it's doing is, it's generating a string and then passing that into Antler's parsing tool,
31:41
where you specify a grammar and a root node. It will try to recognize that language and then visualize the parse tree that it's building. In this case, I'm generating DOT, so Graphviz files. It's basically just running through, trying to generate new ones every single time.
32:01
You can see that it gives me a nice little parse tree here. It'll happen again and again and again. It might be kind of slow. You see, it's a totally different tree. It's bigger. It has different values in it. The structure is different.
32:25
The generation is actually very fast. The slow part right here is parsing it for Antler itself. It is multi-threaded, thread-safe, so you could spread this out as horizontally as you want to and put as much power on this as you want to build test cases.
32:47
That's probably pointless to keep showing because no one can read any of it.
33:05
Designing test harnesses. This is the part that you probably will care about. Let's see a few different ways we might want to design test harnesses for something like this. The first most common case is, does it crash?
33:21
This is what most fuzzers will do and why you're probably using a fuzzer in the first place. I want to know when this crashes and identify whether this is something that's exploitable. Basically, the way this works is you will start the process, you will generate a string of some sort, and you will feed that into the application.
33:43
You have to listen for the SIGSEGV or SIGABRT signals and basically just watch for it to crash. When it does, you get your core files and you investigate the crash, get your write-what-where, and see if it's something that's meaningful to you. This is the most basic case
34:01
and something that you will get with any other fuzzer. It's pretty straightforward, pretty simple. There's tons of examples out there for it. You basically just register a sigaction handler on the Unix side and wait for it to do one of those things.
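A minimal Unix version of that harness might look like this (a sketch assuming the `libc` crate for the signal constants):

```rust
use std::io::Write;
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, Stdio};

// Feed one generated sentence to the target process and report whether it
// died from SIGSEGV or SIGABRT.
fn crashed(target: &str, input: &[u8]) -> bool {
    let mut child = Command::new(target)
        .stdin(Stdio::piped())
        .spawn()
        .expect("spawn target");
    child.stdin.as_mut().unwrap().write_all(input).ok(); // ignore EPIPE if it dies early
    drop(child.stdin.take());                            // close stdin so the target sees EOF
    let status = child.wait().expect("wait on target");
    matches!(status.signal(), Some(libc::SIGSEGV) | Some(libc::SIGABRT))
}
```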
34:20
Now we get into the more interesting cases, and these are the cases that really gave birth to this project. Identifying whether a reimplementation of a grammar is overfit for a reference implementation. To do this, you would generate a test case using your own reimplementation grammar,
34:40
and you find an oracle within the reference implementation that specifies whether you're getting something that is either accepted and is valid, whether there's a syntactic error or there's some kind of runtime error that's happening, not because of a failure in parsing and understanding the sentence,
35:01
but a failure because of something happening in the runtime itself. You feed the test case in, and then you categorize what comes out. If it succeeds, everything is good. You have a sentence that's valid, your parser recognizes it, and the reference recognizes it.
35:20
If there is a failure, if the failure is a syntax error in the reference implementation, your parser is overfit. You are recognizing things that the reference does not. That's an indication that you probably need to redefine your grammar and change it to be more like that.
35:42
It should give you an idea of where those problems are, so it might guide you toward what specifically is causing that. You can manually go and change it to identify whether or not it's meaningful or modify the sentence that it spits out so that you can identify maybe why you're wrong.
36:00
There's not really good tooling around doing that yet other than trying to get good error messages from the reference. And parser error messages are usually pretty terrible, so you're kind of going to be on your own on that one. If the failure is a runtime error, it may or may not be a sign of being overfit or underfit. This is especially common because of the opaque tokens that I spoke about.
36:23
Basically, it might recognize something that is syntactically valid, but one of the tokens in there is totally batshit insane, and it recognized it because it uses an opaque token in that place. It also could just be an environmental thing.
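The categorization just described, as a sketch (the types are illustrative): generate from your own grammar, feed the reference, and classify what comes back.

```rust
enum RefOutcome { Accepted, SyntaxError, RuntimeError }

fn classify(reference: impl Fn(&str) -> RefOutcome, sentence: &str) -> &'static str {
    match reference(sentence) {
        RefOutcome::Accepted => "ok: both implementations accept this sentence",
        RefOutcome::SyntaxError => "overfit: we generate sentences the reference rejects",
        RefOutcome::RuntimeError => "inconclusive: maybe an opaque token, maybe environment",
    }
}
```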
36:41
In the case of MySQL, if you've ever written queries for it, you're probably very used to these types of errors. The top one is a syntax error. You see the top query here, the SELECT * FROM a WHERE id, then a bunch of carets, 3. This tells you specifically that this is a syntax error.
37:00
This did not parse, and it tells you exactly where the problem is. This is an example of the case where your parser, if it generated this, is just wrong. That should never happen. Below that, however, is a runtime error: SELECT * FROM a WHERE fake_column = 3.
37:21
This is a runtime error because one of the tables referenced here doesn't exist. This is something that the parser passed, and it generated something that's likely a valid sentence in that language. But because of the environmental issues with what it is doing with that sentence,
37:43
it wasn't able to do that. This is probably a false positive in that case. Having an oracle that can tell you about these things allows you to really cut down on false positives. The next is underfit. This is the other big problem that you will likely worry about.
38:04
To do this, you generate a test case from the reference. This is something you can only do if you have the reference implementation's grammar. In the case of MySQL or Postgres, SQLite, you have that grammar definition, and this is something that's open source. For something like DB2, Oracle, MSSQL,
38:23
this isn't a viable option. You can't do that because you don't have the source code for that grammar. But once you've generated a test case from the reference, you would parse it with the re-implementation and see if your parser can read it. And then you categorize it.
38:40
If it fails, then the re-implementation is underfit. You've generated a sentence that is valid in the reference implementation that is not valid in yours. If it succeeds, then the re-implementation is likely correct for that sentence and that section of the grammar that you have used.
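And the mirror-image check, sketched the same way: sentences generated from the reference grammar must parse in the re-implementation, or the re-implementation is underfit.

```rust
fn underfit_check(
    generate_from_reference: impl Fn() -> String,
    our_parser: impl Fn(&str) -> bool,
) -> Option<String> {
    let sentence = generate_from_reference();
    if our_parser(&sentence) {
        None           // likely correct for this region of the grammar
    } else {
        Some(sentence) // underfit: valid in the reference, rejected by ours
    }
}
```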
39:04
So what's ready today? The code is up here. It will be made public after this. Crates are available for people who want to use Rust, and bindings will be released over the remainder of the year for as many languages as people are interested in
39:23
and I have time to build, or other people have time to build. It's MIT licensed. You can submit pull requests, and I will happily accept them. Right now, you get an ANTLR4 frontend. It's mostly functional. There are a few little things that aren't supported yet, but you can handle the vast majority of the common types of parsers
39:45
you will generate with ANTLR. What's next? There's a lot. The biggest area right now that's a huge issue is dealing with cycle detection and forced progression. So with recursive grammars, which are pretty common,
40:04
it does a really bad job with ensuring that it doesn't get stuck in a loop because it's essentially building a graph and walking that. So getting cycle detection in there to move forward and not get stuck is at the top of the list. Basically, it will just run and then get a stack overflow
40:21
and basically crash. But for a lot of simpler grammars, you're not going to have a problem here, and this is immediately useful now. Like I said, exposing the C API and language bindings: it's built in Rust, but this will be available. I think early plans for languages right now
40:40
are going to be Java and probably Go, and probably Python and Ruby as well in the near future. The negation logic exists, but it's not great. The problem with negation is that it's not always clear exactly what the opposite of something like one of the combinators is,
41:03
and so pretty much everything has negation logic right now, but it could be better. You don't get quite as much entropy as I'd like, and so if you're using a lot of negation, you might just not get as much variation. Context-sensitive support and being able to do
41:22
introspective things on the data that's being generated. So if you need to generate checksums that are valid, things like that, having support to do that. As I said, it's all byte-oriented right now. I might be interested in building out bit-level support. There's a lot of wire protocols that have some of that,
41:41
and it would be useful for that. Additional front-ends, this is the big one also. Bringing support for more platforms is a real goal to make this significantly more useful. And then grammar coverage information. How much of this grammar have I explored, and how much time have I spent in a specific area?
42:03
And then being able to reuse a tree and change individual tokens so I can just change the areas that are going to be dynamic in that and really, really hammer sections of code if I care about that. Again, this is the link if you're interested in the code.
42:20
And questions? Sure. So how do you implement an oracle?
42:41
Basically, it comes down to the integration that you build in your test harness. So in the case of MySQL, we're literally using the MySQL client library, and the error messages coming out of that provide us the oracle; we just do comparison on what is being provided there.
43:01
That's something you're going to have to find, using source code if possible. If there's something you can do to inspect the binary or the memory space or just get meaningful output of it, it usually comes down to just finding something in there. And if you can't, then you're kind of just fucked.
43:28
Yeah. I don't yet. The plan is to start doing some of that research and putting it together. This is something I've been working on for maybe the last eight months or so, and it's at the point now where it's usable
43:43
in a meaningful way for us. But like I said, part of the problem for us was building test harnesses that gave us meaningful feedback. But I do plan to start doing evaluation of that, and I want to have something published probably next year to find out whether or not this is actually useful
44:01
and provides more value than something like that. Anyone else? Cool. Thank you.