Re-Targetable Grammar Bases Test Generation

Video thumbnail (Frame 0) Video thumbnail (Frame 1224) Video thumbnail (Frame 6200) Video thumbnail (Frame 19245) Video thumbnail (Frame 22733) Video thumbnail (Frame 27510) Video thumbnail (Frame 28496) Video thumbnail (Frame 30259) Video thumbnail (Frame 31107) Video thumbnail (Frame 41157) Video thumbnail (Frame 43901) Video thumbnail (Frame 46090) Video thumbnail (Frame 47280) Video thumbnail (Frame 48246) Video thumbnail (Frame 49122) Video thumbnail (Frame 63382)
Video in TIB AV-Portal: Re-Targetable Grammar Bases Test Generation

Formal Metadata

Title
Re-Targetable Grammar Bases Test Generation
Alternative Title
Synfuzz Building a Grammar Based Retargetable Test Generation Framework
Title of Series
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2018
Language
English

Content Metadata

Subject Area
Electric generator Roundness (object) Personal digital assistant Formal grammar Software testing Musical ensemble
Implementation Parsing Run time (program lifecycle phase) Sequel Multiplication sign Source code Black box Computer programming Formal language Different (Kate Ryan album) Term (mathematics) Software testing Determinant Information security Physical system Recursive descent parser Dialect Algorithm Information Software developer Line (geometry) Parsing Performance appraisal Software Query language Personal digital assistant Formal grammar Quicksort Writing
Complex (psychology) Group action Context awareness Parsing Run time (program lifecycle phase) State of matter Code Multiplication sign Sheaf (mathematics) Mereology Stack (abstract data type) Semantics (computer science) Subset Formal language Different (Kate Ryan album) Single-precision floating-point format Arrow of time Recursion Error message Vulnerability (computing) Area Algorithm Electric generator Context-free grammar Block (periodic table) Constructor (object-oriented programming) Fitness function Regulärer Ausdruck <Textverarbeitung> Parsing Abstract syntax tree Category of being Exterior algebra Process (computing) Buffer solution output Right angle Quicksort Reverse engineering Point (geometry) Laptop Implementation Link (knot theory) Token ring Connectivity (graph theory) Syntaxbaum Rule of inference Knapsack problem Product (business) Number Latent heat Root Operator (mathematics) String (computer science) Representation (politics) Integer Software testing Validity (statistics) Information Poisson-Klammer Line (geometry) Exploit (computer security) Personal digital assistant Logic Network topology Formal grammar Finite-state machine
Area Point (geometry) Implementation Building Electric generator Code Token ring Variety (linguistics) Projective plane Set (mathematics) Cartesian coordinate system Parsing Formal language Web 2.0 Goodness of fit Bit rate Personal digital assistant Computer configuration Network topology Representation (politics) Software testing Quicksort Bounded variation Spacetime
Point (geometry) Existential quantification Greatest element Parsing Link (knot theory) Code Multiplication sign Electronic program guide 1 (number) Online help Mereology Computer programming Software bug Medical imaging Latent heat Goodness of fit Term (mathematics) String (computer science) Software testing Area Information Fitness function Cartesian coordinate system Type theory Process (computing) Personal digital assistant Blog File viewer
Medical imaging Type theory String (computer science) Computer hardware Task (computing) Software bug Physical system Spacetime
Point (geometry) Area Complex (psychology) Parsing Inheritance (object-oriented programming) Computer file Code File format Multiplication sign Sheaf (mathematics) Cartesian coordinate system Binary file Number Usability Type theory Process (computing) Personal digital assistant Term (mathematics) String (computer science) Order (biology) output Pattern language Damping Musical ensemble
Point (geometry) Complex (psychology) Functional (mathematics) Implementation Building Computer file Open source Code Connectivity (graph theory) Multiplication sign Expandierender Graph Control flow Web browser Mereology Binary file Semantics (computer science) Computer programming Formal language Langevin-Gleichung Crash (computing) Term (mathematics) Different (Kate Ryan album) String (computer science) Software testing Pattern recognition Electric generator Projective plane Binary code Feedback Line (geometry) Parsing Type theory Software Personal digital assistant Formal grammar output Summierbarkeit
Functional (mathematics) Parsing Computer file Connectivity (graph theory) Combinational logic Syntaxbaum Rule of inference Front and back ends Intermediate language Root Different (Kate Ryan album) Personal digital assistant Software testing Data structure Computing platform Formal grammar Programming language Electric generator Debugger Process (computing) Network topology Formal grammar Computing platform Fuzzy logic Software testing Library (computing)
Functional (mathematics) Socket-Schnittstelle Parsing Computer file Quantification Combinational logic Streaming media Function (mathematics) Rule of inference Formal language Latent heat Root Operator (mathematics) Core dump Logic Software framework Computing platform Electric generator Quantification Keyboard shortcut Electronic mailing list Planning Type theory Logic Telecommunication Formal grammar Library (computing)
Loop (music) Demo (music) Computer programming
Graph (mathematics) Root Computer file Personal digital assistant Multiplication sign String (computer science) Formal grammar Syntaxbaum 1 (number) Formal language
Building Electric generator Different (Kate Ryan album) Personal digital assistant Network topology Right angle Software testing Data structure Mereology Power (physics)
Group action Run time (program lifecycle phase) Parsing Java applet Code Multiplication sign Source code Combinational logic Sheaf (mathematics) Mereology Replication (computing) Formal language Sign (mathematics) Computer configuration Oval Core dump Negative number Series (mathematics) Error message Oracle Area Touchscreen Electric generator Keyboard shortcut Electronic mailing list Fitness function Price index Instance (computer science) Parsing Type theory Process (computing) Right angle Summierbarkeit Cycle (graph theory) Quicksort Arithmetic progression Bounded variation Implementation Sequel Open source Token ring Electronic program guide Context-sensitive language Goodness of fit Crash (computing) String (computer science) Software testing Computing platform Information Projective plane Planning Cartesian coordinate system Loop (music) Personal digital assistant Query language Network topology Infinite conjugacy class property Formal grammar Musical ensemble Table (information) Communications protocol Buffer overflow
Point (geometry) Link (knot theory) Sequel INTEGRAL Code Feedback Source code Planning Function (mathematics) Client (computing) Mereology Usability Performance appraisal Personal digital assistant Semiconductor memory Software testing Musical ensemble Error message Spacetime Library (computing)
first we're not going to let you deal with a hangover very well right now we're gonna start with a Joe who is going to talk about fuzzing yeah good stuff let's give our first speaker a big round of applause [Applause] [Music] cool my name is Joe Rosner and this is retarget of all grammar based test case generation so I want to start off with
the story about how we got to where we are today and ultimately the lesson of the story is that testing parsers is really hard about four years ago at my
current company I started implementing a sequel runtime and parser for a handful of different dialects of sequel and the goal of this was to evaluate sequel queries and determine whether or not they were dangerous and in doing this we built out mostly black box implementations of you know the vast majority of the sequel dialects out there and all told it was something like 35 thousand lines of antler definition so this is like grammar definition where you are defining what makes a language and you pass it into a program and it will generate you a parser that can recognize that language most of this was done through looking at poorly written documentation it's not documentation about how the system itself works but documentation for a developer trying to learn how to write sequel queries and what's valid typically they're incomplete they don't have the information you need and you're kind of fumbling around trying to figure out just what it is that's correct and and what makes it correct things will be contradictory and you never really know what the answer is and in a lot of cases these are closed source systems db2 Oracle MS sequel so you can't just like go to the source code for them now we used antler but most of these tools like bison or yak things like that they will use an LR style parser or recursive descent parser which is vastly different than what antler uses antler uses an all-star parser which is a top-down parser and so the way that the actual algorithm works and you recognize a language from this grammar is going to be different and that leads to all sorts of weird cases where you might have two grammars that are identical in terms of definition but they're going to be recognized differently because of the algorithm that's being used and most of the time these lack public test cases because they're closed source software the vendors don't care if you can verify that what they did is correct they just provide their software and wait for you to complain and then they fix it maybe so this issue that we ran into is is really what gave birth to this idea and and where it came from but the security use cases for it or what vastly expanded the scope and kind of further drove the usefulness that I've found and why this has kind of come to be so as a few
different problem areas in parsing when you're trying to verify whether your parser is correct and we're going to talk about three of them here the first
is over fit and under fit implementations so the goal of a parser is to recognize a language and the question of over fit and under fit is am I correctly accepting and correctly rejecting sentences that are part of that language so if I have the reference implementation and it accepts something that mine doesn't I'm now under fit I do not recognize the same language that the reference implementation does and if it doesn't recognize something that I do then I'm over fit and I still don't recognize the same language it does and this is a problem that is specifically based not specifically based but it's it's a it's a problem that is exacerbated by poor documentation and not having a firm understanding of what is correct and the internals of how the parser works because without understanding exactly what the algorithm is that it's going through and what it's using to determine you know whether or not the sentence is correct it's really hard to replicate that same behavior and get to the point where you have something that is going to represent the same language and so without having like public grammars and without knowing about the algorithm you're limited to reverse engineering and experimentation with the reference trying out things trying to discover what sort of behavior it's using and basically just trial and error until you can get to that point where you really understand okay this is how I need to define this section of the language and go on from there as I've already said there there's differences in parsing algorithms you've ever tried to like reverse engineer like one of these like generated parsers it's it's basically impossible it's a giant state machine built out of go twos essentially and you just kind of have this state where you're constantly mutating it and it's a really bad time like you just you shouldn't do it and in these different parsing algorithms they typically have different behavior for a handful of different properties things like ambiguity are two different production rules going to be interpreted differently do you have left recursion are you going to continually go and repeat a specific rule and precedents you know how are you expressing precedents for for operators these are all things that every single parser generator you use is going to handle differently and so taking a grammar from one to another is going to behave very differently in all of these areas and the worst part about all of this is that universally proving two context-free grammars are identical is undecidable this is a problem that for the general case is not decidable and you cannot write an algorithm it does this thing now we can we can slim down this problem you can take a subset of it and we can maybe do it for that but in the general case this is not something that we can do the next problem you worry about our tree generation flaws so once you've recognized a sentence in a language your goal is to create a parse tree or a knapsack an abstract syntax tree and basically what this does is it shows the syntactic relationship between various components or tokens of that sentence so it's has some kind of root node which is the top point of the tree and going down there this provides context for what a specific token is and how it relates to every other token in that sentence and the whole process of building trees and working with trees is super complicated as you move from generator generator run time to run time most partners will require manual tree construction and this is something that has kind of gotten better recently in newer tools like antler but typically you'll have a production rule which it will try to recognize and when it does you'll then run a block of code and that code is responsible for building up a node and then inserting it into the rest of the nodes at the right point to build that representation and this is typically the point where you're gonna see things like if you recognize a number you're going to parse that into a numeric representation for the language you're in so the number one two three four gets read as decimal and convert it into an integer in the language you're building this in and we'll come into you know issues with that specifically shortly this is an example of something like this this is a lollipop which is a parser generator for rust and basically on the left-hand side of the equals arrow you have the different production rules and so this is what it's trying to match and this is what it's trying to recognize and on the right side you have the action that's being called so for each one of these going down these are all alternates so it's going to try to match oh sorry about that I'm trying to juggle two laptops so I can get speaker notes and it's not going so well these are these are all production rules and each one of these is an alternate so it's going to try to match one of these and everything on the left of the arrow is going to be a rule everything on the right is an action so once it recognizes it it calls the action and that is what builds up the tree as it goes down or up depending on the algorithm that's being used and whether it's top-down or bottom-up the third problem is unsafe and incorrect use of validated input so once you've recognized a sentence and you've think you understand what it's doing what are you doing with it typically recognition isn't like the only step it's typically the first step in many more steps you will take to use that information and so most parsers will say yes this is a sentence but don't necessarily say yes this is safe or the information here is okay to use so the only goal of a parser is to ensure that a sentence is a member of that language and so once you've recognized it you have numbers you have strings what do you do with those strings later and essentially what this comes down to is if you can get past the parser with something that is syntactically valid but it's still something dangerous maybe down the line you copy this to a buffer somewhere it's just back to any other kind of typical vulnerability exploitation at that point smashing the stack logic flaws and it's it's that simple I mean it's like your goal is to get past that point it's the gatekeeper reasons why this is especially bad is that a lot of defining grammars is really hard when you have like complex languages so you'll have places where you might have copaque tokens where you say I need to read in a character set for instance like in a regex but I'm just going to read that in as a string anything between like two square brackets and then later on I'm gonna go ahead and figure out what that actually does and this is super common you'll see it all over real parsers so you might recognize this as a sentence in the language but this small subsection of that you're not going to understand like what it's actually doing or whether or not it has some like bad value in it so these opaque tokens are kind of rough and this this open it leads to parsers not really having a lot of semantic information about the language they're recognizing and there's just not really a good solution to this in many cases because of the tooling and because of the complexity of the languages and if you subscribe to the link set concept you know we should be working to reduce the complexity to make this an easier problem so that some of these can go away so how do we test this if it's
great that we've identified the problem areas but how do we do better how do we ensure that our parser that we're building is going to be a faithful representation of another reference implementation typically it comes down
to just getting more test cases and getting to the point where you feel comfortable that your parser is recognizing the same set of sentences that another is and to do this you need a shitload of like that of uh sentences like a massive amount like it having a hundred a thousand like ten thousand like just really isn't enough especially for large languages so writing them by hand just really isn't a viable option like there's no way you're gonna have enough variety and enough targeted sentences that can cover that like it's just not realistic unless you have people just cranking these out all for four hours so something that we ended up doing was crawling the web and looking at like Stack Overflow and just pulling out really bad examples going to open-source projects and grabbing them out of there and this is great because you can get a lot up until a certain point but you have no guarantee that you're exercising different areas of it you don't know really what your breath is how deep you're going and there's not really good code coverage tools for this so you're kind of just assuming that you have all of this coverage but at the end of the day you don't really know and even within a specific tree like the individual values for tokens just variation in that one space may have significant impact depending on what the application is doing with that value afterward so they need to generate test cases at a much higher rate is ultimately what I arrived at and so the way to do this typically is using a buzzer and there's a few different styles of fuzzing that are pretty common today and all of these sort of had issues with the style of testing that we were trying to accomplish and so I'm kind of walk through why that is and
where these types of fuzzing are good for and how we can get to a point where we have something that is more targeted for this and maybe be used for these more traditional types of fuzzing to augment them and make them more effective in what they do so one of the
most common ones you'll see today is is AFL now this is like the hot professor right now and and basically what this does is it focuses on path exploration and and code coverage to make sure that you are targeting as much of the application as possible this isn't aware of any semantics or syntax it is implemented not in a dumb bad way but it's it's dumb in terms of how it tries to explore so it instruments the application and watches as it goes through it it will take some kind of seed value and mutates that and it determines whether or not that mutation allows it to explore new areas and if it does it saves that and it continues down that path and if it doesn't it throws it away as something that's not interesting and then it repeats this process billions of times over and over until you stop it basically and bugs fall out and so this can take a lot of time because it's using random mutation and it's not very targeted and its whole goal is to kind of just like explore so you might spend a lot of time in in uninteresting code paths and there there are some ways to help kind of guide it and and cut off certain areas in some of these types of fuzzers but for the most part it's a very random process and in the case of trying to evaluate the specific types of tests that we want to is whether two parsers are implemented the same it's not immediately clear how to build a really good test harness using something like AFL to do that like the way that it determines whether or not something is interesting is by hooking the cig abort and the the cig kill signals and then waiting for the program to essentially like crash or terminate and so getting the semantic information that we want to about whether something's under fit or over fit or you know why did this test case succeed or fail and how does it compare to this other thing it's not really immediately clear how you would encode that information into something like a test harness for this and there's a great blog post showing some of the use cases where AFL actually can be used in some cases like this so basically the link at the bottom here walks through taking the string hello and passing mutations of that into a image viewer and basically getting from the point of the string hello to valid JPEGs and I
have a few quotes that I want to share from this and basically just kind of showing you know where this is helpful and where this isn't helpful for this type of task so the first image hit after about six hours on an eight-core system and this is actually pretty good and pretty surprising going from nothing but the string hello to something that is a syntactically valid JPEG is like pretty good I mean this is not crazy hardware it's something that you know anyone probably in this room has access to and anyone who's a real adversary trying to find bugs like this is no problem to get access to certain types
of automatically executed checks with a large search space may pose insurmountable obstacle to the fuzzer and this is ultimately like the real crux of the problem is that if you have
something like this in your application where you're comparing some section of a sentence for a very specific value a tool like AFL is going to do a really poor job of figuring out what it takes to get past this check because it's using random mutation it doesn't know what it needs to do to get to the string hacked by pigs here so there's other types of fuzzers do a really good job of this and I'll talk about those in a second but this is a super common pattern you're gonna see in cases like shotgun parsers binary file formats and so things like checksums magic numbers offsets into files these are all things that you you don't want to spend time really exercising a lot of that because you want to get past that typically to the point where something interesting will happen I mean there are cases where just the code here might be interesting but you really want to be able to get past that [Music]
in practical terms this means the AFL FFA's won't have as much luck inventing ping files or non-trivial HTML documents from scratch and the reason is because these are very very syntactic and complex file formats there are aspects of it where things need to be in the right place in the right order and simply moving things around won't produce usable input and so this is one of the areas that I'm really hoping to improve by releasing this code today the
next type of fuzzing we're talking about is instrumentation and solving and if you followed any of the cgc stuff a lot of the solutions that were built for that used this approach is part of their solution and like AFL like these also will focus on path exploration and code coverage but they do so in a different way rather than trying random mutations they convert the problem into something
that they can execute some amount of and when they get to a point where there's some kind of input or something that is non-deterministic they convert that over to a symbolic value and then you use something like a satisfiability solver to identify what it will take to move past that branching point generate me a value that will take each path and will allow me to explore those so this is how you would solve that problem of very specific strings where you need to know or check sums or anything like that where you need to know how to get past this step is it will provide something that will solve that problem for you and then generate you an input that matches that this still doesn't care about syntax or semantics it still has no real concept of what any of that is or what the language is even for that matter it's just trying to figure out how to get deeper into the program or to a point that you want in that program even with these they're still not typically easy to build test harnesses for a lot of this is still automated but you could probably hack in a lot of more interesting stuff in it if you wanted great examples of this are our CLE open source software there's a lick the mayhem paper which is really great and you should go read it if you haven't and then the the last big type that we'll be talking about today is grammar based and this isn't a new concept it's something that has existed and there are a handful of projects out there that do it but there's a whole bunch of problems with the approach it with the approaches that have been built in terms of what we were looking for so what these do is it takes a grammar file and it will generate you input that is syntactically correct based on that grammar file and some of the problems with this are that they typically provide their own grammar language so when you define a grammar you have to write it in their language and if you've have ten fifteen thousand lines of grammar for a specific language do you really want to go and translate all of that definition over to another language you're doing it by hand typically which means that you might make mistakes you might define something that's not the same and just the semantics of that language may not translate very well to the new language for this fuzzer these are mostly targeted toward regular and context-free languages and this is fine a lot of languages do you fall into that category but for more complex languages things like really complex binary file formats contact sensitive languages like these are just not applicable and they're not going to help you they're typically not byte oriented so you're kind of limited to a lot of text-based languages and so you're limited in what you can do with these and the use cases you can have for them Mozilla's published dharma which is their tool I believe it's primarily used for testing out recognition of things like CSS HTML javascript and ensuring that things don't break as they test our browser but these are like the main types you'll see and we're gonna expand on the grammar based one shortly and last I've been talking a lot about traditional test case generation and just how you will go about actually testing and using what you generate as your input and so I've already said there's a lot of in flexibility with being able to design complex test cases and using those in a meaningful way to compare two different implementations or ask questions about how closely one matches to another or what's being done with that and you're kind of limited just to kind of does the application crash and identifying a few different things within it you also don't get a lot of flexibility in defining a feedback loop and this kind of falls into the prior discussion you can't really feed in more than like you know does this explore more or does this did this do something interesting and so being able to have more flexibility around that it's something that's really nice and so there are many tools out there that do some parts of this but we can do better ideally we want something that's more flexible something that we can really program our own test harnesses something that we can do whatever we want we can design a test that does whatever we want for any case that we might have we want be able to use the grammars that we've already written why should we have to translate these especially by hand to something that isn't used by almost anyone and isn't going to provide any meaningful value to us other than our test generation it should be expressive enough for everything that most of the other fuzzers out there do right now so covering context-free and regular but we want to support context-sensitive we want to do text and binary we want the ability to kind of provide value for any kind of language that we can think about and might want to test being embeddable from any language that we might want to use I don't want to have to go out into sea specifically right this if I'm building something in Python or Ruby I want to be able to just call this directly if I have functionality around the parser that generator that I'm using in that language I want to be able to do it directly in there I don't want to have to go and build a whole separate ecosystem and to bring this over to somewhere else and we should focus on as much code reuse as possible you know if I have a whole bunch of components built I don't want to reinvent the wheel every single time I want to design a new test or I want to do something new with this and so the project I'm sharing today isn't the first it's not the most feature complete or even the most novel solution to this but it is more flexible and it is more extensible and it is something that can get to the point where it is most of those things so I
want to share with you syn fuzz I'm really seeing it today and basically this is a platform for building test generation and test harnesses and tests and fuzzers it's organized like this at
the center you have an intermediate representation which is a bunch of data structures and some helper functions that allow you to construct complex grammars outside of that you will have your various front ends for your different parser generators and Combinator libraries so things like antler and Ragle and EB and F bison and essentially what these do is you take a grammar file with these front ends it parses them and it translates them into the intermediate representation the test harnesses are reusable components that you design and these can be as simple or as complex as you want you have the full availability of any programming language you want to use and you can design the test to do whatever you want with these and so you consume the intermediate representation you give it a starting rule that is the root of your tree and it will start generating new parse trees and then from the parse trees will generate you sentences based on that ideally the only piece that you as an end user will have to build it the test harness and the reason being that it likely is going to be something that's non-trivial there probably could be very reasonable components for this and that would make the job easier for you but that is ideally the only piece that needs to be built for you antler is currently the only front-end that I've written but I'm planning to expand pretty heavily out from there
this is a list of a handful of the features that you get so like most Combinator libraries you're going to get a lot of logical operations as well as quantification specific values for generating specific types of values right now everything is byte oriented so it has Unicode support but it will output to a byte stream using the
framework is something like this the core is written in rust but the plan is to release C bindings that can be brought into basically any language but essentially the way this works is you bring in a grammar file and you read it you pass it into a generate rules function and it will take that grammar and parse that into all of the rules that are expressed by that grammar you then provide it a root node and tell to generates and then you spit out the generated string and do whatever you want with it any communication is available with this whether you want to use sockets pipes files it built across platform and can likely target most embedded platforms which is cool so if you wanted to do anything with this you know pretty much anywhere you can and I
have a quick demo and if you'd like a live demo you can catch me afterward and I can show you more but essentially what this does is you'll see here I basically
have a while loop that is really hard to read I'm sorry is any of that legible cool I'm not allowed to plug my laptop in but uh I will walk I will talk you through it and if you'd like to see it I can show you after but basically the way that it works is I have a while loop here and it's calling the program and what it's
doing is it's generating a string and then passing that into antler's parsing tool where you specify a grammar and a root node and it will try to recognize that language and then visualize the parse tree that it's building and in this case I'm generating dot so like graph is files and so it's basically just running through trying to generate new ones every single time and you can see that it gives me a nice little parse tree here it'll happen again and again
and again it might be kind of slow but
so this is like you see it's a totally different tree it's bigger it has different values in
it structures is different and so it's
the generation is actually very fast the slow part right here is parsing it for
for antler itself but it is
multi-threaded to thread-safe and so you could spread this out as horizontal as
you want to and put as much power you
know on this as you want you to build test cases and that's probably pointless
to keep showing because no one can read any of it
[Music]
where is that screen so designing test
darkness is you know this is the part that you probably will care about and so let's see a few different ways we might want to design test harnesses for something like this the first most
common case is does it crash this is what most fuzzers will do and why you're probably using a buzzer in the first place is I want to know when this crash is and identify whether this is something that's exploitable so basically the way this works is you will start the process you will generate a string of some sort and you will feed that into the application you have to listen for the six egg V or sig abort signals and basically just watch for it to crash when it does get core files and you investigate the crash get your right wetware see if it's something that's meaningful to you it's like this is the most basic case and something that like you will get with any other fuzzer it's pretty straightforward pretty simple there's tons of examples out there for it you basically just register a sig action handler on the unix side and wait for it to do one of those things now we get into the more interesting cases and these are the cases that really gave birth to this project so identifying whether a reimplementation of a grammar is over fit for a reference implementation to do this you would generate a test case using your own reimplementation grammar and you find an oracle within the reference implementation that specifies whether you're getting something that is either accepted and is valid whether there's a syntactic error or there's some kind of runtime error that's happening not because of a failure in parsing and understanding the sentence but a failure because of something happening in the runtime itself you feed the test case in and then you categorize what comes out so if it succeeds everything is good you have a sentence that's valid your parser recognises it and the reference recognises it if there is a failure if the failure is a syntax error in the reference implementation your parser is overfit you are recognizing things that the reference does not and that's an indication that you probably need to redefine your grammar and change it to be more like that and it should kind of give you an idea of like where those problems are so you might kind of guide you toward what specifically is causing that and you can manually go and change it to identify whether or not it's meaningful or modify this instance that it spits out so that you can identify maybe why you're wrong but there's not really good tooling around doing that yet other than trying to get good error messages from the reference and parts of error messages are usually pretty terrible so you're kind of gonna be on your own on that one if the failure is a runtime error it may or may not be a sign of being overfit unfit this is especially common because of the opaque tokens that I spoke about basically it might recognize something that like is syntactically valid but like one of the tokens in there is just like totally batshit insane and like it recognized it because it uses an opaque token in that place it also could just be an environmental thing so like in uh in the case of like my sequel if you've ever written queries for it you're probably very used to these types of errors the top one is a syntactic error you see the top query here the select star from a where ID and a bunch of carrots three and this tells you specifically this is a syntax error this did not parse and it tells you like exactly where it is so this is an example of the case where your parser if it generated this like is just wrong like that that should never happen below that however is a runtime error where select star from a where column equals three this is a a runtime error because one of the tables referenced here doesn't exist and so this is something that the parser passed and it generated something that's likely a valid sentence in that language but because of the environmental issues with what it is doing with that sentence it wasn't able to do that and so this is it probably a false positive in that case and so having the Oracle that can tell you about these things allows you to really cut down on false positives and in this case the next is under fit this is the other big problem that you were likely worried about so to do this you generate a test case from the reference and this is something you can only do if you have the reference implementations grammar so in the case of like my sequel or Postgres sequel light you have that grammar definition and this is something that's open source for something like you know db2 oracle ms sequel this isn't a viable option and you can't do that because you don't have the source code for that grammar but you would parse it with the replication once it generated a test case from the reference and see if your parser can read it and then you categorize it if it fails then the reimplantation is under fit you have generated a sentence that is valid in the reference implementation that is not valid in yours if it succeeds then the implementation is likely correct for that sentence and that section of the grammar that you have used [Music] so what's ready today the code is up here it will be made unprofitable for people who want to use rust and bindings will be released over the remainder of the year for as many languages as people are interested in and I have time to build or other people have time to build MIT licensed you can submit pull requests and I will happily accept them right now you get an eMeter for front end it mostly is functional as a few little things that aren't supported yet but you can get the vast majority of many of the common types of parsers you will generate with antler what's next there's a lot the biggest area right now that's a huge issue is dealing with cycle detection and forced progression so with recursive grammars which are pretty common it does a really bad job with ensuring that it doesn't get stuck in a loop because it's essentially building a graph and like walking that so getting cycle detection in there to move forward and not get stuck is is at the top of the list basically it will just run and then get a stack overflow and basically crash but for a lot of more simpler grammars like you're not gonna have a problem here and this is immediately useful now like I said exposing the C API and language bindings it's built in rust but like this will be available in early plans for languages right now are going to be to to Java and probably go probably Python and Ruby as well in the near future the negation logic exists but it's not great the problem with negation is it's not always clear exactly like what the opposite of something like but one of the Combinator's is and so pretty much everything has negation logic right now but it could be better you don't get quite as much entropy as I'd like and so if you're using a lot of negation you might just not get as much variation context-sensitive support and being able to do introspective things on the data that's being generated so if you need to generate check sums that are valid things like that having support to do that as I said it's all byte oriented right now I might be interested in building out bit level support there's a lot of like wire protocols that have some of that and it would be useful for that additional front ends this is the big one also bringing support for more platforms is a real goal to make this significantly more useful and then grammar coverage information how much of this grammar have I explored and how much time have I spent in a specific area and then being able to reuse a tree and changed individual tokens so I can just change the areas that are going to be dynamic in that and really really hammer or sections of code if I care about that
again this is the link futures in the code and questions [Applause] [Music] sure so how do you implement an Oracle basically it comes down to the integration that you built in your test harness so in the case of my sequel we're literally using the my sequel library that is like the client and the error messages coming out of that will provide us the Oracle and we just do comparison and for what is pretty provided there that's something you're gonna have to find if you have source code as possible if there's something you can do to inspect the binary or like the memory space or just get meaningful output of it it usually comes down to just finding something in there and if you can't then you kind of just [Music] yeah I don't yet the plan is to start doing some of that research and putting that together this is something I've been working on for maybe the last like eight months or so and it's at the point now where it's usable in a meaningful way for us but like I said part of the problem for us was building test harnesses that gave us meaningful feedback but I do plan to start doing evaluation of that and I want to have something published probably next year to actually find out whether or not like this actually is useful and provides more value than something like that anyone else cool thank you
Feedback