Automated Discovery of Deserialization Gadget Chains

Video in TIB AV-Portal: Automated Discovery of Deserialization Gadget Chains

Formal Metadata

Title
Automated Discovery of Deserialization Gadget Chains
Title of Series
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2018
Language
English

So let's talk about deserialization vulnerabilities. Before I get into it, just a couple of words about myself: my name is Ian Haken, and I'm a senior software security engineer at Netflix. I'm on the platform security team, where we make a bunch of tools to keep our microservice ecosystem safe. Download the slide deck afterwards; we've talked a lot about all the cool stuff we do, so you can check that out after the talk. But today I'm talking about deserialization gadget chains. I'm going to start by answering the obvious question, what is a deserialization vulnerability, and then get into the question of what is a deserialization gadget chain. Ultimately, what I want to talk about is a new tool that I built for understanding gadget chains and, of course, the fun stuff at the end: some of the new exploits that tool was able to uncover. So what is a deserialization vulnerability? In object-oriented languages like Java (and I'm mostly going to be using Java examples in this talk), code is contained in classes, and classes hold your data alongside the code. That's the whole point of object-oriented design, and it gives you cool features like polymorphism. But this means that if you control the type of data, if you're able to specify what data type something is, then you're implicitly controlling what code gets run. Let me give you an example. This is kind of a classic Java deserialization vulnerability: it's a REST endpoint that reads in a POST body and passes it into an ObjectInputStream, and then you read some object out of it. In this case we're casting that object to a User and calling render on it. What the developer might intend is that this is some User class that exists on the classpath, and the POST body that gets sent in is some serialized version of it. It has a name, and when you call render on it, it returns that name. Totally innocuous; nothing interesting can really happen with this. But where you start getting into dangerous territory is if maybe you had something
like this on your classpath. It extends User, and it's a ThumbnailUser; the intent is that there's some member that specifies a file path with the thumbnail of that user, and when we call render it reads that file from disk. So if an attacker sends a ThumbnailUser to this endpoint instead of a regular User, then when it calls user.render() they can read any file off the disk and get it returned. That's what I mean when I say that controlling data types means you end up controlling what code gets executed.
So why am I talking about deserialization today? That's a 2016 topic; this is not new, this is something we've been thinking about for a little while. Honestly, this class of vulnerabilities goes back even before 2016. Some of the first mentions of it go all the way back to 2006: Marc Schönefeld gave a talk at Black Hat that year and identified how some application containers were subject to this kind of vulnerability. They were using ObjectInputStream in an unsafe way and you could get code execution on them. But the talk that really put the spotlight back on the subject was given by Frohoff and Lawrence in 2015 at AppSecCali. This really blew up this vulnerability class, because they showed that there are gadget chains in all sorts of open-source libraries, which means basically any application that's doing unsafe deserialization is subject to some kind of RCE, because it uses libraries that have these RCE gadget chains in them. In the year that followed, I heard a lot of application security researchers refer to it as the Java deserialization apocalypse, because everyone realized that their application was vulnerable to this sort of thing. So every conference, every convention had someone talking about this stuff in 2016. My favorite talk from that year was probably by Luca Carettoni at an OWASP meetup, where he did a really good job of explaining what these vulnerabilities are, what they look like, what exploits look like, and how you should remediate them. If you want to dive into this a bit more afterwards, that's definitely a good talk to go look at. You might have thought that was the end of it: if 2016 was the Java deserialization apocalypse, then it's all said and done. But at last year's Black Hat, Muñoz and Mirosh gave a survey of JSON parsing libraries that showed how all these other libraries can also potentially do unsafe deserialization, and you can be subject to just as much dangerous behavior as if you're using the Java ObjectInputStream. Up to that point most of the focus was really on the Java ObjectInputStream, and they did a survey not just in Java but across other languages like C# of other JSON parsing libraries where things can go wrong. And in case you think this is the last talk, this vulnerability class isn't going away: in October at AppSec USA someone's talking again about deserialization vulnerabilities and why you've got to protect yourself from them, because we haven't solved this yet. It's not gone.
So why are deserialization vulnerabilities so bad and interesting? If they were all really just like that first slide I showed, then they actually wouldn't pop up that much, because it's not that often that you have some class on your classpath that overrides something you meant to be safe with something dangerous. The reason they're so bad is that there are these things called magic methods: methods on classes that get automatically invoked by the deserializer before the deserializer ever even returns. That means dangerous behavior implemented in one of these magic methods can get invoked regardless of what data type you meant to be returned from that deserializer. Here's another example. This is exactly the same vulnerable endpoint from that first slide, but let's say there's some bad class on your classpath that's doing something unsafe inside one of these magic methods, like readObject. In this case it's just executing some string that it reads out of the ObjectInputStream. Even though my application isn't using EvilClass at all, even though it expects a User to come back, the deserializer is going to execute that readObject magic method before it ever returns. So it's going
to execute that Runtime.exec before the cast to User, so it doesn't matter what my application actually expected the data type to be. So what's the deal with magic methods? Maybe you've never even heard of them before; how common can they actually be? The answer is that they're really common, because all sorts of classes inside the JDK implement magic methods. HashMap and PriorityQueue are a couple of good examples, but they're all over the place. The reason these magic methods exist is that they allow classes to customize how they serialize and deserialize their data. If a HashMap just used the default serialization strategy, where it serialized all its hash tables and bins and buckets, then that serialized version probably wouldn't be interoperable between Java versions, because the implementation may change under the hood, and then everything would break when you tried to deserialize it. So instead it implements these magic methods: when you write out the object, instead of writing out all its hash tables it just writes out a list of key-value pairs, and inside its readObject method it expects to read in a list of key-value pairs and calls this.put on each key and value. That means that for each key it reads from the input stream, it calls hashCode and equals on it in order to put it into the HashMap. This gets you some additional known entry points: if you have some class on your classpath that does something dangerous inside hashCode or equals, we know we can wrap that class inside a HashMap and get from its readObject magic method into the dangerous hashCode method. And this is how we start building up a gadget chain. So here's a
really specific example of what a gadget chain might look like. Here's more or less what HashMap does inside its readObject method, and all it's doing is basically what I just said: it's reading keys and values out of a list and then calling put on them, and in particular it calls hashCode on the keys it reads out. Now let's say there's this class that exists on your classpath; this is an example of a class out of the Clojure library. It's basically a proxy object where, inside hashCode, it looks up an IFn interface in its map under hashCode and then invokes it. So inside that Clojure function map we, the attacker, could serialize some interesting IFn implementation.
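The entry point described so far can be reproduced with nothing but the JDK. The following is a harmless sketch, not the Clojure gadget itself: `ProxyLike`, `Handler`, and `LoggingHandler` are hypothetical stand-ins for the proxy class and the IFn interface, and the handler only appends to a log instead of doing anything dangerous. It shows that `HashMap.readObject` calls the key's `hashCode()` during deserialization, before the application's cast to `User` gets a chance to fail.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MagicMethodDemo {
    // Records what ran, so the demo stays harmless (no Runtime.exec here).
    static final List<String> LOG = new ArrayList<>();

    // The type the application expects to receive.
    static class User implements Serializable {
        String name = "alice";
    }

    // Hypothetical handler interface, standing in for Clojure's IFn.
    interface Handler extends Serializable {
        int invoke();
    }

    static class LoggingHandler implements Handler {
        final String message;
        LoggingHandler(String message) { this.message = message; }
        @Override public int invoke() { LOG.add(message); return 0; }
    }

    // Hypothetical proxy-like class: hashCode() dispatches to whatever handler
    // the deserialized (attacker-chosen) map holds under "hashCode".
    static class ProxyLike implements Serializable {
        final Map<String, Handler> handlers = new HashMap<>();
        @Override public int hashCode() {
            Handler h = handlers.get("hashCode");
            return h == null ? 0 : h.invoke();
        }
    }

    // Builds the attacker's payload: a HashMap whose key is the proxy.
    static byte[] buildPayload() {
        ProxyLike key = new ProxyLike();
        key.handlers.put("hashCode", new LoggingHandler("handler ran inside readObject"));
        Map<Object, Object> payload = new HashMap<>();
        payload.put(key, "value"); // hashCode() also fires here, at build time
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(payload);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Mimics the vulnerable endpoint: deserialize, then cast to User.
    static List<String> deserializeAsUser(byte[] wire) {
        LOG.clear(); // ignore calls made while building the payload
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            User user = (User) in.readObject();
            LOG.add("rendered " + user.name);
        } catch (ClassCastException e) {
            LOG.add("cast to User failed");
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(deserializeAsUser(buildPayload()));
    }
}
```

In the returned log the handler's entry comes first and the failed cast comes second, which is the ordering the talk relies on: the type check happens only after the magic method has already run.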
As an example, you could supply the compose function, which just has two member functions inside of it that it composes. As one of those functions we could supply the constant function, and as the other function we could supply eval. Then when you wrap all of this up in a nice package and tell your deserializer to deserialize it, it's automatically going to call readObject on your HashMap, that's automatically going to call invoke on this compose function, which is automatically going to call invoke on this constant function and pass the result into the eval function, and then do arbitrary code execution. So this is an
example of what that payload might look like using Jackson-style serialization, and it's exactly what I just described: you wrap things in a HashMap, and as its member you use this AbstractTableModel class with the dangerous hashCode implementation; as hashCode you use this compose function, and then you supply the values you want for each of those two functions inside there, and then you can execute whatever binary or command you want. So the important thing
to understand about gadget chains, and the thing that makes deserialization vulnerabilities so dangerous, is, as I showed you in that example and alluded to earlier, that which gadget chains can be constructed has nothing to do with what your application actually does. If there are classes on your classpath, they can be specified by the serialized payload, and your application can therefore be made to construct them and run whatever magic methods exist in those classes. So your code, as with that example, wouldn't have to have called any of those things. In fact, maybe there's no code anywhere, even transitively, that calls any of those methods, but by the mere fact that they exist on your classpath they can potentially be exploited. So what Java libraries are
vulnerable? Again, I'm focusing on Java, but this definitely applies to C# and PHP and lots of other languages. In Java, the ObjectInputStream, the one that's built into the JDK, is probably the most well-known and most studied one. But XStream is another library; it's an XML parser that, in its default configuration, can be used unsafely. And all these JSON parsing libraries have unsafe configurations where they can be induced to deserialize arbitrary types and therefore potentially do dangerous things. If you're interested in exactly when those libraries might be dangerous, you should definitely spend some time reading Muñoz and Mirosh; they did a really good survey of how and when these kinds of libraries can be misused. What's important is that as you begin studying these additional libraries beyond just ObjectInputStream, they end up having different magic methods that automatically get invoked, and they have different notions of what can be serialized. That's going to be really important later in the talk. So how do you know if your application is vulnerable? Finding potentially vulnerable applications is basically the same as finding a lot of other application security vulnerabilities, like XSS or SQL injection: all the vulnerability really is is some kind of attacker-controlled input flowing into one of these dangerous libraries, in this case ObjectInputStream or XStream or Jackson. Existing tools are already good at finding those vulnerabilities, because it's exactly the same thing as looking for some attacker-controlled string going into a SQL statement. So I'm not too interested in digging further into how you find those vulnerabilities. But what you do once you do find a vulnerability, that's the big question that I wanted to talk
about. One of the simple answers is: why don't you just use a better serialization strategy? Why use one of these dangerous libraries? Use something that's safe. Luca has this great quote from his talk in 2016: it's 2016, there are better options, why do you still use ObjectInputStream? I think that's really good advice if you're working on a new project, if you're building a new service. But what happens if you're not working on a new project?
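For a new service, one way to read "better options" is a wire format in which the payload never names a class. The sketch below is only an illustration of that idea; the talk does not prescribe a specific replacement, and the `User` class with its single `name` field is an assumption carried over from the earlier endpoint example. It reads a fixed schema with `DataInputStream`, so the receiving code, not the sender, decides which type gets constructed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class FixedSchemaUser {
    // Hypothetical User type with a single field, mirroring the talk's example.
    static final class User {
        final String name;
        User(String name) { this.name = name; }
        String render() { return name; }
    }

    // Writes only field values; no class name or type tag goes on the wire.
    static byte[] encode(User user) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF(user.name);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The receiver always constructs a User, whatever the bytes contain.
    static User decode(byte[] wire) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            return new User(in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(new User("alice"))).render());
    }
}
```

Malformed input can still make `decode` throw, but it cannot select a different class, so the classpath stops being part of the attack surface for this endpoint.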
So who recognizes these guys, or in particular the thing on the left? That's the original Netflix disc that got sent out to owners of a Wii so that you could stream Netflix from Wiis. That's got client code stamped on a disc that was sent out in 2010 that we still have to be able to speak to. So you might be in situations where you don't control your clients and can't readily update your IPC mechanism. The guy on the right is the first-generation Roku, and it's exactly the same thing: it's got firmware in there that needs to be able to talk to upstream services. Even if you're thinking you can just update firmware and update your IPC mechanism, if someone's got one of those in a closet and they pull it out in two years, at the very least we need to be able to speak the IPC mechanism that tells it to go fetch an update. So you can't necessarily just turn things off. And even if you're not in one of these contexts where you've got clients you can't easily update, it's a very costly operation to start ripping apart your IPC mechanism. If you need to update your server to speak something other than JSON or XStream or the ObjectInputStream binary format, then you've got to update your server, make sure you update all of your clients, and only once you finally tear down everything on the server side would you be safe. That's a lot of work even in an ecosystem where you control both the client and the server. At Netflix, where we've got a microservice ecosystem with thousands of applications, we're coming across these things, and we have to decide what to tell a developer about how important it is to patch the issue we found. We need to answer the question: is it worth the effort to drop what I'm doing and spend three or four weeks, or maybe more, going through exactly that process of updating all your clients and services in order to patch that vulnerability?
Is the deserialization vulnerability we just found even exploitable? That's not immediately obvious when all you know is that some kind of untrusted input flows into one of these unsafe libraries. So how do you find
exploits for a deserialization vulnerability? How do you find these gadget chains? ysoserial, which Frohoff maintains, is one of the most well-known projects in this space, and it's got a bunch of gadget chains for ObjectInputStream. marshalsec is another project in this space that has wider breadth and covers gadget chains for some of these other deserialization libraries. But they're both basically projects that have known gadget chains, and you can compare your application to that list of bad libraries, where you know there's some version of a particular library in which you can construct a gadget chain. That doesn't tell you about something that might be unique to your application: maybe there's a gadget chain that only shows up when some class in your application is combined with classes in other libraries, and only when they're all put together do you get an interesting gadget chain. Furthermore, those are all bound to the known deserialization libraries. What if you're using something new or something custom that is vulnerable to these same kinds of attacks but isn't one of the well-studied ones? How do you answer the question, is my vulnerability exploitable? Besides the couple that I mentioned, there are a bunch of other existing tools in this space. joogle is a good tool for programmatically querying metadata about your classpath. There's Java Deserialization Scanner, which is a Burp Suite plugin that mostly uses payloads from ysoserial to detect whether or not you're vulnerable to one of these known gadget chains. The NCC Group Burp plugin was released earlier this year; it's another dynamic scanner that's mainly based on payloads from Muñoz and Mirosh's work at last year's Black Hat, so it's more focused on the JSON deserializers. But again, these are all tools that might help you but don't immediately answer the question: is there
something unique to my application that makes it vulnerable to one of these exploits? So given that I
wasn't able to find a tool that did exactly what I wanted, I went about the task of asking how we can evaluate the risk of this kind of vulnerability, and what we really want to be able to answer. What we want to answer is: what is the risk, how important is it to remediate a vulnerability? We want to know if that deserialization vulnerability is exploitable, and if it is exploitable, what exploits are possible; RCEs tend to be much more interesting than DoS. If that's our goal, just to evaluate the risk, we don't necessarily have to be perfect. We don't have to solve this problem once and for all; a reasonable overestimation of risk is acceptable. And we don't actually have to generate payloads: knowing what kinds of payloads might be constructible is also a really useful piece of information. So if
those are the requirements for what I want to set out to do, then specifically what I'd like to do is build something that finds gadget chains. I'm not looking for vulnerabilities; I'm only going to use this new tool if I already know my application is vulnerable. But it needs to be able to look at the entire classpath because, as I said at the beginning, it doesn't matter what code is in my application; what matters is the sum total of classes on my classpath. It should err on the side of false positives, because a reasonable overestimation of risk is more useful, and I don't want to tell developers to drop what they're doing and fix something unless I have good reason to believe that there's something exploitable in it. And lastly, it should operate on Java bytecode, because we've got like a million plus one languages written on top of the JVM now, and I don't want to write something that has to understand Groovy and Scala and Clojure and Kotlin and whatever comes out next week. If I just operate on bytecode, then I've got it covered. So I
put together a tool that I called Gadget Inspector, which is a Java bytecode analysis tool for finding gadget chains. That's what it does. So the way it works
is that it operates on a classpath, so you specify either some JARs and their dependencies or an entire WAR, basically your entire application, and then it reports discovered gadget chains. A gadget chain is really just a sequence of method invocations where one invokes the next, starting at some known entry point and getting to some kind of dangerous behavior. It does a little bit of simplistic symbolic execution to figure out when attacker-controlled arguments can get passed into a method and then on to the next one in the chain. Most importantly, because of the context we're working in, this tool is able to make a lot of simplifying assumptions that make this pretty easy to do. It's not something where you have to have written a thesis on symbolic execution in order to understand or implement it. So specifically, how does this tool work? The first step is just to enumerate everything on your classpath. You want to figure out the whole class hierarchy and all the method hierarchies, so that when you see a magic method calling another method, like HashMap calling hashCode, you know all the implementations of hashCode that you might jump to. So the first step is just enumerating all that stuff, and that's not terribly difficult; you can use the plain old Java reflection APIs to do that if you want to. But it's an important first step for the rest of the analysis. Where things start getting interesting is when I want to understand the data flow inside an application. The first thing I wanted to discover is what I call pass-through data flow. What I mean is: if an attacker can control the input to a function, does that attacker-controlled data get returned back out of the function? In this case, with the constant function, if an attacker controls the implicit this argument, then they're going to be able to control this.value and therefore the return value. So that's one
of the first assumptions that goes into this: treating an object as tainted. This is basically taint analysis that I'm doing here, and if you're not familiar with that or don't really know what I mean by taint in this context, all I really mean is that I'm thinking of it as being attacker-controllable. So if an object is tainted, then every member on that object is also considered tainted. That's a pretty reasonable assumption, because if we are thinking of an object as being attacker-controlled, that means it came out of the deserialization library, so all the members on that object are also in that serialized payload. That means when we look at a function like this, we can enumerate this piece of passthrough data flow. All this kind of funky custom syntax means is that if the attacker controls argument 0, which in this case is the implicit this, then the return value is also considered attacker-controlled, and that's just because we return this.value.

As one other example, where things start getting a little hairier, there's this default function, which wasn't on a previous slide. All this does is look at an argument, and if it's not null it returns it, and otherwise it invokes some other function, like a constant function. In this case we've got a branch condition, which is something that's also really hairy to deal with if you're doing any kind of static or symbolic analysis. But here we make another assumption, which is that all branch conditions are satisfiable; I'm not going to worry about whether or not I can go down different paths. This is probably one of the weakest assumptions that's made in this, but it's also one of the easiest ones to make, because in practice, if you're inside these magic methods or going down a gadget chain where all this stuff is attacker-controllable (because it would have to be for you to get there in this chain), then basically all of the variables and arguments going into a branch condition are attacker-controllable, so
usually an attacker can tweak these things to get down whatever branch they want to. So if we assume all branches can be walked down, then we end up with these passthrough data flows. In this case the first argument just gets directly returned if we go down the true path, and if we go down the false path, based on the first passthrough data flow we discovered, the return value of f.invoke is going to be considered tainted as well, so we enumerate that.

Step three is basically exactly the same thing. It's the same symbolic execution of just walking through what data flows where, but this time, instead of looking at return values, we care about where data flows into subsequent method calls, and we're going to use the data from step two to enhance this enumeration. Let's look at that dangerous hashCode method that we had earlier and see how that shakes out. In this case we would end up enumerating these passthrough call graphs, or method calls. Again, some more funny custom syntax, but all I'm saying here is that if argument 0, the implicit this, is attacker-controllable, then that's going to flow in as argument 1 to what I call IFn.invoke; in this case all I know is that it's the IFn interface. We get that literally because this gets passed in as argument 1 to that function there, so that one's kind of easy to figure out. But for f.invoke, f comes out of this map, which is a member of this. Again, because of that assumption where we assume all members are attacker-controllable, we know that f would be attacker-controllable, so f, which gets passed in as the implicit this to IFn.invoke, would also be attacker-controllable. That's where we get that from.

Just to go through one more example, this is what you would get if you looked at the compose function, again from the previous slide. All we're really doing in this symbolic execution is stepping through bytecode one line at a time, and it's actually kind of easier to understand what's going on when you look at it that way. But at a higher level, we see argument 1 gets passed in as argument 1 to IFn.invoke. Then we see f1, which is a member of argument 0, the implicit this, gets passed in as the implicit this to IFn.invoke. And finally the value that gets returned from that, based on our analysis from step two, would also be considered attacker-controllable, and that gets passed in as argument 1 to f2. So it's just a lot of walking through these functions and enumerating these things. Really there's not a lot very deep going on here; it's just kind of a lot to keep track of, but computers are good at that.
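The passthrough summaries from step two can be sketched in a few lines of Java. This is a toy model under stated assumptions, not Gadget Inspector's actual bytecode analysis: the names `FnConstant` and `FnDefault` are stand-ins reconstructed from the spoken description (the slides are not part of this transcript), and the small `Expr` type is a hypothetical substitute for real bytecode. It applies the two assumptions described above: a field of a tainted object is tainted, and every branch is reachable.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of step two: which argument indices can flow to a method's return value.
// Argument 0 is the implicit `this`; a field of a tainted object counts as tainted.
final class PassthroughSketch {

    /** Something a method may return: an argument (or one of its fields), or a call result. */
    static final class Expr {
        final int argIndex;        // used when callee == null
        final String callee;       // null for a plain argument
        final List<Expr> callArgs; // callArgs.get(0) is the callee's `this`

        private Expr(int argIndex, String callee, List<Expr> callArgs) {
            this.argIndex = argIndex;
            this.callee = callee;
            this.callArgs = callArgs;
        }
        static Expr arg(int index) { return new Expr(index, null, Collections.<Expr>emptyList()); }
        static Expr call(String callee, Expr... args) { return new Expr(-1, callee, Arrays.asList(args)); }
    }

    /** Caller arguments that taint the value of e, given the summaries computed so far. */
    static Set<Integer> sources(Expr e, Map<String, Set<Integer>> summaries) {
        Set<Integer> out = new TreeSet<>();
        if (e.callee == null) {
            out.add(e.argIndex);
            return out;
        }
        Set<Integer> calleeSummary = summaries.get(e.callee);
        if (calleeSummary == null) return out; // unknown callee: assume nothing flows through
        for (int j : calleeSummary) {
            if (j < e.callArgs.size()) out.addAll(sources(e.callArgs.get(j), summaries));
        }
        return out;
    }

    /** Iterates to a fixed point; every branch's return expression is assumed reachable. */
    static Map<String, Set<Integer>> summarize(Map<String, List<Expr>> returnsByMethod) {
        Map<String, Set<Integer>> summaries = new LinkedHashMap<>();
        for (String m : returnsByMethod.keySet()) summaries.put(m, new TreeSet<Integer>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<Expr>> entry : returnsByMethod.entrySet()) {
                for (Expr e : entry.getValue()) {
                    changed |= summaries.get(entry.getKey()).addAll(sources(e, summaries));
                }
            }
        }
        return summaries;
    }

    /** Hypothetical stand-ins for the constant and default functions described in the talk. */
    static Map<String, List<Expr>> example() {
        Map<String, List<Expr>> m = new LinkedHashMap<>();
        // FnConstant.invoke(arg): return this.value;
        m.put("FnConstant.invoke", Arrays.asList(Expr.arg(0)));
        // FnDefault.invoke(arg): return arg != null ? arg : this.f.invoke(arg);
        m.put("FnDefault.invoke", Arrays.asList(
                Expr.arg(1),
                Expr.call("FnConstant.invoke", Expr.arg(0), Expr.arg(1))));
        return m;
    }

    public static void main(String[] args) {
        // prints {FnConstant.invoke=[0], FnDefault.invoke=[0, 1]}
        System.out.println(summarize(example()));
    }
}
```

The summary for `FnDefault.invoke` contains both 0 and 1 because both branches are followed: the true branch returns argument 1 directly, and the false branch returns the result of a call whose `this` is a field of argument 0. Step three would reuse the same summaries, but record flows into call arguments instead of flows into the return value.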
So step four, the next-to-last step, is just enumerating known entry points, and that's basically just using all the known tricks that researchers have come up with over the last few years to figure out how to get into interesting gadget chains. For example, we see this hashCode method, we know it overrides Object.hashCode, so we can enumerate that as an entry point. All right, that step's super easy, especially after the last few. But this does highlight one limitation that I want to point out, which is that this does rely on known tricks. Knowing that we can get to hashCode is something we could have derived from this analysis, just by going through that symbolic execution of the readObject method of HashMap. But there are other clever tricks that researchers have come up with, like wrapping things in a dynamic proxy, which then calls InvocationHandler.invoke, that we wouldn't be able to derive. So there's definitely room for more gadget chains that this thing might be missing, just because there might be more clever tricks that aren't hard-coded into this guy. So, all right,
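Entry-point enumeration for the standard Java deserialization case can be sketched with plain reflection, as below. This is only an illustration: the three nested classes are hypothetical, the list of magic methods (`hashCode`, `equals`, `readObject`) is a small illustrative subset of the known tricks, and reflection is used here as a stand-in for whatever class-path scanning the real tool performs.

```java
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EntryPointSketch {

    // Hypothetical gadget-like class: Serializable and overrides hashCode().
    static class FnBackedModel implements Serializable {
        @Override public int hashCode() { return 42; }
    }

    // Serializable, but declares none of the magic methods.
    static class PlainBean implements Serializable {
        String name;
    }

    // Overrides hashCode() but is not Serializable, so it cannot appear in a payload.
    static class NotSerializable {
        @Override public int hashCode() { return 7; }
    }

    /** True for the small, hard-coded set of methods treated as known entry points here. */
    static boolean isMagic(Method m) {
        Class<?>[] p = m.getParameterTypes();
        switch (m.getName()) {
            case "hashCode":   return p.length == 0;
            case "equals":     return p.length == 1 && p[0] == Object.class;
            case "readObject": return p.length == 1 && p[0] == ObjectInputStream.class;
            default:           return false;
        }
    }

    /** Lists "Class.method" for every magic method declared by a Serializable class. */
    static List<String> entryPoints(List<Class<?>> classes) {
        List<String> found = new ArrayList<>();
        for (Class<?> c : classes) {
            if (!Serializable.class.isAssignableFrom(c)) continue;
            for (Method m : c.getDeclaredMethods()) {
                if (isMagic(m)) found.add(c.getSimpleName() + "." + m.getName());
            }
        }
        return found;
    }

    static List<String> demo() {
        return entryPoints(Arrays.<Class<?>>asList(
                FnBackedModel.class, PlainBean.class, NotSerializable.class));
    }

    public static void main(String[] args) {
        // prints [FnBackedModel.hashCode]
        System.out.println(demo());
    }
}
```

Because the magic-method list is hard-coded, this has the same limitation the talk points out: an entry trick that is not in the list, such as a dynamic proxy reaching `InvocationHandler.invoke`, is never discovered.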
very last step: now that we've enumerated all that stuff, the only thing that we have left to do is literally just an algorithms-101 breadth-first search on this call graph, in order to see if we can get from one of these known sources to a method that does something interesting. Using exactly the stuff we've enumerated to build up that gadget chain from some of those first slides, we would look at that entry point, and then, looking at the methods that it calls, we'd want to step into each of those and see what methods those things would subsequently call. Here's where we make one of the last assumptions, which is that any method implementation can be jumped to. Down here we see we're calling IFn.invoke, and we don't have a specific implementation that we're jumping to there, so as we're going through this call graph we're going to look at every implementation of that, as long as that class is considered serializable. The reason we assume we can do that is literally because that's how we build up gadget chains: if we control the data type of one of those members, and that determines what implementation of IFn this is, then we can build up our gadget chain in such a way as to get to whatever implementation we want.

So, for example, we might use this call in order to get into FnCompose.invoke, and then, looking at what functions that calls, we're going to end up walking through each of those invocations. One in particular might be calling IFn.invoke where we pass in a tainted argument 1 and use the eval function as our implementation. Inside there we would see we call Runtime.exec, and we know that does something interesting and something dangerous, so we would output this as our gadget chain. By walking through all those steps, this thing would look at that library and spit out this gadget chain.

The one last limitation that I will point out here is that this of course relies on knowing
what the interesting methods, or interesting sinks, are that we should output gadget chains for. There's lots of good stuff in the JDK: reading files, writing files, Runtime.exec, opening a URL, doing DNS lookups, sleeping. There are all kinds of side effects that you might be interested in, so adding more to its list of interesting sinks is a way to improve this tool, but even a limited set gets you pretty far.

One of the things that I mentioned at the top of this talk that was really important to me is that there are a lot of different libraries now where we know there are deserialization vulnerabilities. As part of this analysis I mentioned a few times that there are things like known entry points that we want to start with, or that we consider any class that's serializable to have a method we can jump to. All those things are parameterizable in this analysis. For JRE deserialization, anything implementing Serializable is considered a serializable class. For XStream, it depends on what converters you've enabled, so it depends on how your application is set up. For Jackson, basically any class with a no-arg constructor is considered serializable, and you can also only jump into constructors as your entry points. So there are lots of differences between libraries, but all those things can be easily tweaked and parameterized in this analysis. This is what I think makes this tool especially powerful: you might be working in some kind of custom context where you're doing unusual forms of deserialization that happen to be unsafe but aren't well studied yet by a project like marshalsec or ysoserial, and this is a tool that can help give you insight into those kinds of libraries. So, all right, I
described this tool. It does a whole bunch of funky things that maybe you did or didn't follow, depending on how much sleep you guys got last night, and I claimed that at the end of the day this thing can find some gadget chains. Does it live up to the hype? The first thing I did after writing this thing, on like a 10-hour flight to Europe, was run it on some open-source libraries to see: does this thing actually do anything useful, can it find some stuff? Because at the very least it should be able to find gadget chains we know exist, because of the stuff that Frohoff and Lawrence discovered in 2015. So, all right: built
this tool, ran it against the hundred most popular libraries, at least according to mvnrepository.com, and looked for exploits against the standard Java deserialization library. It did successfully rediscover some known gadget chains, so cool, it's at least doing what I claim it's supposed to do. It didn't find a ton of classes implementing Serializable, so it didn't have a ton of new findings, but it did have some, and I'm going to talk about those. It did have a handful of false positives, because this does try to err on the side of false positives, but not as many as you'd expect: just a dozen or so that are easy to rule out, and it's mostly because reflection is hard to reason about. So, all right, old
gadget chains: what did it discover? It rediscovered the Commons Collections gadget chain. The reason this gadget chain was so interesting when Frohoff and Lawrence first discovered it is that it's the 38th most popular dependency, at least when I looked this up a couple of months ago, so it's everywhere; every application more or less ends up pulling this thing in as some kind of transitive dependency. This is more or less what that gadget chain looks like: you wrap your object inside a dynamic proxy, then you get into this invocation handler, and then you go to the Commons Collections LazyMap, which ends up doing some reflection things and lets you basically call any method you want. So hooray, it found that old gadget chain; it's finding things that I expect it should be able to find. But the first thing that it actually found was this new gadget chain
inside Clojure, and this is basically the gadget chain that I've been discussing and using as the example leading up to this. This was super interesting because this was the sixth most popular dependency according to mvnrepository.com. What this gadget chain did, at least in the version that it originally found, is load a Clojure file from disk and execute it, which may or may not be interesting. But it also turned out that by tweaking that last step to call something else instead of load-file, it would execute arbitrary Clojure that you pass in, so it's basically RCE. That's super interesting: if there are people who patched their version of Commons Collections and decided that they're good now, and they're still doing unsafe deserialization, chances are they're probably pulling in this dependency, so they're still in hot water. Hopefully in the last couple of years people have figured out that they shouldn't be doing unsafe deserialization, but people continue to surprise me.

I did report this to the clojure-dev mailing list when I discovered it, and they decided: who's even serializing this class anyway? We're just going to turn off serialization for that class. That's great, so all releases since 1.9.0 have disabled serialization on that AbstractTableModel class, and that hashCode entry point doesn't exist anymore. Yay, we're making the world safer one gadget chain at a time.

More recently I discovered some new gadget chains in Scala using this tool. Scala is the third most popular dependency according to mvnrepository.com. This gadget chain isn't an RCE, so maybe it's not as interesting, but it does allow you to write or overwrite a zero-byte file on disk, and that's an interesting DoS exploit, because you can overwrite some application resource file, zero it out, and then your app goes down. So that's possibly interesting. There's a very
similar one that Gadget Inspector also found that can do an SSRF: it does a GET on an arbitrary URL, and it's basically the same thing. This is something that Gadget Inspector spat out, and I've got examples of the actual gadget chain payloads on my fork of ysoserial that you can check out after this talk. So these are not just things that it found and that I'm claiming could actually work if you built the gadget chain; I did actually build the corresponding gadget chains and verify that these things work. So, cool stuff. Just before
this talk, a couple of weeks ago, I reran Gadget Inspector on the latest release of Clojure, and it turns out that the exact same gadget chain I found before still exists in Clojure, just with a different entry point. There's another class implementing hashCode that delegates to a function, so you can actually do exactly the same gadget chain using just a different entry point, and that's been in every release since 1.8.0. So apparently there's still an RCE gadget chain in every release of Clojure that's out there, and I need to follow up with the Clojure guys and see if they want to lock this one down too. But really I hope this is just hammering in the point that you've got to stop doing unsafe deserialization, guys. There are gadget chains everywhere. But all right, enough of that.
We've looked at open-source libraries, but what I was getting at at the top of this talk was that what I really wanted to find was gadget chains that are specific to the applications I'm looking at, so that I can go back to developers and tell them how important it is: do you need to patch this thing right away, or can you wait until your next release so that you can finish up these critical features? So let's look
at internal web app number one. This one had a potentially dangerous use of Jackson deserialization: an attacker could specify any class to instantiate and put an arbitrary body in there. But there were a lot of limitations on it. It was using more or less the default configuration of Jackson, so you could only deserialize classes with no-arg constructors, your only entry points are going to be no-arg constructors, and most of the time classes don't do anything terribly interesting in constructors. But it did have a 200-megabyte class path and was bringing in like six dozen dependencies, so there might be something there, and I don't really have enough time to manually go through every constructor of every class on that class path to find out if any of them do anything interesting. So I ran Gadget Inspector, and it found nothing. All right, that wasn't the cool exploit and bombshell you guys were hoping for, but it saved a bunch of time, because no one had to go through every constructor to decide whether it was important to remediate this vulnerability. We could tell the developers that, hey, it's cool if you wait until the next time that you're able to get to this. But the story
doesn't end there. So, internal web app number two. This one was really interesting because it used a non-standard deserialization library, something with some custom in-house tweaks that put some really unique constraints on it. It invoked readResolve magic methods but not readObject, and it was able to deserialize any class on the class path; a class didn't have to implement Serializable, except for the rest of these constraints down here. One was that member field names couldn't have dollar signs in them, because that screwed up the binary format of this thing. Non-static inner classes always have this implicit $outer member name, so basically anything that happened to be a non-static inner class wasn't serializable anymore. It didn't have any support for serializing arrays or generic maps. And most importantly, every member value had to be non-null, and that meant that the type of every member value also had to satisfy all of these constraints, because you couldn't leave it null; you had to serialize it as something, so it had to satisfy all these constraints. That also meant that you couldn't have any data types that had any character arrays or byte arrays in them, and you couldn't have any data types that had some kind of self-referential or recursive type, because, like, Thread for example has a parent which is of type Thread, so there's no way to have non-null members for all of those inside the payload.

So it was really, really hard to determine what classes were even considered serializable in this context, much less whether or not you could actually build a gadget chain that went through those particular classes. But that's the sort of thing where Gadget Inspector has the functionality to stick in all of those constraints and then ask it: what do you find? And this is what it found. This is a twelve-step-deep gadget chain. It starts at readResolve, like I promised, and the bottom thing that it does here is copy
a file from any arbitrary location to any other arbitrary location. That was cool because it allowed us to do things like exfiltrate private keys off the box by dropping them in the web app resources directory. You can look at this really closely, but I feel like you don't really have to; the thing that's really interesting is just looking at the package names that are showing up
here. Here I've highlighted the different dependencies that this gadget chain is flowing through, and if you count the app itself and the JRE, there are seven different libraries involved in this gadget chain. It's something that you would never have found by analyzing any of those individually, and you would never have found it by looking at the set of dependencies without also pulling in the classes from the application itself. But it's something that just lit up as soon as I ran Gadget Inspector on it, so that's super cool. That's what I'm talking about, where this thing has the power to look at the entire class path and is able to use the parameterization of what it means to be serializable according to your custom constraints. So that was a really
cool gadget chain that this thing found. But also, spending just five or ten minutes staring at this, you see this step in the gadget chain, step eight, which is StreamPumper.run. What that actually does is copy an input stream to an output stream. So if you look at this for just a few minutes, you realize you can tweak this last thing to copy an arbitrary string input stream to a file output stream and be able to write an arbitrary string to an arbitrary file. So I was able to write a JSP to my web app resource directory and get RCE with this gadget chain. This was a really cool result to come out of this, and it immediately allowed us to say: all right, we've got to fix this thing now, because you're getting RCE on this really sensitive service. So this was really powerful, and it saved us the time of trying to actually build up this thing. As a matter of fact, this was a web app that we actually had a pen test team looking at. They identified that it was vulnerable to this kind of vulnerability, but they spent a couple of days looking at it here and there and basically weren't able to decide whether or not you could do anything with it. Gadget Inspector took about 15 minutes to run on this application and spit this out. So that is a huge time saver, and I think a huge win for both pen testers and AppSec engineers trying to understand deserialization vulnerabilities in applications.

Obviously there's going to be a lot of room for improvement in this kind of tool. Reflection continues to be the bane of existence for anyone doing code analysis of any form. It's hard to understand, and this tool basically just treats any kind of reflection as an interesting sink, just because it doesn't know how to do any better, but that also leads to a lot of false positives and some blind spots, so it can be improved there. I also mentioned that there are a number of assumptions and limitations that I made in the course of building this tool, and while I think most
of those were reasonable given the context we're working in, it's obviously something that could be improved. That being said, I think diving down into this kind of automatic analysis for deserialization vulnerabilities is territory that has a lot of room for more discovery and more time spent on it, because this is just kind of a functional prototype, but it's already saved us a bunch of time as we've been doing AppSec reviews of our internal applications. This was something that I specifically wrote for Java, to understand Java bytecode, but I think all the techniques I described here apply equally well to C# and PHP and all these other languages that have these kinds of libraries that let you specify data types and therefore can be used dangerously. This tool is open source, so I encourage you to go look at it, check it out, see if you want to do a PR on it or improve it, or just use these ideas and build your own thing that's better.

But also, most importantly, deserialization vulnerabilities aren't gone yet. They're still relevant and they're still interesting, and I think exploits can and will be more complex as time goes on. This is the first time I've ever seen a gadget chain that long, and I think we need better tools to help us better understand those sorts of vulnerabilities. So if you've got questions, we've got about five minutes; you can also hit me up online later. Thank you all for coming. [Applause]
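For readers who want to experiment with the ideas from the talk, the final search step and the pluggable notion of "serializable" can be sketched as follows. Everything here is an assumption made for illustration: the method names in the toy call graph are hypothetical stand-ins, the graph already has interface calls expanded to every implementation (the "any implementation can be jumped to" assumption), and the `serializable` predicate is a hypothetical hook standing in for library-specific rules such as "implements Serializable" or "has a no-arg constructor".

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class ChainSearchSketch {

    // Hard-coded list of interesting sinks; a real list would be much longer.
    static final Set<String> SINKS = new HashSet<>(Arrays.asList("Runtime.exec"));

    /** "FnEval.invoke" -> "FnEval". */
    static String ownerOf(String method) {
        int dot = method.lastIndexOf('.');
        return dot < 0 ? method : method.substring(0, dot);
    }

    /** Breadth-first search from an entry point to the first reachable sink. */
    static List<String> findChain(Map<String, List<String>> calls, String entry,
                                  Predicate<String> serializable) {
        Map<String, String> parent = new HashMap<>(); // also serves as the visited set
        Deque<String> queue = new ArrayDeque<>();
        parent.put(entry, null);
        queue.add(entry);
        while (!queue.isEmpty()) {
            String method = queue.poll();
            if (SINKS.contains(method)) {
                LinkedList<String> chain = new LinkedList<>();
                for (String cur = method; cur != null; cur = parent.get(cur)) chain.addFirst(cur);
                return chain;
            }
            List<String> callees = calls.get(method);
            if (callees == null) continue;
            for (String callee : callees) {
                if (parent.containsKey(callee)) continue;
                // An implementation is only reachable if its class can appear in the payload.
                boolean reachable = SINKS.contains(callee) || serializable.test(ownerOf(callee));
                if (!reachable) continue;
                parent.put(callee, method);
                queue.add(callee);
            }
        }
        return Collections.emptyList();
    }

    /** Toy call graph; the interface call is expanded to every known implementation. */
    static Map<String, List<String>> exampleGraph() {
        List<String> ifnImpls = Arrays.asList("FnConstant.invoke", "FnCompose.invoke", "FnEval.invoke");
        Map<String, List<String>> calls = new LinkedHashMap<>();
        calls.put("FnBackedModel.hashCode", ifnImpls);
        calls.put("FnCompose.invoke", ifnImpls);
        calls.put("FnEval.invoke", Arrays.asList("Runtime.exec"));
        return calls;
    }

    public static void main(String[] args) {
        // Every class allowed: prints [FnBackedModel.hashCode, FnEval.invoke, Runtime.exec]
        System.out.println(findChain(exampleGraph(), "FnBackedModel.hashCode", owner -> true));
        // FnEval excluded by the serializability rule: prints []
        System.out.println(findChain(exampleGraph(), "FnBackedModel.hashCode",
                owner -> !owner.equals("FnEval")));
    }
}
```

Because the search is breadth-first, it reports the shortest chain in this toy graph, which skips the compose step mentioned in the talk. The sketch also ignores which arguments are tainted along each edge, which the real analysis tracks using the data from steps two and three.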