Automated Discovery of Deserialization Gadget Chains
Formal Metadata
Number of Parts | 322
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/39698 (DOI)
Transcript: English (auto-generated)
00:00
And so let's talk about deserialization vulnerabilities. Before I get into it, just a couple words about myself. My name is Ian Haken. I'm a senior software security engineer at Netflix. I'm on the platform security team where we make a bunch of tools to keep our microservice ecosystem safe. Download the slide deck afterwards. We talk a lot about all the cool stuff we do, so you can check that out after the talk.
00:22
But today I'm talking about deserialization gadget chains. So I'm gonna start by just answering the obvious question, what is a deserialization vulnerability, and then getting into the question of what is a deserialization gadget chain. And ultimately what I wanna talk about is a new tool that I built for understanding gadget chains and of course the fun stuff at the end,
00:40
some of the new exploits that that tool was able to uncover. So what is a deserialization vulnerability? So in object-oriented languages like Java, and I'm mostly gonna be using Java examples in this talk, code is contained in classes and classes hold your data alongside the code. And that's the whole point of object-oriented design, and that gives you cool features like polymorphism.
01:02
But this means that if you control the type of data, if you're able to specify what data type something is, then you're implicitly controlling what code gets run. So let me give you an example. So this is kind of a classic Java deserialization vulnerability. It's a REST endpoint that reads in a POST body and passes it into an ObjectInputStream,
01:22
and then you read some object out of it, and in this case we're casting that object to a User and calling render on it. So what the developer might intend is that this is some User class that exists on the class path, and so the POST body that gets sent in is some serialized version of this. It has a name. When you call render on it, it returns that name.
01:41
Totally innocuous. Nothing interesting can really happen with this. But where you start getting into dangerous territory is if maybe you had something like this on your class path. So it extends User, and it's a ThumbnailUser. The intent is that there's some member that specifies a file path with the thumbnail of that user, and when you call render, it reads that file from disk. So if an attacker sends a ThumbnailUser
02:02
to this endpoint instead of a regular User, then when it calls user.render, they can read any file off the disk and get that returned. So that's what I mean when I say that controlling data types means you end up controlling what code gets executed. So why am I talking about deserialization today? That's a 2016 topic. This is not new.
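To make the User and ThumbnailUser example concrete, here is a minimal sketch of it in plain Java. The slide code is not part of this transcript, so every name here (`TypeControlsCodeDemo`, `handle`, `serialize`, `writeTempFile`, the field names) is an assumption, and `handle` only stands in for the REST endpoint by taking the POST body as a byte array.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

class TypeControlsCodeDemo {
    // The type the developer expects: render() only returns a name.
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        User(String name) { this.name = name; }
        String render() { return name; }
    }

    // Hypothetical subclass that happens to be on the class path: render() reads a file.
    static class ThumbnailUser extends User {
        private static final long serialVersionUID = 1L;
        final String thumbnailPath;
        ThumbnailUser(String name, String thumbnailPath) {
            super(name);
            this.thumbnailPath = thumbnailPath;
        }
        @Override String render() {
            try {
                return new String(Files.readAllBytes(Paths.get(thumbnailPath)), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Java-serializes any object to bytes (what an attacker would send as the body).
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Stand-in for the endpoint: deserialize the body, cast to User, call render().
    static String handle(byte[] postBody) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(postBody))) {
            User user = (User) in.readObject(); // the sender chose the concrete type
            return user.render();               // so the sender chose which render() runs
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Helper for the demo: creates a throwaway file playing the role of a sensitive file.
    static String writeTempFile(String contents) {
        try {
            Path p = Files.createTempFile("demo", ".txt");
            Files.write(p, contents.getBytes(StandardCharsets.UTF_8));
            p.toFile().deleteOnExit();
            return p.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(serialize(new User("alice"))));
        String secret = writeTempFile("top secret");
        System.out.println(handle(serialize(new ThumbnailUser("mallory", secret))));
    }
}
```

The cast to `User` succeeds for both payloads, which is why the cast alone offers no protection: the subclass is a valid `User`, and its overridden `render()` is what executes.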
02:20
This is something that we've been thinking about for a little while. But I mean, honestly, this class of vulnerabilities really goes back to even before 2016. So some of the first mentions of it go all the way back to 2006. Marc Schönefeld gave a talk at Black Hat that year and kind of identified how some application containers
02:41
basically were subject to this kind of vulnerability. They were using ObjectInputStream in an unsafe way, and you could get code execution on them. But the talk that really kind of put the spotlight back on the subject was given by Frohoff and Lawrence in 2015 at AppSec Cali. And this just really kind of blew up this vulnerability class because they showed
03:01
that there's these gadget chains that exist in all sorts of open source libraries that mean basically any application that's doing unsafe deserialization is subject to some kind of RCE. And it's because they utilize these libraries that have these RCE gadget chains in them. So in the year that followed, I've heard a lot of application security researchers
03:21
refer to that as like the Java deserialization apocalypse because everyone realized that their application was vulnerable to this sort of thing. So every talk, every conference, every convention had someone talking about this stuff in 2016. My favorite talk from that year was probably by Luca Carettoni at an OWASP meetup where he just did a really good job of kind of explaining what these vulnerabilities are,
03:43
what they look like, what exploits look like, and how you should remediate them. So if you really want to dive into this a bit more after, that's definitely a good talk to go look at. But you might have thought that was the end of it. If 2016 was the Java deserialization apocalypse, then it's all said and done. But at last year's Black Hat, Muñoz and Mirosh
04:01
gave a survey of JSON parsing libraries that talked about how all these other libraries also can potentially do some unsafe deserialization and you can be subject to just as much dangerous behavior as if you're using the Java ObjectInputStream. Up to that point, most of the focus was really on this Java ObjectInputStream.
04:20
And they did a survey not just in Java, but across other languages like C# of just other JSON parsing libraries where things can go wrong. And in case you think that was the last talk, or basically this is the last talk, this vulnerability class isn't going away. In October at AppSec USA, there's someone talking again about deserialization vulnerabilities and why you've got to do stuff
04:41
to protect yourself from them because we haven't solved this yet. It's not gone. So why are deserialization vulnerabilities so bad and interesting? If they were all really just like that first slide I showed, then they actually wouldn't pop up that much because it's not that often that you have some class on your class path that does something dangerous that overrides something where you meant it
05:00
to do something safe. And the reason that they're so bad is because there's these things called magic methods. And what those are is methods on classes that get automatically invoked by the deserializer before the deserializer ever even returns. So that means that dangerous behavior that's implemented in one of these magic methods can get invoked regardless of what data type
05:22
you meant to be returned from that deserializer. So here's another example. So this is exactly the same dangerous endpoint or vulnerable endpoint from that first slide. But let's say there's some bad class you have on your class path that's doing something unsafe inside one of these magic methods like readObject. So in this case, it's just executing some string
05:41
that it's reading out of that ObjectInputStream. But even though my application isn't using EvilClass at all, even though it expects a User to come back, the deserializer is going to execute that readObject magic method before it ever returns. So it's gonna execute that Runtime.exec before the cast to User. So it doesn't matter what my application
06:01
actually expected the data type to be. So what's the deal with magic methods? Maybe you've never even heard of them before. How common can they actually be? And the answer is they're actually really common because all sorts of classes inside the JDK implement magic methods. And so HashMap and PriorityQueue
06:20
are a couple of good examples, but they're all over the place. And the reason that these magic methods exist is because it allows classes to customize how they serialize and deserialize their data. So if you had a HashMap that just used the default serialization strategy where it serialized all of its hash tables and different maps and bins and buckets, then that serialized version of the HashMap
06:42
probably wouldn't be interoperable between Java versions because they may change their implementation under the hood and then everything would break when you tried to deserialize it. So instead, what it does is it implements these magic methods where when you try to write out the object, instead of writing out all of its hash tables, it just writes out a list of key-value pairs.
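As a rough illustration of that customization, the sketch below shows a class that keeps its internal table out of the stream and writes only key-value pairs, then rebuilds the table on read through `put`. This is not the JDK's `HashMap` source; `PairBackedMap` and its methods are hypothetical names, and only the `writeObject`/`readObject` hook signatures come from the Java serialization mechanism itself.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

class PairBackedMap implements Serializable {
    private static final long serialVersionUID = 1L;

    // Internal layout is transient: it is rebuilt on read, never written to the stream.
    private transient Map<Object, Object> table = new HashMap<>();

    void put(Object key, Object value) { table.put(key, value); } // hashes the key
    Object get(Object key) { return table.get(key); }

    // Magic method: called by ObjectOutputStream instead of default field serialization.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(table.size());
        for (Map.Entry<Object, Object> e : table.entrySet()) {
            out.writeObject(e.getKey());
            out.writeObject(e.getValue());
        }
    }

    // Magic method: called by ObjectInputStream before readObject() returns to the caller.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        table = new HashMap<>(); // field initializers do not run during deserialization
        int size = in.readInt();
        for (int i = 0; i < size; i++) {
            Object key = in.readObject();
            Object value = in.readObject();
            put(key, value); // invokes hashCode() (and possibly equals()) on stream-supplied keys
        }
    }

    // Serializes and deserializes a map in memory.
    static PairBackedMap roundTrip(PairBackedMap original) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (PairBackedMap) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PairBackedMap m = new PairBackedMap();
        m.put("title", "gadget chains");
        System.out.println(roundTrip(m).get("title"));
    }
}
```

The serialized form stays stable even if the in-memory layout changes, at the cost of running `put`, and therefore key methods, on whatever the stream contains.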
07:01
And then inside its readObject method, it expects to be able to read in a list of key-value pairs, and it calls this.put on the key and the value. And that means that each object, or each key at least, that it's reading in from that input stream, it's calling hashCode and equals on it in order to put it into the HashMap. So this gets you some additional known entry points
07:21
because it means that if you have some class on your class path that does something dangerous inside hashCode or equals, we know we can wrap that class inside a HashMap and get from its readObject magic method into the dangerous hashCode method. And so this is how we start building up a gadget chain. So here's a really specific example of what a gadget chain might look like.
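The entry point from `HashMap.readObject` into a key's `hashCode` can be observed directly with a harmless key class. In this sketch `NoisyKey`, the counter, and the method names are assumptions made for illustration; a real gadget would do something dangerous in `hashCode` rather than increment a counter. The application code deliberately expects a `String`, to show that the key's method has already run by the time the cast fails.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class HashCodeTriggerDemo {
    // Counts hashCode() invocations so the side effect is observable.
    static final AtomicInteger HASH_CODE_CALLS = new AtomicInteger();

    // Hypothetical class that merely exists on the class path.
    static class NoisyKey implements Serializable {
        private static final long serialVersionUID = 1L;
        @Override public int hashCode() {
            HASH_CODE_CALLS.incrementAndGet(); // stand-in for dangerous behavior
            return 42;
        }
        @Override public boolean equals(Object o) { return o instanceof NoisyKey; }
    }

    // Wraps the key in a HashMap, as an attacker would, and serializes the map.
    static byte[] buildPayload() {
        Map<Object, Object> wrapper = new HashMap<>();
        wrapper.put(new NoisyKey(), "value");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(wrapper);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Application code that expects a String; returns how often hashCode() ran while deserializing.
    static int deserializeExpectingString(byte[] payload) {
        HASH_CODE_CALLS.set(0);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            String expected = (String) in.readObject(); // throws ClassCastException, but too late
            System.out.println(expected);
        } catch (ClassCastException tooLate) {
            // The cast failed, yet HashMap.readObject has already hashed the key.
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return HASH_CODE_CALLS.get();
    }

    public static void main(String[] args) {
        System.out.println("hashCode calls during deserialization: "
                + deserializeExpectingString(buildPayload()));
    }
}
```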
07:41
So here's more or less what HashMap does inside its readObject method. And all it's doing is basically what I just said. It's reading keys and values out of a list and then calling put on them. And in particular, it calls hashCode on the keys it reads out. So let's say there's this class that exists on your class path. And this is an example of a class out of the Clojure library.
08:01
So it's basically a proxy object where inside hashCode, what it does is it looks up an IFn interface inside its map for hashCode and then it invokes it. And so inside that Clojure function map, we the attacker could serialize some interesting IFn implementation. So as an example, you could supply the compose function
08:21
which just has two member functions inside of it that it composes. And so as one of those functions, we could supply the constant function. And then as the other function, we could supply eval. And then basically when you wrap all of this up in a nice package and tell your deserializer to deserialize it, it's gonna automatically call readObject on your HashMap.
08:41
That's automatically gonna call invoke on this compose function, which is automatically going to call invoke on this constant function and pass that into the eval function and then do arbitrary code execution. So this is an example of what that payload might look like using Jackson-style serialization. So, and that's exactly what I just described.
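The call sequence from readObject through hashCode, compose, and constant into an eval-like sink can be mimicked in plain Java without the Clojure library. Everything below is a hypothetical analog, not Clojure's classes or API: `Fn` stands in for IFn, `FnProxy` for the proxy class with the function map, and `RecordingEval` replaces eval with a sink that only records what it was asked to run.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

class ChainAnalogDemo {
    // Stand-in for a serializable function interface.
    interface Fn extends Serializable { Object invoke(Object arg); }

    // What the sink was asked to evaluate; static, so it is not part of any payload.
    static final List<String> EVALUATED = new ArrayList<>();

    // Returns an attacker-chosen value regardless of input.
    static class Constant implements Fn {
        final Object value;
        Constant(Object value) { this.value = value; }
        public Object invoke(Object arg) { return value; }
    }

    // Harmless stand-in for an eval-like sink.
    static class RecordingEval implements Fn {
        public Object invoke(Object arg) { EVALUATED.add(String.valueOf(arg)); return null; }
    }

    // Feeds the result of inner into outer.
    static class Compose implements Fn {
        final Fn outer, inner;
        Compose(Fn outer, Fn inner) { this.outer = outer; this.inner = inner; }
        public Object invoke(Object arg) { return outer.invoke(inner.invoke(arg)); }
    }

    // Proxy-like object whose hashCode() dispatches to whatever function the map holds.
    static class FnProxy implements Serializable {
        final HashMap<String, Fn> fnMap;
        FnProxy(HashMap<String, Fn> fnMap) { this.fnMap = fnMap; }
        @Override public int hashCode() {
            Fn f = fnMap.get("hashCode");
            Object result = (f == null) ? null : f.invoke(null);
            return (result instanceof Integer) ? (Integer) result : 0;
        }
    }

    // HashMap -> FnProxy.hashCode -> Compose -> Constant -> RecordingEval
    static byte[] buildPayload(String command) {
        HashMap<String, Fn> fnMap = new HashMap<>();
        fnMap.put("hashCode", new Compose(new RecordingEval(), new Constant(command)));
        HashMap<Object, Object> wrapper = new HashMap<>();
        wrapper.put(new FnProxy(fnMap), "ignored"); // note: this put() also fires the chain locally
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(wrapper);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserializes the payload and reports what reached the sink as a side effect.
    static List<String> deserializeAndObserve(byte[] payload) {
        EVALUATED.clear();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            in.readObject(); // the application never calls any of the classes above itself
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(EVALUATED);
    }

    public static void main(String[] args) {
        System.out.println(deserializeAndObserve(buildPayload("touch /tmp/pwned")));
    }
}
```

Building the payload with a plain `put` triggers the chain on the builder's side too; the sketch accepts that and simply clears the record before deserializing.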
09:01
You wrap things in a HashMap as its members, you use this AbstractTableModel class with the dangerous hashCode implementation, as hashCode you use this compose function, and then you supply the values you want for each of those two functions inside there and then you can execute whatever binary or command you want. So the important thing to understand about gadget chains
09:22
and the thing that makes deserialization vulnerabilities so dangerous is, as I kind of showed you in that example and kind of alluded to earlier, what gadget chains can be constructed has nothing to do with what your application actually does because if there are classes on your class path, they can be specified by the serialized payload
09:43
and then your application can therefore be made to construct them and run whatever magic methods exist in those classes. So your code, as with that example, wouldn't have to have called any of those things. In fact, maybe there's no code anywhere even transitively that called any of those methods
10:01
but by the mere fact that they exist on your class path, they can potentially be exploited. So what Java libraries are vulnerable? And again, I'm kind of focusing on Java but this is definitely something that applies to C# and PHP and lots of other languages but in Java, the ObjectInputStream, the one that's built into the JDK
10:21
is probably the most well-known and most studied one but XStream is another library. It's an XML parser that in its default configuration can be used unsafely and all these JSON parsing libraries have unsafe configurations where they can basically be induced to deserialize arbitrary types and therefore potentially do dangerous behavior and if you're interested in exactly
10:40
when those libraries might be dangerous, you should definitely spend some time reading Muñoz and Mirosh. They did a really good survey of how and when these kinds of libraries can be misused but what's important is that as you begin studying these additional libraries beyond just the ObjectInputStream, libraries end up having different magic methods that will automatically get invoked and they have different notions of what can be serialized
11:01
and that's gonna be really important as I keep talking about this later in the talk. So how do you know if your application is vulnerable? So finding potentially vulnerable applications is really basically the same thing as a lot of other application security vulnerabilities. So things like XSS or SQL injection, all the vulnerability really is
11:21
is some kind of attacker-controlled input flowing into one of these dangerous libraries. So in this case, it's the ObjectInputStream or XStream or Jackson and so existing tools are kind of already good at understanding how to find those vulnerabilities because it's exactly the same thing as looking for some kind of attacker-controlled string
11:40
going into some kind of SQL statement. So I'm not too interested in digging further on how you find those vulnerabilities because existing tools are really good at that. But what do you do once you do find a vulnerability? That's the big question that I want to talk about. And one of the simple answers is why don't you just use a better serialization strategy?
12:02
Why use one of these dangerous libraries? Use something that's safe. And Luca has this great quote from his talk in 2016: it's 2016, there are better options. Why do you still use ObjectInputStream? I think that's really good advice if you're working on a new project, if you're building a new service. But what happens if you're not working on a new project?
12:22
So who recognizes these guys or in particular the thing on the left? So that's the original Netflix disc that got sent out to owners of a Wii so that you could stream Netflix from Wiis. And so that's got client code stamped on a disc that was sent out in 2010 that we still have to be able to speak to. And so you might be in situations
12:42
where you don't control your clients and can't readily update your IPC mechanism. The guy on the right is the first generation Roku that came out and it's exactly the same thing. It's got firmware in there that needs to be able to talk to upstream services. And even if you're thinking you can just update firmware and update your IPC mechanism,
13:02
if someone's got one of those in a closet and they pull it out in two years, at the very least we need to be able to talk the IPC mechanism that tells them they need to go fetch an update. So you can't just turn things off easily necessarily. And even if you're not in one of these contexts where you've got some clients that you can't easily update, it's still just a very costly operation
13:22
to start ripping apart your IPC mechanism. If you need to update your server to speak something new, something other than JSON or XStream or the ObjectInputStream binary format, then you've got to update your server, then make sure you update all of your clients, and then only once you finally tear down everything on the server side would you be safe.
13:41
And that's just a lot of work, even in an ecosystem where you control both the client and the server. And so at Netflix, where we've got a microservice ecosystem, we've got thousands of applications, and we're coming across these things, and we have to decide what to tell a developer about how important it is to patch this issue we found, we need to answer the question, is it worth the effort to drop what I'm doing
14:01
and spend three or four weeks or maybe more doing exactly that process I've described of updating all your clients and services in order to patch that vulnerability? Is your deserialization vulnerability that we just found even exploitable? And that's something that's not immediately obvious when all you know is that some kind of untrusted input flows into one of these unsafe libraries.
14:23
So how do you find exploits for a deserialization vulnerability? How do you find these gadget chains? So ysoserial is one of the most well-known projects in this space that Frohoff maintains, and it's got a bunch of gadget chains for the ObjectInputStream. marshalsec is another project in this space
14:42
that's got some wider breadth and understands some gadget chains for some of these other deserialization libraries, but they're both basically projects that have these known gadget chains, and you can compare your application to that list of bad libraries where you know there's some version of this particular library where you can construct a gadget chain,
15:00
but that doesn't tell you something that might be unique about your application. Maybe there's a gadget chain that only shows up when there's some class in your application plus some other classes in these other libraries that only when all put together end up giving you some kind of interesting gadget chain, and furthermore, those are all bound to these kind of known deserialization libraries.
15:21
What if you're using something new or something custom that is vulnerable to these same kinds of attacks but isn't one of these sort of well-studied ones? How do you answer the question, is my vulnerability exploitable? So besides the couple that I mentioned, there's a bunch of other existing tools in this space. So Joogle is a good tool for programmatically querying
15:41
about metadata on your class path. There's Java Deserialization Scanner, which is a Burp Suite plugin that mostly uses payloads from ysoserial in order to detect whether or not you're vulnerable to one of these known gadget chains. The NCC Group Burp plugin is something that was released earlier this year. Again, another dynamic scanner that's mainly based on payloads
16:01
from Muñoz and Mirosh's work at last year's Black Hat, so this is more focused on the JSON deserializers. But again, these are all kind of tools that might help you but don't immediately answer that question. Is there something unique to my application that makes it vulnerable to one of these exploits? So given that I wasn't able to find a tool
16:22
that did exactly what I wanted, I went about the task of asking how can we evaluate the risk of this kind of vulnerability and what do we really want to be able to answer? And what we want to be able to answer is what is the risk? How important is it to remediate a vulnerability? We want to know if that deserialization vulnerability is exploitable.
16:42
And if it is exploitable, what exploits are possible? RCEs tend to be much more interesting than DoS. And so if that's our goal, just to evaluate the risk, we don't necessarily have to be perfect. We don't have to set about to solve this problem once and for all. A reasonable overestimation of risk is acceptable. And we don't actually have to generate payloads
17:01
if we don't want to. Knowing what kinds of payloads might be constructible is also a really useful piece of information. So if those are the requirements, what I want to set out to do, then specifically what I'd like to do is build something that finds those gadget chains. So I'm not looking for vulnerabilities.
17:20
I'm only gonna use this new tool if I already know my application is vulnerable. But it needs to be able to look at the entire class path because of what I said at the beginning. It doesn't matter what code is in my application. It matters about the sum total of classes on my class path. It should err on the side of false positives because a reasonable overestimation of risk is more useful
17:40
because I don't want to tell developers to drop what they're doing and fix something unless I have good reason to believe that there's something exploitable in it. And lastly, it should operate on the Java bytecode because we've got like a million plus one languages written on top of the JVM now and I don't want to write something that has to understand Groovy and Scala
18:00
and Clojure and Kotlin and whatever comes out next week. So if I just operate on bytecode, then I've got it covered. So I put together a tool that I called Gadget Inspector, which is a Java bytecode analysis tool for finding gadget chains. That's what it does. So the way it works is it operates on a class path, so you specify either some jars and their dependencies
18:21
or an entire WAR, basically your entire application, and then it reports discovered gadget chains, which is really just a sequence of method invocations where one invokes the next and you're starting at some known entry point and you're getting to some kind of dangerous behavior. It does a little bit of simplistic symbolic execution to figure out when some attacker-controlled arguments can get passed into a method
18:40
and then that gets passed to the next one in the chain. And most importantly, because of the context we're working in, this tool is able to make a lot of simplifying assumptions that actually make this pretty easy to do. It's not something where you have to have written a thesis on symbolic execution in order to understand or implement it. So, all right, specifically, how does this tool work?
19:02
So the first step is just enumerating everything on your class path. You want to figure out the whole class hierarchy, all the method hierarchies, so that when you see something calling a method from one magic method, like HashMap calling hashCode, you want to know what are all the implementations of hashCode that you might jump to. So first step is just enumerating all that stuff,
19:21
and that's not terribly difficult. You can use the plain old Java reflection APIs to do that if you want to. But important first step for the rest of the analysis. So where things start getting interesting is when I want to understand the data flow inside an application. So the first thing that I wanted to discover is what I call pass-through data flow. So this is where basically what I mean is
19:41
if an attacker can control the input to a function, does that attacker-controlled data get returned back out of the function? So in this case, like with the constant function, if an attacker controls the implicit this argument, then they're going to be able to control this.value, and therefore the return value. So that's one of the first assumptions that goes into this.
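The slide with the constant function is not part of this transcript, so the following is only a hypothetical stand-in for it. The class name `FnConstant`, its `value` field, and the helper `roundTrip` are assumptions made for illustration. The sketch shows why control over the implicit `this` implies control over the return value: whatever was written into the serialized stream for `value` is exactly what `invoke` hands back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class FnConstantDemo {
    /** Hypothetical stand-in for the "constant function" discussed in the talk. */
    static final class FnConstant implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Object value;

        FnConstant(Object value) { this.value = value; }

        // Pass-through data flow: argument 0 (the implicit this) -> return value.
        Object invoke(Object arg) { return value; }
    }

    /** Serializes and deserializes an object in memory, standing in for a payload arriving over the wire. */
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Whoever produced the stream chose the field, and therefore the return value.
        FnConstant fn = (FnConstant) roundTrip(new FnConstant("chosen by the stream's author"));
        System.out.println(fn.invoke("this argument is ignored"));
    }
}
```

The explicit argument plays no part in the result, which is why the only pass-through recorded for a method of this shape is from argument zero to the return value.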
20:01
If an object is tainted, and this is basically taint analysis that I'm doing here, and if you're not familiar with that or don't really know what I mean by taint in this context, all I really mean is that I'm thinking of it as being attacker controllable. So if an object is tainted, then every member on that object is also considered tainted. And that's a pretty reasonable assumption because if we are thinking of an object
20:21
as being attacker controlled, that means it came out of the serialization library, so all of the members on that object are also in that serialized payload. So that means when we look at a function like this, we can enumerate this piece of pass-through data flow. And all this kind of funky custom syntax means is that if the attacker controls argument zero, which in this case is the implicit this,
20:42
then the return value is also considered attacker controlled. And that's just because we return this.value. So as one other example where things start getting a little hairier, there's this default function, which wasn't on a previous slide. So all this does is look at an argument, and if it's not null, it returns it, and otherwise it invokes some other function,
21:01
like a constant function. And in this case, we've got a branch condition, which is something that's also really hairy to deal with if you're doing any kind of static or symbolic analysis. But in this case, we make another assumption, which is that all branch conditions are satisfiable. I'm not gonna worry about whether or not I can go down different paths. And this is probably one of the weakest assumptions
21:21
that's made in this, but it's also one of the easiest ones to make because in practice, if you're inside these magic methods or going down a gadget chain where all this stuff is attacker controllable because it would have to be for you to get there in this chain, then basically all of the variables and arguments going into a branch condition are attacker controllable. So usually an attacker can tweak these things to get down whatever branch condition they want to.
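As a rough illustration of the two assumptions so far (members of a tainted object are tainted, and every branch is satisfiable), here is a minimal sketch of a pass-through analysis over a toy expression tree. It is not how Gadget Inspector is implemented, since the real tool walks bytecode. The `Expr` classes, the method labels `FnConstant.invoke` and `FnDefault.invoke`, and the function `taintSources` are all hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class PassThroughSketch {
    /** Toy model of the expression a method returns (hypothetical, not the tool's representation). */
    interface Expr {}
    static final class Arg implements Expr {          // argument i; index 0 is the implicit this
        final int index;
        Arg(int index) { this.index = index; }
    }
    static final class Member implements Expr {       // a field read such as owner.value
        final Expr owner;
        Member(Expr owner) { this.owner = owner; }
    }
    static final class Branch implements Expr {       // cond ? whenTrue : whenFalse
        final Expr whenTrue, whenFalse;
        Branch(Expr whenTrue, Expr whenFalse) { this.whenTrue = whenTrue; this.whenFalse = whenFalse; }
    }
    static final class Call implements Expr {         // method(args...), args.get(0) is the receiver
        final String method;
        final List<Expr> args;
        Call(String method, Expr... args) { this.method = method; this.args = Arrays.asList(args); }
    }

    /** Returns the caller argument indices whose taint can reach the given expression. */
    static Set<Integer> taintSources(Expr e, Map<String, Set<Integer>> summaries) {
        Set<Integer> out = new TreeSet<>();
        if (e instanceof Arg) {
            out.add(((Arg) e).index);
        } else if (e instanceof Member) {
            // Assumption 1: every member of a tainted object is tainted.
            out.addAll(taintSources(((Member) e).owner, summaries));
        } else if (e instanceof Branch) {
            // Assumption 2: all branch conditions are satisfiable, so take the union.
            Branch b = (Branch) e;
            out.addAll(taintSources(b.whenTrue, summaries));
            out.addAll(taintSources(b.whenFalse, summaries));
        } else if (e instanceof Call) {
            // Reuse the callee's already-computed pass-through summary.
            Call c = (Call) e;
            Set<Integer> passThrough = summaries.get(c.method);
            if (passThrough != null) {
                for (int i : passThrough) {
                    if (i < c.args.size()) out.addAll(taintSources(c.args.get(i), summaries));
                }
            }
        }
        return out;
    }

    /** Computes summaries for the two toy functions, callee first. */
    static Map<String, Set<Integer>> analyze() {
        Map<String, Set<Integer>> summaries = new TreeMap<>();
        // return this.value;
        Expr constantBody = new Member(new Arg(0));
        summaries.put("FnConstant.invoke", taintSources(constantBody, summaries));
        // return arg != null ? arg : this.f.invoke(arg);
        Expr defaultBody = new Branch(
            new Arg(1),
            new Call("FnConstant.invoke", new Member(new Arg(0)), new Arg(1)));
        summaries.put("FnDefault.invoke", taintSources(defaultBody, summaries));
        return summaries;
    }

    public static void main(String[] args) {
        System.out.println(analyze());
    }
}
```

In this sketch the constant function's summary contains only argument zero, while the default function's summary contains both argument zero (through the false branch) and argument one (through the true branch).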
21:42
So if we assume all branches can be walked down, then we end up with these pass-through data flows. So in this case, the first argument just gets directly returned here if we go down the true path. And if we go down the false path based on the first pass-through data flow we discovered,
22:00
the return value of f.invoke is gonna be considered tainted as well. So we enumerate that. So step three is basically exactly the same thing. It's the same symbolic execution of just walking through what data flows where. But this time, instead of looking at return values, we care about where data flows into subsequent method calls. And so we're gonna use the data
22:21
from step two in this to just enhance this enumeration. But let's look at that dangerous hash code method that we had earlier and see how that shakes out here. So in this case, we would end up enumerating these pass-through call graphs or method calls. So again, some sort of funny custom syntax,
22:40
but all I'm saying here is that if argument zero, the implicit this, is attacker controllable, then that's gonna flow in as argument one to this call to IFn.invoke. In this case, all I know is that it's the IFn interface. And so we get that literally because this gets passed in as argument one to that function there. So that one's kind of easy to figure out.
23:01
But for f.invoke, f comes out of this map, which is a member of this. So again, because of that assumption where we assume all members are attacker controllable, we know that f would be attacker controllable. So f, which gets passed in as the implicit this to IFn.invoke, would also be attacker controllable. So that's where we get that from. And just to go through sort of one more example,
23:22
this is what you would get if you looked at the compose function, again, from the previous slide. And again, all we're really doing when you are doing this symbolic execution is just stepping through bytecode one line at a time. And it's actually kind of easier to understand what's going on when you look at it that way. But sort of at a higher level,
23:41
what we do is we see argument one gets passed in as argument one to IFn.invoke. Then we see f1, which is a member of argument zero, the implicit this, gets passed in as the implicit this to IFn.invoke. And finally, the value that gets returned from that based on our analysis from step two would also be considered attacker controllable. And then that gets passed in as argument one to f2.
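The call-site facts walked through above can be written down as plain data. The sketch below is one assumed representation, not the tool's actual output format: each `Edge` says "if this caller argument is tainted, then that callee argument is tainted". The method labels (`AbstractTableModel.hashCode`, `FnCompose.invoke`, `IFn.invoke`) are illustrative names for the slide examples, and the method bodies in the comments are guesses at what the slides showed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class CallSiteTaintSketch {
    /** If callerArg of caller is tainted, calleeArg of callee is tainted too. Index 0 is the implicit this. */
    static final class Edge {
        final String caller;
        final int callerArg;
        final String callee;
        final int calleeArg;

        Edge(String caller, int callerArg, String callee, int calleeArg) {
            this.caller = caller;
            this.callerArg = callerArg;
            this.callee = callee;
            this.calleeArg = calleeArg;
        }
    }

    /** Hand-written edges for the two examples; a real analysis would derive them from bytecode. */
    static List<Edge> edges() {
        return Arrays.asList(
            // hashCode() { f = this.map.get(...); return f.invoke(this); }   (assumed shape)
            new Edge("AbstractTableModel.hashCode", 0, "IFn.invoke", 0), // f is a member of this
            new Edge("AbstractTableModel.hashCode", 0, "IFn.invoke", 1), // this is passed as the argument
            // invoke(arg) { return f2.invoke(f1.invoke(arg)); }              (assumed shape)
            new Edge("FnCompose.invoke", 1, "IFn.invoke", 1), // arg is passed to f1
            new Edge("FnCompose.invoke", 0, "IFn.invoke", 0), // f1 and f2 are members of this
            new Edge("FnCompose.invoke", 0, "IFn.invoke", 1)  // f1's result, tainted per step two, goes to f2
        );
    }

    /** Given the tainted arguments of a caller, returns which arguments of the callee become tainted. */
    static Set<Integer> taintedCalleeArgs(String caller, Set<Integer> taintedCallerArgs, String callee) {
        Set<Integer> out = new TreeSet<>();
        for (Edge e : edges()) {
            if (e.caller.equals(caller) && e.callee.equals(callee)
                    && taintedCallerArgs.contains(e.callerArg)) {
                out.add(e.calleeArg);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<Integer> thisOnly = new TreeSet<>(Arrays.asList(0));
        System.out.println(taintedCalleeArgs("AbstractTableModel.hashCode", thisOnly, "IFn.invoke"));
    }
}
```

Keeping the edges at the granularity of argument indices, rather than whole methods, is what lets the later search ask whether a specific dangerous parameter is reachable.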
24:04
So just a lot of walking through these functions and enumerating these things. And really, there's not a lot very deep going on here. It's just kind of a lot to keep track of. But computers are good at that. So step four, next to last step is just enumerating known entry points.
24:22
And that's basically just using all the known tricks that researchers have come up with over the last few years to figure out how to get into interesting gadget chains. So for example, we see this hashCode method. We know it overrides Object.hashCode. So we can enumerate that as an entry point. So all right, that step's super easy, especially after the last few. But this does highlight one limitation
24:41
that I want to point out, which is that this does rely on known tricks. So knowing that we can get to hashCode is something we could have derived from this analysis just by going through that symbolic execution of the readObject method of HashMap. But there are other clever tricks that researchers have come up with, like wrapping things in a dynamic proxy,
25:01
where that then calls InvocationHandler.invoke, that we wouldn't be able to derive. So there's definitely room for more gadget chains that this thing might be missing, just because there might be more clever tricks that aren't hard-coded into this guy. So all right, very last step. Now that we've enumerated all that stuff, the only thing that we have left to do is literally just do a, like,
25:22
algorithms 101 breadth-first search on this call graph in order to see if we can get from one of these known sources to a method that does something interesting. So just using exactly that stuff we've enumerated to build up that gadget chain from some of those first slides, we would look at that entry point. And then looking at the methods that that calls below,
25:42
we'd want to step into each of those and see what methods those things would subsequently call. And here's where we make one of the last assumptions, which is any method implementation can be jumped to. So down here we see we're calling IFn.invoke, and we don't have a specific implementation that we're jumping to there.
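The search itself can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the tool's code: the call graph, the implementation table, and the sink set are hard-coded toy data with hypothetical method labels, and taint is tracked per method rather than per argument. An interface call expands to every known implementation, which mirrors the "any implementation can be jumped to" assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GadgetChainSearchSketch {
    /** caller -> methods it invokes with attacker-controlled data (toy output of step three). */
    static final Map<String, List<String>> CALLS = new HashMap<>();
    /** interface method -> implementations found in classes treated as serializable (toy data). */
    static final Map<String, List<String>> IMPLS = new HashMap<>();
    /** Methods considered interesting sinks. */
    static final Set<String> SINKS = Collections.singleton("Runtime.exec");

    static {
        CALLS.put("AbstractTableModel.hashCode", Arrays.asList("IFn.invoke"));
        CALLS.put("FnCompose.invoke", Arrays.asList("IFn.invoke"));
        CALLS.put("FnDefault.invoke", Arrays.asList("FnConstant.invoke"));
        CALLS.put("FnEval.invoke", Arrays.asList("Runtime.exec"));
        IMPLS.put("IFn.invoke", Arrays.asList(
            "FnConstant.invoke", "FnDefault.invoke", "FnCompose.invoke", "FnEval.invoke"));
    }

    /** Breadth-first search from an entry point to any sink; returns the chain, or an empty list. */
    static List<String> findChain(String entryPoint) {
        Deque<List<String>> queue = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        queue.add(Collections.singletonList(entryPoint));
        visited.add(entryPoint);
        while (!queue.isEmpty()) {
            List<String> chain = queue.remove();
            String last = chain.get(chain.size() - 1);
            if (SINKS.contains(last)) {
                return chain;
            }
            for (String callee : CALLS.getOrDefault(last, Collections.<String>emptyList())) {
                // An interface call may land in any implementation the attacker can pick.
                List<String> targets = IMPLS.containsKey(callee)
                    ? IMPLS.get(callee)
                    : Collections.singletonList(callee);
                for (String target : targets) {
                    if (visited.add(target)) {
                        List<String> longer = new ArrayList<>(chain);
                        longer.add(target);
                        queue.add(longer);
                    }
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        System.out.println(findChain("AbstractTableModel.hashCode"));
    }
}
```

Because breadth-first search returns a shortest path and this toy graph ignores argument positions and types, the chain it reports may be shorter than the one discussed in the talk. Tracking which argument is tainted at each hop is what narrows the results in a fuller analysis.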
26:02
So as we're going through this call graph, we're gonna go look at every implementation of that, as long as that class is considered serializable. And the reason that we assume we can do that is literally because that's how we build up gadget chains. If we control the data type of one of those members that determines what implementation of IFn this is,
26:22
then we can build up our gadget chain in such a way to get to whatever implementation we want to get to. So for example, we might use this call in order to get into the compose function's invoke, and then looking at what functions that calls, we're gonna end up walking through each of those invocations, and one in particular might be calling IFn.invoke,
26:40
where we pass in a tainted argument one, and use the eval function as our implementation, and then inside there we would see we call Runtime.exec, and we know that does something interesting and something dangerous, so we would output this as our gadget chain. So by walking through all those steps, this thing would look at that library and spit out this gadget chain. The one last limitation that I will point out here
27:01
is that this, of course, relies on knowing what are interesting methods or interesting sinks that we should output gadget chains for. So there's lots of good stuff in the JDK, so reading files, writing files, Runtime.exec, opening up a URL, doing DNS lookups, sleeping, there's all kinds of side effects that you might be interested in, so adding more to its list of interesting sinks
27:22
is a way to improve this tool, but even a limited set of known interesting sinks gets you pretty far. So one of the things that I mentioned at the top of this talk that was really important to me is that there's a lot of different libraries now where we know there's serialization vulnerabilities,
27:42
and as part of this analysis, I mentioned a few times that there's things like known entry points that we want to start with, or we consider any class that's serializable to have a method we can jump to. So all those things are parameterizable in this analysis. So for JRE deserialization, anything implementing Serializable is considered a serializable class,
28:01
but for XStream, it depends on what converters you've enabled, so it depends on how your application is set up. For Jackson, it's basically any class with a no-arg constructor is considered serializable. For Jackson, you can also only jump into constructors as your entry points, and so there's lots of differences between libraries, but all of those things can be easily tweaked and parameterized in this analysis.
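One way to picture this parameterization is as a pair of pluggable predicates per library. The sketch below uses only standard reflection calls; the class and method names (`SerializationPolicySketch`, `JRE`, `hasNoArgConstructor`, `declaresHashCode`) are hypothetical, and the Jackson rule is the simplified version stated in the talk rather than a full account of Jackson's behavior.

```java
import java.io.Serializable;
import java.util.function.Predicate;

final class SerializationPolicySketch {
    /** JRE deserialization: a class counts as serializable if it implements java.io.Serializable. */
    static final Predicate<Class<?>> JRE = Serializable.class::isAssignableFrom;

    /** Jackson, as simplified in the talk: a class counts if it declares a no-arg constructor. */
    static boolean hasNoArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** One known JRE entry-point trick: the class itself declares hashCode(). */
    static boolean declaresHashCode(Class<?> type) {
        try {
            type.getDeclaredMethod("hashCode");
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Class<?>[] samples = { java.util.HashMap.class, Integer.class, Object.class };
        for (Class<?> c : samples) {
            System.out.println(c.getName()
                + " jre=" + JRE.test(c)
                + " noArgCtor=" + hasNoArgConstructor(c)
                + " declaresHashCode=" + declaresHashCode(c));
        }
    }
}
```

Swapping the predicate changes which implementations the search may jump into and which methods it may start from, while the rest of the analysis stays the same.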
28:21
So this is what makes this tool, I think, especially powerful, is that you might be working in some kind of custom context where you're doing unusual forms of deserialization that happen to be unsafe, but aren't well studied yet by a project like marshalsec or ysoserial, and this is a tool that can help give you insight into those kinds of libraries.
28:42
So all right, I described this tool. It does a whole bunch of funky things that maybe you did or didn't follow, depending on how much sleep you guys got last night, and I claimed that at the end of the day, this thing can find some gadget chains. Does it live up to the hype? So the first thing I did after writing this thing on like a 10-hour flight to Europe
29:01
was run it on some open-source libraries to see does this thing actually do anything useful? Can it find some stuff? Because at the very least, it should be able to find gadget chains we know exist because of the stuff that Frohoff and Lawrence discovered in 2015. So all right, built this tool, ran it against the 100 most popular libraries, at least according to MavenRepository.com,
29:22
and looked for exploits against the standard Java deserialization library. It did successfully rediscover some known gadget chains. So cool, it's at least doing what I claim it's supposed to do. It didn't find a ton of classes implementing Serializable, so it didn't have a ton of new findings, but it did have some, and so I'm gonna talk about those.
29:42
And it did have a handful of false positives because this does try to err on the side of false positives, but not as many as you'd expect, like just a dozen, few enough that they're easy to rule out, and it's mostly because reflection is hard to reason about. So all right, old gadget chains, what did it discover? So it rediscovered the Commons Collections gadget chain,
30:01
and the reason that this gadget chain was so interesting when Frohoff and Lawrence first discovered it is because it's the 38th most popular dependency, at least when I looked this up a couple months ago, and so it's everywhere. Like every application, more or less, ends up pulling this thing in as some kind of transitive dependency, and this is more or less what that gadget chain looked like. You wrap your object inside a dynamic proxy,
30:22
and then you get into this invocation handler, and then you go to the Commons Collections LazyMap, which ends up doing some reflection things, and lets you basically call any method you want. And so, hooray, it found that old gadget chain. It's finding things that I expect that it should be able to find. But the first thing that it actually found was this new gadget chain, this one inside Clojure,
30:41
and this is basically the gadget chain that I've been kind of discussing and using as an example leading up to this. So this was super interesting because this was the sixth most popular dependency, according to MavenRepository.com, so what this gadget chain did, at least according to the version that I originally found, is load a Clojure file from disk and execute it,
31:02
which may or may not be interesting, but it also turned out that by tweaking that last step in there to call eval instead of load file, it would execute arbitrary Clojure that you pass in, so it's basically RCE. So that's super interesting. If there are people that patched their version of Commons Collections and decided that they're good now
31:20
and they're still doing unsafe deserialization, chances are they're probably pulling in this dependency, so they're still in hot water. Hopefully in the last couple years, people have figured out that they shouldn't be doing unsafe deserialization, but people continue to surprise me. So I did report this to the clojure-dev mailing list when I discovered it, and they decided, who's even serializing this class anyway,
31:43
we're just gonna turn off serialization for that class, and then that's great. So all releases since 1.9.0 have disabled serialization on that AbstractTableModel class, so that hashCode entry point doesn't exist anymore. Yay. We're making the world safer, one gadget chain at a time.
32:01
More recently, I discovered some new gadget chains in Scala using this tool. So Scala is the third most popular dependency, according to MavenRepository.com. So this gadget chain isn't an RCE, maybe not as interesting, but it does allow you to write or overwrite a zero-byte file on disk, and that's an interesting DoS exploit,
32:21
because you can overwrite some application resource file, zero it out, and then your app goes down. So that's possibly interesting. There's a very similar one that Gadget Inspector also found that can do an SSRF. So it does a GET at an arbitrary URL, and it's basically the same thing. And this is something that Gadget Inspector spat out,
32:42
and I've got examples of the actual gadget chain payloads on my fork of ysoserial that you can check out after this talk. So these are not just things that it found and I'm claiming could actually work if you built the gadget chain. I did actually build the corresponding gadget chain and verify that these things work, so cool stuff.
33:00
So just before this talk, a couple of weeks ago, I reran Gadget Inspector on the latest release of Clojure, and it turns out that that exact same gadget chain I found before still exists in Clojure just with a different entry point. There's another class implementing hashCode that delegates to a function, and so you can actually do exactly the same gadget chain
33:21
using this different entry point, and that's been in every release since 1.8.0. So apparently there's still an RCE gadget chain in every release of Clojure that's out there. So I need to follow up with the Clojure guys and see if they want to lock down this too, but really I hope this is just hammering in the point that you've got to stop doing unsafe deserialization, guys.
33:41
There's gadget chains everywhere. But all right, enough of that. I've looked at open source libraries, but what I was getting at at the top of this talk was that what I really wanted to find was gadget chains that are specific to the applications I'm looking at so that I can go back to developers and tell them how important is it that you patch this thing right away,
34:01
or can you wait until your next release so that you can finish up these critical features. So let's look at vulnerable web app number one. So this was making some potentially dangerous use of Jackson deserialization. An attacker could specify any class to instantiate and put an arbitrary body in there, but there were a lot of limitations on it.
34:22
It was using more or less the default configuration of Jackson, so you could only deserialize classes with no-arg constructors. Your only entry points are gonna be no-arg constructors, and most of the time, classes don't do anything terribly interesting in constructors, but it did have a 200 megabyte class path and was bringing in like six dozen dependencies, so there might be something there,
34:42
and I don't really have enough time to manually go through every constructor of every class on that class path to find out if any of them do anything interesting. So I ran Gadget Inspector, and it found nothing. So all right, that wasn't the cool exploit and bombshell you guys were hoping for, but it saved a bunch of time because no one had to go through every constructor
35:02
and decide if it was important to remediate this vulnerability. We could tell the developers that hey, it's cool if you wait until the next time that you're able to get to this. But the story doesn't end there, so internal web app number two. So this one was really interesting because it used a non-standard deserialization library, something that had some like custom in-house tweaks to it
35:20
that had some really unique constraints on it. So it invoked readResolve magic methods, but not readObject, and it was able to deserialize any class on the class path, so it didn't have to implement Serializable, except for the rest of these constraints down here. So one is that its member fields couldn't have dollar signs in their names because that screwed up the binary format of this thing.
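A constraint like the field-name rule is easy to express as a reflection check. The sketch below covers only that one rule and is an assumption about how such a filter might look, not the in-house library's logic or the tool's code. `fieldNamesAllowed`, `Plain`, and `Dollar` are hypothetical names, and skipping static fields is a guess on the basis that they are not part of an instance's serialized state.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class CustomFormatConstraintSketch {
    /** Returns false if any instance field of the type or its superclasses has '$' in its name. */
    static boolean fieldNamesAllowed(Class<?> type) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers())) {
                    continue; // assumption: static fields are not written to the payload
                }
                if (field.getName().indexOf('$') >= 0) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Example type that passes the check. */
    static final class Plain {
        int count;
        String name;
    }

    /** Example type that fails the check because of its field name. */
    static final class Dollar {
        int a$b;
    }

    public static void main(String[] args) {
        System.out.println("Plain allowed:  " + fieldNamesAllowed(Plain.class));
        System.out.println("Dollar allowed: " + fieldNamesAllowed(Dollar.class));
    }
}
```

A filter of this kind would be one ingredient of the "is this class serializable" predicate for a custom format, with the remaining rules layered on in the same style.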
35:40
So non-static inner classes always have this implicit $outer member, so basically anything that happened to be a non-static inner class was not serializable. Furthermore, it didn't have any support for serializing arrays or generic maps, and most importantly, every member value had to be non-null, and that meant that every member value,
36:01
every type of every member value, also had to satisfy all of these constraints because you couldn't leave it null, so you had to serialize it as something, so it had to satisfy all these constraints. Also meant that you couldn't have any data types that had any character arrays or byte arrays in it. You couldn't have any data types that had some kind of self-referential or recursive type, because thread, for example, has parent,
36:21
which is a type of thread, so there's no way to have non-null members for all those and stick it into a payload. So it was really, really hard to determine what classes were even considered serializable in this context, much less whether or not you could actually build a gadget chain that went through those particular classes. But that's the sort of thing where
36:40
Gadget Inspector has the functionality to stick in all of those constraints and then ask it to tell you, what do you find? And this is what it found. So this is a 12-step deep gadget chain, starts at readResolve, like I promised, and the bottom thing that it does here is copy a file from any arbitrary location
37:00
to any other arbitrary location. And that was cool, because it allowed us to do things like exfiltrate private keys off the box by dropping them in the web app resources directory. And you can look at this really closely, but I feel like you don't really have to. The thing that's really interesting is just looking at the package names that are showing up here. So here I've highlighted the different dependencies
37:20
that this gadget chain is flowing through. And if you count the app itself and the JRE, there's seven different libraries involved in this gadget chain. And it's something that you would never have found by analyzing any of those individually, or you would never have found it by looking at the set of dependencies without also pulling in the classes from the application itself. But it's something that just lit up
37:41
as soon as I ran Gadget Inspector on it. And so that's super cool, and that's what I'm talking about, where this thing is utilizing the power to look at the entire class path, and it's able to utilize the parameterization of what it means to be serializable according to your kind of custom constraints. So that was a really cool gadget chain that this thing found, but also spending just like five or 10 minutes
38:02
staring at this, you see this gadget chain method at step eight, which is StreamPumper.run. What that actually does is copy an input stream to an output stream. So if you look at this for just a few minutes, you realize you can tweak this last thing to copy an arbitrary string input stream to a file output stream
38:21
and be able to write an arbitrary string to an arbitrary file. So I was able to write a JSP to my web app resource directory and get RCE with this gadget chain. So this was a really cool result to come out of this, and immediately allowed us to say, all right, we've got to fix this thing now because you're getting RCE on this really sensitive service. So this was really powerful,
38:42
and it saved us the time of trying to actually build up this thing. As a matter of fact, this was a web app that we actually had a pen test team looking at, and they identified that it was exposed to this kind of vulnerability, but they spent a couple days kind of looking at it here and there and basically weren't able to decide whether or not you could do anything with it.
39:01
Gadget Inspector took about 15 minutes to run on this application and spit this out. So that is a huge time saver, and I think a huge win for both pen testers and AppSec engineers trying to understand deserialization vulnerabilities in applications. So obviously, there's gonna be a lot of room for improvement in this kind of tool. So reflection continues to be the bane of existence
39:22
for anyone doing code analysis of any form. It's hard to understand, and this tool basically just treats any kind of reflection as an interesting sink just because it doesn't know how to do any better, but that also leads to a lot of false positives and some blind spots, so it can be improved there. I also mentioned that there's a number of assumptions and limitations that I made
39:40
in the course of building this tool, and while I think most of those were reasonable given the context we're working in, it's obviously something that could be improved. But that being said, I think diving down into this kind of automatic analysis for deserialization vulnerabilities is territory that has a lot of room for more discovery and more time spent on it
40:02
because this was something that's just kind of a functional prototype, but it's already saved us a bunch of time as we've been doing AppSec reviews of our internal applications. And this was something that I specifically wrote for Java and to understand Java bytecode, but I think all the techniques I described here apply equally well to C# and PHP
40:20
and all these other languages that have these kind of libraries that allow you to specify data types and therefore can be used dangerously. But this tool is open source, so I encourage you to go look at it, check it out, see if you want to do a PR on it or improve it, or just use these ideas and kind of build your own thing that's better. But also, most importantly,
40:41
deserialization vulnerabilities aren't gone yet. They're still relevant and they're still interesting, and I think exploits can and will be more complex as time goes on. This is the first time I've ever seen a gadget chain that long. And I think we need better tools to help us better understand those sorts of vulnerabilities. So if you've got questions, we've got about five minutes. You can also hit me up online later,
41:01
and thank you all for coming.