JavaJournal

Formal Metadata

Title
JavaJournal
Title of Series
Part Number
5
Number of Parts
20
Author
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Despite the multitude of Java decompilers available, we often have the need to debug or trace malicious or obfuscated Java bytecode. Existing Java debuggers and tracers are mostly targeted towards Java developers, are closed-source, and are not meant to handle malicious or obfuscated targets. We present a new open-source cross-platform framework for debugging Java, written completely in Python, designed specifically for reverse engineering. We also present a Java method call tracer as a sample Python application that utilizes this framework.
Transcript: English(auto-generated)
So, next up, we have Jason Geffner from CrowdStrike, who is known for being the discoverer of Venom, and he will be giving a talk on JavaJournal and pyspresso. Thanks for the introduction. Hey, everybody. Good to see a lot of familiar faces out there.
My name is Jason Geffner. I'm a principal security researcher with CrowdStrike, and, yeah, today, I'm here to talk with you about two projects that we're releasing, one called JavaJournal and one called pyspresso. Now, as you can probably guess from the name of the talk, JavaJournal, you're probably thinking right now, Jason, people still use Java?
Really? Come on. What year is it? Well, believe it or not, they do. Java is actually really, really popular. Look at the TIOBE Programming Community Index; I pulled this graph from it just a few days ago. The TIOBE Programming Community Index is an indicator of the popularity
of programming languages, and what we can see in light blue is Java. We can see an actual resurgence in Java over the past few years. We're seeing that it has overtaken C in terms of popularity, and it's, in fact, the most common programming language that we're seeing these days.
But it's not just authors of legitimate software that are using Java more and more now. We're also seeing an uptick in the use of Java by authors of malware, especially cross-platform malware. So more programs are getting written in Java, and, you know, at REcon, we're probably
all thinking that's great news because Java decompilation is easy. Where's the problem? Look at all the tools we have for decompilation. We have CFR. We have Fernflower, JD-GUI, Krakatau, Procyon. I mean, all we have to do is take whatever we want to analyze, a Java binary, throw it
into one of these decompilers, and then just read the source code. Oh, my God! Oh, my God, I totally forgot. It's pretty easy to obfuscate Java code. So sometimes even having a decompiler really isn't enough. We're still going to have problems when it comes to obfuscated Java.
So then this begs the question, well, how do we then deal with obfuscated Java? How to analyze it? Well, there are three main approaches. The first approach is to take the decompiled Java code and try to recompile it and debug
it. Has anyone tried this, taking the output of a decompiled Java program, putting it into an IDE and trying to rebuild it? Some of you, I see some hands, I see a lot of laughter. I think some of you are laughing because you know that it almost never works. You're going to get a whole bunch of errors from the compiler, and even if you spend
the hours to try to fix all those errors and you can get it to compile, the chances of it running correctly are very slim. So it's very difficult to go the recompile and debug route, so we're going to cross that off our list. The next option is creating a deobfuscator. Seems like a good idea, and if you're writing your own AV engine, maybe it makes
sense if you're seeing the same obfuscation techniques used over and over again in a given set of families of malware. But the truth is that the work required to do this doesn't really scale, especially if you're an individual researcher or if you're seeing many different types of obfuscation
used. The other thing to keep in mind when it comes to creating a deobfuscator, especially a static deobfuscator, is that oftentimes the runtime deobfuscation that's done will actually make use of runtime information that you can't easily glean statically. So, for example, it's common to see, say, a string
decryption function being used in a Java program. And the decryption function uses for its decryption key information from the runtime. For example, it might look at the call stack. And from there, use that call stack information to generate a decryption key
to decrypt the string. So at runtime, it's pretty easy to figure out what the key would be because that information is going to be evaluated dynamically. But statically, it's very hard to write a program to automatically decrypt the strings for that type of challenge. Which leaves us with dynamic tracing. So I know everyone in here knows of tools like Process Monitor and strace.
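The stack-dependent string decryption described above can be illustrated with a small sketch. This is a Python analogue written for illustration only, not code from any malware sample or from the tools in this talk; the function names `loader` and `analyst`, the key derivation, and the XOR scheme are all hypothetical. It shows why a routine that derives its key from the identity of its caller decrypts correctly in place but yields garbage when the same routine is called from anywhere else, such as an analyst's script.

```python
import inspect

def _key_from_caller() -> int:
    # Frame 0 is this helper, frame 1 is xor_with_caller_key,
    # frame 2 is whoever called xor_with_caller_key.
    caller_name = inspect.stack()[2].function
    return sum(caller_name.encode()) & 0xFF

def xor_with_caller_key(data: bytes) -> bytes:
    # Hypothetical "decryption": XOR every byte with the caller-derived key.
    key = _key_from_caller()
    return bytes(b ^ key for b in data)

# The string was "encrypted" for a caller named "loader" (illustrative only).
_LOADER_KEY = sum(b"loader") & 0xFF
CIPHERTEXT = bytes(b ^ _LOADER_KEY for b in b"VMware Tools")

def loader() -> bytes:
    # Called from the expected place: the key matches and the plaintext appears.
    return xor_with_caller_key(CIPHERTEXT)

def analyst() -> bytes:
    # Same routine, different caller: the derived key differs.
    return xor_with_caller_key(CIPHERTEXT)

print(loader())
print(analyst())
```

A static deobfuscator would have to model the call stack at every call site to recover the key, whereas a tracer simply records the argument and the return value as the program runs.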
And they're really useful for capturing high-level information for the resources, the system resources that are used by a given process. But oftentimes, that level of information is just too high level to be able to get
a good understanding of what a program is doing, such as a program in Java. Now, for, say, native Windows programs written in C++ and whatnot, we can use tools like Rohitab's API Monitor. I think some of us have probably heard of that. Really nice interface. We can see every API call that's made either to a Windows DLL API function or
a third-party DLL's export function. And we can see all the information, the entire call trace. It would be really nice to have that type of information for Java. So if we then keep going with this thought, what would an ideal tracer look like?
Well, there are several things that we'd want in an ideal tracer. First of all, we'd want it to be lightweight. Ideally, we don't wanna have a bunch of third-party dependencies to deal with. We'd also want it to be extensible. In theory, we'd be using some automation along with this. We'd wanna be able to have the ability to write code on top of it,
to be able to automate and interface with it. And because chances are, well, for almost everyone in the world except for me, you'd be using someone else's product. You'd want it to be well documented so you can actually understand what it's doing and how to use it. Okay, number two, I personally don't like writing Java code.
A lot of the solutions out there for analyzing Java programs require the user, the reverse engineer, to actually write their own analysis scripts in Java. I don't like that requirement. So let's say we don't wanna have to write in Java. Third, we want it to be cross-platform. So we want it to work on Java processes running on Windows,
on Linux, Mac OS, even Android. And number four, we wanna capture a lot of information, right? We're doing a whole trace, we wanna do a whole trace of all the method calls made. So we'd also wanna capture information like the arguments passed to methods and the return values.
Number five, we wanna be able to begin tracing at the very beginning. Now, there are some Java analysis tools out there right now that allow us to attach to an already running process and then start capturing information. Well, that's not too useful if you're dealing with malware,
because you might miss some really important things at the very beginning of the malware program before you can actually attach to it. Also, if there is anti-debugging work going on by the malware process, by the time you actually attach to it, it might be too late. And you actually might not even be able to attach to it, because maybe the process terminated itself.
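One standard way to get control before any target code runs is to start the JVM with its built-in JDWP agent (introduced later in this talk) and `suspend=y`, so that the JVM waits for a debugger before executing anything. The sketch below only builds such a command line; the helper name `jdwp_launch_command`, the jar name, and the port are assumptions for illustration, and this is not necessarily how JavaJournal launches its targets.

```python
import shlex

def jdwp_launch_command(jar_path: str, port: int = 8000) -> list[str]:
    # transport=dt_socket : talk to the debugger over TCP/IP
    # server=y            : the JVM listens and the debugger connects to it
    # suspend=y           : the JVM pauses before running any application code
    agent = (
        "-agentlib:jdwp=transport=dt_socket,server=y,"
        f"suspend=y,address={port}"
    )
    return ["java", agent, "-jar", jar_path]

# Hypothetical jar name, for illustration only.
print(shlex.join(jdwp_launch_command("sample.jar")))
```

Because the JVM is suspended from its first instruction, early anti-debugging checks or an early self-termination cannot run before the tracer is attached.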
And lastly, we'd ideally not want to have to transform the Java bytecode in memory. The reason being, just like the anti-debugging topic I just mentioned, it's possible for malware to detect that its own bytecode has been modified in memory. So I mentioned there are several options already. We have BTrace; it requires the user to write in Java to be able to
trace their program. Bytecode Visualizer requires Eclipse, so not too lightweight. It's also not extensible and doesn't show method return values. Chronon is very heavyweight, doesn't properly show arguments or return values, etc. So there are a lot of great tools up here.
I only have a 30-minute talk, so I'm not gonna go through each one. But as good as many of these tools are, none of them meet all of our requirements. So then what is our solution? Well, our solution is something that we built from the ground up. It is JavaJournal running on top of
the debugging framework that we wrote called pyspresso. So what is pyspresso? Well, pyspresso, you can see on the bottom right in the green and blue blocks. It is a transport client that can go over TCP/IP or
shared memory, and a debug interface on top of that to communicate directly with the JVM. Now everything on the left actually ships with the standard edition of Java. The Java Virtual Machine Tool Interface, the Java Debug Wire Protocol Agent.
And this Java Debug Wire Protocol Agent, which is part of Java, can actually communicate with a debugger process over what's called the Java Debug Wire Protocol. And again, this can be done over TCP/IP for local or remote debugging sessions. Or if you're running a local debug session on Windows, you can do it over shared memory, which is a little more performant.
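To give a feel for what travels over that wire, here is a minimal sketch of JDWP packet framing in plain Python. It follows the published JDWP packet layout (an 11-byte big-endian header, preceded once per connection by the ASCII handshake `JDWP-Handshake`), but it is not pyspresso's code and the helper names `command_packet` and `parse_reply` are made up for this example.

```python
import struct

# Sent by the debugger right after connecting; the JVM echoes it back.
HANDSHAKE = b"JDWP-Handshake"

# Command packet header: length, id, flags, command set, command.
COMMAND_HEADER = struct.Struct(">IIBBB")
# Reply packet header: length, id, flags, error code.
REPLY_HEADER = struct.Struct(">IIBH")
REPLY_FLAG = 0x80

def command_packet(packet_id: int, command_set: int, command: int,
                   data: bytes = b"") -> bytes:
    # The length field counts the 11-byte header plus the payload.
    length = COMMAND_HEADER.size + len(data)
    return COMMAND_HEADER.pack(length, packet_id, 0x00, command_set, command) + data

def parse_reply(packet: bytes) -> tuple[int, int, bytes]:
    # Returns (packet id, error code, payload) for a reply packet.
    length, packet_id, flags, error_code = REPLY_HEADER.unpack_from(packet)
    if not flags & REPLY_FLAG:
        raise ValueError("not a reply packet")
    return packet_id, error_code, packet[REPLY_HEADER.size:length]

# VirtualMachine.Version is command set 1, command 1, with no payload.
print(command_packet(1, 1, 1).hex())
```

Since the protocol is just length-prefixed binary packets over a socket, a debugger client needs nothing beyond the standard library, which is consistent with the zero-dependency design described in the talk.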
Now because we're using a well-defined interface that is actually supported by Oracle, we don't have to worry about hooking things, which is nice. Because once you start relying on hooks, maybe the next version of whatever you're trying to hook, addresses change, function names change, whatever you were hooking may not be reliable. So by relying on a well-defined interface, chances are this is gonna be
a lasting solution that's gonna work for a long time. Another thing to note is that both the pyspresso Debug Transport and the pyspresso Debug Interface are written entirely in Python, and there are absolutely no dependencies. So that means a few things.
It means that there's no need to install any third-party packages. There's no need for external Java Debug Interface wrappers. And here's the thing that I really like. If you're debugging a remote Java process, you don't even need to install Java on your host system. That's pretty cool.
You can debug Java processes without having to install Java on your host system, so I like that a lot. And here's the other cool thing. Because of the modular design of pyspresso and JavaJournal on top of it, the actual debug loop and logging functionality in JavaJournal, pretty much all of JavaJournal itself,
it's about 100 lines of Python. So I was able to write this cool tool, which I'll demo in a moment, in just about 100 lines of Python. And because we're so modular with this, we can actually write other programs on top of pyspresso as well. So maybe someone out here in the audience will next week write
a passive profiler, or the week after, a dynamic debugger. It's really very easy to write things on top of this debugging framework. But enough talk, let's get to the demos. So before I even get to the demos, let me take a vote. So between seeing a previously recorded demo and showing a playthrough of that, or seeing a live demo, anyone have a preference?
Wants to see the live demo instead? All right, okay, cool, good, good, okay. So let's take a really easy sample to start with. So this is just a hello world sample. All it does is it just, when you run it, it prints out hello world. It's pretty straightforward. So let's see, let's see how that would look with JavaJournal.
View, full screen, okay.
So here in my VM, I have two jars for two different demos. I have my hello world jar, which is the source code I just showed you compiled into a jar. I have pyspresso and I have JavaJournal. So if I go ahead and run, actually, let me see if I can magnify this.
Let me change the resolution in the VM.
No, that didn't do what I wanted. View full screen, perfect. That's not right at all. Let's try it one more time. View full screen. All right, well, that's not gonna play along well with us, is it?
Let me try one other thing. Magnifier, all right, that's a little better, okay. So I'm running JavaJournal, specifying my input jar of hello world.jar.
And I'm saying, you know what? Even though we're gonna begin our debugging at the very beginning of the JVM's execution, I really don't need to see all the JVM internals that actually occur before my jar is executed. So I'm gonna say, start showing the output at hello world.
When that class is loaded. So let me zoom out again. This is maybe, here we go, 200, okay. So I see a lot of output, and let's see if I can move the window.
Well, I can't easily move it, but let me scroll up. If I scroll up enough, I should see
the string hello world being printed out. Well, it's hard to do this in this environment, so I'll tell you what. I'll show you what it looks like here. Because the output is recorded to a file, I can pretty much copy the output and put it into something like Notepad++ and use wrapping of the functions, collapsing of the functions.
And I would see that eventually I see a call to java.io.PrintStream.append with the string hello world. So let me show you one other sample. So this is similar to what I discussed earlier. This is a snippet of code, a function from a malware family called Adwind.
It's also known as JSocket and AlienSpy and like a dozen other names of this malware family. But it's pretty prevalent right now. It's written in Java. And here's an example of this function, I, I, I, I, I, I, JSocket.
It takes as input an encrypted string, and it's called from many places throughout the code. And you can see that the decryption key for the string is actually calculated dynamically based on the stack. So again, very difficult to try to reverse engineer this or create a deobfuscator statically. But we know that this is in the package org.jsocket.b.
So we can use JavaJournal again, this time by specifying a slightly different line.
We're gonna say run the jar, adwind.jar, and only show the method calls for methods in org.jsocket.b*. I can press Enter, and it's running in my VM now.
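A filter like `org.jsocket.b*` can be sketched as follows. JDWP's own class-match patterns are restricted to an exact class name or a pattern that begins or ends with `*`, and this example mirrors that rule; whether JavaJournal applies the filter in exactly this way is an assumption, and the class names used below are hypothetical.

```python
def class_matches(pattern: str, class_name: str) -> bool:
    # JDWP-style restricted pattern: exact name, or a single leading
    # or trailing '*' wildcard.
    if pattern.startswith("*"):
        return class_name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return class_name.startswith(pattern[:-1])
    return class_name == pattern

# Hypothetical class names, for illustration only.
for name in ("org.jsocket.b.Decoder", "org.jsocket.client.Main", "java.lang.String"):
    print(name, class_matches("org.jsocket.b*", name))
```

Note that a trailing-wildcard pattern is a plain prefix match, so `org.jsocket.b*` would also match a package such as `org.jsocket.bar`; use `org.jsocket.b.*` when only that exact package is wanted.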
And okay, just ran. And I can see all the calls to that, to any methods in that package. I can see the input strings that are encrypted.
Then I can see the decrypted return values. Now again, doing this statically would be extremely time consuming. But I'm able to now see, okay, it's looking up and decrypting Program Files, that string, VirtualBox Guest Additions, Program Files again.
Now it's looking for VMware Tools. And it seems that was the last string it decrypted. The process terminated after that. So just based on this run, and I'm actually running this inside of VMware, I think I could pretty much assume that this is looking to see if it's running in a VM, looking for VMware Tools. And if it is, it terminates.
So again, determining this statically, just by looking at the decompilation, would have been very, very difficult and very time consuming. But just by running it through JavaJournal, I can see dynamically what's happening. And again, I could take the log file that was created as output,
and see a little bigger what actually happened. All right, so what are the takeaways then? So the good news is, you can download this right now. So it's about 4,000 lines of Python code. Again, zero dependencies.
And it's very well documented too. So we have about 400 kilobytes worth of HTML documentation for this. You can download it from GitHub. It's on PyPI as well. So you can just go to your command line and type pip install pyspresso. And you'll have it.
Some things to note. First of all, it's still in alpha, which means just like Oracle's official JVM, you should not run it in a production environment. But seriously, it is an alpha. Not every code path has been tested, so use it at your own risk.
In addition to being very well documented though, you also have the JavaJournal sample application that we're releasing fully open source, well commented as well. So you can take a look at that to get a feel for how to write your own programs on top of pyspresso. As with any project, there are always more things that can be done. So some of the things we're looking to do now, better inspection of method
arguments for opaque frames, kind of looking at things more natively, the way that pstack does it. There's a little work that can be done to improve object abstraction, to make things more Python-y. It would be kind of nice to be able to automatically have the option to attach
to child processes that your debugged Java process creates. And of course, right now, it's just a text-based output. It would be pretty nice if someone out here wants to create a very nice GUI for it, similar again to something like Rohitab's API Monitor.
So with that, I'm happy to take any questions that you have. Thank you.