
Shrinking in the Age of Kotlin


Formal Metadata

Title
Shrinking in the Age of Kotlin
Number of Parts
542
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Java is now more than 25 years old, and throughout the Age of Java there have been many tools for shrinking, optimizing and obfuscating Java bytecode. This was and is especially important for mobile devices, which have certain resource constraints, from the early J2ME devices to today's Android devices. Today, in the Age of Kotlin, shrinking is more relevant than ever, and not just for mobile applications. As well as mobile Android applications, desktop & server applications are also ever-growing in size; for example, Compose for Desktop typically generates large application packages. JetBrains has recently integrated ProGuard into the Compose for Desktop Gradle plugin to tackle this issue. But what has changed, and how does Kotlin affect shrinkers like ProGuard (which is 20 years old and was originally created to process bytecode produced by the Java compiler)? How does ProGuard shrink classes produced by the Kotlin compiler? The Kotlin compiler just produces Java bytecode anyway, so nothing changes? The talk will take a look at ProGuard under the hood and what's required to process Kotlin apps & libraries. In particular, there will be a deep dive into Kotlin metadata and how ProGuard (via the ProGuardCORE library) makes use of JetBrains' kotlinx.metadata library to read & write the metadata.
Transcript: English (auto-generated)
So let's get started with the next session. It seems like we're going to talk about making smaller apps, with James Hamilton and the talk "Shrinking in the Age of Kotlin". Please welcome him.
Thank you. Okay, so today we're going to talk not just about Kotlin but about shrinkers as well. So first off, who am I? My name is James, and I'm a software engineer at Guardsquare. You might know products such as ProGuard and DexGuard.
We built these products. Mostly I work on things like mobile security, Java bytecode, Dalvik bytecode, code analysis, obfuscation and these kinds of things, mostly on ProGuard and DexGuard. Previously I worked for a few years on something completely different: control systems at CERN.
And before that I did a PhD in code analysis and metrics.

So first, let's just talk about what shrinking is. If you're a mobile developer, an Android developer, you might produce APKs; if you're a non-mobile developer, you might produce JARs. You would probably want to keep these as small as possible, especially on mobile, because of the limited resources: the small amount of storage on the devices, or maybe users paying per megabyte, something like that.
So you want to keep these things as small as possible, and to do that we want something that can shrink them. If you are already an Android developer, you might know ProGuard, for example, or R8 or Redex; yGuard is another one. These are all Java bytecode and Dalvik bytecode shrinkers. Just a small disclaimer: this is not a shrinker tutorial. I'm not going to teach you how to configure ProGuard, and I'm not going to fix your keep rules today.
It's also not a sales pitch for a shrinker. I'm not going to sell you ProGuard, and I'm not going to tell you that you should use ProGuard over R8 or something like that. So if it's not a sales pitch and it's not a tutorial, what am I going to talk about today? Basically, I want to answer a few questions.
How does a shrinker process Kotlin-generated code? To help answer that one, we need to know something about the differences between Java classes and Kotlin classes. And then I want to show you a bit about how you can build tools to analyze and modify Kotlin classes.
So first off, let's talk at a very high level about how a shrinker works. There are normally three broad categories of shrinking: tree shaking, code optimization and name obfuscation. For tree shaking, think of your app as a tree of all the reachable code. You start at the root of the app, for example the main method in Java or Kotlin, and you follow all the references that you can find. You build a graph from that, and then you shake this tree and all of the unused stuff falls away. Just like when you shake an apple tree and the apples fall out, all of your unused code falls away.
This is especially useful with libraries, for example. As an app developer you might use a bunch of different libraries; those libraries might use libraries, and those libraries might use other libraries. You might just want a few features, but all of that code gets pulled into your app. So you can use a shrinker to do tree shaking and remove unused classes, methods and fields, for example. And then another shrinking technique is code optimization.
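Before moving on to optimization, here is a minimal sketch of the reachability step behind tree shaking. It is a toy model, not how ProGuard is implemented. Entities are plain strings such as "App.main", and the hypothetical `references` map stands in for the graph a real shrinker builds by reading bytecode.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy tree shaker: entities are plain strings, edges are "uses" references. */
final class TreeShaker {
    private TreeShaker() {}

    /**
     * Returns every entity reachable from the roots (e.g. the main method)
     * by following references. Everything else is unused and can be removed.
     */
    static Set<String> reachable(Map<String, List<String>> references, List<String> roots) {
        Set<String> kept = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            String entity = worklist.pop();
            if (kept.add(entity)) {
                // First visit: follow its references (entities without an entry reference nothing).
                for (String target : references.getOrDefault(entity, List.of())) {
                    worklist.push(target);
                }
            }
        }
        return kept;
    }
}
```

Consider references App.main → Lib.format → Lib.pad, plus a Lib.parseXml that nothing calls. Here `reachable` returns App.main, Lib.format and Lib.pad, so Lib.parseXml would be shaken out. The `kept.add` check also makes cycles in the graph harmless.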
So tree shaking is all about removing the bigger entities, the classes and methods, whereas code optimization is really about the bytecode. For example, if an optimizer can tell that some path is always going to be taken, then we can remove the code that can never run.
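A toy version of this kind of optimization is dead-branch elimination. The sketch below is only an illustration, not ProGuard's optimizer. Instructions are plain strings, and the caller supplies the condition's value when it is provably constant, for example a debug flag that is always false. A real optimizer works on bytecode and has to prove that value itself, for example with data-flow analysis.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy dead-branch elimination on "if (condition) thenPath else elsePath". */
final class DeadBranchEliminator {
    private DeadBranchEliminator() {}

    /**
     * knownCondition is the condition's value if it could be proven constant,
     * or null if it is unknown. Returns the instructions that have to stay.
     */
    static List<String> simplify(Boolean knownCondition, List<String> thenPath, List<String> elsePath) {
        if (knownCondition == null) {
            // Unknown when shrinking: the branch and both paths must be kept.
            List<String> kept = new ArrayList<>();
            kept.add("branch");
            kept.addAll(thenPath);
            kept.addAll(elsePath);
            return kept;
        }
        // Only the path that is always taken survives; the branch itself disappears too.
        return new ArrayList<>(knownCondition ? thenPath : elsePath);
    }
}
```

With a condition known to be false, `simplify(Boolean.FALSE, List.of("logDebug()"), List.of("run()"))` keeps only `run()`.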
And the last one I want to talk about is name obfuscation. This is about making the strings smaller. If you're an enterprise Java developer, you might have some class names like this, and more characters means more bytes.
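Here is a small Java sketch of the idea. It assumes a simple scheme that hands out the shortest unused names in order (a, b, ..., z, aa, ab, ...). Real obfuscators are more elaborate, for example reusing the same short names in different scopes. The class names in the usage note are only illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy renamer that assigns the shortest available names in order. */
final class NameObfuscator {
    private NameObfuscator() {}

    /** The n-th short name: 0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab", ... */
    static String shortName(int n) {
        StringBuilder sb = new StringBuilder();
        int i = n;
        do {
            sb.append((char) ('a' + i % 26));
            i = i / 26 - 1; // bijective base 26: no "zero" digit
        } while (i >= 0);
        return sb.reverse().toString();
    }

    /** Maps each distinct original name to a short name, in order of appearance. */
    static Map<String, String> assign(List<String> names) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String name : names) {
            if (!mapping.containsKey(name)) {
                mapping.put(name, shortName(mapping.size()));
            }
        }
        return mapping;
    }

    /** Bytes saved per stored copy of each name (ASCII names: UTF-8 length equals character count). */
    static int bytesSaved(Map<String, String> mapping) {
        int saved = 0;
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            saved += e.getKey().getBytes(StandardCharsets.UTF_8).length
                   - e.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        return saved;
    }
}
```

For instance, `assign(List.of("AbstractSingletonProxyFactoryBean", "DefaultListableBeanFactory"))` maps the two names to a and b, and `bytesSaved` reports 57 (32 + 25). The real saving depends on how many times each name is stored across the class files.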
So if we just rename this to a single character, it's going to take up less space. Just a small side note here, which could make up a whole presentation on its own: name obfuscation on its own is not security. I won't talk about that more today, but if you want to discuss it later, I'd be happy to.
Today I want to focus on shrinking. So why am I talking about shrinkers in the Kotlin devroom? Why is the presentation called "in the Age of Kotlin"? The Kotlin compiler generates Java classes just like the Java compiler.
So isn't it all just Java bytecode? Why is there a difference? Let's take a look at a very simple example: hello world in Java and hello world in Kotlin. We will use the javap tool to print out a disassembly of the class file and see what the difference is.
The exact content doesn't matter here; right away you can see that on the right, the Kotlin side is longer. So what do we have here? We have a header, which is basically the same, so that's not very interesting. We have a constant pool, and we already see that there are more constants used in the Kotlin class.
On the Java side we have an extra constructor which doesn't appear on the Kotlin side. That's because in this example there is no class: this main is at the top level of the file.
So from the Kotlin point of view, you cannot instantiate this generated Java class. Then we have a main method, and actually on the Kotlin side we have two methods: because I declared the method without the args parameter, the Kotlin compiler generates two, and one calls the other.
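To illustrate the two main methods, here is a hand-written Java approximation of what the Kotlin compiler emits for a file Main.kt containing only `fun main() { println("Hello, World!") }`. It is only an approximation. The real MainKt class has no constructor at all, and its `main(String[])` is marked synthetic; Java source can express neither. The actual output can be checked with `javap -c -p MainKt.class`.

```java
/** Approximation of the file facade class kotlinc generates for Main.kt. */
final class MainKt {
    // Stand-in only: the real facade class has no constructor, which Java source cannot express.
    private MainKt() {}

    // The function as declared in Kotlin, without parameters.
    public static final void main() {
        System.out.println("Hello, World!");
    }

    // The JVM entry point the compiler adds; kotlinc marks it synthetic and it just delegates.
    public static void main(String[] args) {
        main();
    }
}
```

Running the class through the JVM entry point prints `Hello, World!` via the parameterless `main()`.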
And then at the bottom here, and this is going to be most of the focus of this talk, is the Kotlin metadata. Why do we need this extra metadata that we saw in the class file? Let's look at a very simple example: a data class in Kotlin.
Data classes don't exist in Java, so when you compile this to a Java class file, you get a plain Java class. There's no indication that it was a data class. Another example is context receivers.
If you have context receivers in Kotlin and you compile them to Java bytecode, you will have a Java method which looks something like this: your context receivers end up as the first parameters of your method. So if you're just looking at this from a Java class file point of view,
how do you know that the first parameters are context receivers and not just any other normal parameters? And there are many other things encoded in the metadata, for example nullability, type aliases and a lot more.
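As a rough illustration, the Java below approximates the JVM-level shape of two Kotlin declarations. The first is `data class User(val name: String, val age: Int)`. The second is a top-level context-receiver function `context(Logger) fun greet(name: String)`, assumed to live in Greeting.kt, where Logger is a hypothetical interface. This is a hand-written sketch, not actual compiler output, and it omits some members the compiler also generates. Nothing in it says "data class" or "context receiver": those facts live only in the Kotlin metadata.

```java
import java.util.Objects;

/** Approximate JVM shape of a Kotlin data class; nothing here marks it as a data class. */
final class User {
    private final String name;
    private final int age;

    User(String name, int age) {
        this.name = Objects.requireNonNull(name); // stands in for Kotlin's non-null parameter check
        this.age = age;
    }

    public final String getName() { return name; }
    public final int getAge() { return age; }

    // Used by destructuring declarations: val (name, age) = user
    public final String component1() { return name; }
    public final int component2() { return age; }

    public final User copy(String name, int age) { return new User(name, age); }

    @Override public String toString() { return "User(name=" + name + ", age=" + age + ")"; }

    @Override public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof User)) return false;
        User that = (User) other;
        return name.equals(that.name) && age == that.age;
    }

    @Override public int hashCode() { return name.hashCode() * 31 + Integer.hashCode(age); }
}

/** Hypothetical type used as the context receiver. */
interface Logger {
    void log(String message);
}

/** Approximate facade class for Greeting.kt: the context receiver is just the first parameter. */
final class GreetingKt {
    private GreetingKt() {} // stand-in: the real facade class has no constructor

    public static final void greet(Logger logger, String name) {
        logger.log("Hello, " + name);
    }
}
```

From plain Java, `GreetingKt.greet(logger, "Ada")` looks like any two-argument static call. That is exactly why tools need the metadata to recover the Kotlin view.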
So this is a big problem for code that inspects Kotlin code, for example using reflection, or the compiler, or the IDE. All of these tools need to know that a class is a data class, for example.
So how is this metadata encoded? Let's have a look again at the javap output and zoom in on the metadata. If we zoom in a bit, we see that it's actually "just" a Java annotation. I say "just" in quotes because what's inside that annotation is a bit more complicated and has to be decoded, but it is a Java annotation.
And since it's just an annotation, we can actually look at its source code, which you can find on GitHub. There are a bunch of different fields in the annotation.
The first one is the kind. We saw already that in the small example I gave at the beginning, the main function had no class, so this is actually a file kind, not a class kind. There is also a version here. Then there are two fields: one where the actual metadata is stored in a binary format,
and one where the strings referenced by the metadata are stored. And then there are some other fields here with some strings and some bit flags. Okay, so that's what metadata is and why we need it.
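The kind field is a small integer. The sketch below mirrors the values documented for `kotlin.Metadata.k`: 1 class, 2 file, 3 synthetic class, 4 multi-file class facade, 5 multi-file class part. According to the same KDoc, `mv` holds the metadata version and `d1` the binary-encoded data. `d2` holds the string table, written in plain text so that strings already in the constant pool are reused. `xs`, `pn` and `xi` are an extra string, the package name and extra flags. The enum itself is just an illustration, not part of any library.

```java
/** Illustrative mirror of the documented values of the kotlin.Metadata "k" (kind) field. */
enum MetadataKind {
    CLASS(1),
    FILE(2),
    SYNTHETIC_CLASS(3),
    MULTI_FILE_CLASS_FACADE(4),
    MULTI_FILE_CLASS_PART(5);

    final int k;

    MetadataKind(int k) {
        this.k = k;
    }

    /** Decodes the raw integer stored in the annotation. */
    static MetadataKind fromK(int k) {
        for (MetadataKind kind : values()) {
            if (kind.k == k) {
                return kind;
            }
        }
        throw new IllegalArgumentException("Unknown metadata kind: " + k);
    }
}
```

For the hello-world example, `MetadataKind.fromK(2)` gives FILE, matching the file kind seen in the javap output.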
But why am I talking about shrinking? What is the problem with shrinking Kotlin code? One of the most basic problems here is that the metadata is an annotation. Suppose your shrinker, or the user configuring it, does not tell the shrinker that this annotation is needed.
This annotation is typically not used directly by the program, so when you do your tree shaking you won't see that it's used, and then it can just be removed. But then it's just going to be a normal Java class again.
So either your shrinker needs to know about Kotlin, or you need to configure it to keep the annotation. Another simple example is renaming stuff in the Java classes, such as the class or the methods.
You see in this example here that the metadata still refers to all of the old names. And if you are removing methods because they're unused, there's also information about these functions, from the Kotlin point of view, in the metadata.
So if you remove a method from the Java part, it's still going to be in the Kotlin metadata unless your shrinker knows about Kotlin metadata. As I mentioned, I work on ProGuard and DexGuard.
Both of these process Kotlin metadata in the same way, so let's have a look at how that actually works, at a very high level. We have a textual representation of the metadata here. For example,
there's a Java class, it has some metadata attached, and there is a function there. In the metadata part you'll see there is a link: the class has its attached metadata, and the function in the Kotlin metadata points to an actual
Java method. The metadata doesn't contain any of the actual bytecode; the bytecode is in the Java method.
There are two basic rules here: if the Java part is renamed, rename the Kotlin part; and if the Java part is unused,
remove the Kotlin part. For example, if you rename the method sum here, you should also rename the function in the metadata, and if you remove the method, you should also remove the function in the metadata. At a high level, those are the two basic rules that ProGuard follows when processing the metadata. There are a lot of details around that, but that's what's happening. So how is this implemented?
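The two rules can be sketched with a toy model in Java. The classes below are hypothetical simplifications, not ProGuardCORE types. A metadata function keeps a direct reference to the Java method that holds its bytecode. After the Java side has been processed, `sync` drops functions whose method was removed and copies the possibly obfuscated method name into the metadata.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified view of a Java method (not a ProGuardCORE type). */
final class JvmMethod {
    String name;
    JvmMethod(String name) { this.name = name; }
}

/** A function entry in the Kotlin metadata, linked to the Java method that holds its bytecode. */
final class MetadataFunction {
    String name;
    final JvmMethod referencedMethod;

    MetadataFunction(String name, JvmMethod referencedMethod) {
        this.name = name;
        this.referencedMethod = referencedMethod;
    }
}

final class MetadataSync {
    private MetadataSync() {}

    /**
     * Brings the metadata back in line with the Java side after shrinking:
     * functions whose Java method was removed are dropped, the rest take the method's current name.
     */
    static List<MetadataFunction> sync(List<MetadataFunction> functions, Set<JvmMethod> keptMethods) {
        List<MetadataFunction> result = new ArrayList<>();
        for (MetadataFunction function : functions) {
            if (!keptMethods.contains(function.referencedMethod)) {
                continue; // Java part unused and removed: remove the Kotlin part too.
            }
            function.name = function.referencedMethod.name; // Java part renamed: rename the Kotlin part.
            result.add(function);
        }
        return result;
    }
}
```

Suppose sum is renamed to a and another method is removed as unused. Then `sync` returns only the function for sum, now named a.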
We have an open source project, separate from ProGuard, called ProGuardCORE. It was born out of the ProGuard project: basically, a lot of the bytecode manipulation
and analysis was extracted from it. For example, you can read and write Java class files and Kotlin files, and you can modify, generate and analyze code. Importantly for this talk, you can inspect and modify Kotlin metadata, and this is actually powered by
the kotlinx.metadata library, which is developed by JetBrains. So we don't actually need to dive deep into parsing what's in this annotation;
JetBrains does that for us. We take advantage of the library to load the data from the annotation, manipulate it and then write it back again. This also has a big advantage for versioning, for example: from the ProGuardCORE point of view, we don't really need to care that different versions of the metadata have to be parsed in different ways.
That is delegated to the JetBrains library. So how can we use ProGuardCORE to read and modify Kotlin metadata? Let's have a look at an example.
I was thinking about doing a live demo here, but I practiced yesterday and there were IntelliJ problems and so on, so I decided to make some slides instead. Basically, you can create, for example, a new Gradle project and add a dependency on ProGuardCORE. Then you will be able to use its features to modify
the metadata. Let's have a look at an example of the kind of code you can write. Let's say we've created a new project in IntelliJ and added a dependency on ProGuardCORE,
and we have just a main function, in a file called main. We want to read a Java class file that was generated by the Kotlin compiler and look at its metadata. So let's try reading the metadata from the very class we're writing.
Once it's compiled, it's going to end up somewhere here in the build folder. Let's read it back in and see what metadata is there. We can use a small utility function to read in class files: it will read in the class file, initialize the Kotlin metadata,
and put the class into a container called a program class pool. Once we've done that, we should initialize all the cross-references. This is quite an important concept in ProGuardCORE. One example is the references to the superclasses, so that you have the whole hierarchy.
Another is the references between classes via the method calls. So that's an important step after you've loaded the classes: initialize the references. Once you've done that, you have access to the Kotlin metadata, and we can visit all of the classes that are loaded into the class pool.
We can visit all of their metadata, and within that metadata we can visit all of the functions and, for example, print out the function name. Note that this is not printing the name of the Java method; it is printing the name of the function in the metadata.
If we run this, we see some output here. The input to this program is the program itself, so there is one function, and it prints out main. If we add another function and run it again, it will print that function's name as well as main.
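The read-then-visit flow can be modeled in a few lines of Java. This is not the ProGuardCORE API. `ToyClassPool`, `initializeReferences` and `functionsAccept` are hypothetical names, loosely inspired by the visitor style described in the talk. Here, initializing references means resolving the method signature stored in the metadata as a string to the actual method object, so later visitors can follow the link directly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical Java method: name plus JVM descriptor, e.g. "()V". */
final class ToyMethod {
    final String name;
    final String descriptor;
    ToyMethod(String name, String descriptor) {
        this.name = name;
        this.descriptor = descriptor;
    }
}

/** Hypothetical function entry read from the Kotlin metadata. */
final class ToyFunction {
    final String name;          // the Kotlin function name
    final String jvmSignature;  // symbolic link to the Java method, e.g. "main()V"
    ToyMethod referencedMethod; // resolved link; null until references are initialized
    ToyFunction(String name, String jvmSignature) {
        this.name = name;
        this.jvmSignature = jvmSignature;
    }
}

final class ToyClass {
    final String name;
    final List<ToyMethod> methods = new ArrayList<>();
    final List<ToyFunction> functions = new ArrayList<>(); // from the metadata
    ToyClass(String name) { this.name = name; }
}

interface ToyFunctionVisitor {
    void visitFunction(ToyClass clazz, ToyFunction function);
}

final class ToyClassPool {
    private final Map<String, ToyClass> classes = new LinkedHashMap<>();

    void addClass(ToyClass clazz) { classes.put(clazz.name, clazz); }

    /** Turns each function's symbolic signature into a direct reference to its Java method. */
    void initializeReferences() {
        for (ToyClass clazz : classes.values()) {
            for (ToyFunction function : clazz.functions) {
                for (ToyMethod method : clazz.methods) {
                    if ((method.name + method.descriptor).equals(function.jvmSignature)) {
                        function.referencedMethod = method;
                    }
                }
            }
        }
    }

    /** Visits every metadata function of every class in the pool. */
    void functionsAccept(ToyFunctionVisitor visitor) {
        for (ToyClass clazz : classes.values()) {
            for (ToyFunction function : clazz.functions) {
                visitor.visitFunction(clazz, function);
            }
        }
    }
}
```

Take a single class MainKt with the Java methods `main()V` and `main([Ljava/lang/String;)V` and one metadata function main. Then `pool.functionsAccept((clazz, f) -> System.out.println(f.name))` prints just `main`: the metadata lists one function even though the class has two Java methods.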
But we can't only read metadata; we can also modify
metadata, and we can also modify the Java parts of the class file. Let's say our shrinker wants to rename a method to some other name. So let's visit all of the methods in the class: if a method is called foo, let's rename it to newFoo;
otherwise we just keep the original name. Now we've renamed the Java components, and the metadata is out of sync. So how do we fix that?
Well, we could visit the metadata, follow the reference from the metadata to the Java method, and set the name. But there is actually a utility in ProGuardCORE which can do that for you, the ClassReferenceFixer, which will fix up all the
names after you've renamed things. Once we've done that, we need to write the metadata back into the annotation; we use a Kotlin metadata writer for that. Then we can write out the class, overwriting the original file. If we now open the file in the
IntelliJ decompiler, we see that the function is now called newFoo. What's important here is that we've renamed both the Java components, the method where the bytecode actually lives, and the Kotlin metadata. If you want to learn more about
ProGuardCORE, start modifying bytecode or Kotlin metadata yourself, or build tools to modify Kotlin metadata, a good place to start is the manual. If you just want to look at metadata, you can check out our Kotlin metadata printer project. It takes an APK, JAR or class file as input and shows you all the metadata.
This is also built into the ProGuard Playground web service, so you can upload a JAR there and it will show you the Kotlin metadata. And as I mentioned before, the ProGuardCORE metadata support is built on top of the Kotlin metadata library from JetBrains,
so you don't need to use ProGuardCORE to use that library; you can check that out as well. If you have any questions, I'll be happy to answer them. You can also contact me via Twitter, and I'm also on LinkedIn, if you have any questions later.
Thank you. Awesome, we do have five minutes for questions from the audience, so please just shout them out.
Okay, so the question is: can you throw away the metadata if you're developing an app, not a library? In a lot of cases, yes, unless you're using reflection, and reflection is quite popular.
So if you don't use reflection and you're not making a library, you can probably get rid of a lot of metadata. But reflection is a big problem there.
I don't have any numbers here, but basically all of the header information for all of the functions, except the actual bytecode, is encoded in the metadata,
so it can be quite big. There is some sharing: you saw that in the metadata annotation there's a strings array, and those strings are shared with other strings because they're part of the constant pool. That saves space, but it can still be a lot. If you're developing an app
which doesn't use reflection, then maybe you can just remove all of it. Yes? Yeah, so classes...
So the question was: with tree shaking, can you remove methods, not just classes? Yes. Tree shaking will normally remove entities in the app, for example classes, but methods and fields can also be removed. Sorry?
Yes, sometimes. At least in ProGuard, inlining is more of the optimizer's job, but yes, some things can be inlined, and then the original method can be removed. Also, for Java class files, attributes can be removed if they're not used.
Yeah, and dead code: in ProGuard, dead code removal is part of the optimizer's job. Once you remove that code, you can run the shrinking step, the tree-shaking step, again and start removing methods, fields and classes that just became unused because you optimized.
So the question is how it affects debugging. What exactly do you mean, sorry?
Okay, yeah: when you rename everything, how does this affect debugging, for example if you get a stack trace from
some crash? ProGuard will generate a mapping file which maps the original names to the new names. R8 uses the same mapping file format, and it is also supported by services like Crashlytics.
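As a simplified illustration of what such tools do with the mapping file, the Java sketch below reads only the class lines of a ProGuard-style mapping file (`original.Name -> obfuscated.Name:`). It then restores the class name in a stack-trace frame. It ignores member lines, line-number ranges and overloads, so method names stay obfuscated. For real use, ProGuard's ReTrace or R8's retrace tool handle all of that. The class com.example.Greeter in the usage note is a made-up example.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified class-name-only retracing; not a replacement for ReTrace. */
final class MiniRetrace {
    private MiniRetrace() {}

    /**
     * Reads the class lines ("original.Name -> obfuscated.Name:") of a mapping file
     * into an obfuscated -> original map. Indented member lines and '#' comments are skipped.
     */
    static Map<String, String> readClassMapping(String mappingText) {
        Map<String, String> obfuscatedToOriginal = new HashMap<>();
        for (String line : mappingText.split("\\R")) {
            if (line.isEmpty() || line.startsWith("#") || Character.isWhitespace(line.charAt(0))) {
                continue;
            }
            int arrow = line.indexOf(" -> ");
            if (arrow < 0 || !line.endsWith(":")) {
                continue;
            }
            String original = line.substring(0, arrow);
            String obfuscated = line.substring(arrow + " -> ".length(), line.length() - 1);
            obfuscatedToOriginal.put(obfuscated, original);
        }
        return obfuscatedToOriginal;
    }

    /** Restores the class name in a frame like "at a.a.b(SourceFile:12)"; the method name is left as-is. */
    static String retraceClassInFrame(String frame, Map<String, String> obfuscatedToOriginal) {
        int atIndex = frame.indexOf("at ");
        if (atIndex < 0) {
            return frame;
        }
        int start = atIndex + "at ".length();
        int paren = frame.indexOf('(', start);
        int lastDot = paren < 0 ? -1 : frame.lastIndexOf('.', paren);
        if (lastDot < start) {
            return frame; // not a frame this sketch understands
        }
        String obfuscatedClass = frame.substring(start, lastDot);
        String originalClass = obfuscatedToOriginal.getOrDefault(obfuscatedClass, obfuscatedClass);
        return frame.substring(0, start) + originalClass + frame.substring(lastDot);
    }
}
```

With the mapping line `com.example.Greeter -> a.a:`, the frame `at a.a.b(SourceFile:12)` becomes `at com.example.Greeter.b(SourceFile:12)`.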
So the mapping file will be uploaded to Crashlytics, for example, and if you see crashes from customers, it will be automatic