Shrinking in the Age of Kotlin
Formal Metadata
Number of Parts | 542 | |
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/61973 (DOI) | |
Transcript: English (auto-generated)
00:05
So let's get started with the next session. It seems like we're going to talk about making smaller apps, with James Hamilton and the talk "Shrinking in the Age of Kotlin". Please welcome him.
00:22
Thank you. Okay. So we're going to talk today not just about Kotlin, but about shrinkers as well. First off, who am I? My name is James. I'm a software engineer at Guardsquare. You might know products such as ProGuard and DexGuard.
00:42
We build these products. Mostly I work on things like mobile security, Java bytecode, Dalvik bytecode, code analysis, obfuscation and these kinds of things, mostly on ProGuard and DexGuard. Previously I worked for a few years on something completely different: control systems at CERN.
01:04
And before that I did a PhD in code analysis and metrics. So first, let's just talk about what shrinking is. If you're an Android developer, you might produce APKs; if you're
01:24
a non-mobile developer, you might produce JARs, and you would probably want to keep these as small as possible, especially on mobile because of the limited resources: the small amount of storage on the devices, or maybe the users are paying per megabyte, something like that.
01:45
So you want to keep these things as small as possible, and to do that we want something that can shrink them. If you are already an Android developer, you might know ProGuard, for example, R8,
02:00
Redex, and yGuard is another one. These are all Java bytecode and Dalvik bytecode shrinkers. Just a small disclaimer: this is not a shrinker tutorial. I'm not going to teach you how to configure ProGuard. I'm not going to fix your keep rules today.
02:20
And it's also not a sales pitch for a shrinker. I'm not going to sell you ProGuard. I'm not going to tell you that you should use ProGuard over R8 or something like that. So if it's not a sales pitch and it's not a tutorial, what am I going to talk about today? I basically want to answer a few questions.
02:43
How do shrinkers process Kotlin-generated code? To help answer that one, we need to know something about the differences between Java classes and Kotlin classes. And then I want to show you a bit about how you can build tools to analyze and modify Kotlin classes.
03:01
So first off, let's talk a little bit, at a very high level, about how a shrinker works. There are normally three broad categories of shrinking: tree shaking, code optimization and name obfuscation. So, tree shaking: if you think of your app
03:22
as a tree of all the reachable code, you start at the root of the app; for example, in Java or Kotlin you start at the main method. You follow all the references that you can find and build a graph from that. Then you shake this tree and all of the unused stuff falls away. Just like the apples are going to fall out if you shake an apple tree, all of your unused code
03:44
is going to fall away. This is especially useful, for example, with libraries. As an app developer you might use a bunch of different libraries. Those libraries might use libraries, and those libraries might use libraries. You might just want a few features,
04:03
but all of that code gets pulled into your app. So you can use a shrinker to do tree shaking on that and remove unused classes, methods and fields, for example. And then another shrinking technique is code optimization.
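The tree shaking idea described here can be sketched as a reachability walk over a reference graph. This is a toy model under stated assumptions, not how ProGuard or any real shrinker is implemented: the function `reachableFrom` and all the class and method names in the graph are hypothetical, and a real shrinker derives the references from bytecode rather than from a hand-written map.

```kotlin
// Toy model of tree shaking (hypothetical names, not a real shrinker's code).
// Nodes are class members; edges are the references found in their code.
fun reachableFrom(roots: Set<String>, references: Map<String, Set<String>>): Set<String> {
    val reachable = LinkedHashSet<String>()
    val worklist = ArrayDeque(roots)
    while (worklist.isNotEmpty()) {
        val current = worklist.removeFirst()
        // Follow the references of each node only the first time we see it.
        if (reachable.add(current)) {
            worklist.addAll(references[current].orEmpty())
        }
    }
    return reachable
}

fun main() {
    val references: Map<String, Set<String>> = mapOf(
        "MainKt.main" to setOf("Greeter.greet"),
        "Greeter.greet" to setOf("StringUtil.capitalize"),
        "StringUtil.capitalize" to setOf<String>(),
        // Pulled in by a library, but nothing reachable from main refers to these.
        "StringUtil.reverse" to setOf<String>(),
        "LegacyExporter.export" to setOf("StringUtil.reverse"),
    )

    val kept = reachableFrom(setOf("MainKt.main"), references)
    val removed = references.keys - kept

    println("kept:    $kept")
    println("removed: $removed")
}
```

Everything reachable from the root is kept; `LegacyExporter.export` and `StringUtil.reverse` only reference each other's world and are never reached from `main`, so they "fall out of the tree".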
04:21
So tree shaking was all about removing the bigger entities, the classes and methods, and code optimization is really about the bytecode. For example, if an optimizer can tell that some path is always going to be taken, then we can remove the code on the other paths.
04:43
And the last one I want to talk about is name obfuscation. This is about making the strings smaller. If you're an enterprise Java developer, you might have some class names like this. More characters means more bytes.
05:02
So if we just rename this to a single character, it's going to take up less space. Just a small side note here, which could make up a whole presentation on its own: name obfuscation on its own is not security. But I won't talk about that more today; if you want to discuss that more later,
05:20
I'd be happy to. But today I want to focus on shrinking. So why am I talking about shrinkers in the Kotlin devroom? Why is the presentation called "in the Age of Kotlin"? The Kotlin compiler generates Java classes just like the Java compiler.
05:41
So isn't it all just Java bytecode? Why is there a difference? Let's take a look at a very simple example: hello world in Java and hello world in Kotlin. We will use the javap tool to print out a disassembly of the class file and
06:02
let's see what the difference is. The exact content doesn't matter here. Right away you can see that the right side, the Kotlin side, is longer. So what do we have here? We have a header, which is basically the same, so that's not very interesting. We have a constant pool, and we already see here that there are more constants used in the Kotlin class.
06:27
On the Java side we have an extra constructor which doesn't appear on the Kotlin side. That's because in this example there is actually no class: this main is at the top level of the file.
06:40
So from the Kotlin point of view, you cannot instantiate this generated Java class file. And then we have a main method, and actually on the Kotlin side we have two methods, because I declared the method without the args parameter. So the Kotlin compiler generates two, and one will call the other one.
07:02
And then at the bottom here, which is going to be most of the focus of this talk, is the Kotlin metadata. Why do we need this extra metadata that we saw in the class file? Let's look at a very simple example. Say you have a data class in Kotlin.
07:22
Data classes don't exist in Java. So when you compile this to a Java class file, you get a Java class. There's no indication here that it was a data class. Another example is context receivers. If you have
07:41
context receivers in Kotlin, when you compile this to Java bytecode you will have a Java method which looks something like this, where your context receivers end up as the first parameters of your method. So if you're just looking at this from a Java class file point of view,
08:03
how do you know that the first parameters are context receivers and not just any other normal parameters? And there are many other things encoded in the metadata, for example nullability, type aliases and a lot more.
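As a small illustration of this point, the following sketch looks at a data class through plain Java reflection, which is roughly the view a tool has when it only understands Java class files. The class `User` is a hypothetical example, not something from the talk's slides.

```kotlin
// In Kotlin source this is a data class with one nullable property.
data class User(val name: String, val nickname: String?)

fun main() {
    val javaView: Class<*> = User::class.java

    // The Java view is an ordinary class: the generated component1/component2/copy
    // methods are there, but nothing marks the class itself as a "data class".
    println(javaView.declaredMethods.map { it.name }.sorted())

    // Nullability is not part of the Java return type either:
    // the getter for the nullable property simply returns java.lang.String.
    println(javaView.getMethod("getNickname").returnType)

    // The Kotlin-level facts live in the @kotlin.Metadata annotation on the class.
    println(javaView.isAnnotationPresent(Metadata::class.java))
}
```

A tool that ignores the annotation sees only the first two printouts, which is why reflection, the compiler and the IDE all depend on the metadata.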
08:23
And so this is a big problem for code that inspects Kotlin code, for example using reflection, or the compiler, or the IDE. All of these tools need to know that a class is a data class, for example.
08:42
And how is this metadata encoded? Let's have a look again at the javap output and zoom in on the metadata. If we zoom in a bit, we see that it's actually just a Java annotation. I say "just" in quotes because what's inside that annotation is a bit more complicated and has to be decoded.
09:06
But it is a Java annotation. Since it's just an annotation, we can actually see the source code; you can find the source code on GitHub. There are a bunch of different fields in the annotation.
09:23
The first one is the kind. We saw already that in the small example that I gave at the beginning, with the main function, there was no class. So this is actually a file kind, not a class kind. There is also a version here. And there are two fields where the actual metadata is stored, in a binary format,
09:45
and the strings that are referenced by the metadata. And then there are some other fields here with some strings and some bit flags. Okay, so that's what the metadata is and why we need metadata.
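The fields mentioned here can be inspected without any extra library, because `kotlin.Metadata` is an ordinary runtime-visible annotation. The sketch below reads the raw fields only and does not decode the binary payload (that is what the JetBrains metadata library is for). The helper `describe` and the class `Point` are hypothetical names introduced for illustration.

```kotlin
data class Point(val x: Int, val y: Int)

// Summarises the raw @kotlin.Metadata fields of a class, without decoding them.
fun describe(clazz: Class<*>): String {
    val metadata = clazz.getAnnotation(Metadata::class.java)
        ?: return "${clazz.name}: no Kotlin metadata"

    // The kind field distinguishes classes from file facades and other class files.
    val kind = when (metadata.kind) {
        1 -> "class"
        2 -> "file facade"
        3 -> "synthetic class"
        4 -> "multi-file class facade"
        5 -> "multi-file class part"
        else -> "unknown"
    }

    return "${clazz.name}: kind=$kind, " +
        "version=${metadata.metadataVersion.joinToString(".")}, " +
        "binary chunks=${metadata.data1.size}, strings=${metadata.data2.size}"
}

fun main() {
    println(describe(Point::class.java))   // compiled from Kotlin
    println(describe(String::class.java))  // java.lang.String, a plain Java class
}
```

The `data1` array holds the encoded metadata and `data2` holds the strings it refers to, which matches the two storage fields described in the talk.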
10:02
But why am I talking about shrinking? What, then, is the problem with shrinking Kotlin code? One of the most basic problems is that the metadata is an annotation, right? So if your shrinker, or the user who is configuring the shrinker, does not tell the shrinker that it needs the
10:25
annotation, there is a problem. Typically, this annotation is not used directly by the program, so when you do your tree shaking you won't see that it's used, and then it can just be removed. But then it's just going to be a normal Java class again.
10:44
So either your shrinker needs to know about Kotlin, or you need to configure it to keep the annotation. Another simple example is if you start renaming stuff in the Java classes. If you rename the class, or if you rename the methods,
11:02
then you see in this example here that the metadata still refers to all of the old names. And if you are removing methods because they're unused, well, there's also information about these functions, from the Kotlin point of view, in the metadata.
11:22
So if you remove it from the Java part, it's still going to be in the Kotlin metadata, unless your shrinker knows about Kotlin metadata. As I mentioned, I work on ProGuard and I work on DexGuard,
11:40
and both of these process Kotlin metadata in the same way. So let's have a look at how that actually works. At a very high level, we have a textual representation of the metadata here. So for example,
12:01
there's a Java class, it has some metadata attached, and there is a function there. And you'll see in the metadata part that there is a link: for the class there's the attached metadata, and then you'll also see that the function in the Kotlin metadata points to an actual
12:27
Java method in the Java bytecode. The metadata doesn't contain any of the actual bytecode; the bytecode is in the Java method.
12:41
So, there are two basic rules here. If the Java part is renamed, rename the Kotlin part. And if the Java part is unused,
13:01
remove the Kotlin part. For example, if you rename the method here, you should also rename the function in the metadata. If you remove the method, you should also remove the function in the metadata. At a high level, those are the two basic rules that ProGuard follows when processing the metadata. There are a lot of details around that, but at a high level that's what's happening. So how is this implemented?
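Before looking at the real implementation, the two rules can be sketched on a toy model. The classes `JavaMethod`, `KotlinFunction` and `ClassModel` and the function `shrink` below are hypothetical and are not ProGuardCORE types. The sketch also assumes, for simplicity, that a Kotlin function's name always equals the name of the Java method it points to, which is not true in every real case.

```kotlin
// Toy model only: hypothetical classes, not ProGuardCORE's data structures.
class JavaMethod(var name: String, var used: Boolean = true)

// The Kotlin function holds a link to the Java method where the bytecode lives.
class KotlinFunction(var name: String, val referencedMethod: JavaMethod)

class ClassModel(
    val methods: MutableList<JavaMethod>,
    val functions: MutableList<KotlinFunction>,
)

fun shrink(model: ClassModel, renames: Map<String, String>) {
    // Java side first: drop unused methods, then rename the survivors.
    model.methods.removeAll { !it.used }
    model.methods.forEach { method -> renames[method.name]?.let { method.name = it } }

    // Rule: if the Java part is unused (and now gone), remove the Kotlin part.
    model.functions.removeAll { it.referencedMethod !in model.methods }
    // Rule: if the Java part is renamed, rename the Kotlin part.
    model.functions.forEach { it.name = it.referencedMethod.name }
}

fun main() {
    val foo = JavaMethod("foo")
    val bar = JavaMethod("bar", used = false)
    val model = ClassModel(
        mutableListOf(foo, bar),
        mutableListOf(KotlinFunction("foo", foo), KotlinFunction("bar", bar)),
    )

    shrink(model, mapOf("foo" to "a"))

    println("Java methods:     ${model.methods.map { it.name }}")
    println("Kotlin functions: ${model.functions.map { it.name }}")
}
```

The important design point is the link from the metadata to the Java member: because each function knows which method it refers to, both rules reduce to following that link after the Java side has been processed.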
13:29
We have an open source project, separate from ProGuard, called ProGuardCORE, but it was born out of the ProGuard project. Basically, it contains what was extracted from the ProGuard project:
13:43
a lot of the bytecode manipulation and analysis. For example, you can read and write Java class files and Kotlin files, and you can modify, generate and analyze code. And importantly for this talk, you can inspect and modify Kotlin metadata, and this is actually powered by
14:06
the kotlinx-metadata library, which is developed by JetBrains, so we don't actually need to dive deep into the actual parsing of what's in this annotation.
14:23
JetBrains does that for us. We take advantage of the library to be able to load the data from the annotation, manipulate it and then write it back again. This also has a big advantage, for example with versioning: from the ProGuardCORE point of view, we don't really care about the version of the metadata or that we need to parse different versions in different ways.
14:47
That is delegated to the JetBrains library. So how can we use ProGuardCORE to read and modify Kotlin metadata? Let's have a look at an example.
15:04
I was thinking about doing a live demo here, but I practiced yesterday and there were IntelliJ problems and stuff, so I decided to make some slides instead. Basically, what you can do is create, for example, a new Gradle project and add a dependency on ProGuardCORE, and then you will be able to use its features to modify
15:27
the metadata. So let's have a look at an example of what kind of code you can write. Let's say we've created a new project in IntelliJ, we've added a dependency on ProGuardCORE, and
15:40
we have a file called main with just a main function. And we want to read some Java class file that was generated by the Kotlin compiler and look at the metadata. So let's try reading the metadata from this class that we're writing.
16:02
Once it's compiled, it's going to end up somewhere here in the build folder. Let's read it back in and then see what metadata is there. We can use a small utility function to read in class files. It will read in the class file, and it will initialize the Kotlin metadata.
16:22
It will put that class file into a container called a program class pool. Once we've done that, we should initialize all the cross-references. This is quite an important concept in ProGuardCORE: for example, the references to the superclasses, so you have the whole hierarchy, and
16:41
the references between classes through method calls. So that's an important step after you've loaded the classes: initialize the references. Once you've done that, you have access to the Kotlin metadata. What we can do is visit all of the classes that are loaded into the class pool.
17:04
We can visit all of their metadata, and within that metadata we can visit all of the functions. And then we can, for example, print out the function name. Note that this is not printing out the method name of the Java method; this is printing out the function name that is in the metadata.
17:25
If we run this, we will see some output here. The input to this program is this program itself, so there is one function, and so it prints out main. If we add another function, foo, and run it again, it will print foo and main.
17:51
But we can't only read metadata; we can also modify
18:02
metadata, and we can also modify the Java parts of the class file. Let's say that our shrinker wants to rename a method to some other name. So let's visit all of the methods in the class. If a method is called foo, let's rename it to newFoo;
18:24
otherwise we just keep the original name. Note that now that we've renamed the Java component, the metadata is out of sync. So how do we fix that?
18:43
Well, what we can do is visit the metadata, look at the reference where the metadata points to the Java method, and then set the name. But actually there is a utility in ProGuardCORE which can do that for you: the ClassReferenceFixer, which will fix up all the
19:02
names after you've renamed stuff. Once we've done that, we need to write the metadata back into the annotation. We use a KotlinMetadataWriter for that, and once we've done that we can write out the class to overwrite the original file. So if we open the file now in the
19:23
IntelliJ decompiler, we see that the function is now called newFoo. What's important here is that we've renamed both the Java component, the method where the bytecode actually lives, and the Kotlin metadata. If you want to learn more about
19:42
ProGuardCORE, if you want to start modifying bytecode or Kotlin metadata yourself, or if you want to build tools to modify Kotlin metadata, a good place to start is the manual. If you just want to look at metadata, you can check out our Kotlin metadata printer project. It will take an APK, JAR file or class file as input and show you all the metadata.
20:06
This is actually built into the ProGuard Playground web service as well, so you can upload a JAR there and it will just show you the Kotlin metadata. And as I mentioned before, the ProGuardCORE metadata support is built on top of the Kotlin metadata library from JetBrains.
20:22
You don't need to use ProGuardCORE to use that library, so you can check that out as well. If you have any questions, I'll be happy to answer. You can also contact me via Twitter, and I'm on LinkedIn as well, if you have any questions later.
20:40
Thank you. Awesome. We do have five minutes for questions from the audience. So yeah, please just shout.
21:02
Okay, so the question is: can you throw away the metadata if you're developing an app, so not a library? In a lot of cases, yes, unless you're using reflection. And reflection is quite popular. So
21:23
if you don't use reflection and you're not making a library, you can probably get rid of a lot of metadata. But then, reflection is a big problem.
21:44
I don't have any numbers here, but basically all of the header information for all of the functions, everything except the actual bytecode, is encoded in the metadata.
22:04
So it can be quite big. There is some sharing: you saw that in the metadata annotation there's a strings array, and those strings are actually shared with other strings because they're part of the constant pool. That saves space, but it can be a lot, and if you're developing an app
22:24
which doesn't use reflection, then maybe you can just remove all of it. Yes? Yeah, yeah, so, classes.
22:41
So the question was: with tree shaking, can you remove methods, not just classes? Yeah, the tree shaking will normally remove entities in the app, for example classes, but methods can also be removed, and fields can be removed. Sorry?
23:01
Yes, sometimes. At least in ProGuard, inlining is more the optimizer's job. But yeah, some things can be inlined, and then the original method can be removed. Also, for Java class files, attributes can be removed if they're not used.
23:23
Yeah, and dead code: in ProGuard, dead code removal is part of the optimizer's job. And once you've removed that code, you can also run the shrinking step, the tree shaking step, again and then start removing unused methods, fields and classes that just became unused because you optimized.
23:50
So the question is how does it affect debugging. But what exactly do you mean, sorry?
24:13
Yeah, okay. So when you rename everything, how does this affect debugging, for example if you get a stack trace from
24:25
some crash or something? ProGuard will generate a mapping file which maps from the original names to the new names. And this mapping file is also used by R8 as well; it's the same mapping file. And this is also supported by services like Crashlytics.
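The relationship between renaming and the mapping file can be sketched as follows. This is a simplified illustration, not ProGuard's or R8's actual renaming or retrace logic: it only handles class names, it writes class lines in the `original -> obfuscated:` style that ProGuard mapping files use, and the functions `shortName`, `buildMapping`, `toMappingFile` and `retrace` as well as the class names are hypothetical.

```kotlin
// Generates short names in the order a, b, ..., z, aa, ab, ...
fun shortName(index: Int): String {
    var n = index
    val sb = StringBuilder()
    do {
        sb.append('a' + n % 26)
        n = n / 26 - 1
    } while (n >= 0)
    return sb.reverse().toString()
}

// Assigns a short name to every original class name, in order.
fun buildMapping(originalNames: List<String>): Map<String, String> =
    originalNames.withIndex().associate { (index, name) -> name to shortName(index) }

// Renders class-level mapping lines such as "com.example.Main -> a:".
fun toMappingFile(mapping: Map<String, String>): String =
    mapping.entries.joinToString("\n") { (original, short) -> "$original -> $short:" }

// Restores the original class name in one stack trace line (methods are left as-is).
fun retrace(line: String, mapping: Map<String, String>): String {
    val reverse = mapping.entries.associate { (original, short) -> short to original }
    return Regex("""at (\w+)\.""").replace(line) { match ->
        val obfuscated = match.groupValues[1]
        "at ${reverse[obfuscated] ?: obfuscated}."
    }
}

fun main() {
    val mapping = buildMapping(
        listOf("com.example.Main", "com.example.payments.PaymentServiceFactoryImpl")
    )
    println(toMappingFile(mapping))
    println(retrace("at b.a(Unknown Source)", mapping))
}
```

Shorter names are what save the bytes; the mapping file is what lets a crash reporting service turn an obfuscated stack trace back into something readable.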
24:45
So the mapping file will be uploaded to Crashlytics, for example, and if you see crashes from customers, it will be automatic