Making the memory dump a powerful development tool
Formal Metadata
Number of Parts | 46
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/47171 (DOI)
droidcon Berlin 2015, 29 / 46
Transcript: English (auto-generated)
00:05
Well, hi everyone and welcome. My name is Eric Ondrej and I'm an Android developer at Badoo. And I'm here today, just like I said, to talk about memory dumps and how we can make memory dumps a useful tool for developers
00:21
for development and debugging. But first of all, let me just say a few words about the company I work for, Badoo. Awesome. Let me go back a couple of slides. Damn it. Okay.
00:42
There we go. So Badoo. So Badoo is a social networking platform. We're quite big, even though you might not have heard of us. We have 240 plus million users worldwide, 190 countries, got offices in London and Moscow. It's a really awesome place to work and yes, we are hiring.
01:02
Yeah, on Android we're pretty big. Got about 50 million downloads on the Google Play Store. Anyway, that's the sell. So let me talk about memory dumps now. But first of all, I just want to get a feel for the audience. So if you have ever worked with memory dumps on Android in any way, maybe you can raise your hands.
01:23
Whoa, that is more than I expected. So that's really cool. So probably you know what a memory dump is. I'm just going to give you a super brief overview anyway. So, wow. This works so much better than when I practice this. Let's go back a couple of slides again.
01:41
Right, so a memory dump. A memory dump is usually a huge file with a lot of stuff in it. And as the name implies, it's a dump of your app's memory while it's running. And there are basically three important things here. There are actually a lot more, I'm not going to go into details, but there are three things in there that are extremely useful
02:02
while you're looking at a memory dump. First of all, we have all those objects. All the objects that your application created while it was running. However, the instances themselves, the objects, are not very useful if you don't have class information. So that's like the second most useful thing. The class information tells you what fields are what,
02:22
and the relationship between classes and so on. Third one is strings. The strings, you need to make sense of the memory dump at all. So basically the strings will tell you what are the names of the classes, what are the names of fields and so on. Of course, you might be running obfuscation, most of you probably are,
02:40
but at least the string identifies the class. So without these three things, a memory dump is pretty useless. So this is very much the core. And actually right now, what we're using memory dumps for, I can pretty much guess what all of you have used memory dumps for. And it's probably going to be related to solving out-of-memory issues.
03:06
Well, either that or trying to hunt down a memory leak. So these are the two main use cases for working with memory dumps. And as you can see, it's pretty much, well, it is basically one use case. It's all related to memory leaks and memory problems.
03:22
And this is something I've been working with quite a lot. Not that we have a lot of problems at Badoo, but it's something that I spend some time on. And while I was doing that, I got to thinking about what more we could do with memory dumps. Because like I said, in a memory dump we have a whole lot of stuff, right?
03:40
It is a complete image of the application. So why aren't we doing more with memory dumps? Right, why not more? And at least the idea I got while working with this was: why don't we use memory dumps to analyze other kinds of problems as well,
04:00
not just stuff related to running out of memory. For example, you've got your NullPointerException somewhere or your IllegalStateException. Why don't we analyze those with memory dumps, instead of just out-of-memory issues? And also there is a lot of cool, very Android-specific information in your memory dumps.
04:20
For example, you have your back stack information. That's just objects in your memory dump. And using this we can know exactly where in the application the user was when a crash occurred. You can compare this to the data we usually have when we analyze a problem. Well, usually we have maybe a stack trace and logs, if we're lucky.
04:44
And sure, this will tell you exactly where it crashed, but it won't really give you an image of what happened before. There are some ways around this, of course, if you're using tools like Nokia, for example. But you still have a very vague idea
05:01
what happened right before the crash. A memory dump, however, will tell you exactly what went wrong. So besides this, you also have stuff describing exactly what's on the screen right at the moment it crashed. You have your full view hierarchy.
05:21
So basically it's just a set of objects there as well. And besides the purely visual stuff going on on the screen, you also have the full state of your handlers, loopers and so on. If you're using an event bus, for example, you can see what's going on in my event bus, what is being queued up right now, what's waiting to execute.
05:41
Maybe you have a problem with, for example, a thread pool that gets full where stuff isn't executing properly. So this is a lot of cool stuff. And like I said before, we're using memory dumps for out-of-memory issues. So why aren't we using them for this?
06:02
And it turns out that there is a very, wow, that's very dark. There's a very, oh, thanks. There's quite a simple reason why we aren't already using memory dumps for analyzing these other kinds of problems. And it all connects to the kind of memory dumps
06:23
we use on Android at the moment. So you're probably familiar with HPROF files. So if you're in Android Studio, you take a memory dump, you get an HPROF file, right? And if you've looked at an HPROF file, you will see that it's big. It's really, really big.
06:42
On a modern Android device, you can have a file that's several hundred megabytes big. At least if you reach a state where you run out of memory. And one thing about these really big files is that on most mobile devices, you are probably going to be connected with a 3G connection, or 4G if you're quite lucky.
07:04
And this means that the only feasible way to get a memory dump off a device is to actually have physical access to the device. So file size is really a big limitation here. However, there are some other limitations connected to memory dumps that aren't purely technical.
07:22
And this is something that's very important, especially to your users, but also to us as developers. And that is that there are some really important privacy issues connected to the Memory Dumps. So like I said, a Memory Dump has your full application state. And this could include a lot of sensitive data.
07:41
Just a quick example. In Badoo, we let our users send or receive messages and private photos. And these images might be quite, you know. Well, they might be very personal. And we would like them to stay personal. And when it comes to storing them on the server
08:00
and in the app itself, we are very careful so that they're always protected and no one else can read them. However, if, say, the application crashes and we were to take a memory dump, these images might actually be right there in the memory dump file. That means that we need to be as careful with memory dumps, in that case, as we are with protecting the data
08:22
while it's on the server and in transit. And it's not only images and messages and so on, but this very sensitive data could be things like access tokens as well. I'm sure most of the application developers out there are using third-party services such as Facebook, Twitter, anything using OAuth.
08:43
And one thing they have in common is that you have some kind of access token or a key. And this key is usually a string. And if you are going to be using it, you will be having it in memory at some point or another. That means that this will probably be in your Memory Dump as well.
09:06
When I got to this point, I had some cool ideas of what I wanted to do with memory dumps, but there seemed to be a lot of limitations that meant that, well, basically, it was impossible to do it with HPROF files.
09:21
The thing is that I'm a software developer and an engineer, and when I see a list of problems, that's when I get really excited. Because if things are easy, they usually aren't very fun. And in this case, it seemed almost, well, pretty hard. So what I did at this point was that I sat down and I wanted to figure out
09:41
how to create a better Memory Dump format. And it turned out it was actually super easy, because I already had a list of the limitations, and it turns out that a better Memory Dump format would basically take these limitations and, well, turn them upside down. So let's take a look. What is a better Memory Dump?
10:01
So for me, at least, a better Memory Dump would be a Memory Dump where you have very small files. And this might seem very counterintuitive, because if you have a very small Memory Dump, you don't know what happened with all that useful data that you had there in the first place. So that means that you need to have a very small file,
10:22
but you still need to be able to keep all the data that is relevant for you as an Android developer. And it turns out that in HPROF files, for example, there is a lot of stuff that isn't relevant for you as an Android developer, because this file format was created some 20 years ago for desktop Java.
10:40
And while, sure, Java is Java, right, but it turns out that desktop Java and Java on Android are quite different beasts, and there's a lot of data there that isn't relevant at all for us. Another thing is that I wanted to have a file format that is easy to read and write.
11:01
And of course, since I was going to implement it, that's quite obvious, but there's also another big reason why you need a file format that is easy to read and write. And the reason is that, like I said, on Android we have HPROF files, and since that is what Google is doing, we're pretty much stuck with it. And this means that if you want to make use of another file format,
11:22
you need to do a conversion at some point. And the place where you need to do this is going to be on the device. Why? Because the whole problem is that we need to have a small file format so we can move it off the device. So you have no choice. It has to be done on the device. And this means that you're working in a very resource-constrained environment.
11:43
So you need to be able to convert these files, which are several hundred megabytes big, do it on the device without draining the user's battery, without affecting the user experience in any way. So this is actually quite a challenge. And since this talk is only 20 minutes, I'm afraid that I can't show you all the code.
12:02
But let me just say that all this is open source. It's on the Badoo GitHub, and you can check it out, play around with it, have a lot of fun. But yeah, that leads me on to the next step, what I created. And I'd like to introduce at this point something called BMD,
12:21
which stands for Badoo Memory Dumps. It's not the most imaginative name, but supposedly naming things is sometimes as hard as actually implementing them. And it was in this case. But at least I made it in the Badoo colors, so that's quite something. So what is BMD then?
12:40
Well, BMD is basically the checklist you just saw there. It's all those things. It's a very small, compact file format. And how did I achieve this? Well, there are two parts to it. First of all is to use very efficient encoding techniques. And the main key to doing this
13:00
was borrowing something some very clever people implemented. In this case, Google. Because they already have a very good binary format called protocol buffers, which we actually were already using at Badoo to send and receive data from our servers. And protocol buffers, it uses some very cool techniques for storing data.
13:20
For example, something called variable length integers, which is a way of storing integers using as few bytes as possible. That meant that you can go from hprof, where you might need four bytes to store a very small integer to just one byte. And in some cases, this meant that we could store a fourth or less of the data needed.
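As a rough illustration of the variable-length integer idea, here is a minimal sketch in Python (chosen for brevity; the actual conversion tools target Android). It implements the base-128 varint scheme that Protocol Buffers documents for unsigned integers. The function names are made up for this example and are not taken from the BMD code.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 data bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


# A small value needs one byte instead of a fixed four.
fixed_width = (5).to_bytes(4, "big")
compact = encode_varint(5)
print(len(fixed_width), len(compact))  # prints "4 1"
```

Values up to 127 fit in a single byte, which is where the "a fourth or less" saving for small integers comes from. Large values can take more than four bytes, so the gain depends on the data being dominated by small numbers.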
13:42
But actually the biggest improvement was simply to throw away data we don't need. And the biggest gain here was to actually throw away a lot of primitive arrays. So that is one thing. If you ever looked at a memory dump, usually what you will see is that one of the biggest contributors to the file size is byte arrays.
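To make the idea of throwing away primitive arrays concrete, here is a small Python sketch. It assumes a simplified, hypothetical record type; the real HPROF and BMD record layouts are not shown in the talk, so the field names below are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrimitiveArrayRecord:
    """Hypothetical, simplified view of a primitive array in a memory dump."""
    object_id: int
    element_type: str        # e.g. "byte" or "int"
    length: int              # number of elements in the array
    data: Optional[bytes]    # raw contents, or None once discarded


def strip_contents(record: PrimitiveArrayRecord) -> PrimitiveArrayRecord:
    """Keep the identity, type and size of the array, but drop its contents."""
    return PrimitiveArrayRecord(
        object_id=record.object_id,
        element_type=record.element_type,
        length=record.length,
        data=None,
    )


# A 4 MB byte array shrinks to a handful of fields.
big = PrimitiveArrayRecord(object_id=42, element_type="byte",
                           length=4 * 1024 * 1024, data=bytes(4 * 1024 * 1024))
small = strip_contents(big)
print(small.length, small.data)  # prints "4194304 None"
```

The stripped record still tells you how much memory the array occupied, which is usually what matters when tracking down memory problems.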
14:02
And in Android, byte arrays are usually bitmaps. Sometimes it could be useful to see what's in the bitmap, but most of the time, you only need to know how big the bitmap is. And in this case, we could just discard all the bitmap data and keep information about how big it was in the first place. Secondly, strings.
14:21
Like I mentioned, strings are, well, they both take space, but they also are a privacy concern in some cases. So in BMD, instead of keeping the whole string data, we are throwing away the string itself and keeping only a hashed version of it. And this means that, first of all, we eliminate all the privacy issues,
14:43
but secondly, if you have a hashed string, at some point you can still recover the actual string. I'm not going to go into too much detail, but this is usually how passwords are stored in a fairly secure way. And this means that at some point, if you want to convert it back, that's actually possible.
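One plausible reading of "converting it back" is a lookup against strings you already know, since a hash cannot be reversed directly. The Python sketch below assumes SHA-256 purely for illustration (the talk does not say which hash BMD uses), and the class names are invented examples.

```python
import hashlib
from typing import Dict, Iterable, Optional


def hash_string(text: str) -> str:
    """Replace a string with a fixed-size digest (SHA-256 is an assumption here)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_lookup(known_strings: Iterable[str]) -> Dict[str, str]:
    """Map digest -> original for strings we already know, e.g. class names."""
    return {hash_string(s): s for s in known_strings}


def recover(digest: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the original string if its digest is known, otherwise None."""
    return lookup.get(digest)


# Class and field names can be rebuilt from the app itself;
# a user's private message is not in the table and stays opaque.
lookup = build_lookup(["com.example.ChatActivity", "mMessageList"])
print(recover(hash_string("mMessageList"), lookup))       # prints "mMessageList"
print(recover(hash_string("a private message"), lookup))  # prints "None"
```

Under this reading, the developer recovers the structural names needed to read the dump, while user content only comes back if someone can guess it exactly.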
15:03
So basically the key here for creating an efficient memory dump is keeping only the information that you need. And... Oh, five minutes. And this means that... Oh, actually, sorry. So the thing about keeping all of the data you need
15:21
is that what I need might not necessarily be what some other developer might need. But that's the beauty of open source in this case, because the libraries we have created for converting to and from this file format are very flexible. So if, for example, you need to have more information about views, for example, that's quite an easy tweak to just preserve more data
15:44
while still keeping a pretty small file size. Anyway, I know you want to see some numbers, right? Everyone loves numbers, so let's look at numbers. Oh, sorry. Yes, one more point. Anyway, so comparing BMD and HPROF.
16:02
There's a lot of numbers here, but there's one that's circled in red because that's the one that really sells this whole thing. 1.8%. Doesn't sound like much, but this is actually how big the BMD file is compared to the original HPROF file. So in this case, we started out with a 210 megabyte file,
16:22
and in the end, you have a file that's only 3.8 megabytes big. This is still not tiny to be sure, but this is actually before you then compress it. Because when we have been using this, we add a second step where we take the output file,
16:40
run it through a gzip output stream, and in this case, you actually end up with a file that's only 1.8 megabytes big. And this actually means that we have a file that is, while still fairly big, it is small enough that you can transmit it from the end user's device to your servers. How do we know that this is the case?
17:01
Well, it's pretty simple. At Badoo, we already allow our users to upload all their high-resolution photos. So we know that 1.8 megabytes, it's not really an issue. That works quite fine. And this means that we actually open up the door to a lot of cool practical applications. And this is pretty much what I mentioned at the start.
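The size figures quoted above, and the second step of running the converted file through a gzip stream, can be sketched in a few lines of Python. The helper names are invented for this example, and the sample payload is artificial, so its compression ratio says nothing about real dumps.

```python
import gzip


def size_ratio_percent(new_size: float, original_size: float) -> float:
    """Size of the new file as a percentage of the original."""
    return 100.0 * new_size / original_size


def compress_dump(data: bytes) -> bytes:
    """Second step: gzip the already-converted dump before uploading it."""
    return gzip.compress(data)


# Figures from the talk: 210 MB HPROF -> 3.8 MB BMD -> 1.8 MB after gzip.
print(round(size_ratio_percent(3.8, 210), 1))  # prints "1.8"
print(round(size_ratio_percent(1.8, 210), 1))  # prints "0.9"

# Round trip on a stand-in payload (repetitive, so it compresses very well).
payload = b"instance-record" * 1000
packed = compress_dump(payload)
print(len(packed) < len(payload), gzip.decompress(packed) == payload)  # prints "True True"
```

So the gzipped file ends up at a bit under 1% of the original HPROF size in the example from the slides.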
17:23
This means that you could potentially collect memory dumps for all the crashes that occur in your application out there in the wild from live users. However, this means that you would probably be drowning in memory dumps because while we have removed all the technical limitations,
17:44
there's still another limitation for analyzing memory dumps, and it's this thing right here on top of my head. Or rather, it is my head. And that is that someone actually needs to analyze these files. So sadly, I don't have a tool yet for analyzing memory dumps and fixing problems based on it.
18:00
So you still need to do that part yourself. But let me just mention how the workflow would be for using this. It's really super simple as a developer to actually do this. And that is that, basically the same way you're hooking up your HockeyApp or whatever, you need to set it up to collect a memory dump when a crash occurs, for example.
18:21
At this point, we just catch the memory dump, and the next time your application starts, we can convert it to BMD. And this would be done in the background quite slowly to not affect the user experience. However, when this is done, you can then upload it to your server, and then you can convert it back to HPROF. The thing is, it's a new file format.
18:41
There are no other tools for working with BMD, which means that you still have to convert it back to HPROF, which is a somewhat slower process, but since this doesn't have to be done on the user's device, it can be a bit slower and be done at a leisurely pace.
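The device-side half of that workflow (find dumps left by a crash, convert, compress, upload) could be outlined roughly as follows. This is a Python sketch for readability only; on Android the dump itself would come from the platform (for example `android.os.Debug.dumpHprofData`), and `convert_to_bmd` and `upload` are hypothetical callables standing in for the real converter and network code.

```python
import gzip
from pathlib import Path
from typing import Callable


def process_pending_dumps(dump_dir: Path,
                          convert_to_bmd: Callable[[bytes], bytes],
                          upload: Callable[[str, bytes], None]) -> int:
    """On app start, handle every .hprof file a previous crash left behind.

    Returns the number of dumps processed. A real on-device converter would
    stream the file rather than read several hundred megabytes into memory.
    """
    processed = 0
    for hprof_path in sorted(dump_dir.glob("*.hprof")):
        bmd = convert_to_bmd(hprof_path.read_bytes())            # shrink the dump
        upload(hprof_path.stem + ".bmd.gz", gzip.compress(bmd))  # send it off the device
        hprof_path.unlink()                                      # free the space
        processed += 1
    return processed
```

The server side would then do the reverse: decompress, convert BMD back to HPROF, and open the result in an existing HPROF analysis tool.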
19:01
Right, so like I said, you are probably the limiting factor in this case. So you might also want to add a layer of filtering to only target very specific crashes. Because in most cases, you will be very happy and fine with only having a stack trace and some log files. So this is something that you want to use as a last resort when you run out of options,
19:22
you have some bug that you really cannot find a cause for, then that's when you want to bring in the big guns, memory dumps. And yeah, like I said, it's all open source, it's easy to customize, and we would love someone to contribute to the project as well,
19:40
if you want to try. And with this, I would like to thank you all for sitting through this presentation without seeing any code whatsoever. I know it's disappointing. But there's also an article I've written about this which has more nitty-gritty technical details. So thank you all for listening.