Speed Science
Formal Metadata
Title: Speed Science
Title of Series: RailsConf 2015
Part Number: 10
Number of Parts: 94
License: CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/30706 (DOI)
RailsConf 2015, Part 10 / 94
Transcript: English (auto-generated)
00:01
I hate to disappoint everyone: the title of this talk is Speed Science, but we will not actually be talking about methamphetamine production. We will, however, talk about the science behind speed.
00:25
So instead, we're going to be talking a lot about goals. Sorry about that. Our main goal today is going to be going fast, getting faster, and achieving more.
00:42
So one of the things that I like to think about whenever I'm dealing with any kind of performance problem is that slowness is a bug. This is really comforting for me because mostly in my application I deal with bugs every day, every hour, every minute, a lot. And the great thing about bugs that I really love is if you have a bug, you can reproduce it, and if you can reproduce it, then you can squash it.
01:06
So a little bit of audience participation here. Does anybody know what this shape is? Okay, so I heard triangle, any other? Very specific.
01:23
So I hate to tell you, but this is actually a square. This is a speed square, it's used in woodworking, it's used for making fast angles, here you can see making fast angles, and square edges. So from now on for the rest of this talk, this will be a square.
01:40
You've been informed. So programming has its own speed square. It's slightly different, and it works kind of like this: we've got I/O, CPU, and RAM, and these are all resources that we can trade for speed. Similarly, we can trade one resource for another if we're overbooked on one. So, okay, how do we make things fast in general?
02:02
Well, first we're going to find a bottleneck. On a side note, I highly, highly recommend, if you don't have an industrial engineering background, or even if you do, checking out this book called The Goal, by an author whose name I can't pronounce. So just Google "The Goal" and the book, and that's all I'm going to say about that.
02:21
Once we've found a bottleneck, or even to find a bottleneck, we can use the scientific method. It's going to look a little bit something like this; we're going to touch on it later. So, I kind of started off relatively quickly. For those of you who don't know me, my name is Richard Schneeman, or Schneems. I really, really, really love Ruby.
02:40
And as KiwiHarmon once famously said, if you love something, why don't you marry it? And so I'd like to introduce you to my wife, Ruby. She is actually a Python programmer. It's like a house divided.
03:01
And we're actually really excited to announce that we're having a child. My baby's coming in June, and we've been talking a lot about names. You know, is it going to be a boy, is it going to be a girl? And I was like, alright, if it's a girl, we have to name her Perl.
03:27
And then we were like, well, no matter what, the middle name should be something we can both agree on. And we decided on Sudo. And we'd be like, Sudo Schneeman.
03:42
Sudo Schneeman. Alright, so I work for Heroku on the Ruby buildpack, among some other things. And I'm not going to have time for questions at the end of this, but we will have a booth that's going to be open tomorrow. Please come by the booth; I will be there.
04:00
Also, Koichi Sasada has a talk tomorrow, please go to that. It's going to be really good. Also, Terence Lee has a talk, so please check that out. Also, I'll be doing a book signing. Come by, I wrote a Ruby book, and I'll be giving out free copies. If you come by the booth, you can get some more information. I also have Rails commit, and I run a really small conference down in Austin called Keep Ruby Weird.
04:25
Not by myself, though: with Caleb Thompson, who is sitting in the front row wearing a Keep Ruby Weird shirt right there. Okay, so I'd like to start out with a story, if I may.
04:42
Once upon a time, there was an application, and I was working on this application. It had some requirements, it did some things. It downloaded a really large zip file, it then decompressed the file, and we did some work on that file.
05:00
And that was pretty much it, that's the gist, that's really all you need to know. Was this thing fast? Does it sound fast? It does not sound fast. Taking a look at our speed square, we can kind of look at those individual things and say, okay, what resources are we using? It's like, okay, we're using IO, we're using IO and CPU.
05:21
We're also using RAM and CPU for the processing. So, first of all, I had a hypothesis: okay, this thing isn't fast, let's use some more CPU. Heroku has this really nice thing, this really neat feature that we launched, I have no idea when, but relatively recently, called the PX dyno.
05:41
It's got 60 gigabytes of RAM, and one PX dyno is actually an entire AWS instance, which is pretty cool. And the benefit of this is you get a lot, there's no sharing on it. We're not going to subdivide up that AWS instance into other smaller containers, so you get less noisy neighbors. Once we have this, I've now got a bunch more resources at my disposal,
06:04
but now I need to take advantage of them. To do that, we can use more threads, and more threads is going to mean more CPU load. Has anybody ever used Sidekiq? There's going to be a bunch of rhetorical questions. A lot of hand raising.
06:21
Get that blood flowing. Okay, cool. Well, great, so you're familiar with Sidekiq background workers. The first thing I did with this, I was like, oh, I have this huge box, I'm going to set Sidekiq concurrency to something huge, like 30. And at the same time, we're going to be downloading, unzipping, and processing with 30 threads, kind of all at the same time.
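For reference, that concurrency knob is Sidekiq's standard setting; a sketch of the two usual ways to set it:

```shell
# Raise Sidekiq's worker-thread count via its CLI flag...
bundle exec sidekiq -c 30
# ...or with `:concurrency: 30` in config/sidekiq.yml.
```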
06:43
And when we did this, well, so it was definitely faster than just sequentially doing it, but it was still pretty slow. Why is that? So my next theory was that we still had unused CPU. I was looking at some metrics, and like CPU, we still had plenty of it.
07:05
And I was like, okay, great, let's change this from 30 up to 60. Anybody know what happened? Okay, well, yeah, it got slower.
07:24
When this happened, I was like, come on, you've got to be kidding me. So it turns out that we weren't CPU bound, so we can strike that off. We go back to our little diagram, and we say, oh, what else do we have a lot of? What else are we doing a lot of? And it turns out that we have a lot of IO in this process.
07:41
So there are different types of IO. On the first one, we have network IO, so we're actually downloading the file. We're hitting, we're using the network adapter. On the second one, we're only using the disk. So my first theory was like, hey, networks are inherently slow, so that's got to be the problem. To test this out, I tried a bunch of different solutions.
08:02
I tried a bunch of different libraries: curl, Curb, shelling out, Typhoeus. And then I came to try the thing that all Rubyists advocate when they're talking about making things fast. It's not caching; what everybody really advocates is using Go.
08:23
That is the way to make Ruby faster, according to 90% of people on Twitter. By the way, 70% of statistics in this talk are going to be made up on the spot, including this one. Okay, so one of my co-workers has a really great utility called htcat.
08:42
And this is written in Go, and it will download and stream large GET requests in parallel. It's really cool. There are other libraries like it. But one of the neat things about this one is it actually starts piping. It will start streaming to the pipe, similar to how if you cat a file, you can pipe that to something else.
09:01
And it will start doing that as soon as it can. Unfortunately for us, and maybe for a lot of people on Twitter, Go is not the answer, and network IO is not the problem. So that kind of leaves us with really one thing, and that's disk IO. In this case, it was pretty simple to test that out.
09:22
So I had a Sidekiq concurrency of 30, and I changed that to 4. And then everything was screaming fast, like really fast. Previously, we were not able to keep up with the queue, and then all of a sudden, it's like we've got spare cycles.
09:41
And what actually happened was, our hard drive, the disk, was at max read-write capacity. We were downloading that file and putting it onto the disk, and we were unzipping it, and basically copying from one area of the disk to an even larger area of the disk. And we had all these threads just kind of sitting there, idle, as if you have a dinner party of 30, and you only have four forks,
10:04
and everybody's just grabbing that fork and trying to eat as fast as they can, but nobody can get anything done because there's a limiting resource. So in the end, we actually ended up using 2X dynos, and the moral of this story is that you can blindly try things all at once, but it's not very scientific.
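The dinner-party effect is easy to reproduce with a toy script (my own sketch, not code from the talk): a fixed queue of jobs all contending for one shared resource finishes in roughly the same wall time with 30 threads as with 4.

```ruby
require "benchmark"

DISK = Mutex.new # stand-in for the single saturated resource (the disk)

# Work through 120 queued "I/O" jobs with a given number of threads.
def process_queue(thread_count, jobs: 120)
  queue = Queue.new
  jobs.times { |i| queue << i }
  queue.close
  Benchmark.realtime do
    Array.new(thread_count) do
      Thread.new do
        until queue.pop.nil?
          DISK.synchronize { sleep 0.001 } # each job must hold the "disk"
        end
      end
    end.each(&:join)
  end
end

four   = process_queue(4)
thirty = process_queue(30)
# Both runs take about the same wall time: once the shared resource is
# saturated, the extra 26 threads just sit there waiting for a fork.
```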
10:24
It will work eventually, but maybe there's a little bit of a better way. So, going back, I had a slide of the scientific method, and we could take a look and see, here are the things that we kind of did.
10:42
You've heard me use the words theory and hypothesis. I tried something, we observed the result, and then we repeated the whole thing over and over again until we got the results that we wanted. Notice that we are sort of missing out on the research step, which is pretty critical.
11:00
So one thing that is very, very important in all of science is that it's repeatable. Does anybody know who these two people are? Fleischmann and Pons, wow, that's great. I was not expecting anybody to get that. Okay, so this is Fleischmann and Pons, and this is not cold fusion.
11:26
So fusion is the thing that powers the sun, and the idea of cold fusion is that you could have that at temperatures far less than the sun, which would be great, and it wouldn't require lots of energy. And these people claimed and actually published and said, hey, we've done it, we've done it.
11:44
And unfortunately, it wasn't repeatable, and lots and lots of people tried it. And it is good in the sense that it kind of showed that this whole peer review thing can work and does work, but they are, unfortunately, the poster children of making these kinds of hyperbolic scientific claims
12:04
without actually using science. So they did not use good science. We, on the other hand, are going to use good science. The best science. We are going to measure and benchmark. So Heroku did add some metrics.
12:21
This is an older view of what our dashboard looks like. We explicitly call out speed, throughput, and then CPU and RAM. And the interesting thing for me is you can actually correlate and be like, oh, hey, you know, CPU used to shot up here, and then like throughput, you know, took a nosedive and whatnot.
12:40
And this will tell you why you're slow or what exactly is going on, but it will give you a high-level metric that you can use to then ask a question and get a low-level metric. So that's kind of our basic research. Again, we want to reproduce the slow. So RAM is the one thing that we haven't really touched upon.
13:04
Ruby is typically going to be RAM-bound. And this is because Ruby is a managed language. Some of you might already know this; just bear with me for a little bit, because this is crucial to understanding the rest of my talk.
13:22
So Ruby uses a garbage collector. And if you have questions about the garbage collector, Koichi is here. He's great. Heroku hires Koichi, Nobu, and Matz to work on MRI full-time, which is pretty cool. So the garbage collector is going to allow us to do things like just say,
13:42
here's a string literal, I want a string, without having to say, all right, we're going to allocate, you know, five bytes for this string. A lot of people aren't really familiar, intimately familiar with different ways that Ruby can consume more memory. So one of the most obvious and what people think of when they think of memory use in Ruby is retained objects.
14:05
So here we've got a loop, and we're just looping around. We've got this constant. Constants are never garbage collected; they're global, and you can't collect something that's global. And we're just adding a bunch of strings to it.
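The retained-object loop he's describing looks roughly like this (a sketch; the 100-million count and the memory figure are from the talk, scaled down here so it stays runnable):

```ruby
# Sketch of the retained-objects example: a constant is a GC root, so
# every string pushed onto it stays live for the life of the process.
RETAINED = []

100_000.times do |i| # the talk uses 100,000,000, which needs gigabytes of RAM
  RETAINED << "#{i}"
end

# All of those strings now occupy heap slots that can never be collected.
```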
14:20
And when we do this and we take a look at the process memory, it's 7.32 gigabytes, which is kind of large. That's 100 million strings. So why is it so large? So Ruby allocates chunks of memory, and then in those chunks of memory,
14:44
it has slots where it can put objects. So as we are going through and we're looping, it is going to put one, two, three, four, five, six, seven into different slots. It needs one slot for each of those. Whenever we're out of slots, we need to allocate more slots.
15:02
So then now we have twice the number. You can think of this, I've chosen the bus analogy, and kind of a shipping container, really anything you can put things in. And so now we can actually go all the way up to 14, but we have to go up to 100 million. So we just keep on allocating more and more and more until we get enough.
15:22
So this, to me anyway, makes a lot of sense: we say, okay, Ruby's using a large amount of memory, and here's where it's coming from. We need this; it is a retained object. So the second way, which is a lot less intuitive, at least to me, is allocated objects. Here we're doing the exact same thing, except instead of storing it in an array,
15:43
we're just going to puts the string. We use it and throw it away. This is only going to take 21 megabytes, which makes sense. We're not actually keeping anything. We fill up our bus. We are going to run garbage collection,
16:01
and our garbage collector is going to say, hey, it doesn't look like you actually need any of these objects. Two, three, four, five, six, seven, guess what? You're not going to use them again. So we can get rid of them, and we can put other objects there. And this is a very oversimplified version of garbage collection, but it will help us later. So it looks like garbage collection is going to save us from everything forever,
16:24
and it's the perfect answer to all things. If that were true, I wouldn't be up here. So here's another example. It's an in-between. What we're going to be doing is we're going to be looping through 100 million elements again. We're going to be adding them to an array.
16:42
The interesting thing is we're going to not return that array. We're not going to do anything with it. So it's not a constant. Nothing has a reference to it. And then we can call garbage collection on it. We can force a garbage collection. So what happens when we run this? Well, your application is still going to use 7.32 GB of RAM.
17:04
Which is kind of surprising, right? It's like, well, couldn't the garbage collector just get rid of all that stuff? It just throws it away. Well, to understand why, or I guess to verify this and prove this, you can take a look at GC.stat. Maybe we can look at total_freed_objects.
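You can watch the collector do its job with GC.stat; here's a sketch of that check, scaled down to one million objects (actual memory behavior will vary by Ruby version and machine):

```ruby
freed_before = GC.stat(:total_freed_objects)

array = []
1_000_000.times { |i| array << "#{i}" }
array = nil # drop the only reference to all those strings
GC.start    # force a full collection

freed = GC.stat(:total_freed_objects) - freed_before
# `freed` is now roughly a million: the GC did its job, even though the
# process's memory footprint (RSS) stays just as large as its peak.
```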
17:22
And we can verify and say, okay, the garbage collector is doing its job. It did free 100 million objects here. So why is it still so large? Why is our memory so large? So we can't clear a slot while a reference to that item still exists. So what's happening is we've filled up our buses.
17:43
And after we've done this, we've gotten to 14 and we need to keep going. We're going to run garbage collection. And after the entire program has finished executing, yes, we can clear those slots. But Ruby never actually releases memory.
18:01
Ruby memory use only ever goes up. We basically assume that once we've allocated memory, we're going to need it again in the future. Also, freeing memory back to the operating system is an expensive operation. So even though we don't need those objects anymore, we still hold on to that memory.
18:25
So it's important not necessarily that it's retained forever, but just that it's retained for a while. That array actually had a reference to all those things, and while the array was in memory, we couldn't clear them. So I do highly, highly recommend that you test everything.
18:41
Even these slides. As I was writing these, I was double-checking everything, and I realized there was a giant mistake in them. So please, if you're interested in getting started with performance stuff, double-check me. Verify what I've said; that's good science. Peer review. So the experiment is super, super critical in benchmarking performance.
19:07
Never go in and be like, I think I know why something is slow. Please always double check. Okay, back to speed. How does this actually affect your Rails app? So you can kind of think of your Rails app
19:20
as a collection of retained memory. These are going to be things that will live forever, or forever-ish, anyway. Such as a database connection, or your high-level Rails app, your controllers, any constants, any code that you've loaded. So that's going to have a base set of memory.
19:41
And on top of that, this allocated memory is going to spike up and down, maybe larger, maybe smaller, sort of like a level on a soundboard. And our total memory usage is going to be the combination of both of those. To see why this really affects you, let's say we've got a bunch of memory
20:02
and we're in a Rails app and our request starts. So we're processing, we're hitting the database, we're processing, we're loading up templates, and we're processing and doing some other stuff, Turbolinks, Sprockets, processing. And so we're creating these objects,
20:20
and while this request is still going, we still need them, so we can't clear them. So now we're out of memory, all of our slots are full. What do we do? Okay, well, Ruby can run the garbage collector, but we still need references to all of these objects. So we have to allocate more memory. So we add an extra bus
20:41
onto our memory, and then finally we're done. We're done with the request, we can deliver the response, and we no longer need a lot of those objects. Granted, this is definitely an oversimplification, but it helps. Eventually, we can go back and we can say,
21:01
all right, we're going to run garbage collection, and we don't need those objects, so we can just get rid of them. And we do, we get rid of them, but you notice that our bus count is still high. So, Ruby is going to increase memory use even without retention.
21:21
Even if you're not constantly retaining objects, even if that retention number stays the same, if you have one request that loads up millions and millions of objects, and only one person ever hits that request, once a day, guess what? Your app is going to be pegged at that memory usage, and you're charged for that memory usage
21:42
by the operating system, until that process dies. So object creation takes up RAM, and it takes up time. I do also recommend checking out generational GC, which I don't have time to cover here. So, okay. That was all relatively critical
22:02
for a customer story that I have. One day, a ticket came into Heroku. I mentioned I work on the Ruby buildpack. It's this open source thing. Whenever you run git push heroku master and all that stuff flies by the screen, that's actually me sitting at a computer.
22:21
Like, furiously typing. So if anybody tries it right now, it won't work. Sorry. That's not entirely true. Terrence helps out. Quite a bit. So one of the things that we do is make recommendations.
22:41
We say, use this web server, or do this other thing. In order to do that, we have to provide configuration. And some of the things that we care about are some of the things that our customers care about, which makes sense. So one day, a ticket came in at regular support, and the customer said, you know, my app is slow. And the supporter looked at it, and they're like,
23:00
well, you know, I don't know. Maybe it looks like a Ruby thing. Like, we'll escalate it up there. And so I get to look at escalated tickets. And they said, the app is slow. I checked metrics. It looks something kind of like this. And you can see that the RAM they were using is just, like, way too high. So they're using swap. That's this red bar.
23:21
And that's actually indicating that we've used up all of the available RAM, and we're starting to use the disk, which is really, really slow. Like, really, really, really slow. So this is what I was thinking, is actually making their application slow. The next question, of course, is gonna be why.
23:40
And obviously, I asked the customer, and if you've ever worked in support, you've gotten answers like this: it didn't used to happen, and we didn't change anything, it's totally new. And they're like, we haven't deployed anything in, like, a month. Whenever these types of tickets happen
24:00
at Heroku, a lot of times what people don't realize is that their Gemfile or their Gemfile.lock changed: one of their devs ran bundle update something, and a minor or major version bump had some bugs in it, and they didn't realize that that was an issue or even think to look into it. So my number one hypothesis
24:22
was that RAM was increasing due to misbehaving code in a gem somewhere. We don't know which. To test this out, I wanted to be able to boot the app, hit it with requests, and profile the memory. In order to do this, I wrote
24:40
a gem called derailed_benchmarks. It allows you to actually do those things without having to run your server somewhere. You can do them in a single process, and it uses a Rack mock request, and then we can do really neat things like wrap it in other tools like stackprof and memory_profiler.
25:00
So, derailed_benchmarks is going to boot the app, it's going to process the request, and then, conveniently enough, we're going to use memory_profiler by Sam Saffron, who is at this conference. Please go give him a high five and be like, your memory work is awesome. Also, Koichi Sasada has
25:20
another gem called allocation_tracer, which helps as well. So, with derailed_benchmarks, you can run perf:mem, and I have an application called codetriage.com. It helps people get started with open source. It sends you one GitHub issue per day, and this is just something I run so that
25:41
I can have something in production on the platform that I care about, and I get a real customer experience. So, for those of you who aren't paying attention, that was codetriage.com. codetriage.com codetriage.com sign up today.
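For reference, the derailed_benchmarks invocations look something like this (task names have shifted between versions, so treat these as approximate):

```shell
gem install derailed_benchmarks

# Object allocations per request, via memory_profiler:
bundle exec derailed exec perf:objects

# Memory used by each gem at require time:
bundle exec derailed bundle:mem
```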
26:02
And so when I ran this, I got something that looked like this. On every single request, here are all of the object counts of what we're using. And Rack, okay, yeah, using a bunch of Rack; Action Pack, that makes sense; Active Support; and then, whoa, Hashie. Where did Hashie come from? If you know me, you know I have certain feelings about Hashie,
26:20
certain blog posts about Hashie. And it was really remarkable to me: everybody gives Active Support a hard time, and Hashie was using more objects than Active Support. So, Hashie is incredibly expensive, and in our case it was creating lots of unneeded objects. And I said,
26:40
okay, I didn't put Hashie in my project, I'm not using it. I looked at my Gemfile.lock, and apparently it turns out that OmniAuth uses Hashie, which actually makes sense: Hashie has the same author as OmniAuth, so of course they use the same tooling. Originally I wanted to remove Hashie from OmniAuth, and the author of OmniAuth was like, that might be a little
27:02
heavy-handed, so I basically found some hotspots. And this is, we were not even auth-ing; this was on every single request to any action in your entire application. Yeah, so I fixed it by just memoizing a couple of items, and instead of having to
27:20
recreate the object, we just used the same one. So basically we are using a bit more memory by retaining that object, but ultimately creating fewer objects, which means less memory overall. So I did this, sent it over to the customer to try it out, and the application
27:40
is still over the RAM limit. So hey, hypothesis number two: we're still suspicious of the Gemfile, maybe it's something else, maybe it's a bad gem using too much RAM at require time. Okay, how can we test that out? I had the customer send me the Gemfile and the Gemfile
28:00
lock, and I was thinking, well hey, you know, how can we figure out which of these gems it is? If a gem just loads and retains a ton of code, how can we know about that? Maybe it's not something that happens on every request. At the time there was no tooling for benchmarking require-time memory use, and so
28:22
let's write one. Alright, the general concept is pretty simple: we're going to measure RAM before and after require is called, and we're going to be using a library called get_process_mem. Now, one thing to note if you're, like, a super memory geek is that it uses RSS,
28:42
RSS stands for resident set size, and this does not take into account shared memory use. It is a close-ish approximation of memory, and for our case it's probably good enough. But yeah, it doesn't take into account shared memory. The next thing we're going to do is
29:01
we're going to monkey patch Kernel#require, because we can, it's awesome. Great. Yeah, monkey patch because you can. Okay, and then we're going to run it, and we're going to get an output like this. It's going to spit out every single file and the associated cost, so here it looks like requiring active_record costs
29:22
1.49 megabytes, which is a little large, but I don't think that's where our problem is. Now that we know where all of our memory is, we can sort this, and it looks like mail is using about 40 megabytes of memory, which is slightly larger than active_record's 1.5. When we count it all up, one gem
29:42
accounted for 65% of all of the RAM at boot time. And we haven't even done any work. We haven't hit a thousand requests or anything. Remember that slide I had with the retained memory on the bottom? This is the starting memory that you have to deal with, and anything you have after that is just going to go up.
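The measure-RAM-around-require idea can be sketched with nothing but the standard library. The talk's tooling grew into the derailed_benchmarks gem and uses get_process_mem; this reconstruction reads RSS directly instead:

```ruby
# Records the RSS (resident set size) delta caused by each require.
REQUIRE_COSTS = {}

# RSS in megabytes: read /proc on Linux, fall back to `ps` elsewhere.
def process_rss_mb
  kb = File.read("/proc/self/status")[/VmRSS:\s*(\d+)/, 1].to_i
  kb / 1024.0
rescue Errno::ENOENT
  `ps -o rss= -p #{Process.pid}`.to_i / 1024.0
end

module Kernel
  alias_method :require_without_profiling, :require

  def require(file)
    before = process_rss_mb
    require_without_profiling(file)
  ensure
    # ensure: record even if the require raises (e.g. LoadError).
    REQUIRE_COSTS[file] = process_rss_mb - before
  end
end
```

After boot you can sort `REQUIRE_COSTS` by value to find the heaviest files. Note the numbers are approximate, and a nested require's cost is folded into its parent's delta, which is exactly why the talk moves to a tree next.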
30:00
So in order to debug further, we're going to have to dig deeper. Whenever you require one gem, it requires other files, and we need to see what exactly inside of mail is causing that problem, so we're going to use a tree in order to do that. Each node can have many children, and each child has a parent.
30:21
It's going to end up looking kind of something like this. Where mail will load one layer, and that will load the next layer, and that will load the next layer. We have a tree class that we're going to store these things in, and then when we're done, we can sort the children and print them recursively. It's something that looks kind of
30:41
like this. The final process looks like this. We're going to instantiate a new tree, a new tree node, and this might be mail. So we've already required the application, and now we're going to be requiring the mail gem. We are using a stack in order to keep track of where we are, so the last
31:01
thing in the stack would be the application, and that is going to be the parent, and we're going to push the child node, which is mail, onto the parent, so we're saying that mail belongs to the application. Finally, we're going to add the mail gem and say, okay, now we're going to require everything inside of the mail gem, so we need to push
31:20
that onto the stack. We take a memory measurement, we measure it before, we call require, and this is where the recursion actually happens: this same code will be called again, and again, and again, and again, until all of mail has been fully required. When it's done, we pop the last thing off of the stack, which at this
31:40
point in time would be mail, because everything else has been popped off of the stack, and we take a memory measurement again. Finally, we record the cost and store that. So, one thing if you're interested in doing this yourself: it is really important to note the ensure block.
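Put together, the tree-plus-stack bookkeeping described above looks roughly like this. It is a simplified, stdlib-only reconstruction; the real implementation lives in the derailed_benchmarks gem:

```ruby
# Each require becomes a node; its children are the files it requires.
class RequireTree
  attr_reader :name, :children
  attr_accessor :cost

  def initialize(name)
    @name, @children, @cost = name, [], 0.0
  end

  def <<(child)
    children << child
  end

  # Recursively print the tree, most expensive children first.
  def print_sorted(indent = 0)
    puts "#{'  ' * indent}#{name}: #{cost.round(2)} MiB"
    children.sort_by { |c| -c.cost }.each { |c| c.print_sorted(indent + 1) }
  end
end

# RSS in megabytes: read /proc on Linux, fall back to `ps` elsewhere.
def tree_rss_mb
  kb = File.read("/proc/self/status")[/VmRSS:\s*(\d+)/, 1].to_i
  kb / 1024.0
rescue Errno::ENOENT
  `ps -o rss= -p #{Process.pid}`.to_i / 1024.0
end

# The stack tracks where we are: the last node is the current parent.
REQUIRE_TREE_STACK = [RequireTree.new("application")]

module Kernel
  alias_method :require_without_tree, :require

  def require(file)
    node = RequireTree.new(file)
    REQUIRE_TREE_STACK.last << node # this file belongs to whoever required it
    REQUIRE_TREE_STACK.push(node)   # nested requires become its children
    before = tree_rss_mb
    require_without_tree(file)
  ensure
    # ensure: pop and record even on LoadError, or the stack desyncs.
    REQUIRE_TREE_STACK.pop
    node.cost = tree_rss_mb - before
  end
end
```

When everything has been required, `REQUIRE_TREE_STACK.first.print_sorted` prints the tree with the most expensive subtrees first.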
32:01
One thing I didn't realize, well, I realized but didn't fully appreciate, was that a lot of people actually require files that don't exist. They expect that to raise a LoadError, and if we're not ensuring that we pop something off the stack and record the memory, then in the event
32:20
that we try to require something that's bad, this code just won't run. It'll be bad. Okay, the final result of all of this looks something like this, where we can see the tree structure: the application requires mail, mail requires mail/parsers, and we see, oh, it looks like mail/parsers takes up
32:42
nearly 20 megabytes, 19.24 megabytes. I opened up an issue. I raised awareness. One thing I'd like to point out is that you don't necessarily always have to fix a bottleneck or performance bug yourself whenever you find one; even just pointing out, hey, there is one, raising visibility, sometimes can be just as important, or
33:02
is just as important. And I didn't fix this, but somebody else did, in mikel/mail #817. So if you're not familiar, inside of mail there's a parser that actually parses mail. Whenever you reply back to GitHub and you say, hey, this
33:22
issue looks great, that gets parsed, probably by Ruby code, and they pull out what you said and drop the rest of the text, the signature and all that other stuff, and actually generate a new comment. So some applications do actually need a parser in there.
33:40
And mail switched over from using a Treetop parser to a Ragel parser, which is much more efficient, much faster, but with the tradeoff of more memory at boot time. But most applications don't use this, which is why our customer wasn't using it; most applications
34:00
are not parsing mail, so they don't even need it. So the fix was actually to just lazy load that, or if somebody really wants to load it up front and take advantage of copy-on-write optimizations, they can require it before their application forks. So, okay, what happened after we did this?
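The lazy-load pattern behind that fix can be illustrated generically. This is not the mail gem's actual code, just the shape of the idea, using the stdlib csv library as the stand-in "expensive" dependency:

```ruby
module Report
  # Before the fix, you'd `require "csv"` at the top of the file, and
  # every app would pay its memory cost at boot. Deferring the require
  # into the method means only apps that actually export pay for it.
  def self.to_csv(rows)
    require "csv" # a no-op after the first call; Ruby tracks loaded files
    CSV.generate do |csv|
      rows.each { |row| csv << row }
    end
  end
end
```

If an app does want the cost up front, for instance to share that memory across forked workers via copy-on-write, it can still `require "csv"` explicitly before forking, and the method-level require becomes a no-op.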
34:22
Well, the customer's RAM usage dropped dramatically. It was no longer swapping, and instead of just kind of crawling along, they were now fast, like screaming fast, and they were giving me high fives over the internet, and they were like, great job, and I was feeling really good about myself.
34:43
You don't have to care about that, it's okay. I work remotely, so I work at home alone, so that was really like me giving myself high fives at my desk. So, okay, some of the takeaways is we want to remember to
35:01
reproduce the slowness bug. These are bugs, they're problems. In this case it was fixed with code, or sometimes it can be fixed with a different version. We want to definitely make sure to get visibility, benchmark, and use these methods and this
35:21
process of repeatedly asking questions, digging deeper, and gathering more and more metrics. Kind of real quick: has anybody here ever made a bad code deploy on a Friday?
35:43
So, this will happen, and then you're like, oh, I'm going to deploy a hotfix, and you're like, oh, I totally fixed it, and it's like, whoa, and then your weekend is just gone, and your co-worker's weekend is just gone, and you have a lot of support tickets,
36:02
and they're just like, oh, why did she deploy that? So, this happens to everyone; whether you raised your hand or not, it's kind of an eventuality that it will happen. So, I have a modest proposal:
36:20
don't deploy on Fridays. Now, instead we can go shopping or ride roller coasters.
36:43
Okay, so that's probably not going to fly. Instead of just not doing any work, I'm proposing something I'm calling Funday Friday. If your application is slow, maybe instead of complaining about it, like, oh, why is this thing so slow, you could look into it on
37:01
Funday Friday. You could run some benchmarks, you could maybe implement some of this, you could take a look at the slides. If you're constantly complaining you can't find files, you can't find methods, and the code base is a mess: Funday Friday. Or if you're constantly having to stop, like, this feature has to be delivered tomorrow,
37:21
and they come over to your desk, and you have like a thousand files open and like ten branches checked out, and they're just like, what are you doing, and you're like, oh, I couldn't stand it anymore, I just had to refactor right now. Instead of doing that: Funday Friday. A lot of times I hear people wish more companies sponsored
37:41
open source, and guess what? Your company just sponsored you to work on open source on Funday Friday. If you're not sold, it's okay. This is actually not just something I'm randomly recommending. It will give you more accurate deadlines; it's a well-proven technique called time boxing.
38:01
It will hopefully lead to less burnout, and if your boss is ever like, man, I really want to hire more senior developers, and I don't know a boss who doesn't say that these days, you can be like, oh, you mean like developers that do refactorings and speed improvements and contribute to open source and know the libraries really, really well?
38:20
It's like, maybe you can have us do Funday Friday and learn all of those things instead. So, it's not just as simple as doing this; unfortunately, it never is. I hate to tell you, but if you're going to participate in Funday Friday, you have to report your progress to your team.
38:40
You have to let other people know you're doing this. Report it to your team, report it to your boss. You're going to be amazed. I actually kind of started doing this without really telling anybody, and then just after the fact, I would be like, hey, I made this patch to Rack or to Rails, or look at this thing. And now, whenever somebody in the company is like, hey, I've got a question about Rails, they come to me. My boss is like,
39:02
hey, there's this critical thing that's failing for our customers, why don't you take a look at that? And I'm like, wow, this is amazing. It's not even Funday Friday; it's, like, Funday Tuesday, and I get to work on open source. I am the luckiest guy in the world. So, in addition to reporting to your boss, please report it to Twitter,
39:21
hashtag #FundayFriday. There are a lot of other people also randomly tweeting Funday Friday, so it's okay. If you mention me, I'll retweet it. And hopefully, if you see somebody doing this, give them love, even if they're like, hey, I spent eight hours
39:42
looking at benchmarks and got nowhere, that's progress. Or maybe they're like, I signed up for CodeTriage.com, CodeTriage.com, CodeTriage.com. That is awesome. So please, please give love, give love.
40:01
We're a little excited. We've had a lot of fun today. Thank you very much for coming. I want to leave off with two relatively serious questions, and this is silent meditation. Don't scream out any answers or raise your hand
40:21
or anything. So, thank you very much.