Pointers for Eliminating Heaps of Memory
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 66 | |
Author | ||
Contributors | ||
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/46605 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
Ruby Conference 201831 / 66
5
10
13
14
17
18
21
22
26
29
37
45
46
48
50
51
53
54
55
59
60
61
63
65
00:00
Pointer (computer programming)Semiconductor memoryVideoconferencingStandard deviationLibrary (computing)MereologyMusical ensemblePointer (computer programming)Streaming mediaCodeBitLibrary (computing)Configuration spaceGame theoryCASE <Informatik>Semiconductor memoryMemory managementRule of inferenceStandard deviationTrailMathematicsInternetworkingImplementationJSONXMLComputer animation
02:12
Different (Kate Ryan album)YouTubeFacebookComputer-assisted translation
02:41
Point (geometry)Power (physics)Semiconductor memoryPatch (Unix)Cache (computing)Directed setCodeStack (abstract data type)Principal ideal domainCore dumpBlogContent (media)Thread (computing)FreewareTotal S.A.TorusSystem callResource allocationReading (process)Mathematical optimizationString (computer science)InformationComputer programmingString (computer science)Address spacePhysical systemFreewareResource allocationPatch (Unix)2 (number)Semiconductor memoryMathematical analysisCartesian coordinate systemMultiplication signLine (geometry)BitComputer fontType theoryMathematical optimizationWritingRadical (chemistry)Storage area networkVideo gamePoint (geometry)Process (computing)Content (media)Order (biology)EmailObject (grammar)Regular graphComputer-assisted translationComputer fileCodePie chartWindowMemory managementBranch (computer science)Hard disk driveSpeicherbereinigungLetterpress printingData loggerPrincipal ideal domainSequenceShared memoryDirection (geometry)Buffer solutionSystem callStack (abstract data type)Programming languagePower (physics)Functional (mathematics)RandomizationVariable (mathematics)Integrated development environmentObject-oriented programmingSpacetimeComputer animation
11:39
Rule of inferenceString (computer science)Computer fileArray data structureCache (computing)CodeData structurePrice indexLoop (music)Object-oriented programmingComputer multitaskingExtension (kinesiology)AlgorithmHash functionImplementationMeasurementResource allocationTurbo-CodeLoginObject (grammar)String (computer science)Resource allocationComputer fileMultiplication signRule of inferenceSemiconductor memoryNumberGraph (mathematics)System callParameter (computer programming)Order (biology)Pointer (computer programming)Structural loadHash functionObject-oriented programmingQuicksortBitMathematical optimizationBuffer solutionCartesian coordinate systemCache (computing)Computer programmingTrailPoint (geometry)Shared memorySampling (statistics)Electric generatorMeasurementFunction (mathematics)Arithmetic meanData structureAlgorithmPatch (Unix)Functional (mathematics)Key (cryptography)Fitness functionArray data structureSearch algorithmFrequencyGaussian eliminationPoisson-KlammerRevision controlReading (process)Radical (chemistry)Right angleCASE <Informatik>
20:19
Closed setRevision controlNormal (geometry)Price indexData structureDirected setStack (abstract data type)CodePhase transitionProcess (computing)Source codeElectronic mailing listAbstract syntax treeLinker (computing)BuildingMathematical optimizationMountain passTranslation (relic)Object-oriented programmingIcosahedronCompilation albumComputer virusCodeElectronic mailing listLink (knot theory)Address spaceOperator (mathematics)MathematicsBytecodePhase transitionComputer programmingPoint (geometry)CASE <Informatik>IntegerGoodness of fitNumberProcess (computing)Stack (abstract data type)BitArray data structureWaveObject-oriented programmingSequenceMathematical optimizationVirtual machineDirection (geometry)Control flowInheritance (object-oriented programming)String (computer science)Semiconductor memoryLevel (video gaming)ResultantCompilation albumFunctional (mathematics)RecursionNetwork topologyObject (grammar)Product (business)Multiplication signPatch (Unix)Type theoryRun time (program lifecycle phase)Source codeBinary codeImplementation3 (number)ParsingSheaf (mathematics)Data structurePointer (computer programming)Abstract syntax treeHigh availabilityVirtualizationComputer animation
29:00
CodeComputer virusImplementationStack (abstract data type)Multiplication signObject-oriented programmingResource allocationSemiconductor memoryData storage deviceSequenceArray data structureInformationElement (mathematics)NumberGraph (mathematics)Channel capacityVirtual machineVirtual realityLoop (music)Letterpress printingParameter (computer programming)Object-oriented programmingSequenceString (computer science)BytecodeArray data structureObject (grammar)SpeicherbereinigungComputer programmingMultiplication signBitFunction (mathematics)Term (mathematics)Resource allocationParameter (computer programming)Point (geometry)1 (number)Reduction of orderSemiconductor memoryLine (geometry)Different (Kate Ryan album)Video gameCartesian coordinate systemCASE <Informatik>Mathematical optimizationAddress spaceIntegerStack (abstract data type)SpacetimePointer (computer programming)Channel capacityLetterpress printingElement (mathematics)CodeOperator (mathematics)NumberBootingResultantMemory managementComputer clusterRun time (program lifecycle phase)Software bugLevel (video gaming)Green's function
37:40
Semiconductor memoryProcess (computing)Structural loadCodeCompact spaceCodePixelPatch (Unix)Reduction of orderProcess (computing)Memory managementSemiconductor memoryResource allocationSpacetimeLevel (video gaming)XML
38:49
JSONXMLComputer animation
Transcript: English(auto-generated)
00:02
For my less embarrassing bio, does anyone else think it's inappropriate that we're
00:25
having a Ruby track inside the crystal ballroom? I'm very nervous. So this is a hamburger hat.
00:40
And for those of you watching the live stream, this is a hamburger hat. And those of you reading the captions, this is still a hamburger hat. So, my talk is pointers for eliminating heaps of memory. And yes, there will be C code in this talk. This is a pun.
01:00
In case there's any question about that. But first, first, first I want to talk a little bit about Ruby's standard library. Sorry, this is a good one. There is a library which you can install called standard.
01:20
It is Ruby's standard library. If you were at the game show yesterday, we were talking about the most popular gems. And I want to make this to be I want to make Ruby's standard library one of the most popular gems. Basically what this gem is, it is an opinionated implementation well opinionated configuration
01:41
of RuboCop. So you can't configure it at all. You can so you install it and use it. It's an anti bike shedding tool. You install it, use it like RuboCop, but you cannot change anything. That is it. You just go by those rules. Those are the correct rules. So go ahead and download Ruby's new standard library today.
02:06
My name is Aaron Patterson. You may know me on the Internet as tenderlove. If you don't recognize me in a burger hat, this is what I look like online. It's a little different. I have two cats. This is Gorbachev Puff Puff Thunder Horse.
02:21
The other cat is SeaTac Airport Facebook, YouTube, Snapchat, Instagram. We call her Choo Choo for short. But I have stickers of my cats, so if you want to come talk to me, please come say hello and I will give you a sticker. If you want a sticker. You don't have to take one. But I have them. Anyway, I work for a very small startup company called GitHub.
02:44
We were recently acquired by a very large company. So I had to change my desktop background to stay on brand. Anyway, GitHub is the very first legit company I have ever worked for.
03:01
I love using Git, but I will not force push it on you. Now, yeah. Now your pain feeds me. This acquisition by Microsoft has really given me a chance to branch out.
03:29
I'd say that it's opened a lot of doors for me. Did I say door? Door, I meant it's opened a lot of windows for me. Oh, thanks, Shane.
03:43
So one important lesson that I have learned, so you may have received some random airdrops of cats. If you recognize the cats in the slides, that's me, but I've learned one important lesson that is don't airdrop the people who are speaking, which is happening to me right now.
04:00
So anyway, I just barely got my yesterday, I just got my Microsoft email address, and I'm like really, really excited about it. So I want to share it all with you. This is my email in Microsoft. So please, like seriously, this is my email, please email me here, like send send me many emails, but much like my regular email accounts, this will go into an inbox that
04:22
I do not read. So but please email me here. I will check it when I figure out how to use it. And I hope that your emails will help to give me a better outlook on life. All right. So now let's get to the real technical content.
04:41
I hope that you will enjoy this talk. I'm going to make many PowerPoints. Actually, Comic Sans and Times New Roman are the only fonts I'm allowed to use now. All right, now, okay, let's get to the actual content here.
05:02
So I'm going to talk a bit about reducing memory usage in Ruby, specifically I'm going to talk about two optimizations that I wrote for Ruby 2.6, we're going to go through how two patches I wrote, and we're going to go through how I came up with those patches, what tools I used, and how you can look up, do this same type of thing.
05:22
So the first patch I'm going to talk about is a patch for the loaded features cache, and I'll talk about that later. The second patch I'll talk about is one I call direct ICIC marking, ICIC stands for instruction sequence. But before we can talk about reducing memory usage, we have to be able to find what
05:42
memory to reduce. So the first thing I want to talk about is finding actually finding memory usage in your application. Now, the patches that I wrote, well, Ruby is written in C, MRI is written in C, so a lot of the heap analysis that I do has to be with the C programming language, so
06:01
we're going to talk about techniques for finding memory usage in C programs. So there's two main techniques that I use, and the first one is a very bad technique, and it is called reading the code. This is an inefficient technique, and it does not work all the time. So it's a technique that I sometimes have to use.
06:22
The other technique that I really like to use is called malloc stack tracing, and I'll talk about how to use that in a minute, but with Ruby programs, we have to think about two different places where we allocate memory. One is in the garbage collector, and the other one is from malloc, so we allocate
06:41
Ruby objects using Ruby's garbage collector, and we allocate other stuff, for example strings or I don't know, other things using the system's malloc. We actually have a lot of tools for profiling objects that are allocated in the GC. For example, object space and the allocation tracer gem, you can use these too, I encourage
07:02
you to go look them up, but I'm not going to talk about them today, but these are very good for profiling Ruby object allocations, but they're not so good at profiling malloc allocations, so we're going to look at how to find malloc-based allocations, and the technique that I use is called malloc stack logging, and this is a tool that's available
07:25
on OS 10, but there's other tools available on Linux and other operating systems, but I don't know all of them, I will step away from this topic now. So malloc stack logging, the way you use it looks like this, you basically just set
07:42
an environment variable that says, hey, I want to log all malloc calls, and here what I'm doing is I'm starting up a Rails application, I print out the PID, oh, I have labels for these, so we enable the logger, print out the PID, clean up any garbage, and then just pause the process.
08:00
And the reason we have to pause the process is because OS 10 uses address randomisation, what that means is every time you run your programme, stuff gets allocated into different addresses, so in order to know what those addresses map to, we have to pause the programme and profile it while it's still running. So that gets you just pauses the programme, so we run this in one terminal, and then
08:23
in another terminal, we take the PID that was printed out and run malloc history with the PID, and write everything to a log file, and what this will do is it will log every time a malloc or a free was called, along with the stack trace for that malloc and free. Now, you need to have a lot of room available on your hard drive when you do this, because
08:43
if you have, this is just profiling a very basic Rails application, for a very basic Rails application, the log file is about 6.2 gigs, so you need to, like if you're going to do this on your actual app, you need to have some free space. Now, the file contents look a bit like this, where we have basically just one line per
09:03
alloc and one line per free, and you can see where in memory the allocation happened, what size it is, the stack trace, and for the free, we can see what address was freed and the stack trace for that particular free. Now given that information, we can reconcile a live memory in the system, so this is a
09:21
very simple program, you don't need to read it too closely, but it's a very simple program to reconcile allocs and frees, so first we know how much memory the program was using at any point in time before we paused it, and second we can know what's allocating the most memory, and we can also know exactly how much memory we have allocated at the time the program was paused.
09:41
So that's cool, but when you look at these stack traces, you will see that the very top thing is either malloc or free, and we don't really care about malloc or free necessarily, we care about who is calling malloc. Those are the places we want to target in our code. So if we adjust that previous program just a little bit, instead of logging, like taking
10:01
a look at what malloc we have, we take a look at the caller of malloc so we know where to look at our profile, and to give you an idea of what this will look like, here is a pie chart of the top allocators in Ruby when booting up a Rails application. These are the top callers of malloc. So, don't read this too closely, just know that you can do this.
10:23
So, after doing a process like this, I will combine this technique with the other technique, I'll combine these two techniques, instrumentation plus reading the code, I'll go look at those functions that were allocating a lot of memory, and then figure out how I can reduce the memory usage for that particular function.
10:43
So, with that, let's take a look at the loaded features cache. So this is the first patch that we introduced. The loaded features cache patch that I wrote takes advantage of a technique called the shared string optimization, and I'm gonna teach you about the shared string optimization, then we'll look at the patch.
11:02
So, essentially, Ruby objects, Ruby strings allow us to share the same underlying C buffer. Every Ruby string is backed by a char star buffer in C. Well, not every one of them, let's pretend that's true for now, mostly. In this example, we have three Ruby objects, and they're all being represented by the same underlying C string buffer.
11:25
So, we have X pointing at the head, A is a dupe that also points at the head, B is a substring that points at the next character all the way to the end. Now, we can only do this technique, we can only share strings like this if we're looking at some point in the middle of the string
11:41
all the way to the end of the string. So, for example, let's take a look at an example of where we can't share strings. Like, for example, this one. We have X equals some string, and then we take a substring from the beginning of the string to somewhere in the middle. We can't share those. The reason we can't share them is because Ruby,
12:00
the underlying C buffers need to be null terminated, so we have to share all the way to the end. So, these are not shared strings. So, our shared string rule is always copy to the end of the string if you can. That way we can represent more than one Ruby string while using the same underlying string buffer.
12:20
So, what does this have to do with loaded features, and what are loaded features? Loaded features is a global variable like this. It keeps track of every single file that has been required in your Ruby program. So, if you do require foo, and you look at loaded features before and after, you'll see that it's added a new string to that array.
12:41
And this array is used as sort of a database, so we know that every time we require the same file, like if we require the same file over and over and over again, it's only executed once. Ruby uses this loaded features array to determine whether or not a file has been required. Now, unfortunately, there's a little bit of a complication.
13:01
What is the same file? When I say the same file, what does that mean? So, here is a program, these all require the same file, but you'll notice that the parameters we pass to require are different on each call. So, this is an extreme example of this particular issue, but we expect that particular file
13:22
to only be executed once. Now, that means that Ruby has to look up, oh, have I required this before? Have I seen this? Have I seen this particular thing? And it'll have to search through that loaded features array in order to find it. But the problem is, array search is slow. We don't wanna search the array for every single time you call require.
13:41
Just think of all the files you require in a Rails application. It would take forever to start your application up. So, in order to speed this up, Ruby creates a cache of every possible parameter that could be passed to require. I'll give you an example of this. So, for example, let's say we require abc.rb.
14:01
Ruby will, when you require that file, it'll create a cache that looks like this. This is every single key that could be passed to require that may map to the same file, okay? That way, when require is called again, we can quickly look up, hey, have we seen this key before? If so, let's go check it out.
14:22
So, the actual cache structure looks like this. It's all of those keys mapping to the array index of the file that we actually required, okay? So, when we require a file, we can look that up, look at the array, see what actually got required, and say, yes, we have this already. Let's just skip it. So, the way that we generate this cache looks like this.
14:42
It's actually written in C. I have translated it to Ruby because this is a Ruby conference, not a C conference, and I think Ruby is a little bit easier to read. Don't read this closely, please. Please, we'll walk through this. So, let's say we do this require abc.rb, and we have a string that looks like this.
15:02
The way that the algorithm works is it puts two pointers at the end of this string buffer. Then it moves one pointer back to the first period. Then it moves the first pointer back to the first slash, and we take from the first pointer to the second pointer and the first pointer to the end of the buffer, and that's our first two cache keys.
15:23
Then it moves the pointer backwards again, taking from one pointer to the next and one pointer to the end, and we get our next cache keys, and we keep repeating this until we get to the very beginning of the string, and now we have our entire cache thing, which I skipped over, sorry about that. Let's see that again. Go.
15:41
Go, yes. Ah. Let's see if I can do this. Does anyone go to LA Fitness while you're here? I think they just call it fitness here. I wanted to pick up some LA gear,
16:01
but I think it's just clothes here. All right. So the, that, okay, so I'm distracting myself. So the problem with this cache key generation is that it takes substrings of the original string. So it uses this rbSubstring function call, which is the same function call we basically use
16:21
when we're doing square brackets. And when we do this, it actually takes copies of that string. These Ruby objects here at the bottom, four of those get to point at the original long buffer, but these other four at the bottom, they have to copy the string because they're not going all the way to the end. So our first optimization we can do
16:41
is we can reduce the number of mallocs we have here. We actually have to do four mallocs along the bottom there. So we have eight Ruby objects and four new malloc allocations. So we can reduce the mallocs by using this shared string optimization. And the way we do that, our first step is we take a copy of the string once. So we create a new copy of the string
17:00
that just goes all the way to the period. Now when we take substrings of these two, we're able to share these buffers between all eight objects. So we have four objects pointing at the top buffer and then four objects pointing at the bottom buffer. So now we're down to only two mallocs where we had five before, I guess. We can also eliminate Ruby objects.
17:21
Now this cache is only used in C. It's never exposed to you, the Ruby user. So you don't need to see these Ruby objects at all. Today, in Ruby 2.5, the cache looks like this. We have a hash that points at a bunch of Ruby objects. Those Ruby objects point to the underlying string buffers. But since we never expose this, you don't need to see these and we can just have the hash point
17:41
directly into the strings themselves like this. All right. So this way we can eliminate all of those Ruby object allocations. So now we're down to only two malloc calls. And I made a patch for this. This is the patch. It's a very, very simple patch.
18:00
It's this. Very simple, right? Just kidding, it was not simple. So we can measure the impact of this and we'll measure the impact of this by using allocation tracer. In this example, we just require a very long file and measure the number of objects allocated. So before, we expect to see a bunch of string allocations
18:22
and in the new version, we expect to see fewer string allocations. And in fact, when we output it, it looks like this. I have made a graph because this is not easy to read. But this is what the object allocations look like when we require a file. You can see that we have, the blue one is Ruby 2.5,
18:41
the green one is Ruby 2.6. And in this case, we were able to reduce the number of string allocations by about 50%. So, but that doesn't mean that it reduces your program's allocations by 50%. So let's see what it actually does, how it actually impacts your application. So this is a graph of malloc stack logging
19:01
for a Rails application. The x-axis is the sample number and the y-axis is the amount of memory that we allocated. So you can see, if it's closer to the origin, that's better. Before we, before this patch, we're up there in the upper right
19:20
and after we're a little bit down. And this is about a 4.2% savings in overall memory. Now the thing is, this absolutely depends on your application. If you require more files, you'll actually save more memory. So in other words, the more you load, the more you save.
19:42
How much would you pay for this feature? $100, $1,000, you can have it today when you upgrade to Ruby 2.6. Ah! So I can't tell you how much this will actually save for your particular application because it depends on how many files you require. But at work, we require a lot of files. So I expected this will have a good impact.
20:02
Now, after I got done writing this patch, as I said, this is a story. After I got done writing this patch, I was very, very proud of myself. I debugged the codes, I wrote the patch, I made sure that it worked, I had some nice graphs, I was really pleased with myself. So I submitted a bug, I was so happy.
20:22
Posted the patch here, and then I got a comment on it. And the comment was from this person named Funny Falcon. And he said, did you compare it to this other one? And I was like, hmm, I don't know about this other one. I will click on that link. And so I did, I clicked on a link, and it turns out that he had actually found
20:42
exactly the same problem that I was talking about, and submitted a patch as well. But unfortunately, he submitted his patch five years ago, and it was not applied. So I applied his patch. And, oh, thank you, yes.
21:02
Yes, now, the lesson from this story is always search the issue. All right, anyway, Funny Falcon is now a Ruby committer, so he can just do it himself next time.
21:21
Anyway, all right, let's move on to the next patch I wanna talk about, and this one is called direct instruction sequence marking. Now, unfortunately, this technique requires that we know a little bit about how Ruby's virtual machine works, so I'm gonna talk about Ruby's virtual machine, and we're gonna implement a tiny virtual machine before we write this patch. So Ruby's virtual machine is a stack-based VM,
21:43
and what that means is that we have a list of instructions, so we have a list of instructions like this, and we have a stack, and we can think of the instructions as essentially our program, and the stack as our workspace. These instructions are, you can kind of think of each instruction as an array,
22:01
where we have the instruction, the particular instruction, and the operands for that instruction. So in this case, we have an instruction called push and an operand called three. And the way that the VM works is, the VM keeps what's called a program counter, and the program counter always points at the next instruction that's going to be executed.
22:23
And then it executes it and increases the, increments the PC. So in this case, we'll push three onto the stack, we'll push five onto the stack, and then finally, the add instruction will pop two off the stack, add them together, and push the result value back onto the stack so we get eight put onto the stack.
22:43
So how do we get these instructions? Like where we see simply how the VM works, but where do these instructions come from? Your Ruby programs are actually compiled. Before your Ruby program is ever run, it's compiled into bytecode, and it has to go through a few processing phases.
23:01
The phases are something like this. We start out with some source code, which is the text, the code that you wrote, or the new standard gem you downloaded from RubyGems.org. Then it gets transformed into an AST. The AST is transformed into a linked list, and the linked list is finally transformed into bytecode.
23:21
And we can kind of break these phases up into two main sections where we have parsing and compiling, and where those two kind of overlap, it's a little bit gray, a little gray area, but that's kind of how I think about it. The linked list phase is where we have any optimizations applied to your code, so if any code is dead, it gets removed there,
23:41
or other types of optimizations, like peephole optimizations. That happens in that phase, and it's because linked lists are easy to manipulate. Finally, the linked list is just turned into a bytecode, which is our product, and that's the thing that we actually execute. The data structures that are involved, we have an AST, which stands for abstract syntax tree.
24:03
I'm not sure why it's called abstract, because we use it, it is very concrete, but that is what people call it. It's essentially just a tree data structure. We take our source code, for example, this three plus five, and we convert it to an AST, which looks something like this. We have a plus node that points at two children,
24:22
a three and a five. And if we look at how these are actually implemented, under the hood, they're implemented as Ruby objects. These are actually T node objects. They're implemented as Ruby objects, you just never see them. Now these T node objects are then converted into a linked list, and the way that we convert them into a linked list is by recursively walking
24:41
each node in the tree. So we'll start out at plus, and plus says, oh, I have children, I need you to process my children first. So the recursive function says, okay, I will visit your children first. We have a three, the three has no children, so we end up with just a push instruction with the number three. Then we visit the five child,
25:01
we end up with another push instruction. Finally, there's no more children, so we add a plus instruction to the end. And that's how we end up with our linked list. And if we were to convert this into Ruby, it would look something like this. This is kind of the high level way that we process this. So we process all the childrens before we process the parent object. Next, we apply optimizations to the linked list,
25:22
and I am going to hand wave over this. Emoji pun, yes! All right, I'm gonna hand wave over this part because we're talking about memory optimizations, not runtime optimizations, so we'll just
25:42
not talk about these particular optimizations. So we start out with a linked list, and then we hand wave a bit, and this is our optimized linked list. They are the same, yes? Ha ha! So then we convert it into byte code. But let's talk a little bit more. What exactly is byte code?
26:01
Byte code is binary code that's executed by the virtual machine, and if you look at the implementation, it's actually literally just a list of numbers. It is truly just a list of integers. So where did this list of integers come from? Well, we take the byte code, and we have to translate all these linked list nodes into integers. So in this case, we'll translate push three
26:22
into some number that represents push, say one, two, three. And then we have the number three, which is we'll use that to represent three. And again, we have a push five, so we'll use one, two, three for our push, and five, and then we'll have add, and we'll say we represent that as four, five, six. And now we've got our byte code. That's it.
26:41
So this linked list gets translated into which are Ruby objects, also gets translated into byte code, which is again a Ruby object. It's called an iMemo object. So all of these are Ruby objects, and this is important for our optimization. We'll get back to that soon in less than 12 minutes, I hope. So let's make a simple VM.
27:01
We know how this stack-based VM works. We need to be able to push something onto the stack, but instead of these arrays, we actually have just a list, so we can't just push our PC one at a time. We have to actually push it two, and then two, and then one. So we'll write our program to take that into account.
27:22
We have here our simple VM, which all it does is exactly what we've described in this code. However, we increment the program counter by an extra one whenever we encounter a push instruction. This is it. This is our entire VM. We have done it. We have gone through it. We have learned how compilation works.
27:41
We've learned how the VM works, and I think this is a really, really good point for us. We are doing well. But I want to look at one more program before we can get into this memory optimization, and that's a hello world program. It's almost exactly the same as our integer-based mathematics program,
28:01
but instead we're using strings. So in this case, we'll end up with an AST again, just like we did in the mathematics example. However, when Ruby builds this AST, it actually allocates Ruby string objects. This hello and world, they are both Ruby string objects. They're stored in this AST.
28:22
Then, when we take the AST and convert it into a linked list, we get these push operations, and these push operations also have those Ruby objects. They are the same Ruby objects that we allocated when we parsed the code. Now, when we translate the linked list into byte code,
28:40
we go through here, and we have to actually use an address for the string, so our byte code, remember, we can only deal with integers, so rather than an actual object, we have an integer that's a pointer to the object, so we have the object address. So in this linked list, we have two Ruby objects, and we have the object addresses that correspond to those Ruby objects.
29:04
So now, when we write our VM this time, obviously we don't have pointers in Ruby. These only exist in C, so for our example byte code, we're just gonna embed the strings themselves, okay? So we have our byte code here. We'll use our VM that we wrote before. Now, the thing I need you to note here,
29:20
I know this code is very small. You don't need to read it. Just know that we're executing the byte code twice, okay? This is important. Here, we execute the byte code twice, run this program twice, essentially. When we run it, we see a bug. It says, hello world, hello world world. We see that printed out twice. Now, why is that? The reason is because we allocated Ruby objects here,
29:43
and we push these Ruby objects onto the stack. We'll run through this again. We push hello on, we push world on, and then when we call the append instruction, this is a destructive operation. It appends world to hello. That mutates the objects, but those objects are the same ones that were in our instruction sequences.
30:03
So when the object on the stack gets mutated, the objects in the instructions also get mutated. This is a problem. If we think about this, if we execute the program a third time, it's gonna print out hello world world, three worlds, I think. So when we execute this again,
30:20
we'll see too many worlds here. So if we run it again, we see this output. So how do we fix this? How can we deal with this? We have to change our VM a little bit, and the thing that we need to do is we actually need to copy the object before we push it onto the stack. So we add a new dupe here. All this does is just copy the object before we push it. So when we run the program again,
30:43
instead of pushing the object itself, we push a copy onto the stack, then the copies get mutated, and everything works the way that we expected. And if we think about this in terms of object allocations, this means at compile time, we're allocating two strings for the literals that are in our program. We allocate two strings at runtime
31:02
because we have to dupe those strings before we push them onto the stack. Now as a side note, this is why the frozen string literal optimization helps us. In this case, this is telling Ruby, hey, we cannot mutate these strings. And if we cannot mutate them, it means that you can push the original onto the stack
31:21
and not dupe every time. So if we change to use the frozen string literal, we'll allocate two strings again at runtime, but we only have to allocate two strings at compile time and only allocate one at runtime, which is a result of the plus operation. So now that we understand how the VM works, we understand why this optimization helps us,
31:42
and I think that's really cool. So reducing memory usage. We've gone through all this, all this background. We're actually going to, I think we understand enough details at this point where we can actually look at this allocation technique.
32:00
So bytecode, as we said earlier, bytecode is stored on instruction sequences. Instruction sequences are just Ruby objects. I think I've said this a whole bunch of times, but remember, iseek just stands for instruction sequence. So if you see that around, that's what it means. So if we look at these instruction sequence objects, we'll see that this iseek object
32:21
points at a list of integers. That's our bytecode. Now our iseek layout, as we saw during compile time, since we have to point at, we only have integers, so we have to point at the address of those Ruby objects. So in this case, we may have some strings in here. It points to hello and world. Now those string objects are managed
32:41
by the garbage collector, and as I said, this iseek object, it's a Ruby object, so it's also managed by the garbage collector. But imagine if these two strings were GCed, if they got freed, and your program ran, what's gonna happen? Your program's gonna blow up. There has to be a way for the garbage collector
33:02
to know about those particular strings. Come on, come on, KinoAni, there we go, yes, your program will blow up. Imagine I said that perfectly on time. Anyway, so how do we deal with this? How do we prevent those string objects from going away? Well, the technique that we used up until Ruby 2.6
33:22
was a thing called the mark array. What this was was the instruction sequence object would also contain a reference to an array. So when your program was being compiled, as the instruction sequences were being built, any time a Ruby object was created, like for example those strings,
33:41
the string would be added to the mark array. So it's pointed to by the byte code as well as this mark array. So you can see as we compile it, we add it to two places. So the reason this works is because when the GC marks the instruction sequence object, the instruction sequence object says, hey, I also have an array, go mark the array.
34:03
And the array says, oh, I have some stuff too, go mark them. So it marks those strings and keeps them alive. And that way your Ruby program won't blow up when you execute it. So this mark array has a couple of problems and I think the first problem with it is just an organizational problem.
34:21
In this case, we actually have what I think of as duplicate information. We have two references to each object and the only reason we have the references from the array is to keep those objects alive. But if you just look at byte code being executed or the byte code itself, it's very difficult to tell
34:40
that this byte code actually contains references to Ruby objects. So I think that's very confusing. It's confusing to me. So I think these two lines here, what I like to call hidden references. We have references but it's not obvious. The other issue is that we have array bloat. So whenever you add an object to an array, you know Ruby automatically expands the object for,
35:02
or the array for you to have more capacity. But when you add one object, it doesn't just increase the capacity of the array by one. If it did that, it would be very expensive for you to keep adding objects to an array. Instead, it allocates a bunch of stuff, a bunch of space and then adds to that empty space.
35:21
So if we were to map that out, the array size, the actual number of elements versus the capacity, it would look something like this. So the X, well, I don't know, looks like that. I forgot what these axes are. You get it. So the difference between this green line
35:40
and this blue line is actually unused waste. We're not using it until we start filling up with more objects. Now, this isn't normally a problem in your programs because you'll use those arrays, mutate them and then throw them away and we don't need that memory anymore. It's not a big deal. It's functional, we use it.
36:00
But this array in these instruction sequences, this array lives forever. It lives for the life of your program. So we have this empty space that's just sitting there unused and that really bothers me. I like to use stuff. So how can we fix this? One way we can fix this is to remove array bloat.
36:21
We can resize the arrays when we compile. So the other technique I think which would be better is to just remove the array itself. And we can, we can do that, and we can do that with a program that's almost exactly like the VM that we wrote. If we just had a program that could decode these instructions and whenever it finds an object, mark that object.
36:44
Yes, animations, go. Mark that object, then we could accomplish this without the array. So we can do this almost exactly the same way as our VM. So here we have something that's written almost like the VM, but rather than executing the instructions,
37:00
we just mark something. We mark the parameters. That's entirely what this technique is. So if we use this, we can eliminate the array completely and be able to mark these objects directly. So we call this mark, we're marking the objects directly, which is why I called it a direct marking technique. So in the real world,
37:20
with a basic Rails application, we're able to reduce the array allocation right there. You can see the two, blue is two five, green is two six. In Ruby two five, we allocate about 35,000 arrays. On boot in two six, it's about 8.5,000 arrays. This is a, overall, it's a 34% reduction in objects at boot time. If you look at, this is a map of the heap on Ruby 2.5.
37:42
Black pixels are objects, white pixels are empty space. This is in Ruby 2.6. This is after Ruby 2.6 with GC compaction that I am still working on. Talk to me about that later. Overall process memory reduction, we lose about 6%. Again, this is before and after, so this is a 6% reduction in allocations.
38:03
Now again, this depends on how much code that you load. Remember, the more you load, thank you. I did it. How much would you pay for this? $100, $1,000. You can have it today by upgrading to Ruby 2.6.
38:20
So this is where the patch is. You can go take a look at it. The actual code is right here. Again, it's a very simple patch. Okay, conclusion. We learned about VMs, we learned about memory management, but most of all, we learned about ourselves. Everyone, please upgrade to Ruby 2.6 coming in December.
38:41
Thank you very much.