Visualizing Garbage Collection in Rubinius, JRuby and Ruby 2.0
Formal Metadata

Title | Visualizing Garbage Collection in Rubinius, JRuby and Ruby 2.0
Number of Parts | 50
License | CC Attribution - Share Alike 3.0 Unported: You may use, modify, and reproduce the work or its content for any legal and non-commercial purpose, in unaltered or altered form, provided that you credit the author/rights holder in the manner they specify and distribute the work or its content, including altered versions, only under the terms of this license.
Identifiers | 10.5446/37503 (DOI)
Production Place | Miami Beach, Florida
Ruby Conference 2013, 42 / 50
Transcript: English (automatically generated)
00:17
So I'm Pat. My name's Pat Shaughnessy. I'm really excited to be here today. I'm gonna talk about
00:21
garbage collection. It's almost like the theme of the conference this weekend. So Matz, actually I stole Matz's slide. Complete coincidence, as a matter of fact. But Matz told us all to be garbage collectors. And then Koichi Sasada did a great talk yesterday. It was a lot of fun to hear about what's coming in Ruby 2.1 with garbage collection.
00:42
And I want to talk about garbage collection, how it works in three different versions of Ruby. Standard Ruby, or MRI. MRI stands for Matz's Ruby Interpreter. I also want to talk about two alternative implementations of Ruby, called JRuby and Rubinius. And I know a couple of those guys are here so you guys can correct me when I get into trouble.
01:02
But before we get into the technical stuff, like what is garbage collection, how does it work, why do we want to talk about garbage collection? Why are we talking about it so much at this conference? It seems like a boring, dry technical topic. You know, there was a blog post recently, it's been a couple months now, where someone said, if you tweak your garbage collection settings, you can get your test suite
01:22
to run twenty percent faster. You know, that's great. I certainly admit that garbage collection is important. It's important that we run our tests fast. It's important that our apps run fast, and users don't have to wait for long pauses while garbage collection is going on. But is it really interesting? Is it really exciting? Like, how many of you here get excited by
01:41
garbage collection? That's not what I expected, you guys all need to get a life, all right. This is really boring stuff. This is like computer science textbook stuff. But of course, I agree. I actually think garbage collection is really exciting and interesting. I think it's one of the most fascinating parts of computer science theory. And, you know,
02:02
I think there's two real good reasons why to study garbage collection. One is, it's a great way to sort of go back to the classroom. You know, if you majored in computer science a long time ago, it's a great way to go back and sort of relearn what you already know. Or, if you're like me, maybe you came into Ruby and you never majored in computer science. You came from some other place, some other path. So studying something like this
02:21
can be a great way to sort of get your feet wet with real computer science theory. And there's another interesting reason why we should, I think, look at garbage collection. The way I like to put it is, you can learn a lot about someone from their trash, from their garbage. You know, if you go outside of a house, like a big fancy mansion, and you look at what they throw out, you can get a sense of what's going on inside the house. Or if you, you know,
02:41
archaeologists, when they go study ancient civilizations, like ancient Greece or ancient Rome or whatever, you know, what are they doing? They're going into, they're pulling stuff out of the ground that people threw out thousands of years ago. And from that trash, they can sort of figure out what was going on in that culture in that city. And so even with Ruby, I think we look at how Ruby handles trash and garbage, we can get a sense of, at least, how the
03:02
rest of that Ruby implementation works. But even the name garbage collection is a little bit of a misnomer. Garbage collectors don't just collect garbage. They, in fact, they do three things. They allocate memory for new objects or new values. So every time you create a Ruby object, you're actually interacting with the garbage collector, which is kind of interesting and unexpected, but true. The second
03:24
thing is they identify garbage objects. So they figure out which objects you're actually using in your code, and which ones you're not. The garbage objects. And then they reclaim memory from those garbage objects. And just how do they do that? How do you get memory back from garbage and, you know, reuse it again for allocating more objects? I mean, it's
03:40
cool. If you think about it, it makes perfect sense that garbage collectors both allocate memory for new objects and reclaim memory from garbage objects. It's sort of two sides of the same coin. So it works really well. So an analogy I like to draw about this is, if your application is the human body, let's say all of your code, all of your business logic, your beautiful algorithms, everything that you do is
04:02
the brain, right. The intelligence of the body. So what part of the body do you think the garbage collector is? Kidneys. Liver. Gallbladder. Wow, I'm getting a lot of interesting answers. What was that? The heart. That's my answer, the heart. But
04:22
one other interesting answer I got last month at another conference was, somebody said it's the immune system. It's, you know, the white blood cells that go looking for viruses and bacteria. But yeah, I think it's the heart. I agree with you. I think that garbage collectors are actually the beating heart inside your application. You know, just as your real heart, if your heart stopped beating, you'd be dead in seconds. You know, if your
04:41
garbage collector stopped working, your application would be dead. You know, just the way that your heart provides blood and nutrients for the rest of the body, the garbage collector's providing memory and objects for your application to use. And, you know, if you had heart disease, if your heart slowed down, if you had clogged arteries inside your garbage collector, your application would slow down and slow
05:01
down and eventually die. OK. So what I want to do today is talk about how garbage collection works. And just to give us something to talk about, I'm not gonna use a lot of code today, actually almost no code, and I'm not gonna show any C code details. But what I do want to do is just put a little class on the screen that we can use as an example of something to talk about. So the node class, kind of
05:20
a boring name, I apologize. Initialize method. It just saves a value in an instance variable. And that's just a way for me to, you know, remember which node is which. So I can say node dot new one, node dot new two, and you know, we would know which node is which. So let's start with allocation. So what happens inside of Ruby when I allocate a new object? You
05:41
know, what kind of work does Ruby need to do? And the amazing thing about this is, when I learned this, is Ruby actually does very little. The reason why is that, ahead of time, Ruby's actually created thousands of objects when it starts up. So before your application ever runs, before a single line of your code runs, Ruby has, while it's starting up, created actually around ten
06:00
thousand objects and put them in a linked list. So I'm gonna show you a bunch of diagrams today. What these squares are representing are objects. There's actually C structures inside of Ruby. By the way, when I talk about standard MRI Ruby, I'm gonna put a red Ruby at the top of the slide so we don't get confused about which one we're talking about. So each one of these squares is a, is an
06:21
object that's been pre-created ahead of time, and it's sort of ready and waiting for me to use it. So when I call the new method and I create an object, if I say n1 is node dot new of abc, right. So here I have this n1 variable. This is like a pointer or a reference up to a node object. And I'm gonna draw this one in gray now, because this is a live, active object that
06:40
I'm actually using in my code. The other ones remain white because they're not being used yet. They're waiting for me to use them. And so they stay on the linked list. And so what you can see is, Ruby actually doesn't do anything. When I create a new object, all it has to do is pull one off the list and give it to me. So it's very cool. If I create another one, n2 is node new of def, same thing. Ruby does
07:02
very little, just gives me node one off the head of the list. The list gets a little shorter. I have now two objects. And if I do it again, like node new ghi. Same, same story. So this algorithm is called the free list. This is the free list algorithm. And it's actually nothing new, nothing new to Ruby. It was invented a long, long time ago by this man
07:22
here. Is this actually Mike Bernstein from Code Climate? Wait. Ladies and gentlemen, I think Mike Bernstein is in the audience somewhere today. There he is. There he is. All right. Fantastic. So we have a legendary computer scientist with us today. Well, actually, while it's true that Code Climate has some legendary
07:40
computer scientists working there, this is actually a man named John McCarthy. He is a professor, he was a professor at MIT. He actually spent most of his career at Stanford. While he was at MIT, he invented a programming language called Lisp. You've probably all heard of that. It's famous for being the first or one of the first functional programming languages. And a lot of what's in Ruby comes from Lisp. All the ideas around blocks and
08:03
closures and, and other things as well. Functional techniques. But what I want to talk about today is garbage collection. And in fact, John McCarthy invented garbage collection in 1960 when he built Lisp. So Lisp is not only famous for the language itself, it's famous for how it was built. And what John McCarthy put inside of it. So he
08:23
invented the free storage, he called it the free storage algorithm. He wrote about it in his academic paper from 1960, so this is fifty-three years ago. It's called Recursive Functions of Symbolic Expressions and Their Computation by Machine Part One. The cool thing about this is part one was good enough. There was never a part two or a part
08:41
three. But he got it right the first time. And you can read all about this. You can, so right here you can see Lisp stands for list processor. And lots of cool stuff in here. One of the things that he mentions is the free storage list algorithm. And so you know, for good or for bad, Ruby actually uses exactly the same algorithm that John McCarthy invented fifty-three years ago.
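A minimal Ruby sketch of that free list idea, assuming fixed-size slots created up front. The `FreeList` and `Slot` names are invented for illustration; MRI's real free list is built from C structures inside the interpreter.

```ruby
# Toy free list allocator: pre-create slots, chain them together,
# and hand them out one at a time. Invented names, not MRI's API.
class FreeList
  Slot = Struct.new(:value, :next_slot)

  def initialize(size)
    # Pre-create all the slots at "startup", like MRI does at boot.
    @head = nil
    size.times { @head = Slot.new(nil, @head) }
  end

  # Allocation just pops a slot off the head of the list: O(1), no search.
  def allocate(value)
    raise "free list empty: time to garbage collect" if @head.nil?
    slot = @head
    @head = slot.next_slot
    slot.value = value
    slot
  end

  # Reclaiming a garbage slot pushes it back onto the list for reuse.
  def release(slot)
    slot.value = nil
    slot.next_slot = @head
    @head = slot
  end
end
```

The two sides of the same coin he mentions are visible here: `allocate` pops from the head of the list, and `release` pushes a reclaimed slot back on.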
09:01
Just to give you a sense of how long ago that was, this is the first computer that ran Lisp. This is called the IBM 704. This photo was taken in 1963. And you can see, you know, there it is up in the back. That whole cabinet is a computer. And of course, you know, all of us have cell phones and smart phones in our pockets that have thousands or millions of
09:20
times as much memory and CPU power. But the point here is, that computer ran the same algorithm that Ruby uses today for garbage collection. Pretty amazing. Now I'm gonna switch gears and talk about the other two alternative versions of Ruby. One is JRuby. I'm gonna put a little red bird for JRuby. I think that's the JRuby icon or logo. And for Rubinius we'll have a black
09:43
car. So just to keep track of which is which so we don't get confused. And so what happens, how does allocation work for these versions of Ruby? So they work very differently. They also create things ahead of time, but what they do, instead of creating a linked list of different objects of all the same size, they allocate this big stretch of contiguous memory called
10:02
the heap. And then when I create new objects, it allocates memory from the heap using a pointer. I'm calling it next. I don't know what it's called inside of the JVM. Remember, JRuby is Ruby implemented with Java. So I'm really talking about how the JVM works here. And Rubinius also, Rubinius is Ruby implemented with Ruby, but it uses its own virtual machine written in C++.
10:20
but it uses its own virtual machine written in C plus plus. So both of them actually allocate this big heap, and then as I create new objects, you know, ABC, DEF, GHI, same story, what happens is you can see this next pointer moves across from left to right. And they allocate adjacent bits of memory, one after the other, from the same heap in this way.
10:40
This algorithm is called bump allocation, because we're, you know, bumping this pointer along from left to right. So kind of a cool idea. Now, of course, my diagrams are super simple simplifications of reality. So what's actually going on here is a lot more complicated. You know, back to standard Ruby, when I create a new node, I'm actually creating a lot of different things. I have
11:00
the object over here, and this is some of the C structure names down here. I don't really want to get into this sort of detail today. But what's interesting here is, or what I want to point out is, you know, I have the object here, and then the string I saved inside of it is over here, and then there's something else over here. This is the node class called the R class structure. The point I want to make is that all the things that we use in standard Ruby
11:21
might be located in different places. And there are pointers that point from one to the other. So it's likely that these values or objects are not all together in the same stretch of memory. But if we look at, oh, in fact, if I create a new node and I put in more than 23 characters, in fact, those letters don't even fit inside the R string anymore. It puts them out, it has to allocate a different stretch of memory and put the letters
11:42
somewhere else. So, once again, we see that in order to use these values, Ruby has to go get stuff from different places in memory. And that's different, and again, this is oversimplifying, but it's different from how JRuby and Rubinius work. Because they're allocating things from the same heap, it's more likely that these things are gonna be located nearby each other, that have similar memory addresses.
12:03
What that's important for is that, you know, your CPU actually caches RAM, and it's faster, it runs faster if it's getting values repeatedly from the same area of memory. So if it's hitting that cache more often. And, you know, even the, the letters in that longer string are gonna be located in the same, that same initial heap.
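A toy version of the bump allocation scheme just described, assuming the heap is an array and addresses are just offsets into it. `BumpHeap` is an invented name; the JVM and the Rubinius VM do this against raw memory, not a Ruby array.

```ruby
# Toy bump allocator over one contiguous "heap". Invented for
# illustration only; real VMs manage raw memory in native code.
class BumpHeap
  def initialize(size)
    @heap = Array.new(size)  # stands in for a contiguous block of memory
    @next = 0                # the "next" pointer that bumps left to right
  end

  # Allocation: check for room, write at @next, bump the pointer forward.
  # Returns the "address" (offset) where the value was placed.
  def allocate(bytes, value)
    raise "heap full: trigger a collection" if @next + bytes > @heap.size
    address = @next
    @heap[address] = value
    @next += bytes
    address
  end
end

heap = BumpHeap.new(100)
a = heap.allocate(8, "abc")
b = heap.allocate(8, "def")
c = heap.allocate(8, "ghi")
# Consecutive allocations land at adjacent addresses (0, 8, 16),
# which is exactly the cache locality benefit mentioned above.
```

Note how each allocation is just a bounds check and an addition; the cost of finding free space is pushed off onto the collector, which has to recover contiguous space later.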
12:22
OK, so let's move on to identify. How do garbage collectors identify garbage objects and, you know, later what do they do with them? But how do they find them in the first place? So let's go back to standard Ruby, the red Ruby icon. I have my free list. I've got three nodes already. So let's say I continue to run my program. I create some more objects. OK, this time
12:42
I say, N1 is node new of JKL. OK. We've seen this before. We kind of get it, Pat. All right, let's move on. Get, get the, get to the point. All right, I'm creating more and more nodes, and you can see what's happening here is I'm using up the free list. And eventually, you know, obviously the free list will run out. Something has to give.
13:00
But there's another interesting detail here I want to point out, which isn't as obvious. You know, since I'm changing the value of N1, that variable or reference over here, I'm moving it over here repeatedly to point at different nodes, different objects. But the old objects are still there. They remain behind. So Ruby actually doesn't clean up after itself.
13:21
You know, working as a Ruby developer, you need to get used to the fact that you're sort of, it's sort of like living in a really messy house. You know, when I got out of college, I lived with a few graduate students for about a year in a shared apartment. You know, I've been there. I guess, like, you know, they didn't clean the dishes. They didn't, like, they threw dirty clothes all over the floor. It was a disaster. You know, I'm not a neat freak by any means, but it was a painful experience. And you know, with
13:42
Ruby, the same thing is going on. As I'm, you know, using objects, they're just left behind when I'm done with them. So eventually the free list runs out. Then what happens? You know, life can't go on. The house fills with trash, and something has to give. So what Ruby does at this point is something called stop the world marking. So it stops your program.
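A minimal sketch of what a stop-the-world mark phase like this does, assuming toy objects. The `Obj` struct and `mark` method are invented for illustration; MRI walks real C structures, and (as he explains shortly) Ruby 2.0 keeps the mark bits in a separate bitmap rather than on the objects themselves.

```ruby
# Toy mark phase: walk the object graph from the roots and flag
# everything reachable. Invented names, not MRI internals.
Obj = Struct.new(:name, :refs, :marked)

def mark(roots)
  # Depth-first traversal; anything reachable from a root gets marked.
  stack = roots.dup
  until stack.empty?
    obj = stack.pop
    next if obj.marked          # already visited, skip
    obj.marked = true
    stack.concat(obj.refs)      # queue up this object's children
  end
end

a = Obj.new("a", [], false)
b = Obj.new("b", [a], false)
garbage = Obj.new("garbage", [], false)
mark([b])   # b is a root; a is reachable through b
# b and a end up marked; "garbage" stays unmarked, so it is garbage.
```

Marking the live objects identifies the garbage objects only indirectly: whatever is still unmarked when the traversal finishes must be unreachable.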
14:00
Your program is no longer running. And instead, Ruby starts to do garbage collection, which means it starts to mark all the objects that your application is using. So again, oversimplification, but it goes to all the different variables in my code, different references. It has a lot of its own references inside the virtual machine inside of Ruby. And it goes to the objects that these things point to,
14:22
and it marks them. I'll put a little M here for mark. In fact, there is, the way this works technically is, there's a bit, so there's zeroes and ones that indicate whether objects are marked or not. This is actually saved, starting with Ruby 2.0, this is saved in a separate stretch of memory called the free bitmap. And I think the author and the inventor of that system is here with us too. Nari
14:42
San. I know he's here at the conference. So this is called bitmap marking. This is new in 2.0. It allows Ruby to run faster and take advantage of something called Unix copy on write optimization. So I'm not gonna get into that today. If you're interested in that sort of thing, I wrote an article about that you can check out online. But the point today is,
15:00
all the objects that I'm actually using, what are called live objects, I'm showing here in gray, are marked. And the remaining objects, the white ones, since I'm not using them, they're not marked. So they're, they must be garbage objects. So by marking the live objects, Ruby has indirectly identified all the garbage objects. Now, like I said, there's a, this is
15:21
a big oversimplification. So an actual program has, you know, hundreds and thousands, many thousands of objects that are all pointing to each other in complicated ways. So there's actually, you can think of it as what's called an object graph. So there's a big network or, you know, tree of objects and they're all referencing each other. And so the code, the marking code inside of Ruby has to traverse this tree somehow and mark all these different
15:43
things. So it's very complicated. And this is why Ruby stops the world, so that it can do this without getting confused. Now let's go and shift over to JRuby and Rubinius. So they do something similar. They also mark your objects so they know which are garbage and which are not. But they can do it while your application is running. At least some versions
16:02
of the garbage collector in the JVM. And Rubinius does as well. So let's talk about how this works. So I'll, I'll put the word collector in this big blue arrow. This represents the garbage collector. OK, so the code that's inside of the JVM or inside the Rubinius VM, that's going through this object
16:20
graph and marking all the different objects. So in this example, it's marked the three along the top. It's come down to this one. And in a moment it's, you know, in a microsecond it's gonna move on and mark these other two ones here that remain. So what happens next? So let's suppose that my application or your application is actually running at the same time as the garbage collector, maybe in a different thread. So let's say it comes over
16:41
here and I'm gonna represent your application with this other blue arrow, and I'm labeling it the mutator. So this is one of the funny things. If you read garbage collection literature, like academic papers from computer science, they refer to, you know, the nice innocent garbage collector over here, which is trying to do its work, and the evil mutator, which comes in at the last minute and starts changing things while it's trying to do garbage collection.
17:02
But it's a little bizarre. But this is how computer scientists see the world. It's like it's backwards. But what they do is, so imagine your application runs and it creates an object, let's say, that one over there is a hash or an array, and I stick something into that array, a new object over here, but the new object notices white. It's not marked yet. It's just been created.
17:22
So since I'm running my app while the marking is going on, the collector, the poor innocent collector's over here, it doesn't know that happened. So it's gonna, well, what is it gonna do? It's gonna continue and mark these last two ones. And it's done. It's marked all the objects. So now it knows that everything that's marked is live. But what about this one? This wasn't marked. It's not garbage, though. If it collects
17:42
that, if it releases that, then this is a huge problem. And actually Koichi Sasada talked about this yesterday in a different context with generational garbage collection. But if we free that, it's gonna be a huge, big problem. We're gonna blow away valid data in your application. So what do we do about this? So we go back to computer science theory. So this
18:01
is going back to the 1960s again. I'm gonna get this pronunciation wrong. This is another legendary computer scientist called Edsger Dijkstra. I don't know if that's right. So he's also from, he's from Holland, and he invented this idea called tri-color marking, which I'll explain in a second. He wrote a really interesting paper about it. He called it On
18:22
the Fly Garbage Collection Exercise and Cooperation. So, well, on the fly we kind of get that. It means you're doing garbage collection while your application is running. But what is cooperation? What is that all about? So he's talking about cooperation between the garbage collector and the application that's actually running. So again, this is a very old idea. None of this is new to Ruby
18:41
or to the JVM or anything like that. This is from the 1960s. And how does it work? So again, JRuby and Rubinius, certain versions of the garbage collector and JRuby and also Rubinius use this tri-color marking. So the way it works is we divide up objects into three colors. White, gray, and black. We'll get to black in a minute. But initially all the objects are considered white. That
19:02
means we haven't, we don't know anything about them. In the middle we'll have some gray objects. Initially we put what are called root objects here. These are like the roots of those, of that object graph tree. So these are objects that we know are live that the virtual machine itself is using or your code has, you know, in a global variable or something. And then the collector, again, the blue arrow is gonna go through these gray objects
19:22
and process them one at a time. And as it goes through those, it's going to mark them. And when it marks them, it'll move them over here on the left, and it'll color them in. Of course, nothing is really going on with color. This is just a metaphor for describing the algorithm. But it will consider them black and put them over here on the left. And they're marked. So these
19:40
are the objects that we know are marked. And on the right are the remaining objects that we don't know anything about. And the gray objects, think of these as objects that are in the process of getting marked. You can call this the mark stack. It's a list of objects we need to mark. And the other important detail here is, every time we mark one of these, we also check to see if
20:01
it has a child object or objects over in this category. So as the process runs, the idea is that objects move gradually from white to gray over to black. And then when we're done, we have a bunch of marked objects on the left that are black, and we have some garbage objects left behind that are white.
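The tri-color idea described here can be sketched in a few lines of Ruby. This is purely illustrative: the class and method names are invented, and real collectors work on raw memory, not Ruby arrays. But the gray "mark stack" and the white/gray/black bookkeeping look roughly like this:

```ruby
# Toy tri-color marking sketch. Each object knows its children
# (outgoing references); roots are the objects the VM knows are live.
class TriColorMarker
  def initialize(roots)
    @gray  = roots.dup   # the "mark stack": discovered but not yet scanned
    @black = []          # scanned and known live
  end

  # Process gray objects one at a time: scan their children, then blacken.
  def mark
    until @gray.empty?
      obj = @gray.pop
      obj.children.each do |child|
        # Only push objects we haven't seen yet (i.e. still white)
        @gray << child unless @black.include?(child) || @gray.include?(child)
      end
      @black << obj
    end
    @black
  end
end

Obj = Struct.new(:name, :children)

a = Obj.new(:a, [])
b = Obj.new(:b, [a])
c = Obj.new(:c, [])          # unreachable: stays white, i.e. garbage
root = Obj.new(:root, [b])

live = TriColorMarker.new([root]).mark
live.map(&:name).sort        # => [:a, :b, :root] -- :c was never marked
```

Note the white set never needs to be stored explicitly: anything that is neither gray nor black when marking finishes is white, and therefore garbage.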
20:22
So you know, what's the big deal? Why are we doing all this stuff with colors? What's neat about this is it allows you to do concurrent marking. So what do I mean by that? Let's suppose that, while I'm marking the gray objects in the middle, the mutator comes along and does something. It changes one of these things. This was, let's say this was over here. I had already marked it. It was
20:40
black. But then I modified it by adding a reference from it to some other new object. So at the moment that I do that, the collector, what it can do is it can move it back into the mark stack. Or it could move the new object into the mark stack, either way. I think Rubinius does a second thing. And what's neat about that is that we allow the application, that second
21:02
blue arrow, to continue to run and go about its business. But the trick here is somehow the garbage collector needs to know that this happened. It needs to know when to move one of these black objects and make it gray again. So how does it do that? Well, actually there's a great article by Dirkjan Bussink, from
21:21
the Rubinius team, about this exact issue. And it's a fantastic read. Check this out online on the Rubinius blog. What he talks about is how you use write barriers. So I'll represent that with this sort of conceptual dotted line around the object. In fact, it's around every object. A write barrier is just a little piece of code that allows
21:40
the garbage collector to know when the application, when you, modify some object. When you modify something, this calls the garbage collector and gives it a chance to move this one back into that gray category. So that's how concurrent marking works, and how, with certain versions of the JVM and with Rubinius, you
22:01
can run your application with no GC pauses, or at least with very short and infrequent ones. OK, so now we've identified our garbage objects. What do we do with them? How do we reclaim memory from these things? So let's go back to standard Ruby with the free list. We have three live objects and five garbage objects. So what
22:23
do we do? How do we give those back to your program to use again? Well, it's actually quite simple. We've done the marking. Now we do what's called sweeping. The standard Ruby, or CRuby, algorithm is called mark and sweep, and this is the sweep part. So we copy all these objects and put them back on the free list.
22:40
And I just said copy, but that's not true. The neat thing about this is, we're not actually copying anything. Since this was a linked list, all we have to do is modify the pointers inside those objects to put them back on the linked list, to re-form the list. So it's actually quite fast, and there's no copying involved. Now, back to JRuby and Rubinius.
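The sweep just described can be sketched in toy Ruby like this. It's only a model (CRuby's heap is really made of C structs called RVALUEs, and these names are invented), but it shows the key point: freeing a slot is just a pointer update onto the free list, with no copying:

```ruby
# Toy heap slot for a mark-and-sweep collector with a free list.
Slot = Struct.new(:value, :marked, :next_free)

class ToyHeap
  attr_reader :free_list

  def initialize(size)
    @slots = Array.new(size) { Slot.new(nil, false, nil) }
    # Chain every slot onto the free list initially
    @slots.each_cons(2) { |a, b| a.next_free = b }
    @free_list = @slots.first
  end

  def allocate(value)
    raise "out of memory" unless @free_list
    slot = @free_list
    @free_list = slot.next_free
    slot.value     = value
    slot.marked    = false
    slot.next_free = nil     # slot is in use, no longer on the free list
    slot
  end

  # Sweep: every unmarked slot goes back onto the free list just by
  # rewriting its pointer -- no memory is moved anywhere.
  def sweep
    @slots.each do |slot|
      if slot.marked
        slot.marked = false  # clear the mark for the next GC cycle
      else
        slot.value     = nil
        slot.next_free = @free_list
        @free_list     = slot
      end
    end
  end
end

heap = ToyHeap.new(8)
live = 3.times.map { |i| heap.allocate(i) }
5.times { |i| heap.allocate(100 + i) }    # these will become garbage
live.each { |slot| slot.marked = true }   # pretend the mark phase ran
heap.sweep                                # 5 slots back on the free list
```

After `sweep`, the five unmarked slots are reachable again from `heap.free_list`, ready to be handed out by the next `allocate` calls.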
23:01
How do they work? How do they reclaim memory from garbage objects? So this is actually a really cool algorithm. Let me take a couple minutes and explain this. And I was really impressed by this when I learned about it. So what I said earlier is actually not true. The JVM and Rubinius don't allocate just one big stretch of memory for allocating objects. They allocate two. One is called
23:21
the from space and the other is called the to space. It's not called that in the code, but it's called that in the academic literature. Again, this is an algorithm from the 1970s. This is not anything new, even for the JVM. So the way this works is, once I've identified all the live objects, those are the gray ones, and all the white spaces
23:42
in between are the garbage objects, what I do is copy the live objects, not the garbage objects, from the from space to the to space. I'm actually physically copying memory around at this point. This idea is called copying garbage collection. And I put
24:01
them in the other heap, which is of the same size, just in case everything was live. In that case, there'd be enough room to copy everything. And then I move that allocation pointer over, so if I want to allocate more memory, I know where to allocate it from. Now, here's where the elegant part comes in. So what the JVM and Rubinius do is, then they swap
24:21
the two spaces. So they swap the from space, I'm sorry, the to space up to be the new from space. And they swap that old from space down to be the to space. And so now what we end up with is a new heap with all of just the live objects in it, and the remaining space is now ready for me to start allocating more new objects efficiently.
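Here's a toy Ruby sketch of that from-space/to-space swap. It's deliberately simplified (a real Cheney-style collector discovers the live objects itself while copying; here liveness is handed in, and the names are invented), but it shows how copying only the survivors compacts them and how the two spaces trade roles:

```ruby
# Toy semispace ("from" / "to") copying collector sketch.
class Semispace
  attr_reader :from_space, :to_space

  def initialize
    @from_space = []   # objects are allocated here
    @to_space   = []   # kept empty, same capacity, for copying into
  end

  def allocate(obj)
    @from_space << obj  # bump allocation: just append at the pointer
    obj
  end

  # Copy only the live objects into to-space, then swap the spaces.
  # Copying compacts: survivors end up contiguous at the start.
  def collect(live)
    @from_space.each { |obj| @to_space << obj if live.include?(obj) }
    @from_space, @to_space = @to_space, @from_space
    @to_space.clear     # the old from-space is now empty scratch space
  end
end

heap = Semispace.new
objs = 6.times.map { |i| heap.allocate("obj#{i}") }
live = [objs[1], objs[4]]
heap.collect(live)
heap.from_space          # => ["obj1", "obj4"], compacted together
```

The garbage is never touched at all: reclaiming it costs nothing, because everything left behind in the old from-space is simply abandoned when the spaces swap.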
24:43
Now what's beautiful about this algorithm is that it's a very elegant way of compacting the heap. So when I did the copying, notice that all these objects, you know, went from different places in the top heap, and all ended up on the left side of the new one. So it's not only getting rid of all the garbage, it's moving the live objects together. So it's
25:02
a really elegant way to do this. And like I said, this is from 1970. It was invented by this guy, C.J. Cheney, another computer scientist who worked on garbage collection. We won't read all this stuff. And there's a variation on this called the Baker algorithm that does the same thing but also works concurrently. Yes. Conceptually, theoretically, yes. So there's
25:27
a lot more going on here. In fact, one of the details is, there's a third heap I didn't mention called the Eden heap. This means like the Garden of Eden. So this is where, I love this stuff. It's like out of Fantasyland. So, so this is the Garden of
25:41
Eden because it's where we create all new objects. So life begins in the Garden of Eden. And then things are copied down. So that's a third heap. And actually, we'll get to generational garbage collection in a moment. So there's even more heaps. So the answer is, it's really complicated. But I, I like to get rid of the complicated stuff and focus on sort of the basic ideas and theory
26:02
behind what's going on here, 'cause it's really cool stuff to understand. So let's get to generational garbage collection right now. Koichi Sasada talked about this yesterday with MRI Ruby. But let's start with JRuby and Rubinius. So how do the JVM and Rubinius do generational garbage collection? So one of the
26:21
things about copying garbage collection, so we just saw how it's copying these objects down, is it seems like this would be really slow, doesn't it? Like, it seems like a really bad idea. Why would you want to be copying objects around in memory back and forth? And we're swapping these spaces, so we're repeatedly copying things back and forth. It doesn't seem like that would be
26:40
a very fast way to do stuff. But the, the cool thing here is it only copies the live objects. So this will work very well when there are very few live objects. When most of the objects are in fact garbage. And that's actually the case quite often. So if you create a new object, chances are you're not gonna use this for very long. Chances are that
27:01
this is some sort of intermediate value inside a calculation, or you create this and use it inside of one method; the method returns, you pop the stack, and the object is released, or becomes garbage. So most of the time, most objects live for a very short period of time. As Koichi talked about yesterday, this is called the weak generational hypothesis. If you
27:23
didn't see that, check out his slides. He had some great graphs of actual data of how long objects typically live inside of Ruby. But this is a fancy way of saying new objects die young. That's just the way object-oriented programming tends to work. And that's why they say this is a hypothesis. It's hard to prove this
27:40
is the case. And you can make pathological examples where this is not the case. But most of the time, what happens is here is that there's very few objects that have to be copied down. And when some objects do live on for a longer period of time, so some objects do become old objects or mature objects. These might be, you know, configuration values or class objects,
28:02
classes that you create and need for the entire lifetime of your app. So what the JVM and Rubinius do is they promote those objects that last longer than a certain number of swaps, a certain number of copy operations. This area is called the young generation. For young objects or new
28:21
objects, they're frequently not copied because they only live for a short period of time. So it's a really cool thing. Now, where do these promoted objects or mature objects go? So it turns out, so this is called the young generation. This is where the Garden of Eden was and all this copying is going on. And on the top is the mature generation, where we need to deal with these,
28:41
you know, pesky old objects. How do we get rid of these? So what happens up there? So the answer is it depends. It depends on which garbage collector you're using. Well, what do you mean, which garbage collector? Well, it turns out in JRuby, and, and if you use the JVM for any Java application, you can actually
29:00
pick which garbage collector you want to use. I never knew this until recently when I started studying this stuff. It has this sort of plug and play API for garbage collection. So in the JVM you can say, I want to use the good old fashioned serial garbage collector. And that actually works, that uses stop the world, it stops your application, and it sort of works similar to how standard Ruby works.
29:21
You can decide to use the parallel GC. This is really the same thing as the serial GC, except that during the stop-the-world marking phase, it does the marking in parallel in different threads. It speeds it up a little bit. Then there's a really cool one, it's not actually new, called the concurrent mark and sweep GC. So this one
29:41
is designed to do the concurrent marking, what I talked about earlier, for Java programs. And so with JRuby, of course the nice thing about JRuby is you're using the JVM. So you can take advantage of, even though you're writing Ruby code, you can take advantage of the years and years of work that have gone into the JVM. So you can choose to use that concurrent mark
30:01
and sweep GC if you want. You just turn it on. There's even a new one, this other experimental garbage collector, this one's called G1 GC. This stands for garbage first garbage collection. I'm not gonna get into the details. I don't even pretend to understand how it works. But, like anything, there's these great academic papers you can go and read about all this stuff. So like I said, this is a chance, an excuse for you to go back to
30:22
school. You know, Loren Segal was talking about this yesterday, generally for Ruby tools, and I think he was talking about security stuff. There's all kinds of academic papers out there. You can go and read them and learn about this. So it's really great. Use this as an excuse to go back to school and read about this. The third one
30:41
here, the concurrent mark and sweep GC, that algorithm is described here. It's actually a mostly concurrent GC, because it does still stop your app for brief periods of time. There's another article about the garbage first garbage collection; you can read about that one here. This one's from 2000, and this one is from 2004. So we're
31:01
getting into at least the same century that we're in now. And what about Rubinius? Let's not forget about Rubinius. They actually use an algorithm from 2008. This is called Immix. Again, I don't pretend to understand this. It's quite complicated, but you can read about it here. And these two people invented that recently. I think for both this one, Immix, and this garbage first stuff, they kind
31:22
of divide up the memory into different regions. And I think they call them cards, and it's very complicated. The idea is to efficiently manage where objects are and move them around so you can give memory back to the operating system effectively. OK, so what about standard Ruby? You know, all this great stuff with generational garbage collection, we're now getting, you
31:42
know, all the work that Koichi is doing on the core team. It's coming to your regular old standard Ruby with the 2.1 release. He went through this yesterday, and he really explained it very well. So I'm gonna just blow through these slides super fast and just repeat the same stuff. So with standard Ruby, we're gonna have two generations, young and mature. So we'll do
32:01
the marking the way that we saw earlier. But once something's been marked, once an object's been marked once, it will never be marked again. So if an object lives long enough so that the marking process occurs, it'll get moved over here to this mature generation. Now, again, it's not getting moved. It's not getting copied like things are being copied in the JVM. In MRI Ruby, there's no copying.
32:24
It's just, I'm just conceptually considering these to be in a different category. And it just means that once something's been marked, it won't be marked again. And so what happens is every time the garbage collector runs, it'll do at least a young garbage collection, or a minor one. It'll mark all
32:40
the young objects the way that it normally does, but it'll just skip these ones. And so by just skipping them, it speeds it up and runs faster. It doesn't take as long, and it reduces the amount of time you're waiting for the garbage collector to work. But we have the same problem that Koichi talked about yesterday, and actually the same thing, a very similar problem to what I just talked about with concurrent marking. What happens if you create, and this
33:03
doesn't have to be concurrent, it can happen at any time between one garbage collection and the next one. What happens if I create an object and I, you know, modify a mature object? And so this, again, might be a hash or an array, and I'm inserting something into it. So the problem is, if I'm only marking the
33:21
ones over here, I'm not marking these ones again. This won't get marked. So it's a very similar problem. Again, Dirkjan's blog post describes some of this stuff. It's really cool how the same solution applies, because it's a similar problem. So by using write barriers again, by putting a write barrier around these mature objects, I
33:41
can detect, Ruby can detect, when you modify one of those. And by noticing that, by using this write barrier to tell the garbage collector that something happened, it can include this particular object in the marking, and therefore mark that one in the middle, too.
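A toy Ruby sketch of this generational scheme, with invented names, might look like this: minor collections scan the young generation plus a "remembered set" of mature objects that the write barrier flagged, so a mature-to-young reference is never missed:

```ruby
# Toy generational marking sketch: a minor GC scans only young objects,
# plus any mature objects the write barrier recorded as modified.
class GenerationalGC
  attr_reader :mature

  def initialize
    @young, @mature = [], []
    @remembered = []   # mature objects modified since the last GC
  end

  def allocate(obj)
    @young << obj
    obj
  end

  # The write barrier: called whenever an object gains a reference, so
  # the next minor GC knows to re-scan modified mature objects.
  def write_barrier(parent, child)
    parent.children << child
    @remembered << parent if @mature.include?(parent)
  end

  # Minor GC: mark from the roots and the remembered mature objects;
  # young objects that survive are promoted to the mature generation.
  def minor_gc(roots)
    marked = []
    stack = roots + @remembered
    until stack.empty?
      obj = stack.pop
      next if marked.include?(obj)
      marked << obj
      stack.concat(obj.children)
    end
    survivors = @young.select { |o| marked.include?(o) }
    @mature.concat(survivors)   # "promotion" is just a category change
    @young = []
    @remembered = []
    survivors
  end
end

Node = Struct.new(:name, :children)

gc = GenerationalGC.new
old = gc.allocate(Node.new(:old, []))
gc.minor_gc([old])                 # :old survives, promoted to mature
fresh = gc.allocate(Node.new(:fresh, []))
gc.write_barrier(old, fresh)       # mature-to-young reference recorded
gc.minor_gc([])                    # :fresh found via the remembered set
gc.mature.map(&:name)              # => [:old, :fresh]
```

Without the `write_barrier` call, the second `minor_gc` would have no way to reach `:fresh`, since it deliberately skips scanning mature objects; that's exactly the bug the remembered set prevents.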
34:00
Actually, what I just said may not be accurate anymore: I just learned a couple days ago that Koichi committed something to Ruby that is going to use three generations. So these diagrams are actually not accurate anymore. We need to see what these guys do. And I think Koichi, you said that you're still evaluating whether this is a good idea or not. So I'll
34:20
let you talk about that when you figure out what to do. So, but this is still under development and might change in the next few weeks or months. Anyway, that's it. My name is Pat Shaughnessy. I just wrote a book called Ruby Under a Microscope, which describes how Ruby works under the hood internally, and this actually was a synopsis of chapter twelve, which is the last chapter on garbage collection. The
34:41
book describes really everything going on inside Ruby. So that's it. I don't know if we have any time for questions. I might have run on. I hear noise outside. We're probably way over time. I'm happy to hang around and chat, talk to anyone who wants for the rest of the afternoon. Thanks.