Knowing your garbage collector
This is a modal window.
Das Video konnte nicht geladen werden, da entweder ein Server- oder Netzwerkfehler auftrat oder das Format nicht unterstützt wird.
Formale Metadaten
Titel |
| |
Serientitel | ||
Teil | 27 | |
Anzahl der Teile | 173 | |
Autor | ||
Lizenz | CC-Namensnennung - keine kommerzielle Nutzung - Weitergabe unter gleichen Bedingungen 3.0 Unported: Sie dürfen das Werk bzw. den Inhalt zu jedem legalen und nicht-kommerziellen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen und das Werk bzw. diesen Inhalt auch in veränderter Form nur unter den Bedingungen dieser Lizenz weitergeben | |
Identifikatoren | 10.5446/20115 (DOI) | |
Herausgeber | ||
Erscheinungsjahr | ||
Sprache | ||
Produktionsort | Bilbao, Euskadi, Spain |
Inhaltliche Metadaten
Fachgebiet | ||
Genre | ||
Abstract |
| |
Schlagwörter |
EuroPython 201527 / 173
5
6
7
9
21
27
30
32
36
37
41
43
44
45
47
51
52
54
55
58
63
66
67
68
69
72
74
75
77
79
82
89
92
93
96
97
98
99
101
104
108
111
112
119
121
122
123
131
134
137
138
139
150
160
165
167
173
00:00
ImplementierungHalbleiterspeicherMini-DiscMehrwertnetzFormale SpracheMailing-ListeSoftwareentwicklerSpeicherabzugSpeicherverwaltungPunktMathematische LogikCASE <Informatik>MultiplikationsoperatorLogischer SchlussHalbleiterspeicherAbstraktionsebeneProgrammbibliothekData DictionaryMaßerweiterungPhysikalische TheorieNetzbetriebssystemPartikelsystemBildschirmfensterMomentenproblemDreiecksfreier GraphQuellcodeDatensichtgerätImplementierungAggregatzustandElement <Gruppentheorie>SpeicherbereinigungDelisches ProblemAlgorithmusBitZeiger <Informatik>GefrierenVorlesung/KonferenzComputeranimation
03:38
Zeiger <Informatik>OvalCASE <Informatik>Notepad-ComputerMinimumMusterspracheFormale SpracheCodeAdressraumComputeranimation
04:12
SpeicherverwaltungSystemprogrammierungMereologieRegulärer Ausdruck <Textverarbeitung>SymboltabelleFormale GrammatikCodeKnoten <Statik>Formale SpracheHalbleiterspeicherBenutzerfreundlichkeitFormale GrammatikProzess <Informatik>Physikalisches SystemNeuroinformatikCASE <Informatik>Wort <Informatik>PunktEindeutigkeitProgrammbibliothekStandardabweichungMusterspracheBefehlsprozessorImplementierungMultiplikationsoperatorSpeicherbereinigungDreiecksfreier GraphComputerspielKartesische KoordinatenGamecontrollerSpeicherverwaltungTrennschärfe <Statistik>SchlüsselverwaltungStabMatrizenrechnungSoftwaretestResultanteAlgorithmusArithmetischer AusdruckVirtuelle MaschineBitDeterminanteSymboltabelleFunktionalRekursive FunktionComputerschachMcCarthy, JohnCachingZeiger <Informatik>Gemeinsamer Speicher
08:09
DatenmodellInstantiierungZahlenbereichGanze ZahlObjekt <Kategorie>MultiplikationsoperatorBitSystemaufrufComputeranimation
08:44
DatenmodellMetropolitan area networkMailing-ListeProzess <Informatik>ImplementierungCASE <Informatik>MusterspracheFunktionalProgrammbibliothekArithmetisches MittelKlasse <Mathematik>InstantiierungTypentheorieZeiger <Informatik>GeradeHalbleiterspeicherGüte der AnpassungObjekt <Kategorie>StandardabweichungBimodulComputeranimation
10:30
GeradeCASE <Informatik>PunktObjekt <Kategorie>ZählenAuswahlverfahrenMakrobefehlElement <Gruppentheorie>OrtsoperatorMailing-ListeDifferenteMultigraphComputeranimation
11:20
UmkehrfunktionElektronische PublikationMereologieMAPEreignisdatenanalyseAutomatische IndexierungMailing-ListeCASE <Informatik>Objekt <Kategorie>Computeranimation
12:06
CASE <Informatik>SpeicherverwaltungObjekt <Kategorie>Physikalisches SystemPhysikalische TheorieKondition <Mathematik>Element <Gruppentheorie>Computeranimation
13:02
Metropolitan area networkLucas-ZahlenreiheVererbungshierarchieMailing-ListeIntegriertes InformationssystemGammafunktionKlasse <Mathematik>MehrwertnetzEinsGenerator <Informatik>Objekt <Kategorie>Elektronische UnterschriftMailing-ListeWeg <Topologie>Dreiecksfreier GraphTopologieAlgorithmusWasserdampftafelTeilbarkeitImplementierungIterationBildschirmmaskeElement <Gruppentheorie>LeckSpeicherverwaltungStichprobenumfangHalbleiterspeicherFunktionalTypentheorieCASE <Informatik>BildschirmfensterDifferenteEndliche ModelltheorieZahlenbereichSchlussregelInternet der DingeFormale GrammatikOrdnung <Mathematik>BimodulPhasenumwandlungPhysikalisches SystemZählenDemo <Programm>SpeicherbereinigungSchlüsselverwaltungSystemprogrammLaufzeitfehlerVerschlingungDelisches ProblemComputeranimation
21:08
Demo <Programm>Klasse <Mathematik>Cloud ComputingMailing-ListeBus <Informatik>Integriertes InformationssystemDatenerfassungLokales MinimumRationale ZahlRegulärer Ausdruck <Textverarbeitung>Negative ZahlMetropolitan area networkEin-AusgabeMultiplikationsoperatorSpeicherbereinigungSchwellwertverfahrenObjekt <Kategorie>Mailing-ListeParametersystemGenerator <Informatik>FunktionalAbstraktionsebeneZählenImplementierungCodeProgrammierungDreiecksfreier GraphCASE <Informatik>EinsInstantiierungStichprobenumfangZellularer AutomatPolygonzugWeg <Topologie>TaskMereologieBeobachtungsstudieKlasse <Mathematik>TypentheorieSystemaufrufVerschlingungEin-AusgabeHilfesystemOrdnung <Mathematik>Regulärer GraphTrennschärfe <Statistik>AnalysisDifferenteKonfiguration <Informatik>NetzadressePhysikalisches SystemWasserdampftafelComputeranimation
27:10
BinärdatenDemo <Programm>Metropolitan area networkBus <Informatik>NP-hartes ProblemOverhead <Kommunikationstechnik>ImplementierungElement <Gruppentheorie>SpeichermodellManagementinformationssystemPortscannerCAN-BusSpeicherbereinigungMultiplikationsoperatorInterpretiererObjekt <Kategorie>ZählenKomplex <Algebra>CASE <Informatik>AlgorithmusDreiecksfreier GraphHalbleiterspeicherTranslation <Mathematik>PunktÄußere Algebra eines ModulsEinsLeckGraphDefaultEndliche ModelltheorieAssoziativgesetzImplementierungTypentheorieMailing-ListePolygonzugComputerspielFlächeninhaltFitnessfunktionOverhead <Kommunikationstechnik>Wurzel <Mathematik>PhasenumwandlungDifferenteSoftwareentwicklerNP-hartes ProblemBitWellenlehreDivisionAnalysisNichtlinearer OperatorEreignisdatenanalyseSoundverarbeitungGamecontrollerEreignishorizontArithmetisches MittelSpeicher <Informatik>TaskMereologieFehlermeldungProjektive EbeneBildschirmmaskeBildgebendes VerfahrenNummernsystemSchlussregelKlasse <Mathematik>ZeitzoneComputeranimation
33:47
Virtuelle MaschineSpeicherbereinigungWald <Graphentheorie>Dreiecksfreier GraphBildschirmmaskeURLHalbleiterspeicherImplementierungZahlenbereichStatistikOrdnung <Mathematik>MomentenproblemObjekt <Kategorie>Endliche ModelltheorieCAN-BusThreadSpezifisches VolumenWort <Informatik>InterpretiererCASE <Informatik>UmwandlungsenthalpieCodeBimodulMultiplikationsoperatorStrategisches SpielHeegaard-ZerlegungSoftwareentwicklerComputeranimationXMLVorlesung/Konferenz
37:37
GammafunktionSocketServerMehrwertnetzARM <Computerarchitektur>VariableFächer <Mathematik>HackerModul <Datentyp>AggregatzustandMetropolitan area networkFrequenzFamilie <Mathematik>MultiplikationsoperatorTopologieWald <Graphentheorie>JSONComputeranimationXML
38:30
BenutzerfreundlichkeitHalbleiterspeicherDreiecksfreier GraphMultiplikationsoperatorGenerator <Informatik>Coxeter-GruppeCASE <Informatik>LeckAlgorithmusObjekt <Kategorie>Mailing-ListeKartesische KoordinatenWeb SiteSchwellwertverfahrenBitrateVorlesung/Konferenz
Transkript: Englisch(automatisch erzeugt)
00:04
Hi, thank you for coming. I'm Francisco Fernandez. Here you have my Twitter handle if you want to ask me something during the talk or whatever. Also you have some questions during the talk, don't doubt about interrupting me. Ok, let's start. Today I'm going to talk about garbage collection on Python.
00:24
Here we have the agenda of this today talk. In the first place I'll do an introduction that includes the motivation why I think this talk could be interesting for a Python developer. Some known problems, some manual memory management, some definitions, some garbage collection
00:42
from academia that I think is also quite interesting to know about. And also every solution has some trade-offs so we will explore them. Later on we will see how C Python implements garbage collection. We need to know something a little bit about this implementation so there will be a little bit about C, all good C.
01:08
Then I'll explain the algorithm itself and finally the cycle detector. And if I have enough time I'll try to do a higher overview of how PyPy approach garbage collection.
01:22
So let's start. Ok, what's the motivation in knowing more about garbage collection or memory management? We don't need to take care most of the time about memory in Python. We are some kind of lucky. Most of the time we are working on a business logic layer because probably most
01:43
of us, we are developers, not Python core developers so we don't do native extensions. So we use a lot of extractions that the language gives to us without knowing more. For example we use a lot of dictionaries and we don't need to take care
02:02
about how dictionaries are implemented or the layer between the operating system and the standard library. And also memories, another abstraction that we have and sometimes it's useful to know about. So we'll know more today. I hope that you'll know more about how memory is managed in Python.
02:25
Ok, but managing memory manually is hard. How many of you are familiar with C or C++? So you know, probably you know how difficult sometimes is managing memory manually.
02:40
There is a lot of problems that I try to list here. The most known is memory leaks, that basically is that we allocate memory for some resource and after using it we don't allocate it. We do that most of the times by accident, on purpose. Another problem that we have is the omnisit.
03:04
Let's say for example like in this snippet of code, this is some kind of library and this library allocates memory on the heap. And it returns a pointer, everything fine. But who is responsible for freeing this resource? The library, I do the free, free this resource. This leads to the problem of double freeze.
03:25
What happens if I free the resource and again the library somehow has the pointer and finally tries to freeze? We end up with a segmentation fault in the best of the cases. Another problem that you can have is dangling pointers.
03:41
For example, this is another snippet of code. We are allocating here a variable on the stack and returning the reference. And once we are out of the scope, this address is not valid anymore so again we will end up with strange behaviors. Segmentation fault will be the best of the cases because you know that something is wrong.
04:06
Values that are not correct is a difficult one. Some languages propose some solutions, some patterns. For example C++, they try to apply as much as possible the right pattern.
04:21
They also introduce some tools on a standard library like unique pointer and share pointer. And also there are other approaches like in Rust, they try to make language safe on compile time. There is a lot of solutions. I don't know if in C you have much tools to deal with that.
04:42
But we cannot have garbage collection in every scenario. There are some scenarios where it is mandatory to have full control of memory. One of them could be embedded systems where we have very limited resources and we need to carefully deal with memory. Also for example we have very demanding applications like a video game can be
05:06
where for example we need to know very carefully how the memory layout is to try to avoid cache misses and so on. And also applications that need determination because most of the algorithms as far as I know
05:21
on garbage collection we don't know for sure when a resource is allocated. So there are scenarios where garbage collection works and other than not. But what is all this about garbage collection? Probably most of you know what is. But I would like to introduce you a little bit about history.
05:41
Here we have a picture of Mr. John McCarthy playing chess with a computer. And he was kind of the inventor of garbage collection in this paper from the 1960s. The first mention to garbage collection on recursive function of symbolic expressions and their computation by machine. And it's kind of funny because he described the whole process.
06:03
He described mark and sweep algorithm there. And as a footnote he says okay I've been naming this process garbage collection but I know that I can use a bit picky and I won't use. Finally academia uses garbage collection so kind of funny.
06:21
So what's the formal definition of garbage collection? Garbage collection is basically automatic memory management. While the mutator runs it routinely allocates memory from the heap. If more memory than available is needed the collector reclaims and use memory and returns it to the heap. This sounds too formal but it's quite simple.
06:41
So the mutator you have there the definition but basically it's our business logic. Nothing else. The heap is where the memory is allocated. And the collector is their garbage collection algorithm itself. In our case Python virtual machine is the one that takes care about collecting and use resources.
07:06
If you want to dig deeper into this topic more in the academic side this is a quite good book. It's quite extensive but it's not so formal. So I recommend you to take a look.
07:22
But as I told you before every solution has some trade-offs. Nothing is perfect. So if we decide we are our language implementators and we decide to have a garbage collector language we have to know that probably we'll need more resources.
07:41
We will have performance impacts because the algorithm has to run so we are losing CPU cycles there. And on some algorithms we don't know for sure when resources will be free. So now we will start looking into how C Python implements garbage collection.
08:05
It implements a reference counting algorithm. But before we need to know a little bit about how it is implemented in C. This is the main object that is exposed mostly for us as a user all the time.
08:21
This is a pi object that is basically a strap with two members. One of them is a pi size t that basically is for us an integer. Name of refcount. This member holds the number of incoming references that an object has.
08:41
So for example if this is an instance b and object a has a reference to b the counter is 1 because it has one incoming reference. And the other member is just a pointer to the type. This allows us to have dynamic typing.
09:01
So let's see a simple example. For the purpose of this talk I want to be educational. We will only take care about instances. We know that full class is also an object but for simplifying we only deal with instances.
09:21
So how does this look graphically? Main is the main module and it holds two references. One for foo and another one for my list. So those counters that I told you before are both 1. Because only main is referencing foo and my list.
09:44
But let's make this more interesting. Let's append foo to my list. Now foo will have two references. We can see graphically like this. Now foo has both main and also my list.
10:00
Referencing him. So that's happened. But who takes care about this counter? Presumably it's the CPython implementation. So here we have a snippet of the standard library on list object. This function what it does is given the list itself it appends the object b.
10:22
In our case foo. On my list. And take a look into this line. By in ref from b. This is the point where we are incrementing by one the counter. Because now the list itself in this case has a reference to b.
10:44
This line it sets in the last position the element. How does this by in ref looks like? Like in a lot of places in CPython is a ugly macro. Basically what it does is it does a cast on op and increment by one.
11:05
The ref count. You see? But what happens when an object is not referenced by another one anymore? Well presumably the counter will be decremented by one. So we end up with this case in the same picture as before.
11:25
Because now my list on index zero that before had foo. Inside now has none. So now we end up again with the same picture. Because my list is not holding a reference to foo anymore.
11:40
So foo again has an over count of one. And how does it look like on CPython level? Again this is part of the list object.c file that implements the list. On CPython.
12:01
Again we have very much like the inverse operation in ref. On the all item that basically is the one that we are not interested anymore. What happens there in that macro? Basically the counter is decremented by one. But it also happens one check. If this counter is equal to zero, we know for sure that no other object in the system running is interested in that object anymore.
12:31
So we can safely dialogue that object. This is what happens here.
12:40
That's the check here. If it's greater than zero, that's some checks. Other case, it's allocated. Is it clear until now? Let's reproduce this case in our little example.
13:05
Now we say to the module, I'm not interested in foo anymore. Delete all the references there. So ref count reaches value zero. So we de-allocate it. The carriage collector.
13:21
And the interesting thing is that once we know for sure that no other object references foo, we can de-allocate. We don't need to wait until collection like in other algorithms. Collection phase. But do you notice any problem that this approach can have?
13:43
Any clue? Probably most of you know what happens. But cycles. What happens with cycles? These approach don't work for cycles. Like this. Now let's imagine that foo has a reference to list.
14:01
And we create a cycle. Foo has a reference to my list and my list to foo. Like this. Now both objects live in main module so no problem. They are there. But what happens if we don't take care anymore about foo or my list?
14:25
We end up with a cycle and those objects are not de-allocated. Because both objects have a reference count of one. And it's not possible that anyone references it again. Because we don't have reference on our system.
14:41
We delete from the module main. So we end up with a memory leak. And I told you before that one of the problems with manual memory management is memory leaks. And we also have ER memory leaks. So what's the problem of having a carriage collection? Well we are lucky and in CPython implementation they implemented a cycle detector.
15:05
Well this is another topology that we can have. More complex cycles. This is a... Those are very very simple examples. But in the real world they are more complex. But yeah. What happens if we have a cycle?
15:26
We'll explore how CPython implements the algorithm to check if there is cycles. To do that garbage collectors should know more or less all the objects that are allocated on the runtime.
15:44
And this is keep track on this double link list. This is a node of a double link list that lives in GC module with this signature. And for performance reasons there are three generations on implementation.
16:02
Because doing a whole scan or looking for cycles on the old living objects is quite costly. So what they do is they divide into three generations. From the youngest one to the oldest ones. And we do a step by step so we don't need to check for the whole living objects.
16:25
Well this is some tricks to... This is a union. Don't worry about it. It's just some details. So let's represent our example.
16:43
Like the one before. We have those three generations and we have four on my list. Both of them probably you can see in the back. Both of them has the value one on GC refs. And what's the key idea of the cycle detection algorithm on CPython?
17:07
The key idea is that we know the object that belong to a generation. So what we can do is extract all the internal references on that generation.
17:22
And those objects with number of reference bigger than zero. They have references from other generation objects. I don't know if I explained better. But the idea is that we keep track only.
17:45
We subtract all the internal references in the same generation. I'll do a step by step trying to explain better. To do that, types internally on CPython implementation has this method that...
18:04
abstract how to traverse all the references on an object. So for example, I implemented this as a Python to be more clear. So for example, extract traversing all the references basically as iterating over the key and value.
18:22
And applying some function. At least basically iterate over all elements and apply a function. This is a utility that we have for this kind of problems like cycle detection. So those tools help to implement the algorithm.
18:44
So let's start. Let's think that we only have foo on my list on my generation zero. To simplify things as much as possible. So I start iterating over the first element that in this case is the one marked in red, foo.
19:01
And what I do is using these functions, depending on the type, I iterate over all the references that this object has. In this case it's only my list. And I apply one function. This function, what it does basically is check that the number of references is greater than zero and decrement by one.
19:21
So with this way we are breaking the cycles. Because all the cycles, all the internal references are decremented. So if we have a case like this one, we are breaking it. We are marking as unreachable. So this is the first step. foo only has a reference to my list, decrement by one.
19:42
Now the reference is zero. The next step we have my list as the next object in the generation zero. And again we decrement by one. So now we know that those objects could be unreachable outside of my generation.
20:06
The next step is moving those objects to a new list called unreachable. And one step that we have to take care of is we need to check on objects that are unreachable.
20:21
All the references because there can be cases where some objects get a value of zero. But there is some kind of topology like one cycle, like the one that we were seeing before, and two other references. These end up having zero on those two objects, but they are still unreachable by these two objects.
20:42
So we have to check for that also. After that we move to an unreachable list. Those objects that are unreachable are moved to the next generation because they can live more. But I prepared a little demo written in Python before I did this using GDB, but it was too much.
21:06
So let's take a look. I have time I think. Can you see the code over there?
21:21
Well I tried to represent more or less the implementation on Python to make it easier to understand. Do you remember this tovelink list? This is one of the represented notes on how the garbage collector tracks all the allocated objects.
21:42
These are some helper methods that implement the traversal methods to know how to iterate over the references. One list. Some examples. We will run an example now. Here we have the garbage collector object.
22:01
It has in this case two generations with two thresholds that I didn't mention before. But we don't do a cycle detection each time that we allocate an object. There is some threshold that they studied and if we allocate more objects than that threshold, cycle detection is fine.
22:29
Here we have this logic. Let's say that I want to instantiate some class. What I do is I append to my first generation this object on this linked list and I check this threshold.
22:47
Is my generation greater than the threshold? Yes. I collect. I do a collection. How is this collect cycle?
23:00
We get some references for being more comfortable programming. The first step is we need to update the references because on the note, this note don't have the most updated reference count. Because as I told you before, we don't run the cycle detection all the time.
23:21
So it can have a value that is older. So the first step is we iterate over all the generation that we are interested in and we update the reference count with the pi of that one. The next step is one of breaking cycles.
23:43
Just subtracting one on the internal references. So let's take a look. What we do is again we iterate over the whole list of this generation. Using this helper method that abstracts how to iterate over all the references, we pass a function as an argument.
24:10
This function basically what this does is it checks that the references are greater than zero and decrement by one. This is a helper method. Don't worry about it.
24:21
So we end up with, after this step, I will run one example.
24:42
Do you know how to evaluate all the cells in Python? I'm pretty new to notebooks.
25:02
We'll do like a step by step to see it clearer. We allocate one list. We are trying to replicate our sample. We have a list, an instance of an object and we are creating a cycle. We allocate. We'll explore LIME manually.
25:33
I don't know what's happening. After this step on our example, we will get those two objects with an account of zero.
25:46
So the next step is just check those edge cases where we can get some object with an account of zero but some other object can reference them. So what we do is, in first place, we use again tp traverse and we use another callback.
26:07
Basically what it does is if we find one object that has an account of zero, this is a false positive as an unreachable. So we need to keep track that it's still reachable.
26:23
In our case, we just check that the refs are greater than zero. After that, we have those two lists, the unreachable object and the young ones. So we know that these ones can be deleted but it's not so simple.
26:43
So we move the reachable object in this generation to the older one. I'll explain later what we need to do with references and finalizers and we finally delete all the garbage. There are some edge cases that I'll try to explain now but did you understand more or less the algorithm?
27:06
Should I explain something more or is it clear? But as I told you before, there are some problems with this approach.
27:25
We need to take care about finalizers, underscore underscore deal. There are two kinds of finalizers. Those ones are kind of legacy and those cannot be allocated safely so we end up having memory leaks if we have legacy finalizers.
27:45
With underscore underscore deal, it also has to take care a lot of edge cases so please don't use finalizers. Finally, we have to take care also about weak references. What it does is check the ones that are reachable and execute the callbacks that they have associated.
28:07
So what makes the emoji happy? It's incremental. This algorithm is incremental so as it works, it frees memory. We don't need to wait for a cycle to recover memory. Emoji is worried because as you can see, detecting cycles is kind of hard.
28:24
Not the algorithm itself but there is a lot of edge cases that I didn't mention but it's kind of complex. We also have some size overhead on objects that this OV ref count. As a personal opinion, also in CPython garbage collection is to couple to the model so that we have a counter in the main object of the interpreter.
28:49
So we cannot change the model easily so that's a side note.
29:00
I have little time now, like five minutes. I'll try to do an overview of how PyPy handles garbage collection and memory. So probably most of you are familiar with PyPy. It's an alternative implementation of Python. It's written in R Python and regarding to garbage collection, it has some points that are quite interesting.
29:29
One is that it's kind of agnostic to the garbage collection algorithm. So during translation time that is how do you get the interpreter at the end.
29:44
You can, if I'm not wrong, you can change the model of garbage collection algorithm that you want to use. So this allows PyPy developers to experiment over time with different approaches that I think is a good thing not to be tied to one implementation.
30:02
And the algorithm by default nowadays is in Minimak. Let's explore a little bit about PyPy. One of the things that PyPy developers notice is that we allocate a lot, well we interpreter allocate a lot of objects with a short lifetime.
30:26
So for example, we have some examples here like for comprehension, we are allocating objects that will live very, very slowly. Even when we are calling a method in a class, we are creating a new object, a bounded method with a very, very short life.
30:49
So those objects will be allocated in a performant way. So they end up with this kind of memory model. They divide it into two areas.
31:02
One for the young objects, this one that lives early and the young area is divided into two areas. The nursery that has a fixed size. Another area where all the objects that don't fit in the raw model area go there. And also some arenas to hold the objects that are older than the young ones.
31:30
So how does it work? It performs what they call minor and major collection. Minor collection is performed on the young area.
31:44
So it's fast because we don't need to transfer to the old life objects. Objects are moved only once to the old area. And also major collection is done incrementally. At the end, they implement kind of mark and sweep algorithm.
32:05
These are the phases that they implemented. First place they do a scan phase, then mark, they sweep. Finally they execute finalizer because we have the same problems as in C Python.
32:21
Here we have some schema of how mark and sweep works. In general this is the same schema as in the paper of McCarthy. Basically what we do, we know for sure the root of the main objects and we traverse the graph of memory.
32:43
And we mark on this traversal which ones are reachable. Once we know which ones are reachable, we iterate over the whole list and we sweep. We allocate the ones that were not marked. And yeah, that's it. Quite simple and relevant.
33:02
Mark and sweep can collect cycles because we basically are doing a graph traversal. But type implementation can be more complex to understand than C Python because, yeah, at least for me. And on full collection, we stop the work.
33:21
I mean that we cannot, as well as when we are detecting cycles in C Python, we cannot perform other operations. So, yeah, questions?
33:44
Start up front if there are any. There are a few at the back. Hi, thanks for the explanation of these algorithms. What I would be interested in as a Python developer is to anything I should take care
34:01
about in my code so that I enable the garbage collector to do this work most efficiently. Try to avoid the __deal methods, for example, because there are a lot of edge cases that interpreter has to take care of. So, that's one off.
34:21
With Python you said there is a possibility of the stop the world issue on garbage collection. Is there a way to in future implementation or theoretically to circumvent the circumstance, for example, splitting memory and garbage collecting different?
34:48
Yeah, they do that approach. So, the stops, as far as I know, they are not so long. So, but it could be interesting having like in the JVM where you have parallel strategies and so on.
35:04
It could be interesting. What about threads? You cannot have parallel code running because of the global interpreter lock, as far as I know.
35:22
But I don't get your question. Can you elaborate more? If I run more than one thread, then each thread allocates and then it has to be garbage collected. Is the garbage collector specific for a thread or is it global to all the threads? It's global. As far as I know, I'm seeing Python. In PyPy I'm not sure.
35:40
Can you force the garbage collector to run or is it free also to ignore it if you call it? Can you repeat the question? I didn't hear you.
36:02
Can you force the garbage collector to run and if it does, is it free to ignore it? You can force a cycle on GC module. At least on CPython you can import GC. You can force a cycle detection because on reference counting resources are free when they are not used anymore.
36:30
You can force a cycle detection. But you have an API to manage a garbage collector. Could an object be resurrected by its finalizer? I didn't hear you. Can you repeat the question?
36:48
Can an object be resurrected by its finalizer? That's one of the edge cases that you have to handle. That's why it's not good practice.
37:02
Is the garbage collector the one that collects cycles gathering? Any statistics? Can I inspect and see some statistics?
37:21
Yes, you have through this API on GC module. You can check as far as I can remember number of unreachable objects. Let me check. You have over here recommended somewhere that it checks for statistics all the time.
38:06
Importing GC you can access to some of them.
38:24
Does it run with a certain frequency or how often? On CPython it depends on this threshold. It's time that an object is allocated. It's appended to the youngest generation list.
38:40
Once we reach this threshold, a cycle detection cycle algorithm is done. Last question. We don't have time for more. Do you know about some user friendly tools to detect memory leaks before you deploy your application?
39:10
I think there are some, but I don't know. In the worst case you can use Valgrind, but not so user friendly. And you have to know the internals. But probably there are some that I'm not aware of.
39:29
Great presentation. Thank you very much everyone.