Python's Infamous GIL
Formal Metadata
Title: Python's Infamous GIL
Series: EuroPython 2015
Part: 67 of 173
Author: Larry Hastings
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You may use, adapt, and copy, distribute and make the work or content publicly accessible, in unchanged or adapted form, for any legal and non-commercial purpose, as long as you credit the author/rights holder in the manner they specify and pass on the work or content, including in adapted form, only under the terms of this license.
Identifiers: 10.5446/20150 (DOI)
Publication year: 2015
Language: English
Production place: Bilbao, Euskadi, Spain
Transcript: English (automatically generated)
00:05
Thank you. Is the mic working? Yep? Okay, good. So I'm Larry Hastings, and this is Python's infamous GIL. I'm going to talk about the GIL. I'm going to talk about what it is, what problem it's trying to solve, how it solves it, what the ramifications
00:21
of those choices are, what its history is like, and what its future is like. I originally proposed this talk as being a bit more advanced, but the way that I wrote it, it's really kind of friendly for beginners too. So don't get scared. So let's run the clock back to 1992. Python's about two years old.
00:42
It was still young and fresh and evolving rapidly. It was mostly used around CWI, where Guido was working, but it had also been released on the internet. There was a new technology out there in operating systems, something called multithreading. This isn't something that people had dealt with very much to date. It was seen on Mach.
01:02
I think it was available on SunOS. Most operating systems didn't support it. It was in the betas for the new operating system from Microsoft called Windows NT. But this multithreading, what it was: it was kind of like multiprocessing, except that it was all in one process. You had multiple threads of execution happening inside of the same process, all with the same shared state.
01:23
People wanted to try experimenting with threads in Python. But you couldn't just start adding threads and expect everything to work. There was going to be a problem. The first problem is globals. So a global in C is kind of like a module attribute in Python: it's something there's only one of,
01:40
and everybody knows where it is. If you have multiples of these threads and they're all trying to talk to the same global variable, you're probably going to have problems. As an example of one of these globals in Python, there's a pointer to the current executing frame object. There's one of them, it's a global variable, and everybody can see it. If multiple people tried to play with the frame object pointer, you'd have trouble.
02:02
An even worse problem, though, is reference counting. Python uses a technique called reference counting to manage the lifetimes of objects, to destroy them when nobody's interested in them anymore. And I'm going to do a deep dive right now into what reference counting is, how it works, and what the ramifications of reference counting are for multithreading.
02:21
So reference counting. Inside of the Python interpreter at run time, every single Python object starts with these two fields. The second one is the type of the object, and the first one is what's called the reference count. This is an integer, and this integer tells you how many people are interested in this object right now, how many people are holding references. So if the reference count is three,
02:42
that means that there are three pointers pointing at this object. And what you do with this reference count is: every time you take a new reference to the object, you increment that number, and every time you are no longer interested in the object, you decrement that number. So you incref and decref. When the reference count drops to zero, nobody's interested in this object anymore,
03:00
and so you should destroy it. You call its __del__, if it's got a __del__, and you release the memory, either giving it back to the operating system or recycling it and using it for another object. So, in C there are these things called macros; a macro is kind of like a subroutine call, but it's actually done as more of a textual substitution. Py_INCREF and Py_DECREF are how you manage the reference count inside of C.
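As a sketch, simplified from CPython's actual object.h with the debug-build bookkeeping omitted, the object header and those two macros look roughly like this:

    /* Simplified from CPython's object.h; debug-build fields omitted. */
    typedef struct _object {
        Py_ssize_t ob_refcnt;        /* the reference count */
        struct _typeobject *ob_type; /* the object's type */
    } PyObject;

    /* Take a new reference: just increment the count. */
    #define Py_INCREF(op) (((PyObject *)(op))->ob_refcnt++)

    /* Drop a reference: decrement, and destroy the object at zero. */
    #define Py_DECREF(op)                               \
        do {                                            \
            if (--((PyObject *)(op))->ob_refcnt == 0)   \
                _Py_Dealloc((PyObject *)(op));          \
        } while (0)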
03:23
These are what they look like today. The ones in 1992 looked much the same. So let's look at what happens if we have two threads that are trying to deal with the reference count at the same time. I'm going to show it to you in exhaustive detail, and I'm going to show it to you both when it works and when it fails. Now, thread one and thread two both have references
03:41
to this object in the center, and they're both going to drop their references. So they're both calling decref. Let's turn the decref into the underlying C. But even this isn't deep enough; we actually have to go one level deeper and look at the assembly language that this was compiled into by the C compiler. This is actually just pseudo-assembly language, but it is very accurate to what's actually going on.
04:03
So let's say that the object has a reference count of three. Thread one loads that reference count into something called AX. AX is a register on the CPU, and you can think of it as a variable. Each thread has its own set of registers effectively, so they get swapped around.
04:22
So thread one loads the value of three into AX. It then decrements it, so now it's two, and it stores it back into the object, the reference count, and the object is now two. Thread two comes along, does the same thing, loads it into AX, decrements it, and stores it, and now the reference count on the object is one. This worked. We have two threads.
04:41
They both drop their references. The reference count on the object dropped to one. Now let's do it again, but this time we're going to do those things in a different order. Thread one loads it, and now it stores it three. Thread two does it the same time, almost at exactly the same time as thread one does it, and it also sees three. Thread one decrements, so does thread two.
05:01
Thread one stores; so does thread two. But thread two and thread one both wrote the same value. Both threads dropped their references; it should have dropped from three to one, but it only dropped from three to two. This is a bug. This is what's called a race condition: both of the threads were racing along, and they kind of stepped on each other's feet. Now, you have the inverse problem with incref.
05:23
If you have two threads that both incref the object at the same time, then the reference count can be one too small. And you get different results depending on which way it goes wrong. If you have the reference count too large,
05:43
the reference count is still positive, it's one or more, and nobody has references to the object, so nobody's ever going to clean it up. So now this is leaked memory; and if the object has a finalizer and holds resources, those resources never get finalized, so you're leaking resources too. The inverse problem is much worse. If the reference count is too low, then let's say
06:01
that it's two, and there are three people who have references to it. The first guy drops, and it drops to one. The second guy drops, and it drops to zero, and it goes away. We now clean up the object. We release the memory either back to the operating system or reuse it, but there's still one guy who has a reference to this memory. He thinks it's a valid object. What's he going to see? If the memory's been given back to the operating system,
06:22
the program will just crash. If it's been recycled to use for another object, he's going to see garbage values. He might crash. He might compute some inaccurate result. Who knows? But this is a really nasty bug, and it's really hard to debug. So getting your reference count numbers wrong is really painful, so we want to prevent this from happening if we're going to make Python multi-threaded.
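To see the same lost-update race outside of Python, here is a minimal, self-contained C demo (hypothetical code, not from CPython): two threads each drop a million "references" from a plain, unprotected counter, and the lost updates show up as a nonzero final count.

    /* race.c: build with `cc -pthread race.c`.
       Demonstrates the load/decrement/store race described above. */
    #include <pthread.h>
    #include <stdio.h>

    #define ITERATIONS 1000000L

    /* volatile keeps the compiler from collapsing the loop into one
       subtraction; it does NOT make the decrement atomic. */
    static volatile long refcnt = 2 * ITERATIONS;

    static void *dropper(void *arg)
    {
        for (long i = 0; i < ITERATIONS; i++)
            refcnt--;   /* three steps: load, decrement, store */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, dropper, NULL);
        pthread_create(&t2, NULL, dropper, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* Should print 0; with the race, it's usually greater. */
        printf("final refcnt: %ld\n", refcnt);
        return 0;
    }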
06:42
So the first approach that one might think of would be to add locks around everything. A lock is something you use in multi-threaded programming to prevent multiple people from using something at the same time. You can think of it as kind of like being a bouncer. There's a bouncer standing at the bar, and he only lets one person inside at a time. So the first person comes along, he says, go on in. The second person comes along, and they have to wait
07:01
until the first person comes back out, and then the bouncer says, okay, now you can go. So we could throw a lock around every single global thing. We could say: in order to use the frame pointer, you have to be holding this lock. Or maybe we could do it for groups of things together; we could group together all the stuff that's for the runtime's currently executing stack frame.
07:20
And we also would need at least one, maybe more than one lock for reference counts. The problem with using multiple locks like this is you have a special kind of race condition called the deadlock. So let's say that we have two threads here. This is an example of a deadlock. A and B are locks, and thread one and thread two are going to try and use those locks. The problem is that they grab them in the opposite order.
07:40
So thread one grabs A, then B. Thread two grabs B, then A. And the problem is if you execute them in this order. Thread one tries to get lock A, and that works fine. Thread two tries to grab lock B; that works fine. Now thread one keeps going and tries to grab lock B: nope, it has to wait. Thread two tries to grab lock A: nope, has to wait.
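Here is a minimal C sketch of that lock-ordering deadlock (hypothetical code; running it will usually just hang, which is the point):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

    static void *thread_one(void *arg)
    {
        pthread_mutex_lock(&lock_a);   /* grabs A... */
        pthread_mutex_lock(&lock_b);   /* ...then waits forever for B */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

    static void *thread_two(void *arg)
    {
        pthread_mutex_lock(&lock_b);   /* grabs B... */
        pthread_mutex_lock(&lock_a);   /* ...then waits forever for A */
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, thread_one, NULL);
        pthread_create(&t2, NULL, thread_two, NULL);
        pthread_join(t1, NULL);   /* typically never returns */
        pthread_join(t2, NULL);
        puts("no deadlock this time");
        return 0;
    }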
08:01
These two threads are now deadlocked. They will never make progress. And if any other thread comes along and says, I want lock A or lock B, it's going to get stuck too. This is a deadlock. This is another painful thing in multi-threaded programming; you don't want to have this problem. So Guido, very carefully and very cleverly, in 1992 created something called the GIL.
08:20
That's it. That is the interpreter lock. Since it's global to the whole running Python process, we call it the global interpreter lock, or the GIL for short. That's what it looked like in 1992, almost two years to the day, by the way, after the Python repository was created. What it looks like today in Python 2.7 is essentially the same; it really hasn't changed in, whatever this is,
08:43
23 years, in Python 2.7 anyway. We'll get into that in a minute.
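For reference, in Python 2.7's Python/ceval.c the GIL itself really is about one line (quoted from memory, so treat it as approximate):

    static PyThread_type_lock interpreter_lock = 0; /* This is the GIL */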
09:01
Now, in order to interact with the Python interpreter, to do anything, to interact with an object, to run Python bytecodes through the interpreter, you have to hold the GIL. Which is both good and bad. So let's talk about the good. This is simple, and simple is easy to get right. Because it's part of the Python C API, extension modules have to hold the GIL in order to interact with the Python interpreter. This made it a very simple rule, and it was very easy
09:21
for extension authors to get right. This led to people writing lots of extension modules for Python. The story of the Python GIL is really the story of the success of Python. People who hate the GIL don't understand Python's history. Without the GIL, Python would not have been as successful as it is today. So also, because you only have one GIL, you can't have a deadlock, because there aren't two locks
09:41
that you have to deal with. Now, if you have IO-bound code, if you have multiple threads and they're all blocking on IO, then this also just works, magically. Your program runs as fast as it can. If your program is CPU-bound, then your program is effectively single-threaded. Because, again, only one thread can hold the GIL at a time,
10:03
and therefore, only one thread can be interacting with the Python interpreter. So if you have three threads that all want to be running code, only one of them can do it at a time; therefore, you're effectively single-threaded. We're going to talk about that; that's why people don't like the GIL. Now, the reason that it works magically for IO-bound code is another one of these macros.
10:23
It's two of them, actually: Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS. The realization is: if you're going to go off and do some long computation in C, if you're going to be calculating some long Fibonacci number, or you're going to be blocking on IO or something, you're not going to be interacting with the Python interpreter. You might as well drop the GIL.
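As a sketch of how an extension might use them, here is a hypothetical function that reads from a file descriptor; the macros are the real CPython API, but the function around them is made up:

    #include <Python.h>
    #include <unistd.h>

    /* Hypothetical extension function: read from a file descriptor
       without hogging the GIL while we block. */
    static PyObject *
    read_some(PyObject *self, PyObject *args)
    {
        int fd;
        char buf[4096];
        ssize_t n;

        if (!PyArg_ParseTuple(args, "i", &fd))
            return NULL;

        Py_BEGIN_ALLOW_THREADS          /* release the GIL */
        n = read(fd, buf, sizeof(buf)); /* may block for a long time */
        Py_END_ALLOW_THREADS            /* grab the GIL back */

        if (n < 0)
            return PyErr_SetFromErrno(PyExc_OSError);
        return PyBytes_FromStringAndSize(buf, n);
    }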
10:41
Let somebody else use it for a while, and then you can go and grab it again when you need it. That's what these macros do. And these are, again, very easy to use. You can use them in extension modules; they're used all over in the Python source code. Anytime you're going to do something that might take a while: drop the GIL, do your thing, grab it again. So, what are examples of these things? If you're going to go to sleep, obviously you don't need the GIL. If you're going to read or write from a file handle,
11:01
you can drop the GIL. If you're going to read or write from a socket, you can definitely drop the GIL, because that can take who knows how long. But sharing the GIL in CPU-bound code, that doesn't work. Now, the GIL didn't change, like I said, for a very long time. In 2009, David Beazley gave a presentation at a Chicago-area Python user group.
11:22
David Beazley is a computer researcher. He loves Python, and he's very interested in the GIL. So he's played with it a lot, and he discovered some things that were kind of awful. First of all, when you have multiple threads that are trying to use the GIL, there's an internal thing in Python (I should have put it on the screen; it's not there).
11:40
It's called sys.setcheckinterval and sys.getcheckinterval. This is a number that's set to 100, and what that says is: every 100 bytecodes, the currently running thread is supposed to say, okay, let somebody else have a turn. That doesn't work. Here is a diagram showing two threads running at the same time. They're both CPU-bound; they both want to use the GIL all the time they can.
12:03
And you can see that little arrow there. That's 66,000 ticks. This is 667 times longer than it was ever supposed to be. This checkerboard was supposed to be like tiny little slices, and they're running for a very long time. So, people aren't releasing the GIL when they're supposed to. But it gets worse. If you have two threads, and one of them is IO-bound,
12:23
and the other one is CPU-bound, what you want is, the IO-bound thread doesn't run very often, and it doesn't run for very long. You know, it does a little work, and then it waits on a socket. And then it does a little more work, and it waits on a socket. You want to let that thing run whenever it's ready to run. So, if you have a CPU-bound thread and an IO-bound thread in the same process, you want to prefer the IO-bound thread.
12:43
Here, the CPU-bound thread is getting all the time, and the IO-bound thread never gets to run. This is what's called priority inversion in the world of operating-system schedulers, and Python had it in a really bad way. The problem is this code that's supposed to be releasing the GIL and letting other people have a chance to run.
13:01
So, this is it. That PyThreadState_Swap thing is from the macro we saw before, Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS. So this is releasing the GIL here; then we say other threads can run now; and then the immediate next thing we do is reacquire the lock. So what was really happening is, if we have two threads,
13:21
and this one has the GIL right now, and this one wants it, he's saying: you're going to have the GIL. Oh, I take it back. You're going to have the GIL. Oh, I take it back. This guy never gets a chance to get the GIL. And it's worse: if he's CPU-bound, he never has a chance, and if he's an IO-bound thread, he's never got a chance. So this, by the way, is still what Python 2.7 is like.
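For reference, the check-interval code in Python 2.7's eval loop looks roughly like this (paraphrased and simplified from ceval.c):

    /* Every "check interval" bytecodes, the running thread does this: */
    if (--_Py_Ticker < 0) {
        _Py_Ticker = _Py_CheckInterval;
        if (interpreter_lock) {
            /* Give another thread a chance. */
            PyThreadState_Swap(NULL);
            PyThread_release_lock(interpreter_lock);

            /* Other threads may run now... */

            /* ...but the very next thing we do is grab it back. */
            PyThread_acquire_lock(interpreter_lock, 1);
            PyThreadState_Swap(tstate);
        }
    }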
13:42
This is Python 2.7. It was fixed in Python 3 in 2009 by Antoine Pitrou. He wrote what's called the new GIL, which is only a minor change to the GIL. It added a new variable called gil_drop_request, together with a condition variable; in effect, this is a flag. If the second guy is trying to grab the GIL,
14:02
and he's not getting it, he sets this flag and says: look, there's somebody waiting, you really have to give it up. And then the first guy, when he releases it, he looks and he sees this flag and says: oh, I really have to back off and let other people have a chance. So, with this change in place, the really bad behavior that David Beazley was seeing is fixed.
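In heavily simplified pseudo-C (the names and helpers here are illustrative, not CPython's actual code, which lives in ceval_gil.h), the handoff works something like this:

    /* The waiting thread: wait a while (the switch interval, 5 ms by
       default); if the GIL still hasn't been freed, raise the flag. */
    while (gil_is_held) {
        timed_wait(&gil_released_cond, SWITCH_INTERVAL);
        if (gil_is_held)
            gil_drop_request = 1;
    }

    /* The running thread checks the flag between bytecodes: */
    if (gil_drop_request) {
        drop_gil(tstate);   /* back off; a waiting thread wakes up */
        take_gil(tstate);   /* then queue up again before continuing */
    }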
14:21
There are still funny conditions you can arrive at where you can still starve threads of CPU time so they never get the GIL, but it's a lot less bad now. To fix those really funny conditions would require making the GIL a lot more complicated, and what we're really talking about at that point is writing a scheduler. I only know a little bit about operating-system research, but I can tell you
14:41
that writing a scheduler is very complicated, and the fairer you want to make it, the harder the problem becomes. This is really a rabbit hole that we don't want to go down in the Python source tree. So the plan was: we were going to try the GIL this way and see if it worked okay. And basically, it's been fine, and we just want to leave it alone. So this is the GIL that's in 3.2 and above, I think.
15:03
Now, meanwhile, while we haven't been paying attention to all this, the world has kind of changed around us. So, I remind you, Python itself was originally created in 1990, and multithreading was added in '92, I think. Now, you didn't see multiple cores, you didn't see multiple CPUs, in a computer very often back then.
15:23
You'd have to actually go and get a second CPU and have both of them on the motherboard. I had a computer like that in 1995, 96. I had two Pentium 133s in one single computer, but I was kind of a weirdo. You didn't really see it prevalent until you had multiple cores on a single CPU, and you didn't see that until about 2005. That's when server CPUs and desktop CPUs,
15:42
and actually the CPUs used in game consoles for the home, those all went multi-core in 2005, and that's when we started really seeing a lot of uptick in multi-core use and programming. Laptops themselves went multi-core in 2007. Tablets and phones and systems on a chip that are embedded in very small projects, those all went multi-core in about 2011.
16:03
Eyeglasses went multi-core in 2013. Watches went multi-core in 2014. Ladies and gentlemen, eyeglasses and wristwatches are now multi-core. We live in a very multi-core world, and we still don't have a solution for running Python on multiple cores simultaneously.
16:20
Now, Guido has talked about this. He wrote a blog post back in 2007 about Py3k (that's the old name for Python 3). He said: I'd welcome a set of patches into Py3k only if the performance for a single-threaded program, and for a multi-threaded but IO-bound program, does not decrease.
16:40
This is a tall order. It's been almost eight years. Nobody has done it. We're not sure how to do it. People have tried in the past. There was a famous patch back in the Python 1.4 days called the free-threading patch. What it did, it didn't require changing the C API, which is good. It took all of those global variables, like the pointer
17:00
to the current frame, and it sucked them all into one single struct (excuse me, a C struct, which is roughly equivalent to a Python class). So now you could have more than one of them. So he created one per thread, and now they didn't stomp on each other. And then he added a single lock, a mutual-exclusion lock, around incref and decref. So you had to be holding this lock in order
17:21
to do an incref or a decref. And so you would grab it, do your plus or minus, and release it. This added so much overhead that performance was terrible: it was between four and seven times slower. (These results are based on David Beazley reviving this patch about ten years later.)
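In spirit, the refcounting part of that patch looked something like this (a hypothetical reconstruction, not the actual patch):

    /* One global mutex serializing every refcount change. */
    #include <pthread.h>

    static pthread_mutex_t refcnt_mutex = PTHREAD_MUTEX_INITIALIZER;

    #define Py_INCREF(op)                          \
        do {                                       \
            pthread_mutex_lock(&refcnt_mutex);     \
            ((PyObject *)(op))->ob_refcnt++;       \
            pthread_mutex_unlock(&refcnt_mutex);   \
        } while (0)

    #define Py_DECREF(op)                                    \
        do {                                                 \
            int dead;                                        \
            pthread_mutex_lock(&refcnt_mutex);               \
            dead = (--((PyObject *)(op))->ob_refcnt == 0);   \
            pthread_mutex_unlock(&refcnt_mutex);             \
            if (dead)                                        \
                _Py_Dealloc((PyObject *)(op));               \
        } while (0)

A full lock/unlock round trip on every single refcount operation is exactly the kind of overhead the numbers above describe.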
17:40
A couple of years ago, Antoine Pitrou, who's around the conference today, tried something using atomic test-and-set. Modern computers actually have a machine-language instruction, a single instruction, that says: subtract one from this value over here. Remember those three instructions, load, decrement, and store? It combines them into a single instruction, which it guarantees will happen atomically,
18:01
and you'll never have any race problems. So that's good. It doesn't require any API changes, which is also good. But it makes Python about 30% slower, so again, it doesn't meet the bar that Guido was talking about.
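With C11 atomics, the idea looks roughly like this (a sketch of the approach, not Pitrou's actual patch):

    /* Atomic reference counting with C11 <stdatomic.h>. */
    #include <stdatomic.h>

    typedef struct _object {
        atomic_long ob_refcnt;
        struct _typeobject *ob_type;
    } PyObject;

    /* A single indivisible instruction: no load/decrement/store race. */
    #define Py_INCREF(op) \
        atomic_fetch_add(&((PyObject *)(op))->ob_refcnt, 1)

    #define Py_DECREF(op)                                                 \
        do {                                                              \
            /* fetch_sub returns the PREVIOUS value; 1 means we hit 0. */ \
            if (atomic_fetch_sub(&((PyObject *)(op))->ob_refcnt, 1) == 1) \
                _Py_Dealloc((PyObject *)(op));                            \
        } while (0)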
18:23
Now here's something to consider. There are four big Python interpreters in the world: there's CPython, and then there's everybody else. CPython has a GIL; everybody else doesn't. CPython uses reference counting; everybody else doesn't. What do the other people use? They use something called garbage collection, mark-and-sweep garbage collection. If you have garbage collection, you don't need the GIL nearly as much.
18:40
It's very easy to say, okay, we can get around it. PyPy actually has a little bit of a GIL that they just use during garbage-collection cycles. They say it'd be kind of a pain to get rid of it, but it's not impossible. It is impossible, though, to get rid of the GIL as long as we're using reference counting. So people say: we should go to garbage collection with Python, okay? The problem with that is that it changes the API.
19:02
So, in order to talk to the Python interpreter, you call the C API. In order to deal with objects in the C API, you have to understand Python's memory-management approach. So extension programmers have to know incref and decref. If we got rid of those and changed to garbage collection,
19:20
that changes the API. That breaks every extension out there. There are a lot of extensions for Python; we don't want to break them. So I'm not sure that we can do this. Would it be any slower? Would it be faster? Conventional wisdom, or at least what Michael Foord told me, is that garbage collection and reference counting are roughly the same, but we wouldn't actually know until we did it,
19:41
and it would be a lot of work, and nobody has been gutsy enough to try. Here's something else to consider, though. All of the major languages that have come out since Python was invented, like the JVM for Java and the CLR for C#, and Rust and Go, all these things are using garbage collection.
20:02
Also, none of them have C APIs. PyPy doesn't have a C API either. Now, why is that? Well, the obvious reason is that these languages are generally considered about as fast as C, so the main reason for writing a C extension, speeding things up, goes away: you don't need it. They all have the ability to call out into a C library, and so maybe they think that, well, you just don't need it.
20:23
You know, you can call out to a C library, and you don't need the optimization of an extension, so we don't need the C API. But another thing to consider is that, like I've just said, having a C API means exposing your memory management model to the world, which means that now your hands are tied. If you change that memory management model, you break everybody's C extensions.
20:42
If they don't expose the memory management model for the world, they are now free to change it internally. PyPy started out with a simple mark and sweep, I think. He can correct me. It started out with a simple mark and sweep, and over time, it became generational and it became an incremental collector. And Armin told me that the change to incremental wouldn't have broken anybody's code.
21:02
The change to generational probably would have. If they had exposed a C API, they wouldn't have been free to make those changes, and PyPy wouldn't have the better memory manager that it does today. So if CPython switched to garbage collection, again, this would break every C extension out there, and then if we made improvements to it later, we might break everybody again. So, in my opinion, I think that if we went the route
21:23
of trying to change to garbage collection, we should probably give up on having a C API. Which means, again, we break everybody's C extensions, and this time we don't fix them. Instead, we say: rewrite your extensions in Python and use CFFI. Which, by the way, is wonderful, because then that lets you run on PyPy, and also Jython and IronPython are talking about CFFI support.
21:42
So this would mean that your extensions were now portable across Python implementations. I think, for political reasons, the only thing that we could really talk about is atomic test-and-set. And even with a 30% speed loss, I think that the pressure for supporting multi-core is going to get big enough that we're really going to have to look at doing this. That's it.
22:10
So the coffee break has started, so if you're jonesing for a cup of coffee, you can take off. If you have questions, I'm happy to answer questions for a couple minutes until they kick me out.
22:22
Somebody is holding up their hand on the aisle in the center. Yeah. First of all, thank you very much for this awesome talk. So, you say one of the main problems with a garbage collector is C extensions.
22:43
But what if you somehow mark the code, your Python code that doesn't interact with the C extensions, and there, with those objects, you can do garbage collection. And somehow mark the code that does interact with the C extensions, and there use the GIL
23:00
or atomic increments and decrements? Because there is a lot of pure Python code out there; why not use garbage collection there? Code that runs in Python doesn't care about whether it has a garbage collector or whether it's reference-counted. So I kind of didn't understand your question; maybe you're going to have to clarify it. But I will point out to you
23:21
that C Python has reference counting, and all the other Python implementations use garbage collection, and your code still runs either way. So it doesn't matter. So I'm not sure I'm answering your question. I mean, imagine you have a pure Python program that doesn't interact with any C extensions. What prevents having a garbage collector there? Nothing. You can run it today, because you can run it under Jython
23:41
or IronPython or PyPy, and those have pure garbage collection. No, I mean, why not implement it within CPython? I'm sorry? Why not implement it within CPython? Because that breaks all of the C extensions, and people use C extensions all the time. Yeah.
24:00
And to provide a data point for this, can I interrupt, Larry? Sure. So, Resolver Systems at one point implemented something called Ironclad, which was an implementation of the Python C API for IronPython. And IronPython doesn't have a GIL, and it doesn't use reference counting. But we maintained binary compatibility
24:21
with the existing binaries, because this was for Windows, and people basically can't compile their own extensions there. And so what you have with those existing binaries is C objects, effectively Python objects implemented in C, that expect to use reference counting.
24:42
And so we had a hybrid system. We altered objects that were returned from the C extension: we artificially inflated their reference count by one, so that the macros compiled into the binaries would never be triggered by getting down to zero. And then we had a proxy object, so that if garbage collection was entered for this object,
25:02
then we would decrement the reference count to zero. Technically, we had a leak, though, because if you pass references to Python objects into the C extension, the C extension could keep references to those alive.
25:20
And essentially what Larry is saying is that if you move to something like mark-and-sweep for the main CPython interpreter, those C objects are opaque to the garbage collector; it can't know what internal references you have. That's kind of what you're saying, isn't it, Larry? Unless you change the API. Right. So, Python actually supports garbage collection.
25:42
CPython does support garbage collection, but it's only used in a very small way: it's used to break reference-count cycles, and most objects don't bother to implement it. So if you had an object you created in the C extension, it might have references to other objects, and you have no visibility inside of it. The world that he lived in with IronPython was a little different, where they were running inside of the CLR,
26:02
and the CLR knows everything. In the C runtime environment, we really know nothing about the memory that's been allocated. I mean, in practice, we had no memory leaks that we could find, from C extensions keeping other objects alive, or from those objects dying while the C extension still held references. But that was only in practice.
26:21
It seemed like it worked. So you kind of want your system to be theoretically tight, not just working by accident most of the time. Yes.
26:43
So I guess the answer to my question is no, but is it possible that two threads both think that the GIL is free? They claim it, and then something strange happens? No. The lock objects are provided by the operating system, and they're very good. So one thread grabs the lock,
27:03
and the operating system does a very good job of only allowing one person to have it at a time. Yes. Is it fast? It depends on the operating system, actually. Linux has what they call the futex, and I think Python may use futexes on Linux. So that is very fast. It depends on the operating system.
27:21
Some locks are actually pretty slow. But we live in 2015; I think all of them are fast these days. Hi. What do you think is the feasibility of having a CPython clone interpreter
27:44
that breaks the C API, and you use it once the extensions have caught up? So you're talking about forking the Python interpreter and going to garbage collection for that? Yes. Certainly, that's possible.
28:02
I'll point out to you that we're in 2015, and CPython 3.0 came out in, what, 2008? And there are a lot of C extensions that still haven't caught up. So people are very slow to adjust to changes to the C API. And I worry that the change
28:21
to garbage collection would be even worse than the change from Python 2 to Python 3. So I'm not optimistic about making a fork of Python that has pure garbage collection, and everyone saying, oh, I want to switch over to that, and it really taking off. I mean, at some point you make so many changes, it isn't recognizably Python anymore, and you might as well switch to PyPy, because those guys have been doing it for a long time,
28:41
and it kind of works, mostly. Okay? I think everybody wants to go to coffee. Thank you.