
Using All These Cores: Transactional Memory in PyPy


Formal Metadata

Title: Using All These Cores: Transactional Memory in PyPy
Part Number: 74 (of 119)
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Place: Berlin

Content Metadata

Abstract
Armin Rigo - Using All These Cores: Transactional Memory in PyPy

PyPy, the Python implementation written in Python, experimentally supports Transactional Memory (TM). The strength of TM is that it enables a novel use of multithreading, inherently safe, and not limited to special use cases like other approaches. This talk will focus on how it works under the hood.

-----

PyPy is a fast alternative Python implementation. Software Transactional Memory (STM) is a current academic research topic. Put the two together --brew for a couple of years-- and we get a version of PyPy that runs on multiple cores, without the infamous Global Interpreter Lock (GIL).

The current research is based on a recent new insight that promises to give really good performance. The speed of STM is generally measured by two factors: the ability to scale with the number of CPUs, and the amount of overhead when compared with other approaches on a single CPU (in this case, with the regular PyPy with the GIL). Scaling is not really a problem here, but single-CPU performance is --or used to be. This new approach gives a single-threaded overhead that should be very low, maybe 20%, which would definitely be news for STM systems. Right now (February 2014) we are still implementing it, so we cannot give final numbers yet, but early results on a small interpreter for a custom language are around 15%. This looks like a game-changer for STM.

In the talk, I will describe our progress, hopefully along with real numbers and demos. I will then dive under the hood of PyPy to give an idea about how it works. I will conclude with a picture of what the future of multi-threaded programming might look like for high-level languages like Python. I will also mention CPython: how hard (or not) it would be to change the CPython source code to use the same approach.
Transcript: English (auto-generated)
I'd like to introduce Armin, who's going to talk about using all these cores that we're getting, which is proving quite a challenge for our software. He is one of the cleverest people I know, so over to Armin. Thank you.
So, I'm here to talk about PyPy STM, which is software transactional memory, and, well, what that actually means is something that will become clear, I hope, during this talk. So, a bit of metadata: it is an ongoing research project.
It's done mostly by two people, me and Remi Meier, who is at ETH Zurich. It is a project that has been helped a lot by crowdfunding; we got almost $30,000 over the three years of the project. So thank you to everybody who contributed. And it's a project that started at the EuroPython 2011 lightning talks. Maybe a few of you remember: I was there, and I presented in five minutes how it would work.
So yes, this is the result, three years later, basically. So, first question: why is there a GIL, a global interpreter lock, when you run your Python code? Well, I mean, it came about for historical reasons, but now it's really deeply embedded in the language, basically, or in the CPython implementation. CPython started as a single-threaded program, and then, well, the easiest way to change a program to make it run on multiple threads is to just put a lock around everything, right? You have one lock, called the global interpreter lock, and it needs to be acquired in order to run any piece of code. So the result is that, yes, you can have multiple threads in Python code, but they are not being used for parallelism. You use them for concurrency, where the difference between these two terms really means that you can use the fact that you have several threads to run independent pieces of code, or call some external code, or things like that, but they are not actually running in parallel. So this was really done because it was the easiest way to change the interpreter. And, well, it's also very easy for reference counting, for all these kinds of internal details that we then don't have to care about, basically.
So this has positive and negative consequences. The positive consequence: well, it's simple for the implementer. But it also has some consequences that are visible to you, the user of Python. One such consequence is that some operations are atomic, for example list.append, or setting an item in a dictionary, these kinds of things. You know that even if you are running a multi-threaded program, another thread that also tries to update the same dictionary, for example, is not going to mess up the internal state of the dictionary in a way that would lead to completely obscure crashes, et cetera. No, this does not occur in Python. Well, all variables are volatile, basically, is what I could say here. And if you don't know what volatile means, it's perfectly fine.
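For instance (a small self-contained illustration in today's Python, not code from the talk), two threads appending to the same list can never corrupt it:

import threading

items = []

def worker():
    for _ in range(100_000):
        items.append(1)   # a single atomic operation under the GIL

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(items))  # always 200000: the list's internal state never breaks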
Now, there are negative consequences, of course. The completely obvious one is that you don't have parallelism: you cannot write a parallel program in Python, I mean, not within one process. Well, there is another consequence: the global interpreter lock exists and it's fundamental, basically, but it's not exposed to the application. This means that you get some of the benefits, like the atomicity of some basic operations, list appends, dictionary setting, this kind of stuff, but you don't get larger-scale atomicity. You still basically need locks even in your Python program.
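By contrast (again an illustration, not from the talk), a compound operation like an increment spans several bytecodes, so it still needs an application-level lock:

import threading

counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(100_000):
        with lock:          # without this lock, increments can be lost
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # with the lock: always 200000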
So you need locks, and then all the hard parts of locks, like deadlocks and everything. So how do we remove the GIL? Well, there are three approaches, I think. The first is called fine-grained locking, the second shared-nothing, and the third one, which I'm going to present, is transactional memory. So, as a bit of contrast, what are these three approaches? Fine-grained locking is: well, we have this interpreter which has the GIL, so let's kill the GIL, and instead we are going to put very fine-grained locks on each individual object, right? So this is a lot of work for the guy who is implementing Python,
I mean, updating Python, basically. Well, and also, if you are talking about CPython, then there are tons of nasty issues, like, for example, reference counting, et cetera, because now you can have several threads that will, in parallel, try to update the reference count of the same object, so those must be atomic updates; but atomic updates are slow on processors, et cetera, et cetera. Well, there are tons and tons of issues. The point is that it's an approach that actually exists, in Jython and IronPython. They are doing it, basically, but they, as in the Jython maintainers and the IronPython maintainers, are benefiting a lot from the fact that the JVM and the .NET platform have really good support for exactly that. For example, the JVM HotSpot will happily do lock removal on any object that it can somehow prove did not escape the current thread. And even in the cases where it cannot prove things, they are doing kind of crazy things: they have five different cases, where the first case is checked very quickly, and then you go into the slower and slower cases, et cetera.
Well, basically they are doing crazy things, and it's fine. It's fine, but, well, you still need application-level locking; it does not solve that, right? If you're writing a Python program and you're running it on top of Jython, let's say, well, you still need to carefully use threads and locks and everything in your Python program. So this is why there is a completely different approach that has some traction nowadays, also because it can be used on CPython: it's shared-nothing. Basically, this approach says that if you want to run some piece of code on multiple cores of your machine, you just start several processes, and then, well, you need to design your program in such a way that it's possible to exchange some amount of data that is not too big. Well, it is actually a model that some people think is a good idea, and I kind of agree, for the usages for which it's a good idea. It gives a clean model for a multi-threaded, well, a multi-core program. You don't have the issue of locks. And then the negative points: you have limitations on what kind of data you can exchange between processes; you actually have overhead for exchanging the data; and it's not compatible with an existing threaded application. Well, it's still a good model, but it's not a 100% solution, basically.
So this is what I'm now talking about: transactional memory. It's a way to run the interpreter as if it had a GIL, a global interpreter lock. The main difference is that instead of blocking (when you have a lock that several threads acquire, it usually means that all threads have to wait and only one thread can proceed), with this kind of lock you can still run all threads. But you do so, well, by running all threads optimistically, and then you need some bookkeeping to check what each thread really reads and writes. And then, at some point, you need to check whether they actually did something that conflicted or not. The hope is that in the common case you did not actually get a conflict, and in that case everything works fine, and you succeeded in running your multiple threads in parallel. And in the hopefully rare case of a conflict, you need to cancel and restart one of the threads.
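To make that bookkeeping concrete, here is a toy sketch of such an optimistic scheme in pure Python. It is only an illustration of the idea; PyPy's real implementation is far more clever, and in particular does not funnel every commit through one global lock:

import threading

_commit_lock = threading.Lock()
_versions = {}                      # a version number per key, bumped on commit

def atomic(memory, body):
    """Run body(read, write) optimistically; restart it on conflict.
    body must be restartable: its writes stay local until commit."""
    while True:
        read_set, write_set = {}, {}

        def read(key):
            # remember which version of this key the transaction saw
            read_set.setdefault(key, _versions.get(key, 0))
            return write_set.get(key, memory.get(key))

        def write(key, value):
            write_set[key] = value  # kept thread-local until commit

        body(read, write)

        with _commit_lock:          # short commit phase, synchronized
            if all(_versions.get(k, 0) == v for k, v in read_set.items()):
                for k, v in write_set.items():
                    memory[k] = v
                    _versions[k] = _versions.get(k, 0) + 1
                return              # no conflict: the commit succeeded
        # a conflicting commit happened in between: restart the transaction

# Example: four threads doing a read-modify-write on the same counter.
memory = {"x": 0}

def increment(read, write):
    write("x", read("x") + 1)

def run():
    for _ in range(1000):
        atomic(memory, increment)

threads = [threading.Thread(target=run) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(memory["x"])  # always 4000, despite all the conflicts and retries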
So this is a very, very high-level view. I should mention that here I'm talking about STM, which is software transactional memory. There is actually HTM as well, hardware transactional memory, which is implemented on the Haswell CPUs: the latest generation of Intel processors has HTM. You also have some hybrids, which are clever implementations of STM that use HTM internally for some things. Well, most of these three different solutions are still mostly research-only because, well, STM so far has a huge overhead: the checking and cross-checking of memory conflicts and so on typically makes it at least two times, but more like four or five or ten times, slower per core. Then you have HTM, hardware transactional memory, which is in theory great, but in practice far too limited, at least so far; I mean, in the current generation it's too limited to support a full Python interpreter. We tried to write a PyPy HTM, and the result is that even if you run a transaction for just one bytecode, you only have a fair chance that it is not already too long.
Okay, so that's why I'm still focusing mostly on STM now, really the software part. So yes, here on this slide I wrote "easy" in quotes, right? So let me explain why it's easy, and which part is actually easy. The easy part is to go inside PyPy and replace the GIL, because you just replace the places that call GIL acquire and GIL release by, respectively, start a transaction and stop a transaction. Easy. Now, the hard part is to actually write it as a library: how you write all this STM handling code. So, I mean, the point I'm making here is that, if you actually think about it, GIL acquire and GIL release, that's also something that's not completely trivial: acquiring a lock. Acquiring a lock is done with a library, for example, on Linux, the pthread library, and the pthread library itself is also a bit crazy. I mean, how do you do a lock naively? Yes, just one word, and I put zero for not-locked and one for locked. Okay, naive. Yes, it works, but that's actually not how it's done at all, because it's possible to optimize it far more using clever techniques. So here I'm trying to make the same argument.
It's easy to add correct calls to start and end a transaction in PyPy, but the hard part is to write the library. Okay, well, here I'm presenting PyPy STM, but just mentioning that this library, the hard part, could actually also be used in CPython, which would be great: then we'd get CPython with multiple cores, cool. But, well, there is one catch: what do you do about reference counting? That's the main catch. Well, there are solutions, but they are involved, hard, messy, et cetera.
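The talk doesn't show this code, but schematically the substitution looks like the sketch below. Every name here is an invented stand-in for PyPy internals, not the real API:

class Frame:
    # A stand-in for an interpreter frame with a few chunks of bytecode.
    def __init__(self, n_chunks=3):
        self.remaining = n_chunks

    def has_more_code(self):
        return self.remaining > 0

def stm_start_transaction():
    pass  # provided by the STM library (the genuinely hard part)

def stm_stop_transaction():
    pass  # validates reads and writes, then commits, or aborts and retries

def run_some_bytecodes(frame):
    frame.remaining -= 1  # interpret a bounded chunk of bytecode

def interpreter_loop(frame):
    # The "easy part": the GIL acquire/release sites become transaction
    # boundaries; everything between them runs optimistically.
    while frame.has_more_code():
        stm_start_transaction()     # was: gil_acquire()
        run_some_bytecodes(frame)
        stm_stop_transaction()      # was: gil_release()

interpreter_loop(Frame())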
Okay, I have a nice diagram to show how it works. The basic idea is that in this diagram I have two threads, and I want to run things, each a horizontal blue box. Well, if I were running this code on a normal Python with a GIL, then we would get a diagram where the boxes cannot run in parallel, right? So they take more time in total. Here, with STM, they actually run in parallel, but the point is that each thread runs a bit independently, by being very careful about what it changes: mostly, all its changes are kept local. This is what is done in the first part of each box. And then, at the end of each transaction, each box, we do a special phase that needs, this time, to be synchronized across all cores: we push all the changes so that all cores will see them. So the effect that the programmer gets is as if, in this case, we had three independent pieces that were running, and these pieces were running here, here, here, and here only. So basically, the three pieces run one after the other, serially: you have the first piece, the second piece, the third piece. It just happens that the preparation for the commit of each transaction occurred before, and maybe did a lot of work before. So it means that, as a model, when you think about how it works, it is still essentially a serialization. You still have one thread, then another thread, then the first thread again that can commit, meaning produce a result that you can see. So in this sense, it's exactly the same as a GIL. PyPy STM works, feels, exactly like a regular PyPy with a GIL. You don't have any additional issues, any additional races, conflicts, et cetera.
Okay, small demo. I don't have a PyPy STM on this laptop here, so I will just go through the sources. This is an example of a demo that was posted in the comments of the latest blog post, so it's not from me, basically. You have this isPrime function, and you want to compute how many of the numbers up to five million are prime; well, you do it like this. And then, if you want to do the same thing on multiple processors, there are several ways. For example, you can use the multiprocessing module. Then it looks like this.
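The slide being shown isn't in the transcript; here is a minimal sketch of what such a multiprocessing version might look like, with a naive isPrime:

from multiprocessing import Pool

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def count_primes(rng):
    return sum(1 for n in rng if is_prime(n))

if __name__ == "__main__":
    # split 0..5,000,000 into two halves, one per worker process
    with Pool(2) as pool:
        halves = [range(0, 2_500_000), range(2_500_000, 5_000_000)]
        print(sum(pool.map(count_primes, halves)))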
So this is using the multiprocessing module, doing the same thing in ranges, so it's really using several processors. This happens to work because isPrime is simple enough. However, it's not actually doing the same thing, because if somewhere in the pyprime.py file there were some global state or something like that, then it would not work the same way anymore, because now you are running two different processes. So, well, I cannot actually run them live; it's on this order, you have to believe me: this one runs in about six seconds, typically, 6.2 maybe.
The multiprocessing one runs in something like five seconds. I mean, there is an overhead, right? So it's not twice as fast. And then you can write a version using multiple threads. This is just bare-bones threads: import thread at the bottom, starting two threads, and then every thread, well, you have a queue to communicate the ranges, and every thread reads from the same queue, so it gets the next range to do. And if we run this on top of a regular PyPy, then it gives us something like eight seconds: so, like the six seconds, but with a bit more, because there is a bit of overhead. And if you run it on top of PyPy STM, then you get 4.8 seconds, which is the fastest I've described so far. Which is cool. Okay. Now, here I'm cheating a bit, basically, because yes, this example runs faster with two cores already, which is great. However, all the numbers here have been carefully tweaked to show this, right? Like, if you change a detail somewhere, then it's three times slower. But the point is that we are getting there. Okay.
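The threaded source isn't in the transcript either; here is a hedged reconstruction of the bare-bones version described above, modernized to the threading and queue modules:

import threading
from queue import Queue

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

tasks = Queue()
results = []

def worker():
    while True:
        rng = tasks.get()
        if rng is None:                 # sentinel: no more ranges
            return
        # list.append is atomic, so no explicit lock is needed here
        results.append(sum(1 for n in rng if is_prime(n)))

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for start in range(0, 5_000_000, 250_000):
    tasks.put(range(start, start + 250_000))
for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()
print(sum(results))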
Now, what I just showed is: yes, you can use multiple threads, and it's cool, and it's faster. Good. However, the real point I'm trying to push forward in PyPy is not just that, not just "yes, you can use threads and drown in your maze of locks and debug them forever". The real point is that you can use threads with very coarse locking. You can have two transactions that run optimistically in parallel. I mean, I've just shown you this diagram; so here is the exact same diagram one more time. However, now every block no longer goes just from one GIL acquire to the GIL release at the end. Now it goes from acquiring some lock that you define in your application up to releasing the locks that you define in your application. So you see, technically there is no difference, right? It's just one lock; it's just another lock.
However, the difference here is that you can now start to think about your application like this: it starts multiple threads; it uses just one lock, this time an explicit lock that you imported from the thread module, but just one; and whenever any thread wants to do something, it acquires this lock, right? It's something that makes no sense at all, in theory. It's something that you wouldn't do in normal Python, because, well, why use threads then, okay? But the point is that you can do it, and then PyPy STM can optimistically try to parallelize it. So this is the thing I'm pushing forward here. The kind of program that I could foresee being written with PyPy STM is really: yes, it's still using threads and locks, but you put them in some corner of your program, with just one lock, and, well, you have completely coarse-grained locking, basically; extremely so.
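A minimal sketch of that pattern: the whole program shares one single lock.

import threading

the_one_lock = threading.Lock()     # the only lock in the whole program

def do_some_work(n):
    print("working on", n)

def worker(n):
    with the_one_lock:
        # On a GIL-based Python this serializes everything, which is why it
        # "makes no sense in theory". PyPy STM turns each such block into a
        # transaction and runs the non-conflicting ones in parallel.
        do_some_work(n)

threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()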
So, I mean, an example: here is an application that is traditionally not multi-threaded... Ah, I don't have it here. I have to wave my hands, sorry. Wave more of my hands.
So yes, this demo is about the bottle web server. I mean, it could be Twisted, it could be Tornado, whatever. So it is a web server, which traditionally does not use threads. It works like this: you get an HTTP request, you process it, maybe doing some complicated computation, and then you push out the answer. So the point is that you just take one of these frameworks, like bottle, and you add a thread pool on top of it. For every incoming request, you ask the thread pool, by pushing something onto a queue: now please process this request. In that thread, you actually process the request and send the answer back to the main thread, with another queue, for example. If you do that, then PyPy STM can actually run this program on multiple cores in parallel. And the point is that each of these single pieces that run in the thread pool is run by acquiring one lock, one global lock. So it means that the different pieces appear to run one after the other.
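A sketch of that thread-pool idea, using concurrent.futures for brevity; the handler and the hook are placeholder names, not bottle's actual API:

from concurrent.futures import ThreadPoolExecutor
import threading

global_lock = threading.Lock()
pool = ThreadPoolExecutor(max_workers=4)

def handle(request):
    with global_lock:       # on PyPy STM: one coarse transaction per request
        return "response to %r" % (request,)

def on_incoming_request(request):
    # called by the (single-threaded) framework for each HTTP request;
    # the future's result is sent back to the client once it is ready
    return pool.submit(handle, request)

print(on_incoming_request("GET /").result())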
So it works just as if you did not have a thread pool at all. But this is an example where most pieces will clearly be able to run independently from each other. So yes, in summary, what I'm trying to say is that the PyPy STM programming model gives you threads and locks that are fully compatible with the global interpreter lock.
But here I'm not saying that now everybody should use threads and locks. Here I'm saying you should make, or use, a thread-pool library, and use only coarse-grained locking, because that's enough for PyPy STM. So yes, there are three different kinds of applications where this is immediate. You can have a multiprocessing-like interface where you use a pool of threads. I think there is actually a thread option in multiprocessing, instead of the process option, but that's a bit pointless because, well, that's not normally running things in parallel. But yes, the point is that here you could have a multiprocessing-dot-thread that works, as I said, by acquiring one lock. You can extend Twisted, Tornado, bottle, like I explained. And you can also, well, this is maybe a bit further down the road, but it's always the same thing: if you have a Stackless or greenlet or gevent system, so a system where you have coroutines, basically, then the coroutines tend to do things independently from each other. So you can again do the same thing: this time you acquire and release a lock around the execution of one atomic piece of a coroutine, which means from one switch to the next, for example.
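A toy sketch of that coroutine idea, using plain generators as stand-ins for greenlets; the scheduler and all the names here are invented for illustration:

import threading
from queue import Queue

switch_lock = threading.Lock()      # one global lock, as before
runnable = Queue()

def scheduler_worker():
    # Each piece of a coroutine, from one yield (switch) to the next, runs
    # under the single lock; on PyPy STM, non-conflicting pieces from
    # different coroutines could then run in parallel.
    while True:
        coro = runnable.get()
        if coro is None:            # sentinel: shut this worker down
            runnable.task_done()
            return
        with switch_lock:
            try:
                next(coro)          # run up to the coroutine's next yield
                alive = True
            except StopIteration:
                alive = False
        if alive:
            runnable.put(coro)      # reschedule its next piece
        runnable.task_done()

def ticker(name):
    for i in range(3):
        print(name, "tick", i)
        yield

workers = [threading.Thread(target=scheduler_worker) for _ in range(2)]
for w in workers:
    w.start()
for name in ("a", "b", "c"):
    runnable.put(ticker(name))
runnable.join()                     # wait until every coroutine has finished
for _ in workers:
    runnable.put(None)
for w in workers:
    w.join()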
So yes, the end result would again be something that continues to work exactly the same way, as in: you take your existing Stackless application and it continues to work, but on multiple cores. Yes, this is the current status.
Well, the basics work. The best case is 25 to 40 percent overhead, which is much better than originally planned. I mean, it's really good enough that usually, with just two threads, it's already faster. This is the overhead of running only one thread. Yes, well, everything I said before about the application-level locks working the same way as the global interpreter lock is actually wrong for now, but it should be true soon. Right now we have a workaround, with atomic things that I won't explain; it's temporary. And yes, well, there are tons and tons and tons of things to improve. So yes, as a summary:
this approach has the potential to enable parallelism in CPU-bound multi-threaded programs. I mean, it can be used as a replacement for multiprocessing, et cetera. It can also be used in applications that are not explicitly written for that: basically anything that could potentially be replaced by a call to multiprocessing.Pool.map. And yes, the benefit is that you keep the locks coarse-grained. However, the issue is also that you keep the locks coarse-grained. I mean, this is something that actually has other issues, which we, as in me and my coworker, were not very clear about so far. Will this work very nicely, completely, in every case? I don't think so, actually, because you have this issue: if you are running things that should be running in parallel, but actually are not, because, for example, both pieces increment some global counter, this is enough to make the pieces conflict; and if they conflict, they are again serialized. So this is an example of a systematic conflict. It means that if you take your program, apply just what I said, and expect it to go n times faster, then it may actually not go n times faster at all, and the reason is probably systematic conflicts. So it's something that we need tools to debug: we need a way to find where the conflicts are, and figure out, oh, but here I'm incrementing this global counter, let's fix it in this way or that way.
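An illustration of such a systematic conflict, with invented names: the per-request work is independent, but the shared counter makes every pair of transactions conflict.

import threading

lock = threading.Lock()             # the single coarse-grained lock
stats = {"handled": 0}

def compute(request):
    return "response to %r" % (request,)   # independent per-request work

def handle_request(request):
    with lock:                      # on PyPy STM: one transaction per request
        response = compute(request)
        stats["handled"] += 1       # every transaction writes this key:
                                    # a guaranteed, systematic conflict
        return response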
Well, these are debugging tools and profiling tools, et cetera. All of this is not done, basically, and all of this will very probably be very much needed. So, if you want to compare this approach with the standard multi-threading approach: in the standard multi-threading approach, everything is fast, cool, but then things crash if you don't lock correctly. Here, at first everything is slow but works correctly, and then you can improve: you can improve by detecting the conflicts, et cetera, but everything still works correctly, as in, it works the same way as if you had only one thread. Which I think is a very good approach, basically, especially for languages like Python. Yes, and the performance is as I mentioned.
And yes, it's not production-ready; it's still alpha, getting there. You can download the first release at this URL. It works only on Linux 64-bit for now. And yes, there is crowdfunding.
Thank you. Thank you, Armin. We've got time for a couple of questions before the next session. If you've got a question, please go to one of the microphones. Thank you. Hi, when you say coarse locking, how coarse or fine-grained can it be? For example, the logging library has a global lock internally. Does that mean that logging something once a second would already hit this performance penalty, or is it still okay? What are the lengths of the atomic blocks; where's the threshold, basically? Can't hear me?
Okay, so how long can a coarse-locked transaction run? The point is that in PyPy STM I tried to enable arbitrarily long transactions; there is no limit. Well, if you really make them too short, then you get GIL-like behavior: the transaction will simply be kept running for a longer time anyway. So the limit is not important for the programmer. There is a lower bound on the transaction size, and that size is tweaked to get the best performance: tweaked to be long enough that making it even shorter would introduce too much overhead.
Can you hear this? So, if you're running just the pure interpreter, is the overhead pretty much non-existent? Right, the overhead is, well, I can't tell so far, because it's still in development, but the way I foresee it, the overhead should be around 25 percent everywhere. So yes; I mean, yes, the JIT will remove some of the overhead, but then the baseline is also much shorter, so it's harder to improve. So it turns out to be somehow similar. Yes, so, why do we need to remove reference counting from CPython? I think I will take that question offline, because it asks for a more complex answer. So the question is: how do you roll back transactions that already had side effects? The point is that a transaction should not have side effects, and that actually fits very nicely with the model of CPython, because if you are going to have a side effect, like writing to a file, you know, you release the GIL. So here it means you end the previous transaction, and the write actually happens outside of any transaction. Can you hear me okay, at the other microphone over here? In Haskell's STM there are combinators, such as retry, where you can take the STM transaction and say: okay, I would like to do this from the beginning. Do you intend to expose something like that? No, because this is more low-level. Here the real goal is to have STM internally and not expose it at all to the language.