The GIL is dead: PyPy-STM

9 views

Formal Metadata

Title
The GIL is dead: PyPy-STM
Title of Series
Part Number
112
Number of Parts
173
Author
Rigo, Armin
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
EuroPython
Release Date
2015
Language
English
Production Place
Bilbao, Euskadi, Spain

Content Metadata

Subject Area
Abstract
Armin Rigo - The GIL is dead: PyPy-STM Take a big, non-multithreaded program, and run in on multiple cores! PyPy, the Python implementation written in Python, experimentally supports Software Transactional Memory (STM). It runs without the Global Interpreter Lock (GIL). The strength of STM is not only to remove the GIL, but to also enable a novel use of multithreading, inheritently safe, and more useful in the general case than other approaches like OpenMP. The main news from last year's presentation is that there is now a way to get reports about the "STM conflicts", which is essential to go past toy examples. With it, you can incrementally remove conflicts from large code bases until you see a benefit from PyPy-STM. The goal of the talk is to give several concrete examples of doing that.
Keywords
EuroPython Conference
EP 2015
EuroPython 2015
Loading...
Laptop Computer programming Read-only memory Database transaction Building Randomization Just-in-Time-Compiler Thread (computing) Observational study Code Dot product Multiplication sign Source code Mereology 2 (number) Formal language Revision control Read-only memory Interpreter (computing) Software Central processing unit Regular expression Condition number Metropolitan area network Computer icon Zoom lens Process (computing) Computer Binary file Benchmark Maxima and minima Computer animation Software Personal digital assistant Order (biology) Module (mathematics) Personal area network Iteration Game theory Data type Resultant
Point (geometry) Randomization Algorithm Point (geometry) Ext functor Perspective (visual) Functional (mathematics) Random graph Maxima and minima Graph theory Computer animation Vertex (graph theory) Statistics Vertex (graph theory) Hex map Queue (abstract data type) Resultant
Point (geometry) Computer programming Thread (computing) Point (geometry) Parallel port Ext functor Theory Number Maxima and minima Particle system Loop (music) Computer animation Personal digital assistant Queue (abstract data type) Statistics Right angle Queue (abstract data type) Logische Programmiersprache
Computer animation Transformation (genetics) Logarithm Real number Software testing Logische Programmiersprache
Computer programming State observer Length of stay Computer animation Demo (music) Lecture/Conference Real number Logarithm Ring (mathematics) Software testing Units of measurement Number
Point (geometry) System call Line (geometry) Computer file Cloud computing Inclusion map Computer animation Lecture/Conference Software testing Interrupt <Informatik> Queue (abstract data type) Hacker (term) Task (computing) Form (programming)
Event horizon Computer animation Lecture/Conference Personal digital assistant Server (computing) Projective plane Software testing Water vapor Line (geometry)
Point (geometry) Standard deviation Computer programming Thread (computing) Server (computing) Multiplication sign Virtual machine Function (mathematics) System call Machine code Number Table (information) Category of being Event horizon Computer animation Bipartite graph Operating system output Central processing unit Resultant Asynchronous Transfer Mode
Point (geometry) Web page Thread (computing) Code View (database) Letterpress printing Line (geometry) Function (mathematics) Event horizon Computer animation Core dump Information systems output Statistics output
Standard deviation Computer programming Read-only memory Scanning tunneling microscope Electronic mailing list Mereology Mereology Computer animation Lattice (order) Read-only memory Software Cuboid Subtraction
Database transaction Read-only memory Dialect Multiplication sign Function (mathematics) Parallel port Computer Computer animation Database output Pattern language Right angle Object (grammar)
Read-only memory Thread (computing) Logarithm Consistency View (database) Interactive television Parallel port Electronic mailing list Parallel port Computer animation Software Phase transition Right angle Information systems Game theory Condition number Flag
Greatest element Standard deviation Logarithm Interactive television Electronic mailing list Parallel port Bit Revision control Computer animation Database Boom (sailing) Vertex (graph theory) Flag Selectivity (electronic) Object (grammar) Physical system
Bytecode Computer programming Database transaction Context awareness Thread (computing) Code Scientific modelling Mathematical singularity Parameter (computer programming) Function (mathematics) Food energy Force Single-precision floating-point format Energy level Selectivity (electronic) Metropolitan area network Dialect Block (periodic table) Computer Interface (computing) Parallel port Ext functor Bit Port scanner Functional (mathematics) Computer animation Interpreter (computing) output Right angle File viewer Block (periodic table) Film editing Data management
Computer animation Sound effect
Point (geometry) Database transaction Read-only memory Multiplication sign Rollback (data management) Letterpress printing Sound effect Mereology Matrix (mathematics) Computer animation Statement (computer science) Pattern language Physical system
Adventure game Default (computer science) Computer programming Video game Thread (computing) Computer animation Branch (computer science) Mereology
Point (geometry) Robot Machine code Red Hat Computer animation Lecture/Conference Personal digital assistant Logic Whiteboard Estimator Modem
so was idea is then or that for that pretends to would let by sentence exactly the gives a globally in the but the kids so and if you have been up to produced OK in this very room you know already that the what's a globally in there but the lot is when I assume a lot of you do if you don't end it is simply a lot but it in in the user's Sebastian or pipeline condition of Python it is 1 of the GlobalAlloc that's needs to be acquired in order to run any Python code so python code can run only if it has a low which means if you are writing a Python program using multiple threads on all of these threads are trying to do some CPU intensive computation then actually only 1 of them we run at the time this is the basic story on these have been the case since so well since for ever since 19 2 in the government for inferring by so here and the game of type II with his pipe-like by plays in other atom edition don't to the Python language uh it was started 10 years ago I i it has better performance on most of the time because it has a well uh just-in-time compiler visiting tried his great not STM means software transactional memory what do I mean by that please see the part 2 of my talk and no type that is the M is an eternity of 2 pi pi so it means that you need to use another version of by type but these are the relative does not have the building governing that but the lock up to point so let's study with an example this is the result of remained I rented before on my laptop so it is uh reach returns which it's some random benchmarks and CPU-intensive benchmark uh here around it's like a total 10 thousand iterations so divided in 4 Fred so here it's just naively using threads but starts new friends in order to create full-fledged each of these 4 fragile and for 2 thousand 500 iterations OK I'm really see how much time it takes this is the time as of 8 . 0 1 1 is how long it takes on by applying the regular by applying with a given without the give takes only 2 . 5 seconds on this laptop which is to use last fall coral up them so it's good this is an example where it works the great a simple yet OK so when I mean of you the will 3 of this is such an example as returns when he is my goal was only need to run for returns example for each of the benchmarking college and which I could run for processes that's kind of obvious a OK and and if you if you are into running subprocess maybe you are you are already using the military processing modules in Python which is basically just running some processes so was advantage running sub sub Python processes is that each process has its own nobody but the looks so if you're not using and to be In this case not using threads at all so it does not matter but is that when of course a big drawback is that if you're renting the different processes and then you when you get the time needs to when you have different presence alone you need to pass that around which is well it's can be hard medicate you can you that you can take an existing began complicated program at just turning into a emitted processing programs that does not too at usually so a by by oppositional by ASTM is about to be reading it phrase in 1 program on this idea that you have shared data and we can use the shared data which he is both good and bad so here is an example where I want to look into the source code so
all of this into an example where create a random graph so it's about here um creates my list of that this is said to have edges subtask from a random vertex to another random vertex on user goal here of this algorithm is defines a tool
pointing this random graph which uh as far as far from each other specific so on which is a specific problem which I'm sure has because complex houses has nice solutions from a graph theory together perspective that here here 3 that was the most naive algorithm that take 1 point enumerate all whatever points that I can reach etc. so so it means we try from every single point to reach all of the points so this is a function that that's given 1 point find furthest point here and here is how however and here's here's how I how I look for for for for the answer for all the starting point I put in a few the starting point and then I get results from the q on the my Q
is forget the point compute what is the furthest and puts the furthest back in the queue on white wine I'm talking what q because because and want defended program so what do I do here yes so I have a loop that starts some number of threads gets me to Q is This is mostly some Python right if you ignore the fact that the queues are from the 2 elections module you can think of them as cues from the Pew module its stomach Python but the point that if you run this in the summertime and then you don't get any benefit from from these the defendant this is different than when Cox basically
OK well here it here you will be the case that there is a in the program is unable to be run on on with the preferred in parallel when all or it should in theory but analysts this particle example which I which I draw which are quickly did yesterday does not actually showing the huge speedups 2 but you can guess that I know it's a bad example dedicated and you seen before and as the recent afterwords there other examples that are larger programs and for this logic programs it seems to work better which is good so it's
so they can bring to assume that you trust me and I will show you now never that logic programming which I did the exact same transformation
and you so all
these pieces of program the that 2 wires already around 2011 I think where at the observer is the same demo so it is
a 3 and there entirely by
some on but a 0 . 9 is the number of units per se cond because of some she doesn't think takes forever ability but you can see changing on things who said if I really
don't but I will too
much faster but there is great because the that 2011 and that's now let's think let let's around the surrender on only the friends from the point of the doped here now but when you see
that it's somewhat store the 1st displayed few
and so we use just the pubescent rendering system it's it's the you can see that it's not just calling opened yellow or whatever because it's using the cylindrical projections the users of the water here and it's not a straight line it just just a random comment the case so
what doesn't work OK now what they mean if I had had in different the programming advice and when these is what what it's this is a summary of a normal Python so the Python by playing we've out as and this is a summary of what you can expect book from the table from the from the mid different program when if you look well you know you need to divide your friends into categories you have just the feathered that input output so that typically typically blocked in some operating system call like fighter read or or receding from the subject of doing things like that and you have on the other hand the kind of threat that is uh CPU-intensive that points to compute something so was uh install not we see that the in the decay you get these results so any you can have any number of threads doing I I all that finds the books great but you can have only 1 Fred's that's doing CPU-intensive competition of dying so now if you if instead using Pakistani immediately slightly strange results when you can have any number of input of threads OK and instead of 1 you can have up to N friends that are doing intensive things where is the number N is actually compiled into the bipartite the but there is still so that that means for example this year and using about STM with any goddess in about 4 I think that when you can have 1 with an eagle 8 dedicated the you we get 1 with any criterion is the number of calls or on your machine of all right and low a number larger than the number of calls to machine would work as well the support point the went so far so good however that of user the problem with this approach so the approach in their destiny that only 1 thread at a time can be switching between these 2 modes what I mean it so I have an
example you this is a
completely trivial example I have my runs threads which with the print I comment about the it's purely computing things on the view is our and these on IST and it would have been used for good however if I these I mean by best then things get slower because because each thread is trying to to be is a friend's wedding it's basically because is is switching from I need to compute stuff like this line here do I need to do input output which is which is outlined here so this is the main drawback basically this is point to look for when you are using but you need to find answers for any fixed good-quality somehow always on on on in general yes it it can occur for example if if these if doing some competition some complicated competition like rendering of a Web page of of doing so is cost of that but then in the middle you get to the longer to write so so the when In this situation you need you need to intervene and refactor your code so that's the core to the would be would be moved somewhere else of so these years
and to to conclude the 1st part of my talk with his biodiversity and you get the compatibility with some Baffin well as much by itself is compatible with the Python you get meet brand new different program that had the run out of the book on the when still if you beds that straight out None a promise that at the beginning that I would explain what's STM actually means something lecture memory when what is just
elections when this look at lists look at it this way if you have a program and think about how the given the government of at
dialog work pattern for example or in and subdued is acquired the neuron stuff then you really is again on on when when you really is again you you do some input output maybe or maybe not and then you acquired again so 1 transaction what what I'm going to college election is the amount of time between acquires a bill on the next use again at why is it called
election because that's actually very after database elections why is senior because well what is transactional about it is is is a computer memory your objects you have to look at you use a computer memory used by your program and think these is the database or as bad thing about it as a as a regular that these what you have in a regular that the is you you started to election you do some regions you do some right
rights as long as you do read and write you have a consistent view on at end you tried to commit islands these comments may work or they may make goes anabolic so we mean the the Committee phase the commits Phase if someone else did in parallel and never to election answers a lot of of interactions happened to change subjects that you have already FIL vs conditions is exactly the same 1 as the 1 used in software to memory so so this means that you are and so let me draw
pictures in the air when you when you have the normal survive your and 1 2 1 thread than imposing the neuron another Fred opposing the neuron the game the 1st Fred and so on and so on so we this selectional approach instead you start running both friends 1 commits OK that 1 continues than the other 1 tries to commit and then it to make a versus succeed and committing of failure if there was a conflict and if there was no
conflict then you are happy then is a tool for the door friends as East say where they had been executed 1 after the other and when fixed are unhappy and you throw away what you did and you were studied so so what when difference with with the standard selection system of databases such here so this of throwing away and trying but is completely transparent you don't see it as and at the bottom of yes so will here is how he talks under the hood made illegal bit fast here so that they basically during interaction you flag all objects that you read and you recalled the least of all objects that you right and when you can meet you will you you see is that list of objects that you wrote into some node so there is a growing log and when and basically you have the uh
is relatively simple to describe accurately to to a to know that that's when signal if you have a conflict and not to achieve which indicate if you have threatened objects that had never chal election has written in parallel so concurrently it's
uh so uh as said about images of transparent but that also means that that also means that you can do well so as a input output that if before Fred is really trying to doing for that but obviously or when she wanted do input output you cannot be cancers on and we tried so 1 21 transaction can be flagged as new now the selection actually did the input output so the start and here cannot aboard anymore yep the unwell so when everything the described so far is basically something that lot and that looks a bit like what's Larry hustings for a previously explained in his talk to how had to make by the interpreter actually not have a deal however this approach so that means that I'm using here STM is better I think that's the approach of of adding looks a bit everywhere because it it also when he eat did it there was another got style of programming work done I mean that when you know that there will be no switch to another thread in the middle of a byte code but but vis a vis you can actually add this more hot with had a and we see this you can make it so that you can make sure that there is no switch to another thread in a larger region so this is actually gives uh an interesting model to to do it to programming we need the best friends you can make programs that have actually large regions on these larger region you would just by saying we suddenly you make sure that there is no possibility this that that there is no perceivest which when you make sure that the so called block of Python code brands in 1 collection so what it means is that you can have have actually this kind of interface when these is just a single thread pool right you you add you add some function that will be called later in another thread some some argument you city the other and and the function there is always a function that you have added are executed at that point in with the best friends however however these runs then we we set at to meeting that could uh context manager that I described before so it means that it's gives exactly the impression to the program that each of the functions has turned completely mean in serialized so I mean even for so various functions are run in parallel the transactions as attractions that are going here under the hood our which is large enough to contain completely so the function goes we did tells when which which is basically it's gives you gives an incredible level in it gives it gives well EDS and be turning out of the and here that the weather whatever what I'm trying to say is that it is up to the is is the model that I think is kind of a future or from a different programming at his an adhesive you don't eat your final generally that you know of any if your fine with
not getting to the last last and person of the 4 months or something like that this is a very nice model to program I means there is there is there might 3 D made treaty viewer that I did before it uh is it's just paralyzed by the so you compute cut on by columns that each column you compute you send it to another thread 3 trent tools that computes columns but each column computation is done in with Atomique block which means always allergic to compute 1 column even if the energy depends on some global variable or does things like age and whatever it will work if we want the same way you don't get any race gone at the level of OK so yes and that yeah
that's what sentences and then must leave the room so I will be outside after let's just in I
think you you looked around the effect of the Princeton
and tried because while you cannot make a rollback than there because transactional memory acts as you said on some some memory matrix or whatever but when you have an effect like the print statement that cannot back yes parts of pattern has no effect system something like that so it's effects are not annotated so whole whole of the pipette and knows when when it detects effect something which can replace the all all time when it's when when the
son of firefighter would really isn't given he adds up to a point you know that something strange is going to occur I mean is all that not the complete answer I can give you a more precise answer that that's roughly when the the single
question would you the marriage that a pi pi is the and to the main as branch some some day or in part by a grant dead by the end because I think it's is a branch of the life I guess so will become the default in my might someday or would be all these arises adventures exam question
I don't know so far I mean I mean this is has a program that this between 25 and 40 % lower on a single thread so so again same 0 world I don't know really cute but when he she we are a biface it's possible to think about more advanced ways where the by payment of which between
boards and other logic compliant machine codes for so honesty and modem for that estimate of something so so it's possible that at some point is going to be the case yes thank you
Loading...
Feedback

Timings

  725 ms - page object

Version

AV-Portal 3.10.1 (444c3c2f7be8b8a4b766f225e37189cd309f0d7f)
hidden