Add to Watchlist

Using All These Cores: Transactional Memory in PyPy


Citation of segment
Embed Code
Purchasing a DVD Cite video

Formal Metadata

Title Using All These Cores: Transactional Memory in PyPy
Title of Series EuroPython 2014
Part Number 74
Number of Parts 120
Author Rigo, Armin
License CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI 10.5446/19944
Publisher EuroPython
Release Date 2014
Language English
Production Place Berlin

Content Metadata

Subject Area Computer Science
Abstract Armin Rigo - Using All These Cores: Transactional Memory in PyPy PyPy, the Python implementation written in Python, experimentally supports Transactional Memory (TM). The strength of TM is to enable a novel use of multithreading, inheritently safe, and not limited to special use cases like other approaches. This talk will focus on how it works under the hood. ----- PyPy is a fast alternative Python implementation. Software Transactional Memory (STM) is a current academic research topic. Put the two together --brew for a couple of years-- and we get a version of PyPy that runs on multiple cores, without the infamous Global Interpreter Lock (GIL). The current research is based on a recent new insight that promises to give really good performance. The speed of STM is generally measured by two factors: the ability to scale with the number of CPUs, and the amount of overhead when compared with other approaches in a single CPU (in this case, with the regular PyPy with the GIL). Scaling is not really a problem here, but single-CPU performance is --or used to be. This new approach gives a single-threaded overhead that should be very low, maybe 20%, which would definitely be news for STM systems. Right now (February 2014) we are still implementing it, so we cannot give final numbers yet, but early results on a small interpreter for a custom language are around 15%. This looks like a deal-changer for STM. In the talk, I will describe our progress, hopefully along with real numbers and demos. I will then dive under the hood of PyPy to give an idea about how it works. I will conclude with a picture of how the future of multi-threaded programming might looks like, for high-level languages like Python. I will also mention CPython: how hard (or not) it would be to change the CPython source code to use the same approach.
Keywords EuroPython Conference
EP 2014
EuroPython 2014
but nitrogen use coming this conceptual using all the scores that we thing Clyde challenge lots of where is 1 of the Chavez people on the set of his the fact thank you so much of the dog about
by STM which is software to lecture on memory
and well what it actually means is something that we become clear I hope during the school this stole so it is a bit of metadata that the ideas and on-going research project and it's done mostly by to people mediums PhoneMail who is that ETH considering um it is it is a product that have been helped a lot by pretending so we got some almost 30 thousands dollars those 3 years of the project so from Q for everybody who did contribute and it's a project that started that your Python you have like told so maybe a few of you would remember how I was there and I presented in 5 minutes how it would work so yes this is the result of 2 years 3 years later in the so the 1st question why is there a given a globally and that not when you're in your Python code when I mean it came as the historical reasons but now it's really deeply into Z. is the language basically allows us about implementation in might survive and start another single program and then when in easiest way to work to challenge the program to make it run on the visible presence just with the local around everything right you have 1 looks at this course but a lot and it needs to be acquired in order to run any piece of code so the result is that yes you can have music friends in Python cool you that's and being useful qualities you use them for concurrency resonance difference between these 2 terms means that you can use the fact that you have several friends that we're an independent pieces of code on to some extent coding and things like that but they're not actually running in parallel so this was really because it's 1 of the easiest ways to to China's interpreter and then when so it is also very very easy for reference counting of all these this kind of internal where we don't have to care physically so this has positive and negative consequences a consequence when it's simple through its include forcing them and the but but it also has some consequences of having the ability for for you for the user of Python is consequences of some operations are atomic for example start up and or dictionary setting an item into a dictionary or or doing this kind of thing you know that even if your MIT program than other friends that would also try to eat to do whatever in just the same dictionary for example is not going to mess up the internet state of dictionary and that would lead to prompted to crashes it says that no he's done a lot of current by auditory those are volatile and basically key is what I would say here and if you don't know what to volatile means it's the 55 now it has negative consequences of course a complete your views 1 is that you don't have already selection you cannot try the power of the programming and another 1 our problem by but there is another consequence is that the the government exists in these fundamental that basically but it's not exposed to the applications which means that you you get you get some of the benefits like the atomicity of some basic operations east of the carrier setting of this Council but you don't get when you don't get larger scale atomicity are when you you will use to basically needs even in your Python program so you don't and then order the heart of looks like like and then logs and everything so How do we remove using this when 3 approaches I think the 1st is called fine-grained looking the 2nd share nothing on the 3rd 1 which I'm going to present is from lecture on memory so a bit of contrast what are these 3 approaches a fine-grained looking his when I we have this interpreter which has there reduced so that it is the duty on instead of scrambling to put very fine-grained rocks on each individual objects right so we use a lot of work for for the guy who is implementing by having no object of dating back and basically when on and also if you before to mean what's the bad and there are dozens of nasty issues like for the reference counting etc. because I now you can have several several process of really important and try to data reference count of the same text so what is the most of the atom the dates and on grammar but had to negotiate source to 1 versus all wasn't interested from his optimism in terms of issues as the point is that it's not approach that actually exists in Jonathan I by is outgoing it's really cute but not when something seems and say as a means of Dyson maintainers on on our own government dealers and initiating a lot from the fact that Sergei and adults and that block for has really would support for find that it is not for for example is a set data and that's what we have the dude look removal known any object that it doesn't concern somehow proved you cannot escape so current friends and even even in the cases where it cannot prove things so ordering kind of crazy things like said find different cases 4 groups 1st case object very quickly and then if you go on you going into the store and strong cases EasyChair when they designed and crazy things it's like that is fine but when use the and neat application of interlocking he does not sort of right if you're writing a Python program under running it on top of dies that saying but is still need to carefully use the threads on the rocks and everything in your Python program so this is this is why there is a completely different
approach that that has some traction nowadays because also because it can be used on the button is just another thing is there is some properties that if you want to run so summer piece of code in when we the cause a new machine you just start several processes and then the process is when you need to you need to design your programs in such a way that it's possible to exchange some amount of data that is not too much are when it's indeed is actually a model that that's when some people think it is a good idea or not enough agreed force 4th there are individuals for each it's a good idea it's gives a clean model of the different and randomness mystique program you don't have the issue of locks when you and then when negative points that's when you have limitations on what kind of data can exchange between processes convert your actually overheads for extensions of that the no around and also sort compatibility with an existing Fred application like is still good mother that's not 100 per cent solution clear so this is now what I'm talking about the and this is where the selection of memory is but it's a way to run to run is interpreted as if it had been governing cannot so this is the main difference is that instead of blocking so when you have several of them when you have a lot of such set friends acquired using means something or threads and you have to wait and only 1 that can proceed but here the difference is that these resources are this kind of used to you can steal your old friends but if you do so you know where when you grow old stress optimistically and then when you need some bookkeeping to check what it's friendly reads and writes and then z you're at some point you need to check it is a actually we did something that conflict them out and so I hope is that in the common case as you do not actually get a conflict and in that case and everything works fine and you succeeded in running during your Mr. professors in parallel 1 another hopefully rare case of a conflict you need to cancel and restarts 1 of the friends so obvious that this is a very very high level you um measurements and that's just talking about st which is software to election memory series actually ht ends and exist as well hardware to maximum memory that implement and so has CPU actually so the latest generation of Intel processors has HTN um when he was so handsome hybrids this that that there are some some clever some clever implementation of St and that use HTM internally for some things um well most of most of these 3 different solutions are still mostly research only because when it it STM so far has a huge overhead like to do the job so taking cross-checking of memory conflicts and so on these data has typically at least 2 times but more like 4 or 5 or 10 times slower on way to a court uh so and you have ht and hardware collection which is inferior great but that in in practice so far too limited the so far I mean this His current generation is it's into to to to support mean when we tried to write a paper by ht and but it's it could be it gives the results of if you run the transaction for 1 byte code then you have the beat of chance that sometimes it's not too long ago the no OK so is that that and the for giving them most your your St and mt really the software the software product so bad for you here in this slide a wrote easy in court right the Latino explain why season which by election is that easy but to go inside pipeline replacing because you just replace this the place of holding the land acquired and didn't really by respectively started convection stuff that election is the now in the hands of but this had directly write it as a as a library in the right order this is the amount estimate Hamming codes so and this is the the point I'm making here is that if you actually think about it it if he given get acquire and the reason such also actually something does not completely trivial acquiring a lot acquiring the is done with the library that is for example on Linux so that the thread library and the different libraries itself is also a bit crazy I need a lot is that yes OK part of the would look naively yes just 1 moment and a nice puts you often not looked on 1 for yes OK 9 and yes it works but actually it's not how it's done at all in because president to optimize far more using using clever techniques so some so here I'm trying to make the same the same argument it's easy to add correct cause to start and end mention impact that's the hard part is to write a letter OK and when she understands by like the end of my just mentioning these library here is that is the hard part would actually also be used in the back which would be greater than we get survive and to propose school but when in that there is 1 catch that it's what they do what's reference counting that the main can't when this definition of that there are informed hard to missing attention the OK I have
a nice diagram to show how it it works so that baby that is the
not In this diagram I have 2 friends and I want to run things that have no books horizontal books and land if I was if I if running these codes with a normal Python we should that then we would get some of the diagrams with boxes that's ready to Canada in parallel right so that take more time in total here with STM is actually running for but the point is that you are at each thread is Running beats independently by being very careful about what it's changes when what was the audience changes occured kept local so this is what is done on the 1st part of the book that and then at the end of each selection the 2 books we do it in a special favors that's needs this time to be synchronized across board cause we push on changes so that would cause we see that so that the uh the effects of we get from the programmer is as easy as went in this case if we had truly 3 independent pieces of software running 1 of these cases where running you here here you on so basically that 3 is run after was serially just the 1st is the 2nd is the 1st these and it just happens that so that there is a piece that there is a preparation to coming off the transaction occurred before on maybe did did maybe another work before so it so so it means that some of the mother when when when you think about how it works it is due essentially S. sterilization so you still have 1 friend then another friends then is the 1st thread that can so that means that can produce results and set against the so in this sense is exactly the same as the did so the type by STM works the is exactly like the regular part that we do not have any additional issue in India in a new edition races conflicts etc. it going a small demo here I don't know I don't have any reason that the pipeline STM so I will just equal through source of
the is an example of
using as incident at an example of animal that task forces on the commands of the latest proposed so it's not from me if you're is a this is prime function and you want to compute how many of the numbers to 5 million are trying when you don't like this and then it if you want to do the same thing on MIT in the process of when cell
several ways for example you can use processing modules
is then it looked like who not on all the years a few will
just switch another 2 the so
he's using limited processing module and so it's doing the same thing even when interaction so it's really using several processes so we this happens to work because he's prime is simple enough however it's not actually doing the same thing because when because if somewhere in the use type rating by trying the that by find the somewhere there was something about state of something except that it would not work in the same way anymore because but I you have now you run to different processes that gave when
I kind cannot transmit and cannot actually runs 1 time but that it's on the order of well you you have to believe me this letter and about 6 gongs typically it's . to maybe this 1 runs something that some of the next 5 seconds and the reason right so it's not twice as fast and then you can but then you can write a version using multiple threads the barebone threads uh reads starting here import Fred at the bottom starting to friends and then every friends but you have appealed to communicate the ranges and every every frame read from the same cues so get next tranche to do on if we run this on top of a regular pipeline needs gives us something like 80 columns so it means that like 60 gone but we should be more because they're in a bit of overhead on before you run it on top of by STS than we get 4 . 8 columns which is the fastest and described so far which is cool OK now is if you buy the she on teaching decay because yes this example runs faster with to cause already which is great however there is the numbers on on numbers here has been carefully tweaked to show this fact if you changed the face of Morrison's is prevent that but the point is that they are getting there you this is
a good and now what I just showed you say yes you can use multiple threads and each point it's faster good however possesses a main there's a point and trying to push forward in by is not well it's not just that you yes you can use threads on on the nose your armies of locks and divides and flavor the real point is that you can use threads on various costs keep looking you can have 2 transactions utterance of domestically in parallel I mean she artists known you this bank so here is not times the exact same diagram however now every block is no longer just from 1 from the dealer's quiet Geri's at the end now it's from acquires some rocks such as defining application but to really is the stuff that find in your application so it here dignity there is no difference right is just 1 of is just another however the difference here is that you can now start to think about the obligations as it's starting with friends it's using just 1 of the time explicit looks at you imported from different modules but just 1 and everything it does it all your friends whenever it wants to do something it's acquired the stuff right it's something that makes no sense at all of the URI it's something that you wouldn't do we normally because when you why why use present but that's the point is that you can do it and then it can the tried to paralyzed by placed in so it means that when it is as if Ian and pushing forward here says that it's a kind of program that I could foresee could be be done with faced yet if when you you put the you you yes it's still using federal but you you you put them in some corner of your program we just 1 and when you have a completed coarse-grained level basically extremely so I mean an example here is an application that is
traditionally not nifty and don't have each year and the way my hands during away from
all my hands so it is this then always is about uh what the web server running could the twisted could be torn although whatever so it's it is a web server which is not using thread of getting the dogs like like you you get a request for an image is being http request in your processing units and maybe you bring some complicated computation and then you pushing answer so the
point is that the if you just take 1 of these frameworks like what whether you add a thread toward and above so every incoming request you asked the French by pushing something that you you are the threat justify Fed would not be processes this request so beings that Freddie we'd actually processing requests and send on answer back to the mainframe we which another cue for example then if you do that then you can actually run went by by GM can run the program on me to do cause in parallel and the point is that each of these single such in the different words are running by acquiring 1 of 1 so it means that is is different pieces of the year to run 1 after the other so it works just like just like you do not have let but that in this example and this is this is an example where it's we clearly mostly the and in most cases we able to run independently from each other remember so yes this is a
summary what I'm trying to say is that based STEM programming model yes it's useful friends and lots of stuff fully compatible with going put and but it's here young not saying it's not everybody should use freedom lots of here and things you should make more use of a threat to library and use only cost cross-grained looking because that's enough sure so bad taste in so yes for being at 3 different and to a different kind of application of our starting mediate you you can't you can't have a MIDI process like interface where were you can use a pool of threads as things every lecture in the medium processing of friends and option instead of a process option actually but some speech pointless because well-defined training meetings in brother and only the but here's the point is that you would have the processing but friends that were an say by party 1 not the and you can you can extend the disadvantage Donald little like annotated explain you can also went it is maybe a bit further down the road but it's already thinking think that if you have a stack less so or enacted Jevons system so it is a system where you have call routines recently but but coroutines tend to do something is independent from each other so you can you can again so you do the same thing this time required really is look around educational 1 at meeting base of the Cone which means from 1 switch to the next for example the so yes this is the end result the gain something that works but continues to to work exactly the same way as in you take your your existing your existing cyclists application and continues to work but unreadable because yes this is the current state when the basics works so so so in best case of 25 to 40 per cent overhead which is much better than originally planned means good enough so that usually with just 2 friends it's already faster so this is overhead of on running only once said and just when everything I just said before what addition of looks walking is in the same way as of the log is actually wrong but it should be that should be now we have a walk around the Which atomic that want explained its temporary now I'm just when there are tons and tons and tons of things to improve the so yes as a summary this approach has the potential to enable currently small CPU-bound committee for the programs or and can be used as a replacement of mental processing on structure is then also be used in applications of are not explicitly written for that that if you have but if you're basically anything that could but eventually you replaced by an accord committee processing but look forward but it's not the orange yes so the benefit is you keep and of coarse-grained however the issue is that you keep the locks coarse-grained and this is where it is something that has actually average user that we thought to leaders we as median median Michael Walker when that's very clear about so far the wheel these works very nicely completely in any case I don't think selection because you you have the issue of you have if you are running things that should be run in parallel but are actually not because for example pieces would increase and about counter this is enough to to make the pieces conflict and then if the conflicts are games foreign the necessary like so we use an example of this is an example of systematic conflict so it means it means that if you take your programs ons apply just as well as said expected to go just in times faster than the 2 may actually not going and faster toward and the reason is firmly because a systematic conflicts OK so so it it's something that we need that's we need to which to develop new do we need a way to find ways constant and figure out over GMM incrementing discovered count and that's not analysts in that way or this way when we started debating wasn't profiling tools that differ only in not done uniquely on August we problem very probably the very much needed so here here is if you want if you want to compare this approach the stumbled the defending approach in a style defending approach everything is fast cool but then things crashes don't look correctly this year at 1st everything is slow but works correctly and then you can improve you can improve by detecting the conflicts on on and suggest that the that's interesting studio works correctly as in works is the same way as if you had only 1 fate which which I think is a very good approach is specially for for which like 5 and yes no yes so far from Governments and mentioned yes so it's not prediction it's still undefined things there is you can download it at least you're as the 1st part of the reason he talks on Unix 64 for known on your inscribed having thank you the right thing to in the book
questions before the next session if you got the question will you got 1 more violence and you only will the mean I'm up when you wars with the following quarters of fine grained can be for example library as a global internally doesn't mean that what is something we want a 2nd would already few this performance penalty or is it still OK
what is the length of the on atomic that that with a rational basis we got here is the you know what the old our home OK so how long can cost of to election around the
point is that in the inspired based image right to enable arbitrary under there is no unit of all well if you if you really if you really do it's tools for that you will be we have the you will be applied for a long time along way so is there not important for the program and the duty of those of the lower bound on the size in dual is too weak to get the best performance is to be taken to be long enough so that so that's making it even shorter would introduce too much of overhead all about that yes this note that the other thing that comes to in the we have a right to know that the overhead is when I can so far because it's a bit like it's in development but that's the power for the use of the overheads should be should be like vectors present and everywhere so yes that's no i mean yes yes is that if we remove some of the whole overhead but ends of these lines is also much shorter so it's hard to improve so it turned out to be some kind of stimulus they are not yes long so why do I need to remove resulting from the base year all areas and the other thing that we did regression offline because this is this asked for the problem of objects on the you is so this is the that is what you see is that the use of so the question is how how do you wrote about elections that such actually had already side effects the point is that that is the point is that the elections should not have side and that's that is actually that it's very nice in the model so that because if you're going to have a side effect like right file you you know you really is the deal so she it means you enterprises selection and you do you do with actually writing outside elections and in a way that the other might William and the hassles is T and there are companies such as we try we can take this this intersection and say OK I would like to do this from the beginning to the years it went to expose something like that you know this is the model gives you the the real goal is to have this can internally and not expose that all right to the language of the which of the and that
Computer animation
Computer animation
Computer animation
Computer animation
Computer animation
Computer animation
Computer animation
Computer animation


  469 ms - page object


AV-Portal 3.8.0 (dec2fe8b0ce2e718d55d6f23ab68f0b2424a1f3f)