
Lessons learned in X years of parallel programming

Formal Metadata

Title: Lessons learned in X years of parallel programming
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Subject Area
Lessons learned in X years of parallel programming [EuroPython 2017 - Talk - 2017-07-14 - Anfiteatro 2] [Rimini, Italy] There is a lot more to parallel programming in Python than multiprocessing.Pool().map. In this talk I will share some hard-learned knowledge gained in several years of parallel programming. Covered topics will include performance, ways to measure the performance, memory occupation, data transfer and ways to reduce the data transfer, how to debug parallel programs and useful libraries. I will give some practical examples, both in enterprise programming (importing CSV files in a database) and in scientific programming (numerical simulations). The initial part of the talk will be pedagogical, advocating the convenience of parallel programming in the small (i.e. in single machine environment); the second part will be more advanced and will touch a few things to know when writing parallel programs for medium-sized clusters. I will also briefly discuss the compatibility layer that we have developed at GEM to be independent from the underlying parallelization technology (multiprocessing, concurrent.futures, celery, ipyparallel, grid engine...)
Thank you very much. It is always nice to be at EuroPython. I have actually been around for a long time: it is 15 years since 2002, and that is the reason for the X in the title. In these many years of Python programming I did some parallel computing at more or less every company I worked for, but it was not real parallel computing, nothing really difficult: I was just parallelizing a sequential loop [unclear]. In recent years, let's say since five years ago, I have been working for this foundation, the Global Earthquake Model, which I will talk about briefly, and here things really changed: [unclear] we simulate earthquakes and things like that. So in these last years I have really been doing parallel computing a lot, and there are some lessons that I would like to share with you.

This is a picture that shows what we produce at the end. It is a map of the earthquakes that may happen in the next ten thousand years in California [unclear]. We do this kind of plot, and we also do estimates of the damage that these earthquakes can produce [unclear].

There are two reasons why I am showing this slide. The first is to thank the sponsors: I am here thanks to the sponsors. The second reason is that even you can be a sponsor. You see big names here, like the insurance companies, or you see states [unclear] such as the United Kingdom [unclear] and several other countries, and the USA; they have given us a lot of money, because we also have projects in third world countries [unclear] in Africa, and we are also working in South America and in Central America. It is very difficult to find a place in the world where we do not work. You don't need to be one of the big names here: you can collaborate with GEM even if you don't give us money, for instance if you want to give us free access to computing resources, or software licenses, or [unclear]. You can contribute. This is a non-profit foundation, so it is also [unclear] because you contribute to helping people.
So this was my introduction. I work on the OpenQuake engine, which is our [unclear]. GEM is not only about the engine; there are other things that we do. [unclear] I am a [unclear] physicist, but not a [unclear], so I am not an expert in these things: don't ask me questions about earthquakes. I am just saying that there are other things that we do. There are some colleagues of mine here too; you can recognize them by the T-shirt [unclear]. For instance my colleague here is doing visualization software: it is a QGIS plugin that you use to plot the results of the engine and to do the analysis work that comes after [unclear]. There is also the platform, because we have a web platform [unclear] where you can share maps, download data and things like that.

The engine is the heart of the project. It started in 2009, so eight years ago. I have been working on it for more or less four years [unclear], so the start was before my time. [unclear] It is all on GitHub [unclear]. We support Python 2 [unclear]; we actually have a plan to drop the support for Python 2 next year. There is also a small part which is the web interface, but mostly it is the computational part, and we focus a lot on the scalability of the engine: it works on [unclear], it works on your laptop, it works on a small or medium cluster, and we believe it also works on a supercomputer, but we don't have access to one. So if you want to donate computation time, we are here. Again, this was the introduction.

Now we start. This is not my laptop, because there were a lot of projector issues, so I have to accept that.
I wanted to show a demo, but this is not my laptop; it was lent to me. This talk will say nothing specific about earthquakes. I want to be more general, because I expect that most of you are programmers and not necessarily interested in numerical simulations. My point is that parallel programming is useful even to people who are not doing numerical simulations. The reason is that there are the problems called embarrassingly parallel problems, which come up almost all the time: you have some data to process, you split the data in chunks and you apply the same algorithm to each chunk. There is also good support in the Python world, so it can be even easier than [unclear].

I will give you a motivating example, which comes from real life. Before coming to GEM I was working for a finance company, where I did several things. Among them, I was in charge of importing data into our Postgres database from CSV files with a fixed structure: for each financial security there were the date, the price, the rating and so on. The problem was that we had thousands of files for thousands of different [unclear] options and other securities of this kind, and we were importing more than 60 million rows or something like that. This was an issue because the workflow was to import the new data every day and then compute the prices [unclear] during the night. The night is eight hours: if the import takes [unclear] hours you have five hours of computation left, but if the import takes too long you don't have enough time for the computation and you cannot provide the users with what they want.

So we had this problem and I was thinking about how to speed up the importer. One day I wondered: what if we do the import in parallel? I was not very convinced that this was a good idea, because in the end [unclear] you are importing into the same database, so I was not convinced that the parallel approach was going to win. On the other hand, Postgres is able to make use of all the processors that you have. So let's do some experiment and see what we can get. The final message I am going to give you is that this looks like an I/O-bound problem, but actually parallelization helps.

For the demo I produced 105 files of fifty thousand rows each, so more or less 5 million rows to import. The real case was much bigger, but this sample I can put on my laptop and look at. I am not doing any magic here, except that I am dropping the indices, because, as you probably know, the best policy if you want to import a lot of data is to drop the indices, import, and then restore the indices; and maybe restoring the indices takes more time than the import itself.

So how much time do you think it takes to import 5 million rows on this laptop, which is [unclear], not a workstation? The idea was to do the demo here, but this laptop is not mine, so I will just give you the answer. You have to guess: five hours, five minutes, five seconds. Can you raise your hands? Who says five hours? Who says five minutes? Who says five seconds? [unclear] It takes essentially one minute on this laptop, which is really impressive: 5 million rows in one minute. I was really impressed; it is faster than I expected, and this is sequential.
This is the code I am using. Of course I am using COPY FROM, which is the right way to go if you have to import CSV files. I am importing sequentially in this example and timing how long it takes. The first lines here [unclear] create the table. Then I look inside the directory where I have my data, I loop over all of the files, and I call this import function, which is just doing the COPY FROM by calling psql. So for each file I am starting the psql command again. This is not the most efficient way you can imagine, but still, even with the startup time, it takes about a minute [unclear].

Now suppose I want to do this in parallel, to see whether it is an improvement or not. Parallelizing is extremely easy. You know that the Python standard library has the multiprocessing module [unclear], with the common Pool.map that I assume most of you already know. Is there somebody who does not know about it? No, so everybody knows it. Maybe you don't know that there is also the concurrent.futures module; if you use this module you get the ProcessPoolExecutor, [unclear] a pool of processes that you just use [unclear]; in this case either is fine. The processes then start the psql processes. So how much does it take? It takes 21 seconds, so it is three times faster on this machine. This is a machine with [unclear] cores with hyper-threading, so they look like four, but if you have done this kind of thing you know that [unclear]. If this was dominated by the CPU I should get a factor of 2 speedup, or up to a factor of 4; I get a factor of 3. So something here is limited by the CPU, but something else is not.
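The sequential loop and its concurrent variant described here can be sketched as follows. This is a minimal illustration, not the speaker's code: the database name, the table name and the helper names (copy_command, import_file, import_all) are assumptions made for the example. A thread pool from concurrent.futures stands in for the process pool mentioned in the talk, on the assumption that the heavy lifting happens in the external psql processes anyway.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def copy_command(dbname, table, csv_path):
    """Build the psql invocation that bulk-loads one CSV file via \\copy."""
    meta = f"\\copy {table} FROM '{csv_path}' WITH (FORMAT csv)"
    return ["psql", dbname, "-c", meta]


def import_file(csv_path, dbname="testdb", table="price"):
    """Start one psql process for one file (db and table names are made up)."""
    subprocess.run(copy_command(dbname, table, csv_path), check=True)
    return csv_path


def import_all(csv_paths, import_one=import_file, workers=1):
    """Import every file: one by one if workers == 1, else concurrently."""
    if workers == 1:
        return [import_one(path) for path in csv_paths]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the files
        return list(pool.map(import_one, csv_paths))
```

Calling import_all(paths, workers=4) on a list of CSV paths would correspond to a four-worker run; timing both variants with time.perf_counter() is the only reliable way to tell whether concurrency pays off on a given machine.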
There are a lot of exercises of this kind that I suggest you think about. What happens if you have shorter files, or bigger files? How much does the time to start the psql command affect the performance? Maybe you want to use psycopg and access Postgres from Python; there you have a copy_from method in psycopg. Then you can think: if I do this concurrently, how does it work? [unclear] For instance, do you want one connection per thread, or one cursor? Which is the most efficient? Should you use processes or threads? How much does this depend on the hardware? I was very surprised by this: on this laptop it takes one minute, while in my office I have a new workstation with 10 cores, very powerful, and it takes 80 seconds, so that machine is slower than this laptop. These are the kinds of things that may happen. Also, if you try something with one version of the software and it is very slow, [unclear] maybe after six months you try a new release of the same software and it immediately becomes faster. [unclear]

What you really, absolutely need to do is to measure. You need to have a way to instrument your code according to your specific use case [unclear]. This is the most important thing, and I want to show you how we do that in the OpenQuake libraries, just to give you a suggestion. I am not saying this is the best monitoring in the world, but it works for what we are doing. Essentially you import a Monitor object, which is a context manager, and you write "with monitor" around a piece of code: this will measure the time spent in that block of code. We also measure the memory [unclear].

We have a Starmap class [unclear] that you use like the starmap that you know [unclear]: essentially you map a function over a list of tuples of arguments. [unclear] We can use processes, we can use threads, we can use any kind of mechanism [unclear]; the interface is homogeneous, and depending on a configuration parameter you can use one way or the other.

We also like very much the HDF5 file format, which is really good if you need high performance and have arrays and lots of numbers to store, so we store the performance information inside the HDF5 file [unclear]. So now the function that imports the data is the same as before, except that in addition the last argument is the monitor [unclear], and this is the instrumentation part. Then you instantiate the monitor and you give to the monitor, at the bottom here, the file where it will store the results. We have a convention that there is a directory in your home, which is called oqdata, and there we generate files with a number, like [unclear] 1, [unclear] 2: every time you run a calculation it produces a file. These are the arguments that you want to pass [unclear] and the monitor, and then you run the computation, and it is as fast as before.
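A monitor of the kind described here can be sketched with the standard library alone. This is not the OpenQuake implementation: the class below, its attribute names and the toy task count_rows are assumptions made for illustration, tracemalloc only sees memory allocated by Python itself, and storing the numbers in an HDF5 file is left out.

```python
import time
import tracemalloc


class Monitor:
    """Accumulate wall-clock time and peak Python memory of a code block."""

    def __init__(self, operation):
        self.operation = operation
        self.duration = 0.0   # total seconds spent inside the block
        self.peak_bytes = 0   # highest peak reported by tracemalloc
        self.counts = 0       # how many times the block was entered

    def __enter__(self):
        # Start tracing only if nobody else did, so we can stop it safely
        self._started_tracing = not tracemalloc.is_tracing()
        if self._started_tracing:
            tracemalloc.start()
        self._t0 = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.duration += time.perf_counter() - self._t0
        _, peak = tracemalloc.get_traced_memory()
        self.peak_bytes = max(self.peak_bytes, peak)
        self.counts += 1
        if self._started_tracing:
            tracemalloc.stop()
        return False  # never swallow exceptions raised in the block


def count_rows(lines, monitor):
    """Toy task: the monitor is passed as the last argument."""
    with monitor:
        return sum(1 for line in lines if line.strip())
```

Because the monitor accumulates over repeated use, the same object can be handed to many calls and inspected once at the end, for instance by printing mon.operation, mon.counts, mon.duration and mon.peak_bytes.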
By default, on a single machine, it uses the concurrent.futures module, and by default it spawns processes, but you can also use threads if you want; there is an environment variable for that. You can also use celery, and you can also say no distribution, which is really important if you want to debug something: if you have an error, you set OQ_DISTRIBUTE=no and you run the calculation with the debugger inside a single process. That is important. We also have other backends, for instance an experimental implementation of [unclear] that works on my machine [unclear], but anywhere you have a kind of map or map-reduce framework we can easily extend it. So, as I was saying, this is really a lot more than just using Pool.map from the standard library.

We have commands with which you can see the performance information for each task [unclear], such as the duration of the tasks, so you can see the slowest of the tasks. This is a very common problem that we have: sometimes one task is so slow that it dominates the computation and everybody is waiting for that task to end. Another very important thing that you should measure is the data transfer: how much data you are sending and how much you are receiving back [unclear]. This is very important when you are in a cluster situation, because [unclear] it can happen that your calculation fails because the data transfer is too big, or maybe it does not fail but it becomes slow, or you have memory issues, things like that. So you need this kind of information.
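The combination of a backend-independent starmap, a "no distribution" mode for debugging, and per-task measurements of duration and data transfer can be sketched as below. Everything here is an assumption made for illustration (the function names, the distribute parameter, the use of pickled size as a proxy for data transfer); it is not the OpenQuake API.

```python
import pickle
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

EXECUTORS = {"threadpool": ThreadPoolExecutor,
             "processpool": ProcessPoolExecutor}


def run_task(func, args):
    """Run one task; report its duration and the pickled size of its result."""
    t0 = time.perf_counter()
    result = func(*args)
    stats = {"duration": time.perf_counter() - t0,
             "received_bytes": len(pickle.dumps(result))}
    return result, stats


def starmap(func, arg_tuples, distribute="no", workers=4):
    """Apply func to each tuple of arguments with the chosen backend.

    distribute="no" runs everything in the current process, which keeps
    tracebacks and debuggers usable.
    """
    arg_tuples = list(arg_tuples)
    # Pickled size of the arguments approximates what is sent to a worker
    sent = [len(pickle.dumps(args)) for args in arg_tuples]
    if distribute == "no":
        outputs = [run_task(func, args) for args in arg_tuples]
    elif distribute in EXECUTORS:
        with EXECUTORS[distribute](max_workers=workers) as pool:
            futures = [pool.submit(run_task, func, args)
                       for args in arg_tuples]
            outputs = [fut.result() for fut in futures]
    else:
        raise ValueError(f"unknown distribution mode: {distribute!r}")
    results = [res for res, _ in outputs]
    stats = [dict(st, sent_bytes=n) for (_, st), n in zip(outputs, sent)]
    return results, stats
```

With the statistics in hand, max(stats, key=lambda s: s["duration"]) picks out the slowest task, and summing sent_bytes and received_bytes gives a first estimate of the data transfer before moving to a cluster.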
A nice thing is that this approach does not require you to decorate the task function. If you are familiar with celery or other frameworks, you typically have to put a decorator on the function; here there is no need, so you can even parallelize a function from a module where you do not have access to the source, and you can change [unclear].

Of course the underlying libraries have their problems; we spent several years with them, so we know [unclear]. For instance, if you are on a single machine and you kill the parent process, with multiprocessing the children stay alive. This is rather annoying, but we can work around it, because there is a library for process control, which works only on Linux [unclear], that kills the children if the parent dies. Another trick we use is to fork the processes before loading the data, because if you first read [unclear] gigabytes of data and you have ten cores, you consume 50 gigabytes of memory, so it is best to fork before. There is also a safety functionality, in the sense that we measure how much memory is available: if you are over 80 or 90 percent of the available memory [unclear], you don't produce tasks and you die, things like that.

We also found problems with [unclear]. For instance, they have this feature that, once your calculation ends and the tasks are done, the results are kept in memory [unclear]; so if I am returning [unclear] gigabytes of data [unclear] you are still holding these in memory and then you die. We had to do some tricks to avoid that. [unclear] I think this is a feature [unclear]; celery is not really meant for numerical computing [unclear], and we are considering the possibility of [unclear], but that is the situation for the moment.
We have other interesting things. There is a SQLite database which stores all the metadata about the calculations: when the calculation started, when the calculation finished, who started the calculation, so the user, and things like that. [unclear] And because the information is in HDF5 files, we can copy those files [unclear]. There are facilities to convert Python objects and store them in HDF5. We also have things like reading the seismic sources from XML, but that layer is generic and you can use it as a library for reading and writing XML. We have what is actually an accumulation dictionary, which is convenient because we have dictionaries of arrays [unclear], and you can sum these dictionaries. [unclear]

So you may think: how can I get the engine? [unclear] It is on GitHub, and you can install it [unclear] on Mac, Windows or Linux; that is easy. But I must warn you that this is a framework, and if you know me you know that I am not particularly fond of frameworks. [unclear] like in the keynote, [unclear] which is really wonderful, but my experience was that whenever I used a framework I was always fighting against the framework, because I needed to do something slightly different. [unclear] I am half joking, but a framework is very good if you write it yourself. I think you must write your own framework for your own application, but think twice before adopting something like this, because I don't know any web framework that I like, not even one of them.

So my idea here is to show you the ideas: I want you to take the ideas from the engine. Look at the code, which is very small [unclear], so maybe copying or stealing is fine. If you really want to use the code, it is under the Affero GPL license. And this is the documentation [unclear].
Now, quickly, I want to tell you some lessons that we learned. One very important lesson is that you should not trust extrapolation: one example is nice, but when you scale up you will find problems, so always think twice before believing a small example. For instance, in this example the import is totally different if you start from an empty database, as I did here, or from a database which is already filled with hundreds of millions of rows: the performance is different. It also depends on the machine: on my laptop I have four processes, therefore four connections to Postgres, and it works. But at some point there is a limit on the number of connections in Postgres: imagine having 250 cores [unclear]; you need to think about that. [unclear] And even if you are under the limit, [unclear] there is context switching and contention on the resources, and things can become slow. So don't extrapolate from small examples: this is the first thing.

Another very important lesson is that the memory is very important [unclear]: you may need a lot of memory, and it is difficult to measure the memory [unclear], so be careful about that. The best advice is to use numpy arrays everywhere, especially for the things that are transferred between tasks [unclear].

Another surprising thing is that running out of memory in some sense is good, because an algorithm which does not run out of memory might take forever. We have had these kinds of situations: the calculation is running for one week, two weeks, a month, and we don't know what is happening. It is much better for the scientists if it breaks after one hour, so the scientist looks at it, sees that this parameter was wrong and changes it [unclear], or we find a way to fix it. So it is important that you give information early.

Also, wrong but simple and fast is better than correct but complex and slow. If you have something which is simple and fast, even if it is wrong you can fix it, because it is simple and easy to fix; but if you have something which is complex and slow, you cannot make it fast, because it is complex and you don't know what to do. At the beginning I was doing things properly, so to speak: [unclear] we spent days fixing all the tests [unclear], and then we discovered that it actually worked on my machine but not on the cluster. Now I do it differently. I say: I am really sorry, but I will not bother making all the tests pass; I know it is wrong in some cases, but I put it on the cluster as soon as possible and look at the performance. If the performance is good, then I fix the tests; if the performance is bad, I change approach. [unclear]

It is very good to have a layer which is independent of the parallelization technology, because you want to try and see what happens if you use threads instead of processes [unclear] and so on. We also have a very simple approach to concurrency, which is just this map. I had something more complicated before, but last year I went to a high-performance computing training, and they were only using map, so I realized that we only need map too, and I removed the [unclear] and other things that were there.

And if you have problems with the data transfer, instead of sending the data across the wire it is best if the workers read it: [unclear] the workers read from the shared file system, directly from the HDF5 file, so you don't need to transfer the data. It is much better to do it like that.
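The advice about letting workers read from a shared file, rather than shipping the data to them, can be illustrated with a small sketch. The talk refers to HDF5 files on a shared file system; to stay within the standard library, this example assumes a flat binary file written with the array module instead, and all names (save_values, partial_sum, the file name) are hypothetical.

```python
import pickle
import tempfile
from array import array
from pathlib import Path


def save_values(path, values):
    """Write doubles to a binary file that every worker can reach."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def partial_sum(path, start, stop):
    """Task: read only the slice [start, stop) from the shared file."""
    chunk = array("d")
    with open(path, "rb") as f:
        f.seek(start * chunk.itemsize)
        chunk.fromfile(f, stop - start)
    return sum(chunk)


values = [float(i) for i in range(100_000)]
shared = Path(tempfile.mkdtemp()) / "values.bin"
save_values(shared, values)

# What would travel to a worker in each design
by_value = len(pickle.dumps((values[:50_000],)))            # the data itself
by_reference = len(pickle.dumps((str(shared), 0, 50_000)))  # just a pointer
```

In the second design each task receives a path and two integers, so the size of the task arguments stays constant no matter how large the dataset grows; the cost moves to the file system, which is why it has to be one that all workers can actually read.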
You also don't want to write directly from the workers [unclear], because you can easily get bottlenecks that way.

Another surprising thing for me: I thought that, now that I am doing high-performance computing, I would be using Cython or [unclear] all the time. Actually I do not: [unclear] sometimes you need it [unclear], but 99 percent of the time I don't use Cython, because our problems are different: [unclear] they are about the way to store the arrays, the layout of the arrays, things like that. This is the kind of information I get from the instrumentation.

So the final message, and then I am finishing: think about your own domain and write your own instrumentation, because only you know where the right place to get the information is. And that is it.

[unclear] We may have time for a few questions [unclear], but we will be available: you can recognize the GEM people by the T-shirts, and you can find us even at the coffee break [unclear]. Thank you.