
Running Python code in parallel and asynchronously


Formal Metadata

Title: Running Python code in parallel and asynchronously
Number of Parts: 160
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract:
Running Python code in parallel and asynchronously [EuroPython 2017 - Talk - 2017-07-11 - Anfiteatro 2] [Rimini, Italy]

My outline will be:
1) What does it mean to run code in parallel in Python? How does it differ from concurrency? Can they be applied at the same time?
2) The GIL and why it complicates parallelism in Python (CPython), but only to some extent.
3) The difference between a thread and a process from the OS point of view.
4) When parallelism in Python is useful and when to avoid it.
5) How to achieve parallel execution in CPython and how to do it properly.
6) Possible traps when using parallel programming in Python.
7) What happens if the code runs both in parallel and asynchronously?
8) Is it really beneficial?
9) How can such execution be achieved?

As the outline shows, I will focus on the parallel part, as it is an important topic in our current time of multicore processors and multiprocessor systems. The topic has been discussed many times, but mainly from the scientific point of view, where it has been used for speeding up calculation times. I will not go into those use cases (e.g. using MPI) but rather discuss it from a web development point of view (e.g. multi-worker applications).
Transcript: English (auto-generated)
Hello, everyone. Can you hear me well? Okay. Welcome to my talk. I'm Michal, and today I will be speaking about running your Python code in parallel, for the most part, and asynchronously.
To be honest, I've never spoken to such a big group of people, so excuse me if I'm a little bit overwhelmed. And let me advertise some other talks a little. You can see that the asynchronous and parallel topics are hot right now, so even during this conference we have several talks about them. I want you to understand what my talk is about, and not mix it up with the other talks, which I also encourage you to see. There are some asynchronous talks and some parallel talks, and there's also a talk about Python at CERN, which I think will be interesting, or I think it's a poster session. This talk tries to be an overview of the topic, so it's not an introduction; that's why I labeled it as advanced. You might also feel that I skipped some parts, but I wanted to put it together as an overview, so you can later research what's interesting for you, and so I don't bog you down with a lot of details.
Okay. So a few words about me. I worked at the LHCb experiment at CERN, looking for antimatter, for some time, and later I decided to pursue a PhD in computer science. Then I heard that if I dropped out, I would probably start a multi-billion dollar business, but for some reason that hasn't happened yet. Currently I work at Akamai as a FAD developer, where FAD obviously stands for frameworks and tools. My job at Akamai is to make sure that we use the best tools we can. And how do you define the best tools? Sometimes you hear that Facebook is using some tool, or that Google is using some other, but do we need that? Do we have the exact same scenario as them? So my job is to create tools, and to select tools, that are the best fit for us. Akamai itself is a content delivery network and cloud service provider. We are not very well known in Europe for some reason, but we have one of the largest, or the largest, networks of computers talking with each other, and we are responsible for between 10 and 25% of all web traffic. We also have some security products launched recently, and we have 16 offices in EMEA, both sales and engineering.
Okay, there's a lot of mess when it comes to the basic concepts in asynchronous and parallel programming, so let me clarify some things first. When you have one pipeline and one worker working on it, you have serial, or sequential, execution. When you have one pipeline but multiple workers, and they work at the same time but not in parallel, I would call it concurrent. You may not agree with me, and some people do not, but let's assume at least for this talk that it's correct. And we also have parallel execution: multiple pipelines, multiple workers, and they actually do their things in parallel. When I think of concurrency, I usually think about preemption. How many of you know what preemption is? Okay, half of you. So let me just say that preemption occurs when a thread has CPU time and the operating system's scheduler decides that some other thread needs that time more: the first thread is stopped, the other thread is put in its place, and then they keep switching roles until their jobs are complete.
This is why you can sometimes see that things are concurrent: you achieve results in a certain amount of time, but they are not truly parallel. Okay, so how would you call this? I would call it a headache. Or you might call it parallel and asynchronous. Another thing I need to clarify is the difference between threads and processes, because they are often confused, or processes are wrongly called "bigger threads". Threads are the place where your code is executed. Each process has at least one thread, and a thread can be scheduled for execution and get CPU time. All threads share the virtual address space and system resources of their process; they do not share stacks or local variables, but they do share the process heap. A process, in turn, is an execution environment for threads: it has its own address space, it has the executable code, it handles system objects, so it provides everything a thread needs to run. I wanted to clarify that, because sometimes people don't know why the GIL in Python complicates things. So how does this apply to Python? In Python we have multithreading and multiprocessing. With multithreading, we have one process, so one environment, many threads, and only one interpreter, and due to the GIL there's a rule which says that in a Python process only one Python bytecode instruction is executing at once. So if you have many threads, you cannot execute bytecode instructions from different threads at once. With IO it's a little bit different, because IO does not execute any bytecode instructions; so if you have threads and you do some IO in them, you can actually see a speedup, but that's because it's not going through the Python interpreter. With multiprocessing, we have many processes, many threads (at least one thread per process), and many interpreters, and since every process has its own interpreter, they can execute in parallel.
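To make that concrete, here is a minimal sketch of my own (not from the talk's slides) that runs the same CPU-bound function with threads and with processes. Under CPython's GIL, the threaded version gives no speedup, while the process version can use several cores; the exact timings depend on your machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn(n):
    # Pure-bytecode busy loop: it never releases the GIL for long.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls, n=2_000_000, workers=4):
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as ex:
        list(ex.map(burn, [n] * workers))
    return time.perf_counter() - start

if __name__ == "__main__":
    print("threads:  ", timed(ThreadPoolExecutor))   # roughly serial time
    print("processes:", timed(ProcessPoolExecutor))  # can be ~4x faster
```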
Do we have Alex Martelli here? It's always dangerous to cite someone sitting in front of you. During a chat with Raymond Hettinger, he proposed the following classification, which I think is simple but nice. If you have one core, you usually want to run a single process with a single thread. For two to 16 cores, because that's how many cores you can get in consumer PCs nowadays, you can have multiple threads and multiple processes. And why should you not use multiple threads on a single core? Because even though IO might give you a speedup when it's done in a thread, it still needs some CPU time. Not a lot, but it does. So with only one core, you will not achieve any speedup. And when you have 16-plus cores, you usually have multiple CPUs, so you enter the area of distributed computing. Alex proposes that as time goes by, the second category becomes less relevant, as we are in the era of big data, and even one CPU with 16 or 32 cores is not enough. I would argue that for some cases, like back-end web services, it is enough. But you can hear more about that in Raymond Hettinger's talk.
Okay, so you should have some knowledge about that now. When I, as a back-end developer, think of a speedup or a performance boost, which one do I want to use, parallel or asynchronous? Parallel, because I want to execute many things at the same time. And if I want to gain responsiveness and lower latency, I'm choosing asynchronous. So when is running things in parallel useful? When you have big data sets or complex computations, when you have problems with a parallel nature, so-called coarse-grained problems, or when you have multi-worker applications. IO-bound problems are not a good fit for being parallelised, as they require a lot of IO, which is mostly serial, sequential. Problems also need to be complex enough that the parallel overhead caused by process maintenance, communication, scheduling and synchronisation is negligible compared to what's going on inside the process.
Okay, so who knows Amdahl's law? Some of you. Amdahl's law tells you how much speedup you can get when running in parallel. You need to know how big a part of your program has to be sequential, and if you know that, you can approximate how much speedup you can get with a certain number of CPUs. Let's say we have a task that runs for ten minutes, but five minutes of that time is sequential work, like loading data. Then even with an infinite number of CPUs, we can only achieve a speedup of two: the parallel part's time goes to almost zero, but you still have those five minutes. So you really need to know your problem when you start with parallel programming.
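For reference, Amdahl's law can be written as S(N) = 1 / (s + (1 - s) / N), where s is the sequential fraction of the program and N the number of processors. A tiny sketch of mine reproduces the ten-minute example:

```python
# Amdahl's law: speedup with N processors when a fraction s is sequential.
def amdahl(s, n):
    return 1 / (s + (1 - s) / n)

# The talk's example: a 10-minute job with 5 minutes sequential -> s = 0.5.
for n in (2, 4, 16, 1_000_000):
    print(n, round(amdahl(0.5, n), 3))  # approaches 2, never exceeds it
```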
And just to give you an example: some of you might say this is a really trivial problem, but to show you how it works, I had to choose something like that. Here we have a small data set and a really simple operation: an input vector of one million elements, and we want to calculate outputs which are the inputs plus one. We can run it sequentially, and we can also run it in parallel in different processes. So how much do you think the speedup will be? We are running on four cores. Two? Four? None? It will actually be slower, because the problem is really simple and the data set is small; it's not enough to see any gain, and in fact you lose something. And with eight cores or more it gets even more complicated, and you get even worse results. A common pattern in parallel programming is to make a problem more difficult by running it in a for loop. So here we have a problem that's 200 times more complicated. How much will the speedup be now? Two? Four? Almost four. The speedup comes from using processes, which, as I mentioned earlier, we need in Python to execute truly in parallel; arithmetic operations go through the interpreter, so we need separate interpreters. So here we get almost four.
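The exact demo code wasn't reproduced here, so what follows is a hedged reconstruction of the two benchmarks; the function names and the "200 times" busy loop are my assumptions, and the relative timings will vary by machine.

```python
import time
from multiprocessing import Pool

def cheap(x):
    return x + 1                      # the trivial "input plus one" operation

def expensive(x):
    for _ in range(200):              # roughly "200 times more complicated"
        x = (x * x + 1) % 1_000_003
    return x

def bench(func, data):
    t0 = time.perf_counter()
    [func(x) for x in data]           # sequential baseline
    t_seq = time.perf_counter() - t0
    t0 = time.perf_counter()
    with Pool(4) as pool:             # parallel version on four processes
        pool.map(func, data)
    t_par = time.perf_counter() - t0
    print(f"{func.__name__}: sequential {t_seq:.2f}s, 4 processes {t_par:.2f}s")

if __name__ == "__main__":
    data = range(1_000_000)
    bench(cheap, data)                # parallel is usually *slower* here
    bench(expensive, data)            # parallel approaches a 4x speedup
```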
Okay, so some problems like that have a parallel nature. Here I was easily able to divide my data set into four subsets, and the largest part of the program runs in parallel; this type of problem has a parallel nature. Usually, when we talk about parallel nature, we talk about coarse-grained problems: a loop of loops, multiple images to process, multiple datasets, or one really big dataset, or maybe the dataset is not big but the operations we want to run on it are long. Those problems are coarse-grained, and there you can easily apply parallel programming. For fine-grained problems it's a different story: when you have a single iteration of a loop, a single image, or a single small dataset, you should not parallelise that, at least not on a CPU. Nowadays we can actually parallelise fine-grained problems on massively parallel architectures like GPUs, because they have a lot of processing units and their threads are really light. In parallel programming we also have different memory architectures, and the two best known are shared memory, where each process connects to a common shared memory and works on the same dataset, and distributed memory, a.k.a. message passing, where each process has its own memory, so we need to pass data to the processes and later get it back; that's why it's called
message passing. So how do we apply these in Python? For shared memory, we have shared ctypes objects. Those are objects created in shared memory that can be inherited by child processes. If you import Value from multiprocessing, you can say what type the value is and assign its initial value, and there are also some other shared types and synchronisation primitives. So let's see how they behave.
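What follows is a hedged reconstruction of the two demo programs, folded into one script; the identifiers are mine, not from the slides.

```python
from multiprocessing import Process, Value

def unsafe_increment(counter):
    for _ in range(10_000):
        counter.value += 1            # read-modify-write: a race condition

def safe_increment(counter):
    for _ in range(10_000):
        with counter.get_lock():      # shared ctypes carry their own lock
            counter.value += 1

if __name__ == "__main__":
    for worker in (unsafe_increment, safe_increment):
        counter = Value("i", 0)       # a shared C int, initialised to 0
        procs = [Process(target=worker, args=(counter,)) for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(worker.__name__, counter.value)  # unsafe is usually < 40000
```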
I have two programs; the difference is that one uses locking and the other one does not. The one on the left does not use locking. We have shared memory, so all processes have access to the same memory; all can read from it and write to it at the same time. If you do that, you get something called race conditions: sometimes two or more processes may read the same value. Let's say that at index two I have the value two, and four processes read it; each adds one to it, making three, and then all four write that three back into memory. That's why, when you run the left program, you will get different values depending on what's going on in your system, but the answer will be wrong. For shared memory you always need some kind of synchronisation, and in this case I used locking. Here we ensure that only one process can read shared memory at a time, and only one can write to it. With that you get the correct result, but what about the time? You might say that the problem is too small or the dataset is too small, but that's not the case. The issue is that when you use locking and you have multiple processes, you in fact get sequential execution, because only one process at a time can take something from memory, do its calculation, and write it back. So your code will either be slower or run in more or less the same time. And believe me, here it's really easy to spot, but usually our problems are not that simple. Actually, you can use something else: these shared ctypes objects have their own locks, so you can use those. The output will be the same, but you will not create additional locks, and you really need to keep the number of your locks as low as possible to know what's going on. Okay, so we also have managers in Python, which are a hybrid between shared memory and message passing. Managers are proxies through which child processes can access data. When you create a manager, it spawns a new process which communicates through sockets; so if you type multiprocessing.Manager(), it will create a new process. You can later pass it to children of that process, or you can even use it for remote access, because it's using a socket.
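A minimal sketch of a manager, assuming a shared dict is what we want; the proxy calls go through the separate manager process.

```python
from multiprocessing import Manager, Process

def work(shared, key):
    shared[key] = key * key           # goes through a proxy to the manager

if __name__ == "__main__":
    with Manager() as manager:        # spawns the manager process
        shared = manager.dict()
        procs = [Process(target=work, args=(shared, i)) for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(dict(shared))           # {0: 0, 1: 1, 2: 4, 3: 9}
```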
For distributed memory, the most commonly used tool is Pool.map. How many of you have used Pool.map? Yeah, some of you. It's really simple and nice: you just define how many processes you want, you map a certain function over a collection of its arguments, and it just runs. It's a really high-level and nice tool, and you can easily achieve a speedup. But you need to remember to always close or terminate your pool, and later to join it; and if you're the kind that doesn't remember, like me, you can use it as a context manager.
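A minimal sketch of the context-manager form:

```python
from multiprocessing import Pool

def add_one(x):
    return x + 1

if __name__ == "__main__":
    # The with-block calls terminate() for you on exit, so the pool
    # cannot be forgotten and left running.
    with Pool(processes=4) as pool:
        results = pool.map(add_one, range(8))
    print(results)  # [1, 2, 3, 4, 5, 6, 7, 8]
```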
We also have something that looks more like message passing: pipes and queues. The basic difference is that a pipe has only two ends, and it's really fast because it usually uses operating system pipes, while a queue can have multiple producers and consumers; but keep in mind that behind the scenes there are pipes connecting all the elements of that network.
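A small sketch contrasting the two, with names of my choosing:

```python
from multiprocessing import Pipe, Process, Queue

def ponger(conn):
    conn.send(conn.recv() + " pong")  # a duplex pipe end can read and write

def consumer(q):
    print("got:", q.get())

if __name__ == "__main__":
    parent_end, child_end = Pipe()    # exactly two ends, very fast
    p = Process(target=ponger, args=(child_end,))
    p.start()
    parent_end.send("ping")
    print(parent_end.recv())          # "ping pong"
    p.join()

    q = Queue()                       # many producers and consumers
    c = Process(target=consumer, args=(q,))
    c.start()
    q.put("hello")
    c.join()
```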
Pool also has some overlooked features, because people usually use it like this: they create a pool with a number of processes, and then they just map some function over some input. But you can define, for example, the maxtasksperchild argument. Sometimes your processes grow and consume more and more memory, and you want to restart them once in a while; here you can define how many tasks should be executed per child before a new child is spawned. You can also define a chunk size, which I didn't know until yesterday, I think. map usually sends one element per task, so if you have a pool of four processes and, let's say, 12 inputs, with a chunk size of one there will be 12 round trips between the workers and the main process before the result comes back. You can optimise that with the chunksize parameter.
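A minimal sketch of those two knobs together:

```python
from multiprocessing import Pool

def work(x):
    return x * x

if __name__ == "__main__":
    # maxtasksperchild recycles a worker after 100 tasks (handy when
    # workers slowly leak memory); chunksize batches inputs so the
    # 12 items below need 4 round trips instead of 12.
    with Pool(processes=4, maxtasksperchild=100) as pool:
        print(pool.map(work, range(12), chunksize=3))
```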
You can also use imap and imap_unordered. The difference is that when you call map, you get a list with the results, while when you call imap, you get a generator; but imap still yields results in order, so you may still wait for earlier tasks to finish. When you use imap_unordered, you get whatever finishes first, which is useful. There's also the apply_async approach. This is what's actually going on behind the scenes, but using it directly is discouraged, because the map functions are considered higher-level and better tools.
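A short sketch of the three map variants side by side:

```python
import time
from multiprocessing import Pool

def slow(x):
    time.sleep(0.1 * (x % 3))  # uneven task durations
    return x

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(slow, range(8)))                   # list, in order
        print(list(pool.imap(slow, range(8))))            # generator, in order
        print(list(pool.imap_unordered(slow, range(8))))  # first finished first
```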
Okay, so we have different models for parallel programming, and also different models for distributed memory itself. There are so-called worker-based models. You can have a prefork model, which Gunicorn uses: you create your workers beforehand, defining that your application starts up with, say, four processes or four threads. You can have a worker model where you decide during execution how many workers you need, for example optimising for your dataset and how well it divides; multiprocessing's Pool is an example of that. And you also have a hybrid approach, where you define a number of workers beforehand and later scale them dynamically, which is useful when you're working on something like a backend server.
So when you want to create a multi-worker application, let's say one that responds to requests, you have basically two approaches. You can use the SO_REUSEPORT and related flags on the socket; I won't really go into the details, as there's a really nice description on Stack Overflow about that. Basically, you can create as many processes or threads as you want and let them all listen on the same port, so all those workers can hear what's coming in. But in this scenario you have to ensure locking and synchronisation, because if you just read from one socket in all those threads, you'll get garbage.
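A hedged sketch of the reuse-port approach: each worker opens its own socket on the same port and the kernel load-balances incoming connections. SO_REUSEPORT is platform-dependent (Linux 3.9+ and some BSDs); the port number and worker count here are arbitrary.

```python
import socket
from multiprocessing import Process

def serve(worker_id, port=8080):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("0.0.0.0", port))
    sock.listen(128)
    while True:
        conn, _ = sock.accept()       # the kernel picks which worker gets it
        conn.sendall(b"handled by worker %d\n" % worker_id)
        conn.close()

if __name__ == "__main__":
    workers = [Process(target=serve, args=(i,)) for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```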
In Twisted there's a really neat way of doing this: you create a socket, let's say a TCP socket, you spawn child processes, and you can later adopt the socket in the child processes. This is the approach you should choose if you really want to tune your performance and have access to some low-level stuff. If you don't, you can take a different, more common approach, where you have just a single thread reading from the socket; this thread is responsible for the IO, and it delegates the work to other workers through a queue, which I mentioned earlier. Then you don't have any problems with synchronisation and things like that.
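A minimal sketch of that common pattern: one acceptor thread owns the socket and workers take jobs from a queue, so no socket-level locking is needed. The handler logic is a placeholder.

```python
import socket
import threading
from queue import Queue

def acceptor(sock, jobs):
    while True:
        conn, _ = sock.accept()      # only this thread touches the socket
        jobs.put(conn)               # delegate the connection to a worker

def worker(jobs):
    while True:
        conn = jobs.get()
        conn.sendall(b"hello\n")     # placeholder for real request handling
        conn.close()

if __name__ == "__main__":
    sock = socket.socket()
    sock.bind(("0.0.0.0", 8080))
    sock.listen(128)
    jobs = Queue()
    threading.Thread(target=acceptor, args=(sock, jobs), daemon=True).start()
    for _ in range(4):
        threading.Thread(target=worker, args=(jobs,), daemon=True).start()
    threading.Event().wait()         # park the main thread forever
```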
Up till now we've talked about so-called intra-node communication, meaning communication within a single CPU, or just one server, but you can also run your code on multiple machines. There are many libraries; you've probably heard about MPI, which I think is still the most commonly used library among scientists. But there are other tools. I personally like SCOOP, maybe because it has a really nice slogan. It uses ZeroMQ sockets for communication, it's really similar to multiprocessing's Pool, and it utilises SSH connections for execution: you need SSH access to the machines you want to run your application on, and it then connects to them, sends the data, and executes it. So you can see it's really simple to use.
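A hedged sketch of SCOOP's documented usage; the hostfile name is mine, and I have not verified the exact fallback behaviour when launched without the scoop module.

```python
# Run with something like:  python -m scoop --hostfile hosts.txt script.py
from scoop import futures

def square(x):
    return x * x

if __name__ == "__main__":
    # Under `python -m scoop`, tasks are spread over the SSH-reachable
    # workers; with plain `python`, SCOOP warns and runs without them.
    print(list(futures.map(square, range(16))))
```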
Okay, I've encountered some traps and weird behaviour over the years, and I'd like to share them with you. One possible trap is hyper-threading. CPUs are often advertised as 16 cores, 32 cores, 64 cores, but how many physical cores do you get? Usually half of them. Hyper-threading works like this: you have a CPU pipeline with, let's call them, slots in it. If there are free slots to run two things at the same time on one core, then your two logical threads will run in parallel; but if there aren't, they won't. I had a problem like this: I had a 12-core Intel Xeon machine with 24 logical cores, and when I ran my computation, a really complex computation where I'm sure the result was not caused by communication or anything like that, I achieved these results. I've heard that Intel is launching a new tool for tuning and profiling Python, so I think it might be interesting to work with that. Also, you don't always want to target 100% utilisation. Say you have four cores and you prepare four workers, and 10% of each core is left unused in each CPU epoch. What you want to do is just add workers to use that, but you won't gain anything; you will actually lose. Does someone know why we lose time here? We should be utilising that spare 10%, and it should be faster. Yes, exactly: we are switching contexts. All the processes are fighting for resources, and switching them and copying them between cores is a really expensive operation. So don't always target 100%.
Also, there's a funny thing in how pipes are implemented: OS pipes cannot send things both ways. So if you define a multiprocessing Pipe with duplex, you'll actually get a socket. If sometimes you get a socket and sometimes you get a pipe, and you take into consideration that they have different buffers defined in the kernel, you might encounter a situation where sometimes you're able to send something and sometimes not. That's interesting. And then there's the usual topic of deadlocks: one process holds some resource, a second process holds its own resource, and they wait for each other without freeing their resources, so they wait forever. And do you know how to kill processes and threads in Python? There is a kill method. Who has used the kill method? Okay, you couldn't have used a kill method, because it does not exist: you cannot kill a thread. That's by design, because you might end up in the situation where your thread holds a resource, and when it's killed, other threads will never get that resource, because it's never freed. That's why you need to use different mechanics.
And while we're on threads, there's a common misconception about daemons: if you have a while True or something similar in your thread, it should be a daemon, and daemons should not be joined. Once you set a thread up as a daemon, it should just run as long as its process is running, and the only clean way of stopping it is up to you. Also, don't use global variables: don't define stop = False and then loop until it changes, because you never know what will happen or when those threads will be stopped. The common pattern is to use events: you set an event in the main thread and wait for that event in the worker threads.
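A minimal sketch of that stop pattern:

```python
import threading
import time

stop = threading.Event()

def worker():
    while not stop.is_set():   # check the event, not a global flag
        time.sleep(0.1)        # ... a chunk of real work would go here ...
    print("worker exiting cleanly")

if __name__ == "__main__":
    t = threading.Thread(target=worker)
    t.start()
    time.sleep(1)
    stop.set()                 # ask the worker to stop; never kill threads
    t.join()
```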
Okay, so when it comes to parallel and asynchronous, we finally reach the topic. We have basically two options, the threaded option and the process option, where we can define executors and submit jobs to them, and what you get back is futures. You can also use the executors as context managers. You can run them without starting the IO loop and just get futures, or you can use them with an IO loop, but then you need adapters, and sometimes those adapters work and sometimes they don't, so you need to be really careful. Also, keyword arguments are not allowed when submitting through the loop, and you might want to read the PEP to learn why; if you really need keyword arguments, just use partial. You can also submit several jobs and wait for all of them to finish.
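A hedged sketch of that combination, written against today's asyncio API (3.7+, so newer than this talk): blocking work runs in a ProcessPoolExecutor driven from the event loop, with functools.partial carrying the keyword argument.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def blocking_task(n, offset=0):
    return sum(range(n)) + offset

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=4) as pool:
        # run_in_executor() takes no keyword arguments, hence partial().
        jobs = [loop.run_in_executor(pool, partial(blocking_task, 10**6, offset=i))
                for i in range(4)]
        print(await asyncio.gather(*jobs))   # wait for all of them
if __name__ == "__main__":
    asyncio.run(main())
```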
So, coming to an end: why would you want that? You might want it for long-running tasks that might block your IO loop. You might want it if you have some code which is incompatible with your IO loop and will most certainly block it, like requests; you cannot use requests with any IO loop that exists for now. And also if you have blocking tasks that you want to run in parallel. So what will you get when you use all that? A headache. Because running things asynchronously is troublesome, and when you also introduce running things in parallel, that's troublesome too, so you should really know that you need it.
Okay, so let me rant for a moment just before I finish. You all know this gentleman: it's Tim Peters, and he said that there should be one, and preferably only one, obvious way to do it. So where have we gone wrong? We currently have four commonly used IO loops and three types of asynchronous calls, so if there are some decisive people in this crowd, let's think about how to fix that. Python 3 was created in order to clean up some of the mess which accumulated over the years, and I feel that now we are creating such a mess again. Okay, so in summary: Python has a wide variety of options when it comes to parallel and asynchronous programming. You should really know your architecture when you use parallel programming. You should always test your code before entering the parallel and concurrent world, so first sequential, then parallel. And after you enter the concurrent world, you should test it with fuzzing.
I didn't say anything about fuzzing, but you can research that. Be aware of any incompatibilities between the modules you use, and I assure you they do exist. Know when to expect awaitable objects in asynchronous programming and handle them properly. And, you know, those tools are there for us, and they mostly work; you can even create production code with them if you test it well. So don't be scared to seek out new options and to boldly go where no man has gone before. Thank you. Hopefully it will be a piece of cake. Thanks. Do we have some time for questions?

Yeah, we have three minutes for questions. Questions for Michal? Oh, a lot of questions. Actually, I brought something nice from Poland for the people who ask the best questions.
You mentioned deadlocks. I'm curious whether there is a library that can detect deadlocks which might appear in a program.

I know that such solutions exist, but I don't remember the names. You can mostly get at that with fuzzing, or just by testing for unexpected behaviours.

On your last slide you mentioned incompatibilities between modules. Maybe you can tell us something more about that. What was your experience? How did it work?

Could you say that again, a little bit louder?

"Be aware of any incompatibilities between the modules you use." What were you referring to?

The incompatibilities, okay, sorry for that. Like I mentioned, we have different IO loops. Let's say you want to use Tornado: it has its own IO loop, but you also want to use some process executor, which does not run well with it without adapters. What you might do is first adapt your Tornado program to run on asyncio, and then run those executors. For example, Curio does not work with any other IO loop, so you don't even get anything to connect them with. And in Tornado, for example, you should not use yield from; you should yield a coroutine. So there's also some incompatibility, and that's what I meant.

Okay, thanks Michal for your presentation.