Demystifying AsyncIO: Building Your Own Event Loop in Python
Formal Metadata

Title: Demystifying AsyncIO: Building Your Own Event Loop in Python
Number of Parts: 131
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/69424 (DOI)
EuroPython 2024 · 110 / 131
Transcript: English(auto-generated)
00:04
Thank you. Thank you, everyone, for coming. So yeah, we're going to talk about AsyncIO and how it actually works. This started as a kind of all-nighter I spent trying to understand AsyncIO, and so I dived into this kind
00:20
of infinite hole of understanding how it works. And so it gave me the idea of just creating this talk to build an event loop from scratch to understand basically what is AsyncIO, how it works, and what is hidden behind those await keywords. So to start, maybe a little bit about me. So I'm Arthur Pastel.
00:41
I'm from France, probably you guessed with my accent. I'm on GitHub and Twitter. I got started with the Python community by building an object document mapper, so basically a kind of ORM for MongoDB. And it was Async. So this is also what got me started with AsyncIO
01:01
as well as FastAPI. More recently, I'm a CPython contributor, and I'm really proud to be one. And I'm now focused on software performance and helping teams improve software performance, either in open source projects or other kinds of products, to make sure that it never degrades
01:21
while you're developing. But we'll come back to that when we talk about event loop performance later down the line. So maybe we can start with a little quiz, just to figure out what you have heard about AsyncIO. Can you raise your hand if you've already heard about AsyncIO? So probably everyone, otherwise there
01:41
would be a problem here. And have you heard about the event loop on top of AsyncIO? Futures, maybe? OK. And callbacks as well? OK, so almost everyone. And coroutines, which is a bit more niche, but still quite common. Maybe file descriptors as well?
02:01
OK. And as well, selectors? OK. So this will be the big outline of the talk. We're going to start with what AsyncIO is and why it's useful at all. So this is the simplest example of async code you can have.
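As a sketch, the minimal example being described probably looks something like this (the exact sleep duration and return value are my guesses, not the talk's slide):

```python
import asyncio

async def main():
    # await suspends this coroutine; the event loop resumes it
    # once the sleep completes.
    await asyncio.sleep(1)
    return "done"

result = asyncio.run(main())
print(result)
```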
02:22
So there are a few things to note if you've never used AsyncIO, which are the async keyword before the definition of the function, and now an await keyword as well, which actually kind of stops the execution of this coroutine.
02:40
And then it's restarted later on, when the sleep is completed. So with just that, it might be kind of pointless, because instead of using await asyncio.sleep, we could simply use time.sleep, and it would be exactly the same. So here, it's kind of pointless, but still it works. But it's more interesting in more complex examples
03:03
when you deal with a lot of different factors. So for example, let's say we want to fetch random Pokémon. So I have a function that calls the PokeAPI, which is a REST API that contains a lot of Pokémon,
03:23
so a really nice toy example API if you don't know about it. And what I want to do is to be able to encounter five Pokémon. Without AsyncIO or any threading or anything, I will just have a for loop. It works, but there is a bit of a problem,
03:41
because here, we are doing everything sequentially. So first, we send the first request, then we wait for the endpoint to send us the result, then we process and receive the object, then we do the second request, et cetera, et cetera. And so here, we can see that there is a green part. And this green part is actually
04:01
when we are waiting on IO. And during this moment, when we're waiting on input-output or any kind of file system operation, the global interpreter lock is released. The global interpreter lock is the thing that is actually locking the interpreter to make sure
04:21
that a single thread is executing Python code. So right now, it's a limitation. Probably soon, thanks to the new development, it won't be there anymore. So we won't really have to deal with that anymore, and we'll be able to use threading as we want. But here, the main point is that during all this green time,
04:40
we could execute Python code. And we are not, because we just have a synchronous execution. But the point of AsyncIO is to be able to do that like this. And so we could do that with threading. But the main benefit of AsyncIO is that it allows us to have this massively parallel
05:02
input-output processing with a very simple implementation. So for example, if we take the same example as before, the only difference is that instead of using requests to fetch our Pokemon, here we use an asynchronous HTTP client.
05:20
And on top of that, we define an async function, and we just gather the tasks, and we process them in parallel. I mean, it's not really parallel, but the input and output are executed in parallel, and we wait on them concurrently. Since the beginning, we've been seeing a lot of asyncio.run. But actually, it's not just asyncio.run.
05:44
This is where the event loop is actually hidden. When we call asyncio.run, we get the default event loop, or a new one is created, and a task is created as well to run our coroutine. And then the event loop runs our function.
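A rough sketch of the concurrent version: the talk uses an async HTTP client to hit the real PokeAPI, but to keep this self-contained the network call is simulated with asyncio.sleep, and the function names are placeholders of my own:

```python
import asyncio

async def encounter_pokemon(i):
    # Stand-in for the real HTTP request to the PokeAPI: the IO
    # wait is simulated with asyncio.sleep.
    await asyncio.sleep(0.1)
    return f"pokemon-{i}"

async def main():
    # gather wraps the coroutines in tasks and waits on their
    # (simulated) IO concurrently rather than one after the other.
    return await asyncio.gather(*(encounter_pokemon(i) for i in range(5)))

results = asyncio.run(main())
print(results)
```

With real network calls, the five requests would overlap instead of running sequentially, which is the whole point of the green "waiting on IO" part of the diagram.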
06:01
So let's start to create our event loop, which is the first part of this asyncio.run. The ultimate goal of this talk is to build an event loop that will be able to run a FastAPI server. So we'll probably omit a big chunk of the network implementation, but we'll grasp most of it
06:22
and the most interesting part. So the most common thing we can do with an event loop is to schedule really simple callbacks, because with this, we can delay slightly the execution of atomic functions, and then we can build on top of that.
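With the standard-library event loop, scheduling two callbacks like this might look as follows; loop.stop is queued last so run_forever returns once both have run:

```python
import asyncio

order = []

def foo():
    order.append("foo")

def bar():
    order.append("bar")

loop = asyncio.new_event_loop()
loop.call_soon(foo)        # queue the callbacks...
loop.call_soon(bar)
loop.call_soon(loop.stop)  # ...and stop once they have run
loop.run_forever()
loop.close()
print(order)
```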
06:42
So first, for example, if we want to schedule the foo and the bar functions, we'll call the call_soon method, which will just add those callbacks to a queue and wait for them to be executed by the event loop. Under the hood, it works just like this: we have our queue, we have our event loop,
07:01
and it just queues the callbacks that will be executed. So for the implementation part, first, we have the constructor of our event loop. We'll store a list of handles that will actually be our callbacks. We wrap the callbacks in handles
07:22
so we can store arguments as well, and more detail about what needs to be executed in those callbacks, and also a stopping condition. Then when we call the call_soon method, we'll just add this new handle to our queue. Here, we're using a deque, because we
07:41
want a FIFO structure. So the first item that goes into the queue will be the first to go out. And then we actually run the event loop. It's called an event loop because we have a really atomic step, which is called a tick. I think there are other names too, but usually it's
08:01
called a tick. And so here, it's called run_once. What it does, in our very basic version, is just check if we need to execute some handles, so if they are ready. If we have some, then we execute them. And when we actually call run_forever, we'll just run this tick function, run_once, forever.
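A minimal sketch along these lines; the class and method names mirror asyncio's API, and are my guess at the talk's code rather than the actual repository:

```python
from collections import deque

class Handle:
    """Wraps a callback together with its arguments."""
    def __init__(self, callback, args):
        self.callback = callback
        self.args = args

    def run(self):
        self.callback(*self.args)

class EventLoop:
    def __init__(self):
        self._ready = deque()   # FIFO: first handle in is first out
        self._stopping = False  # the stopping condition

    def call_soon(self, callback, *args):
        self._ready.append(Handle(callback, args))

    def stop(self):
        self._stopping = True

    def run_once(self):
        # One "tick": run every handle that is ready right now.
        for _ in range(len(self._ready)):
            self._ready.popleft().run()

    def run_forever(self):
        # A real loop would block on the selector when idle instead
        # of spinning; here a stop callback is always queued first.
        while not self._stopping:
            self.run_once()

loop = EventLoop()
out = []
loop.call_soon(out.append, "foo")
loop.call_soon(out.append, "bar")
loop.call_soon(loop.stop)
loop.run_forever()
print(out)
```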
08:25
So this was really basic: just scheduling something to be executed as soon as possible. But on top of that, we can schedule things to be executed at some point in time with the call_at method. And the implementation is kind of the same,
08:41
but we have an intermediate queue that will be sorted by the time when the callback needs to be executed. And once those callbacks are ready and need to be executed, they will be pushed into the ready handle queue. The code is just a bit more complicated, but here we use the same data structure, so a deque again.
09:01
And we use bisect.insort to insert into this queue while preserving the ordering of the deque. And we just add a new step in our run_once function. So before executing the ready handles, we'll check if we have some timed handles that are ready
09:22
and that need to be executed. And if it's the case, then we'll put them in the ready handle queue. So on top of that, we can derive call_later. With call_at, you specify a specific time, but with call_later, you just specify some delay. It's pretty simple, but it's way more convenient to use.
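A sketch of the timed-handle mechanism as described: a deque kept sorted with bisect.insort, plus call_later derived from call_at. The names follow asyncio's API, and the details are my guess at the talk's code:

```python
import time
from bisect import insort
from collections import deque

class TimerHandle:
    def __init__(self, when, callback, args):
        self.when = when
        self.callback = callback
        self.args = args

    def __lt__(self, other):
        # Lets bisect.insort keep the deque sorted by deadline.
        return self.when < other.when

_scheduled = deque()  # timed handles, kept sorted by deadline
_ready = deque()      # handles due to run on the next tick

def call_at(when, callback, *args):
    insort(_scheduled, TimerHandle(when, callback, args))

def call_later(delay, callback, *args):
    # call_later is just call_at relative to the current time.
    call_at(time.monotonic() + delay, callback, *args)

def run_once():
    # Move every handle whose deadline has passed into the ready
    # queue, then execute the ready handles in order.
    now = time.monotonic()
    while _scheduled and _scheduled[0].when <= now:
        _ready.append(_scheduled.popleft())
    while _ready:
        handle = _ready.popleft()
        handle.callback(*handle.args)

out = []
call_later(0.0, out.append, "a")
call_later(0.05, out.append, "b")
time.sleep(0.06)
run_once()
print(out)
```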
09:40
So for example, if you want to implement the sleep function, the async version of sleep, you would use call_later. So until now, we've scheduled callbacks, but we didn't really deal with asynchronous operations and the results they can bring. And this is where futures come in handy.
10:00
So the main benefit of futures is that they represent the result of an async operation, whether it hasn't completed yet or is already finished. When you create a future, it will be pending, without a result. Then you can set a result on it, and it will be finished. And you can await those kinds of results.
10:23
And so for example, here we have a simple example where we create a future. Then we add a callback to this future. So when this future is completed, we want to execute some code. So for example, if we take the previous example, once we fetched our Pokemon, we want to parse the JSON and get just the name of this Pokemon.
10:42
And then we schedule it by saying, OK, we want this future to be completed in one second with the result 42. And what will happen when we run this function and we run it in the event loop is that it will wait for the future to be completed to actually stop executing.
11:01
And so this implementation of run_until_complete is pretty close to what we had until now. The main difference is that instead of looping forever with run_forever, here we're just looping until our future is done. So it's pretty simple. And this future object will bring a lot
11:21
of interesting capabilities. So for example, we can build the sleep function we mentioned before. You can note that this function we're implementing is not an async function. It's just a function that will return a future. For example, here it takes the number of seconds it needs to sleep, as well as an event loop.
11:41
So in the official implementation, you don't really have to specify the event loop because it will take a default one or you can specify one in some cases. But here for the sake of simplicity of implementing this, it's easier to just pass it as an argument. And so then we create a future and we use our event loop to actually fulfill this future
12:02
one second later. The result there is kind of pointless; we don't really care what we put in that future. And then we return this future. So when our event loop awaits this future, it will wait until the callback is called and actually fulfills the future. And when this is the case,
12:22
the future will be fulfilled and the execution will continue. As well, another helpful helper is gather, which allows you to wait on a lot of concurrent futures. For example, we used it in the first example to wait on multiple Pokémon fetches.
12:42
And so here the implementation is a bit more complicated, but still interesting, because the idea is that for each future we want to wait on, we'll set a done callback on it, and we'll just count the number of completed callbacks we have compared to the number of items we got in.
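A sketch of that counting-based gather, without the exception handling the talk warns about:

```python
import asyncio

def gather(futures, loop):
    # Simplified: the real asyncio.gather also has to deal with
    # exceptions and cancellation, which we skip entirely here.
    outer = loop.create_future()
    remaining = len(futures)
    results = [None] * len(futures)

    def on_done(index, fut):
        nonlocal remaining
        results[index] = fut.result()
        remaining -= 1
        # Every inner future is complete: fulfill the outer one.
        if remaining == 0:
            outer.set_result(results)

    for i, fut in enumerate(futures):
        fut.add_done_callback(lambda f, i=i: on_done(i, f))
    return outer

async def main():
    loop = asyncio.get_running_loop()
    futs = [loop.create_future() for _ in range(3)]
    for i, fut in enumerate(futs):
        # Complete the futures out of order to show results keep their slots.
        loop.call_later(0.01 * (3 - i), fut.set_result, i)
    return await gather(futs, loop)

results = asyncio.run(main())
print(results)
```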
13:02
And when all those callbacks have been called, it means that all our futures are completed and we can return. So this is a very partial implementation, because there is a really tricky thing to handle here, which is handling exceptions. We won't dive into this, but you can have a look at the official implementation
13:21
of the gather helper, which is really interesting, because you have a lot of edge cases with exceptions to deal with. So until now, we've talked about scheduling callbacks and waiting on futures, but this is called AsyncIO,
13:44
not just Async. So what about the IO? Whether we are waiting on a network call, or on a file to read from or write to, we'll always deal with a file object.
14:00
And the main point of AsyncIO is to be able to bind this file object with a callback. So for example, I'm waiting to read on the socket and I'm waiting for the data to come. So this is a file object and I just want to read and the read will be executed in a callback. And the main point is being able to wait on a ton of file descriptors and it can be like
14:23
network sockets, just a file you want to read from, or maybe even hardware devices. And the point is that here we only have six of them, so it's pretty small, but if you're working on a production async HTTP server, you might have tens of thousands of them.
14:41
And you want to be able to handle them concurrently and not scan each of them, because a really naive implementation would be to just iterate over those file objects and check if they are actually ready. So to do that, the main component of this event loop is actually the selector, which will be able,
15:02
depending on which file object is ready, to tell us which callback we should execute and when we should execute it. And the main goal is to have a selector that's as optimal as possible, because we don't want it to scan all the possible file objects. So before diving into how it actually works, we need to dive into file descriptors.
15:23
So when you have a process, the kernel of the operating system will give you some file descriptors. When you just start your process, you will have three file descriptors available by default: file descriptor 0 will be the standard input, and 1 will be the standard output.
15:41
And then 2 will be the standard error output. Every time you open a new file, you will get a new file descriptor that will be stored in the file descriptor table. So for example, here we have an example of the open system call. It's in C, but the idea is that when you're working with Python, it's calling this function under the hood and doing exactly the same thing.
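The same thing can be observed from Python, since os.open goes through the same open() system call:

```python
import os
import tempfile

# A fresh process starts with fds 0 (stdin), 1 (stdout), 2 (stderr);
# opening a file takes the next free slot in the process's fd table.
tmp_fd, path = tempfile.mkstemp()
os.write(tmp_fd, b"hello")
os.close(tmp_fd)

fd = os.open(path, os.O_RDONLY)  # same open() system call as the C example
print("new file descriptor:", fd)
data = os.read(fd, 5)
os.close(fd)
os.remove(path)
```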
16:02
And when this system call is called, it will add a new item to the file descriptor table of your process, because this file descriptor table is specific to your process. And so for example, here we have our new file descriptor. Another interesting system call, to actually wait on a lot of inputs or outputs,
16:23
is the select system call. So the main idea here is that we'll give it some lists of file descriptors, or iterables of file descriptors. The first one will be the file descriptors we want to read from.
16:41
The second one will be the file descriptors we want to write to. And then there is a list for exceptional conditions, the x list, which we don't really care about right now. And as well, we'll give it a timeout. So when we call this with all those lists, it will give us the ready file descriptors as output.
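A small demonstration of select.select with a pipe (on Unix-like systems; on Windows, select only works on sockets):

```python
import os
import select

# A pipe gives us two file descriptors: a read end and a write end.
r, w = os.pipe()

# Nothing has been written yet, so with a zero timeout nothing is ready.
ready_before, _, _ = select.select([r], [], [], 0)

os.write(w, b"ping")

# Now the read end shows up in the ready list.
ready_after, _, _ = select.select([r], [], [], 0)

data = os.read(r, 4)
os.close(r)
os.close(w)
print(ready_before, ready_after, data)
```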
17:02
And so in the case of our event loop, it will tell us which file descriptors are ready and which ones we should fire the callbacks of. And the implementation is actually pretty simple: we just provide some simple functions where people can add some readers,
17:21
add some writers as well, or remove them. And then we call the select system call. And when the system call returns us ready file descriptors, then we'll call the associated callbacks. So this is mostly how IOs are handled by event loops.
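Python's selectors module wraps this up; a minimal add_reader-style sketch of what the loop's IO handling might look like:

```python
import os
import selectors

sel = selectors.DefaultSelector()  # picks epoll/kqueue/select per OS
received = []

def add_reader(fd, callback):
    # Bind a file descriptor to the callback to run when it's readable.
    sel.register(fd, selectors.EVENT_READ, callback)

def run_once():
    # Ask the selector which registered fds are ready, then fire
    # the callback associated with each one.
    for key, _events in sel.select(timeout=0.1):
        key.data(key.fileobj)

r, w = os.pipe()
add_reader(r, lambda fd: received.append(os.read(fd, 4)))
os.write(w, b"ping")
run_once()
sel.unregister(r)
os.close(r)
os.close(w)
print(received)
```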
17:40
Some event loops don't really use the select system call and use some other technique, and you can do some polling as well. But in terms of performance, this is usually what is preferred. So until now, we didn't really talk about coroutines; we mostly had some functions that return futures. But on top of just having functions that return futures,
18:02
we can as well define functions that are asynchronous by nature. In the beginning, you could define generator-style coroutines, because basically a generator gives you the possibility to define a coroutine, since the official definition of a coroutine is just a function that can be suspended
18:20
and then resumed. But with async/await came way friendlier use cases and syntax as well. So for example, async context managers and async iterators and a lot of other handy things.
18:41
So when we actually call create_task, what happens under the hood is that the function is wrapped into a task and is executed step by step. First, we have a call_soon call that will schedule the first part of the function, the synchronous one.
19:01
Then we have the await part of the function that will open the connection, so create a new file descriptor, and just add a reader that will wait for the data to come back, as well as a writer to send the request. And then when everything is done,
19:22
the call_soon method will also be used to schedule the last part of the coroutine. And so this is the whole execution of this coroutine wrapped into a task. So with all of that, and some more network implementation, it can run a FastAPI server.
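The suspend-and-resume driving that a task performs can be sketched by hand, using a minimal awaitable and the coroutine's send() method:

```python
steps = []

class Suspend:
    # A minimal awaitable: yielding hands control back to whoever
    # is driving the coroutine (normally the task / event loop).
    def __await__(self):
        yield "suspended"

async def handler():
    steps.append("first part")   # the synchronous part, run via call_soon
    await Suspend()              # suspension point, e.g. waiting on IO
    steps.append("last part")    # scheduled again once the wait completes

coro = handler()
signal = coro.send(None)  # runs the first part, then the coroutine suspends
try:
    coro.send(None)       # resumes the coroutine after the "IO" is done
except StopIteration:
    pass
print(signal, steps)
```

An asyncio Task does essentially this, except the resumption is triggered by the event loop once the awaited future is fulfilled.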
19:41
So the code is available; I will give a link at the end. And we actually did it. I skipped over the implementation of all the network calls, but it actually works. Trust me, and you can run it on your machine. But then, what about performance? Because we really did a super quick implementation.
20:03
So to compare the performance of the default event loop with the one we created, we'll write microbenchmarks, just benchmarking simple operations. For example, run_until_complete, call_soon, and call_later. Because most of the time, those will bring the biggest performance bottleneck,
20:21
because the IO part is not that interesting. So for example, if we're doing a 10 second network call, it's really only here in those small tick and atomic operation that we can have some impact. And so then we'll just provide the event loop. So here it's a default event loop,
20:41
the one that is provided if you're running on Linux. But we'll provide our own event loop. And actually, the performance is a complete disaster because all data structures and everything is really poorly optimized. But even the implementation provided by CPython
21:02
is not the best one if you want to run a production application, for example a web server at scale. Often you will end up using uvloop, mostly with uvicorn, for example. And the main difference with uvloop is that it's built with Cython as well as libuv. libuv is the library that is powering Node.js.
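Switching to uvloop typically looks like this; uvloop.install() sets the event loop policy so asyncio.run uses it (newer uvloop versions also offer uvloop.run), and the import is guarded since uvloop only supports Linux and macOS:

```python
import asyncio

try:
    import uvloop  # pip install uvloop (Linux and macOS only)
except ImportError:
    uvloop = None

async def main():
    await asyncio.sleep(0)
    # Report which event loop implementation is actually running.
    return type(asyncio.get_running_loop()).__name__

if uvloop is not None:
    uvloop.install()  # make asyncio.run use uvloop's event loop

result = asyncio.run(main())
print(result)
```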
21:22
So it works at scale. And the main idea is that since it's a lower-level implementation of all that, all those operations we're benchmarking will be way faster. They claim it's actually two to four times faster, and we'll just verify that those claims are true. So with our existing tests,
21:40
we'll just switch to this uvloop. And actually, it's completely true. From my benchmarks, it's 2.8 times faster than the stock implementation built into CPython. And to build all those reports, I use CodSpeed, which is a tool I'm working on.
22:01
And the main idea is that with just this pytest marker and a small YAML addition in GitHub Actions, it brings us those reports as well as detailed profiling. And it's completely integrated with CI systems. It runs in CI without bringing extra noise,
22:22
because we are instrumenting the CPU and simulating it, which means we don't really depend on noisy neighbors. For example, if you're running in a CI environment on GitHub Actions runners, you might have a lot of other people running heavy workloads, and our measurements don't really depend on that.
22:41
And as well, it's integrated with GitHub with differential profiling, as I mentioned just before. So this is how I actually measured all those results, and you can check them directly at the link. CodSpeed is also free for open source projects. So if you're working on a cool open source project and just want to measure performance, don't hesitate to hit me up.
23:02
So yeah, this is mostly it. The code is available on my GitHub, and I will share the slides in the Discord channel for this talk. And yeah, if you have questions, don't hesitate. Thank you.
23:24
Thank you, Arthur. So we have about four minutes for questions. If anyone wants to ask something, there's microphones in the hallways.
23:42
Hi, so my question's about the future of asyncio with the removal of the GIL. Currently, you typically just have a single event loop. With the removal of the GIL, do you see a future where you can have multiple threads,
24:01
each thread having its own event loop, essentially? Is that something you see happening? Yeah, probably. I'm not sure of it, but it will also bring way more power to just using threading, because until now, you could actually do all of this with threading,
24:21
but with fewer guarantees, and it would be less efficient than using multiprocessing. But with the removal of the GIL, it will kind of bring everything closer, and managing memory will be easier as well. And so for asyncio, I think it could make some integrations easier, because you could just use threading
24:40
under the hood. So you could create some async tasks by just using threading under the hood, and it will probably be easier to build some integrations. Thank you. Any more questions? Okay, we have an online question from Tybale. What is a callback? Okay, yeah, I didn't really talk about it in the beginning.
25:01
So the main idea of the callback is to have some things that will be called later. So often it's a function and it's just the idea that you can put something in the event loop and have it being called later on. So it's not like calling the function directly, but just delegating the call.
25:20
And so it can be called as well with its arguments by the event loop itself. So specifying which arguments and which function, which scope, what is the context and everything. Okay, thank you. Anybody else have any questions? Okay, so I have a question for you. So we have async programming and multi-threading programming
25:40
and we also have the old C model of using polling and epoll. So why would I choose async programming over an epoll loop, for example? I didn't really experiment a lot with using different kinds of event loops, but usually the selector-based event loop,
26:00
depending on your system, will be the most performant one. But I think on some kinds of OS, even the default implementation in CPython will prefer an epoll event loop. I don't remember exactly which one. But in some cases it will be better, depending on the architecture. But if you're running on common x86-64 Linux,
26:23
the selector-based one will be preferred. Okay, thank you. So I see that we don't have any more questions. We will be stopping for a 15-minute coffee break and we'll be continuing at 1530 with Nicholas Tolvery and Joshua Love talking to us about PyScript.
26:41
Thank you. Thank you.