
gRPC Python, C Extensions, and AsyncIO


Formal Metadata

Title
gRPC Python, C Extensions, and AsyncIO
Subtitle
How to make AsyncIO work with the gRPC Core
Title of Series
Number of Parts
130
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Goal
- Encourage Python developers to understand C extensions by sharing gRPC Python's practice, and advocate the adoption of AsyncIO.

Prerequisite
- Understand thread vs. process
- Interested in asynchronous programming

gRPC Brief
- What's gRPC Core? And what is gRPC Python?

Cython To The Rescue
- Why we picked Cython among all other available tools (e.g., pybind11, ctypes)
- Debuggability: pdb & gdb

The GIL Friction
- How to delegate work to the C extension
- How to make multithreading work

AsyncIO Topic
- Not blocking the loop, the main headache
- Non-blocking I/O solution 1: replacing the C library's I/O operations
- Non-blocking I/O solution 2: dedicated background poller thread
- Performance improvement (10k -> 20k for client, 4k -> 16k for server)

Migration to AsyncIO
- Tolerate multithreading and AsyncIO in the same application
- Make both APIs co-exist in the same application
Transcript: English(auto-generated)
Anyway, so, next up we have a doubleheader, as it were. We have Lidi Zhang and Pau Freixes, am I right? Yeah, more or less, Pau Freixes. All right.
Lidi is a software engineer at Google in the TechInfra Network Systems area. He is an active member of the gRPC repo, mostly contributing to gRPC Python, and focuses on API design, distributed systems, and tooling. Prior to Google, he got his master's degree from CMU and had several years' experience
in tech startups in Beijing. Pau, who's also speaking in this slot, has been working the last four years as a software engineer at Skyscanner, where he had the chance to build, run, and own many Python services at scale in production, mainly on AsyncIO.
That gave him the opportunity to contribute to gRPC, as well as several other projects in the Python ecosystem and the open-source community. So they're going to be speaking on, unsurprisingly, gRPC Python, C extensions, and AsyncIO. So welcome to you both, and...
Hello. Okay, let's get started. Hello, everyone, and welcome to our talk. Today we are going to discuss how we build gRPC Python, how we utilize Python C extensions, and how we integrate them with AsyncIO. If you have any questions, feel free to post them in the Discord channel;
it's called #talk-grpc-and-asyncio. Please allow us to introduce ourselves again. This is Lidi Zhang, and I'm a software engineer at Google. I have been a maintainer of gRPC Python since 2018. Pau, do you want to introduce yourself?
Yes, thank you, Lidi. So, yeah, this is Pau. First of all, I wanted to say a big thank you to Skyscanner for funding my time to work on this project over the last year. Sadly, I'm no longer a member of the Skyscanner family; I'm currently working at ona.com.
Like most of you, I'm a Python enthusiast, but what I like even more is solving any kind of engineering problem, and these last years I have also tried to do my best contributing to many different open-source projects. Passing over to you, Lidi. OK, so let's get started.
So what is gRPC? As the name suggests, gRPC is an RPC framework built upon HTTP/2 as its transport protocol. It's meant to be fast and lightweight, and it is designed for distributed systems. It carries some highlight features, for example streaming RPCs and various load-balancing
policies. We also provide interceptors, so you can inject logic at any stage of an RPC, and we integrate well with protobuf, which enforces your API contract. Currently we're getting around four hundred thousand downloads per day.
On the right side is our cute new logo, a golden retriever named Pancake. She's cute, so that's another reason for you to try out gRPC today. Before introducing gRPC Python, we have to introduce gRPC Core. Core is the component that handles all the HTTP/2 frame processing,
serving, compression, security, load balancing: all the complex stuff. Python is just a thin wrapper over it. Since Core handles so much functionality, it would be unwise to build it again and again for each language, so many languages, like C++, Python, and Ruby, are just thin wrappers over it.
In total we have 14 supported languages. This gives us not only better performance, but also a lower maintenance burden.
OK. It also gives us some friction. For example, segfaults: Python developers aren't really familiar with segfaults, and we have seen a lot of complaints from people who don't know how to debug them,
which is complex. There are also memory leaks: Python's memory-management model is very different from C++'s, and it is very error-prone to manage the lifecycle of C++ objects from Python space. And then there's compilation across platforms.
Compiling on Linux and macOS is probably easy, but how about compiling on Windows? To solve the compilation part, we distribute not only our source code but also binary wheels, so gRPC Python users don't need to worry about that.
What does a Python C extension actually look like? A C extension is a module written in C or C++. On the right side is a short example of one: all this code does is create a module that prints hello world. On the first line you can see the header, Python.h, which includes all the API
you need for manipulating Python objects from C/C++ space. It's quite complex to write, and the API itself varies from version to version. To make things easier, people have come up with simpler ways to do it, for
example C++ binding frameworks and glue-code generators that ease the pain of writing glue code. There are many ways to write C extensions on the market, and I'm going to talk about just three representative ones.
First is PyCLIF. PyCLIF is a templating language, and it works really well when you are just trying to expose a C++ interface into Python space. But it falls short if you're trying to do something more complex than that, like meddling with the threading model or object lifecycles while staying efficient.
Next is pybind11, which is a portable, lightweight, header-only C++ library. I can't really complain about it, but it requires people to code in C++; since this is a PyCon, I put that on the drawback side,
though if you are a C++ fan, maybe it's a plus for you. And finally, Cython. Cython is a language very similar to Python and very easy to develop in; it's adopted by NumPy, SciPy, and TensorFlow. However, even though the language claims to be a superset of Python, it's
not a strict superset, and you will see some weird quirks of the Cython language when you use it. Eventually, the gRPC Python team decided to use Cython because it's similar to Python, so when gRPC Python users want to take a step ahead and help improve the library,
they can. So let me introduce how Cython works in a minute. For example, we have a prime checker: it does a very simple math computation, looping over candidate factors and checking whether each is a factor of the number you are
checking. If we put the file in the right place, we can import it just by saying import prime_checker and then use it. What about Cython? First, Cython not only compiles Cython-specific code; it can also compile entirely plain Python source code.
Down below is a Cython version of it. You can import a library from C or C++ with cimport, and you can define statically typed variables like cdef double root. After that, Cython provides tooling to compile it
into generated C/C++ code, and setuptools will help you compile that into a shared library object. When you put the shared library object in the right place, you can import it like any other Python module and use it.
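The pure-Python path just mentioned can be sketched like this. The exact slide code isn't in the transcript, so this prime checker is an illustrative guess; the point is that Cython can compile a file like this unchanged (e.g. with `cythonize prime_checker.py`), or you can add `cdef` types in a .pyx file for a bigger speedup.

```python
# prime_checker.py: a pure-Python prime checker of the kind described
# in the talk (illustrative sketch, not the original slide code).
# Cython can compile this module as-is, since plain Python is valid input.

def is_prime(n: int) -> bool:
    """Return True if n is prime, by looping over candidate factors."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:  # only factors up to sqrt(n) need checking
        if n % i == 0:
            return False
        i += 1
    return True


if __name__ == "__main__":
    print([x for x in range(20) if is_prime(x)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```

Once compiled into a shared object and placed on the path, `import prime_checker` works exactly like importing the pure-Python module.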
If you have ever tried to debug a Python C extension, you will find it's a hard challenge: when you run gdb against Python 3.7 and look at the backtrace, all you can see is evaluate-frame, evaluate-frame, evaluate-frame.
You get nothing, and it makes people mad. However, this is a problem that has troubled the Python maintainers as well, so they came up with a python-gdb.py script built for GDB. It ships with every CPython source release, and you can find it in
your installation folder or in your downloaded source-code folder. After the Python GDB mode is turned on, suddenly GDB understands Python: you can see Python backtraces and Python code, and when you print a variable it doesn't just say PyObject,
it tells you what's inside that PyObject. So with all this effort to make Python work with C++, we finally got gRPC Python working with gRPC Core. But there is still one problem troubling us.
Let's say we have a gRPC server calling methods from gRPC Core, and it's running on POSIX threads, so all Python threads are POSIX threads. Now say we're using gevent or eventlet, which monkey-patch the standard library, including threading, swapping the POSIX threads out for coroutines.
So we have the main thread, and then in order to make it a server we also need a polling thread and a bunch of executor threads. Each RPC will consume an entire executor thread,
so with a limited number of executor threads, our concurrency is limited too. To make things worse, there is the global interpreter lock (GIL) restricting all the POSIX threads. This not only makes performance worse, it also causes deadlocks when we jump from Python space to C++ space and back to Python space.
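The executor-thread bottleneck can be demonstrated with a plain thread pool (an illustrative sketch, no gRPC involved): each blocking "handler" pins one pool thread for its whole duration, so concurrency can never exceed the pool size no matter how many requests arrive.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch: a fixed executor pool caps concurrency.
MAX_WORKERS = 2
active = 0
peak = 0
lock = threading.Lock()
barrier = threading.Barrier(MAX_WORKERS)  # forces the pool to fill up


def handler(_):
    """A blocking 'RPC handler' that occupies one executor thread."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    try:
        barrier.wait(timeout=5)  # both pooled threads meet here
    except threading.BrokenBarrierError:
        pass
    with lock:
        active -= 1


with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    list(pool.map(handler, range(6)))  # 6 "RPCs", only 2 threads

print(peak)  # peak concurrency equals the pool size, never more
```

With AsyncIO, by contrast, one thread can keep thousands of in-flight RPCs, which is the motivation for the rest of the talk.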
However, we found a solution. Next, Pau will discuss how we solved this challenge by using AsyncIO. So let's leave the stage to Pau. Thank you so much, Lidi. Could you stop sharing, please?
I'm going to share from my own screen. Where's the button? Okay. Yeah. Can you see my screen?
Yeah, we're good. Perfect. So I think it was almost a year, a year and a half ago, when we started this initiative together with Lidi and other members of the gRPC community. At that time at Skyscanner we were making, or trying to make, a transition from
HTTP to gRPC. We had some of our main services running with Python and AsyncIO, and we didn't have any solution for moving from HTTP to gRPC without giving up on AsyncIO. That is when we started this initiative together with gRPC
to try to solve that problem. So what was the first headache that we had? As most of you will know, when you are running an AsyncIO application and you use the await syntax, what happens behind the scenes is that you return
control back to the loop, providing a future which will be completed later to wake up the task that was asleep. This pattern, which is the main pattern used by AsyncIO, was not reproducible using the primitives exposed by the C interface of gRPC Core, because the
main interface for polling events was a blocking interface. Not only this: we also needed a kind of IOManager system that would allow us to implement all of the socket network operations
on top of AsyncIO. So we were facing two different problems here. The first one, the IOManager, was addressable, since gRPC provided a way of overriding it with a custom IOManager; but there was no non-blocking primitive for polling events from the gRPC Core.
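The await pattern just described can be sketched in pure asyncio (illustrative only: here a timer completes the future, whereas in gRPC the completion would come from the C core's I/O machinery).

```python
import asyncio

# Sketch of the core asyncio pattern: `await` hands control back to the
# event loop with a Future; whoever completes that Future later wakes
# the sleeping task up.


async def rpc_like_operation(loop):
    fut = loop.create_future()
    # Pretend some I/O machinery will deliver a result in 10 ms.
    loop.call_later(0.01, fut.set_result, "response")
    return await fut  # the task sleeps here until fut is completed


async def main():
    loop = asyncio.get_running_loop()
    return await rpc_like_operation(loop)


result = asyncio.run(main())
print(result)  # response
```

A blocking poll call from the C core cannot participate in this dance, which is exactly the mismatch described above.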
Other frameworks like gevent and Node.js, because of different circumstances, were able to address that issue. Luckily, the gRPC C++ team implemented a new completion queue, the callback completion queue, which is based on callbacks.
So instead of having a call that blocks you, you can register a callback which will be invoked when the event is there, and that allowed us to signal the event through an AsyncIO future. So we had pretty much what we needed to start the initiative.
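Bridging such a callback into AsyncIO looks roughly like this (a sketch with invented names such as `fake_core_poll`; it is not the real gRPC Core API). The key detail is that the core fires the callback from its own thread, so the callback must use `call_soon_threadsafe` to complete the future from the loop's thread.

```python
import asyncio
import threading


def fake_core_poll(on_event):
    """Stand-in for the C core: fires the registered callback later,
    from a non-asyncio thread (here, a Timer thread)."""
    threading.Timer(0.01, on_event, args=("event-data",)).start()


async def wait_for_core_event():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    def on_event(payload):
        # Called from the core's thread: hop back to the loop safely
        # before touching the Future.
        loop.call_soon_threadsafe(fut.set_result, payload)

    fake_core_poll(on_event)
    return await fut  # suspends until the callback completes fut


payload = asyncio.run(wait_for_core_event())
print(payload)  # event-data
```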
And we said Eureka. After some days, we managed to have a first solution, which used a new custom IOManager, a whole replacement of the old network operations, and also the callback
completion queue, and the results were quite good. In the graph we can see the performance difference between the synchronous client, which is the version we already had, versus the new native AsyncIO implementation.
For the client, for example, we went from not more than 10k queries per second to almost 20k queries per second, and on the server we also had a really nice boost, moving from almost 5k queries per second to more than 15k queries per second.
So we were happy with the results that we got. But then another headache landed on our plates. As I already said, we had one client implemented, the synchronous one, and we had a new one,
the asynchronous one. That presented some challenges, because the synchronous stack will be there for a long time, maybe forever. One of the possible issues we could face in the future would be, for example, an asynchronous application, a server, which
calls a third-party library that behind the scenes uses the synchronous version. With the solution that we had on the table, that wouldn't work, so we had to address that situation. We thought about different ways of addressing it.
The first thing we considered was rewriting the whole sync stack on top of the asynchronous one, so all of the synchronous functions would end up calling an asynchronous call. But we didn't have a clear idea of whether it would work in the end, and the amount of work it would need
would not be negligible. The second solution we put on the table was changing the C++ implementation quite a lot, providing a way to have two coexisting stacks running at the same time.
But that did not seem to be an easy solution either. Finally, we found a solution whose amount of changes seemed affordable: have one thread where all of the gRPC I/O operations
would be executed. That way, a synchronous client running on top of an asynchronous application would not block the loop. The first numbers we got using that technique were quite good: the impact on performance, which was one of our main concerns,
was almost negligible. But then we started experiencing random deadlocks, for example at the moment we tried to run
the whole synchronous test suite in an asynchronous context. That is when we realized we had contention problems between some GIL acquisitions and some acquisitions of a mutex in the gRPC Core library, and to address that issue we had to put in the proper GIL releases.
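The lock-ordering problem being described can be modeled with two plain Python locks (names invented for illustration; the real locks are the CPython GIL and a gRPC Core mutex). The fix is an ordering rule: never call into the "core" while holding the "GIL".

```python
import threading

# Toy model of the GIL-vs-core-mutex deadlock (illustrative names).
gil = threading.Lock()
core_mutex = threading.Lock()
log = []


def core_call(python_callback):
    """Stand-in for a gRPC Core function: grabs the core mutex, then
    invokes a Python callback that needs the 'GIL'."""
    with core_mutex:
        python_callback()


def python_callback():
    with gil:  # the callback needs the "GIL"
        log.append("callback ran")


def worker():
    with gil:
        pass  # do Python-level work under the "GIL"
    # Ordering rule applied: the "GIL" is RELEASED before entering the
    # core, so gil and core_mutex are never held in opposite orders
    # and the deadlock described in the talk cannot occur.
    core_call(python_callback)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(log))  # 4: all workers completed without deadlocking
```

Holding `gil` across `core_call` instead would recreate the deadlock: one thread holds the GIL and wants the mutex, while another holds the mutex and wants the GIL.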
And that is when the performance went down in a substantial way. Basically, the problem we were facing before adding the proper GIL releases was this: we had two threads. One thread called a gRPC Core function, which acquired a mutex and at the same
time invoked a callback that triggered a Python function that wanted to acquire the GIL. At the same time, another thread had managed to acquire the GIL and then made a call into the gRPC
Core, trying to acquire the mutex; but the mutex was already locked by the other thread. So we were facing a classic deadlock. The only way to address this deadlock was: whenever we call into gRPC Core, we have to release the GIL
first. But once we did that, we surfaced a lot of contention on releasing and re-acquiring the GIL, and this is when the performance got impacted a lot. So we went for a second solution, in which we basically gave up on using the callback completion
queue and gave up on the new IOManager; it was like starting over again. What we did was implement a solution where we have a new thread. This thread polls the gRPC Core, and every time there is a new event from the gRPC Core it
wakes up the AsyncIO event loop. At the same time, to avoid as much as possible the contention for acquiring the GIL, we avoid executing Python code as much as possible: we implemented that thread using Cython, and most of the
code executed for polling and for waking up the event loop barely touches Python objects at all. This solution was really good and had all the benefits: for example, we removed the burden of having
to maintain a new IOManager, and the performance degradation was affordable. So, Eureka again. For the client, for example, we managed to keep the same performance that we got with the first solution; we had a noticeable impact on the performance of the
server, but it is still a good boost compared to the synchronous one. Passing over to Lidi. Okay, thank you Pau for introducing our AsyncIO integration. The gRPC AsyncIO API has been released as an experimental API since 1.25.
Feel free to give it a try. It already passes all the interop tests, which means it integrates well with all the other gRPC languages and with historical versions of gRPC. We're also integrating it with Google Cloud Platform clients,
and we expect some of them to be released in quarter three, 2020. The API reference can be found on the grpc.io website. Thank you. Now we'll take questions. All right. So, does anyone have any questions here?
I'll give it a minute, because sometimes these creep in. Meanwhile, while we are waiting... hold on, my brain just left me. Okay.
So, after the Q&A, everyone can keep talking. What was the name of your talk channel again? Right: in the channel #talk-grpc-and-asyncio, you can go to that room as well for further discussion afterwards.
I'm just looking to see if anyone has any questions. Post them here in Zoom or on Discord if you have them there, and we'll hang tight for a moment, like we did on the
last talk. You know, my speech teacher used to say, if you don't get any questions, it usually means that you presented really well. Thank you. It looks like you're good. So, yeah, hop over to #talk-grpc-and-asyncio on Discord
and enjoy the conversation there. And thank you both for being here. Oh, actually... wow, this worked for me.