High Performance Networking in Python
Formal metadata
Title: High Performance Networking in Python
Series: EuroPython 2016
Part: 27 of 169
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You may use, adapt, copy, distribute, and make publicly available the work or its content, in unchanged or changed form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner specified by them and pass on the work or content, including in changed form, only under the terms of this license.
Identifier: 10.5446/21244 (DOI)
Language: English
Transcript: English (automatically generated)
00:00
I'm from Toronto, Canada. My name is Yury Selivanov. I'm a co-founder of MagicStack. Check out our website, magic.io. I've been an avid Python user since 2008. I think the first Python version I started to use actually was Python 2, but then in a month I switched to Python 3. I've used it since Alpha 2 or something.
00:22
I never looked back. So use Python 3. I've been a CPython core developer since 2013, but I believe I actually started to do things even before that. You might know me from PEP 362, which I co-authored with Larry Hastings and Brett Cannon.
00:41
It's the inspect.signature API. Then I created PEP 492. That's the async/await we have in Python 3.5. And I'm also helping Guido and Victor Stinner to maintain asyncio. I also created uvloop. More on that later.
01:02
Structure of the talk. I actually wanted to tell you so much about how to write high-performance code in Python, and with asyncio in particular, but unfortunately I had to cut my slides. Like, I don't know, 50% of my slides had to go.
01:22
So we'll briefly start with an overview of async/await. Then we'll quickly cover asyncio and uvloop. Then we'll answer, or try to answer, the question of how you should write your protocols: should you implement them using sockets or protocols, or maybe you should use streams?
01:41
Then I'll present you with something new. It's a new high-performance driver that I open sourced like two hours ago. And then we'll recap. I have to say that there will be no funny cat slides, just because performance is hard. So only seven depressed cats from now on. So let's start.
02:03
There should be just one obvious way to do it, right? So we have five different ways to do coroutines in Python. The first one is to do callbacks and Deferreds. I think Twisted actually started and originated this approach, one of the first major frameworks at least
02:20
that used that and kind of validated that it is possible. Then we have Stackless Python and greenlets. And I'm pretty sure everybody has heard about eventlet and gevent. Those are good examples of frameworks that use them. In short, programs in gevent look like normal programs.
02:42
They kind of look like you are using threads, but instead it's just one program, one thread, and every point of your code can actually suspend and then resume. It's a lot of dark magic, and as Guido said, it will never be merged into CPython, so those guys are kind of on their own.
03:00
Then we have yield, and it has been possible to use generators as coroutines in Python since, I believe, Python 2.5 or something. Twisted has a decorator called inlineCallbacks so that you can kind of implement modern-looking code
03:23
using coroutines in Twisted, and you could do this for years. Then in Python 3.3, yield from was introduced, and asyncio benefits from it. That's how most asyncio code is written, using yield from. And then in Python 3.5, we have async/await.
03:43
That's the new way. And why do I think that async/await is the answer? Well, first of all, it's dedicated syntax for coroutines. It's concise and readable. It's easy to actually glance over a large chunk of code and see what's actually going on. You will never confuse coroutines and generators.
04:04
There is now a new built-in type for coroutines. It's actually the first time in Python history that we have a new dedicated built-in type just for coroutines. We also have new concepts, async for and async with, and I believe this is something rather unique to Python.
04:21
When we added async and await, a lot of people actually told us, well, you copied it from C#. Well, yes, we copied it from C#, but we also introduced new things, and I believe async for and async with are kind of unique. Like, I haven't seen any other imperative language that has these constructs.
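As a small sketch of the syntax discussed above: the code below defines a coroutine with async def and a minimal asynchronous iterator consumed with async for. All names here are illustrative, not from the talk's slides.

```python
import asyncio

async def fetch_data(delay):
    # 'async def' creates a coroutine function; calling it returns an
    # object of the new built-in coroutine type.
    await asyncio.sleep(delay)
    return "done"

class Ticker:
    # A minimal asynchronous iterator, usable with 'async for'.
    def __init__(self, n):
        self.n = n

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.n <= 0:
            raise StopAsyncIteration
        self.n -= 1
        await asyncio.sleep(0)  # yield control to the event loop
        return self.n

async def main():
    result = await fetch_data(0.01)
    ticks = [i async for i in Ticker(3)]  # async comprehension
    return result, ticks

print(asyncio.run(main()))  # ('done', [2, 1, 0])
```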
04:40
Async/await is also a generic concept. A lot of people think that async/await can only work with asyncio. That's not true, actually. Asyncio uses async/await, but you can build an entirely new framework and use them on your own. That's, for instance, what David Beazley did with his framework called Curio.
05:01
He uses async/await in a completely different way from how it is used in asyncio. And also, async/await is fast. If you write something like a Fibonacci calculator, you will see that it runs just about twice as slow. And that is fine, actually,
05:20
because even in big asyncio programs, you don't have as many async/await calls as you have normal function calls. You cannot even compare. It's like 100 times more. So use async/await as much as possible. It won't hurt your performance. You won't see any drawbacks. So, coroutines are a subtype of generators,
05:42
but not in a classical Pythonic sense. In CPython, they share the same C struct layout. They share like 99% of the implementation, but a coroutine is not an instance of a generator, actually.
06:00
And you can see this sharing of the machinery. If you, for example, disassemble a coroutine, you will see that it still uses the YIELD_FROM opcode. Then we have types.coroutine. Originally, we introduced it to make old-style yield from coroutines from asyncio compatible
06:22
with new coroutines that use async/await syntax, because you cannot just await on things. You can only await on awaitable objects. So you cannot await on the number one, and you cannot await on a generator. But if you wrap a generator with the types.coroutine decorator,
06:42
you can await on it, actually. And again, David Beazley uses this kind of creatively in Curio. If you are interested in async/await, I definitely recommend you to take a look at how asyncio is implemented and how Curio is implemented, just to compare the different approaches. And then we have a bunch of protocols for async iterators and async context managers.
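The types.coroutine bridging described above can be sketched like this: a bare generator cannot be awaited, but the wrapped one can. The function name is illustrative.

```python
import asyncio
import types

@types.coroutine
def old_style(x):
    # A generator-based coroutine: 'yield' suspends it. Wrapping the
    # function with @types.coroutine makes its result awaitable from
    # async/await code.
    yield
    return x * 2

async def main():
    # Awaiting a plain generator would raise TypeError; the wrapped
    # generator works.
    return await old_style(21)

print(asyncio.run(main()))  # 42
```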
07:04
Let's move on. Let's talk about asyncio, libuv, Cython, and uvloop. So asyncio was developed by Guido himself, originally. I think a lot of it is inspired by Twisted. And it's actually good, because Twisted existed
07:21
for, I don't know, 20 years or something, and they validated that this concept of asynchronous programming in Python actually works. So I think we copied quite a lot from Twisted, and Twisted actually plans to use asyncio at some point when they fully migrate to Python 3. They will just use the asyncio event loop.
07:41
A lot of people call asyncio a framework. Well, it's not a framework. I would call it a toolbox, actually. It doesn't implement HTTP, for instance, or any other high-level protocols. It just provides the machinery and APIs for you to develop this kind of stuff. If you want HTTP, you probably would use aiohttp for that.
08:01
If you want a memcache driver, you go and Google it. And it's also part of the standard library, which is both good and bad. Why is it bad? Python has a slow release cadence. We see new major Python releases every year and a half,
08:21
and bug-fix releases usually are half a year apart. And I would say that for asyncio, sometimes it's not enough. Sometimes we discover bugs and we want to fix them as soon as possible, but we have to stick with the Python release cycle. But it's also good, because you kind of know
08:41
that asyncio will stay with us for a while. It will always be supported by someone, because it's a part of the standard library. And also, Python has a huge network of buildbots with different architectures and different operating systems, and it's quite important, actually, to test something as convoluted and as hard as IO
09:03
on different platforms. So it's good. Asyncio is quite stable right now, and it will be even more stable pretty soon. So what's inside asyncio? We have a standardized and pluggable event loop.
09:21
Actually, asyncio from the beginning was envisioned in a way that you can swap the event loop implementation for something different. It defines protocols and transports. That's one way to actually marry callback-style programming and async/await,
09:43
is to actually develop protocols using low-level primitives, such as protocols. It also has factories for servers and connections and streams. And this is also quite important, because if you implement a server, let's say, using blocking sockets,
10:01
you implement it once, and then you start to implement it a second time, and you will see that you have lots and lots of boilerplate code that kind of looks the same every time. So asyncio takes care of that and factors all of this out into convenient helpers for creating servers and creating connections. It also defines futures and tasks. A task is something that actually runs a coroutine,
10:25
that pushes values into coroutines, that suspends them and resumes them. In a framework-independent way, it's called a coroutine runner, actually. And futures allow you to interface with callbacks.
10:41
That's how you actually introduce async/await into something that uses callbacks. It also has interfaces for creating and communicating with subprocesses asynchronously. It has queues. And by the way, Queue is a very useful class.
11:01
You should definitely use it. It's exceptionally hard to create an asynchronous queue that supports cancellation and all the stuff like that without bugs. We still fixed a lot of queue bugs in 3.5.2. So queues are useful for things
11:20
like connection pools, for instance. Definitely check it out. And we also have locks, events, semaphores, everything like that, everything that nobody knows how to use, actually. And as Łukasz Langa said in his talk at PyCon US a couple of months ago, if you love the locks, you can still have them in asyncio.
11:43
So the event loop is the foundation. It's the engine that actually executes asyncio code. It also provides factories for tasks and futures. It's also an IO multiplexer. That's the engine that actually reads the data and pushes the data to the wire.
12:03
It provides low-level APIs for scheduling callbacks, for scheduling timed events, for working with subprocesses and handling Unix signals. And the best part about it is that you can replace it. So that's what we kind of did with uvloop.
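uvloop is a third-party package (pip install uvloop), and swapping it in is as small as setting the event loop policy. A minimal, hedged sketch that falls back to the default loop when uvloop is not installed:

```python
import asyncio

# uvloop is optional here: if it is not installed, the stock
# SelectorEventLoop (or ProactorEventLoop on Windows) is used instead.
try:
    import uvloop
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
except ImportError:
    pass

async def main():
    # Report which loop implementation is actually running.
    loop = asyncio.get_running_loop()
    return type(loop).__name__

print(asyncio.run(main()))
```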
12:22
uvloop is 99.9% compatible with asyncio. I'm not aware of any incompatibilities, but maybe there are some. As far as I know, you can drop uvloop into pretty much any program and it will just work. It's written in Cython. And by the way, Cython is just amazing.
12:42
It's unfortunate that it's not as widespread, and I think it's kind of underappreciated what you can do in Cython. Essentially, it's a superset of the Python language. You can strictly type it and it will compile to C, and you will have C speed, easily achieved with a syntax close to Python.
13:03
So definitely check out Cython and try to use it. uvloop uses libuv. libuv is something that keeps Node.js running, actually. Node.js uses libuv as its event loop. And it's actually a good thing, because Node.js is super widespread
13:22
and it's very, very well tested. So libuv is stable and it's fast. It also provides fast tasks and futures. So even your async/await code runs faster on uvloop, by about 30%. And that's also thanks to libuv and a few hacks.
13:42
It has super fast IO. So how fast is uvloop? Well, compared to asyncio, it's two to four times faster on simple benchmarks like an echo server. Again, nobody probably deploys echo servers in real life, so as soon as you add more Python code,
14:02
of course it will become slower. But again, even in real applications, I've seen reports that uvloop runs code about 30% faster. And also, the latency distribution is much better with uvloop. So it's faster than asyncio.
14:21
What about other platforms and frameworks? For instance, the same echo server written in Python which uses uvloop is two times faster than Node.js. And it's kind of interesting, because Node.js itself is mostly V8, the JavaScript implementation. It uses libuv, which is written in C,
14:41
and there is a thin layer of JavaScript on top of it. So still, uvloop, which uses the same libuv, is two times faster for almost the same amount of code. It is as fast as Go, run with GOMAXPROCS set to one. That essentially means that Go cannot parallelize
15:02
the load on multiple CPUs. But still, it's quite an impressive result, because Go is a fully compiled language, and it also has, I think, a bit more efficient implementation of IO than libuv.
15:20
Just because libuv is trying to be generic: it supports Windows, it supports Unix. Golang supports them too, but in a slightly different way. Anyway, of course, it's much faster than Twisted and Tornado, just because a lot of it is in C; most of uvloop is in C.
15:43
So initially, my idea for this talk was to end with this slide: just use uvloop, thank you for your time, questions. But unfortunately, it's not that easy. So part three, let's talk about sockets, streams, and protocols. That's basically one obvious way to do it, episode two.
16:02
So what should you choose? Should you use low-level coroutines like loop.sock_recv and loop.sock_connect? Or should you use the high-level streaming API? Or maybe you should use low-level protocols and transports. Here is an echo server implemented with the loop.sock_* methods,
16:22
and if you look at it closely, you will see that if you drop the async and await keywords, it looks like normal blocking code that uses the socket module. So it is kind of convenient when you have lots and lots of old-style blocking code.
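The slide's code is not in the transcript, so here is a hedged stand-in: a minimal echo server in the loop.sock_* style, serving a single connection for demonstration. The helper names (handle, serve_once) are mine, not the talk's.

```python
import asyncio
import socket

async def handle(client):
    # Echo loop over a raw non-blocking socket, driven by the event
    # loop's sock_* coroutines. Drop async/await and this reads like
    # plain blocking socket code.
    loop = asyncio.get_running_loop()
    with client:
        while True:
            data = await loop.sock_recv(client, 1024)
            if not data:
                break
            await loop.sock_sendall(client, data)

async def serve_once(sock):
    # Accept a single connection and echo on it (enough for a demo).
    loop = asyncio.get_running_loop()
    client, _ = await loop.sock_accept(sock)
    await handle(client)

async def main():
    sock = socket.socket()
    sock.bind(("127.0.0.1", 0))   # ephemeral port
    sock.listen(1)
    sock.setblocking(False)       # required by the loop.sock_* methods
    server = asyncio.ensure_future(serve_once(sock))
    reader, writer = await asyncio.open_connection(*sock.getsockname())
    writer.write(b"ping")
    await writer.drain()
    data = await reader.readexactly(4)
    writer.close()
    await server
    sock.close()
    return data

print(asyncio.run(main()))  # b'ping'
```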
16:43
You can kind of easily convert it to async/await. Here is the streams version of the echo server. It's quite high-level, as you see. You don't work with sockets anymore. You have a reader and a writer.
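A minimal sketch of that streams style, again standing in for the slide: no sockets in sight, just a reader and a writer handed to the connection callback.

```python
import asyncio

async def echo(reader, writer):
    # Connection handler: read chunks until EOF and echo them back.
    while True:
        data = await reader.read(1024)
        if not data:
            break
        writer.write(data)
        await writer.drain()   # respect the write buffer's flow control
    writer.close()

async def main():
    server = await asyncio.start_server(echo, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(b"hello")
    await writer.drain()
    data = await reader.readexactly(5)
    writer.close()
    server.close()
    await server.wait_closed()
    return data

print(asyncio.run(main()))  # b'hello'
```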
17:02
And here is a low-level implementation of the echo server using protocols. So essentially, a protocol is something that the event loop just pushes data into, and the protocol has a transport to push data back to the client. So the key method here is data_received.
17:21
That's like the main method. The event loop pushes the data to data_received, then the protocol can process the data and then call transport.write to actually send the processed data, or a response, back to the caller.
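The callback style just described can be sketched as follows, a minimal protocol-based echo server in place of the slide's code:

```python
import asyncio

class EchoProtocol(asyncio.Protocol):
    # Callback style: the event loop pushes bytes into data_received,
    # and the transport carries the response back to the peer.
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)   # echo straight back

async def main():
    loop = asyncio.get_running_loop()
    server = await loop.create_server(EchoProtocol, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(b"echo")
    await writer.drain()
    data = await reader.readexactly(4)
    writer.close()
    server.close()
    await server.wait_closed()
    return data

print(asyncio.run(main()))  # b'echo'
```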
17:40
For echo servers, it's quite a simple implementation, but you can imagine it gets pretty hairy for more complex protocols. So, downsides. When you use the low-level loop.sock_* methods, the loop cannot buffer for you.
18:01
So you are responsible for implementing the buffering on top. And you also have no flow control, which without buffers doesn't make any sense; you don't need flow control until you start implementing buffers, and then you won't have it. And it's quite a tricky thing to implement correctly. And another reason why you shouldn't use it
18:22
is just because the event loop has no idea what you are doing right now. Let's say you are reading some data. Okay, the event loop will add your file descriptor to a selector, which can be epoll or kqueue on Unix.
18:44
And it essentially waits for an event, and when it receives this event, it will try to read the data and push it back to you, but it will also remove the file descriptor from the selector. That's an extra system call. Because it doesn't know: will you continue reading the data,
19:01
or will you write data now, or will you just stop, or will you close the connection? So it cannot predict what's going on. When you use streams (and streams, by the way, are built on protocols), the event loop just knows, because you have stated an intent: just keep sending data to my data_received
19:22
or to my stream, and when I don't need this data, I will close the connection myself. So the event loop can actually optimize for that. And flow control is kind of important. I like this picture because it illustrates that sometimes you have to push back on something slow, or something that you don't want to use right now.
19:43
So which API should you use? You should use the loop.sock_* methods when you are quickly prototyping something or when you are porting some existing code, but I would highly recommend you to actually stick to streams. Even for porting code,
20:01
just rewrite it with streams, because streams are much easier to use. You can just say, give me exactly this amount of data, or you can tell streams, read until you see a newline or something like that. And it will do it. It also implements read and write buffers quite efficiently, and you can use async/await
20:21
to program entire protocols with streams. And use protocols and transports for performance, actually. If you want exceptional performance, you have to go low-level. So for this talk, let's focus on protocols and transports. And again, it's kind of important:
20:42
for your application code, you should always use async/await. Never even touch, never think about transports and protocols. This stuff is just for drivers: drivers for PostgreSQL, for Memcache, for Redis, for any kind of that code. High-level code should never think about protocols.
21:02
Always use async/await. It will be enough. So let's focus on protocols. So as we mentioned before, the loop pushes data to protocols. Protocols send data back using transports. And protocols can implement specialized read and write buffers.
21:20
They can also do flow control. They can hint the event loop through the transport's pause_reading and resume_reading methods. And you have full control over how IO is performed. You call transport.write, and you can pause or resume data consumption.
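The pause/resume idea can be sketched like this: a hypothetical protocol that asks the transport to stop reading once its unprocessed backlog grows past a high-water mark. The class, thresholds, and the consumed() hook are all illustrative, not an asyncio API.

```python
import asyncio

class ThrottledProtocol(asyncio.Protocol):
    # Sketch of protocol-side flow control: pause reading when the
    # backlog exceeds HIGH_WATER, resume once it drains below half.
    HIGH_WATER = 64 * 1024

    def __init__(self):
        self.backlog = bytearray()
        self.paused = False

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.backlog.extend(data)
        if not self.paused and len(self.backlog) > self.HIGH_WATER:
            self.transport.pause_reading()    # push back on the sender
            self.paused = True

    def consumed(self, n):
        # Called by the application after processing n bytes.
        del self.backlog[:n]
        if self.paused and len(self.backlog) < self.HIGH_WATER // 2:
            self.transport.resume_reading()   # safe to read again
            self.paused = False
```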
21:42
So you have the tools to control the IO. So how do you use protocols and transports? There are basically two strategies. The first one is you implement your own abstractions: your own buffering and your own stream abstractions. And a good example of that is aiohttp.
22:02
That's what they do. They have buffers and then streams specifically designed to handle and parse the HTTP protocol. And then they just use async/await. It's a fine approach. It will be slower than using callbacks and then accelerating everything in C, but it's still quite good.
22:23
So the second strategy is to actually implement the whole protocol parsing in callbacks, and then create a facade that allows you to use async/await. And the main reason why this might be a better strategy
22:44
and why this can offer better performance is because you can just drop Python completely. You can go low-level. You can use Cython, you can use C. So part four, asyncpg. This is something that I just open sourced a couple of hours ago.
23:02
This is right now the fastest PostgreSQL driver for asyncio, and for Python, actually. It's two times faster than psycopg2. It completely re-implements the protocol from the ground up. It doesn't use libpq, the de facto library for working with PostgreSQL. We just implemented it completely from scratch.
23:24
It uses the PostgreSQL binary data format. And by the way, when you are implementing a protocol and you have a choice, text or binary, always choose binary. It's easier to read binary. It's usually just less data, because the encoding is more efficient. And you can process it much faster.
23:40
Because of how binary formats work: usually you have a length field that tells you how much data follows in this frame, and then you have another one. So you can read frames much faster. You can decode types much faster. So always choose binary. And also, not all Postgres types can be encoded in text
24:00
and actually decoded from text. Composite types, for instance. If you have a recursive composite type, it's just not possible to decode it in psycopg2. So what we did for asyncpg: we actually forgot about the DB-API completely. There is no DB-API for async/await,
24:21
but, for instance, what aiopg does is kinda sprinkle async/await on top of the existing DB-API. So our idea was: let's build a driver that is tailored for Postgres and uses Postgres features. And we also support basically all built-in Postgres types.
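The length-prefixed framing mentioned earlier can be sketched with `struct`. The layout below (a 1-byte message type, then a 4-byte big-endian payload length, then the payload) is illustrative only, not the exact Postgres wire format, in which the length field includes itself:

```python
import struct

def read_frames(buf: bytes):
    """Split a byte string into length-prefixed frames.

    Assumed wire format (illustrative): 1-byte message type,
    4-byte big-endian payload length, then the payload.
    """
    frames = []
    pos = 0
    while pos < len(buf):
        msg_type, length = struct.unpack_from("!BI", buf, pos)
        pos += 5  # past the type byte and the length field
        frames.append((msg_type, buf[pos:pos + length]))
        pos += length
    return frames

# Two frames: type 0x44 ("D", data row) and type 0x5A ("Z", ready).
wire = struct.pack("!BI", 0x44, 3) + b"abc" + struct.pack("!BI", 0x5A, 0)
print(read_frames(wire))  # [(68, b'abc'), (90, b'')]
```

With text protocols you must scan for delimiters; here each frame's end is known up front, which is what makes the parsing so cheap.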
24:47
So, Postgres loves prepared statements, because it doesn't need to parse the same query over and over again. When you prepare a statement, it keeps a structure on the server with a plan, with a parsed query.
25:01
And Postgres already knows how to accept your arguments and do this kind of stuff. So we use prepared statements every time. Even when you don't explicitly create them, we have an LRU cache of prepared statements, and we do that transparently for you. We also dynamically build pipelines for efficiently encoding and decoding data.
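A transparent LRU statement cache of the kind described might be sketched like this. This is not asyncpg's implementation; the `prepare` callable stands in for the actual server round-trip:

```python
from collections import OrderedDict

class StatementCache:
    """Hedged sketch of a transparent LRU cache of prepared statements."""

    def __init__(self, prepare, maxsize=100):
        self._prepare = prepare        # callable doing the real PREPARE
        self._maxsize = maxsize
        self._cache = OrderedDict()    # query -> prepared statement

    def get(self, query):
        if query in self._cache:
            self._cache.move_to_end(query)   # mark as recently used
            return self._cache[query]
        stmt = self._prepare(query)          # server round-trip, once
        self._cache[query] = stmt
        if len(self._cache) > self._maxsize:
            self._cache.popitem(last=False)  # evict least recently used
        return stmt

# Demo with a fake "prepare" that records server round-trips.
calls = []
cache = StatementCache(prepare=lambda q: calls.append(q) or f"stmt:{q}")
cache.get("SELECT 1")
cache.get("SELECT 1")   # served from the cache, no second round-trip
print(len(calls))       # 1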
25:22
So the pipeline is essentially an array of pointers to C functions that can process the stream with enormous speed. And it shows: this chart compares different Postgres drivers across different languages.
25:41
The fastest one is asyncpg. It manages to push almost 900,000 queries to the server. The second one is aiopg. That's another driver; it uses libpq, which is also written in C. But unfortunately, psycopg2 doesn't provide
26:01
an efficient async interface, so it's slower. And also, aiopg and psycopg2 use text data encoding, so they will always be slower. Then you see two Go implementations, and then you see the Node.js drivers, which are just 10 times slower.
26:22
The funny part about this one is that Node.js pg is actually a pure JavaScript implementation of the driver, and pg-native uses libpq. So somehow, a lot of JavaScript is faster than C; I have no idea how. The funny thing about this performance is that there is another library.
26:40
It's not part of this chart because it's kind of slow. It's called py-postgresql. Nobody knows about it. We used it for several years, and then we just created asyncpg. Anyway, it's a pure Python implementation, and it's as fast as the pure JavaScript implementation. So everybody is saying that Python is slower
27:01
than JavaScript and you shouldn't use it, but we kinda saw that it's possible to write pure Python code that's as fast as Node.js code, so maybe Python isn't that slow. So, the asyncpg architecture: it's basically implemented in layers;
27:20
the meat of it is CoreProtocol, which is written in Cython. It uses callbacks to process the protocol. Then we have a Protocol class that wraps CoreProtocol and inserts future objects into it so that you can use async/await. And the rest of asyncpg is just a pure Python
27:41
implementation of the high-level API. So how would you parse the Postgres protocol? The naive approach would be to just use Python bytes and memory views, but unfortunately, doing so causes a lot of Python objects to be created, and you will actually see how long
28:01
you spend on memory allocation. So the solution is to use Cython and go down to the C types, and not even touch Python bytes and memory views. So this is a preview of the read buffer. Its API is a bit bigger than this,
28:21
but you can see the first method, feed_data, is the most important. That's what the protocol's data_received actually calls. data_received has just two lines in it: the first one pushes the data into the read buffer, and the second one calls a function that reads from the buffer. And this buffer is tailored for the Postgres protocol.
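The two-line data_received just described might look like this. This is a sketch, not asyncpg's code; the buffer and the parsing stub are simplified stand-ins:

```python
import asyncio

class SimpleBuffer:
    # Stand-in for the Cython read buffer; it just collects bytes.
    def __init__(self):
        self.data = bytearray()

    def feed_data(self, data: bytes) -> None:
        self.data += data

class PGProtocol(asyncio.Protocol):
    def __init__(self):
        self.buffer = SimpleBuffer()
        self.messages_parsed = 0

    def data_received(self, data: bytes) -> None:
        # The whole method is just two steps, as described:
        self.buffer.feed_data(data)    # 1. push bytes into the buffer
        self._read_server_messages()   # 2. parse whatever is available

    def _read_server_messages(self) -> None:
        # Real parsing lives in Cython callbacks; this stub only counts.
        self.messages_parsed += 1

proto = PGProtocol()
proto.data_received(b"\x00\x01")
print(proto.messages_parsed)  # 1
```

Keeping data_received this thin matters: it is the hottest callback in the driver, so all real work happens behind it in compiled code.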
28:40
It has low-level read_int32 and read_int16 calls, and the second most important call here is try_read_bytes. try_read_bytes either returns you a low-level C data type or it returns a null pointer. And if it returns a null pointer,
29:01
then you call read, which returns a Python object, which is much slower. But most of the time, 99% of the time, try_read_bytes succeeds, and we avoid creating any Python objects. So again, the high-level logic of asyncpg is built in pure Python.
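The fast-path-or-null contract of try_read_bytes can be modeled in pure Python. The real buffer is Cython returning raw C pointers; here, bytes stand in for the pointer and `None` stands in for NULL:

```python
class ReadBuffer:
    """Pure-Python illustration of the read buffer API described above."""

    def __init__(self):
        self._buf = bytearray()

    def feed_data(self, data: bytes) -> None:
        # Called from Protocol.data_received().
        self._buf += data

    def try_read_bytes(self, n: int):
        # Fast path: return the bytes if already buffered, otherwise
        # None (the Cython version returns a NULL pointer instead).
        if len(self._buf) >= n:
            out = bytes(self._buf[:n])
            del self._buf[:n]
            return out
        return None

    def read_int32(self):
        data = self.try_read_bytes(4)
        return None if data is None else int.from_bytes(data, "big")

buf = ReadBuffer()
buf.feed_data(b"\x00\x00\x00\x2a")
print(buf.read_int32())       # 42
print(buf.try_read_bytes(1))  # None: nothing buffered, caller must wait
```

In the Cython version the fast path never touches the Python object machinery at all, which is where the 99%-case speedup comes from.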
29:21
That is how you can actually use it. You can see it's a pretty high-level API: we prepare a statement, we enter a transaction with async with, and we iterate over a scrollable cursor. Part five: let's recap. So, don't be afraid of protocols. Use them to implement really, really high-performance
29:42
drivers, and use Cython for low-level code. It's much easier to code in Cython than in C: you can quickly refactor your code completely, change everything, and it will just work. Async/await should always be used in your application code; don't think about protocols and transports there.
30:00
Use only high-level code. And again, once you have fast database drivers, memcache drivers, stuff like that, and you use uvloop, you will see your application being much, much faster. loop.create_future() was actually introduced in Python 3.5.2; that's a new feature. With it, if you use loop.create_future(),
30:22
uvloop can inject a fast future implementation into your code, because uvloop implements its own version of the future, and it's about 30% faster than asyncio's. Always use binary protocols. Never, never even try to parse text protocols; it just doesn't make any sense.
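The loop.create_future() advice can be sketched as follows. The function name is hypothetical; the point is only that the future comes from the loop, so an alternative loop like uvloop can substitute its own faster Future class:

```python
import asyncio

async def wait_for_data(loop: asyncio.AbstractEventLoop) -> bytes:
    # Prefer loop.create_future() over asyncio.Future(): the running
    # event loop (e.g. uvloop) can then supply its own Future type.
    fut = loop.create_future()
    # A protocol callback (e.g. data_received) would normally resolve
    # it; here we simulate that with call_soon.
    loop.call_soon(fut.set_result, b"payload")
    return await fut

loop = asyncio.new_event_loop()
result = loop.run_until_complete(wait_for_data(loop))
loop.close()
print(result)  # b'payload'
```

Under uvloop the same code transparently gets the faster future; no call sites change.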
30:41
If you can do binary, go binary. Always profile your code. It's actually funny, because when asyncpg first started to work, I benchmarked it against aiopg, and it was two times slower. And I didn't understand why, because it should have been faster.
31:00
There was no way it could be slower. So I spent about 30 hours without sleep optimizing asyncpg and made it two times faster. No, four times faster. So the important lesson from this is that if that first run had shown asyncpg being like 30% faster than aiopg, maybe I wouldn't have spent so much time
31:21
trying to optimize it. So always profile, always analyze, and then try to push it forward. And by the way, Cython code can be profiled with Valgrind, and you can visualize the results in KCachegrind. It's a very useful tool; check it out. And Cython has a useful flag.
31:41
It's called -a. It generates an HTML representation of your source file, with each line highlighted: it's either blank or a shade of yellow, and the most yellow lines use the most Python C API, which is slow. So basically you have a quick way of analyzing the speed of your Cython code.
32:03
So definitely check out that option. Always try to do zero-copy. Try to avoid working with bytes, memory views, all that kind of stuff. Go low-level with Cython, and never copy Python objects. And one of the last pieces of advice, actually,
32:22
is to implement an efficient buffer for writing data. For instance, in asyncpg, for writing messages we have a write buffer that pre-allocates a portion of memory, and then we compose messages with a high-level API without reallocating that memory at all,
32:44
and when the message is ready, we just send it. So we have a high-level API for creating the message, but we don't allocate any memory while doing so. And when you have this control, you should definitely set the TCP_NODELAY flag. We will probably set it by default
33:02
in asyncio in Python 3.6; right now it's not set. You should do it, because it will speed up the transport.write method. Basically, with this flag set on the socket, the socket doesn't wait until it receives a TCP ACK; it just sends the data as soon as you write it.
33:23
But if you don't have control over how frequently you are calling transport.write, you can use TCP_CORK. What you do is cork the channel, do multiple writes to it, then uncork it, and it sends all of your data in as few TCP packets as possible.
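Both socket options can be set in a few lines. This is a generic sketch (the helper names are mine, not asyncpg's); note that TCP_CORK is Linux-only, and under asyncio you would reach the raw socket via `transport.get_extra_info("socket")`:

```python
import socket

def tune_socket(sock: socket.socket) -> None:
    # TCP_NODELAY disables Nagle's algorithm: segments are sent
    # immediately instead of waiting for ACKs of earlier data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

def corked_send(sock: socket.socket, chunks) -> None:
    # TCP_CORK (Linux-only): hold small writes back, then flush them
    # together in as few TCP packets as possible.
    if not hasattr(socket, "TCP_CORK"):
        raise OSError("TCP_CORK is only available on Linux")
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    try:
        for chunk in chunks:
            sock.sendall(chunk)
    finally:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tune_socket(sock)
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True
sock.close()
```

The two options solve opposite problems: NODELAY for latency when each write is a complete message, CORK for throughput when a message is assembled from many small writes.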
33:43
And the last slide is timeouts. Always implement timeouts as part of your API. Don't ask your users to use asyncio.wait_for, because wait_for is slow: it wraps the coroutine in a task, and that comes with a huge penalty.
34:01
Your code will become 30% slower if you use wait_for. So design timeouts as part of your API. At the lower level, implement timeouts with the loop.call_later method, and it will just work. That's it. Thank you.
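The loop.call_later timeout pattern just described might be sketched like this. The function is hypothetical; the point is that a bare future plus a scheduled callback avoids the wrapper task that asyncio.wait_for creates:

```python
import asyncio

async def fetch_with_timeout(loop: asyncio.AbstractEventLoop, timeout: float):
    # Timeout built directly on the event loop: no wrapper task, just
    # a future and a scheduled callback that fails it.
    fut = loop.create_future()

    def on_timeout() -> None:
        if not fut.done():
            fut.set_exception(asyncio.TimeoutError())

    timer = loop.call_later(timeout, on_timeout)
    # In a real driver a protocol callback would resolve ``fut`` with
    # the server's response. Here nothing does, so the timer fires.
    try:
        return await fut
    finally:
        timer.cancel()  # don't let the timer fire after a real result

loop = asyncio.new_event_loop()
timed_out = False
try:
    loop.run_until_complete(fetch_with_timeout(loop, 0.01))
except asyncio.TimeoutError:
    timed_out = True
finally:
    loop.close()
print(timed_out)  # True
```

Because the timeout lives inside the driver's own API, callers never pay the per-call task-creation cost of wait_for.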
34:29
Yeah, so I think we have time for maybe one or two questions. Hi, thank you for the presentation.
34:41
I want to ask you about using asyncio and uvloop, your event loop, not for high performance, but for high concurrency. Do you have any... would you use it for high concurrency?
35:00
Of course. (I have a scenario with hundreds of thousands of concurrent connections.) Yes, uvloop is even better for that, because it uses less memory than asyncio. uvloop is much better for a highly concurrent application
35:22
that handles hundreds of thousands of connections, simply because it uses less memory. And again, it's faster. We tested uvloop with 100,000 connections, and it handles them; it's pretty okay. Thank you.
35:41
Unfortunately, this is all the time we have. Thank you.