
How to Build a Python-to-C++ Compiler out of Spare Parts - and Why


Formal Metadata

Title
How to Build a Python-to-C++ Compiler out of Spare Parts - and Why
Number of Parts
131
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract
A frequent topic about Python is performance: its interpreted nature inhibits optimisations, and the famous GIL limits parallelism (for now!). Existing Python Compilers - Cython, Numba, Codon - focus mainly on compiling small, critical bits of code to achieve linear execution speedups. As for parallelism: parallel for-loops powered by OpenMP. To parallelize highly concurrent programs with concurrent I/O and concurrent tasks, we need more. A key difference is it requires compiling everything: as soon as the Python interpreter comes into play, the GIL will make parallelism collapse. We introduce Typon, a Python-to-C++ compiler with powerful concurrency primitives powered by a crazy homemade task scheduler. It can take untyped, idiomatic Python code and output C++ code fully independent of the Python interpreter. It also provides seamless to-and-from Python interoperability, for those cases where you really just need to import numpy. In this talk we'll recount our journey so far: why we think it's important, how we're making something new out of existing bits, what we've achieved. Along the way we might delve into fun details like type inference, concurrency primitives, and C++ pretending-to-be-Python. You'll come out of this talk with some cool insights into compiler design, concurrency, and the design of Python. Knowledge of C++ not required. Knowledge of Python language inner workings helpful.
Transcript: English (auto-generated)
Thank you for being here, and thanks to EuroPython for hosting this talk. I'm Xavier Thompson, and I'll present this talk: how to build a Python-to-C++ compiler out of spare parts. I've put the slides over there if you want to follow them live; otherwise I'll also send them to EuroPython afterwards. Let's get started.

I work at a small company in Lille, of maybe 40 people, most of them based in Lille in the north of France and some of them working remotely from across the world.
It's a company with the peculiarity of doing only free software. For instance, some of the software developed at Nexedi includes SlapOS, a cloud orchestration system; NEO, a distributed database; ERP5, an enterprise resource planning software for enterprise management; and re6st, a resilient mesh network using IPv4 over IPv6 (sorry, IPv6 over IPv4). The reason I mention these is that they're all written in Python. ERP5 in particular is over 20 years old, which is to say that Python is part of Nexedi's DNA, and that is why I'm going to talk to you about performance.

Whenever there's a talk or a discussion about performance, two facts come up: Python is interpreted, and the interpreter has a global interpreter lock. At Nexedi, performance matters to us because of the software we write. A good example is ERP5, the enterprise resource planning software: we sell it, or rather we sell services around it, to clients (it's open source, of course). These clients have customers, and we have one client with a million customers, which means ERP5 is used every month to generate over a million invoices. And this is only going to scale up: the company aims to grow and to gain more customers, so there's a fundamental need to scale up and to always perform better, to optimize everything, and this is central to any development around ERP5.
When performance is a topic in the industry, you see a few trends. Some companies choose to migrate away from Python, at least partially; Dropbox migrated some parts to Go a few years ago. Some companies fork CPython and make their in-house interpreter faster, optimized for their use case. And some companies choose not to use Python from the start; this also happens. More broadly, performance is a topic in the Python world itself. The first proof of this is EuroPython: there were two talks about performance yesterday, there'll be one later today in this room, and there'll be one tomorrow as well. There are also proposals to enhance Python with respect to performance. In this list, the third one is particularly interesting: it's about potentially removing the global interpreter lock, starting with making it optional in CPython. So we might end up in a world where Python no longer has a global interpreter lock.
There are two broad strategies for making Python go faster. The first one is faster interpreters. There's the official CPython interpreter itself, of course, but beyond that there are projects, both forks of CPython in the industry and open-source projects, to make drop-in replacements for the interpreter: interpreters that are fully compatible with Python. On the other hand, there are optimizing compilers, and their usage is a bit different.

When you compare interpreters and compilers: as I said, interpreters are, or aim to be, fully compatible with Python. Optimizing compilers tend to focus on optimizing critical sections. They're what you use to make the loop you iterate over many times go faster by compiling it to native machine code, but the rest of your program will still be run by the Python interpreter.

If we focus on what these offer in terms of concurrency and parallelism: with interpreters, you have the same APIs and tools as in regular Python. Multiprocessing, which means no shared memory; it works, but it has its drawbacks. Threads, and then you run into the global interpreter lock. And async/await, but this is single-threaded; it's more for IO-bound tasks. When you look at compilers, they often integrate with OpenMP. This is great for data parallelism: if you do scientific computing and you need to do the same calculation for every row of a matrix, you can parallelize it with this. Also, in Cython specifically (I don't know the particulars for others) you can release the global interpreter lock and have code that runs without the lock, but in that case you can no longer use Python objects or interact with the interpreter, so in the end you're just writing C code.
And this is why, at Nexedi, we have this project. We call it Typon. It's a Python-to-C++ compiler with a focus on concurrency and parallelism. As opposed to the faster interpreters, it doesn't aim to support the whole language, every feature; the most dynamic ones are not supported by Typon. But it does aim to allow compiling whole programs, with Python semantics: with classes, with automatic memory management, with everything you might want if you write Python code. Parallelism and concurrency are built into Typon from the ground up; the project was designed with this in mind, and it's why we do this project at all. Another thing is that Typon is a subset of Python: any valid Typon program is also a valid Python program, which is a useful property.
And we still have Python interoperability. So we don't need Python to have classes or any of that, but we can call into it, and this is useful when you just need to import numpy or something like that.

This whole adventure started in 2022. That year I wrote a scheduler in C++, with the idea in mind that it would be used for a project like Typon, but at the time it was only C++. It's what is called an M:N scheduler: the idea is to run many concurrent tasks over a fixed number N of worker threads, essentially in the same way that the operating system runs many processes over a fixed number of CPU cores, but without relying on the operating system and without the overhead that comes with the operating system switching processes and threads.
We offer two concurrency primitives (actually there are a few more, but I'll focus on these two): fork and sync. The code I'm about to show you is not C++ code, because I thought I wouldn't be showing you C++ here: it's Typon code that compiles to C++ code using this scheduler. That way I avoid showing you lots of C++; at the time I wrote the scheduler, there was only C++.

Let's dive straight into fork. Fork is a primitive that offers an opportunity for parallel execution. It's just an opportunity: it can still be a sequential execution, still just a normal function call, as if the fork were not there. But it offers an opportunity, if there are idle worker threads in the scheduler, for another worker thread to steal the continuation. That is to say, while the fork is running its function, what comes after, what would run when the child function returns, can be stolen and run in parallel with the child call.

This way of parallelizing has many advantages, especially in terms of performance. We seek to optimize the case where the fork runs sequentially, because you might have many, many fork points in your program but maybe only two or eight workers, far fewer. So most of the time a fork will not result in actual parallel execution: you don't have the cores to keep running a thousand, ten thousand, a million tasks in parallel. By having something that is optimal and efficient in the sequential case, we get parallelism with very little overhead.

This is powered by a data structure where the worker doing the fork can push and pop a reference to the continuation, and other workers that are idle can steal continuations from the top.
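To make that push/pop/steal shape concrete, here is a minimal Python sketch of such a continuation deque. It is only an illustration: the real scheduler is written in C++ and its internals are not shown on this page, so the class name, the locking, and the method names are assumptions.

```python
import threading
from collections import deque

class ContinuationDeque:
    """Illustrative work-stealing deque; the real Typon scheduler is C++ and
    certainly more refined than a single lock around a deque."""

    def __init__(self) -> None:
        self._items = deque()
        self._lock = threading.Lock()

    def push(self, continuation) -> None:
        # Owner side: called by the forking worker just before the child call.
        with self._lock:
            self._items.append(continuation)

    def pop(self):
        # Owner side: called when the child returns; None means it was stolen.
        with self._lock:
            return self._items.pop() if self._items else None

    def steal(self):
        # Thief side: idle workers take the oldest continuation, the one
        # highest in the owner's call stack.
        with self._lock:
            return self._items.popleft() if self._items else None
```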
This is what the state of the scheduler might look like: workers pushing and popping continuations on their own data structure as they fork, and idle workers seeking to steal continuations from the other workers' data structures.

If we look at the fork call, what happens is that just before the child function is called, a reference to the continuation is pushed into the data structure, and once the child returns, an attempt is made to pop it. This attempt can succeed, in which case it's a sequential execution and we just continue, or it can fail, which means the continuation was stolen by another worker; in that case this worker becomes idle and seeks to steal work elsewhere. It's of course possible to fork recursively: a function that forks calls a function that forks, which calls another function that forks, and it might look like this. What I want to point out here is that the continuations pushed first, the ones higher in the stack of continuations, are also the ones higher in the call stack. So if a continuation is stolen, if a pop fails, it means all the ones pushed before it were stolen as well.
Here's an example of such a recursive forking function: a naive, very naive, but parallel Fibonacci function. It's exponential, and if you just look at the plain version it's the worst way to write a Fibonacci function. But the idea is that, thanks to the fork call in the middle, it will run in parallel and spread over as many workers as are available. Again, this is a case where there will be many more forks than actual workers, so the thing to optimize is the sequential case.
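The slide with the actual Typon code is not reproduced on this page, so here is a plain-Python sketch of what such a function looks like, with sequential stand-ins for fork and sync. The stand-ins are assumptions, not Typon's real API; they only illustrate the point that a fork is merely an opportunity, so running the child inline is always a valid outcome.

```python
def fork(fn, *args):
    """Stand-in: offer the continuation for stealing; here, just run the child inline."""
    return fn(*args)

def sync():
    """Stand-in: wait for all forked children; a no-op in the purely sequential case."""

def fib(n: int) -> int:
    # Deliberately naive, exponential Fibonacci: the point is that it creates far
    # more fork opportunities than there are workers.
    if n < 2:
        return n
    a = fork(fib, n - 1)  # the continuation (the fib(n - 2) call below) may be stolen
    b = fib(n - 2)
    sync()                # join: make sure the forked strand has finished
    return a + b

print(fib(20))  # 6765
```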
Moving straight on to sync: you might have noticed the sync call in the previous example. This is the reverse primitive. Fork lets us split execution into parallel strands; sync is about joining them back together, waiting until all of them have finished before continuing. In the sequential case, sync is a no-op (again, optimizing the sequential case). In the case where a continuation was stolen, it depends on who finishes last, the child call or one of the continuations: whichever finishes last is the one that resumes and executes the code that comes after the sync.

The sync primitive also gives us an opportunity for structured concurrency. The idea of structured concurrency is that in a function that forks, there is an implicit sync at scope exit: when the function returns, however it returns, on every code path, even if it's an exception. This is very useful because it makes it easier to reason about concurrency in the program: it's localized to the function, and once the function returns there can be no concurrency leaks.
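In Typon this sync at scope exit is implicit. As a rough illustration of the idea in plain Python, a context manager can guarantee that a join runs on every exit path, including exceptions; the Scope class and its methods below are made up for the example and are not Typon's mechanism.

```python
from contextlib import contextmanager

class Scope:
    """Made-up stand-in for a scheduler scope that tracks forked tasks."""
    def __init__(self) -> None:
        self.pending = []

    def fork(self, fn, *args) -> None:
        self.pending.append((fn, args))  # a real scheduler could hand this to another worker

    def sync(self) -> None:
        while self.pending:              # here we simply drain the tasks sequentially
            fn, args = self.pending.pop()
            fn(*args)

@contextmanager
def forking():
    scope = Scope()
    try:
        yield scope
    finally:
        scope.sync()  # runs however the block exits, even on an exception

# The forked task still runs even though the body raises before reaching the end.
try:
    with forking() as scope:
        scope.fork(print, "child task")
        raise RuntimeError("early exit")
except RuntimeError:
    pass
```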
Another important thing when writing a scheduler is, of course, concurrent IO. Let's take an example. This is a simple, partial fragment of a web server written in Python; for now, think of it as pure Python code. It handles one connection at a time; it's a very basic, naive server. If you change it in just one small way, adding this fork call, and compile it with Typon, suddenly it's able to handle tens of thousands of connections at a time, solving the C10K problem.
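The slide code is not included on this page; the following is a rough reconstruction of the kind of fragment being described, in plain Python, with the request handling details assumed. The talk's change is the single commented line in the accept loop.

```python
import socket

def handle(conn: socket.socket) -> None:
    with conn:
        conn.recv(1024)  # blocking read of the request
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 6\r\n\r\nhello\n")

def serve(port: int = 8080) -> None:
    with socket.socket() as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind(("", port))
        server.listen()
        while True:
            conn, _addr = server.accept()
            handle(conn)  # the talk's one-line change: fork this call, so that under
                          # Typon the loop keeps accepting while the scheduler handles
                          # each connection concurrently and the IO no longer blocks
```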
How does it do this? Beyond forking the handling of each connection, when compiling with Typon the IO operations, instead of blocking the current thread, become asynchronous, allowing the worker thread to do other work until the IO completes. This uses io_uring under the hood. Looking at the scheduler state, it might be something like this: a suspended task, the worker that was working on it now idle and seeking to steal another continuation, and whenever the IO completes, the task can be resumed where it left off.

Then, in 2023, came the time to write the actual compiler. This work was mostly done by Tom Niget. At the time he was an intern at Nexedi; he did such an outstanding job that he got himself hired.
Before we dive into the specifics, a quick note about compiler design. What a compiler does is take the source code, parse it, and produce a tree-like structure called an abstract syntax tree, which represents the code. The compiler can then be thought of as successive phases of transformations over that abstract syntax tree, until the last phase, which produces the output code. There may be other data structures, other representations, but this is a basic, simplified view of what a compiler does.

The abstract syntax tree comes to us for free in Python (we wrote Typon, the compiler, in Python itself), because you can just import ast and suddenly you have a parser. The ast module offers a parse function and a dump function that lets us look at the result, so here are a few quick examples. You can see that arguments are a complex thing in Python, because there are positional-only arguments, keyword-only arguments, normal arguments, defaults, and so on, and you can also see how a return statement is represented.
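As a small concrete example of what the talk is pointing at (the exact snippets on the slides are not reproduced here), parsing a tiny function already shows how rich the arguments node is and how a return statement appears:

```python
import ast

tree = ast.parse("def f(x, *, y=1):\n    return x + y")
print(ast.dump(tree, indent=2))
# Among other things, the dump shows an `arguments` node with posonlyargs, args,
# kwonlyargs, kw_defaults and defaults, and a body consisting of a single Return
# node wrapping a BinOp.
```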
Another thing we use is the visitor pattern: classes that represent the phases of the compiler operating over the AST. A visitor is just a class that defines a visit method for each type of node. You can use the NodeVisitor class provided by ast, or you can roll your own, so here's a quick look at how that might be done. We walk back up the class hierarchy of the node being visited to see if we can find a visit_<something> method matching its class or a parent of it, and if so we call it; otherwise we call the generic method. The generic visit might just dispatch over all children, or raise an exception, or whatever you need. And, as highlighted here, you can also implement dispatching over lists as a convenience.
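Here is a rough reconstruction of that dispatch in plain Python; it is not Typon's actual implementation, just the mechanism as described: walking the node's class hierarchy, falling back to a generic visit, and accepting lists as a convenience.

```python
import ast

class Visitor:
    def visit(self, node):
        if isinstance(node, list):           # convenience: visit lists element-wise
            return [self.visit(item) for item in node]
        for cls in type(node).__mro__:       # walk the class hierarchy of the node
            method = getattr(self, "visit_" + cls.__name__, None)
            if method is not None:
                return method(node)          # found visit_<Class> for it or a parent
        return self.generic_visit(node)

    def generic_visit(self, node):
        # Default behaviour: just dispatch over all child nodes.
        return [self.visit(child) for child in ast.iter_child_nodes(node)]

class NamePrinter(Visitor):
    def visit_Name(self, node):
        print("name:", node.id)

NamePrinter().visit(ast.parse("y = x + 1"))  # prints "name: y" then "name: x"
```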
Moving on to type inference. I'll be relatively quick on this, but it's a central part of the compiler: it's what allows an untyped Python program to be understood by the compiler. I'll just show you some highlights, especially generics, that is, support for generic code.

Here's a generic function in Python. We don't really think of it as a generic function in Python, but it's an add function that takes untyped arguments: it just works with whatever is addable. In Typon you can add two integers like this; it works, and the compiler understands that the result is an integer. It can add two strings. If you add an integer and a string, the compiler will tell you it's a compilation error, a type error. And you can define a weird class where adding anything to it returns a string, and the compiler will deduce that the result is a string.

Another aspect of generics is generic classes; the classic case is the list. In Typon, and this is one of the limitations, we don't support heterogeneous lists that may contain integers and strings and whatever else, so we have to have a single type for the contained objects. But when you create an empty list, you don't know what it might contain until later in the code: for example, you append an integer, and suddenly the compiler can deduce that the contained type is int.
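The slide examples are not reproduced here, so the following plain-Python snippet mirrors the cases as described, with comments stating what Typon's type inference reportedly does with each; the exact error wording and syntax are assumptions.

```python
def add(a, b):
    return a + b          # generic: works with whatever is addable

print(add(1, 2))          # inferred result type: int
print(add("a", "b"))      # inferred result type: str
# add(1, "b")             # Typon rejects this at compile time with a type error;
                          # plain Python would only fail at runtime

class Weird:
    def __add__(self, other) -> str:
        return "weird"

print(add(Weird(), 42))   # inferred result type: str, from __add__'s return type

xs = []                   # element type unknown at this point ...
xs.append(3)              # ... and deduced to be int from this later append
```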
Code generation. This is the only part where I'll show some C++; if you don't know any C++, don't worry, it's just to give you a glimpse. It's about how C++ can be made to behave like Python, to have Python semantics, to pretend to be Python in a way.

Imagine we have this basic function in Python. If you know a bit of C++, you might implement it directly as a C++ function. But then you have to remember that in Python a function is a first-class object: it can be referenced by other variables like any other object. So we actually compile a function to an instance of a C++ class that has a call operator, which is what a function is in Python anyway, so this makes sense. If it has an untyped argument, we can implement the generics in C++ using the auto keyword, and you can return attribute lookups the same way.

If we have a class, it's much the same as a function: it's an object of a class with a call operator, but the call operator returns an instance. So in the class case there are actually two kinds of objects, the type of the class and the type of its instances, and we get two objects in C++. One quick thing: in Python, objects are reference counted to provide automatic memory management, so for these instances we don't return raw C++ objects, we return reference-counted objects; this is how we implement it. If the class had an int field y, it would look like this; if it had a y method, it would look like this, the same thing as the f function from earlier but in the body of the class. And then we have to handle the fact that if you look up an attribute and it's a method, it must be a bound method. How do we do this? We have a dot helper that does it for us.

That's all the C++ I'll show you, but it goes on and on: we have inheritance done that way, multiple inheritance with method resolution order, generic classes, and so on.
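The generated C++ shown on the slides is not reproduced on this page. As a reference point, here is roughly the kind of Python source being translated, annotated with the mapping the talk describes; the comments summarise the speaker's description rather than literal compiler output.

```python
def f(x):
    # Compiled to an instance of a C++ class with a call operator, because a
    # Python function is a first-class object; the untyped parameter becomes a
    # generic one (C++ `auto`).
    return x + 1

class A:
    # A class becomes two C++ objects: one standing for the type itself (with a
    # call operator that builds instances) and one for its instances. Instances
    # are handed out as reference-counted objects, not raw C++ values, to keep
    # Python's automatic memory management.
    y: int = 0

    def m(self):
        # `a.m` must evaluate to a bound method, which the generated code
        # obtains through a small "dot" helper.
        return self.y

a = A()
print(f(1), a.m())
```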
Moving on to Python interoperability, for those cases where you just need to import numpy. It works out of the box in Typon, so what's going on under the hood? In the result of compiling this code with Typon, numpy is a C++-side reference to a Python object. The x list is still a pure C++ list. The y = numpy.square(x) line converts x to a Python object, calls into Python to handle the numpy call, and gets a Python object in return. Line 13 uses the type annotation to tell Typon to convert the result back into a pure C++ list, and the print call is pure C++.
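The slide code is not included on this page; this is a reconstruction of the example as described, with the variable names and the exact conversion line assumed. It is valid plain Python, and the comments state what Typon reportedly does with each line.

```python
import numpy              # under Typon: a C++-side reference to the Python module object

x = [1, 2, 3]             # stays a pure C++ list in the compiled program
y = numpy.square(x)       # x is converted to a Python object and the call runs in Python
z: list[int] = y          # the type annotation tells Typon to convert the Python result
                          # back into a pure C++ list (in plain Python it is just a hint)
print(z)                  # pure C++ print, no interpreter involved
```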
So that is calling into Python; there's also the reverse case, where Python calls back into you. This also works out of the box, but in some cases you might need to add an export annotation: if you have an untyped function, you need to tell Python what types to pass to it, or, more precisely, Typon needs to know what to expect, so that when it is handed a Python object it can convert it back to an int, say, or raise a type error.
Finally, moving towards a standard library. The idea here is to have Typon-specific C++ bindings: C++ implementations plus Python stubs that tell the Typon compiler how to call the C++ code. This is necessary because we don't want our programs to go through Python, in order to avoid the global interpreter lock.
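The talk doesn't show what these stubs look like, so the snippet below is only a hypothetical sketch, in ordinary Python stub style, of how part of the socket API might be declared for the compiler; Typon's actual stub format may differ.

```python
# socket.pyi-style sketch (hypothetical): typed signatures the compiler can rely on,
# with the behaviour living in the C++ implementation.
class socket:
    def __init__(self, family: int = ..., type: int = ..., proto: int = ...) -> None: ...
    def bind(self, address: tuple[str, int]) -> None: ...
    def listen(self, backlog: int = ...) -> None: ...
    def accept(self) -> tuple["socket", tuple[str, int]]: ...
    def recv(self, bufsize: int) -> bytes: ...
    def sendall(self, data: bytes) -> None: ...
    def close(self) -> None: ...
```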
We have implemented most built-ins this way, as well as the socket API and parts of the os API, but it's a work in progress; it's the main work of the project, and there'll be lots more to do. The project is now available on PyPI, just a first release, so look it up if you want. And thank you for listening.
So, thank you for these wonderful insights, and we are now entering the Q&A session. I already see the first person at a microphone; if you want to ask something you can also submit your questions, but let's just start with the microphone here on the left side.

Yes, I have two questions. My first question is: will it also support callbacks, like decorators?

Decorators?

Well, a decorator is a form of callback.

Okay. So, decorators: not right now. We hope to do it, but it involves C++ metaprogramming. Some specific decorators, like the ones for static methods and such, are supported straight out of the box. And if you mean callbacks in the sense of passing code to Python that is actually C++ code, that is supported, just not through decorators.

I have a second question: have you also considered using Rust instead of C++?

Yes, at the very beginning I thought I would write the scheduler in Rust; I hoped to do it that way. The reason I didn't is that Rust is much more opinionated as a language than C++. C++ is so flexible that, even though it may be very complicated to write C++ code directly, it's very easy to find a way to do whatever you want when you compile towards C++. It would have been much harder with Rust.

But wouldn't it be harder to maintain, regarding security bugs and that kind of thing?

Not so much that; what would have been harder is to build a Rust equivalent of the Python concepts, to have a class the way we do it in C++. So yes, that would have been much harder.
Let's go to the right side, there's also a question there.

I was wondering why you don't use the original representation of the list object, the one from C that you could use from C++ as well, as we saw in the talk before. Then you would have the possibility of holding different kinds of values in a list, and not so many limitations. Did you do it to save memory?

No, not to save memory. We do it because the Python list object is reference counted; it's a Python object, so it interacts with the Python interpreter. One specific sticking point is the reference count, which is not thread safe, so you can only reference such a list from a single thread safely, and there would be work to do around that if needed. Another sticking point is that the list works through Python, so for performance reasons it's more efficient to have a list written in pure C++, where doing an append doesn't involve looking up a dictionary or a slot in the class of the Python representation of the type, and so on.

Okay, we've got one minute left, so let's be quick.

For the scheduler, I was wondering: it all looked a bit like asyncio without asyncio. So why did you not build upon asyncio for the concurrency? What's different?
If there's an asyncio expert in the room I'm happy to hear your insight on this, but I think asyncio is not meant to be multi-core in execution. For IO-bound tasks it works great, but what we want here is for an IO operation to be started by one worker and continued by another, if another worker becomes idle when the IO completes.

Perhaps there's also time for a coffee, since we are reaching the coffee break, so you can just continue the discussion with me afterwards; I'm available. So, once again, a warm applause. Thank you for your talk, and you won't leave without a cookie.