FAT Python: a new static optimizer for Python 3.6

Formal Metadata

Title
FAT Python: a new static optimizer for Python 3.6
Title of Series
Part Number
64
Number of Parts
169
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Victor Stinner - FAT Python: a new static optimizer for Python 3.6. The Python language is hard to optimize. Let's see how guards checked at runtime allow new optimizations to be implemented without breaking Python semantics. ----- (Almost) everything in Python is mutable, which makes Python a very difficult language to optimize. Most optimizations rely on assumptions, for example that builtin functions are not replaced. Optimizing Python requires a trigger to disable an optimization when an assumption is no longer true. FAT Python does exactly that with guards checked at runtime. For example, an optimization relying on the builtin len function is disabled when the function is replaced. Guards make it possible to implement various optimizations. Examples: loop unrolling (duplicate the loop body), constant folding (propagate constants), copying builtins to constants, removing unused local variables, etc. FAT Python implements guards and an optimizer rewriting the Abstract Syntax Tree (AST). The optimizer is implemented in Python, so it's easy to enhance it and implement new optimizations. FAT Python uses a static optimizer; it is less powerful than a JIT compiler like PyPy with tracing, but it was written to be integrated into CPython. I wrote 3 PEPs (509, 510, 511) targeting Python 3.6. Some changes to support FAT Python have already been merged into Python 3.6. We will also see other pending patches to optimize the CPython core, and the bytecode project, which allows modifying bytecode; it also includes a peephole optimizer written in pure Python.
Transcript: English (auto-generated)
Welcome to the first talk today, on FAT Python, by Victor Stinner. And with that, welcome. Hi. My name is Victor Stinner. I'm currently working for Red Hat on the OpenStack project. One part of my work is to port the giant beast called OpenStack to Python 3.
The good news is that I'm almost done, because I have ported more than 90 percent of the projects, so Python 3 is coming.
I have also been a Python core developer for something like five years or more, and today I'm here to present a new project called FAT Python. First I will try to explain why Python is slow and why this specific language is more difficult to optimize than some others.
If you would like to say that Python is slow, you must compare it to something else. A common reference is the C language, because Python sometimes runs at almost the same speed, but in some corner cases it's up to 20 times slower.
C is compiled to machine code, so the code is executed directly by the CPU, whereas Python is interpreted: you get bytecode, and the bytecode is executed by a virtual machine,
at least in the case of CPython. You can also compare Python to JavaScript, because JavaScript is also a dynamic language like Python, but JavaScript has very, very efficient JIT compilers. You can find them in many browsers.
Compared to JavaScript, Python is still slower. But we already have much faster implementations of Python. The most famous one, and the most advanced and most stable, is PyPy, which has been around for something like 10 years.
It's fully compatible with CPython. It's really fast, like five or ten times faster than CPython, or sometimes even more, depending on the specific kind of application and on your workload.
There is a newer project called Pyston, which is sponsored by Dropbox. It's a fork of CPython 2.7 based on LLVM. The idea is to keep compatibility with C extensions but, in some cases, to compile the Python code to machine code.
Another project is called Pyjion. It's made by Microsoft. It's a little bit younger than Pyston: I think Pyston is two years old and Pyjion is one year old. Pyjion is based on Microsoft's CoreCLR.
Another kind of project is Numba. Numba is not a full implementation of Python. It's a JIT compiler: you have to annotate your function with something like @jit to compile it, and it's specialized for numeric code. For example, it's very efficient with NumPy, but it's not a generic implementation of Python. You cannot make Django, for example, much faster with Numba.
Another common example is Cython. Cython is not really an implementation of Python. It's more a compiler that takes Python source and converts the code to something like a C extension, but you can also annotate the types to enable even more optimizations. If you start to annotate the types, it's no longer Python, but it looks very close to Python.
So the first question is: why do we need a new optimizer?
The fact is that I'm working on the OpenStack project, and in OpenStack we are still using CPython because it's still the reference implementation. Some people try to use PyPy, but there is not enough support in the OpenStack community to fix some simple issues.
About Pyston from Dropbox, the issue is that they started from Python 2, and we are all slowly moving to Python 3, and they don't plan to support Python 3 right now. Pyjion is still a little bit young,
and I'm not sure that CoreCLR from Microsoft is really optimized for all platforms like Linux or Mac OS X. So I'm trying to do my best to make CPython a little bit faster.
Another fact is that PyPy is not always faster than CPython. A known bottleneck of PyPy is when you use a C extension, because to support C extensions PyPy has to emulate the object memory: you need two views of the data, the optimized view of PyPy
but also the way data is represented for the C extension. They also have to emulate reference counting and many other complex tricks, and because of that, running C extensions on PyPy is slower.
PyPy was written from scratch, so they don't have the huge C API. Another fact is that CPython remains the reference implementation for new features. For example, if you compare Python 2.7 and the future Python 3.6,
there is a wide range of new features. There are maybe 10 or 20 new modules, but also changes in the syntax, like the async and await keywords, and the new f-strings in Python 3.6. So Python is moving, and it's moving first in CPython.
Sadly, many libraries and applications rely on CPython implementation details. I have to put "details" in quotes, because they are not really details. It's a little bit complex, but
applications continue to rely on them. Implementation details of CPython are, for example, the C API, as I said, but another good example is the garbage collector. In CPython the garbage collector is based on reference counting, which means that when you release the last reference to an object, it is destroyed immediately.
In PyPy they decided to use a more efficient garbage collector, and the consequence is that your object may be destroyed later. You don't know exactly when.
For example, if you open a file for writing, you put data in the file and you forget to close it explicitly, the data may not be on disk, depending on when the destructor is called. So a good practice is to call the close method or to use a context manager. But there is still a lot of code in the wild which is not written correctly.
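As a minimal sketch of the good practice just mentioned (the temporary path and the file content are made up for illustration), a context manager closes the file at a known point, so the result does not depend on when a garbage collector runs the destructor:

```python
import os
import tempfile

# Hypothetical file path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# The "with" block closes the file when it exits, so the data is flushed
# at a known point, whatever garbage collector the implementation uses.
with open(path, "w", encoding="utf-8") as output:
    output.write("hello")

# Reading the file back after the block sees the complete content.
with open(path, encoding="utf-8") as reader:
    content = reader.read()
```

Calling `output.close()` explicitly would have the same effect; the context manager simply makes it hard to forget.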
For your information, Python now has a ResourceWarning to detect that issue. To simplify, the goal of the FAT Python project is to replace
a call to the len function computing the length of the string "abc" directly with the result, the number 3. The goal looks quite simple, but I will explain why it's not as simple as you might expect. The first blocker is that everything in Python is mutable. When I say everything, it's just everything in the language.
To give you some examples, a built-in function like len can be replaced at runtime. You can even modify the bytecode of a function at runtime. Obviously, the value of a global variable can change
at any time. There is no such thing as a constant in the Python language, so you cannot rely on the value of a global variable, because it can change at any time. You have to reload the value each time. To give you an example with a built-in function: you can replace the len function at runtime, and when you call it, instead of getting the length of the string
you get the string "mock". This example is maybe not very useful, but it's very common in unit tests to use a mock module to reduce the complexity of a unit test and only test one specific function.
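The replacement described here can be reproduced in a few lines. This is only an illustrative sketch: the function name `length_of_abc` is invented for the example, and the built-in is restored afterwards so the rest of the program is unaffected.

```python
import builtins


def length_of_abc():
    # "len" is looked up again on every call, so a replacement is visible.
    return len("abc")


original_len = builtins.len
before = length_of_abc()

# Replace the built-in at runtime, as a mock in a unit test might do.
builtins.len = lambda obj: "mock"
try:
    during = length_of_abc()
finally:
    # Always restore the real built-in.
    builtins.len = original_len

after = length_of_abc()
```

An optimizer that had blindly replaced `len("abc")` with `3` would return `3` in all three calls, which is exactly the change of semantics the talk is concerned with.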
FAT Python is not my first attempt to optimize CPython. In the past I wrote astoptimizer, which is a simple AST optimizer. I also wrote registervm, which is a new implementation of the loop evaluating bytecode: instead of using a stack,
I use registers, virtual registers, not CPU registers. Both projects implemented optimizations like replacing len("abc") with 3. Because of that I got bad feedback on my projects, because they changed the Python semantics.
People explained to me at length that they really want Python to remain dynamic, because they chose this language because it's dynamic and because you are able to replace everything at runtime. Even if it looks ugly at first glance, in some specific cases it's very useful
to be able to modify anything. So if you would like to write a new optimizer, you have to respect some rules, some constraints. The first one is to not change the Python semantics.
It's something really important for the Python community. Obviously, you should not break applications. It means that if you run code using your optimizer, it should continue to work as it did without the optimizer. A good property would be to not have to modify the source code, because I don't want to write something like Numba,
which requires putting decorators on functions or doing special things in the code. The idea is rather to be able to optimize any kind of application, because I want to have the fastest language in the community, and I hope that if Python becomes faster, more people will use it.
OK, now I will present some ideas to work around this limitation: to respect the Python semantics but still allow us to optimize the code.
To implement optimizations, in fact to implement efficient optimizations which provide a visible speedup on a real application and not only on a tiny microbenchmark, you have to make assumptions about the code. The tool for making assumptions is called guards. Guards are basically checks made at runtime.
For example, a guard can check whether the built-in len function was replaced at runtime. A very important feature of Python is namespaces.
Namespaces are used everywhere to store data. For example, in a module, global variables are stored in a namespace. In a function, the local variables are stored in a namespace. In a class, it's used for class variables but also methods. For instances, it's used for the attributes of the object.
Technically, a namespace in Python is a dictionary, and the technical challenge in writing a guard on a namespace is to have a check which is faster than a dictionary lookup,
because, you may not know it, but a lookup in Python is very fast, and if you would like to avoid the lookup, your check must be even faster. So I propose a solution for that: a new PEP to simply add a version to dictionaries. I will detail the PEP later.
The second tool to optimize the code is to specialize the code. The idea is to make some assumptions about the code and enable optimizations under those assumptions. It's called code specialization.
To be able to use specialized code, you have to check guards at runtime to decide whether you use the specialized code or the regular code. One example of specialization: if you have a function with two parameters x and y, and the two parameters are known to be, or
are usually, integers, you can specialize the function to be optimized for integers. If you know that they are integers, you can enable a lot of different optimizations which are not possible in the common case when you don't know the types.
The code to call the function becomes: first you check the guards, and you pass the function parameters so that guards can be implemented on the types of the parameters. If the guards say everything is fine, nothing changed, you can use the specialized code. If something changed, you just fall back to the regular bytecode.
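The call sequence described above (check the guards, then run either the specialized or the regular code) can be sketched in pure Python. Everything here is hypothetical and for illustration only: `SpecializedFunction`, `add_ints` and `both_are_ints` are invented names, and the real mechanism discussed in the talk lives at the C level, not in a wrapper class.

```python
class SpecializedFunction:
    """Hypothetical wrapper: try guarded fast versions, else the original."""

    def __init__(self, func):
        self.func = func
        self.specialized = []  # list of (guards, fast_func) pairs

    def specialize(self, fast_func, guards):
        # Register a specialized version together with its guards.
        self.specialized.append((guards, fast_func))

    def __call__(self, *args):
        for guards, fast_func in self.specialized:
            # Guards receive the call arguments so they can check types.
            if all(guard(*args) for guard in guards):
                return fast_func(*args)
        # A guard failed: fall back to the regular code.
        return self.func(*args)


calls = []  # records which version ran, for demonstration only


def add(x, y):
    calls.append("regular")
    return x + y


def add_ints(x, y):
    # Stand-in for code optimized under the assumption "x and y are ints".
    calls.append("specialized")
    return x + y


def both_are_ints(x, y):
    return type(x) is int and type(y) is int


add = SpecializedFunction(add)
add.specialize(add_ints, [both_are_ints])
```

Calling `add(1, 2)` takes the specialized path, while `add("a", "b")` fails the guard and runs the regular function, so callers never observe a difference in results.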
CPython already has an optimizer, called the peephole optimizer. It is an optimizer working on the bytecode. It implements only simple optimizations like constant folding, dead code elimination and optimizations on jumps.
An annoying point is that it's written in C, and because it's written in C it's not easy to extend it to implement new optimizations. Moreover, it has a very narrow view of the bytecode. You only see a few instructions before, maybe one or two instructions in advance,
so you only have a very tiny knowledge of the code. For example, you don't know the whole function and you don't know the whole module. So you are very limited in the kind of optimizations that you can do. But Python provides something more interesting, called the AST. AST stands for abstract syntax tree.
When Python compiles a .py file to bytecode, there are in fact intermediate steps. The first one is tokenization: take letters and group them into tokens. The tokens are then compiled to an AST.
The AST is a high-level representation of the code, so it contains all the information, but as a tree, which is very convenient to analyze and to process. It also has types on nodes, so it's even easier to analyze.
Then the AST is compiled to bytecode. At the bottom, I show you an example of the AST for the call len("abc"). You can see that the call has the type Call, so you know directly that it's a call. It has two parameters: the function and the arguments.
The function in this case is: we have to load the name len from the globals or from the builtins. There is one argument, which is a string, so you get the type string and the content.
To give you the simplest AST optimizer, one that just replaces the call with the result: you can use a node transformer from the ast module, which is part of the standard library. The class has a visit method, and depending on the name of the method, you will enter one kind of node.
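A minimal sketch of such a transformer, using `ast.NodeTransformer` from the standard library, might look like the following. The class name `LenFolder` and the helper `fold_len` are invented for this example, and the sketch deliberately has no guard, so it is not semantics-preserving; it assumes Python 3.8 or later, where string literals are parsed as `ast.Constant` nodes.

```python
import ast


class LenFolder(ast.NodeTransformer):
    """Replace len("...") on a string literal with its integer result."""

    def visit_Call(self, node):
        # Visit children first so nested calls are handled too.
        self.generic_visit(node)
        if (isinstance(node.func, ast.Name)
                and node.func.id == "len"
                and len(node.args) == 1
                and not node.keywords
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            folded = ast.Constant(value=len(node.args[0].value))
            # Keep line and column information of the replaced call.
            return ast.copy_location(folded, node)
        return node


def fold_len(source):
    """Parse source, fold len() calls on string literals, return the tree."""
    tree = LenFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return tree


tree = fold_len('n = len("abc")')
namespace = {}
exec(compile(tree, "<folded>", "exec"), namespace)
```

After running this, `namespace["n"]` holds the folded value. Because nothing checks whether `len` was replaced, the folded code ignores a mocked `len`, which is the situation guards are meant to handle.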
So in this case, we replace the call with the result. So we have guards and we have specialization. What we can do with that is implement some optimizations.
The following optimizations are already implemented in the FAT Python project. For example, when you call a built-in function, you can replace the call with its value. The idea is that instead of having to call it each time, you directly get a constant, so you don't have to compute the result every time.
You can also simplify iterables, for example replace a call to the range function with a tuple, because later, if you combine multiple optimizations, it becomes much more interesting to have a constant as the iterable.
When you optimize a built-in function, you need a guard on the replaced built-in function, so for range you also need a guard on the range function. Another interesting optimization is loop unrolling.
The idea is, instead of paying the cost of the for statement, which has to create an iterator object, take the first item, take the second item, and continue until it gets an exception, to duplicate the loop body once for each iteration and generate an assignment each time,
for example x = 1, x = 2, x = 3. This optimization alone is not really interesting, but it enables even more optimizations that we will see later.
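The two steps just described can be written out by hand as three equivalent functions. This is only an illustration of what the transformed code would look like; the function names are invented, and a real optimizer would also need a guard on `range` before applying the first step.

```python
def loop_with_range():
    seen = []
    for x in range(1, 4):  # original code: calls the built-in range
        seen.append(x)
    return seen


def loop_with_tuple():
    seen = []
    for x in (1, 2, 3):  # step 1: the range call replaced with a constant
        seen.append(x)
    return seen


def loop_unrolled():
    seen = []
    # Step 2: the loop body duplicated once per item, with one
    # assignment per iteration and no iterator object.
    x = 1
    seen.append(x)
    x = 2
    seen.append(x)
    x = 3
    seen.append(x)
    return seen
```

All three functions return the same list; only the amount of work done by the interpreter differs.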
For example, a simple optimization is to copy a constant. Here you assign the value 1 to the variable x, so you have to store the value, and just after that you have to reload the value, because Python is a stack-based VM, so you have to push and pop values every time.
To avoid the reload from the variable, you can just copy the value of the variable directly into the call. So instead of print(x) you just call print(1). Constant folding is a set of operations on constant values: integers, strings,
tuples of integers. To give you some examples, if you ask for the positive value of 5, it is just the number 5. If you would like to check whether an element is in a list, instead of creating the list at runtime you can
convert it to a tuple, which is only built once. You also have operations on strings, and so on. The last one is interesting because it's not a constant, it's a list, but even if it's a mutable list, you know that the result will always be the same, so you can replace
the operation directly with the value.
Something else is that you can avoid the LOAD_GLOBAL instruction. When you call a built-in function like len, each time you have to reload the function from the globals. You have to check in the globals, and after that you have to check in the builtins, because in fact the function is
a built-in, so it requires two lookups. The LOAD_GLOBAL instruction can be replaced with LOAD_CONST. It means that you have to inject the built-in function as a constant at runtime. If you do that, you avoid the two lookups.
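The cost being avoided can be observed with the standard `dis` module. The sketch below does not perform the LOAD_GLOBAL to LOAD_CONST rewrite itself; as an assumption for illustration, it emulates the effect with a default argument, a common manual trick that also removes the global lookup. The names `length`, `length_fast` and `opnames` are invented for the example.

```python
import dis


def length(text):
    # "len" is resolved on every call: module globals first, then builtins.
    return len(text)


def length_fast(text, _len=len):
    # "_len" is bound once at definition time and read as a local variable.
    # Unlike a guarded optimization, this silently ignores a later
    # replacement of the built-in.
    return _len(text)


def opnames(func):
    """Return the set of bytecode instruction names used by func."""
    return {instruction.opname for instruction in dis.get_instructions(func)}
```

Comparing `opnames(length)` with `opnames(length_fast)` shows that only the first function uses LOAD_GLOBAL. A guard on the `len` name is what would let an optimizer apply this kind of change without altering behavior when the built-in is replaced.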
Another simple change is to remove dead code. For example, if you have a test with an if block and an else block, and the if block is empty, you can just invert the condition and remove the if block. It's useful to avoid jumps at the bytecode level.
If you have a test and the test is known to always be false, you can just remove the whole test. If you have a final instruction like return or raise, which is at the end of the control flow, you can just remove what comes after the final instruction.
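The dead-code cases above can be illustrated as hand-written before and after pairs. The function names are invented for the example, and the "after" versions are what one would expect such an optimizer to produce, not output captured from FAT Python.

```python
def empty_if_before(test):
    if test:
        pass  # empty if block
    else:
        return "else branch"
    return "fall through"


def empty_if_after(test):
    # Condition inverted, empty block removed.
    if not test:
        return "else branch"
    return "fall through"


def after_return_before():
    return 1
    print("never reached")  # unreachable: follows a final instruction


def after_return_after():
    return 1
```

Each pair behaves identically for every input, which is the property any such rewrite has to preserve.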
OK, now about the implementation. The good news is that I already got two changes merged into CPython. The first one is a new type of AST node, which is Constant. It simplifies the optimizer,
because instead of having to check each time whether the type is, for example, NameConstant, Num, Str or Bytes, you have a single type, so it makes the check easier. Moreover, if you have a tuple of constant objects, or a tuple of tuples of constant objects, you can replace it once,
and after that, in the optimizer, you only have one test. Another change which was merged into Python 3.6 is support for negative line number deltas. In Python we don't store the line number directly
for each instruction, because it would use too much memory and it's not efficient. We store a compressed table mapping instruction offsets to line numbers. When you implement optimizations like loop unrolling, the line numbers sometimes go backwards.
Because we don't store the line number directly but store a delta, my change allows storing a negative line number delta, for line numbers which go backward. The latest change is to support tuple and frozenset constants directly in the compiler.
This optimization already exists in the peephole optimizer, but it is implemented on the bytecode and not on the AST, and I would like to implement the same optimization at the AST level.
So with my change you can directly generate constant tuples and frozensets. OK, now I will present three PEPs which I wrote to merge my work into CPython. The first one adds a new version to dictionaries.
The field is private: it is not visible at the Python level, only at the C level. The properties are that the version is increased at every change, and that the version is unique across all dictionaries.
The second property, unique across all dictionaries, means that you not only know whether something changed, but you also know whether you are still using the same dictionary. Technically, in some cases you can replace the namespace of a module, of a
class or of something else, and you would like to make sure that the namespace is still the same. Using the version, you can implement a guard on the namespace, because in the common case, if nothing changed, you just have to compare the version and you avoid the lookup.
To give you an example of a guard: you get the version of the dictionary. If the version is exactly the same, you avoid the lookup and you are done. Otherwise, you look up your key. If the value of the key is still the same, it means that something else changed, which doesn't matter in our use case,
so we store the new version and we are done. Otherwise, it means that the value changed. But in Python, if you look at built-in functions or class methods, it is very, very rare to modify something in the namespace.
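The guard logic walked through above can be sketched in pure Python. Since the dictionary version described in the talk is private to the C level, this sketch assumes a hypothetical `VersionedDict` subclass that keeps its own version from a global counter, and it only tracks item assignment and deletion. `GuardDictKey` is likewise an invented name for the illustration.

```python
import itertools

# One global counter, so a version value is never shared by two dictionaries.
_next_version = itertools.count(1)
_MISSING = object()


class VersionedDict(dict):
    """Hypothetical stand-in for a dict with a version (sketch only)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = next(_next_version)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version = next(_next_version)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version = next(_next_version)


class GuardDictKey:
    """Guard that checks whether one key of a namespace still has its value."""

    def __init__(self, namespace, key):
        self.namespace = namespace
        self.key = key
        self.value = namespace.get(key, _MISSING)
        self.version = namespace.version
        self.lookups = 0  # counts slow-path lookups, for demonstration

    def check(self):
        if self.namespace.version == self.version:
            return True  # fast path: nothing changed, no lookup
        self.lookups += 1
        if self.namespace.get(self.key, _MISSING) is self.value:
            # Another key changed: remember the new version and carry on.
            self.version = self.namespace.version
            return True
        return False  # the watched value changed: the guard fails


namespace = VersionedDict(len=len)
guard = GuardDictKey(namespace, "len")
```

With this setup, repeated `guard.check()` calls cost one integer comparison as long as the namespace is untouched, and a dictionary lookup happens only after some modification.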
So the hope the expectation is that you always go to the first pass The second pep is a pep to specialize function It adds a new C function to the C API called pi function specialize You you can use it to register a new
specialized code Using guards. It means that if the guards are true you call the specialized code and I modified the C eval.c File which is the most important loop in Python. It's a loop which evaluates the bytecode So the change
Check guards and depending on the results of the guards you choose which code should be executed and not only you can generate bytecode, but you can also call any kind of callable function and
You can generate specialized code using any tool. In my case, I'm using the FAT optimizer, which works on the AST. But you can also imagine using Cython to generate machine code, or Pythran to generate very optimized C++ code. Maybe you can also use Numba to specialize code while keeping the Python semantics.
To give you an example of specialization: instead of calling the built-in function to generate a character, you can just replace the call with the value, and when you specialize the function, you pass a guard on the built-in function.
The last PEP is a PEP for code transformers. This PEP adds a new command line option, -o, and a new function called sys.set_code_transformers(). A code transformer can work at the bytecode level, but it can also work at the AST level. For example, with my PEP, the peephole optimizer becomes a code transformer, so it becomes part of the same process. You can even disable the peephole optimizer if you want, or use your own implementation, which may implement more optimizations.
And the question is whether it will happen for Python 3.6. First, I got good feedback on my three PEPs and on the project in general. But the blocking point is that people are asking me to show a concrete speedup on applications, and not only on micro benchmarks.
Sadly, to be honest, today it's only faster on micro benchmarks, because I spent a lot of time just implementing guards, implementing specialization, modifying the compiler to support an AST optimizer, and fixing some bugs. So I did not have much time to implement amazing optimizations. It's more the foundation of the project.
In my opinion, I need at least three months to implement more optimizations to have something visible on applications.
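As an illustration of what an AST-level code transformer can look like, the sketch below replaces a call to the built-in chr with a constant argument by the resulting character. It uses only the standard ast module and is not the FAT optimizer itself. The FoldChr class name is made up, the transformer is applied by hand instead of being registered through the proposed sys.set_code_transformers(), and it assumes Python 3.8 or newer (ast.Constant). In a real optimizer the result would only be valid under a guard on the built-in chr.

```python
import ast


class FoldChr(ast.NodeTransformer):
    """Replace chr(<int constant>) with the resulting one-character string."""

    def visit_Call(self, node):
        # Transform nested calls first.
        self.generic_visit(node)
        if (isinstance(node.func, ast.Name)
                and node.func.id == "chr"
                and len(node.args) == 1
                and not node.keywords
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, int)
                and 0 <= node.args[0].value <= 0x10FFFF):
            folded = ast.Constant(value=chr(node.args[0].value))
            return ast.copy_location(folded, node)
        return node


source = "def letter():\n    return chr(65)\n"
tree = FoldChr().visit(ast.parse(source))
ast.fix_missing_locations(tree)

# Compile the rewritten tree and define letter() in a scratch namespace.
namespace = {}
exec(compile(tree, "<optimized>", "exec"), namespace)
letter = namespace["letter"]
```

After the rewrite, the compiled letter() no longer looks up the name chr at all, which is why the optimization needs a guard if the original semantics must be kept when chr is replaced.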
What's coming next? I said that we can implement more optimizations, so here are just some ideas. When you unroll a loop, you get code which looks inefficient because you assign the variable x = 1, x = 2, x = 3, but the x variable is no longer used.
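A hand-written illustration of that situation, assuming a loop whose body never reads the loop variable (the function names are invented for this sketch, and an optimizer would produce the rewritten forms automatically):

```python
def original():
    calls = []
    for x in (1, 2, 3):
        calls.append("hello")
    return calls


def unrolled():
    # Loop body duplicated once per item; the assignments to x are kept.
    calls = []
    x = 1
    calls.append("hello")
    x = 2
    calls.append("hello")
    x = 3
    calls.append("hello")
    return calls


def unrolled_without_x():
    # x is never read, so its assignments can be dropped.
    calls = []
    calls.append("hello")
    calls.append("hello")
    calls.append("hello")
    return calls
```

All three functions return the same list; only the amount of work done per call differs.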
So in this case, you can just remove the x variable because it's no longer used.
Another example is to copy globals: if you know, or if you check, that the global will not change, then instead of having to load the global each time in the function, you can just copy it into the function body and apply more optimizations like constant folding. As usual, you need a guard on the global.
Another important optimization is function inlining, because in Python a function call has a significant cost. So instead of calling the function, the idea is to copy the function body to the place where you call the function. If you combine it with other optimizations, you can produce much more efficient code. Obviously, you need a guard on the inlined function, because if the function is modified somehow, you still have to call the modified function.
Another larger project is to implement profiling.
This step is usually done at runtime in a JIT compiler. A JIT compiler first profiles the code while it is running and, depending on some thresholds and triggers, emits machine code.
But I don't feel able to implement such a thing at runtime because it's really complex. The PyPy guys took many years to implement something efficient. So my idea is to run the profiler first on a known workload, for example by running the unit tests.
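Going back to function inlining with a guard, here is a minimal hand-written sketch. The functions double, compute and compute_guarded are hypothetical names, and the guard is an explicit Python check here, whereas the project checks guards before running specialized code.

```python
def double(value):
    return value * 2


def compute(x):
    # Original code: pays the cost of a function call.
    return double(x) + 1


def compute_inlined(x):
    # Body of double() copied to the call site.
    return x * 2 + 1


# Remember which code object was inlined.
_inlined_code = double.__code__


def guard_double_unchanged():
    # The guard holds only if the global name still points to the inlined code.
    func = globals().get("double")
    return getattr(func, "__code__", None) is _inlined_code


def compute_guarded(x):
    if guard_double_unchanged():
        return compute_inlined(x)
    # double() was replaced: call the regular code, which uses the new function.
    return compute(x)
```

If the global double is rebound to another function, compute_guarded() falls back to compute(), so the caller always observes the behavior of the current double.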
So you are asking me to stop, but I know that I have 45 minutes. Okay.
Okay, just to finish quickly: I have a new module to implement benchmarks. The idea is to spawn multiple processes and compute the average, because if you run a single process, you only get one specific performance, but if you run it multiple times, you get a more realistic value. It's very efficient to get more stable benchmarks.
You can also store all data as JSON, and thanks to that you can display, compare and analyze the data afterwards.
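The idea of spawning several processes and averaging can be sketched with the standard library alone. This is not the module mentioned in the talk: the function names, the benchmarked statement and the number of processes are all assumptions chosen for the example.

```python
import json
import statistics
import subprocess
import sys

# Assumption: a tiny micro benchmark run by each worker process.
CHILD_CODE = (
    "import timeit; "
    "print(min(timeit.repeat('sorted(range(1000))', number=200, repeat=3)))"
)


def run_worker():
    """Run the micro benchmark in a fresh Python process and return its timing."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True, text=True, check=True,
    )
    return float(proc.stdout.strip())


def run_benchmark(processes=3):
    """Collect one timing per process and summarize them."""
    values = [run_worker() for _ in range(processes)]
    return {
        "values": values,
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


result = run_benchmark()
# Store everything as JSON so the data can be compared and analyzed later.
data = json.dumps(result)
```

Each worker starts from a fresh interpreter, so per-process effects show up as spread between the values instead of silently biasing a single run.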
So it's a library, and I already modified the CPython benchmark suite to use it, so we will get much more stable benchmarks.
Okay, here I am. Do you have any questions?
We've got time for maybe two questions.
Hi, thank you for the talk. What I wanted to ask is this: you said that you modified ceval to check whether the guards are valid and then go into the function. [...] Because it's not only when you get into the function that you have to check the guards. You mostly have to check every time you call other functions, call eval, or do some other things, because built-ins could have changed, and values, and a ton of other stuff. How do you deal with that?
With FAT Python, you get guards which are checked at the entry point. But when you specialize the code, you can inject your own guards inside the code. So you are free to generate guards inside the function body, to decide inside the function body whether you take a fast path for one line or fall back to the regular code.
Then I have a follow-up: why modify ceval, then? Why don't you check the guards inside of the specialized code and then bail out of it?
It's just a design decision, right? For technical reasons, it's easier to do it like that.
Thank you very much, Victor, that was excellent.