A faster Python? You Have These Choices


Formal Metadata

Title
A faster Python? You Have These Choices
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
A faster Python? You Have These Choices [EuroPython 2017 - Talk - 2017-07-13 - Arengo] [Rimini, Italy] Python was never intended as a fast language, but many modern uses of Python require high-performance computing, particularly in data science. This talk explores your options for squeezing maximum performance out of critical Python code. It provides a succinct summary of the options you have: C extensions, Cython, CFFI, PyPy and many others. It also shows the trade-offs between execution performance and the cost of writing and maintaining code with each choice. Each option is also explored for maturity and ease of use for Python programmers. A real-world programming problem is coded and benchmarked using each of these techniques. All the code used in the talk is available on GitHub. At the end of this talk you will be better placed to decide which technique to use to make your code run 100x faster.
Transcript: English (auto-generated)
My name is Paul Ross. I work for AHL, which is a systematic hedge fund in London. Here's a little bit about us. We've been around 30 years this year, which is quite a long time in hedge fund terms. We're systematic, so it's the algorithms that do the trades, not discretionary traders. We're active in about 400 markets all around the world. We gather about 3 billion market data points every day. We've open-sourced our tick store, Arctic; you can find it on the Man AHL GitHub account. It's been very successful for us. We're quite a small organization, really, for what we do. We all speak Python. So this talk is really based on our experience of using Python as a systematic hedge fund, and the things that we've had to do when performance isn't what we would like. So there are three sections to this talk. One is just to introduce the concepts and to scope out the talk,
because performance is such a big subject that it can't all be covered in 45 minutes. Then there are a lot of technical solutions out there to make Python, or your code, run faster, and it's somewhat confusing looking at all the different solutions. So I've tried to create a kind of technology taxonomy to make sense of all these different options that you have, so you can reason about them intelligently. And then finally, I wanna talk about evaluation criteria: how you evaluate the different solutions that are on display here. So first up, let's have a little introduction. Firstly, what this talk is: it's really a kind of tour through some of the faster alternatives to plain Python. This is for general-purpose computing,
not specialized computing. So all the solutions I offer you are sort of general purpose solutions for compute power. It's also gonna be a way of evaluating alternatives, so you can decide which one is good for you. And it's also a reflection on the trade-offs you're inevitably gonna make between things like performance and maintainability and cost, and so on like that.
So I better say what this is not about. I will mention NumPy a bit. This is not about NumPy and Pandas. So if your problem domain is one that can be solved by vectorized operations on fixed-size arrays, then you might as well just head off over to NumPy or Pandas, because that's where you're gonna get your best performance.
If your problem is solved by concurrency, well, then you know what to do, and I'm not gonna talk about that; nor about simple caching. I'm not gonna make any definite recommendations to you. I might have some advice, but I can't make recommendations, because I don't know what your particular problem is, and all problems have different solutions. This is not gonna be a benchmark comparison
of the different solutions. So all the benchmarks you see in this presentation are fake. That is to say, they're just like real benchmarks. Okay, so it might be worth reflecting why this talk even exists. I mean, why do we have to talk about improving the performance? And this is perhaps why Python is how it is.
So the reason why Python is allegedly slow, or perceived as slow, is it's obviously an interpreted language. Every line has to be eval'd, so every time you go around the loop, the interpreter has to eval the same line each time. It's dynamically typed as well, so the real type has to be assessed each time an object is accessed. It's running on a virtual machine,
which is a great way of abstracting away from the platform, but that abstraction comes at a cost, a runtime cost. Python doesn't have a JIT yet, although in Python 3.6 there's PEP 523, which opens a door to possibly putting a JIT in, and we'll talk about JIT solutions as well. And Python itself takes very few optimization opportunities with the code, compared with, say, aggressive C or C++ compilers. It's kind of interesting to reflect that the first three on this list are some of the reasons why Python is slow, but they're also the reason why Python is so productive, why it's so quick to write Python code and why Python is so valuable. And the last two are also perhaps why Python is slow, but they're known to be fiercely hard problems to solve in computer science. So if this hasn't happened to you already, it's certainly happened to me many times: your boss comes to you and says, this Python code that you've written is far too slow. You've got to find a way of speeding it up.
It's not fast enough. Well, then you've got to ask yourself a number of questions. Is it really true, what they're saying? Isn't it possible that Python is fast enough for the job at hand? We're a hedge fund. We trade billions of dollars each day, and we can do that with Python because of the way we trade. Python is fast enough,
and then we can take advantage of all the other aspects of Python and make it very quick and cheap to develop new ideas, which is more important to us than speed of operation. If you do want to make it go faster, you're not going to try and make it faster overall. You're going to profile it, of course. Profiling is a bit of a dark art. Profilers often lie. There was a very good tutorial earlier in the week on profiling that had a lot of good information in it; if you missed it, I'd head over to the video. But get your profiling skills out and decide which bit to attack, and the whole point of doing that is so that you spend the right amount of money in the right place, because we're engineers. We do things cheaply.
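(For reference, the standard library's cProfile is the usual starting point; a minimal sketch, where slow_work just stands in for your own suspect code:)

    import cProfile
    import pstats

    def slow_work():
        # Stand-in for whatever you suspect is slow.
        return sum(i * i for i in range(10**6))

    # Profile the call and dump the raw data to a file...
    cProfile.run("slow_work()", "out.prof")
    # ...then show the ten functions with the highest cumulative time.
    pstats.Stats("out.prof").sort_stats("cumulative").print_stats(10)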
Okay, then have a look, of course, at your algorithms. This is a well-trodden path: are we using the right algorithms, and are we using the right data structures? People overlook that. I've put up this famous quote at the bottom of the slide; do remember it when you're changing your code, and have it burnt into your memory. I was involved in a problem recently where people were building very large lists of Python objects and adding to the list, and of course they failed to appreciate that a list has to be contiguous in memory; when you keep adding stuff to it, it gets to a point where the whole thing has to be copied and new memory found for the whole contiguous list. That's a very expensive operation. Using something like a deque, which is like a linked list, made it much more efficient. Just changing that data structure made a big difference in the runtime.
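(A crude illustration of that contiguity cost; the original problem isn't reproduced here, and front-insertion, rather than plain appending, is where the copying shows up most starkly:)

    import timeit

    # Inserting at the front of a list is O(n): everything after the
    # insertion point has to be shifted within the contiguous block.
    # A deque, a doubly-linked structure of blocks, does it in O(1).
    list_time = timeit.timeit("xs.insert(0, 0)", setup="xs = []",
                              number=100_000)
    deque_time = timeit.timeit(
        "xs.appendleft(0)",
        setup="from collections import deque; xs = deque()",
        number=100_000)
    print(f"list:  {list_time:.3f}s")
    print(f"deque: {deque_time:.3f}s")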
So, so much for the introduction. Now I thought I'd talk about the options that you have, the technologies that exist out there for speeding up your Python code, and how we can categorize them to make more sense of it all, because there is a huge amount of choice out there.
Here are some of the projects that are aimed particularly at making Python code, or code that starts as Python, run much faster. I'm sorry if your favorite project isn't up there; tell me and I can add it later. But when you're considering any of these things,
it might be worth starting with: what would the perfect solution be? What would we really, really like as a solution to making Python go faster? Well, here's my kind of wish list. This is perfection. It can run my Python code directly; I don't have to change the code at all, so there's no effort by me, because I'm quite a lazy programmer. I don't want to change my code, so perfection would run my code directly. There'd be no maintenance overhead on my behalf. It would work with all Python versions, all library code, all the standard library, all third-party code that I was using, my own libraries and so on. It'd be free, of course, and fully supported. It wouldn't have any bugs in it, it'd have a perfect debug story, and it'd be 100 times faster. You'll notice I'm not being greedy with this shopping list; I could have said 1,000 times faster, but I'm gonna stick with 100, because that's what I wish for. The thing is, this doesn't exist, okay? There are solutions out there that achieve parts of this list. So this is where you're gonna make the trade-off. If you're gonna go for maximum performance, you're probably gonna incur some cost, or your debug story might go to hell, or something like that. So it's worth reflecting on what would be perfect when you make those trade-offs; if you just blindly choose one solution, you're implicitly making those trade-offs, and if you don't realize them, you might get in trouble later. So the kind of taxonomy I've chosen
is basically about how much code you have to change before using that particular technology. I divide it into basically little or no code change, some small amount of code change, or really rewriting your code in a completely different language. So let's have a look at what exists for the first one.
Little or no code change. There are five projects I've picked out here; obviously one is Python itself, which is the baseline benchmark at the top. Typically these projects will give you a one-times to eight-times speed-up; it varies enormously depending on the problem. First we have Python. Then we have Cython, which is a very well-known project, and here we're gonna write plain Cython, not optimized code, and I'll show you what I mean in a moment. Then we've got PyPy; there was a very interesting PyPy talk yesterday, and I think there's another one tomorrow. I urge you to look at those; it's a very interesting solution. Then we've got another couple of projects called Shed Skin and Pyston, and we'll have a look at them briefly. Okay, so first Cython. I'm gonna write some code here, give it to Cython and see what happens.
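(The talk's actual code is on GitHub; here is a minimal pure-Python reconstruction of the kind of function in question, mine rather than the speaker's exact code:)

    import math

    def std_dev(arr):
        # Population standard deviation of a sequence of floats.
        mean = sum(arr) / len(arr)
        total = 0.0
        for x in arr:
            total += (x - mean) ** 2
        return math.sqrt(total / len(arr))

    print(std_dev([1.0, 2.0, 3.0, 4.0]))  # ~1.118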
This code computes the standard deviation of an array of floats. I run it in Python; it's slow. So I put it into Cython and expect Cython to work its magic. And Cython goes about 1.3 times faster, which is really no performance improvement at all. So Python is rubbish, isn't it? Sorry, Cython is rubbish, isn't it?
It's not a solution at all. Well, it's kind of interesting to see why Cython struggles at this point. It's actually because we're not helping Cython at all. Just take the one line where we're computing the mean and look at what Cython does with it. The way Cython works is that it takes your Python code, generates C code, compiles that, and off you go. And if you look at the generated C code for that one line, here's the line that we're talking about, and if we dig down through this rather complicated code, we see that this line, sum divided by len, is expressed as three calls into the CPython API. These calls are highly generalized; they're not specific to any type. And the reason why Cython can't make much of an improvement on our code is that we haven't given Cython any type information. We'll revisit Cython in a moment to see what happens when we fix that, because that involves changing our code quite a bit.
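(You can see the same genericity from pure Python with the dis module; a quick illustration:)

    import dis

    def mean(arr):
        return sum(arr) / len(arr)

    # Every instruction is fully generic: nothing in the bytecode knows
    # that arr holds floats, so types are re-checked at run time on every
    # pass. (Exact opcode names vary between Python versions.)
    dis.dis(mean)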
The next one is PyPy. Typically, their benchmarks show around seven times the speed of the CPython implementation. It's a just-in-time compiler, and it's a fairly drop-in replacement for Python 2.7 and 3.5 code. It doesn't really support the CPython C API, but it supports CFFI, the C Foreign Function Interface, which I'll mention in a moment; so that's your way into C and C++. It's not completely compatible with certain libraries; I think Flask and Pillow are a couple of libraries it can't manage. But there's a very interesting quote at the bottom of the slide here that should make you pause for thought. Although, it does have the word 'probably' in it.
I guess what I'd say is I'd change that to something like: you should certainly try this, because it has been very successful in some areas. Okay, so much for PyPy. Shed Skin is a project that basically does automatic type inference and translates your Python code to C++. It's quite elderly; it only supports up to Python 2.6, and there's been very little activity for the past year, so I'm not gonna delve more into that. And there's also Pyston, which is an LLVM-based JIT compiler. Now, this is backed by Dropbox, or I should say was backed by Dropbox. It's Python 2.7 only, and the project was suspended in January this year. Now, I offer these last two just as an example of one of the problems you're gonna have. When you choose a particular technology to go for performance, you're making quite a committed move to that technology, and if that project just runs into the sand and stops getting maintained, then you've suddenly got a load of technical debt.
This is gonna be important when we look at the evaluation criteria for which project to use for your code. So much for little or no code change. Well, what happens if we accept there'll be some code change? We can make some code change to accommodate these projects. What options do we have there? Well, here are some, and now we get speed-ups of typically 10 to 100 times, in that kind of range. So we've got optimized Cython, and we've got Numba, Parakeet and Pythran; I'll go through them. With optimized Cython, basically, you do this kind of thing. On the left is the standard deviation code, the plain Python code that I just dropped into Cython and didn't get very far with. On the right is much more heavily optimized Cython code.
What I've done here, on the function at the bottom, is declare that I'm gonna use NumPy arrays, so I import NumPy, and that gives Cython a whole load of opportunities. I'm declaring the local types as size_t and so on. I'm putting a decorator on there saying don't check the bounds for this array, which is obviously a cost. Right at the top, I'm importing from libc, the C math library, and I'm gonna use the square root from there rather than the Python math square root, which is probably much slower. So if I do this in Cython, I get it running 62 times faster, which is a really good improvement. It's a fantastic improvement. But consider this: how maintainable is the code on the right versus the code on the left? And we're talking about a really simple operation here, a standard deviation of an array. So this is what I mean about trade-offs: if you really want that 62-fold performance improvement, implicitly you're gonna put up with a maintenance problem, and probably your code is much more complicated than a standard deviation. So optimized Cython can rapidly become fairly unwieldy, because it's this really weird hybrid code; it's not C or C++, it's not Python, it's this in-between thing. And there's quite an art in tuning Cython to get the maximum performance out of it. It's a bit of a black art, and one that often isn't shared in organizations.
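(For concreteness, a sketch of that optimized-Cython style, reconstructed from the description above rather than taken from the slides:)

    # std_dev.pyx
    import numpy as np
    cimport numpy as np
    cimport cython
    from libc.math cimport sqrt      # the C sqrt, not Python's math.sqrt

    @cython.boundscheck(False)       # skip bounds checks on indexing
    @cython.wraparound(False)        # disallow negative indices
    def std_dev(np.ndarray[np.float64_t, ndim=1] arr not None):
        cdef size_t i, n = arr.shape[0]
        cdef double mean = 0.0, total = 0.0, diff
        for i in range(n):
            mean += arr[i]
        mean /= n
        for i in range(n):
            diff = arr[i] - mean
            total += diff * diff
        return sqrt(total / n)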
So, on to Numba. This is backed by Continuum Analytics, and it's always good to have a good backer. It's a JIT compiler, pretty much aimed at Python and NumPy. Basically, you just annotate your functions with @jit, and this brings in a whole lot of JIT technology. You run it, and the JIT will get to work and try to optimize the function based on what it's seen in the past.
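(A minimal Numba sketch of the pattern just described; the std_dev function is my example, not the talk's:)

    import numpy as np
    from numba import jit

    @jit(nopython=True)   # compile to machine code; raise if it can't
    def std_dev(arr):
        mean = arr.mean()
        total = 0.0
        for x in arr:
            total += (x - mean) ** 2
        return np.sqrt(total / arr.size)

    data = np.random.rand(1_000_000)
    std_dev(data)   # the first call compiles; subsequent calls are fast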
There are a number of these JIT compilers. One of the early ones is Parakeet, which is still around. Same kind of idea: you decorate your code, and it'll provide a JIT for it. Parakeet is quite old and only supports Python 2.7. It's very much aimed at NumPy, and it's quite effective at that, but there's been little activity over the last four years. If four years ago you thought Parakeet was the best thing and started writing all your code around it, then I'm sorry for you now. So, Pythran is another one. It uses a slightly different technique: you annotate the function with a comment, and you can see the types in the comment there. Then you run it through Pythran, which generates C++ code, and you run that. It really aims at scientific computing. It handles only a subset of Python, but it can produce really powerful high performance at some cost.
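(A sketch of that comment-annotation style; my example, where the export comment is the only thing Pythran needs beyond plain Python:)

    # std_dev.py -- build with: pythran std_dev.py
    import numpy as np

    #pythran export std_dev(float64[])
    def std_dev(arr):
        mean = np.mean(arr)
        return np.sqrt(np.sum((arr - mean) ** 2) / len(arr))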
So, that's the some-code-change section. The third alternative, if you want really high performance, is basically to write in a different language, like C++ or something like that. So, what kind of projects are out there that do that? This is where you really get that kind of 100x performance improvement, maybe even better. Here are some of the C- and C++-based ones. We have the original CPython C extensions, which we'll talk about in a moment. There are various others there, too many for me to go through, so I'm gonna cut it down a bit. And there are also Rust and Fortran and Go and that kind of thing; if that's your thing, absolutely go for it, okay? I'm not gonna talk about any of those, but let's just look at three of the C- and C++-based approaches: writing a C extension by hand, CFFI, and PyBind11. They all take different approaches to basically the same problem, which is how you get into writing in C.
So, firstly, the classic C extension, and these give you a mixture of joy and agony when you're involved in C extensions. So, what's a joy? Well, it's written in C. C's really easy. There's only 32 keywords in C, so what could possibly go wrong?
You can mix it with C++; that all works nicely. You have really precise control over your performance at this point, and there are a lot of good libraries. At AHL, we use this a lot, because C extensions can call into NumPy, address it directly as C, and that is incredibly efficient and fast. If you're writing for the standard library, you have to be here because, strangely enough, well, not strangely enough, the standard library requires you to write in C. In fact, C89 with some little extensions from C99, I think. So here we are in 2017 and we're stuck in C89, which is very interesting for someone of my background.
Anyway, so that's the joy of writing C extensions. Well, what's the agony? Well, this is a little class, probably the first class you wrote in Python was something like this. This is taken from the Python documentation from the tutorial about how you write your first C extension, and basically make a little class that contains a first name and a last name
and has a method where you can pull out the combined names. That's in Python. What's it like when you write a C extension? Well, it's this. It's 190 lines of C code addressing a very complex and sophisticated and well-documented API. And apart from the fact it's a lot of lines of code,
where is the real agony in this? Well, here's the real agony: it's in C. You have to do reference counting, and get your head around that. You have to do manual memory management, and you have to understand how Python does its memory management. Writing against this C API is a very specialized skill, and it's quite expensive to write 190 lines versus a handful. Testing is really problematic with C extension modules; I'll return to that in a moment. Debugging a C extension is a bit of a black art as well. GDB works fine, and if you like debugging in an IDE, hooking one up can be done, though it's a bit tricky to set up. Just to show it can be done, here's a screenshot of debugging a C extension in Xcode on a Mac, and there's a link there to one of my projects that shows you how to do that. So if we don't like doing C extensions, if we find that far too painful, what are the alternatives?
Here are the different language options. Let's move away from CPython C extensions and have a look at CFFI, which is a really interesting project; there was a talk yesterday about it that was very good, and I'll give you a list of the talks that I thought were valuable later on. With CFFI, you're writing in Python. It allows you to call C code from within Python, and it's C-based, but you can also hook up to C++ with a little bit of work. It abstracts away much of that boilerplate, those 190 lines of the C extension, which is basically the interface code, and it also abstracts away the build system. So if you remember, this is what we were trying to do, and this is what we had to do in C extensions. This would be a fairly crude CFFI equivalent. I import CFFI, and I create this cdef with a string in it, which is actually C code; that gets compiled up. I create a new instance, I can assign those first and last names (my own name, in fact), and I can extract them out again. That code all now fits on one slide, and it just works, and it's been executed in C land.
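(A minimal CFFI sketch along the lines just described; my reconstruction, not the slide's code:)

    from cffi import FFI

    ffi = FFI()
    # The string passed to cdef() is actual C: a struct declaration.
    ffi.cdef("struct person { char *first; char *last; };")

    p = ffi.new("struct person *")
    first = ffi.new("char[]", b"Paul")   # keep these alive: p borrows them
    last = ffi.new("char[]", b"Ross")
    p.first, p.last = first, last
    print(ffi.string(p.first).decode(), ffi.string(p.last).decode())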
So a different approach is PyBind11, which I think is also a fascinating project, and I think there's a talk tomorrow about PyBind11 which I'm definitely gonna go and see. So how does PyBind11 work? It's a header-only C++ library, so you're writing in C++. It makes it very easy to write these C extensions. It's kind of similar to Boost.Python, if any of you have used that, and wonderfully, it's C++11, so it's a modern version of C++, a kind of easy-to-write version of C++. So here we go. Just to remind you, this is the class we're trying to create, and this is what we had to do in the C extension. Here's how PyBind11 works. We're in C++. We create a struct which has a constructor; it has a single method, name, which concatenates the names; and it has the first and last names as members. Then you include PyBind11, and that gives you access to a whole load of functionality, templates, and macros, such that creating the module is just those four lines at the bottom. And now you've got a shared library that you can load from Python. A very interesting project indeed.
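(A minimal PyBind11 sketch of the struct-plus-module pattern just described; my reconstruction, with a made-up module name:)

    // person.cpp -- one possible build command (assumes pybind11 installed):
    //   c++ -O3 -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes)
    //       person.cpp -o example$(python3-config --extension-suffix)
    #include <pybind11/pybind11.h>
    #include <string>

    namespace py = pybind11;

    struct Person {
        Person(std::string first, std::string last)
            : first(std::move(first)), last(std::move(last)) {}
        std::string name() const { return first + " " + last; }
        std::string first, last;
    };

    // These few lines are the whole module definition.
    PYBIND11_MODULE(example, m) {
        py::class_<Person>(m, "Person")
            .def(py::init<std::string, std::string>())
            .def("name", &Person::name)
            .def_readwrite("first", &Person::first)
            .def_readwrite("last", &Person::last);
    }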
So that's about all I've got to say about those. There are other C and C++ options, so choose the one that suits your shop. I'd just say that if you are working in a second, separate language like C++, try and arrange your code to look kind of like this. On the right is all your C++ code, for example, that doesn't include Python.h; it might include something else, but it's pure C++ code, if you like. On the left is perhaps some Python code that wraps this, and in the middle is the glue code: the C extension code, or the PyBind11 code, or the CFFI code. The problem is that the bit in the middle is the one that's really hard to test. So if you can physically and logically separate the code on the left and the right as far as possible, you can then test them independently for correctness and performance, because performance is what we're after. If you put too much code in the middle, it effectively becomes untestable. So I'd recommend a physical and logical layout that is something like this. So that's basically the taxonomy,
which was the second section of this talk. Now I want to talk about the evaluation criteria that you might use to judge which of these solutions is suitable for you. So again, here's the choice you make. We might be able to divide these up now with this taxonomy, and understand which is the one
that would give us the right trade-offs. But how do we evaluate, even within that taxonomy, which is better? I suggest there are three kinds of areas you need to consider when evaluating these. First, it's basically who you are: what company you work for, what organization it is, what team you're in, what skills you have, and so on. Then there are the technical criteria. We're probably mostly engineers here, and we like technical criteria because we can put numbers to them, and numbers are our lifeblood. But there are also non-technical criteria, and I'd argue that they're equally important when you make a choice. So first of all, who are you?
Well, I'm guessing a bit here, but I'm guessing you're not Facebook, or Google, or Microsoft, or anything like that. So these companies do a lot of interesting projects. They publish white papers, and they open source stuff, and they say we're using this particular project, and it's really fantastic for us,
and so on like that. And the temptation is, because they're using it, to think that you should use it. But you're not them. They have their own problems, which may not be your problems. They have their own culture, and they're constrained by that; you have your own culture, and you're constrained by that. You may not be constrained by what they do. So understand that what they're trying to do and what you're trying to do are different. Don't be overly impressed just because one of the big companies is doing something; that doesn't mean it's what you should do. On the other hand, a big company backing something is also a plus for a project, because it's likely to increase its longevity and quality. So consider also what skills you have to maintain this code, or what skills you can get,
or what skills you might lose as an organization because if you've gone down the route, for example, of highly optimized Cython, and you've got a couple of people that got really skillful at doing that, and your whole code is running really smoothly, if those two people leave, then you've kind of now got a whole load of technical debt.
So let's have a look at some of the technical criteria; we're fascinated by this stuff as engineers. These are things that we can put numbers to and use to compare two different options against each other. So what technical criteria might we look at? Well, what do these projects depend on? Do they depend on things that we don't really wanna depend on, or on versions that we don't wanna use? We can put numbers to that. What versions of Python do they support? As we've seen, some of these projects only support older versions of Python, and some of them struggle to make the transition to Python 3. Some of them only support part of core Python or part of the standard library, and with some of them, perhaps the third-party libraries that you depend on aren't supported at all. You can go and find all that out and compare alternatives very easily. And then finally, we get to benchmarks. Of course, people are obsessed with benchmarks; or perhaps not obsessed, but benchmarks carry a lot more weight than I think they deserve,
mainly because they're kind of wrong. As I said, all the benchmarks you'll see here are lies, just like real benchmarks. So what are the obstacles to benchmarking accurately, so that you can make rational decisions about the various options that you have? Well, there are simple measurement errors: just measuring the wrong thing, or measuring it in the wrong way. Then there's bad statistics. I won't go through the whole of benchmarking; there's loads of good material out there about how to benchmark properly. But bad statistics is an interesting one, because in benchmarking you're going to take a small set of results and then reason about the wider world with statistics based on that small set. If you do bad statistics, and we'll look at a couple of examples in a moment, then you're going to mislead yourself with benchmarking. And then there are actual human cognitive biases, like confirmation bias. You might be overly fond of one library, and therefore you might implicitly be starting
to look for positive benchmarks for that library. Or you might get fixated on one small part of the problem, ignoring the wider aspects of it. These are all obstacles to objective benchmarks. So here are some benchmark pitfalls I'm imagining. We're running a test eight times, and we've got library C and library D, and we're comparing them. Lo and behold, we're taking the average of their timings, and, being very good statisticians, we're taking the standard deviation too. And lo and behold, they come out the same. So it looks like library C and library D are exactly the same. But if you look, library D has an interesting pattern to it: the first time you run the test, it takes 18; the subsequent times, it's a very consistent eight. This is maybe characteristic of some kind of JIT behavior. So I'd say that if you're running this in production, calling it many times, library D is definitely the better choice; if you're only gonna call that function once, it's probably the worst choice. So you've gotta look at the trends as well as the raw data.
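(One practical way to keep that trend visible is to record the individual runs instead of collapsing them into a mean; a sketch, where mylib and work are made-up names standing in for the code under test:)

    import timeit

    runs = timeit.repeat(stmt="work()",
                         setup="from mylib import work",
                         repeat=8, number=1)
    # e.g. [18.2, 8.0, 8.1, 8.0, ...]: a warm-up spike that the mean and
    # standard deviation alone would hide.
    print(runs)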
Here's another common fallacy. I've got a number of independent tests here of two libraries, G and H, and I'm combining them by taking the average, and the averages look the same. But if you notice, the tests take widely different times, and it's actually quite misleading to take the average. If I add another column where I divide H by G, you'll see that in the first test H takes slightly longer proportionately, but in all the other tests it's twice as fast as G. So how come the average is so misleading, saying they're the same? It's because when you're combining widely different numbers, what you need to take is the geometric mean, not the arithmetic mean: the geometric mean of n numbers is the nth root of their product. And now, oops, we see, if I take the geometric mean, that library H is much faster than library G.
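(A quick worked example with made-up timings; math.prod needs Python 3.8+:)

    import math

    # Four independent tests; test 1 is much bigger than the others,
    # which is what skews the arithmetic mean.
    g = [100, 10, 10, 10]
    h = [115, 5, 5, 5]   # slightly slower on test 1, twice as fast elsewhere

    def arith(xs):
        return sum(xs) / len(xs)

    def geo(xs):
        return math.prod(xs) ** (1 / len(xs))

    print(arith(g), arith(h))  # 32.5 vs 32.5: they look identical
    print(geo(g), geo(h))      # ~17.8 vs ~10.9: H is clearly faster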
In fact, tables are a really bad way to present benchmarks anyway, because they're just numbers. Here's a much better way of presenting a benchmark. This is a real-world one, not from the company I currently work for. Basically, this is a graph where the x-axis is logarithmic: it's file size, going up to 100 megabytes. And what we're doing here is this: we have a sequential file, but we want to access it randomly, so we're creating an index by read-seeking through it, finding the interesting points, and then we can access it randomly much faster than the sequential file allows. The y-axis is how many milliseconds per megabyte of file it takes to create the index that we want. The red dots are the original Python implementation, and the green dots are the same thing written in C. Now, there's a wealth of data in this graph. For one thing, you can see there's a big range of inputs: the file sizes we're testing go from very small to very large. There's about a two-decade improvement, in general, with the C code. The fact that it flattens out means it's O(n). And then there's some really interesting stuff happening with some outliers up at the top, reflected at the bottom. If you look at the bottom right-hand side, you'll see that some files, remarkably, are processed much, much faster than others, even with the Python code, and that's reflected in the C code as well. When we investigated those files, we found they had a particular property that we could exploit for all files, another technique that gave us a further 10x improvement. So presenting data like this gives you a whole wealth of information from which you can make reasoned decisions; compare that with a table, and how much information you're gonna get out of that.
Okay, if you're benchmarking, don't just measure speed. Here's a great long list of stuff you should look at, because memory might be more important, or I/O. What are the trends in your benchmarks? And once you've benchmarked, do you put the code into production and forget about it? No, you wanna be benchmarking your production code as well. So the last thing in evaluation is the non-technical criteria. These are things that you can't put numbers on, and sometimes engineers shy away from them because you can't put a numeric value on them; it's a bit of a harder decision. I'd argue they're equally important. So here are some non-technical criteria you probably wanna consider. Ease of installation and deployment, if it's a large deployment. Again, dependencies. Ease of writing the code and maintaining it. What's your debug story? What's your tool story for analyzing your code or reporting things out of production? And the project that you've chosen, be it PyPy or Cython or whatever it is, how future-proof is it? That's a really difficult judgment call, because of course the past is no guide to the future, but it's all we've got. So how can you try and predict whether a project will last? We've seen several projects that have just suddenly halted; let's hope that your solution doesn't do that, leaving you with all that technical debt. These are the kinds of things you wanna look at. What Python versions are supported?
Are they moving up versions in a timely fashion? What's its development status? If it's old, that's maybe a good thing, because it's been around for a long time and it's mature. On the other hand, it might be using very old technology, and a newer project might use newer language features and be better for it. Is it maintained? Does it have good, interesting backers, like a large company? Are fixes quick? Do they take PRs? That kind of thing. Also, who's using it? And interestingly enough, is there a consultancy around it? Because if there's money to be made, if there's money swirling around a project, it's more likely it'll last
than one that's just been given out for free and that everyone's taking for free. So those are the three sections that I have for you, and I'll finish up by making a summary. These are the takeaways: out of these solutions, choose the solution that's appropriate for your organization, your skill set, your product. It's your choice. Just recognize that you're gonna be making trade-offs, and if you don't realize that, there'll be implicit trade-offs that could come back and bite you. So try and be explicit about what you regard as important. Benchmark if you must; benchmark wisely when you can. And I'd argue that the non-technical criteria are equally as important for the longevity of your code as the technical ones. I'll just make a shout-out here for this book. I have nothing to do with this book at all, except that I know one of the authors slightly. It's High Performance Python.
I got quite a lot out of it, and I recommend it; it covers a lot more territory than I'm covering in this talk, so if you're interested in high-performance Python, this is definitely one to go and get. And also, don't just listen to one person, particularly me; consider other opinions. Here are some of the talks here at EuroPython. On Monday and Wednesday we had some of these talks, which I went to, and I got a lot out of them, about CFFI, profiling, Cython and so on. Tomorrow we've got two talks; I'm definitely gonna go to the one on C++11 and the other one on PyPy. So listen to a lot of opinions when it comes to performance, not just one.
And that's about it; I'll answer your questions as best I can. If you wanna stalk me on social media, this is about as far as I get, which is GitHub, and also AHL. We've open-sourced quite a lot of our code now, and there are some very interesting projects there; I urge you to go and have a poke at them. And you'd make my bosses very happy if you had a look at our work Twitter feed. And that's it from me. So I'll try and answer your questions as best I can. Thank you for the talk. Questions?
Any tips on how to get around confirmation bias when testing? Because this is very difficult, particularly in a team. How do you get around confirmation bias? I suspect humans have been struggling with this problem for millennia, really.
I don't know. I think when you wanna present material to persuade other people, you have to make that material falsifiable. There must be enough information in there so that someone can form a contrary opinion. If you just say this is faster, you're not giving anyone any purchase; they can't challenge you, because you're not giving them the information. If you say, I believe this is faster, and here's the data for it, you're giving them some sense of falsifiability, because they can look at that data and say, oh, you're taking the arithmetic mean, not the geometric mean, and therefore you're drawing the wrong conclusion. So I think some of these cognitive biases can be helped by presenting your information to a large number of people, who might be able to spot something that you can't spot yourself, because it's very difficult to avoid confirmation bias on your own. But perhaps if you're aware of it and you ask yourself that question, that may help as well.
Hey, thanks for the talk. Considering two main technologies for binding, namely PyBind11 and Cython, what is your view regarding advantages and disadvantages of each one of them?
Okay, well, I've used Cython quite a lot, both on my own projects and at work; we depend on it quite heavily. It is one of the most mature projects out there, and it's used all over the place. Another standout example would be Pandas, which heavily depends on Cython, and so on. So it's gotta be a star. PyBind11 is a much newer project, but I think it's very interesting, because it perhaps addresses some of the issues that I've come across from time to time with Cython. Certainly, debugging in Cython can be quite a challenge. And the fact that optimized Cython code is in this strange sort of hybrid form, and I understand why that'd be so, also makes it kind of hard to read and reason about. With PyBind11, apart from the bindings,
you're pretty much operating in C++, which should be easier to reason about. So I think it really depends on who you are and what your capabilities are. If you're a strong C++ shop, for example, then PyBind11 would be a very interesting route to take, although Cython can also get you into C++. I'd say performance-wise there's probably not a big deal between them, but I have noticed sometimes with Cython-generated code that you can make small changes in the PYX file, the Cython file, and get radically different changes in performance, which are not necessarily predictable. It's just that they use a lot of heuristics to generate their C code, and some of those change; that's often noticeable between versions as well. So I think they're both very fine projects. There's probably not much to choose between them for performance, but it very much depends on what kind of shop you are, really, and what your personal preferences are. I don't think you'd be wrong to choose either one of them. Going back to your graph on the file-read performance: you mentioned that you identified something there that actually resulted in a tenfold speed increase for, I believe you said, the native Python code as well.
Can you comment on going down the rabbit hole of optimization? As programmers, we love new and challenging things, and maybe we see a problem and we think, oh, we should implement this. Do you have any tips for how we can avoid going down a rabbit hole, where maybe a 10x improvement by keeping it in native Python would have been enough? Yeah, well, I guess this: I think there are 300 data points on each of these, and it turned out there were something like eight or so data points which are these kind of outliers, which at first sight you might think are just statistically insignificant. But we did actually go and have a look at them, and we found an interesting thing. This is basically a sequential file where a series of records are written one after the other, and to read to any point in the file, you have to read everything up to it, because they're variable-length records. So you can't just jump to some place, and the reason for constructing an index is that the index records the start of each of these variable-length records. Well, it turned out that for most of the files, the variable-length records had a maximum size of 1K, so there were lots of small records in there, and it turned out the outliers had a record size of up to 64K, so there were far fewer records. Creating the index, you're then doing much more seeking and much less reading, because you only have to read the header of each record. That accounts for the roughly 10x improvement. And what we realized is that we could actually rewrite all the files from 1K records to 64K records, and that was how we got the extra 10x improvement. But without it being presented in a graph like this, and without the curiosity to say, ooh, those outliers might be interesting, I'm not gonna dismiss them, we wouldn't have found that 10x improvement. So I guess we could have gone down a rabbit hole, because they could have just been statistically insignificant, but we did start off by doing a little bit more statistics, a few more runs, that kind of thing, and discovered these were actually real outliers rather than just statistical anomalies. Rabbit holes are something that I seem to merrily dive into, I'm afraid, so perhaps my advice on backing out of a rabbit hole is more one of experience than of good decision-making.
More questions? We still have a couple of minutes. A few? Nobody? Everybody's happy with their code's performance?
I've been using Boost.Python, and it has two downsides, which are the fact that the compiles take a long time, and it sometimes produces very big object files. So is PyBind11 improving on these metrics? Yeah, I'm probably a bit outside my zone of expertise here, so this is probably some speculation, but I haven't used Boost.Python for quite a long time. I really admire the Boost project, but it's getting on quite a bit now, and it has got all sorts of stuff in it that probably reflects history rather than modernity. PyBind11, from what I understand from the authors, was really inspired by Boost.Python, but it was a complete refresh, written afresh, the whole thing, in C++11, which would make me think that, given a modern compiler, you would get less variability with PyBind11 than with Boost. So it might be worth it; I'd certainly recommend you go and look at PyBind11 and see if you get the same kind of problem, but I'm not gonna predict. My guess is that you perhaps wouldn't get that problem with PyBind11, but I'm speculating a bit there, and I'd recommend you go and try.
Okay, so thanks for the talk and the questions. Give another warm hand to Paul Ross. Thanks.