Cython
Formal Metadata
Title: Cython
Number of Parts: 43
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/38178 (DOI)
Production Place: Erlangen, Germany
Transcript: English (auto-generated)
00:05
OK, so let's get started. It's 14:00, so everyone who's late is late. So welcome to the tutorial, this is the Cython tutorial. I'm Stefan Behnel, one of the core developers of Cython, so this is kind of a you-can-
00:26
actually-ask-me-anything session. I'm also a Cython trainer, so I give trainings, especially at the Python Academy in Leipzig. You can find more about that at the Python Academy or at my web page there.
00:42
I hope you've all downloaded the notebook that I'm going to present here. Anyone not prepared? Who dares? That's fine, OK. So, OK, first of all, what is Cython?
01:04
Cython is a Python compiler, basically, first of all. It translates Python code to C code, so you can write native code in Python syntax. And the cool twist about it is that because it's translated into C or C++, it allows
01:25
you to directly interact with native code: C code, C++ code, libraries. It allows you to write C and C++ code without actually writing C code.
01:42
So that's the cool thing about it. It's an actual programming language and many people use it to wrap libraries for Python or to speed up their code into native code. But some people also just use it, you know, because it's a programming language, they
02:01
use it as a programming language, like they would use Python, but as a kind of a cool programming language that makes many things easier than in other native languages. OK, so I'm going to use Jupyter Notebook for the tutorial, and the first thing you
02:26
would do in the notebook in order to enable Cython support, because Cython is actually one of the languages that Jupyter supports, is say %load_ext Cython, and if you have Cython installed, then this will enable Cython-compiled cells. So I'll do that, and it tells
02:45
me I should restart my kernel. OK, so I'll just try again, it's not going to tell me anything, and it's imported
03:02
Cython. OK, cool. The next thing I put in there is just a quick, you know, summary of what dependencies you have installed, so I'm using the latest Cython master, actually you probably don't
03:20
have that installed, so it's perfectly fine if you use Cython 0.26, or even 0.25, that'll be OK for this tutorial. I'm using Python 3.6. Is anyone using Python 2? OK, you're excused. That's kind of fine too. Cython supports Python 2.6 and later, up to 3.7, and we continuously test against
03:48
the latest development versions of CPython in order to stay compatible, and that means whatever code you write in Cython will be adapted by us to newer Python versions
04:03
for you, so you don't have to do much there. And a reasonably recent NumPy version. One thing to know about Cython is that it compiles code to C, so you need a C compiler or a C++ compiler installed in order to make any actual use of it.
04:20
Does anyone not have a C compiler installed? OK, one person up there, everyone else has it, that's fine. So you will need this for this tutorial, because the way the compilation works is: Cython takes your Python or Cython code, translates it into C, and then the C compiler
04:42
translates that into a shared library, into an extension module that you can then import and use in Python. OK, that's the way it works. So you have a three-step process. The nice thing about the Jupyter notebook is that it removes most of that, so you
05:04
can just write Cython code in the cell, run it, and it'll be compiled in the back and you won't see anything about it, and it's just going to get executed and imported and it's all there. You'll see that. Really nice. Regarding questions in this tutorial, you can ask questions at any time.
05:22
Please just interrupt me. When there's anything you want to know, I'll try to answer anything. Well, not anything, you know what I mean. OK, here's a very simple example. This is just normal Python code: I'm using the Python math module, specifically the sin function from it, and I'm just calculating the sine of 5, which gives me minus 0.96.
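The plain-Python cell being described is just a one-liner; here it is as a minimal, runnable sketch:

```python
import math

# sine of 5 (radians), as in the notebook cell
result = math.sin(5)
print(round(result, 2))  # -0.96
```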
05:47
So that simple, you would probably know that. And I can do the same thing in Cython, because Cython can translate and compile arbitrary Python code, so I'll just do that there. And the first thing you notice, you don't get any output.
06:01
Why is that? What Cython does in the back is it compiles your module, and the module gets imported as an extension module, and it actually gets executed, but the Jupyter notebook does not see what the last line in your code was, so it can't just say I got this result now
06:22
presented to you. Instead the code in there gets executed as a module, and there's no standard output of modules. So what I can do instead is I can say print the result here, and it's going to be printed
06:42
for me. OK? And there's a difference. So this is fairly simple, this is just still plain Python code. What I can do in Cython now is instead of using the Python math module, I can actually use the math support in the C library.
07:03
So, the libc math header. Cython comes with all the declarations that you need for that, so instead of having to declare anything anywhere, I can just say cimport, and that's an extension to what you would see in Python.
07:22
It's a compile-time import, so there's a couple of extensions to the Python language that you would use in the Cython language. One of them is cimport, which gives you a static import, a compile-time import, and I'm using the libc.math module here, and
07:42
then I'm taking the sine function of that, and that's the actual low-level C sine function. And what I'm doing here is I'm taking that function and I'm assigning it to a variable called sine function, and this is a global module variable, meaning it's a Python variable.
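The cell being described might look like this minimal sketch (the name `sin_func` is my rendering of the spoken "sine function" variable):

```cython
%%cython
from libc.math cimport sin   # compile-time import of the C declaration

sin_func = sin   # module-level Python name; Cython wraps the C function for Python
```

Assigning the C function to a module-level name makes Cython generate a Python-callable wrapper for it, which is what gets exported from the compiled module.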
08:03
So what I end up with is a module that defines a name, sine function, which represents the C sine function, and exports it to Python. So I can just take that and call it from my notebook, and here I'm using the C sine
08:21
function directly from Python. This is kind of the quickest way to export a simple function, I would say simple C functions, into Python. What this does more or less is it defines a Python function, which internally calls the
08:44
C sine function, and it takes a C double and returns a C double result. And that's hidden behind this assignment here, and here I've spelled it out. This is actually Cython syntax: Cython allows me to define Python functions, but
09:03
declare the signature in a C way. When I execute this, it defines my C sine function and I can run that from Python. So what you can see here is that this is actually a very basic property of the Cython
09:23
language. It mixes Python and C freely. So most of the code that you will see, most of the Cython code that you'll ever see, will use some Python features here, use some Python objects here, then push something
09:42
into C, do stuff there, call a C function, get a return, and literally mix Python and C all over the place. And that makes Cython such a cool language for many things, because you get the simplicity of Python on one side, and on the other side you got all the native power of C in
10:05
one programming language. One nice debugging feature of Cython is, well, here you see I've declared a Cython cell in Jupyter. What I'm doing here is I'm passing the -a option to that cell, and in addition to
10:28
compiling my code now, what it'll do is it'll analyze and annotate my code, and then output it as an HTML fragment down here, that tells me how the compiler saw
10:43
my code and what it made of it. So for example, when I click on this line with a plus here of the signature definition, then you'll see that there was a lot of code generated from Cython, just to implement
11:03
this Python function for me. And then down here, I can actually prove that it calls the C sin function, because the C code says: take the variable x, call the C sin function on it directly, then take the result and turn it into a Python float. So that's what this function is doing,
11:25
Python float from C double, and then return the result as a normal Python float object. And Cython generates these things automatically for you, because it knows the result of the sin function here is declared as a C double, but it is used in a Python context, it is
11:44
returned from Python function, so it must be an object, and it automatically converts between the two value types for you, behind the scenes. Okay, now, this is very basic, and as I said, this can be done just by assigning
12:05
the C function to a Python variable. It becomes a bit more interesting when you do more stuff on the C side, so what I'm doing here is, I'm getting a double value in again, but then I'm taking the square of that,
12:22
and I'm taking the sine of the square. So this is implementing sin-of-square of the input, and it's doing all that in C space again. Okay, so when I compile this, and I actually compile it with -a again, then you see that this line down here is really just plain C multiplication, and then we take
12:43
the result and call the sin function on it. Okay, and this is something you also often see in the many wrapper generators that exist for Python; in many of those, what you get is
13:02
a plain wrapper around some C function, some C functionality, and then you have to deal with whatever that gives you in Python. Cython allows you to make that wrapper thin if you like, but thicker if you need it, and that's completely seamless.
13:23
Whenever you need more functionality, or you want to aggregate some C functionality behind some Python function call, for example, or hide it behind a Python class, you can easily do that by just implementing the thing in Python and then calling whatever C functionality you need straight from your code.
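The sin-of-square cell just described, compiled with the annotate option and using a C-typed local variable, might be sketched like this (the function name is illustrative):

```cython
%%cython -a
from libc.math cimport sin

def sin_of_square(double x):
    cdef double x_square = x * x   # plain C multiplication, no Python objects
    return sin(x_square)           # C double result auto-converted to a Python float
```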
13:44
Okay, so if you find that... yeah, right, I didn't declare cdef, okay. So cdef is another extension; you've seen cimport. cdef is the other basic thing that you'll see all over the place in Cython code. It
14:00
just says: this is a C definition. And what this does here is, it declares the variable x_square as having the C type double. That's all it does. In fact, I wouldn't even need that, because Cython is smart enough to do type inference here: I have two double values here, so multiplying them obviously results in a C double,
14:24
and I wouldn't need to declare the result type of the variable, but I did. Yeah, yeah, it should be. So there's a bit of type inference available in Cython, which helps to avoid having to
14:42
declare too many things. Okay, so that was a really tiny, quick intro, let's come to a bigger example. So, I owe this example to a guy called Caleb Hetting, who presented a Cython tutorial
15:03
at PyCon Australia a couple of years ago, and I found it a very nice example. Why? Well, everyone likes taxes, right? So, I borrowed that idea from him, and what I did was, I looked up the actual tax calculation
15:20
function that we have here in Germany, and I found that currently, so for the last year, there were something like 44 million taxpayers, so earners, in this country, and their average income was 3,703 euros per month, so times 12,
15:40
that's about what you get as an average income per year, so I'll just run this, and that's the average income. And then I started looking a bit for actual data that backed this number, and didn't really find anything, so I'll just, for the sake of this tutorial, I'll just create some alternative
16:02
facts here, and I'm sure all of you are happy with that, it's the right conference to do that, so I'll just assume that, first of all, there are only a 20th of the number of people, just to keep the calculations a bit faster, and a log-normal distribution
16:20
for the income. Okay, so you can see that the average is about right, and there's an alternative minimum income and a maximum income here, and I plot this, you can see that, well, that's about what you would expect, more or less, from the income distribution.
16:46
Well, more or less. So let's calculate everyone's taxes. I looked up the tax formula in Wikipedia, and this is what it gave me, it actually gave me an Excel implementation of that formula.
17:02
So I took that formulation and rephrased it in Python, and this is essentially it. So if your income is above that number, then it's calculated like this, otherwise, you know, it's a staged calculation. And that's also what makes it kind of interesting, because this is difficult to express in simple
17:22
terms. The best way to express this is actually with ifs. So this is how you would express it in Python, and then you can calculate the average income down here and the average tax rate across the data that we have.
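A staged, if-based tax function of the kind described could be sketched like this; the brackets and rates below are illustrative placeholders, not the actual German schedule:

```python
def tax(income):
    """Progressive tax with illustrative placeholder brackets."""
    if income <= 9_000:
        return 0.0
    elif income <= 55_000:
        return (income - 9_000) * 0.24
    else:
        return (55_000 - 9_000) * 0.24 + (income - 55_000) * 0.42

def average_tax_rate(incomes):
    # total tax paid divided by total income earned
    return sum(tax(x) for x in incomes) / sum(incomes)
```

The generator expression inside `sum()` mirrors the structure of the original pure-Python version, which becomes the target of the later optimisations.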
17:44
You probably get slightly different results, because it's alternative facts on your side too, but this is about the average income and the calculated average tax, and then I'll just run the whole thing in Python, and you can see that the average
18:08
tax rate is about 24% for my dataset. Okay, so when I ask TimeIt how long that took, it's going to tell me some number after
18:23
a while, a couple seconds, or maybe a couple more seconds, or some more seconds. Well, you see, it actually takes quite a while in Python, and TimeIt runs it a couple
18:40
of times in order to get accurate results, and then it tells me that it's about a bit more than three seconds. So let's note that value in milliseconds; I'll just remember it, and then in order to make things comparable while I'm on the way of optimising stuff, I'm calculating
19:05
the ratios along the way, and I'm saying that Python is my base level, and that's the slowest version I've probably come up with. The next thing I did was I found a way to express the whole thing in NumPy, and this
19:21
is what I got there. It's selecting parts of the array that match these conditions, and then doing the calculation for the income ranges based on that. So I can calculate the whole thing on NumPy segments, and then sum them up separately
19:45
and calculate the average from that, and you can see that it gives me about the same average tax rate, 24%, and when I ask how long that took, it's going to be much faster.
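The mask-based NumPy variant being described selects the rows of each income bracket and computes them in one vectorised step; here is a sketch using illustrative placeholder brackets (not the real schedule):

```python
import numpy as np

def tax_numpy(incomes):
    # boolean masks select each bracket; the bracket maths runs vectorised
    taxes = np.zeros_like(incomes)
    mid = (incomes > 9_000) & (incomes <= 55_000)
    high = incomes > 55_000
    taxes[mid] = (incomes[mid] - 9_000) * 0.24
    taxes[high] = (55_000 - 9_000) * 0.24 + (incomes[high] - 55_000) * 0.42
    return taxes

incomes = np.array([5_000.0, 30_000.0, 100_000.0])
taxes = tax_numpy(incomes)
print(taxes.sum() / incomes.sum())  # average tax rate over the sample
```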
20:03
So that's 57 milliseconds, and now I notice that I forgot one thing. Whenever you benchmark, one thing you should never forget is switch off the energy management.
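Outside the notebook, the same kind of measurement can be done with the standard timeit module, which, like %timeit, repeats the statement to stabilise the numbers; taking the best of several runs reduces scheduling and frequency-scaling noise of the kind mentioned here:

```python
import timeit

# run the statement 1000 times per measurement, repeat the measurement 5 times
timings = timeit.repeat("sum(range(1000))", repeat=5, number=1000)
per_call_ms = min(timings) / 1000 * 1000  # best run, converted to ms per call
print(f"{per_call_ms:.4f} ms per call")
```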
20:21
So I rerun the Python version again, and it's again going to take a couple of seconds, so I'll get a proper result there now. It probably shouldn't be very much different, because it's running for so long, so the processor will scale up during that time, and I'm probably getting good results from
20:41
that anyway, but the faster it runs, the less likely it becomes that my CPU is going to be at full speed, because it's going to take a while until the power management notices that my CPU is actually being used, like heavily used, and only then it's going to scale up, and so I'm getting skewed results, but this is about what I had before.
21:02
So I'll just keep the old value there, and then I'll rerun the numpy version again, and then it's still about the same thing, so that is 58.1 milliseconds, and you can
21:21
see that's already 54 times faster than the Python version. That is pretty good. Okay, the same thing can be done with a numpy ufunc by simply taking my Python function and applying it to all elements in the array.
21:40
So I'll do that here. There's a frompyfunc function in numpy, to which I can pass my function in order to convert it into a ufunc, and then when I apply that to the whole array and sum up the results, I get the same tax rate, and timeit on that is going to be much slower
22:03
than the slicing version, but still much faster than the pure Python version. So I'll remember that that took 818 milliseconds, and so now the numpy version is way faster
22:23
than that, and the Python version is only about 4 times slower. Okay, let's get back to Cython then. Up to now, this is pretty much what you would be able to achieve from Python as well, given the tools that we just had, so using plain numpy.
22:44
Now what Cython allows you to do is you can take the Python version of your code and compile it into C. This is usually not going to give you a big effect, simply because it's still going to
23:00
use Python objects, it's still going to call Python functions along the way, so there's not much out of the box that Cython can do for you, so when I just take the Python code that I implemented above, copy it straight in there, it's exactly the same thing, and now use a Cython cell instead of a plain Python cell, so I get the same thing compiled
23:24
with Cython, and I can now call that on the list of incomes, should get the same results, 24%, and when I call time it on that, it's going to take a while, and then
23:44
we can see how it compares to the plain uncompiled Python version, and that's 275 milliseconds,
24:02
and still about 15% faster than the plain Python version, just by compiling it, just by adding one line at the top of the Jupyter cell, which is okay, I mean I didn't really do anything, I didn't care, I just dropped code in and it just got 15% faster, if that's
24:21
for free, that's fine, okay, but the cool thing about Cython now is that I can start optimising my code at a much deeper level, at a native level, rather than at a Python level, I have many things that I can do at a C level that can make it run much
24:40
faster than what I can achieve in Python, and the first thing I would do is add cdef declarations; with the signature changes that I make, I can now start declaring variables as having C types instead of Python object types, so here's the same code copied again,
25:06
and now what I'll do is, I'll start annotating, I'll start changing the code and annotating it with static types, with the goal of making it faster, so the first thing I'll obviously do is, I know that the income is probably a double, so that can be safely represented
25:25
as a C double, and I know down here, well this is a list, it's working on a list and it's taking the sum, and what I see here is, down here, it's calling this function
25:41
a lot of times in a generator expression, so it's running over the incomes in order to calculate the sum, and when I see something like this, the first thing I would say is, this can be done much faster by running over the list once, and doing the calculation in one step, so I'll change this into a loop, I'll just spell out the for loop
26:07
that is hidden in this expression down here, and I'll say for income in incomes, my tax is, you can follow this, but the actual implementation of this is right below, so you have the whole
26:26
solution down there, so you can follow this, what I'm doing here, in order to see your own results, and if you're only interested in the results, you can just execute the cell, just so you know, so I'll spell out the expression here, as a single loop,
26:46
and that means my tax sum is going to be the sum of incomes, the sum of calling the function on my income, so for each of the income values, I calculated the tax, sum up the
27:08
taxes, and the total is the total of incomes, and then down here, I will say, the final result is the tax sum by the total, right? That's pretty much it. In order to make
27:28
that fast, what I'll do is, I'll declare all variables in here, as having proper double values, so that the whole calculation can actually be done in C-space, so all variables that I'm using here, the tax sum, the total, and the income itself, are all going to be
27:48
C-variables. Which ones do you mean? These up here? It depends on the context.
28:31
So what I did here is, I already declared, so in the inside of my function, I already declared the income argument as a C-double, and that means that all comparisons down here will be done in C-space
28:45
using C-doubles, and Cython will infer the C-double type for this variable, for this, sorry, for the constant also, because that's what it's comparing. So the way the
29:02
comparison then works is C-style, so if you compare an integer with a double, then the values will be compared as doubles. If income was just a plain Python variable, then this would be a Python literal, and it would use object comparison.
29:27
So, what have I done so far? I'll drop the last line now, and what you can see now is, all the variables are doubles, the function is still called, we're summing up the values that we get back from
29:41
the function call, we're summing up the totals, and we do the total calculation in the end, so I just compile this as it is, and see if the output is still the same. It's still 24%, and that looks right, and when I call timeit on that, it's going to run for a while, and it should
30:04
already be a bit faster, quite a bit faster, so we're down to 139 milliseconds now, and that compared to the plain compiled version is a factor of 2 faster.
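Reconstructed from the description above, the typed version might look roughly like this (a sketch, not the speaker's verbatim code):

```cython
def average_tax_rate(incomes):
    # declare everything used in the hot loop as C doubles, so the
    # arithmetic happens entirely in C space
    cdef double tax_sum = 0.0
    cdef double total = 0.0
    cdef double income
    for income in incomes:
        tax_sum += tax(income)   # tax() is the rate function defined earlier
        total += income
    return tax_sum / total
```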
30:44
I suppose I was lucky, and they were initialized to zero, but that means I was just lucky. Yeah, yeah, what the C compiler does internally.
31:01
Something like that, yeah. Yep, so I should probably initialize them, and it shouldn't make a difference for me in this case. Just run it again. I mean, I get the same result, right, so it was still correct. Yep, same result. Okay, there's one more thing I can do.
31:24
When I run this through Cython with -a, and let Cython tell me what it did, I'm seeing a couple of things here, and you've already noticed the yellow lines, I guess. There are white lines, yellow lines, all this, and the level of yellowness in this output
31:43
tells me how much Python interaction there is going on, so any white line actually tells me plain C code here, and everything that's kind of yellow to dark yellow means there's some kind of Python runtime interaction, some object operations going on, okay?
32:01
So apparently I'm not at the point yet where I've turned my code into native code, because there are still tons of yellow lines in there, and even just the calculations here seem yellow. Why is that? When I click on them, I can see that the actual calculation based on the income variable is done in C.
32:23
So income times value minus something, but then the result is converted into a Python float. So what I forgot to do is, this function is still a Python function. It's still called as a Python function, it's using Python semantics for being called,
32:41
so it's using argument tuples for input, and it's using object for output, okay? And that is inefficient, since I'm only using this function internally inside of my program, I'm not really exporting it to anything, I don't care who else is going to use it, I just want this function to be called as fast as possible inside of my program.
33:02
And for that, I can turn the whole function into a C function by changing the declaration. Now it's a def function, a Python function, and I can turn it into a cdef function, which makes it a C function. Now it's a static C function inside of my module.
33:23
In case people were still using it from the outside, I can use a third type of function, that's a cpdef function, which turns it into a C function with a Python wrapper. So it's still usable from Python. If it's just a cdef function, then it's only usable statically as a C function inside of my module.
33:46
Okay? So here, I'm actually going to use a cpdef function, it doesn't make a big difference, but it still keeps the function visible from Python code. And now, since it's a C function, I can declare the return type.
34:04
I don't have to, if I don't declare it, then it's still object. But since I know that the return type is actually a double, a C double, I can just spell it out and say, this function is a C function, and it gets a C double as input, and returns a C double.
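A sketch of the three declaration forms described above (the bracket values are hypothetical):

```cython
# def:   Python function, object arguments, object return value
# cdef:  C function, only callable from Cython code in this module
# cpdef: C function with a Python wrapper, fast from Cython,
#        still visible from Python

cpdef double tax(double income):
    # hypothetical bracket, for illustration only
    if income < 10000:
        return 0.0
    return income * 0.25
```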
34:24
If it's a cpdef, so it's Python, does that mean when you call it from Python, it'll still return a C double, but then Python will convert that? Yes. So the question was, what does the cpdef Python wrapper actually do,
34:41
and what it does is, well, it's a complete wrapper, so it unwraps the input, it gets objects as input, or one object in this case, unpacks it into a C double, so you can pass in a Python float or Python integer, they'll just be converted into a C double in this case.
35:02
Then the calculation will be done, and on the output, the result of that is a C double, and since we're calling it from Python, the wrapper will convert the C double return value back into a Python float object.
35:25
So if it's called from Cython code, then that's exactly what the cpdef is for. Calls from inside of Cython are fast, they use the C interface, they use the C function, call it directly, do not go through the wrapper,
35:40
and the advantage of that is, they actually see the signature. So if you pass a C double into that function, into a cpdef function, then the caller will see that the called function actually expects a C double, there's no conversion needed, it'll just pass it through at the C level as fast as it can,
36:00
and the C compiler will see that and maybe even inline the call or do whatever optimization it can, and the output is a C double, no conversion needed either, you just get a C double back, all done, that's it. Okay? That way you get the fastest possible call inside of Cython without losing the ability to use the function from Python, from the outside Python world.
36:27
Okay? So I'll compile that, and now you can see that the whole function dropped into C. So all the return values are now plain C double returns,
36:45
and even down here, that was previously a bit yellow, because the return value was a Python object, and it had to be converted back into a C double in order to sum it up, and now you can see that the function call is a C call,
37:01
that there's no input conversion going on, no output conversion. Any Python usage that's still left in this function is the iteration over the incomes list. That's it. Okay? Okay, I've compiled this, still gives me the same result,
37:23
and I call timeit on it; previously we had 139 milliseconds, now we're down to 30 milliseconds for the whole thing. And so the ratio changed to a factor of 9 compared to the previously compiled, unannotated function,
37:43
and to a factor of 105 compared to the plain Python implementation. Okay? Any questions here?
38:01
Well, the reason why the for loop is in yellow is that we're iterating over a Python list. So the input is a Python list, and the iteration needs to step from value to value through the list. But this is already faster in Cython than it is in Python, because Cython understands what a Python list is, and it can unpack it at the C level and just iterate over the underlying C array of objects.
38:23
Okay, so the iteration and the conversion there is very fast. So why is the last line still yellow? I'm going to go back up. This is a Python function, so I'm calling it from Python, using it from my Jupyter notebook as a Python function,
38:42
and so the return value here. You can see that the tax sum divided by the total is being evaluated in C space. Another thing you can see from the C code is that the total is first checked for zero, and you get a zero division error if the total happens to be zero. Okay, so if I pass in an empty list into this function,
39:04
I will get an exception and not a crash in C. Right? Wonderful. Okay, so there's a bit of exception handling going on here, which is very fast, because this is just a C value comparison. And the final conversion of the result for returning the result
39:24
is also done using a Python operation for creating a float object. And that's it. The whole calculation up to that point is happening in C now. Okay.
39:48
The question was, can I declare a list of doubles for the incomes? No, I can't. So there's currently no way to express... So what I could say is, this here is a list.
40:06
I could even say, this is a list and it's definitely not none. So this is also something I can express in Cython syntax. What this does is, it does a type check on the way in, and if the input value is none, then it will raise, I think, a type error,
40:23
and if it's not a list, you get a value error. Something like that. But it will check the input and make sure it's definitely a Python list and nothing else. It has a tiny advantage, because now Cython knows that it is a list and not maybe a tuple or maybe some iterable.
40:44
Previously, the code supported any iterable. Now it only supports Python lists. So that will cut down the generated code a little, because the loop code will be simpler, but it's probably not going to make a big difference. I can try it.
41:00
So I just compile this and rerun the benchmark, see what kind of a difference it makes. See, it's really tiny. It's a couple of percent. And that's because the looping, the for loop in Cython is so fast. And it has special handling for tuples, for lists.
41:23
When you pass a list and it hits the fast path, this is a list, then you end up with very fast code, despite not knowing at compile time that it was a list. Because Cython is just going to assume that most of the time, what you're iterating over is probably a list.
41:42
So we're optimizing for that. The cool thing, if I leave this out, if I don't declare anything here for the incomes, is it's still going to be about as fast, not a big difference, but it's going to support any iterable input.
42:03
Not just exactly Python lists. If I say, there's a list, then it would not even accept subtypes of lists, however unlikely they are, but it would reject anything that's not a list. Okay, any more questions?
42:21
Yeah? We're coming to that in a minute. That's right, the next thing. Okay, so we've already achieved a factor of 105 faster code by now,
42:41
and it's already faster than the NumPy version, by a factor of two. Okay. So this is really nice, and I think that's pretty much what I have implemented on here. Yeah, it looks exactly the same.
43:01
So now, a question to you. We could now do an exercise here. I could let you do kind of the same thing that I did on another function. Optimize that. Do you want to do it, or do you want me to continue my show-and-tell presentation?
43:21
Does anyone want to do the exercise? Clear majority, sorry. Okay, so I'll just keep going. You can still do the same thing, and we can talk about it during the conference. Okay, so I'll leave this to you,
43:40
and the next topic then is memory use. And that's the thing you wanted to hear about. So if I'm actually using... Question, is this the fastest version already?
44:01
Well, since my tutorial is not finished yet, probably not. So this is probably the fastest way to do it, if your input is a Python list of numbers. Okay, question was, how does it compare to a fully native C implementation?
44:29
You've seen all white lines. The whole operation, the whole calculation was actually done in C space. So you get exactly the same performance in C as you get for this Cython code.
44:43
It's just plain C in the end. You can try it, you can write a C version and then wrap that in Cython and call it. Does Cython use optimizing C compiler flags?
45:03
Well, Cython actually does not configure your C compiler for you. What it does is, it generates C code. And then in this case, it's the Jupyter integration that calls the C compiler in the background. And that uses your CFLAGS, LDFLAGS, whatever you have defined in your environment.
45:22
So it's your responsibility to configure your C compiler properly. And if you know better C flags than the current defaults, just use them. Okay, so memory use. So the question two questions ago was, what if the whole thing is stored in a NumPy array and not in a Python list?
45:45
This is actually a very common case. How often does it happen to you that you have actual data in a Python list, rather than in a Pandas data frame or a NumPy array somewhere? It's probably quite likely that you'll hit the case that all these income values are already in a NumPy array.
46:04
And Cython has special support for that. So there is a protocol in Python that was added in Python 2.6 and Python 3. Back in the days of, that was like 2007, 2008, around that.
46:22
So it's already about 10 years old now. And that's the buffer protocol. And that is something that is supported natively by NumPy and some other libraries out there. And basically whenever you're pushing around big native data blobs in some way,
46:42
it's probably going to support the buffer interface in one way or another. And so that's why Cython has native support for buffers. So what it allows you to do is, you can take a buffer, you can receive a buffer as input, unpack it into the C memory, and just do some operation on the memory natively,
47:03
rather than going through some Python interface there. So it's a one-shot unpack operation, and from that point on you have native C memory. That changes a bit the interface here. What we previously had was, we just had an object coming in, a Python list in this case.
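The buffer protocol mentioned above can be poked at from plain Python with the built-in memoryview; here using the stdlib array module as the data source:

```python
import array

# Any object exposing the buffer protocol (array.array, NumPy arrays,
# bytes, mmap, ...) can be viewed without copying its data.
data = array.array('d', [1.0, 2.0, 3.0])   # C doubles in contiguous memory

view = memoryview(data)        # one-shot unpack: no Python-level iteration
item_format = view.format      # 'd' means items are C doubles
dimensions = view.ndim         # one-dimensional

view[1] = 5.0                  # writes go straight to the underlying memory
```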
47:24
And what we're going to change now is, we're not going to use the Python list of data, we're going to use the NumPy array of data that we have. And we pass it in, and declare it to the Cython compiled function as a memory view on that data.
47:47
And that's a special syntax for that, which says it's a one-dimensional array of doubles. That's it. That's how you declare a memory view. And what Cython knows from that is, there's some object coming in which supports the buffer interface.
48:05
It knows that the items in that object, in that memory, are C doubles, have the format of the C double. It's one-dimensional, and if whatever it gets in is not one-dimensional, it's going to be rejected, it's going to say, told you error, wrong input.
48:20
But if it's one-dimensional, it's going to be happy about it and unpack it into a C buffer. And then you can operate on that. In this case, it's actually very simple. Since it's C space now, I'm no longer going to use the Python for loop. What I'll do instead is, I'm going to use an integer loop: for i in range(len(incomes)).
48:55
Is this actually going to work in Cython 0.26?
49:00
That's a good question. It might not. If it does not, in 0.27, this is going to work. In 0.26, I think you still have to say incomes.shape[0]. As you would know from NumPy, probably. The size of the first dimension.
49:21
So, integer loop over the length of the array that we got in. And then in here, I'll just say, we sum up incomes at i. And run a function on the value at that point.
49:45
And that's about it, I think. So I'll try to compile it. And looks like Cython is happy with that. Yep, still the same result. Now passing in a numpy array.
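The memoryview version described above might look roughly like this (a sketch, not the speaker's verbatim code):

```cython
def average_tax_rate(double[:] incomes):
    # double[:] accepts anything exposing a 1-D buffer of C doubles,
    # e.g. a NumPy float64 array; unpacked once, then pure C access
    cdef double tax_sum = 0.0
    cdef double total = 0.0
    cdef Py_ssize_t i
    for i in range(incomes.shape[0]):
        tax_sum += tax(incomes[i])
        total += incomes[i]
    return tax_sum / total
```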
50:00
So I changed the input from incomes, that's the list, to incomes numpy, which is the numpy array. Gives me 24%, the expected value. And now when I run this, we get 11 milliseconds.
50:25
So we'll use 11.1. So that is much faster. And that's already a factor of 5 compared to the numpy version. Another factor of 2.7 compared to the last version we had.
50:41
And a factor of 280 compared to the Python version. Is this clear to everyone? Any questions here regarding memory use?
51:00
The syntax is basically modeled after what you would have in numpy. So in order to get a slice, a single dimensional slice, you would just put a colon in. And that's pretty much what we used as the syntax here.
51:24
Sorry, again. So the question was, when I had the Python for loop for x in the incomes array now, why did that still work?
51:47
Well, it still worked, because numpy arrays actually support iteration. I mean, we could actually optimize the same thing in Cython as well.
52:00
We now know that it's a memory view, and we can say, you can actually iterate over a one-dimensional memory view. Why not? We can do that efficiently for you. So yes, I mean, that would be a nice feature. I'll note that down somewhere. Yeah, but I mean, it's kind of good enough.
52:27
Okay, and it's about how much? Three times faster. What did we have before? 2.7 times. So using memory views here and using numpy arrays as inputs made a big difference again.
52:49
Okay, now I'll show you one more thing here. When I compile this with -a again.
53:01
These two lines are yellow, which is a bit unexpected. But what it does internally is, it plays it safe and tries to avoid crashes if you run over the size of the array. And since I actually know that I'm not running over the size of the array, because that's exactly the range I'm iterating over,
53:23
I can tell Cython, please don't do that. I can take care of myself, I know what I'm doing. So I can say cimport cython, that's the magic cython module, which has some nice functionality. And amongst other things, it has a decorator, boundscheck, which allows me to disable bounds checks.
53:55
And I can see that the bounds check is apparently gone. So whenever i is too large for the array, it's just going to crash for me, which is fine because I know it's not too large.
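As a sketch, the decorator goes directly on the function (assuming the same memoryview loop as before):

```cython
cimport cython

@cython.boundscheck(False)   # skip the index-range check; an out-of-range i
def average_tax_rate(double[:] incomes):   # would now crash instead of raising
    cdef double tax_sum = 0.0
    cdef double total = 0.0
    cdef Py_ssize_t i
    for i in range(incomes.shape[0]):
        tax_sum += tax(incomes[i])   # i is known to stay within bounds here
        total += incomes[i]
    return tax_sum / total
```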
54:07
That's me knowing it and I'm telling the compiler, you know, I know better. And now when I run that again, same result here. And for the speed, what do I get?
54:23
It's a bit faster. Okay, tiny bit. Not a huge difference, but it's getting closer to the factor three compared to the list version. Okay, so these are little tweaks that I can do here and there. Once I know that my code is correct and it's actually running and does the right thing in all cases,
54:44
I can start disabling safety measures that Cython provides for me. Okay, what else do we have?
55:00
There's another exercise here, which you can also try at home and then talk to me afterwards. That's one more thing you can do. Now to speed this up, and that's parallelizing the code. So far it's all running one thread. The whole calculation. And now what I can do is, I can use OpenMP to use multithreading, because my laptop has enough cores.
55:26
It actually has just two cores in this case, but you know, multithreading and hyperthreading is a bit faster than just one thread. So I can actually use the power that my laptop has, the different cores, and parallelize my code.
55:44
Which is trivial in this case, because we're doing the same thing on each of the items. So this is trivially parallelizable. And all I have to do is to tell Cython to use parallel threads rather than one thread.
56:03
So as I said, I'm using OpenMP for that, so you need a C compiler that supports it. For a while, Clang did not support it. I don't know if that changed along the way. If you have GCC, then this is going to be enough. I just have to tell the distutils build that the C compiler needs a compiler argument -fopenmp, and the linker needs that too.
56:34
And then when I compile this, it's actually not going to change anything, because I haven't changed my code yet.
56:42
But that's the first thing I have to do, just checking that it works. So the OpenMP integration apparently worked. If it does not work for you, then it's probably a C compiler that's missing OpenMP support or something like that.
57:01
Now, what I can do here is, I'm using an integer loop to run over my array, and all I have to change is... I'll just change the import here. Cython has a parallel module, and that comes with a prange function, which
57:31
just changes one letter compared to range, but a lot in terms of functionality. And all I have to do now is change range into prange, which makes it a parallel range loop.
57:47
Otherwise, I don't have to change anything, just still running over the whole length of the array. But now it's doing that in parallel, and OpenMP allows me to say how many threads I want to use here.
58:02
There's an argument called num_threads. I have to look that up, so you don't have to know it either. So I use four threads here, and another thing that I have to do is, for this loop, I have to release the GIL, because otherwise I won't get parallelism.
58:25
The GIL is the lock in CPython that guards basically everything, every access to the interpreter. So the GIL is the thing that makes CPython fast, because it doesn't have to do lower-level locking in all sorts of places, it just has one lock.
58:45
So it runs very fast when single-threaded, but since my code is already using plain C code for the execution, I do not need the GIL, because it does not do any Python interaction. It does not need the Python runtime.
59:03
And since the GIL is only there to safely guard the Python runtime, I can release the GIL, let Python do other stuff in the meantime if it wants to, and just start a couple of C threads that run my plain C code here efficiently in parallel.
59:23
So, one more argument: nogil=True. And that's it, that's all I have to change. So I changed the range loop into a prange, said I want four threads to run this loop, and I want to release the
59:40
GIL while it's running, and then when the loop is done, it's going to reacquire the GIL, and that locks things down to single-threaded again. So I'll compile this. Got a little warning, so I did something wrong. Ah, yes.
01:00:00
So what it tells me is, calling a function that requires the GIL is not allowed without the GIL. Why is that? Because I did not tell Cython that it's actually safe to call this function without the GIL. So there's a check in Cython whenever I release the GIL. And that's something really nice in comparison to writing actual C code.
01:00:23
So if you've ever written a C extension in C and released the GIL somewhere, it's probably going to segfault at some point. This cannot happen that easily in Cython, because Cython does some checking. It tells you whenever you try to do Python operations without owning the GIL.
01:00:43
And so what I have to tell it here is, this is a function that can be called without holding the GIL. And then what Cython will do is, it'll check that function and make sure that that function does not use any Python code either, any Python interaction either. So it's going to check that function for me, it's going to check my loop for me,
01:01:04
it's going to see that I'm calling that function which allows me to call without the GIL. And so it's basically checking the whole execution chain, the whole call chain for no-GIL safety. And since it's all no-GIL safe, it's not using any Python interaction, it's going to
01:01:22
say, all fine, and it's hopefully going to compile it for me. Still got a warning, which is fine, that's just an OpenMP warning. It's a warning, not an error. So it has now compiled my code for me, and I should get the same result, but it should be faster, because I'm now using four threads to run it, rather than just
01:01:44
one. And we're down to six milliseconds. So that's another factor of 1.6 or 1.7 faster than the previous memory view version.
01:02:05
And it's a factor of 500 faster than the Python version. Okay. So the question was, why did I have an auto_pickle directive up here, auto_pickle=False? I should also have that in the solution down there.
01:02:22
That is a new feature in Cython 0.26. Previously extension types, so native class implementations, were not picklable. Now they are.
01:02:40
So there's automatic pickling support, and that also applies to memory views. And so the memory view I'm using here would actually be capable of being pickled, dumped to disk and reloaded, but I don't care, it would just bloat up my program without being useful. So that's why I'm disabling the auto-pickling in order to avoid the code overhead.
01:03:02
That's it. It's a really nice feature, but it's just not used here. Okay. So that brings us down a factor of 500. And even compared to the NumPy version, we're now at a factor of nine.
01:03:27
Okay. And then I'm pretty much through with the main part of my tutorial. Any more questions regarding this?
01:03:55
Okay. The question was about data dependencies when I'm parallelizing my loop, and I'm not just using incomes i, I'm using incomes somewhere else in the array.
01:04:06
It's not a problem if you stick with that kind of scenario. If you're only reading from the array, that's fine. You can concurrently read from throughout the array. It's probably going to slow down your algorithm a bit if you do random access in the array
01:04:25
rather than just running through it once, simply because memory bandwidth becomes more important in that case. Actually, at that level, people have to care about memory bandwidth, right? I mean, you don't really have to care about that if you're dealing with Python code somewhere,
01:04:40
but if you're going that deeply native, the memory bandwidth actually becomes important. And so as long as you're only doing read assess to your array, that's not a problem. It can be done concurrently across all threads any way you like. But you obviously run into the thread concurrency problems when you're doing write assess in the array.
01:05:09
How would you actually distribute packages that include Cython code? Okay, so how does distribution of Cython code work?
01:05:23
In exactly the same way as it works for other native extensions. We've been using NumPy here, NumPy is a native extension, so there are lots of native libraries or binary libraries in the Python world that you can get from PyPI.
01:05:44
We have integration with distutils. Basically, what you would do from your setup.py script is you start with your Cython code, which is commonly written in pyx files, P-Y-X, which stands for Python Extended, maybe.
01:06:10
So you write your Cython code in pyx files, and then we provide a cythonize function, where you can just say, cythonize all the pyx files that you find in my project, for example.
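A minimal setup.py along these lines could look like this build-config sketch (the package name is hypothetical; cythonize is the real Cython build API):

```python
# setup.py: sketch of a build script, assuming Cython is installed
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="mypackage",                    # hypothetical project name
    ext_modules=cythonize("**/*.pyx"),   # compile every .pyx in the project
)
```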
01:06:25
And cythonize will generate extension objects, so distutils extension objects from that, that you can then pass into the setup function. And from that point on, distutils will take care of compiling your extension into binary modules,
01:06:42
and then you probably end up building wheels from that and distributing them to PyPI. The question was, when I'm currently passing in a NumPy array with the data,
01:07:18
what if I had a C array coming from some C library, for example,
01:07:23
spits out some C data, C memory that holds the data, what would I do with that? If I had that case, I would probably split up my function. So I would provide a Python function that still is able to unpack the memory view.
01:07:46
And then drop the actual code into some cdef function that I use internally, an internal average tax rate function, and it would do the same thing with the expected C interface.
01:08:12
So a C double pointer for the data and, say, a size_t length, and then I would deal with that.
01:08:23
So I would use the length here and just use the array. This is how you would do C array indexing in Cython. Pretty much like in C. And then here, in order to do the conversion, I would call my function and delegate to it.
01:08:52
So that would just be a plain C function. And I would do something like take the address of incomes[0] and the length of incomes,
01:09:08
or incomes.shape[0], and then just pass that on to my function.
01:09:21
So this would have exactly the same performance characteristics as before. Exactly the same speed. The unpacking of the NumPy array would be done in this wrapper function here. And if internally I had some C function returning some buffer,
01:09:46
then it would return it as a double pointer and a length. That's the normal C interface that you would expect. And I would just pass that into my internal function. That way I could use both data sources, a C data source and a Python data source,
01:10:02
both using exactly the same algorithm. I would then also move the bounds check down here, using exactly the same implementation, but it's more versatile for me now because I can use it in more cases.
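A sketch of the split the speaker describes (function names hypothetical):

```cython
# C-level core: plain pointer plus length, no Python objects involved
cdef double _average_tax_rate(double* incomes, size_t length) nogil:
    cdef double tax_sum = 0.0
    cdef double total = 0.0
    cdef size_t i
    for i in range(length):
        tax_sum += tax(incomes[i])
        total += incomes[i]
    return tax_sum / total

# Python-facing wrapper: unpacks a contiguous buffer, delegates to the core
def average_tax_rate(double[::1] incomes):
    # assumes a non-empty array, since it takes the address of element 0
    return _average_tax_rate(&incomes[0], <size_t>incomes.shape[0])
```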
01:10:22
That was another question up there.
01:10:45
We can try. We can try if that makes a difference. You can do that. That's perfectly fine for memory views. Instead of saying this is an arbitrary memory view, I can say...
01:11:00
The previous version supported arbitrary memory views. You could do slicing and pass in a NumPy slice rather than a plain array. When I say I need a stride of one here, then it's going to reject input that does not have a stride of one. And I can no longer pass in NumPy slices.
01:11:23
Let's see if that makes a difference. I have to revert that back to incomes.shape[0]. And anything else? Data was incomes, and I think that's it.
01:11:41
So it's compiled. Let's see. Exactly the same speed. Okay. So it is using strides internally.
01:12:04
Strides for the offset calculation. And apparently the C compiler... Well, it sees the index calculation and is apparently capable of generating sufficiently efficient code from that to avoid big overhead for the stride calculation.
01:12:23
And this is where C compilers are surprisingly smart sometimes. That's also the nice thing about developing in Cython, because when we're generating C code, or when we write the code that generates C code, we always know that there's a C compiler cleaning up after us.
01:12:43
So we don't have to do all the optimizations ourselves, we just have to tell the C compiler well enough how the code that we've generated works, to make it understand how it can optimize it. Otherwise it's just going to translate everything naively.
01:13:09
The question was, can you use OpenMP pragmas in Cython code? You can't, because they wouldn't be passed through into the C code.
01:13:24
And you also probably wouldn't. I mean, there are some features in OpenMP, or there might be some features that you could end up wanting, but we don't support. But we actually support quite a bit of what OpenMP provides.
01:13:45
One thing I didn't tell you is, I'm doing the sum calculation here, and this is one function-local variable that I'm doing the sum over.
01:14:02
And I still end up with one result, despite having a parallel loop here. That's because += is actually special in prange. When I write income = income + x instead, that is different.
01:14:25
So what the += does is, it turns the income variable into a reduction variable. It's a variable that's shared across all threads, and the sum that I build in that variable
01:14:43
gets accumulated back after the threads have run. That's for the +=. When I spell it like this, the income will be thread-local, and every thread will build its own sum separately, and I won't get the same result.
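What the prange reduction does can be imitated in plain Python threads: each thread sums into its own local variable, and the partial sums are combined afterwards. The function name parallel_sum and the chunking scheme are invented for this sketch:

```python
import threading

def parallel_sum(values, num_threads=4):
    # Split the input into strided chunks, one per thread.
    chunks = [values[i::num_threads] for i in range(num_threads)]
    partials = [0.0] * num_threads

    def work(idx):
        local = 0.0             # thread-local accumulator
        for x in chunks[idx]:
            local += x
        partials[idx] = local   # published once the thread is done

    threads = [threading.Thread(target=work, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The reduction step: combine the per-thread partial sums into one result.
    return sum(partials)

print(parallel_sum(list(range(10))))  # 45.0
```

This combine-at-the-end step is exactly what Cython's += reduction inside prange does for you automatically; a plain assignment leaves you with the separate thread-local values instead.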
01:15:11
And so this is the expected way to spell it, and I get the expected output. You can compare it to this rule in Python,
01:15:25
that assigning to a variable inside a function makes it local. That's because that's the expected behaviour. When you assign to a variable inside of a function, in 99.9% of the cases, you want it to be a local variable.
01:15:43
That's why you're assigning to it. And for the 0.something percent of cases where you actually want to change a global variable, there's the global keyword. And it's exactly the same thing here: if you use the += here, in almost all cases,
01:16:02
you want a global sum, and not a thread-local sum. That's why we just do it that way: it's optimised for the common use case. More questions? Yes. I'm glad you asked.
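The scoping rule being compared to can be shown directly in Python; the names counter, bump_local and bump_global are made up for the illustration:

```python
counter = 0

def bump_local():
    counter = 1        # assignment makes this a new local; the global is untouched
    return counter

def bump_global():
    global counter     # explicit opt-out for the rare case, analogous to
    counter += 1       # explicitly spelling out a thread-local sum in prange
    return counter

bump_local()
print(counter)   # 0 -- the global was not changed
bump_global()
print(counter)   # 1 -- the global keyword made the assignment global
```

In both cases the language defaults to what you want 99.9% of the time and makes you ask explicitly for the exception.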
01:16:36
So we still have about 50 minutes for the rest of the tutorial, and I can show you an example,
01:16:43
where I'm actually wrapping an external C library. Because that is one of the... It's an extremely common use case for Cython.
01:17:02
This is a different notebook, so you won't find it in the one you have. I think I actually hand out that notebook too; it should be in the zip file. So what I'm doing here is... Well, you probably won't be able to compile this,
01:17:22
because you'll need Lua for it, which you might not have installed. So I've actually written a Lua wrapper, which is called Lupa, which allows you to use Lua code from Python. So this is an actual example, similar to what Lupa does.
01:17:42
So this is a Cython implementation of a minimal wrapper for running Lua code from Python. And for that, I'm linking against the Lua JIT library, which is a JIT implementation of Lua.
01:18:01
For that, I'm telling Cython, or I'm actually telling distutils, where it can find the Lua JIT header files and the Lua JIT library. I would normally do that from my setup.py script. Since I'm using the Jupyter notebook here,
01:18:22
I can do it right in my cell. Which obviously is not portable, so if I wanted to compile this on Windows, for example, then it wouldn't be able to find... Well, /usr/include? What's that, right? Here it works for me, because I'm on Linux.
01:18:41
So this is where the Lua JIT header files are installed. I can configure the whole thing from my setup.py script as well. So I can pass in configuration into the Cythonize function that we had. I can tell it where to find external libraries, external header files, all that. So I can set that up in my setup.py script so that Cython knows how to configure the build for me.
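A setup.py version of that build configuration might look roughly like this; the module name, file names, include path and library name are all examples and depend on how LuaJIT is installed on your system:

```python
# setup.py -- hypothetical build configuration for the Lua wrapper example.
# The include path and library name below are typical for a Linux system
# with LuaJIT installed; adjust them for your platform.
from setuptools import setup, Extension
from Cython.Build import cythonize  # requires Cython to be installed

ext = Extension(
    "lua_runner",                              # made-up module name
    sources=["lua_runner.pyx"],
    include_dirs=["/usr/include/luajit-2.1"],  # where the LuaJIT headers live
    libraries=["luajit-5.1"],                  # link against libluajit
)

setup(ext_modules=cythonize([ext]))
```

This is the portable place for such settings: instead of hard-coding paths in a notebook cell, the Cythonize call gets told where to find the external headers and libraries.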
01:19:03
The example we had was... I was just saying cimport libc.math, and that's it. That works because the declarations of the math header file that comes with libc
01:19:20
are shipped as part of Cython. So Cython has declarations for a part of libc, for some parts of POSIX, for libcpp, i.e. the C++ STL, for the CPython API, and I think a couple more things,
01:19:41
OpenMP, a couple of functions so that you can call OpenMP. So this is all pretty clear: you can just cimport this, and you're done. So when you need a C++ vector, for example, you can just say from libcpp.vector cimport vector, and then use the STL vector in your Cython code.
01:20:02
That's it. I can also show an example of that in a minute. If you're using your own C library, then you have to provide the declarations yourself. This is done using a declaration, cdef extern from some header file.
01:20:22
This is starting a block where you declare external functions. And as you can see here, this is probably not the complete content of the Lua header files. There's way more in that. That's all I need for my code. So I don't need to repeat everything that's in the header. I just need to tell Cython what's there that my program is going to use,
01:20:45
because the C compiler afterwards will actually see the complete header. So my generated C code will say include Lua header. So the C compiler will see the whole thing, and Cython only has to know about the stuff that I'm using. So what I did was I looked through the Lua documentation
01:21:02
and copied out everything that I needed. That includes the lua_State struct, which is the interpreter state of the Lua runtime, then a function to create a new runtime, close and clean up the runtime, load Lua code into a buffer and compile it,
01:21:23
then do some stack operations on the Lua stack and so on and so forth, a couple of enums, a couple of conversion functions, and that's all I need. So I literally copied those out of the header file into my .pyx file. I would normally copy them into an external declarations file.
01:21:44
So rather than copying them into this use-it-only-here .pyx file, my source file, I would use a .pxd file, a Cython declarations file, because that allows me to reuse the declarations in multiple Cython modules. It's sort of like a header file, just for Cython.
01:22:03
And now, once Cython knows that these C functions exist, I can write a Python function here, which takes Lua code as an argument. In this case, I have to convert it to a byte string, because that's what Lua expects.
01:22:21
I can take a Python text string, a Unicode string, and pass it into Lua, and the function needs to convert it here. Then I create a new runtime. The return value there is null on an error, so I just say, if there was an error in Lua,
01:22:41
then raise a memory error for me. Very common pattern in Cython. So whenever you call malloc or some initialization function and it fails, and the documentation says, if null pointer returned here, then memory allocation failed. In Cython, you would just say, if not pointer, raise memory error, done.
01:23:02
And then, since I have to clean up the whole thing afterwards, I say, I open a try block, and in the finally, I say, clear the Lua stack and close the runtime, clean everything up. So I'm using try-finally in order to make sure that I'm not leaking any memory and not leaving the Lua runtime in some illegal state here.
01:23:29
And try-finally is perfect for that. And that's also, again, that's the mix that you see in Cython code between Python and C. Try-finally, if there's an exception raised from my code,
01:23:42
then in the finally block I do some C cleanup. Safe thing to do. Okay, what do I do in this try block here? I load the data into Lua, compile it there. If that fails, I raise a syntax error. Then I call the compiled code.
01:24:01
If that fails, I raise some runtime error. I could try to get some error message from Lua there and put it as an error message in. And then I look at the Lua stack, and what I actually expected was, I expected a number back. That's the only thing I can convert here.
01:24:22
So my Lua code has to return some number, and then I convert that into a Python number. And when you look at this function here, lua_tonumber returns a float. So it looks up some value on the Lua stack and returns a C float. And in my Python function, or in my Cython function,
01:24:42
I just say, return that C float, then it turns into a Python float. Okay? Okay, and here's some Lua code. So I'll just hope that compiles.
01:25:00
Not initialized. Try that again. Okay, that's compiled with warnings. I guess that's fine. And it executes my Lua code.
01:25:22
Okay, and it returns 55 as a result. So that's just a recursive Fibonacci, as you would expect from, you know, running stupid code as an example. And so this is basically how you would wrap an external library. And the interface, the Python interface, is really nice: it's just a Python function, you pass in code,
01:25:40
and internally it does all the ugly setup and C state management: create some C state, clean it up, do safe memory management, do failure handling, all at the C level. You can't see any of that from the Python level.
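Two of the C-level patterns from that walkthrough, checking an allocator's return value for NULL and putting the cleanup in a finally block, can be imitated in plain Python with ctypes. This sketch assumes a Unix-like system where CDLL(None) exposes libc's symbols, and with_buffer is an invented helper name:

```python
import ctypes

# Load the C runtime; on Linux and macOS, CDLL(None) gives access to libc.
libc = ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

def with_buffer(nbytes, use):
    ptr = libc.malloc(nbytes)
    if not ptr:                # NULL return means the allocation failed
        raise MemoryError()
    try:
        return use(ptr)        # exceptions raised here still reach the finally
    finally:
        libc.free(ptr)         # cleanup runs on success and on error alike

print(with_buffer(16, lambda p: "ok"))  # ok
```

The shape is the same as in the Lua wrapper: check the C return value right away, raise a Python exception on failure, and let try-finally guarantee the C-level cleanup.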
01:26:01
Anything else there? Another question. Creating NumPy arrays from C arrays. So basically you have some memory blob there and you want to wrap it in a NumPy array. There is a NumPy API function for that,
01:26:23
a C API function. That's another thing: we also have the NumPy C API wrapped, so you can just say cimport numpy and then do stuff from there, call C API functions of NumPy, and that's one of those. I guess that's the last question.
01:26:55
That's the expected question. I hear that at all conferences lately.
01:27:01
People always say, so there's this typing module in Python 3.5 and why don't you support that? Wouldn't that be the obvious thing to do? Sort of. It wouldn't be entirely obvious because it would still be limited to the typing system
01:27:22
that you could probably use from Python. So for example, the Lua wrapping here would be impossible to do with any Python level typing. You simply wouldn't be able to declare an external C function and actually call it from Python code.
01:27:40
So what this syntax would allow you to do is, you could write Python code and add signature annotations in Python syntax and Python 3 syntax that would be interpreted by Cython in order to basically optimize your code
01:28:01
when it's compiled. And yes, that is possible. So recently someone brought this up, and there are a couple of things to say about it. One is, we do have support for annotated Python code.
01:28:20
So when you import this magic module in Python code, it actually exists, so you can say import Cython, then it provides some data types that you can use and some cast functions and stuff like that. For example, there's a Cython.int or a Cython.float
01:28:42
or a Cython.double or something like that. So they are all C types provided by that, and someone converted that to a typing stub. So there's now support for the typing module, for PEP 484 style typing,
01:29:00
which allows you to use Cython types in type annotations. The thing is, Cython itself does not pick them up yet. But that's probably doable. So there's work going on in that direction, but there are also limits to that, because you can't get beyond the level
01:29:23
of what Python allows you to do. You can optimize Python code with it, but you can't go native. Is it possible to interact with the Cython debugger from IPython? For example, when some code crashes in IPython, you can type the %debug magic,
01:29:42
and it will go into the stack trace and you can go up and down. OK, the question was, is there any integration with the IPython debugger? Can you basically debug Cython code from IPython when it crashes and say %debug?
01:30:01
There is support for GDB. You can do GDB debugging at the Cython source level. There originally was support for Python debugging, so you could use GDB for Python source level debugging, and someone added Cython support for that,
01:30:20
so you can now do source-level debugging at all three levels. You can debug at the C level, i.e. the generated C code, at the Cython source code level, and at the Python source code level, all from within GDB. But that does not give you Python debugger support in that sense. There has been work being done in PyCharm
01:30:42
for supporting Cython debugging. And that's at the source code line level, using the actual Python debugger. I don't know what the actual state there is, but there are capabilities in that direction.
01:31:02
OK, I think that's pretty much it. I'm all through, and I hope you learned something.