The Cython Compiler for Python
Formal Metadata
Title | The Cython Compiler for Python
Part Number | 112
Number of Parts | 119
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/20033 (DOI)
Production Place | Berlin
EuroPython 2014
Transcript: English (auto-generated)
00:15
Okay, so in a minute, this talk is about Cython.
00:22
Who knows what Cython is? Well actually, who does not know what Cython is? Okay, that's cool, I didn't see one. That's cool. So this is kind of a general talk: I'm going to show you a broad overview of everything.
00:41
And so if you are a Cython guru, you may not learn much, but you'll certainly see a lot. So, about me: I'm a Python developer, and most of the time also a Python advocate, since 2002, so it's been a while.
01:00
And I'm currently working for Skoobe. Who does not know what Skoobe is? Okay, a couple of people. Anyone like reading books? Yes. Okay, go to skoobe.de and take a look. We are just great: you get unlimited reading on Android, iOS, any device you like.
01:22
And we change the way people read books, that's really the way it is. No, apart from that, I'm a freelance developer, consultant, trainer, and this kind of stuff, so I actually do Cython trainings. So if you're interested in getting a bit deeper into the matter, you can talk to me after the talk.
01:45
Or go to that page and have a look at what else I have. Then, well, there are a couple of other open source tools I'm working on, so I'm involved in the, well, I'm kind of the developer of lxml, the XML toolkit for Python.
02:02
I've written a Lua integration for Python. Both of them are actually written in Cython, and the main project that I'm working on is Cython, so that's the topic of the talk today. So, three little parts: a little intro,
02:22
so I'll show you what Cython is, what it gives you, what's cool about it. Then a major part about Cython's type system, which actually involves a little demo, so you'll see Cython code, you'll learn how it works. What makes it cool, why it's so great to integrate with C code using Cython, why it's so cool
02:44
to write Cython code, how it all works. And then in the end, a little quickie, high performance features that we've added to the compiler. I actually got a bit confused with the times. I thought I only had like half an hour. I actually have 45 minutes for this talk,
03:02
so you'll see a longer demo. So what is Cython? Cython is an open source project. You can go to cython.org, take a look at the project, talk to us. There's a mailing list for it, obviously, a users' mailing list, so if you have any questions about it,
03:21
just go to the users' mailing list. It's on, well, sadly it's on Google Groups, but you'll manage anyway. Ask your questions there, just come along, we're very happy to answer them. Now, it's an open source project.
03:41
It's a Python compiler, or almost a Python compiler, meaning you can actually take normal regular Python code and just throw it into a compiler and have it compiled to C. It translates Python, actual Python source code, to C, using the CPython C API.
04:05
And it features static code optimization, so you usually get kind of a notable performance boost when you do that. The third thing that Cython is, it's an extended Python language, so you can not only throw Python code in there,
04:22
you can also throw actual Cython code in there, which has an extended syntax, so it has additional constructs, which allow you to provide type annotations which tell the compiler, okay, make this fast, this is important, drop into C, use C.
04:41
And so it allows you to write fast extension modules for CPython and interface Python with external native libraries, usually in C or C++, but there's integration for Fortran too, for example, and the C interface can be used to talk to any native code anyway, so just anything you'll find.
05:03
Okay, so how do you use Cython? Well, basically you write Python code or Cython code. Cython then translates it into C code. The C compiler builds a shared library for CPython and you can import that module into Python. Okay, and it works in Python 2 and Python 3.
05:21
Actually we had Python 3 support even before Python 3.0 was out, and then we re-added Python 3 support when it actually came out and didn't match the, you know, old implementation anymore, and we had the same for a couple of other CPython versions later, so it happens.
05:42
And it keeps happening, actually, but, like, we write the code so that you don't have to write it. Okay, here's an example. So, compiling Python code: this is a stupid little Python script. It has a class, doing nothing interesting. It has a function, passes the function into the class,
06:03
executes it in a loop, you know, just a little example. And then when you compile it, you can just use the cythonize command for it. So when you install Cython, this will come with it, actually only in the next version, so the current version doesn't install it yet, I think.
06:22
But we are working on releasing it. So that's the cythonize script, you can just say cythonize -i, so do an in-place build of this thing. It compiles it, builds the shared library for you, and then you get the C file from it and the shared library, which you can just import and use, okay? And it translates to 3,000 lines of C code.
06:44
Okay, so that's a bit more than the Python code you just saw. Why is that? Well, there's lots of portability code defined in there, which makes it support different C compilers, different Python versions, as I said, version 2, version 3, different minor versions of Python,
07:02
lots of differences that came in over time. So the compiler actually has a lot of knowledge about how CPython evolves over time and adapts your code to it at compile time. There are lots of helpful C comments in the generated code that mimic your code,
07:23
so it shows what code you have, and that makes it kind of easier to trace your code through the generated code, in case you ever have to do that. And there's definitely a whole lot of code in there that you absolutely do not want to write yourself.
07:41
So we write C code so you don't have to, okay? Be thankful. Okay, how do you do that? Now, behind this cythonize script, and actually, whenever you want to distribute a package yourself, you'll use distutils for it, normally for packaging it up and distributing it.
08:03
And this is how you use Cython in your distutils build. You would just say: add extension modules, and they come from cythonizing some source files. And Cython would just pick up the source files, compile them, build an Extension object, so the metadata for it, and push it
08:23
into the distutils setup function to build it all. For the more complex cases, you need an explicit configuration of your extension modules. For example, you're building against specific C libraries
08:40
where you have some kind of dependencies and need to pass in options or something. As before, as for any C extension module that you would build with distutils, you would normally create this Extension object for them, which is the metadata for the build.
09:03
Here, cythonize does it for you. If it's more complex, you can do it yourself and just pass it into cythonize, okay? It works the same way. That just makes Cython pick up the metadata and build your code correctly, okay? So what you get out of it is highly portable code.
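A minimal sketch of the two build variants just described, assuming Cython and setuptools are installed (setuptools stands in here for the distutils package named in the talk). The module name `example`, the file `example.pyx` and the linked library `m` are hypothetical placeholders, and the function wrappers exist only so the sketch can be loaded without starting a build.

```python
# setup.py (sketch): "example", "example.pyx" and the library "m" are hypothetical.

def simple_build(script_args):
    """Simple case: cythonize() creates the Extension metadata on its own."""
    from setuptools import setup
    from Cython.Build import cythonize  # requires Cython to be installed

    setup(ext_modules=cythonize("example.pyx"), script_args=script_args)


def explicit_build(script_args):
    """Complex case: describe the extension yourself, then pass it to cythonize()."""
    from setuptools import Extension, setup
    from Cython.Build import cythonize  # requires Cython to be installed

    ext = Extension(
        "example",                # import name of the compiled module
        sources=["example.pyx"],  # Cython source to translate to C
        libraries=["m"],          # external C library to link against
    )
    setup(ext_modules=cythonize([ext]), script_args=script_args)
```

In a real setup.py one of the two functions would be called with `sys.argv[1:]`, for instance when running `python setup.py build_ext --inplace`.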
09:24
So Cython generates C code that compiles basically with all major C and C++ compilers, which means GCC normally, MSVC on Windows, but there are also people using it with the Intel compiler, for example,
09:42
and it's production-tested, right? Works on all major platforms, Linux, Mac, Windows, people using it on BSD, on really big machines, on all sorts of platforms, and it works in Python 2.4 through 3.4 currently,
10:00
and we're keeping track of the development in 3.5, so that'll work as soon as it comes out. We dropped support, though, for really old Python versions. So the next release that we make will not support 2.4 and 2.5 anymore, nor the old 3.1 version
10:23
that no one should be using anymore anyway, but it still supports any recent, any somewhat recent Python version, even that version. So the Cython language syntax itself normally follows Python 2.7,
10:42
but we support Python 3 syntax if you want to write source code for Python 3. You just have to tell the compiler, this is code with language level 3, so this is a compiler directive, and it'll just compile your code in Python 3 mode. Regarding language features of Python itself,
11:01
we compile and run more than 98% of the regression test suite that comes with CPython, which means that we have pretty much complete Python language support. We support classes, functions, closures, generators, like all sorts of features that you'll see
11:21
in your daily life, anything that says Python features is supported by the compiler, comprehensions, any control structures, all sorts of stuff. Now, there are a couple of minor deviations that you're rather unlikely to meet in practice, which is that we don't have frames in functions.
11:40
Does anyone know what frames are? Yeah, that's a couple of people, see? You don't need that. So we only have them for exceptions and profiling, okay? Which are the cases where you may actually need them without knowing. And there are a couple of minor bugs. I can't see the URL, but just go to our homepage
12:00
and click on it. Okay, so for speed. Cython generates very efficient C code. There are lots of static and optimistic optimizations that it applies to your code. It generates optimized code for the standard Python types, built-in types, many built-in functions also, which means that the code will simply run faster
12:22
because Cython understands it. There's static type inference inside of functions, so you don't have to declare many types, you don't have to tell it like, this is a list, this is an integer, this is whatever. It'll understand your code partly automatically, and you can help it a bit to make it understand it better.
12:44
Okay, so a bit more about speed. There's a Python benchmark suite, which contains real-world, pure Python code, including Django, including a couple of template engines, actually a couple of computational modules,
13:03
like real Python code, and just by compiling it, you get a factor of 1.2 to 2.4. It doesn't sound all that much compared to Python 3.4, but that's what you get out of the box, and you can make it a lot faster by doing a little hand-tuning.
13:22
How's that done? Well, static type declarations. So, Cython allows you to put type hints into your code, which tell the compiler that this is not just any arbitrary Python object, it's actually enough to represent it as a C int, and that will make the compiler drop some parts
13:43
of the code that use this variable into plain C, instead of doing object operations on it. Some languages call this unboxing, so you may know it from Java, for example. The idea is to drop object operations into C, remove overhead, that's it.
14:01
That's the syntax which you can use in Cython code, so in the Cython language, but there's also a syntax for pure Python, so you can just add this decorator here to a function, to a Python function, and either have it execute in Python, where the decorator will just disappear,
14:21
or have it compile in Cython, and Cython will understand the decorator. And the cool thing is that you can employ this exactly where performance matters, so you can profile your code, take a look at where the real bottlenecks are,
14:40
drop some type annotations in there, and make it way faster, just by optimizing some bits of your code. And you would normally, so for highly computational code, you would normally expect a couple of hundred times speed up.
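As a rough sketch of the "profile first" step, the standard library's cProfile and pstats modules can show where the time goes before any type annotations are added. The three functions below are made-up stand-ins for real application code, not anything shown in the talk.

```python
import cProfile
import io
import pstats


def sum_of_squares(n):
    # Deliberately heavy numeric loop: the kind of code worth annotating.
    total = 0
    for i in range(n):
        total += i * i
    return total


def format_result(value):
    # Cheap helper: adding types here would gain almost nothing.
    return "result=%d" % value


def workload():
    return format_result(sum_of_squares(200_000))


# Run the workload under the profiler.
profiler = cProfile.Profile()
profiler.enable()
output = workload()
profiler.disable()

# Collect the report, sorted by cumulative time (most expensive first).
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
report = stream.getvalue()
print(report)
```

In a report like this, the numeric loop should dominate the cumulative time, which marks it as the candidate for type annotations, while the formatting helper can stay plain Python.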
15:00
So, here's the demo. Yep, so this is the IPython notebook, for those who don't know. So here's a really tiny little example, let me actually make this a bit bigger, and it's maybe too big, okay, there we go. So a tiny example that just says,
15:22
okay, this is a C integer, I'm assigning it a value of one, two, three, four, and then printing it, okay? And you get what you expect, unless it doesn't work, come on, restart the kernel.
15:41
Restart, okay, try again. Yeah, it works now, okay. So what this does is, you have a C integer, and Cython is automatically creating an object for you, wrapping it in a Python object,
16:01
and passing it to the print function, okay? Okay, another example, this here, you get in some sequence of values, or an iterable of values, then iterate over it, and sum it up, okay?
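A sketch of the summing function just described, written in Cython's pure-Python mode so that it also runs uncompiled. The name `add_up` and the variable names are made up here, and the small fallback class is a hypothetical stand-in that is only used when Cython is not installed; the demo in the talk uses Cython's own declaration syntax instead.

```python
try:
    import cython  # ships with Cython; also importable when running uncompiled
except ImportError:
    class cython:  # hypothetical stand-in so the sketch runs without Cython
        int = int

        @staticmethod
        def locals(**_types):
            # Uncompiled, the decorator simply returns the function unchanged.
            return lambda func: func


@cython.locals(value=cython.int, total=cython.int)
def add_up(values):
    total = 0
    for value in values:  # when compiled, each item is unpacked into a C int
        total += value    # when compiled, this is a plain C addition
    return total          # converted back into a Python int object


print(add_up(range(10)))  # prints 45
```

Run as plain Python, the decorator has no effect; compiled with Cython, the two declared names become C integer variables.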
16:21
Here's an example, you pass in range 10, well, this is Python 3, so range 10 gives you an iterable over zero to nine, pass into your function, and it'll add it up at the C level, meaning here, in this iteration loop, it'll get the value from the iterator,
16:43
unpack it into C int, and sum is a C variable, C integer variable, do a C addition here, and return it, and this creates an object again, from the C integer variable. You can take a look, and it actually works,
17:01
and since I didn't just say cython here, I said this is cython -a code, it also gives you a little type analysis and shows you: okay, this was your code, and you can take a look at what it made of it, and this is the C code it generates,
17:24
just for the iteration. And the fun thing about it, it does a couple of optimizations, so if you can read C API code, then this is checking for lists, this is checking for tuple, which is kind of the most common two cases for iteration,
17:40
I mean, like, iterating over the list, that's what you do all the time, right? So it kind of has optimistic optimizations, that say, if it's a list, do it in faster code, okay? And you can see this line here is white, yellow kind of means, this is Python operations taking place, and white means this is like plain C,
18:00
and anything in between is, well, anything in between. And you can see, this is like a plain C operation, so it just takes sum, adds value, assigns it to sum, okay? Read the code. Okay? And it gets the right result. And this is all done automatically, right? So you're just writing down your code,
18:20
as you would in Python, you're saying, this is, these two variables are integer variables, and that's all you have to do. That just works. Okay, here's another example, you get a character pointer from somewhere, so that's kind of the basic C string type. You can call len on it,
18:41
which internally calls strlen. And then print it, so that would call strlen, get a size_t value back, convert that to a Python object, so an integer object, and print it. And when you return the character pointer, it also knows that it needs to convert it to an object, so it converts the character pointer to a byte string.
19:03
Okay? And it works. Okay, encoding and decoding. Again, we have character pointer, and we say, well, decode it as UTF-8, and you get a Unicode string back. It's kind of as you would expect,
19:21
if it was not a character pointer, but a byte string, the same would just work in Python. Exactly the same way. This is a bit more inefficient than it needs to be, because C strings, so character pointer strings in C, don't have a length associated with them, they're just a pointer, and so in order to figure out
19:40
how much of the string we have to convert here, we have to call strlen on it. So it calls strlen, and then passes the pointer and the resulting length into the decoding function, internally. And since I already know how long the string is here, and it's like seven characters, I can tell it.
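The same trade-off can be seen from plain Python with the standard ctypes module. This is only an analogue, not Cython code: the buffer contents are made up, and `ctypes.string_at` plays the role of the pointer-to-bytes conversion, either scanning for the terminating NUL byte or taking an explicit length.

```python
import ctypes

# A NUL-terminated C buffer; create_string_buffer appends the trailing NUL.
buffer = ctypes.create_string_buffer("abcdefg and more".encode("utf-8"))

# Without a length, the bytes are read up to the terminating NUL,
# which means the end of the string has to be searched for first.
scanned = ctypes.string_at(buffer).decode("utf-8")

# With a known length (7 here), exactly that many bytes are taken.
sliced = ctypes.string_at(buffer, 7).decode("utf-8")

print(scanned)  # prints: abcdefg and more
print(sliced)   # prints: abcdefg
```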
20:04
I can just slice the pointer, and that's more efficient, because there's no runtime string length detection necessary anymore. One more thing here, what we got back is,
20:22
so we had to do a manual decoding here, and to tell it, okay, what I want here is not the byte string that I would normally get, but decode it by UTF-8, and then return the Unicode string, and you can automate that by saying, okay, what I want, actually want is not byte strings, I want Unicode strings, and the encoding for that is UTF-8, so this is, again, a compiler directive
20:42
that you're passing in, and then you just have to say, return s, and it'll return the decoded Unicode string for you, automatically, okay? So this is kind of the comfort you get from the type system.
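As noted in the talk, the same decode call works on an ordinary Python byte string. A minimal plain-Python sketch of both variants follows (decoding the whole string versus decoding an explicitly sliced length). The seven-byte value `raw` is a made-up stand-in, since the string on the slide is not in the transcript.

```python
# Hypothetical stand-in for the C string in the demo.
raw = b"abcdefg"

# Decode everything. For a char*, Cython first has to call strlen()
# to find out where the string ends.
whole = raw.decode("utf-8")

# Decode a known length. The slice supplies the length up front,
# so for a char* no strlen() call would be needed.
first_seven = raw[:7].decode("utf-8")

print(whole, first_seven)
```

In Cython the slice form is the one that avoids the extra scan over the C string; in plain Python both lines cost about the same because `bytes` objects already know their length.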
21:01
Okay, here's another little example. I'm using the atoi function from the C standard library, and I'm using it to parse a byte string into an integer. Okay, that's what it does. So what I do is, I write a Python function that gets in some kind of byte string,
21:21
I call atoi on the byte string, add one, return it. Okay? Works. So I'm passing in a byte string here, I'm calling it with a byte string as argument, I'm calling it with a bytearray as argument. Both work, both coerce to C character pointers. atoi gets the character pointer in,
21:42
returns some C integer value, Cython generates code that adds one, then you say return that, and it converts the integer value to a Python object and returns it. Okay. Okay, here's a C++ example.
22:03
I'm using the standard library, the C++ standard library, the STL. And I'm using two objects from it, a string and vector. And what I'm doing is, I'm taking a byte string here,
22:21
splitting it, this is like plain Python operations, which gets me a list, a Python list, assign it to a vector of strings, so that's C++ code. It gets copied into a C++ vector for me. I can print the vector, which needs to do a translation back into Python,
22:41
so that converts the C++ vector into a Python list again and prints it. I can just show you what it does, okay, by running it. So printing here gets me the Python list of strings. And as you see, it's been decoded,
23:00
as I asked it to automatically decode for me. And then I iterate over the vector, and whenever I find this little string here, this byte string in the vector, I print it out. And here you can see it's been found. And the next thing I'm doing is, I'm passing in a position into the function
23:21
and I'm just indexing the vector, getting the whatever index value I want, returning that, and I'm calling it with index one, index zero, and it tells me that at index one, it's DH and, scrolling down, at index zero, it's ABC, as expected, okay?
23:42
So back and forth, Python, C++, back into Python, any way you like. Callbacks, who's been using callbacks in Python? Okay, basic idea is, you pass a function into some code, and you want the code to call your function, okay?
24:03
At some point, whenever something happens, or like, for any item in lists or something. That's the basic idea. So if you do that in C, it's a bit more verbose, because, well, functions in Python have state, they have closures.
24:22
C does not have closures. All you have in C is pointers. And so, the way callbacks work in C is, you not only pass in your function, so a pointer to your C function, you also pass in a pointer to some state,
24:41
to some data somewhere, which is then, when the callback's called, passed into the callback, so that the callback can make actual use of it, like, know what it has to do, okay? Otherwise, you couldn't remember in what state it's supposed to process something, because C simply can't remember it. So the normal signature of callbacks in C
25:01
is you pass in some void pointer, which gives you a context, and, well, maybe some more data that the callback should be operating on. Okay, so here's a function that uses a callback. As I said, you pass in your function pointer, and you pass in the context, okay?
25:20
So the void pointer that keeps the state for the callback. And then, what I'm doing here is, I'm passing in the character pointer, iterating over it, looking for the end, and whenever I find an A, I call my callback. Okay, so I'm running through the byte string, and when I find an A, I call the callback,
25:40
and tell it, here's what I found, okay? And then I'm implementing this callback, and I'm saying, okay, the context I'm getting in is actually a byte array. So what I'm passing through here is not an arbitrary void pointer, it's a pointer to an object, a Python object. And then, whatever character was found,
26:00
I'm appending to the byte array, okay? So I'm collecting A's, basically, that's the idea. Okay, and here's the function that calls it all. So I'm creating a byte array, and then call my function that processes some string, tell it, whenever you find A's, call this append character function,
26:22
and the context I'm passing in is a pointer to my byte array, okay? Then when I call it all, I pass in a byte string, and let the code operate on the byte string. And here it is, it found all A's, okay?
26:41
So what happened is, I called this function here, I told this function, here's a callback, here's some byte array, do stuff, and the function called my callback and appended all A's it found, okay?
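The callback-plus-context pattern described above can be sketched in plain Python. This is an illustration of the C calling convention, not the Cython code from the demo: the names `process_bytes`, `append_char` and `collect_as` are hypothetical, and matching an uppercase `A` is an assumption because the transcript is unclear about the case.

```python
def process_bytes(data, callback, context):
    """Scan data and call callback(context, byte) for every match.

    Mirrors the C convention: this function knows nothing about
    `context`, it only hands it back to the callback.
    """
    for byte in data:              # iterating bytes yields ints in Python 3
        if byte == ord("A"):       # assumed search character
            callback(context, byte)


def append_char(context, byte):
    # The "void pointer" is really a bytearray that collects the hits.
    context.append(byte)


def collect_as(data):
    found = bytearray()            # state owned by the caller
    process_bytes(data, append_char, found)
    return bytes(found)


print(collect_as(b"xAyAAz"))
```

In C the `context` argument would be a `void*`; in the Cython demo it is a pointer to a Python `bytearray`, which is why the callback can append to it directly.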
27:02
Okay, that's it for the general part. A couple more, I have a bit more time. Okay, here's an example, I'll just restart that kernel too, okay? So here's an example of using structs in Cython,
27:25
C struct, this is the way I would define a C struct in my code. So cdef is kind of the keyword that tells you, okay, here's a C definition coming next, and then I say, I'm defining a struct called point, it has two double values, x and y,
27:42
and an int value, color, okay, a C struct. Then I have a function here, a Python function, that uses this struct, and I'm creating the struct here first, and the cool thing is, I can actually use object creation syntax. I can say, create a point for me,
28:03
x is one, y is two, color is 123, and it'll assign the values to the right struct fields for me. Really nice syntax, I like that. I didn't implement it, but I really like it. Then I'm printing one of the values here, this is, I should actually have parentheses. Doesn't matter because I'm writing Python 2 code,
28:23
but it looks better. I'm running it, and yep, so I'm creating the function, and then when I call it, it prints 1.0, which is the double value of x, and it returns a dict for me.
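To make the struct-to-dict mapping concrete, here is a rough plain-Python imitation using the standard `ctypes` module. This is a swapped-in technique for illustration only, not what Cython does internally; the helper `struct_to_dict` is hypothetical, and the field names follow the struct described in the talk.

```python
import ctypes


class Point(ctypes.Structure):
    # Same layout as the struct in the talk: two doubles and an int.
    _fields_ = [("x", ctypes.c_double),
                ("y", ctypes.c_double),
                ("color", ctypes.c_int)]


def struct_to_dict(s):
    # Field names become keys, field values become Python objects.
    return {name: getattr(s, name) for name, _ctype in s._fields_}


p = Point(x=1, y=2, color=123)   # keyword construction fills the named fields
print(p.x)
print(struct_to_dict(p))
```

With `ctypes` the conversion has to be written by hand; the point made in the talk is that Cython generates the equivalent conversion automatically when a struct is returned from a Python function.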
28:41
Now who would have expected that? What I returned was the struct, and it's automatically turned into a dict for me, which is kind of nice, right? I mean, it definitely helps with debugging where you can just say print this struct, and it's kind of the one obvious way to represent structs,
29:01
because they're like name-value pairs, more or less. Okay? This is how structs work in Cython. What else do I have? Again, a bit more time. So, I'll give you an example about Unicode processing. I'll restart that kernel too. This, okay.
29:25
So, here's a little function that iterates over something in Python, we're not telling you what it is, but we actually, sorry, I can tell you it's a Unicode string that I'm going to be operating on. So, iterating over Unicode string, enumerating it,
29:42
and then for the first character I find that is numeric, I return the position. Okay? Then I'm creating some dummy data here, taking string escalators lots of times, and then putting one, two, three in there,
30:01
because that's what I'm going to be looking for. I can run the whole thing, and it's going to tell me that there's a position way back in the string, which has what I'm looking for. Okay? So, I can, yeah, I can just run timeit on it,
30:23
and see how long it takes. Kind of okay, kind of too slow. Now I'm doing the same in Cython, and what I'm actually doing here is I'm using Python syntax, so I'm not using the Cython cdef syntax, I'm actually changing my Python code so that it stays Python code.
30:44
I'm just telling Cython, you know, when you compile this code, here's some type information that you can exploit to make it faster, okay? So what I change is importing the magic Cython module, which is kind of a dummy module when you use it in Python, but Cython understands it,
31:01
and goes, I know what that module does, and it has a function called locals, which allows me to define types for local variables. And here I'm saying s, so the argument I'm passing in is actually a str, so this is Python 3, so this is Unicode string, and I'm saying that here is some weird little
31:27
integer type, which Python uses internally to represent a Unicode character. So this Py_UCS4 thingy is there so that, you know, you can put any Unicode character in there, and it's an integer type that represents the Unicode string,
31:45
the Unicode character, sorry. And the rest of the code is actually unchanged. As before, and so when I run this through the compiler, it tells me, okay, I generated this code for it,
32:01
and it tells me that, well, this is using some C API code to find out if the character is numeric. This does a bit more, and the main thing it does is, it doesn't do really much, it mainly checks if s may be None,
32:20
before it iterates, because None isn't iterable. And then for returning i, it has to convert it into a Python object again, okay? And the rest actually runs in C. So here's a comparison, timeit on both. Which apparently runs for a while, yep.
32:44
And it's a tad faster, okay? So the, well, I mean, the major difference is it's, the second implementation runs in plain C, and the other one runs in plain Python. So the difference here is not entirely unexpected, okay?
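A minimal sketch of the two versions compared above, written in Cython's pure-Python mode. Several things here are assumptions rather than the code from the slides: the function names, returning `-1` when nothing is found, and the filler text in `data`. The `except ImportError` branch is a hypothetical stand-in so that the file also runs as plain Python when Cython is not installed.

```python
try:
    import cython  # the "magic" module; a no-op shim when run uncompiled
except ImportError:
    # Hypothetical stand-in, only so the sketch runs without Cython.
    class cython:
        Py_ssize_t = int
        Py_UCS4 = str

        @staticmethod
        def locals(**types):
            return lambda func: func


def find_numeric_py(s):
    # Plain Python version: everything is a dynamically typed object.
    for i, c in enumerate(s):
        if c.isnumeric():
            return i
    return -1  # assumed "not found" value


@cython.locals(s=str, i=cython.Py_ssize_t, c=cython.Py_UCS4)
def find_numeric(s):
    # Same body; the decorator only adds type information for the compiler.
    for i, c in enumerate(s):
        if c.isnumeric():
            return i
    return -1


data = "abc" * 1000 + "123"   # made-up dummy data with digits at the end
print(find_numeric_py(data), find_numeric(data))
```

Run uncompiled, both functions behave identically. The speed difference shown in the talk only appears once the module is compiled with Cython, where the typed loop no longer creates a Python object per character.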
33:02
I mean, this is milliseconds, this is microseconds. Yeah, the fact is. Okay, that's pretty much what I wanted to show you as a demo. Now quickly about, yeah, the high performance features that we added to Cython. So, yeah, part three.
33:22
So that's a couple of features that people use when they're doing large data processing, when they're doing processing with NumPy data, for example, NumPy arrays, usually. Then what they want is a way to unpack these objects,
33:41
these NumPy array objects, into something that they can process efficiently at a native level. So they want to basically iterate over the arrays and do processing on them, okay? That's usually the main idea of what people want. And we have added a simple syntax for it,
34:00
which looks a lot like what you would do in NumPy. It's like the normal NumPy slicing syntax. And this is a syntax for unboxing low-level data that is kind of n-dimensional, okay? That includes one-dimensional bytes objects, but also NumPy arrays, Fortran buffers, C buffers.
34:21
Image processing is done that way. Lots of applications for that, and all you have to do is you have to say, okay, this variable here, this argument I'm getting in, is actually a two-dimensional buffer, so two colons here, two dimensions, of type double, and that's it. Then you can iterate over it. You can just say, okay, this is the size
34:42
in one direction, this is the size in the other direction, have a nested loop over it, and all I'm doing here is adding one to each item in the array, okay? And this turns into C code. So a nice second feature of these memory views here
35:02
is that they support efficient slicing. So without changing this code, you can just say, okay, I don't wanna run the code over the whole array, I'm only interested in all even lines, so every second line, and you just slice it, pass it in, done. Exactly as efficient, it just recalculates the buffer layout in memory
35:22
and runs the same algorithm over it, okay? Fused types, next big topic. It allows you to have one implementation with many specializations. That's also very common in numerics, that you want to write an algorithm once, but have it run efficiently on lots of different
35:45
integer types, floating point types, like item types in arrays, okay? And you really just want to implement it once because it's hard enough to get it right once, and you don't want to copy code over.
36:00
So what we've added for that is what we call fused types, it's kind of compile time generics, and the way it works is you say, I'm defining a type, a fused type, which I call floating, and it's made up of two different types, two different C types. One is called float and the other one is double, so it's a 32-bit float and a 64-bit float.
36:22
And then you use it in your code. And Cython will just understand it and will say, okay, so I'm getting the buffer in here, but it's actually a fused type buffer, and so it will split up your code and generate two versions from it, one that's optimized for 32-bit float and one that's optimized for 64-bit float, okay?
36:41
You get that all automatic. We have kind of predefined types, which is all floating point types, all integral types, all numeric types, but people use it with, I've seen code that uses huge lists of different types for fused types here, and just writes an algorithm and it expands into 30 different versions of the code.
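Fused types have no direct plain-Python equivalent, so the following is only an analogue of the problem they solve, using the standard `array` and `memoryview` types: one loop body applied to buffers with 32-bit and 64-bit float items, plus the copy-free strided slice mentioned for memory views. The function name `add_one` is hypothetical, and the buffers are one-dimensional to keep the sketch within what the standard library supports.

```python
from array import array


def add_one(view):
    # One generic loop; in plain Python the item type is resolved at runtime,
    # whereas Cython would generate one specialised C version per item type.
    for i in range(len(view)):
        view[i] = view[i] + 1


floats32 = array("f", [0.5, 1.5, 2.5, 3.5])   # 32-bit float items
floats64 = array("d", [0.5, 1.5, 2.5, 3.5])   # 64-bit float items

add_one(memoryview(floats32))          # whole buffer
add_one(memoryview(floats64)[::2])     # every second item only, no copy is made

print(floats32.tolist(), floats64.tolist())
```

The sliced call modifies the original `floats64` buffer in place, which is the same property that makes slicing a typed memory view cheap in Cython: only the view's layout changes, not the data.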
37:05
Well, if they need it. I didn't so far. Okay, next page, as I said, this is a real quick run over everything. OpenMP, so parallel code. That's another thing that people love in numerics.
37:20
They want to parallelize the code because they have huge amounts of data. And they want to use as many cores as they have, right? Or more. They always want to use more cores than they have, but they can't. So there's a way to have thread parallel loops
37:41
and also thread parallel sections. So this is not only usable for numerics, it's usable for any thread parallelization in your code. And all you really have to do is you replace the for i in range in your loop by for i in prange, so there's a special cython.parallel module that
38:01
gives you this, and then your code runs in parallel. That's it. What you should not forget is to free the GIL so that you get actual thread parallel code, but again, that's all you need to do. So prange, free the GIL, and then you have C code that runs in parallel on your machine.
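The `range` to `prange` swap can be sketched in Cython's pure-Python mode as follows. The function `scale` and its data are made up for illustration, and the `except ImportError` branch is a hypothetical stand-in so that the file also runs as plain Python without Cython installed. Actual parallel execution additionally assumes the module is compiled by Cython with OpenMP enabled in the C compiler flags.

```python
from array import array

try:
    import cython
    from cython.parallel import prange
except ImportError:
    # Hypothetical stand-ins, only so the sketch runs without Cython.
    class _AnyType:
        def __getitem__(self, item):
            return self

    class cython:
        double = _AnyType()
        Py_ssize_t = int

    def prange(stop, nogil=False):
        return range(stop)


def scale(x: cython.double[:], alpha: cython.double):
    i: cython.Py_ssize_t
    # Compiled with OpenMP, the iterations are split across threads and run
    # without the GIL; uncompiled, this is an ordinary sequential loop.
    for i in prange(x.shape[0], nogil=True):
        x[i] = alpha * x[i]


values = array("d", [1.0, 2.0, 3.0])
scale(memoryview(values), 2.0)
print(values.tolist())
```

The loop body only touches a typed buffer and C doubles, which is what makes it legal to run with the GIL released; a body that handled Python objects could not be parallelised this way.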
38:22
Okay? So, I'm pretty much through. For conclusion, so Cython is a tool for translating Python code to efficient C, and it allows you to easily interface with native code. I mean, you've seen that. You're calling atoi, it's just like, you cimport it, you're done, right?
38:41
You use it to speed up existing Python modules, so you can concentrate on the optimizations. You don't have to rewrite everything in C or Fortran or whatever. You can just drop it into Cython, optimize a little, see that it works fast, and that's what you get without major changes in your code.
39:01
You can write C extensions for CPython in Python, or mostly Python code, so you don't need to swap languages. And you can wrap C libraries in Python by just calling C functions right out of your code, and implementing a nice wrapper for it in Python code,
39:22
so you can write a class that, you know, calls C code, and that's the interface that you get. Okay, so you can really concentrate on writing an efficient mapping from the C code to Python code, so give it a nice API, and you don't have to care about too many details, well, low-level stuff, okay?
39:42
And Cython gets you all the way from Python to C. And what I really like about it is that you can use it as a Pareto language. Does everyone know what the Pareto principle is, the 80-20 principle? Yeah? So that's exactly the type of language it is.
40:02
It says, like, you get 80% of the benefit with 20% of the effort, and Cython really allows you to, like, find the right 20%, do some stuff there, and you get a huge speed-up, okay? So you can stay dynamic and simple wherever you can in your code
40:20
and just go static and low-level where you must. So, yeah, so Cython supports fast code writing just as well as writing fast code, okay? Thanks.
40:44
Thank you, Stefan, for this very nice talk. We have five minutes for questions. There's a microphone next to the camera. Please come to the front.
41:05
Can you hear me? How difficult would it be to include external C or C++ libraries, not just the standard library, for example, Boost or something else? Oh, yeah, Boost.
41:22
Boost is heavily template-driven, so it's not entirely fun to do it, but, like, Cython has template support, so you can declare a class and say, this is a templated class. I don't think I have an example here, but you can, so we have documentation on the page
41:42
that tells you a couple of examples how to declare your code. Basically, what you do is you say, cdef extern, so this is an external declaration from some header file. You define the header file, and then you just copy the definitions. You say, this is a class, and these methods are in there, this C++ class with these C++ methods,
42:03
like, add a marker that it's templated, and that's also what I did here for the atoi example. So, the atoi example just works because we are shipping the declarations for the standard C library.
42:21
We are also shipping declarations for the C++ STL containers, so you can just use those right away. If you have your own stuff, you can declare them. It's totally not much work. You mostly just copy stuff from your header file, declare it in the way that Cython wants it, and then you can just say, import stuff, use it, yeah.
42:43
Thanks. Regarding generics and fused types, is it possible to use templates from C++? Sorry, so a template, instead of defining your fused types in Cython?
43:00
Yeah. It's a different mechanism. I don't think so, because, so Cython has to understand these types somehow, and templates don't really have a type. Yeah, they're like old types. Yeah, exactly.
43:20
So, yeah, no, it's a different mechanism. You can use it for the same thing, but this works at the code generator level, and the other one would work at the C++ compiler level. We're using Cython extensively
43:41
in our machine learning library, but one problem we have is we lose all the support for the static code checking, for example, using Pylint on Cython files isn't possible.
44:00
We can use the decorators with the locals, for example, cython.locals, but for cdef classes or something else, there's nothing similar, or? That also works, yeah. So we have a syntax, you can look it up in the documentation, so there's a lot of the, so actually most of the Cython syntax
44:21
is available at the Python level, but only the things that can actually be used from Python, or sensibly used from Python. So what you can't do is talk to C code, because there's no representation for talking to C code from Python in the way Cython would allow you to do, so there's no real mapping, so if you're using plain Python syntax
44:42
for your Cython code, then you just can't talk to C code but that's about it. I mean, everything else, all other language features are available in pure Python mode as well. Okay, any more questions? One more, last one.
45:04
Hello, first, thank you for the talk. Second, I used Cython in the past and I had trouble moving from Linux to Windows, simply because the compilers were different. I mean, it was simply because one compiler was ignoring something that Cython generated,
45:25
which was senseless, but the compiler ignored it. And so I was just wondering if you have some kind of test suite to be sure that code emitted by Cython works on this precise compiler you have on this machine
45:41
so we can have something like a limited warranty that it will work. So as far as I understand, the question was about testing, like do we, how do we assure that? In fact, the C generated was not quite incorrect but in the gray area.
46:01
And so MSVC was saying this is illegal, which was one correct position, and GCC was saying something else, was okay, I ignore it. It's senseless. So most of the development of Cython itself runs on either Mac or Linux.
46:22
So all developers are basically, no developer is working on Windows. So that's a call for help. That is a call for help. So if you have any Windows-based developers here, which is kind of rare in a Python conference, but it happens. There are really people doing development, serious development work on Windows.
46:42
And so we can always use help there. But as I said, so the main development work, so the core development work is happening on Linux and Mac, so we have GCC and Clang in the loop, but we do not have MSVC in the normal development loop. We just kind of test it after the fact,
47:01
so the people run it through their MSVC compilers, our test suite on their machines, tell us if it works and usually it does not work before release, so we fix it and then when it goes out as a release, it usually works. Yeah, there, I don't know, it's just an idea, but maybe when we install Cython,
47:21
we could have the option of running a test suite to be sure that the machine does work, because maybe not so many people are Windows developers, but maybe they use Cython on Windows and test it and just to be sure, I have these settings and does it work. You can run the normal test suite,
47:40
it's part of the source distribution. Okay, thank you. Okay, then we are at the end. Thank you very much again. Thank you.