Get up to speed with Cython 3.0
Formal Metadata
Number of Parts | 118
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/44793 (DOI)
EuroPython 2019 (78 / 118)
Transcript: English (auto-generated)
00:02
Hey everyone, I'm Stefan Behnel and I'm one of the Cython core developers. Before I start this talk: I'm going to show you a lot of examples of Cython and how to use it in different fields, so I'd like to know a bit about what you're more interested in. So who's never used Cython before?
00:27
It's true, the little guy over there, right? Youngest non-Cython user, we'll introduce him to Cython today.
00:42
Okay, so that's about one third of you. Who is interested in accelerating computational code, with NumPy and all that? Okay, that seems a bit less than the group we just had.
01:04
So talking to external libraries, integrating native code into Python, is that a topic for you? Raise your hands. A couple of people. Anyone using C++? Yeah, surprisingly not so few.
01:22
Okay, so I'll try to focus a bit on what you just said. Okay, starting with myself. So, Stefan Behnel, as I said. I'm a software developer, data engineer, whatever you want to name it these days. I've been using Python since 2002.
01:43
And I'm one of the founders of the Cython project, not one of the original inventors, but when we forked the project from Pyrex, that was in 2007, I was already on board. And I'm also a CPython core dev since this year.
02:04
I do training and consulting in-house, so if you want to know anything about Cython, have it taught in your company at home, you can contact me and I might come over and teach you something. I'll definitely have something to teach you, or look at your code and improve it.
02:23
I've been working for TrustYou since 2017 and I use that as a little introductory example because TrustYou is a data company and that fits very well into the field. So what we do is we have terabytes of hotel guest reviews that we collect from all over the internet.
02:42
And when I say guest reviews, it's literally brain dumps of arbitrary people writing stuff on the internet, right? So we collect them, analyze them in 24 languages, so there's a lot of NLP involved, have surveys that we send out for the hotels, collect them from portals, from partners that we have,
03:03
do text and data analysis on them, analyze sentiment, find out what people are talking about, what kind of different categories they are talking about, analyze trends, find out this kind of information, sell that back to the hotels to tell them how they can improve. So we can tell the hotels, you know, if you renovate your pool or hire someone new for the reception,
03:27
you'll probably improve your score by 10%. That's what we tell them. So this might also be interesting for you. So if you go to trustyou.com and type in the name of hotels, next time you're looking for a good hotel somewhere,
03:42
type it in there and we'll tell you exactly what people actually think about this hotel, right? Not what booking.com wants you to think about the hotel in order to book it. We're independent and so we can give you the actual opinions based on actual reviews, actual data.
04:03
So go there, remember this, and you'll see it's all cut down into different categories and all that, so it's really relevant information that you want to know before you book that hotel. Why do I tell you all this? Well, these are some tools that we use at TrustYou. As I said, we're a data company, so we use NumPy, SciPy, we use scikit-learn for things,
04:26
we use pandas for data analysis, we use NLTK and spaCy for text analysis, lxml wherever we have to deal with XML, PyYAML for configuration and all this, and it's all based on Hadoop processing in Spark.
04:44
Some MapReduce, some whatever Hadoop provides, Hive and all these technologies. So this is mostly what we do. And now, most of these tools are actually partly or even entirely written in Cython, right?
05:03
Who didn't know this? Raise your hand. Interesting, huh? For NLTK that's not entirely true yet, because we're working on a patch for them. NLTK is a pure Python toolkit, and so I've written a PR for them to start compiling certain of their modules
05:26
so that they just run faster if you install the binary instead of the Python package, so we're working on that. Hadoop is out because it's mostly written in Java, but the rest is on our list.
05:42
Who's seen this? Has anyone not seen it yet? Okay, so that's the image that the EHT project took of a black hole a couple of months ago,
06:00
and the interesting thing about this here is that these are the tools that they use for calculating this image out of the raw data. And as you may notice, they're all Python tools, right? They use Pandas, NumPy, Astropy, Scikit-Image over there,
06:23
SciPy, all these tools, all Python tools for aggregating the, again, I think they start with petabytes of data coming from eight different telescopes being taken across ten days in a row, right? Like, huge telescopes all around the world.
06:43
Collected the data, cut it down into manageable data sizes, and then calculated this image out of it, and that was all done in Python. And now if you look at the tools that they used, all of them actually use Cython inside, right? So that was a huge Cython-based project, although they didn't, you know,
07:04
largely advertise that, but it was. Okay, so what is Cython actually used for? It's used for integrating native code with Python, it's used to speed up Python code in CPython, and some people even use it to write C, you know, without having to write C.
07:27
So basically we write C so you don't have to, okay? Let's skip that. Gradual typing. Does anyone know what gradual typing is? Heard the term before?
07:40
Maybe in the context of PEP 484, the typing annotations in Python. So the basic idea is there are two main different, you know, ways to type languages. One is statically typed, which tends to give you fast languages, but it tends to be cumbersome because you have to type everything and it's very annoying.
08:04
And then there are dynamically typed languages like Python, which are easy to use, but tend to be slowish, kind of. It's not that simple, but more or less those are the two fields. And gradual typing is somewhere in between. It says, you know, you can use typing where it helps, where it's documenting,
08:22
or where it makes things faster. That's where you should use typing. And for everything else, it's fine if you don't use it and don't need it there. So the term was coined in 2006. There's a blog post, just look for "What is gradual typing" and you'll find it. And it's the basis for the PEP 484 type annotations in Python.
08:44
It's really the best mix of static and dynamic typing. You can use dynamic typing for the ease of development and optional static typing for safety, speed, and documentation. And you should really only use static types where they help. Right? That's important.
09:00
And so don't type all your things. There are limits to what you should use types in your code for. And this is especially important when it comes to Cython, because what Cython gives you is types in Python. And it uses them to accelerate your code.
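As a rough illustration of that idea in plain Python (the function names and the numbers are made up for this sketch): only the numeric hot path carries PEP 484 annotations, and the glue code around it stays untyped.

```python
from typing import List


def mean_score(scores: List[float]) -> float:
    # Hot path: annotated, so a type checker (or a compiler such as Cython)
    # knows what goes in and what comes out.
    total: float = 0.0
    for s in scores:
        total += s
    return total / len(scores)


def report(hotel, scores):
    # Glue code: left dynamically typed, types would not buy much here.
    return "{}: {:.1f}".format(hotel, mean_score(scores))


print(report("Hotel Example", [4.0, 3.5, 5.0]))
```

The annotations change nothing about how CPython runs this; they only become a speed tool once a compiler makes use of them.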
09:21
So, Cython is a pragmatic programming language. It's gradual typing for Python, and it's an optimizing compiler. So it takes your Python code, or your type-annotated Python code, and compiles that: it translates it to C, and that compiles into a native extension module for Python that you can just import like any other binary module.
09:42
It's production proven, it's widely used, as you've seen lots of tools using it. And it's really all about getting things done in the same way that Python is. It helps you keep your focus on functionality, removes the boilerplate for writing native modules. It allows you to move freely between Python and C. You'll see that in a minute.
10:02
So it gives you a language that is as Pythonic as you want, and as low-level as you need it. Okay. So here's a demo. I'll make that bigger. Okay. Who's never used a Jupyter notebook?
10:24
Again, the kid over there. See? Teach him. So pretty much almost everyone else. And it's a very nice and interactive way to use Python,
10:41
to jump around in your code, to show stuff. So what I have here is a Jupyter notebook. And as you probably also know, Jupyter supports lots of different languages. I remember that it was 14 different languages, like, years ago. It's probably 20, 30-something in that order now.
11:05
And it supports Cython. So Cython is included in there. If you just say %load_ext Cython, and have Cython installed, obviously, then that teaches it how to run Cython code, how to compile Cython code in a Jupyter cell. So I'll just show you what I'm using here.
11:20
Python 3.7, fairly old version, but anyway. Cython 3, fairly new version. NumPy, GCC 7. Oh, yeah, and that's something I should mention. Since Cython compiles here, since it translates the code to C, what you need now is a C compiler.
11:41
Python is nice and easy. You write the Python code. You just run Python, run my code, and it runs it. In Cython, there's a compilation step before that, because in the end, you get a natively compiled module, native code, and that requires a C compiler. But that's it. That's all you pay.
12:01
Okay. So, very simple example. I'm going to take the Python math module, import the sine function, and calculate sine of 5. Okay? It's a bit boring. I can do the same thing in Cython. And one difference you notice here,
12:23
so first thing is, this is how I declare a Cython cell. I just tell Jupyter, you know, this cell is no longer running in Python. It needs translating with Cython, so it needs to be passed through Cython. Please compile a binary module for me. And then what Jupyter does in the back is,
12:42
it imports that module and executes it along the way. And that's also the reason why there's a print statement down here, rather than just the sine of 5, as you would know from Jupyter. So in Jupyter cells, in Python cells, the last result of an expression kind of falls out of the notebook and gets displayed.
13:03
And in a Cython cell, since the cell is compiled to native code, that doesn't work anymore. So Jupyter can't look into native code. It just executes some module, and there's nothing falling out of it. So I have to explicitly say print. But that's fine, I think. Okay, gives the same result.
13:22
Now, as I said, since Cython compiles C code here, since it translates to C code, I can start using C functions now, or C functionality. And what I do here is, I take the sin function from the libc math header.
13:43
So this is the C sin function. And for now, I'll just assign it to a variable, and then I can call that from Jupyter. Who's surprised? Anyone? No one? Couple of people over there? What's surprising here?
14:03
Yeah, no compilation, and nothing you can see. So the Cython annotation up here makes it a Cython cell, compiles it for me. But the funny thing is, just by assigning the C sin function here, I can actually call it from Python.
14:22
Right, so now Python knows how to call a C function. That's interesting. The reason for that is that Cython is a typed language, since C is a typed language. So the sin function is something that knows that it's a function, a C function that takes a double floating point value in and returns a double floating point value,
14:44
so it knows the input and output values, and that allows Cython to directly wrap this function as a Python object. And when I assign it to a global Python name here, it does it for me. Because, you know, that's an obvious assignment, right? I assign a C function to a Python variable.
15:00
What else should it do than wrap it for me? Obvious, right? And it does it. So this is kind of the quickest way to wrap a simple C function for Python. Here's the same thing spelled out. So I'm again asking Cython to learn
15:21
what the C math header is, and then I write a Python function. You can see that Cython allows me to declare C argument types, as I would in a C function, and then I can just call the sin function,
15:41
and this is me manually wrapping this C function call here in a Python function, which essentially does the same as above here, so it's not automatic. So when I run this, and down here, now I can call the Python function, and it calls the C function internally.
16:02
Okay. That's still a bit boring, because as you've seen here, Cython can also do this happily on its own. It doesn't need my interaction, my code writing for this. So the point when it becomes more interesting to do this
16:21
is when I can move more functionality into the Cython layer. I can easily call this C sin wrapper up there, that I just wrote, with x squared as input, but that would calculate the x squared in Python, then take the result of that, pass it into a C function,
16:41
calculate that in C space, and return it for me, and return that as a Python object again. If I do the squaring also in Cython, then that gets translated into C code as well, and now the whole expression, the whole function body itself
17:01
runs in C at C speed, rather than me doing something in Python, then in C, and then passing it back to Python. So the nice thing about Cython is that it allows me to move functionality freely between Python space, Cython generated C space,
17:22
and at the end also the low level C written code. And it's up to me as a developer to decide where I want to put my functionality, how I want to implement this. In some cases I have an existing library, the C library for example, that I want to call. I can just do that.
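A minimal sketch of moving the squaring into the function body, written in Cython's Python-compatible syntax so that it also runs uncompiled. The name `sin_of_square` is made up, and `math.sin` is only a stand-in here: in the talk the C `sin` from libc is called instead, which is what lets the whole body run in C. The `except` branch is a hypothetical stand-in so the sketch still runs where Cython is not installed.

```python
import math

try:
    import cython  # shipped with Cython; also importable without compiling
except ImportError:
    # Stand-in for this sketch only, so it runs without Cython installed.
    from types import SimpleNamespace
    cython = SimpleNamespace(double=float)


def sin_of_square(x: cython.double) -> cython.double:
    # When compiled, x is a C double, so x * x is a C multiplication
    # instead of a Python object operation done by the caller.
    return math.sin(x * x)


print(sin_of_square(2.0) == math.sin(4.0))
```

The caller now passes plain `x` and gets the result back as a Python float; what happens in between is up to the compiled code.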
17:41
I can declare it in Cython, call it from Cython code, do some more stuff in Cython space, to keep it below the Python level, and when I'm done with it, then I return to Python. So I have all three levels available in one programming language,
18:01
and that gives me a lot of freedom for moving around, optimizing code, in these three levels of performance also. So here's an example I'll go through somewhat quickly.
18:21
It's about speeding up NumPy code. Any questions so far on what I presented? Anything you would like to know or clarify, anything you didn't understand? I'm not always as clear as I want to be. I hope I was. Apparently I was, so thanks.
18:42
Next example, NumPy. So the idea is we're going to calculate the average tax rate for Germany. I only have the numbers from 2016, so they'll have to do. At the time, there were 44 million people working,
19:04
so paying income tax, let's put it that way, and the average at the time was 3,703 euros per month. So the average income was, at the time, about 44,000 euros per year.
19:25
Okay, that was 2016. It probably didn't change that much. It's still, you know, it's still an average. So the problem is I then tried to find official data for this, and I couldn't find anything really,
19:40
probably due to data protection or something. Like there are probably a couple of people who earn that much that you would identify them by their income. So there's no official data for this, and what I did instead was I just took a log-normal distribution, fit it to the one data point that I have,
20:01
made it kind of look good, like this, and it's kind of not too far from what you would expect for an income distribution, okay? So just assume this is actual data. Okay, so let's calculate everyone's taxes.
20:24
When you look up the tax rate calculation in Wikipedia, what it gives you is a beautiful Excel formula. This is actually German Excel, right? So it says WENN, and that's like IF.
20:42
And then it gives you like lots of formulas, so there are apparently different income ranges that are chained against each other and that have different formulas for them to calculate the tax rate, so the income tax for that income range. Okay, so you can easily translate that into Python,
21:03
and it becomes this, which I think is a bit more readable, especially because it allows you to ignore the right sides of the formula here and just go, ah, look, there are different income ranges, and then it does something, just ignore the formulas, right?
21:23
So this is how the income tax is calculated. And then in order to get the average tax rate, we take the sum of the taxes divided by the sum of the incomes. Okay, so I'll set up my, you know, kind of fake income list.
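The shape of that calculation can be sketched in plain Python. The brackets and rates below are made up for illustration and are not the real German tariff; the log-normal parameters (a mean of about 44,000 and `sigma = 0.5`) are likewise assumptions standing in for the fitted distribution from the talk.

```python
import math
import random


def tax(income):
    # Hypothetical brackets and rates, for illustration only.
    if income <= 9000:
        return 0.0
    elif income <= 54000:
        return (income - 9000) * 0.24
    else:
        return (54000 - 9000) * 0.24 + (income - 54000) * 0.42


def average_tax_rate(incomes):
    # Sum of all taxes divided by the sum of all incomes.
    return sum(tax(i) for i in incomes) / sum(incomes)


# Fake incomes: log-normal with a mean of about 44,000 (sigma is a guess).
# The mean of a log-normal distribution is exp(mu + sigma**2 / 2).
sigma = 0.5
mu = math.log(44000) - sigma ** 2 / 2
rng = random.Random(42)
incomes = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]

print("average tax rate: {:.1%}".format(average_tax_rate(incomes)))
```

Every income goes through one Python function call plus a few float comparisons, which is exactly the per-item overhead the later versions try to remove.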
21:54
I think I cut it down a bit. I'm not taking the 44 million, that's just a fourth or something.
22:01
And now, whenever you do timings, don't forget to disable the energy management on your laptop, because otherwise you get funny numbers. Okay, so it's going to take a while,
22:21
and I can already show this a bit, so what we're going to do here is, since we're optimizing this code, I have a little function that just collects the different timings from different implementations, and so I remember the timing here, and then we'll use this function to show the differences.
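Such a timing collector could look roughly like the following. The names `record_timing` and `show_timings` are hypothetical, and the helper used in the talk's notebook may work differently.

```python
import time

timings = {}  # label -> seconds


def record_timing(label, func, *args):
    # Run func once, remember how long it took, and return its result.
    start = time.perf_counter()
    result = func(*args)
    timings[label] = time.perf_counter() - start
    return result


def show_timings(baseline):
    # Print every timing together with its speedup over the baseline.
    base = timings[baseline]
    for label, seconds in timings.items():
        print("{:<12} {:8.4f} s  {:6.1f}x".format(label, seconds, base / seconds))


record_timing("python", sum, range(1_000_000))
show_timings("python")
```

A single run per implementation is noisy; for numbers that matter, repeating the measurement (for example with the `timeit` module) would be the more careful choice.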
22:42
So it took three seconds to calculate this whole thing in Python. Okay, so it's 3.2, okay. So that's our baseline.
23:00
Python has factor one. Now you can implement the same thing in NumPy, and that gives you something like this. So we're slicing the income array, then doing some computations on them, and then build the sums on it. Okay, that's kind of heavy NumPy code,
23:21
but that's how you can do it in NumPy. And now we can calculate the whole thing in NumPy. It should be faster. So we're down to 62.1, and that is 50 times faster than Python.
23:42
Okay, there's a different way to do this. Still, you can take the Python function I've written and wrap it for NumPy so that it can apply it through the whole array, one item at a time, and then you can do the same formula, sum up the taxes and the incomes, and divide one by the other.
24:01
That, again, is going to take a while. That is slower than the slicing version, but it should still be faster than the Python version we had, and it's faster by quite a bit. So that gives us 849 milliseconds, and that is four times faster than Python still, but it's completely dwarfed by the NumPy version.
24:27
Okay, enter Cython. Here's a plain copy from the Python code I had above. It's doing the same thing, and now what I change is I compile the whole thing in Cython and then do the calculation again.
24:43
Maybe I shouldn't show this yet. As you can see, it takes a while to calculate this. Actually, it already takes a while to...
25:01
All right, it's running here. Okay, so it takes 2.74 seconds, and as you can see, that is 17% faster than the Python version,
25:24
which is nice, right? It's this little line up here, and so how much is that? So that's 8 characters for a 17% speedup. There are certainly areas where that's acceptable, right?
25:43
Okay, but we can certainly do better. Now, where Cython shines is static typing, and I told you about gradual typing, so what we're going to do now here is we'll gradually start typing the Cython code. And I'm going to use the
26:03
PEP 484 notation for it. So, the income that the tax function here is calculating is definitely some double, right? A C double value. It's perfectly fine to represent that because it's, you know, some floating point
26:20
and double is just fine for these calculations. Okay, and it returns. Well, it does some calculation afterwards, and so it definitely also returns a double safely. Now, next thing I notice is this function is only used internally inside of my module. I'm not exposing it anywhere.
26:41
I'm really just using it down here. So what I can do is I can convert it to a C function, which is faster to call than a Python function because it has different call semantics. In C code, you can call C functions pretty much without overhead. In Python code, calling a Python function is a lot of overhead.
27:02
Also in Cython code, calling a Python function is a lot of overhead, because they are called with argument tuples, maybe even keyword arguments, and so creating these objects even just for calling the function takes a lot of time. In C, the arguments are just passed via the stack or
27:20
a register, so that's very, very fast, as fast as it gets. So one thing I can do is I can declare this function as a C function. Okay? And now when I compile this,
27:45
then Cython compiles it for me, and I can time it again, and we're down to 205 milliseconds. So when I compare this to what we had before,
28:05
that is now quite a bit faster than the Python version by 15 times,
28:20
and the compiled version that we had before is still 13 times slower. So this was a speedup by 13 times compared to the version we had before. Why is that? Because one thing is the call overhead that I completely removed.
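The typed version described here might look roughly like this in Cython's Python-compatible syntax: `cython.double` annotations on the argument and return value, plus the `@cython.cfunc` decorator to turn the inner function into a C function. The brackets and rates are again made up, and the `except` branch is a hypothetical stand-in so the sketch also runs where Cython is not installed.

```python
try:
    import cython  # shipped with Cython; also importable without compiling
except ImportError:
    # Stand-in for this sketch only, so it runs without Cython installed.
    from types import SimpleNamespace
    cython = SimpleNamespace(double=float, cfunc=lambda func: func)


@cython.cfunc
def tax(income: cython.double) -> cython.double:
    # Hypothetical brackets and rates, for illustration only.
    if income <= 9000:
        return 0.0
    elif income <= 54000:
        return (income - 9000) * 0.24
    else:
        return (54000 - 9000) * 0.24 + (income - 54000) * 0.42


def average_tax_rate(incomes):
    # Stays a normal Python function, so it can be called from outside.
    total_tax: cython.double = 0.0
    total_income: cython.double = 0.0
    for income in incomes:
        total_tax += tax(income)
        total_income += income
    return total_tax / total_income


print(average_tax_rate([8000.0, 30000.0, 90000.0]))
```

Run uncompiled, this behaves like ordinary Python. Once compiled, `tax` is no longer visible from Python code outside the module, which is fine here because only `average_tax_rate` needs to be public.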
28:41
So this is, you know, that's kind of the inner function of my loop where I'm doing a lot of work, right? And the call overhead for that function was removed, so that calling that function, going into that function takes basically no time now. But a second thing happened. By typing the input and output arguments,
29:03
Cython understood this function better and managed to generate plain C code for it because it knew now that income, you know, that's a C double, it didn't have to do any object comparisons anymore. It can calculate this whole thing in C space now. So by typing some variables,
29:21
by typing some arguments in there, Cython takes decisions about my code and adapts it to the variable types. And you can see that with a functionality called annotate. So cython -a gives me HTML output. And in here, it replicates,
29:43
so it outputs an HTML snippet into my notebook that replicates my code. And in here, sorry, in here you can see when I click on one line, this is just plain C code, right? Income greater than something. Same thing here. Take it, formula, go to something, right?
30:04
In Python, that would look a lot more involved. It would do lots of Python C-API operations, lots of object operations along the way, and now it really generates plain C code. And it gives me a speedup of 13 times.
30:24
So I would then continue this. There's a lot more I can do at this front. And I think the final speedup in the end that I usually have is... Do I have it here?
30:42
Yeah. So in the end, I usually manage to get it down to something like 11 milliseconds from the current 200. So that is another factor of almost 20. I'm going to switch to a different example here.
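The factors quoted along the way can be checked with a few lines of arithmetic. The timings are the ones mentioned in the talk, with 3.2 seconds taken as the Python baseline; the labels are made up for this sketch.

```python
# Timings quoted in the talk, in seconds.
timings = {
    "python": 3.2,
    "numpy slicing": 0.0621,
    "numpy per-item wrapper": 0.849,
    "cython, untyped": 2.74,
    "cython, typed cfunc": 0.205,
    "cython, final": 0.011,
}

baseline = timings["python"]
speedups = {label: baseline / seconds for label, seconds in timings.items()}

for label, factor in speedups.items():
    print("{:<24} {:6.1f}x faster than Python".format(label, factor))
```

The step from the untyped to the typed Cython version is 2.74 / 0.205, a bit more than 13 times, and the final step from 205 to 11 milliseconds is a factor of almost 19.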
31:01
So any questions regarding this topic so far? Okay, well, yeah, I think you were first over there. Yes. Well, cfunc is actually explicitly saying this is a C function, which is the same as declaring
31:21
it as a cdef function, right? cdef has more meanings than that. So this is more specific. For the others, the question was: there's a different syntax in Cython, which is not Python compatible. And there's a special file type also,
31:41
which is .pyx instead of .py, which allows you to use that syntax. And I've only been using Python syntax here. You can do the same thing in a special Cython syntax, which is more relevant when you start talking to external C code, because that is ground where you cannot
32:00
cover the same thing with Python compatible functionality anymore. And that's where we use the second syntax. But as you can see, you can get very far by just adding decorators, Python type annotations, and so on and so forth, maybe making your code a little more C-ish than it used to be.
32:23
That also helps in a lot of cases. Okay, question over there? So if you were to actually have that in a code base, would you put this part of the code in a pyx file, or would you have it in a Python file and only annotate? Do you Cythonize the file and compile it, and then you ship, or do you ship it,
32:42
and then if you have Cython on the go, it knows that it needs to compile? So the question was, this is in a notebook, right, where it's nice you can compile a cell or use a Python cell, mix them as you wish. How's that when you have code in the module, in the Python module? How do you optimize that?
33:01
So Cython compiles one module at a time, meaning you would normally pass the whole module through Cython and compile it. It rarely hurts to also speed up a couple of other places in your code without actually type annotating them. So if you just want to optimize one function, you can still compile the whole module, right?
33:21
This might not even be a bad idea. Different choices that you have. One is Python often uses this concept of an accelerator module, where you have a Python implementation of something, and then you have an external native implementation of it,
33:43
which you import in the end, and if the import works, then you replace some function in the Python module with some faster function, something like this, right? And you can do the same thing in Cython, right? You can externalize one function into a separate module, compile it in Cython or not,
34:00
and then import it from there, right? And if it's compiled, then it's probably faster. If it's not compiled, then it's as fast as Python will run it, or as PyPy will run it in that case, for example, right? So that is fairly nice. The syntax actually allows you to optionally compile code, right?
34:22
You can have it as fast as PyPy can run it, and in Cython, you can compile it and get it as fast as Cython can run native code. So nice. Um, okay. Um... I'm currently looking for a little example that I had.
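The accelerator-module pattern described here can be sketched in plain Python. This is a minimal illustration, not code from the talk: the module name `_fast_sums` and the function name `sum_squares` are hypothetical, standing in for a module that may or may not have been compiled with Cython.

```python
import importlib


def slow_sum_squares(values):
    """Pure-Python reference implementation; always available."""
    total = 0
    for v in values:
        total += v * v
    return total


def load_sum_squares(accel_module="_fast_sums"):
    """Return the compiled function if its module imports, else the fallback.

    "_fast_sums" is a hypothetical name for a Cython-compiled module.
    """
    try:
        module = importlib.import_module(accel_module)
    except ImportError:
        # No compiled module around: run as fast as the interpreter allows.
        return slow_sum_squares
    return getattr(module, "sum_squares", slow_sum_squares)


sum_squares = load_sum_squares()
print(sum_squares([1, 2, 3]))
```

Callers only ever use `sum_squares`, so the same code runs whether or not the compiled module is present.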
34:43
Those related... Yeah, okay. I mean, that answers the question, right? Okay. Okay, anything else? Then the next example that I have is C++. Just a very quick example.
35:02
So, as I said, Cython generates C code by default, but it can also generate the C code into a C++ module, and that allows you to use C++ code along the way from your Cython code. And that is very nice from Cython,
35:22
because C++ is an object-oriented language, Python is an object-oriented language, so you can mix two object-oriented languages in the same Cython module, and they look totally like Python when you use them. Even C++ looks a lot like Python when you use it from Cython,
35:40
but it gives you a nice additional standard library, like fast data types, for example, fast container data types through the STL, or anything you might be able to write in C++ yourself and want to wrap it for Cython. So that is a very nice addition to Cython, actually. Here's a very quick example. One thing you have to change is you have to tell Cython explicitly
36:03
that the language it should use is C++. There's more than one way to do this, but there's a common one, especially from a notebook, and then that enables access to the C++ STL declarations that we already ship in Cython, because they are so commonly used. And then you can use this syntax here
36:24
to say I have a variable v that's a C++ vector of ints, and then I just push a value in there and return the vector. What does this do? Returning a C++ vector from a Python function? What do you think happens?
36:45
Yeah, it would transfer it to a list. It would copy it into a list, right? Because the obvious representation of a C++ vector in Python space would be something list-like, and you would use a Python list for that.
37:01
So simply returning something that Cython understands as list-like will copy it into a Python list automatically. So it will create a Python list for you, copy all the values from the C++ vector into it, and return that. All automatic.
37:21
If you don't want that, you can implement your own little thing in whatever way. You can use a list comprehension to get something out of the C++ vector. Or you can return a generator expression that uses the C++ vector. You can do all these kind of real things by mixing Python features with C++ features.
37:43
But this is the simplest and most straightforward way to do it. Okay, so when I call this function, then I get a list back, which is not very surprising. This is a very common way to wrap C++ objects for Python.
38:04
I can use a so-called cdef class. So a cdef class is an extension type in Python. So it's a native, low-level implemented class, like a Python class, but implemented in C.
38:21
And in Cython, these extension types allow me to directly use C++ objects as instance attributes, that's an instance attribute. And I'm using the vector of int as values attribute. And then the lifetime of this C++ object
38:41
is tied to the lifetime of a Python object. So it's automatically created when I create a Python object. It's automatically de-allocated for me when my Python object goes out of scope, so I don't have to care about any memory management. It's all automatic in this case. Very nice. And so this is just a very tiny example
39:00
that uses an integer list-like object, wrapper object in Python. I can add values to it, which just uses the push_back C++ function to push it into the C++ vector. And as you can see from the usage here,
39:21
that totally looks like I'm using Python code. I'm calling a method on something. I don't care if it's a C++ object or a Python object. Just call it. I also like the repr implementation here. What do you think this does?
39:47
It runs repr on the list. So it does the same thing as we had before in the function. It creates a Python list from the values. This is actually very expensive. But hey, how often do I actually need repr?
40:03
So what it does is it copies all the values into a Python list, uses repr on that, and then gives the string back to Python. If it's only something I occasionally use, why not? It doesn't cost much. OK, so much for C++.
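The shape of that wrapper can be approximated in plain Python. This is a stand-in sketch, not the Cython code shown in the talk: `array('i')` plays the role of the C++ vector of ints, and the class name `IntList` and method name `add` are assumptions made for illustration.

```python
from array import array


class IntList:
    """Pure-Python stand-in for an extension type holding a C++ vector of ints."""

    def __init__(self):
        # array('i') takes the place of the C++ vector attribute here.
        self._values = array("i")

    def add(self, value):
        # Corresponds to calling push_back on the vector in the Cython version.
        self._values.append(value)

    def __repr__(self):
        # Copy all values into a Python list, then reuse the list's repr.
        return repr(list(self._values))


numbers = IntList()
numbers.add(1)
numbers.add(2)
print(repr(numbers))
```

The copy inside `__repr__` is the expensive part mentioned above; it is acceptable because `repr` is rarely called in hot code.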
40:21
Any more questions on that field? Last time you checked, we didn't have bitvector. You mean in the C++ declarations that we ship? Might be. Please provide a pull request, and we will happily add it.
40:44
Anything else? GIL handling. I actually have another example for that as the next and last example that I'm showing. OK, so here's an example of wrapping an external library.
41:04
OK. That's the last use case that I wanted to show here. And so far we have seen, you know, this cimport something, libc.math or so. It's a bit of magic. And this is unpacking the magic, kind of. What we ship is Cython declarations of most of the C++ header set of the STL.
41:29
Well, C++ is huge, but much of the STL. And this is how it's spelled out in the end. There's a special syntax for that, and so what I want to do now is
41:41
I write a little wrapper for the Lua runtime. So I'll execute Lua code from Python. And for this, I just need to declare a couple of functions that the Lua API defines. And then up here, declare how to link against this. This is a bit of a hacky way to do this right in the notebook.
42:03
There are more, you know, nicer ways to do this from the setup.py file. But this works for now. And then once these are declared, I can use them in my code. So I define a Python function. It takes a code string, converts it to UTF-8 if it needs to. So it accepts Unicode and byte strings.
42:21
So you can just use a normal Python string from Python 3 and pass it in there. Then it creates a new Lua state, that's a Lua runtime representation. If that fails, then it raises MemoryError. That is also fairly nice, right? Do you know what you would have to do in C in order to do this?
42:41
Happily, there's a C API function for it. But just being able to say, you know, do some C stuff, if that fails, if I get a null pointer from some memory allocation back, raise memory error? That is nice. That is what you want in your code. So that's totally the Python way of doing it.
43:02
So then I have a try and except. Sorry, try-finally, and at the end, in the finally, I delete the Lua runtime again. So if anything goes wrong along the way, then I make sure I clean up the memory. Because C requires manual memory management, right? I'm responsible for cleaning everything up when something goes wrong.
43:23
And in Cython, I can use try finally, and it's just going to work. So here's Lua function to load the code into Lua, execute it from there, and then a tiny bit of return value adaptation
43:41
to return whatever number Lua wants to return here. I just quickly run this, and notice that I didn't run the beginning. And then run it again.
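The control flow of that wrapper function can be sketched in plain Python. This is not the Lua binding from the talk: `FakeRuntime`, `new_runtime` and `run` are hypothetical stand-ins, and `execute` simply returns the byte length instead of running any code. Only the structure is the point: encode text input to UTF-8, raise `MemoryError` when allocation yields nothing, and release the runtime in a `finally` block.

```python
class FakeRuntime:
    """Hypothetical stand-in for a C-level runtime handle such as a Lua state."""

    def __init__(self):
        self.closed = False

    def execute(self, code_bytes):
        # Placeholder behaviour: fail on empty input, else report the length.
        if not code_bytes:
            raise ValueError("empty code")
        return len(code_bytes)

    def close(self):
        self.closed = True


def new_runtime():
    """Stand-in for the C allocation call; None models a NULL pointer."""
    return FakeRuntime()


def run(code, factory=new_runtime):
    # Accept both str and bytes; the C side would only see UTF-8 bytes.
    if isinstance(code, str):
        code = code.encode("utf-8")
    runtime = factory()
    if runtime is None:
        raise MemoryError()
    try:
        return runtime.execute(code)
    finally:
        # Runs on success and on error, so the runtime is always released.
        runtime.close()


print(run("return 1"))
```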
44:01
And it fails because I didn't set up my Lua properly. Just believe me that this works. Okay. It's not the first time I've shown this example, it's just the first time it failed so heavily, so I'm not going to fix it right now in the last two minutes. Okay.
44:21
One title of that talk was Cython 3. So those of you who already use Cython will probably know that Cython is always this 0.something version kind of thing, right? 0.29 currently.
44:41
So what is this Cython 3, where does that come from? Well, the last Cython version that we have is 0.29, so if you push the dots one digit to the right, and then it becomes 3, the next obvious version is 3.0, right? Okay. So that's kind of frightening, right?
45:00
You jump from 0.0 something to 3.0? Well, what's special about it? First thing is, Cython 3 is Python. Now you're going to say, but it was always Python, so what's special about it? Well, it's Python 3. And as some of you might say, I could always put my Python 3 code in there,
45:21
I just had to say, you know, language level equals 3, and then compile it. Well, it's Python 3 by default. So the language level is 3 by default. So there are a couple of things that we changed now in Cython 3. It's Python 3 by default, so we're going to change mostly the standard configuration of the compiler
45:43
to make it more modern, to remove stuff that isn't appropriate anymore these days, and we're going to adapt it to today's Python 3 world. So these are the main changes in there.
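For versions where Python 3 semantics are not yet the default, the language level mentioned above can be requested per file with a directive comment on the first line. A small sketch, where the function `half` is just an illustrative name:

```python
# cython: language_level=3
# The directive comment above asks Cython for Python 3 semantics.
# Plain CPython treats it as an ordinary comment.


def half(n):
    # Under Python 3 semantics, "/" on two ints is true division.
    return n / 2


print(half(1))
```

Because the directive is only a comment, the same file still runs unchanged under a regular Python 3 interpreter.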
46:00
We're going to adapt to several PEPs that came along the way, and as I said, Python 3 by default, yes. Okay, you can look up the Cython 3 milestone, there's still quite a bit to do along the way, and we'll try to get there. Okay, that's it from my side.
46:24
Thanks for listening.