pybind11 - seamless operability between C++11 and Python
Formal Metadata
Number of Parts | 160
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/33723 (DOI)
Transcript: English(auto-generated)
00:09
Hello everyone, my name is Ivan and I'm a quant at a trading firm called Susquehanna in Dublin, Ireland. And I'm actually surprised to see so many people here, given that there is a competing talk on Cython right now in the main room.
00:25
So thanks for coming. And today I'd like to talk about C++, which may seem like a weird choice of topic for a Python conference, but I think it's pretty relevant and interesting. There have been a few good talks this week on CPython and CFFI and PyPy, so I think it fits in quite nicely and I hope you enjoy it.
00:46
So let's talk about Python extensions. So what exactly is a Python extension module? And by Python I really mean CPython here. A Python extension module is something that you can import from Python, but it's not written in Python. And that normally means it's written in C or C++.
01:02
Because CPython provides a C API. These days you can also write extension modules in other languages like Rust or Go. So why would you do that? The main reason usually is to be able to interface with other libraries.
01:22
Maybe you want to use some part of TensorFlow that's not been wrapped in Python yet. So you can do it this way. Or maybe you write your code in C++ and you want to interface with that. Another pretty common reason is writing the performance-critical code, the numeric code. It does the number crunching very fast and then you can expose it to Python.
01:43
And the two less obvious reasons that I found really important in my own work are these. So if you have non-Python libraries written in C++, for example, you can mirror the API in Python literally one to one.
02:02
And this lets you prototype things very fast, in a Jupyter notebook, for example. And then you can just translate it back to C++, just add the semicolons and change the for loop syntax. And if you do that, you can also run the tests in Python. It's not that you can't do that in C++, it's just a lot easier. There's a very nice test framework.
02:21
And if you're testing numeric code, it's very nice to be able to use NumPy and Pandas and all the other tools. And the last two points actually play well together: you can start prototyping things and then write tests in Python. And then translate things back to C++, but your tests just stay the same, which confirms that you did it right.
02:44
So it's possible to write Python extensions in pure C, and there have been quite a few talks on this as well. But if you're going this way, you need to have a few skills, like you have to be good at ref counting. And I don't think anyone's good at ref counting unless your name is Larry Hastings.
03:03
And there's exception handling that you have to do manually. And by that I really mean C-style error handling. You have to be able to type fast and have a good keyboard, because you'll type a lot. And you'll make some errors, and if you don't make some errors, then those Python core devs will change the API and make them for you.
03:24
So that's quite a lot of pain to go through. So here's an idea. If we can translate Python into Python C API calls, then instead of running it, why don't we just translate it to C? And then we can augment Python with some fancy new syntax, so we get pointers and references and all that.
03:47
And it kind of works, and there are libraries that use that heavily, like scikit-learn and Pandas. The most numerically intensive routines are actually written in Cython. But there are a few problems with that as well. So first, you're not writing C and you're not writing Python, and at times it's really hard to figure out what it is you're writing.
04:05
And as I checked a few days back, a two-line Cython module generates 2,000 lines of C, so it's quite a lot. You have multiple build steps, and IDEs usually choke on that. It has limited C++ support, so it's like stuck in 2003: it has a few new features supported, but most of them are not.
04:23
And it has limited support for metaprogramming and generic types. You have to create stubs for everything that you use from C. And so I think it's good for wrapping a few functions, again, kind of like Pandas does it. It's not so good for managing a huge codebase. My biggest gripe really is this: debugging compiled Cython extensions is just a complete pain, and I just want to illustrate that real quick.
04:45
So here we have a function that does nothing, but it does this nothing n times. Right, so I'm pretty sure if you pass it to PyPy, it would just not generate any code at all. So if we run Cython on that, then nothing good happens.
05:04
So this is the code for just one line, for i in range(n), where we told Cython that n is an integer, so what could i be? And so you would expect a C for loop. I'm not going to zoom this in, there's nothing interesting in there, it's just a bunch of Cython C code really.
05:21
And so what's wrong? Turns out that I forgot to tell Cython that i is an integer; for some reason I had to do that. And then what it generates is still not nothing, it's something, and it looks pretty bad, but you can actually see the for loop. But if you were to debug this and step through it in GDB, it would be just a complete pain.
05:42
So here's another idea, let's use Boost. And Boost is just a humongous C++ library that does everything. So if C++ could make coffee, there would be a Boost coffee library for it. And there's Boost.Python, written by Dave Abrahams, who's also the author of the Boost.MPL metaprogramming library.
06:02
And the problem with that is that you have to build it. Depending on your platform, that may or may not be easy. It also requires Boost, which, the last time I checked, was a million and a half lines of headers. And it uses weird tools for building. And on top of this, because Boost is compatible with everything, like old compilers,
06:22
it doesn't use the new language features, so it uses its own metaprogramming library. So it takes a very long time to compile, and you end up with huge binaries, and it doesn't attract new contributors, because it's really hard: you actually have to know the entire Boost to do anything with it. And just a disclaimer, this is not a talk about bashing Cython or Boost.
06:44
They're both really good options. If you're already using Boost, then Boost.Python may be a really good choice, and Cython is a really good choice if you're just wrapping a few functions. And so I just wanted to introduce yet another library that's sort of like Boost.Python, but it does things in a more lightweight fashion.
07:04
So it allows you to interact with the Python interpreter, or embed the Python interpreter in C++ code. It's header-only, has no dependencies, and doesn't require any build tools. It's very small, the core code base is like 5,000 lines. It's optimized for binary size and compile time,
07:21
so we've seen one of the big projects that converted from Boost.Python to pybind11 go down by a factor of five in both binary size and compile time, so that was quite big. We support GCC, Clang, Visual Studio, the Intel compiler, Linux, Windows, macOS, CPython 2 or 3, and PyPy. And yes, I really said PyPy.
07:42
We require C++11, but some new features from C++14 and C++17 are all supported. There is support for NumPy without having to actually locate and include NumPy headers. There's support for embedding the interpreter, and there's a whole bunch of different functions and features of C++ that we support.
08:03
And here's a link to the GitHub repo. So I'll just try to walk you through that using a few examples. There's not enough time to cover all of it, but I hope you'll have a good understanding of how it works. So we'll start with a simple hello world example. That's what you normally do when you learn a new language or framework.
08:22
As for prerequisites, you need CPython or a modern PyPy, and you need the pybind11 package installed, and some non-ancient compiler. And in all code examples, I will skip these three lines. So it's basically including the pybind11 header, then aliasing the namespace to just py,
08:43
and also defining the pybind11 module, which is like the main extension module. Okay, so whenever there's a variable m, it just means the module that you're currently defining. All right, so let's write a function that adds two integers, like in C or C++. So here we have a function add that takes a and b, returns a plus b,
09:02
and we can call the def method on the module and tell it, hey, here's the function called add, give it a pointer to the function, and you can give it a docstring and a whole bunch of other things so it knows how to generate a Python signature. That's pretty much it. You compile it, it works. You don't have to tell it what the exact signature is.
09:22
It's all inferred from the type. So it's using the C++, like the modern C++ type inference features. Or you can even write it like this. You don't have to define any functions at all. You can just use a lambda, a C++ lambda. So you just tell it, here's an anonymous function, takes a and b, returns a plus b. And this also works, so it's like a one-liner, basically.
09:44
If you compile it, and we'll get to it in a second, it just works like a normal Python module. In fact, it generates the docstrings and the signatures, so you see there's two int arguments, returns an int. You give it one and two, you get three back. If you give it some non-integers, it tells you that the signature's not compatible,
10:01
so it does the type checking. So how do you compile it? Well, there's a few ways. If you're a happy owner of a Linux box, then you can just tell it where the includes are, and that's pretty much it. You don't have to link it to Python or anything, so that's literally the entire line to compile it. You can do the same thing on macOS,
10:21
you just have to add one more flag so it doesn't complain. On Windows, I heard it's possible, but I haven't tried, and it's probably not fun. However, there's better ways. So you can actually integrate it in setuptools, just tell it how to find the pybind11 headers, and that's pretty much it. And you have to tell it that you have to compile
10:41
C++ in C++11 or C++14 mode. That's it, then setuptools will take it from there. There's another thing. So I've actually built this tool specifically for this conference, and just for myself, but it turned out to be pretty useful. So that's a Jupyter notebook extension that you can just load in a Jupyter notebook,
11:02
and it does all the bootstrapping, and it will compile your module and actually cache it. It will enable C++ syntax highlighting in Jupyter. It will forward input and output streams from C++, and that's literally the entire cell for this. You hit enter, it compiles it, it imports it back,
11:21
and you can just use your functions from it. So this one is not released on PyPI yet, but I hope to do it this weekend, and you can find it on GitHub. Finally, if you use CMake, it's just one line; we provide a CMake interface as well. You say just create a module, my add, with these sources, and that's it. All right, so how do we go about wrapping classes,
11:42
because that's what C++ is supposedly all about. So let's create simple bindings for an HTTP response class in Python. And this is sort of like what you would get from the requests library in Python: it's a status and reason and text.
12:02
So, for example, in this class, you can create it with just status and reason, or you can also pass the optional text, which defaults to empty string, or there is a default constructor, which just initializes it with 200 and OK, which is the browser response for everything went fine. So this is quite simple, and we want to mirror this, like, one-to-one in Python,
12:22
so we have the same API. So first of all, we bind the type itself. So we tell pybind11, hey, here's a type Response. Please create a Python type called Response, given as a string. Because there's no reflection in C++, we have to give it the name as a string.
12:40
So this registers the type, and after this line, you can use it anywhere in any function signatures. You can return it. You can take it as an argument. You can nest it within other types. So let's bind the constructors. So in C++, you can have multiple constructors with different signatures, different overloads.
13:01
In Python, you have only one __init__. So what we do here is use py::init, which is a shortcut. It's a template to which you give the signature, the input arguments to your overloads, and then it creates this overload in Python. So in fact, in Python, all these three constructors will be merged into one __init__, and it will do a runtime dispatch.
13:23
So obviously, static dispatch is not possible because Python is not compiled, but it will look like one function, and in the docstring, it would say that it has three different signatures. Now the attributes. You can bind the attributes directly here.
13:40
And what it does, it creates descriptors on the Python side that could be read-only, like get descriptors, or get-set, so read-write descriptors. So we can bind this directly as well. And in C++, we have std::string and int, and in Python, you will have the str type, the unicode type, and an int as well.
14:00
If you have a property, so in C++ a method that doesn't take any arguments, like here, for example, the ok property that tells you the status is not an error, you can bind it as a read-only property in Python, or you can actually bind read-write properties as well with a getter and setter.
14:20
And you can overload operators. So like in C++, there's an equals operator that checks that all the fields are the same. You can do exactly the same thing in Python, so you define __eq__, which takes self and other, and returns self equals other. And because it's such a common pattern, there's actually a shortcut, so you can just include the pybind11 operators header
14:42
and say def of py::self equals py::self, where this equals could be any operator. It could be like left shift equals or anything like that. This makes it very easy to bind objects that may have like 20 different operators, like a matrix or vector type or something like that.
15:03
You can define any method, so here we can define a __repr__, so this type has a proper representation, and you do it like you would do in Python, so we just create a string and format it, return it back. OK, and this is the full binding code, and it's not so much really,
15:21
so it's comparable to, it's less than the initial implementation, and if you were to do this in Python, it would be kind of the same, and maybe more because it doesn't have proper overloads. And you can use it like any normal Python class really, so if you import the type, you have your properties, you have your attributes,
15:40
you have docstrings for everything, the equals operator works, so it all works as expected. Now a few other things on function signatures. So in Python you can have all this, like default values, star args, keyword arguments, and it's all doable here as well.
16:00
So first of all you can name arguments through py::arg. Because C++ doesn't have proper reflection, you need to give it names. You don't have to, but if you want your argument to be called name, like in this case, you have to tell it that, and now if you look at the docstring, it's actually called name,
16:21
so that's nice. You can assign to py::arg, so here we have the same function, but it runs n times, where times is an optional argument that defaults to one. And this works as expected: you can call this function with one argument or two, or provide it as a keyword argument. And you can do other things, like you can take any Python objects as arguments,
16:41
you can take a py::list, which is a wrapper around PyObject. So here for example we count all the strings in the list, and if you were to do this in Python, it would literally look the same, line for line: for item in list, if isinstance(item, str), increase n, then return n back. So that looks very close,
17:01
and it works as expected. You can take star args and keyword arguments as well, through py::args and py::kwargs. So, as I've already said, you can bind multiple C++ functions to a single Python name, and then they would work as overloads,
17:20
so it will do a runtime dispatch on the types. So here we have a function that takes an int or float, and you can pass an int or you can pass a float, and it goes, it gets dispatched to different functions, and if you give it something else, it tells you it's an error. So that's pretty handy. There's a bunch of other things that I would just like to quickly jump through.
17:44
So there's three ways to communicate objects between C++ and Python. The first one is: you created something in C++, you wrap it in a PyObject, and then send it off to Python. And you just sort of store the pointer inside the PyObject, and then we record it in the registered instances map.
18:03
So if it ever comes back to C++, we know that we were the ones who created it, we quickly unwrap it, and it's very fast. Another one is the opposite, where it's native in Python, but it's wrapped in C++, so it's like py::list, py::dict, py::int_, py::str. And on the C++ side,
18:20
we have a thin wrapper around PyObject, the other way around. And the third one is like in the examples that I've just shown, so you have std::string in C++, and str in Python. Now these, and even int really, because in Python int is an object as well, have different memory layouts, so you can't really share them.
18:42
So this would always involve a copy. But there are ways to work around that if you need to share a vector or a map, for example. So some of the types that we support, that have built-in converters, are scalars, strings, tuples, sequences, maps, dicts, sets,
19:01
polymorphic functions, like std::function, and the newer types from C++14 and C++17, like std::optional and std::variant. You can also write your own type casters. It's fairly easy, so you can write, for example, a timestamp type that would work as an int in C++. Once you send it to Python,
19:21
it works as a pandas Timestamp, for example. So that could be quite handy. A few more things on classes. I will not go into this in detail, because it would involve quite a lot of code, but you can do single and multiple inheritance. You can override C++ virtual methods from Python, and it would work, which requires a middleware class to do that.
19:43
You can have custom constructors, so you're not limited to these py::init shortcuts. You can do anything there, just like in Python. You can define implicit conversions, so if the types are convertible implicitly to each other in C++, you can make it so Python works the same way.
20:01
So a function that expects an instance of one type can also take another. Okay? You can overload operators, and you can also define static methods, properties, attributes, and all of that. So there's also the Python interface, so it's everything that starts with py double colon, like py::list or py::dict,
20:23
and we try to wrap quite a lot of it. So it starts with py::object, which is the highest-level object in the hierarchy, and py::handle, so it could be with or without ref counting. There are all the built-in types, like py::module and function and list and int. You can cast things back and forth
20:40
between C++ and Python using this cast operator or cast method. You can call Python functions just using parentheses normally, and I think this is a pretty cool example where we have a tuple of args, and then we have two dicts and a function called, like, engage, and then we call that,
21:01
and we expand both tuple and two dicts, and we actually pass one other keyword argument, exactly like you would do in Python, sort of, and this is, you know, this is still C++. It's just heavily overloaded. It looks pretty cool, I think. You can import modules. You can... There's a bunch of built-ins that have been wrapped,
21:20
like print and format and len, isinstance and all of that. You can run arbitrary Python code as a string if you want to do that. In fact, we have to resort to doing that on PyPy to make a few things compatible because, you know, if they don't have an equivalent for some CPython call,
21:40
we have to do that. You can run... You can evaluate Python files as well. One of the big parts is support for buffer protocol in Python so you can... so you can interact with numeric code. So you can wrap any type, any custom types, to support the buffer protocol,
22:03
and then, for example, NumPy would automatically pick it up. Like, you can just pass it into a NumPy array constructor and it will know what to do. You can build buffers and memory views directly. We also support NumPy. So if you have NumPy installed, you have to include the pybind11 NumPy header,
22:20
but you don't have to go and start locating NumPy itself, we'll figure it out. And there's a few types, like py::array, which is an untyped array, and py::array_t, which is a template for a typed array. There's a lot of functionality, but the few things I'll mention would be bounds-checked
22:41
and unchecked element access, and fast access to array properties like shape, number of dimensions, dtype, all of that, which we did through the NumPy C API. There's support for registering structured NumPy dtypes, and if you've never heard what that is, it's kind of like Pandas data frames, but in NumPy.
23:02
And that was my own contribution to PyBind. There's automatic function vectorization and broadcasting, so you can write scalar functions and then just wrap them so they work on NumPy arrays of any shape, and that's pretty handy. We also support Eigen, if you know what that is. It's a numeric C++ library
23:20
that's quite popular in some scientific circles. And a few other things that don't fit anywhere else: there are different return value policies. For example, if you're returning a reference or a pointer, you can tell PyBind that it's actually a pointer to an internal member, so it knows how to create weak references and garbage collect it.
23:41
You can also ask PyBind to keep one object alive while another is alive, for example if you're iterating over a C++ container and you don't want it to die before you're done, that kind of thing. There's automatic translation of C++ exceptions to Python exceptions. You can also register your own translators,
24:00
sort of like you can do in Boost Python. You can have custom holder types, and we support the default smart pointers like std::unique_ptr and std::shared_ptr. And one last thing I wanted to mention here would be that PyBind does have a runtime of sorts, but it's pretty fast. So the way it works, it has a capsule,
24:20
so that's like a CPython term for a block of shared memory within the interpreter. So when you import a PyBind module, it looks for an existing PyBind capsule, and if it doesn't exist, it creates one, and then as you import other PyBind modules, they look for the same capsule and find it, and they share the same map of registered types
24:41
and registered instances. So that's kind of how it works. And in the last two sections, I wanted to show a few examples of what's possible with this. And one is a callback, which is quite a common thing to do. For example, you have a fast WebSockets library
25:01
in C++, and on events you can pass it a polymorphic function that would be called each time a message arrives, for example. And how do you wrap this in Python? Well, the answer is you can use the polymorphic function type, std::function in C++,
25:20
and it will be converted back and forth to a Python function object. And this is quite cool, because a Python function may actually be a closure that has a scope captured, and a C++ function can be a closure that has other C++ stuff captured as well. And it works nicely together. So for example, here we have a function for_even, so you give it
25:42
an n, an integer, and you give it a function that takes an int, returns nothing, that would be called for each even number, you know, from zero up to n. And you can use it like so, so you have, like, a Python callback. So if you compile that, you have a Python callback that just prints a number, and you just pass that directly, and that seems to work.
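The callback example just described can be sketched in plain C++ as follows. This is only an illustration of the C++ side, not the code from the slides: the name for_even comes from the talk, while the inclusive upper bound and the helper collect_even are assumptions made for this sketch. With pybind11, a function with this signature would typically be exposed with m.def("for_even", &for_even) after including pybind11/functional.h, which provides the std::function conversion; that binding is left out here so the sketch builds with the standard library alone.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Calls `callback` once for every even number from 0 up to n
// (treating n as inclusive is an assumption of this sketch).
// Because the parameter is a std::function, it accepts plain functions,
// capturing lambdas, or, through a binding layer, Python callables.
void for_even(int n, const std::function<void(int)>& callback) {
    for (int i = 0; i <= n; i += 2) {
        callback(i);
    }
}

// Hypothetical helper, for illustration only: gathers the even numbers
// by handing for_even a lambda that captures a local vector by reference.
std::vector<int> collect_even(int n) {
    std::vector<int> seen;
    for_even(n, [&seen](int i) { seen.push_back(i); });
    return seen;
}
```

On the Python side the equivalent call would pass an ordinary function or lambda, for instance one that prints each number, as described in the talk.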
26:03
You can also do this kind of stuff. You can have a higher-order function, so you can make use of capturing closures in C++. So for example, here int_fn is a type, a function that takes an int and returns an int. And apply_n is a function that takes a function, and also
26:20
a number n, and applies this function n times, although it does it lazily, so it returns a function that does that, if that makes sense. So it's kind of like a decorator of sorts in Python. So like, if f is, you know, multiplied by two, and n is equal to ten, it will be like multiplied by 1024.
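A minimal sketch of such a higher-order function in plain C++ is shown below. The names int_fn and apply_n follow the talk as far as the transcript allows; the exact signature and the non-negative check on n are assumptions. The returned lambda captures f and n by value, so it stays valid after apply_n returns.

```cpp
#include <cassert>
#include <functional>

// A function that takes an int and returns an int.
using int_fn = std::function<int(int)>;

// Returns a new function that applies `f` to its argument `n` times.
// Nothing is evaluated here; the work happens when the result is called.
int_fn apply_n(int_fn f, int n) {
    assert(n >= 0);  // assumption of this sketch: negative counts are invalid
    // [f, n] captures both by value, so the closure owns its own copies.
    return [f, n](int x) {
        for (int i = 0; i < n; ++i) {
            x = f(x);
        }
        return x;
    };
}
```

With f doubling its input, apply_n(f, 10) behaves like multiplication by 1024, which matches the arithmetic given in the talk.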
26:41
Okay, and you can note that in the square brackets, we have like f and n, so this is a C++ 11 notation for we capture f and we capture n by value, so this is stored in the, in the closure. And that's pretty much it, like you can, if you compile this,
27:01
we can define a Python function and then pass it there. What's returned back to us is a C++ closure, which is converted to a Python function that we can call, and it still works like a decorator. You can actually go one step further, so the green one, the green apply n is the one from the previous example,
27:21
and the blue one is a factory that creates the green one for a given n. So it's like, if you give me n, I will give you a decorator that decorates all these functions in such a way, if that makes sense. And just for the fun of it, we can bind it under
27:40
the same name, because we have overloads in C++, so we have two different versions of apply_n. One takes an int, and is like a factory function. And another one takes a function and an int and returns a function. And we can use them both at the same time. So this is the first example where we have a function, we say apply_n(f, 8)(10)
28:01
and that gives us 2560. Or we can use it as a decorator, so we say, you know, @apply_n(8), so that's a factory that returns a decorator. We decorate a function and it works the same way as well. So I think that's pretty cool, that's quite a lot of machinery going on here and I'm quite baffled myself
28:20
that it actually works. And last but not least, NumPy support is very important for myself, and I talked to Wenzel, the original author of PyBind, just a few days back. He said
28:40
that this talk is not hipster enough if there's no pandas and data frames in NumPy, so I figured I should provide one example. So here's the full example. Took me maybe 10 or 15 minutes to cook it up. Here we want to compute rolling stats on a data frame or like, no, on a series basically
29:01
of floats. So you have a rolling window; if you don't know what that is, you have a fixed window size that just moves along through the series. Every time it moves, it moves by one element. You recompute some statistic like mean or median or variance or standard deviation.
29:20
So here we'll just compute mean and standard deviation. And the type would be double, right? So we have this rolling_stats function. It takes a py::array_t of double, so it would be like a float64 NumPy array, and it takes a window, and what we do next is, well,
29:41
to make it faster, we don't actually recompute it in full each time the window moves. We can make use of the fact that if you have the sum of the elements in the window and you have the sum of squares, then you can actually infer both the mean and the standard deviation.
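Written out, the relationship between the two running sums and the statistics looks like this. The symbols w, S and Q are introduced here for illustration, and the sample form of the standard deviation (dividing by w - 1) is an assumption chosen because it is the default for pandas rolling std, which the talk later compares against.

```latex
% One window of w consecutive values x_1, ..., x_w, with w >= 2
S = \sum_{j=1}^{w} x_j, \qquad Q = \sum_{j=1}^{w} x_j^2

% Mean and sample standard deviation expressed through S and Q only
\bar{x} = \frac{S}{w}, \qquad
s = \sqrt{\frac{Q - S^2 / w}{w - 1}}
```

The second formula follows from expanding the sum of squared deviations, which equals Q - S^2 / w.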
30:01
And to keep track of the sum and the sum of squares, you can just add one element and subtract one element each time you move through the buffer. And it makes it a lot faster than actually trying to re-evaluate the whole thing every time. I'm not entirely sure what Pandas does, I haven't looked into its rolling API, but it's a little bit slower.
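The running-sum approach can be sketched in plain C++ as follows. This is not the code from the slides: it uses std::vector in place of py::array_t so that it builds without pybind11, the field is named stddev rather than std, and it assumes a window of at least 2, the sample standard deviation, and NaN for positions where the window is not yet full.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One output record per input element.
struct Stats {
    double mean;
    double stddev;
};

// Rolling mean and sample standard deviation over a fixed-size window,
// maintained with a running sum and a running sum of squares.
std::vector<Stats> rolling_stats(const std::vector<double>& values,
                                 std::size_t window) {
    assert(window >= 2);  // the sample std divides by (window - 1)
    const double nan = std::numeric_limits<double>::quiet_NaN();
    std::vector<Stats> out(values.size(), Stats{nan, nan});
    const double w = static_cast<double>(window);
    double sum = 0.0;
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        // Add the element entering the window.
        sum += values[i];
        sum_sq += values[i] * values[i];
        // Subtract the element that just left the window, if any.
        if (i >= window) {
            const double old = values[i - window];
            sum -= old;
            sum_sq -= old * old;
        }
        // Emit a result once the window is full.
        if (i + 1 >= window) {
            const double var = (sum_sq - sum * sum / w) / (w - 1.0);
            out[i].mean = sum / w;
            // Rounding can push the variance slightly below zero.
            out[i].stddev = std::sqrt(var > 0.0 ? var : 0.0);
        }
    }
    return out;
}
```

Each step costs a constant number of operations regardless of the window size, which is where the speed-up over recomputing every window comes from. The sum-of-squares form can lose precision when values are large relative to their spread, so a production version might prefer a more stable update.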
30:21
So as you can see from the code, it's not overly involved, and it uses this unchecked proxy access to NumPy arrays, so we disable the bounds checking because we know we're not going to run outside the bounds. The rest is just like normal numeric code where there
30:41
are some computations stored in the stats array. One thing to note is that stats is a struct here, and what we return back from this function is a py::array_t of this structured type. So this is known as a record array in NumPy, or a structured array. And
31:01
in our module we have to register it explicitly, so we say here's a stats type, it has a NumPy dtype, and it has two columns, mean and std. And they will be translated to Python with these names. And then we just bind the function. So the way this works, if we compile this and try it out,
31:21
we can pass anything convertible to a NumPy array, really, to this function. So with this rolling_stats, here I pass a bunch of ints and a window of 2, and you get back a data frame that looks like this. So obviously this is the running mean, this is the running standard deviation. So if you were to use
31:41
the pandas rolling mean and rolling std, you would get the same result. In fact, let's check it: let's just generate 25 million values and do it both ways and, yeah, we can check that it's the same. And we can also check if it's fast enough. So if you run this in pandas for
32:01
25 million elements with window size of 1000, it takes 1.1 seconds for it to compute the mean and it takes another 1.18 to compute the standard deviation. In our case, it takes 0.26 seconds to compute both. So it actually does make sense. And if, you know, if it starts taking
32:22
minutes or sometimes hours to compute this kind of thing, if you have a lot of data, it may be worthwhile to spend, you know, 10 minutes and code it up yourself. And finally I'd like to say thanks to Wenzel Jakob, who's the original author of this project, and Jason and Dean, who are currently maintaining it and handling all the issues, adding a lot of features.
32:41
And a lot of people, including myself, for contributing all the other stuff. Also Dave Abrahams for creating Boost Python and Boost MPL. And I'd like to thank my employer for letting me hack on this at work time. And last but not least I'd like to thank you for listening.
33:01
Thanks. Okay, we have some time for questions if there are any.
33:21
Hi, thank you for the talk. Could I pickle a class? Yeah, there's pickling support as well. I just didn't mention all the features because it would take too long. Thank you. Hi, thanks for the talk, it looks great.
33:42
I'm just wondering if you're always using the heap for allocation or if you can do any fancy allocation, say placement, or if you're dealing with an array, are you always allocating objects on the heap? Or can you do different forms of allocation? So the heap allocation for what exactly? So say for example
34:01
when you declare a class and you have the example of multiple init types, signatures when you actually instantiate the object in Python is it always happening on the heap? Can you do anything different? Yeah, it currently happens on the heap so it uses the new operator in C++
34:21
but I think you mean, like, the new Python malloc, the PyMalloc API; we don't support that but it should be possible. Yep, thanks. The library looks really cool. I'm wondering what the state of documentation
34:42
and examples is, and what's the license of the library? The documentation is pretty good, it's pretty well maintained I would say, it explains a lot more than this talk and it walks you through the examples from really simple ones to the most complicated ones.
35:03
What we're also planning to do is set up tutorial notebooks so you can run the whole thing in a Jupyter notebook as well. But the documentation is hosted on Read the Docs, and you can find it from our GitHub repo. One thing to note is that there's a few things
35:22
in this talk, a few syntax differences from the latest stable release, so if you try to compile this with whatever's on PyPI right now, it may complain, but we'll push a version fairly soon that will be compatible with this.
35:42
And what's the license of the library? The license? Is it GPL? Is it MIT? I think it's MIT. You mentioned that the problems with Boost Python are
36:01
long compile times and the generated object size so do you have any numbers on how PyBind compares to that? I'm sorry, numbers on what exactly? So the compile times? And the generated code size. Yeah, so the thing is if you have a really
36:22
small module then the extension module generated with Boost Python would be smaller because Boost Python has a compiled part, like a pre-compiled part, right? So a PyBind module would actually be bigger than both Cython and Boost Python if you have like a two-liner. Once it starts going up, so as I said
36:41
we have an example of, it's a PyRosetta wrapper for a chemical framework that was initially written in Boost Python and then the developers tried to convert it to PyBind which they did successfully. I think it went down by a factor of like 5.8 and 5.7 or something like this
37:01
both compile time and the binary size. In my experience as well I don't personally use Boost Python, but I did some benchmarks just to see that it's true.
37:24
So I assume the global interpreter lock is held when C++ code is called from Python and can I drop the GIL? Yeah, again I skipped it here just for the sake of time, but you can, there's things like scoped, you can
37:41
have like a scoped guard for a GIL release for example, yeah that's all there as well. Any other questions? Ok great, well thanks again. Thanks a lot.