CFFI: calling C from Python

Formal Metadata

Title: CFFI: calling C from Python
Part Number: 24
Number of Parts: 169
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Content Metadata

Abstract
Armin Rigo - CFFI: calling C from Python In this talk, we will see an intro to CFFI, an alternative to using the standard C API to extend Python. CFFI works on CPython and on PyPy. It is a possible solution to a problem that hits notably PyPy --- the CPython C API. The CPython C API was great and contributed to the present-day success of Python, together with tools built on top of it like Cython and SWIG. I will argue that it may be time to look beyond it, and present CFFI as such an example. ----- I will introduce CFFI, a way to call C libraries from Python. CFFI was designed in 2012 to get away from Python's C extension modules, which require hand-written CPython-specific C code. CFFI is arguably simpler to use: you call C from Python directly, instead of going through an intermediate layer. It is not tied to CPython's internals, and works natively on two different Python implementations: CPython and PyPy. It could be ported to more implementations. It is also a big success, according to the download statistics. Some high-visibility projects like Cryptography have switched to it. Part of the motivation for developing CFFI is that it is a minimal layer that allows direct access to C from Python, with no fixed intermediate C API. It shares ideas from Cython, ctypes, and LuaJIT's ffi, but the non-dependence on any fixed C API is a central point. It is a possible solution to a problem that hits notably PyPy --- the CPython C API. The CPython C API was great and, we can argue, it contributed a lot to the present-day success of Python, together with tools built on top of it like Cython and SWIG. However, it may be time to look beyond it. This talk will thus present CFFI as such an example. This independence is what lets CFFI work equally well on CPython and on PyPy (and be very fast on the latter thanks to the JIT compiler).
Transcript: English (auto-generated)
Before the lunch break, the next session is about CFFI, and the speaker needs no introduction. Armin has long been a well-known member of the Python community, working on PyPy, CFFI and other things. Welcome, Armin, and thank you.
Okay, so today I'm going to present mostly CFFI, and I'm going to talk a little bit about PyPy as well because, well, we need to have one PyPy talk at every EuroPython and we don't have any this year.
Okay, so first CFFI. What is CFFI? Well, CFFI is a project that we created around 2012, and it is actually a very successful project according to the download statistics on PyPI.
You can see numbers like 3.4 million downloads every month nowadays, and it has actually beaten Django. Cool. The main reason why it is so successful is that a few very successful projects, like Cryptography, have switched to it.
So every time you install Cryptography, you also install CFFI as a dependency. Well, PyPy is probably a successful project too; it's harder to say for sure. I will talk more about it later.
So let's start with CFFI. CFFI is a way to call C code from Python, right? Because obviously you have C code; everybody has C code.
Most libraries out there are actually written in C, and if you want to call one of them then you need something. So CFFI is just one more solution for calling C code from Python. And the name CFFI, well, it's boring: it just means C Foreign Function Interface.
It shares ideas with a lot of projects, actually. The original motivation comes from LuaJIT: LuaJIT's own FFI module is similar.
But then we took a lot of ideas from other projects like Cython, ctypes, SWIG and so on. So here is a demo. Let's say you want to call this essential function from any POSIX system, getpwnam.
What do you do? Well, you first do man getpwnam and you see a man page. The man page contains this. It tells you, okay, you need to include this and that. And then you get this function, getpwnam, that takes a char * argument and returns a struct passwd *.
And then a bit later in the man page: the struct passwd should have roughly these fields, like pw_name, pw_passwd, pw_uid. pw_uid is of type uid_t. This is all fine if you are programming in C, and it is all a mess if you are programming in Python.
So what do you do with CFFI? You write this code in a Python script. You import cffi, you make an FFI builder, and you call ffibuilder.cdef with a triple-quoted string.
So here we have a big string, and into this big string you copy and paste parts of the man page. Like, I'm going to say typedef int uid_t, except I'm not exactly sure it's an int, right? It could be long, short, whatever.
Because it's C. So I'm going to say int dot dot dot (int...), which in CFFI means some kind of int, but I'm not sure exactly which kind. And then you do the same with struct passwd. You say the structure has a field pw_uid.
I know that's a uid_t, but struct passwd contains tons of other stuff and I don't know what it is. It will depend on the platform that you're running on, and so on and so forth.
So you just write dot dot dot semicolon (...;). It just means: and other fields here. Right? Here the dot dot dot is really meant as dot dot dot in the source code. It does not mean that this demo glosses over details, right?
And then you copy and paste the line for getpwnam. That's easy. Okay. And then the man page also had something about includes. So we paste them here, in some other declaration.
And in this ffibuilder.set_source we also give a name, pwuid_cffi. That's the name of something that we want to create. Okay. So you put these two slides into one file, one Python file, and you run it.
You run it and out you get pwuid_cffi.so. And this pwuid_cffi.so is a standard CPython extension module. So once you've got it, in your main program you just import it.
You import lib from this module. And then lib is something that has an attribute, well, a built-in function, called getpwnam. And you call it. And when you call it you get a struct passwd.
So you can read the field pw_uid and print it. And this works. So in this simple way we have made an interface to call this C function from Python.
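Put together, the two slides described above might look roughly like the following build script. This is a sketch reconstructed from the spoken description, not the speaker's actual slide: the module name pwuid_cffi follows the name mentioned in the talk, the build() helper is an addition so that nothing is compiled on import, and a C compiler plus the POSIX header pwd.h are assumed to be available.

```python
from cffi import FFI

ffibuilder = FFI()

# Declarations pasted from the man page. The "..." is literal CFFI syntax:
# the C compiler later fills in the real integer type and the other fields.
ffibuilder.cdef("""
    typedef int... uid_t;
    struct passwd {
        uid_t pw_uid;
        ...;
    };
    struct passwd *getpwnam(const char *name);
""")

# The #include lines from the man page, plus the name of the module to create.
ffibuilder.set_source("pwuid_cffi", """
    #include <sys/types.h>
    #include <pwd.h>
""")


def build():
    """Run once: compiles pwuid_cffi into an importable extension module."""
    return ffibuilder.compile(verbose=True)
```

After build() has been run once, the main program would do something like `from pwuid_cffi import lib` followed by `print(lib.getpwnam(b"root").pw_uid)`. Under Python 3 the name is passed as bytes, and the user "root" is assumed to exist.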
And that's it. Okay. So what I'm going to talk about now is that, yes, it's not quite as simple as that in all cases. So I'm going to show some more examples with more complications.
The first one is that in this built-in module you actually get two objects. There is lib, and there is also an object called ffi. This ffi contains general helpers that you may need to call at some point. But first, this is another thing that you can do in the cdef.
You can declare your functions. You can also have types that are completely opaque, declared with dot dot dot. This is, for example, what you get if you have a C library with an interface like make a window, which returns a window *.
But you don't need to know or care what the type window is. And then hide window, destroy window, all these are C functions. Now, about the ffi object: in the ffi object you get a few helpers.
For example, if you really want to make a C structure. Like here, I want to make a structure that is of type char[]. So char brackets: if you know C, you know exactly what char brackets is, right?
So this is generally the approach of CFFI. You need to know a little bit of C, but if you do know a little bit of C then it's easy, because it's the same. So with ffi.new you are creating an object of type char[] and initializing it from a string.
So you get in p some cdata of type char[], and it owns 12 bytes. If you count, that's actually the number of characters plus one, because there is a terminating null character, as is traditional in C.
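The allocation just described, together with reading and writing individual items, can be tried without any compiler. This is a small sketch; the initial string "hello world" is an assumption, chosen because its 11 characters match the 12 bytes mentioned in the talk.

```python
from cffi import FFI

ffi = FFI()

# An array of char sized from its initializer: 11 characters + 1 null byte.
p = ffi.new("char[]", b"hello world")

print(ffi.sizeof(p))   # total size in bytes of the array that p owns
print(len(p))          # number of items, including the terminating null

first = p[0]           # reading an item gives a one-byte bytes object
p[0] = b"H"            # writing an item in place
print(first, p[0])
```

The memory stays valid only as long as the Python object p is kept alive, which is what "owns" means here.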
And you can index it; you can read or write individual items. You also get another kind of cdata, for example, from the call that we did before, lib.getpwnam.
In the example before, we called it by directly giving a string. But you can also give it a char[], which means an array of characters, like p in this example. In any case you get as a result q, which is another cdata, of type struct passwd *.
It lives at that address in memory, and you can then, not index it, sorry, get its attributes. The attributes are just the field names of the struct.
From such a q you can also cast to void * or to anything else: with the C rules of casting, you can cast it to another pointer type. You can also cast it to an integer type, if you have the pointer and you really want to get the number that represents this pointer's address.
Then you cast it to an integer type. I could have written ffi.cast with long or int, but instead I'm using the type intptr_t, which is an official C type.
That means an integer that is large enough to contain a pointer. But it's just the same, and I'm getting a number that is the integer value of the pointer. So this is the kind of thing that is in the ffi object.
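The casts described above can be sketched as follows. Since calling getpwnam needs the compiled module, this sketch uses a freshly allocated int * as a stand-in for the pointer q; that substitution is an assumption made only to keep the example self-contained.

```python
from cffi import FFI

ffi = FFI()

q = ffi.new("int *", 42)              # stand-in for a pointer returned by C
v = ffi.cast("void *", q)             # pointer-to-pointer cast, as in C
addr = int(ffi.cast("intptr_t", q))   # the address as a plain Python int

# Casting the integer back gives a pointer to the same memory.
q2 = ffi.cast("int *", addr)
print(hex(addr), q2[0])
```

Note that v and q2 do not own the memory; only q does, so q has to stay alive while they are used.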
You also have ffi.string. So this is another example, where in my structure I have pw_uid, okay, that's 500, but I also have pw_name, and reading it returns a char *. And from this char * you can convert back to a Python string if you want.
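A minimal sketch of that conversion, assuming a locally allocated buffer as a stand-in for the pw_name field (the value "root" is made up for illustration):

```python
from cffi import FFI

ffi = FFI()

# Stand-in for a char * field such as pw_name.
buf = ffi.new("char[]", b"root")
name_ptr = ffi.cast("char *", buf)

# ffi.string copies the bytes up to the first null character.
name = ffi.string(name_ptr)
print(name.decode("ascii"))
```

Under Python 3 the result is a bytes object, so decoding it is a separate, explicit step.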
So this is what ffi.string is for, and so on. One more example, if you're doing something a little bit more complex: you have a Python object, like this x in this example.
I want to have this Python object cast to a void * that the C code will just carry around, and then at some point later the C code will give us back the void *. And from that void * we want to go back to the Python object.
I mean, this is standard, for example, in all callback systems. If you register a callback with a C library, typically you give it the function to call back and you also give it some kind of void * argument.
The C library will just store it and give it back to your own callback. So in order to do that you would use ffi.new_handle, which casts any Python object to a void *. Then you stash it away, fish it out again later, and you get a void * that happens to contain the same value.
Then from this value you can go back to the original x object using ffi.from_handle. So this is just one example of the more advanced things.
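The round trip through a void * can be sketched like this. The Session class is a hypothetical stand-in for the object x, and the cast to void * only simulates the pointer coming back from a C library.

```python
from cffi import FFI

ffi = FFI()


class Session:
    """Hypothetical Python object that a C callback should carry around."""

    def __init__(self, user):
        self.user = user


x = Session("armin")

# new_handle returns a void * cdata. It must be kept alive on the Python
# side for as long as the C code may hand the pointer back.
handle = ffi.new_handle(x)

# Simulate the pointer coming back from C as a bare void *.
returned = ffi.cast("void *", handle)

recovered = ffi.from_handle(returned)
print(recovered is x, recovered.user)
```

The main design point is lifetime: if handle is garbage collected while C still holds the pointer, calling from_handle on it is invalid.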
CFFI as a whole supports more or less the full C language, which is actually not so huge. By the full C language I mean, of course, the declarations of C: what types you can declare, how you can call functions, the various calling conventions and so on and so forth.
This is supported by CFFI.
Okay, so there is more to it than this short introduction suggests, of course. Say you have some larger C library that you want to interface with.
A typical example is a search engine library. Well, you don't want to expose this library directly; instead you want to expose some kind of Pythonic wrapping of the library. So what you do is write your Python wrapper that itself uses CFFI, but uses it internally.
You write your classes and nice functions in Python, and internally you use these cdata objects, but you do not actually expose them to the users of the wrapper that you're writing. So this is different.
Compare this with writing a CPython C extension module, where you write more or less everything in C, like your CPython native types and so on and so forth, and you end up with only the C extension module that people import and use directly.
Here with CFFI, the idea is rather that what people import and use directly is the Python wrapper, which itself uses CFFI.
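A small sketch of that wrapping pattern follows. The point_t struct and the Point class are hypothetical; a real wrapper would paste declarations from the library's header and call its functions through lib, but the shape is the same: the cdata object is private and users only see Python attributes and methods.

```python
from cffi import FFI

_ffi = FFI()

# Hypothetical C struct standing in for a real library's data type.
_ffi.cdef("""
    typedef struct { double x, y; } point_t;
""")


class Point:
    """Pythonic wrapper: callers never see the underlying cdata."""

    def __init__(self, x, y):
        self._c = _ffi.new("point_t *", {"x": x, "y": y})

    @property
    def x(self):
        return self._c.x

    @property
    def y(self):
        return self._c.y

    def move(self, dx, dy):
        self._c.x += dx
        self._c.y += dy

    def __repr__(self):
        return "Point(%r, %r)" % (self.x, self.y)


pt = Point(1.0, 2.0)
pt.move(0.5, -1.0)
print(pt)
```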
Yes, well, there are actually a few other use cases that I did not really speak about. You can use CFFI in a mode that is called ABI, as opposed to API, which is a mode where you don't have any C compiler involved at all.
Then what you get is more like ctypes, as in you have to declare your structures and your functions exactly, and you're not allowed to make a mistake. And you're not allowed to use the dot dot dot syntax I showed.
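A minimal ABI-mode sketch, assuming a POSIX system: no compiler runs, so the return type has to be spelled out by hand. The declaration below assumes that uid_t is a 32-bit unsigned int, which holds on common Linux systems but is exactly the kind of guess that API mode avoids.

```python
import os
from cffi import FFI

ffi = FFI()

# ABI mode: no "..." allowed, so the exact type must be written out.
# Assumption: uid_t is a 32-bit unsigned int on this platform.
ffi.cdef("unsigned int getuid(void);")

# On POSIX systems, dlopen(None) gives access to the standard C library.
libc = ffi.dlopen(None)

print(libc.getuid(), os.getuid())
```

If the declared type were wrong, nothing would report it; the call would simply return a misread value, which is the "not allowed to make a mistake" caveat from the talk.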
There is also support for embedding instead of extending, which is the case where you have a big program that is not written in Python at all but wants to embed and use Python.
For this case there is a mode of CFFI in which you write Python code and you declare an interface with cdef; the thing you declare with cdef becomes the interface that is callable from the C code.
And then the rest of the program calls this interface and calls into your Python code directly. Okay, well, that's CFFI, basically. Yes.
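An embedding build script might look roughly like this. It is a sketch under assumptions: the function add_numbers, the module name my_plugin and the target file name are hypothetical, and in CFFI releases the declaration step for embedding is spelled embedding_api() rather than cdef(). The build() helper is an addition so that nothing is compiled on import.

```python
from cffi import FFI

ffibuilder = FFI()

# The C-callable interface that the host program will see.
ffibuilder.embedding_api("""
    int add_numbers(int a, int b);
""")

ffibuilder.set_source("my_plugin", "")

# Python code that runs when the library is first used; it supplies the
# implementation of each function declared above.
ffibuilder.embedding_init_code("""
    from my_plugin import ffi

    @ffi.def_extern()
    def add_numbers(a, b):
        return a + b
""")


def build():
    """Run once: produces a shared library that a C program can link to."""
    return ffibuilder.compile(target="libmy_plugin.*", verbose=True)
```

The host program then declares `int add_numbers(int, int);` like any other C function and links against the produced library.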
So let's talk about PyPy for about three minutes. PyPy is a Python interpreter. It's different from the standard one, which is CPython. The main goal of PyPy is speed. Well, run PyPy and you get an interpreter. It looks very much like CPython; the essential difference is that there are four instead of three greater-than signs in the prompt.
But otherwise it's just the same, basically: you replace python myprogram.py with pypy myprogram.py. It contains a JIT compiler, it's fast, it's cool, et cetera. Please use it.
The main difference is that, for example, it implements a very different kind of garbage collection than CPython. It's a moving, generational, incremental garbage collector. Okay, if you don't know what these technical terms mean, it's fine.
What I mean, mostly, is that because it is a moving garbage collector, we have trouble implementing the CPython C API. So it's hard for PyPy to import a CPython C extension.
It's possible because, well, we did tons of hacks, basically; it's possible and it's slow, et cetera. So it kind of works. I would say it works better and better, as in we can mostly do it for NumPy, for example, nowadays. Mostly.
Announcement soon, et cetera, but yes. Well, PyPy is great right now if you use Python and don't rely on a lot of C extension modules. A lot of web services, for example, are like this: you import Django stuff, whatever huge libraries.
That's typically all written in Python, so it works very nicely on PyPy. Okay, yep, well, the C API is large and it's a mess to implement in PyPy.
Well, I would actually argue that this C API of CPython was part of the historical success of Python: why Python worked, or started to be really useful, some 10 or 15 years ago.
That is also because it has this C API, and people actually used it to build interesting things on top of it. Then you have all these binding generators that have been built on top of it.
You can write C extensions manually, but you can also use these tools that generate C extensions for you. And CFFI is just one more such tool. Well, CFFI is a bit different, I would say, because the goal is really to not expose any part of the CPython C API.
As in, yes, you can write C code with CFFI, but the C code that you write should not use any PyObject * or PyInt_FromLong or any of these functions from the C API.
It means that it is possible to port this whole CFFI module to interpreters other than CPython, and that's what we did.
So one of the motivations for CFFI in the first place is that it is possible to write a PyPy version of CFFI. Indeed, we did. The demo I showed at the start of this talk works exactly the same on top of CPython or on top of PyPy.
It is actually faster on top of PyPy, because PyPy's JIT compiler knows a little bit about CFFI and is able to produce machine code that directly calls the C function, for example.
So it's extremely fast, basically, on top of PyPy. That does not mean that it's extremely slow on top of CPython. On top of CPython, the performance is acceptable as well.
So yes, it works on CPython and on PyPy. It would be easy to port to other Python implementations, like Jython or IronPython; as far as I can tell, that has not been done so far. So yes, the main benefit is that it is independent. It no longer depends on the CPython C API.
Use CFFI: it's easy and cool, and it is supported by non-CPython implementations.
That's the conclusion of my talk. Thank you, Armin. Are there any questions?
We have been working with CFFI like a few years ago. I also did mainly an audio, which is a C-based IoT framework.
I found that it was really hard, when you have complex projects, to create all the headers. So I precompiled all the headers to feed them to CFFI to create the library.
It wasn't documented at the time, I don't know if it is now, but you can actually use a compiler to precompile your header so that it includes everything. This has improved, yes. Now it's a cleanly separated two-step process.
You write a separate Python script that declares what you want, then you run it once and you get your extension module. And then you use it from your main program. So it's better than it used to be, yes.
Thanks for the talk, that looks really cool. I have a question about PyPy, actually, that I'll ask you because there is no separate PyPy talk, it seems. What is the status of Python 3 work there? It would be nice to get to 3.5, is it anywhere near?
I suppose if I were to give an estimate of time, I cannot, obviously, but imagine that I could give an estimate of time,
I would say that next year should be nicely progressed towards PyPy 3.5, yes. And what kind of help do you need? Money, people? Yes, well, we need people on time. Get money, or else forget money.
Hi, thank you. Another question about PyPy. Is there some kind of tool to embed PyPy, like PyInstaller or py2exe or something like this,
to embed and redistribute binaries of these. I don't know the answer. Any more questions? I'm the wrong person to ask, I suppose, but yes, I'm sorry.
Armin, thanks for the talk, thanks for PyPy, thanks for CFFI, it's amazing, I use it quite often. I was wondering, when you have the declarations with the ellipses, like dot dot dot, I don't care, you figure it out. Can you, in very simple terms, explain how it goes out and finds out? Because it always works, so it's very good.
Like this, for example: uid_t is some kind of integer, but we don't know which one at all. So the magic is to write one piece of C code that will work just by compiling it with a normal C compiler.
So every single one of these dot dot dot is a different kind of magic.
Like, for example, the typedef int... uid_t, it probably contains... How does it work again?
It must be something like this: you write one big C expression that says sizeof(uid_t) == 1 ? then I'm going to use this, else sizeof equal to 2 ? then I'm going to use that, etc.
And then you do an extra round of magic to know whether it's signed or unsigned. For signed versus unsigned, it's something like: you take minus one, you cast it to uid_t, and you ask, is it positive now?
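The two probes described in this answer can be imitated in Python with the standard ctypes module. This is only an illustration of the idea and not the C code that CFFI generates: the probe_integer helper is hypothetical, and ctypes integer types stand in for the unknown C typedef.

```python
import ctypes


def probe_integer(ctype):
    """Return (size_in_bytes, is_signed) for a ctypes integer type."""
    size = ctypes.sizeof(ctype)
    # Same idea as the C expression ((T)-1) > 0: converting -1 to an
    # unsigned type wraps around to a large positive value.
    is_signed = ctype(-1).value < 0
    return size, is_signed


print(probe_integer(ctypes.c_int32))
print(probe_integer(ctypes.c_uint16))
```

In C, both probes are constant expressions, so a normal compiler can evaluate them for whatever type the platform headers really define.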
So we're at the end of the normal session time of 30 minutes, but food won't be there until quarter to, so people may want to ask more questions and stick around. I just want to let you know that if you have to do something at half past, you have to do it now.
Thanks for your talk. I have a question about defines. We have a project with a lot of defines that are constructed dynamically during compilation from a lot of nested macros. Is it possible to use them by name in Python code? Because actually I don't know if they're...
So you mean defines like constants? Yeah, for example, we have a driver that uses some operations, and these commands are constructed from Linux macros,
and actually I don't know what they're doing, some shifts, some XORs, some ORs. Can I use them by name? Yes, I mean, use dot dot dot. Basically you say here in the cdef: #define NAME ...
And that means it's some integer, I don't know which one, figure it out. Any more questions? If not, thank you Armin, and see you next year.
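The answer about defines can be sketched as a build script. The constants O_RDONLY and O_CREAT from fcntl.h are used here as stand-ins for the questioner's driver macros, and the module name fcntl_consts and the build() helper are hypothetical. A C compiler and a POSIX system are assumed.

```python
from cffi import FFI

ffibuilder = FFI()

# "#define NAME ..." asks the C compiler to work out the integer value,
# however many nested macros it takes to compute it.
ffibuilder.cdef("""
    #define O_RDONLY ...
    #define O_CREAT ...
""")

ffibuilder.set_source("fcntl_consts", """
    #include <fcntl.h>
""")


def build():
    """Run once: compiles fcntl_consts into an importable extension module."""
    return ffibuilder.compile(verbose=True)
```

After build() has run, `from fcntl_consts import lib` would expose the values as plain integers, for example lib.O_CREAT.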