Peeking into Python’s C API
60 views
Formal Metadata
Title 
Peeking into Python’s C API

Title of Series  
Part Number 
155

Number of Parts 
169

Author 

License 
CC Attribution  NonCommercial  ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and noncommercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license. 
DOI  
Publisher 
EuroPython

Release Date 
2016

Language 
English

Content Metadata
Subject Area  
Abstract 
Sophia Davis  Peeking into Python’s C API Ever wondered how Python works under the hood? One way to learn about PythontheCprogram is by exploring the C API for writing Python bindings to native C libraries. In this talk, we will walk through a simple example of making a C library callable from Python code and vice versa. Along the way, we will encounter some essential features of Python: reference counting, memory management, and the inner workings of objects and modules.  We all love Python. It’s so elegant and easy to use as a programming language that we forget about the giant, complicated C program executing our strings of whitespace sensitive code. For many Python programmers, this side of Python is just a big black box. It works well, so thankfully we don’t *need* to go messing around inside... but what if you *want* to look into the inner workings of this powerful tool? One way to dive into the Cprogramside of Python is by exploring the C API for writing Python bindings to native C libraries. In this talk I will explore the basics of this API as I recount my journey to make a simple C library callable from Python code, and allow C code to invoke objects defined in pure Python. Along the way, we will encounter some essential features of Python: reference counting, memory management, and the innerworkings of objects and modules.

00:00
Computer programming
Programming language
Computer animation
Code
Multiplication sign
Library (computing)
Extension (kinesiology)
00:38
Link (knot theory)
Slide rule
Code
Software developer
Structural load
Projective plane
Code
Table (information)
Summation
Radical (chemistry)
Programmer (hardware)
Hash function
Website
01:23
Logical constant
Complex (psychology)
Collision
Structural load
Length
Multiplication sign
Parameter (computer programming)
Data dictionary
Table (information)
Maxima and minima
Bit rate
Linker (computing)
Hash function
Modulo (jargon)
Area
Collision
Mapping
Structural load
Interior (topology)
Electronic mailing list
Variable (mathematics)
Functional (mathematics)
Electronic signature
Maxima and minima
Category of being
Hash function
Data storage device
Order (biology)
output
Data structure
Resultant
Readonly memory
Trail
Domain name
Implementation
Presentation of a group
Electronic mailing list
Streaming media
Inequality (mathematics)
Thresholding (image processing)
Number
Power (physics)
Natural number
Average
Operator (mathematics)
String (computer science)
Integer
Data structure
Implementation
Best, worst and average case
Traffic reporting
Subtraction
Glass float
Key (cryptography)
Set (mathematics)
Table (information)
Subject indexing
Function (mathematics)
String (computer science)
Key (cryptography)
Integer
Library (computing)
06:35
Length
Multiplication sign
Functional (mathematics)
Symbol table
Revision control
Computer animation
Hash function
Internetworking
Data storage device
Function (mathematics)
String (computer science)
Hash function
Integer
Window
07:08
Programming language
Sensitivity analysis
Computer programming
Computer animation
Hash function
Code
String (computer science)
07:44
Computer programming
Computer file
Code
Structural load
Mathematical singularity
Line (geometry)
Table (information)
Category of being
Pointer (computer programming)
Computer animation
Hash function
Personal digital assistant
Object (grammar)
Data structure
Object (grammar)
Data type
Library (computing)
08:40
Multiplication sign
Counting
Functional (mathematics)
Table (information)
Table (information)
Maxima and minima
Pointer (computer programming)
Computer animation
Hash function
Hash function
Object (grammar)
Macro (computer science)
Data type
09:07
Point (geometry)
Readonly memory
Computer programming
Building
Computer file
Code
Multiplication sign
Scientific modelling
Mereology
Data dictionary
Latent heat
String (computer science)
Representation (politics)
Module (mathematics)
Extension (kinesiology)
Social class
Content (media)
Directory service
Set (mathematics)
Line (geometry)
Functional (mathematics)
Table (information)
Compiler
Computer animation
Hash function
Order (biology)
Module (mathematics)
Interpreter (computing)
Object (grammar)
Data type
Window
11:13
Computer programming
Readonly memory
Trail
Sheaf (mathematics)
Highlevel programming language
Memory management
Counting
Counting
Functional (mathematics)
Number
Programmer (hardware)
Computer animation
Estimation
Electronic meeting system
Object (grammar)
12:09
Computer animation
Scientific modelling
Electronic mailing list
Module (mathematics)
Speicherbereinigung
Object (grammar)
Parameter (computer programming)
Functional (mathematics)
12:37
Computer animation
Demo (music)
Interior (topology)
Object (grammar)
13:02
Computer animation
Scientific modelling
Module (mathematics)
Object (grammar)
Functional (mathematics)
Speicheradresse
Physical system
13:49
Point (geometry)
Computer programming
Readonly memory
Graph (mathematics)
Namespace
Multiplication sign
Projective plane
Counting
Parameter (computer programming)
Mereology
Functional (mathematics)
System call
Frame problem
Computer animation
Arithmetic mean
Right angle
Object (grammar)
Extension (kinesiology)
Macro (computer science)
16:13
Computer programming
Readonly memory
Multiplication sign
Scientific modelling
Parameter (computer programming)
Function (mathematics)
Graph (mathematics)
Rule of inference
Subset
Computer configuration
Term (mathematics)
String (computer science)
Square number
Data structure
Mapping
Key (cryptography)
Wrapper (data mining)
Surface
PoissonKlammer
Projective plane
Electronic mailing list
Counting
Staff (military)
Functional (mathematics)
Table (information)
Pointer (computer programming)
Computer animation
Hash function
Module (mathematics)
Object (grammar)
Freeware
Data type
Window
Gradient descent
19:20
Building
Graph (mathematics)
Multiplication sign
Scientific modelling
Demo (music)
Functional (mathematics)
Theory
Table (information)
Revision control
Pointer (computer programming)
Computer animation
Hash function
String (computer science)
Square number
Object (grammar)
Data structure
Data type
20:42
Video game
Computer animation
Hash function
Cellular automaton
Counting
Object (grammar)
Functional (mathematics)
Table (information)
21:11
Computer programming
Computer animation
Hash function
Demo (music)
Hash function
Counting
Object (grammar)
Functional (mathematics)
Table (information)
Table (information)
21:41
Computer programming
Game controller
Mathematics
Computer animation
Hash function
Multiplication sign
Demo (music)
Counting
Object (grammar)
Functional (mathematics)
Table (information)
22:27
Readonly memory
Computer programming
Multiplication sign
Counting
Bulletin board system
Functional (mathematics)
Table (information)
Video game
Computer animation
Hash function
Readonly memory
Encryption
Object (grammar)
Resultant
24:26
Computer programming
Computer animation
Hash function
Scientific modelling
Multiplication sign
Line (geometry)
Set (mathematics)
Error message
24:50
Point (geometry)
Computer programming
Code
Multiplication sign
Scientific modelling
Letterpress printing
Power (physics)
String (computer science)
Hash function
Square number
Energy level
Data structure
Spacetime
Key (cryptography)
Wrapper (data mining)
Content (media)
Electronic mailing list
Sound effect
Line (geometry)
Functional (mathematics)
Demoscene
Table (information)
Mengenfunktion
Computer animation
Hash function
Statement (computer science)
Module (mathematics)
Object (grammar)
Resultant
Library (computing)
27:31
Point (geometry)
Area
Graph (mathematics)
PoissonKlammer
Multiplication sign
Shared memory
Functional (mathematics)
Rule of inference
Pointer (computer programming)
Computer animation
Computer configuration
Data structure
Object (grammar)
Quicksort
Data type
Form (programming)
00:04
and welcome to so that this is going to book the the key about the controlled by Thomas C. API thank so we were all here because we like Python programming language and talk a little about Python and the C program underlying the programming language by walking through how I look at the basics of making the like a C library callable from code and vice versa so region of the 1st time it's to the technical but
00:42
Sofia I'm an American software developer and and right now in Amsterdam and in the summer of 2014 added recursively which is kind of like a writer's workshop for programmers in New York City and in this talk comes at a very down the radical kind of project that I worked on while I was there and load and soon went to the site of the only so
01:06
I let's get started this is the story of how I shaved a yet because probably if you find yourself breaking out of my fancy API that you started out with a different problem and 1 that you find can solve using tools provided by an existing C code based on so for me this could was a hash
01:23
table implementation so this is probably review from you but does in brief hash table is a very powerful data structure for storing key value pairs and for associating genes with values such that every key maps to 1 value for present people we can call them dictionaries and this and powerful because very efficient no matter how many keyvalue pairs you have in a hash table the average time complexity of adding a key value pair looking up the values associated with the key or removing the value there is constant so that no 1 how does it achieved this amazing performance why under the hood hash table is just an array and were inequalities index in this array of buckets every value pair gets put into 1 of these buckets and how do we know which areas in which bucket wheel that's where the hash of hash table in hash function is just the mapping of any arbitrary input to a fixed set of values like the set of all positive integers when we wanna put at the end of value in the hash table we pass that he a hash function to convert to an integer and then use that number modulo of the size of the array to determine which bucket the key value pair should go on and works similarly for local news we calculate the hash keys got the bucket associated with that hash and neither will cover the value or remove the key value pair and so here's a picture things Wikipedia of phone stored as a hash table and so we calculate the hash value of each person's name entities that number to determine the bucket and the rate at which the phone number entries but what happens if the hash values of 2 keys result in them being put in the same bucket well that's for the collision on and their way to deal with it but 1 way is just to store links list that everybody in the array and so every item that gets assigned to that and it just gets tacked onto the linked list again at Wikipedia example here we using half to their results in a Johnson and sanity and both being assigned to the same index 152 so we just started a list containing both entries but if lots of items end up in the same bucket then our hashtable starts to look like just a bunch of linked lists and the performance of linked list is not nearly as good as those of a hash table especially when looking up removing items a look up or removal of a linked list and in the average case involves traversing the list which is as big of an operation and as we add more and more items to hash table it's inevitable that more and more entries up in the same in the same bucket consider has to with an underlying array of length 1 no matter what hash function you used all items are going to be stored in that 1 and only bucket and that's rapidly going to be a long linked list so In order to keep our average performance constant and will occasionally in increase the size of the the underlying array and redistribute and then provided that were using a decent hash function of the number of collisions should decrease because we're spreading out the same number of these as before among buckets and how do we know when to resize well if we keep track of the number of items currently in the hash table compared to the length of the underlying domain then we should resize 1 report a stream of items to length reaches a certain threshold will call this the maximal proportion and so we talked about 3 variable properties of hash tables we got the size of the underlying array the hash function and the maximum load proportion and all 3 can affect the performance of the year for example initial size helps determine how often you have to resize and that's going to be a costly operation the hash function impacts how many collisions you might have more and more complicated had functions will take longer to evaluate the maximal proportion will play a role in how long those linked lists might get for you resize so in order to explore how these affect performance by Riemann hash table implementation and it enables the user to this is the maximum load proportion and the initial size of the underlying array my library provides functions to initial a table with the given properties to add look up and removed keyvalue pairs and of integer float and string types and finally to free the memory mallet to store the data structures of the array going lists all your
06:01
strings that or whatever and and I also want to explore how different hash functions would affect performance so this is the signature of that function in my C implementation accepts a half argument on because my idea was that the user should do their own hashing of he and then they just pass that has value and when adding looking up or moving in and end of my library would find the opera everybody for that key value pair based on the past and if the user chose not to pass in of the key my my library just use this
06:37
function half functions also but if it's an integer you use the Internet if it's float around it down and use that integer of 2 strings just use the length of the string and that was inspired by the hash function that is no joke an early version of PHP used to store function names in the symbol table at times and so it is basically an awful hash
06:59
function and I that's as about to do some hardcore bitshifting and stringmanipulation in C. to experiment with writing my own hash functions just if I would
07:10
like to experiment but rather do
07:12
in place wouldn't it be nice if I could pull all hash functions in Python on then just call them from my
07:19
C hashtable after all find is so nice and right and I'm a lot faster writing Python and I writing C but you know under the
07:29
hood Python is actually just a really big complicated C program that processes those strings of whitespace sensitive code that we write and thankfully there's a welldocumented API for bridging the gap between the Python programming language and
07:45
on the sea program and it is easy to use this API as including a simple line in your
07:50
C file and then the magic began so my goal
07:57
is to call a hash function that's written in pure python are from inside my see hash table library little disclaimers CAP I did change substantially diminish between 1 and 2 and 3 and all the code in my talk is found to so I started by wrapping everything I needed to use a hash table inside my hashtable library inside structure so I'm at that pointer to the actual data structure and some of those other properties associated with hash tables like the current load the initial size as this prior object pointed to hash function and so this 1 had the data that I wanted to my pipeline types but I needed to implement that API telling case on how to
08:40
manage objects of this type and the fact that this high object had thing which is a macro imported with Python and it expands to the bare minimum that you need to create a Python object which is a reference count and I chose to ignore that at time and a pointer
08:58
to and this high type object which is just the start function pointers defining how Python should manage objects of the hash table takes
09:08
so that's things like the class name and had a prince and make a string representation of the objects that how to initialize the lead and free the memory allocated to hold objects welcome back to these little later and there are more
09:22
than laughter so this point I had my basic types defined but needed some way to use this type from within Python code and so I created a module to contain the hash table type In order to initialize module you need to a highminded in it find a function that has the name and your model so minds in a hash table and when a Python program imports module for the 1st time this is the function that's right again have left some stuff out but of no part of this line which initializes the type and it fills in world I type object but I think some compilers specific functions and here we initialize the model and this line as my new types the module dictionary so we can actually instantiate new objects the class and packaging there couple different ways to package Python modules but 1 simple way is just to write a set of top 5 file I'm telling type on the name of the module and what the files you Madam needs so this is the entire contents of my set of 5 when you run python so that title the creature of build directory and puts and the compiled file containing your extension that can be dynamically loaded into a Python program so on Unisys's shared object file I work on a Mac so my modules and hash table that so on Windows this would be a deal with the dot tidied extension so if you start up a Python interpreter or other a Python program in the same directory as the data so file that you can type import hash table and you hashtable stuff from Python so those except MIT program kept saying
11:14
faulting and I was were to look at a section of the API that's that I kind of been ignoring which is the section on reference
11:20
counting only 1 reason why Python is so nice is that it has it's a pretty high level language and it handles a lot of things for the programmer for example memory management and when you use that in a Python program python takes care of dealing with the OS to ensure that that that is stored in memory of however if I on only added to your programs memory eventually the program would run out of memory so it needs to know when it can remove data and once that that is being used in some fantasy program uses a method called reference counting that's know when it can safely free objects so that means it keeps track of the number of other things referring to a given object and when that reference count drops to 0 I think cleans up the unneeded object by calling the deallocation function that's defined for for its type and
12:12
so here to tools that can help us understand reference Council of it from the SIS model we have to get left out function and from garbage collection module there's get refers which returns a list of all things the currently on references to an you have written
12:32
function partial recounts and all it does is finds the objects from references to the argument
12:39
and an object it prints out how
12:42
many there are optionally can call itself and it can actually print out extra details about exactly which objects and references to
12:52
an object and so look and this work and so in a Python shall we start by
13:02
importing the module the tools that we need this system GC models and also that function registered you which I say about and so will start by instantiating an new object and we will see what the reference counting on the object to find and if we assign another willing to that object then will of the artist and again it's
13:27
really cool so what exactly is referring to this object believes that refers function constant got subject here with the 2 variable names we just made and they're both referring to object object at memory location think and so what exactly is that 1 of the
13:50
local namespace so there is again court so what happens to the reference count on object if we pass it as an argument to a function all use that function that showed you so 1st of all and we're gonna pass in the project and will have not we'll just call once and we will show the details about the refers this time so the so that local namespace but there is something new this frame object in there have now also owns a reference to from our object individually and if you fall again this time will have a call self recursively 100 times and will turn off the overwhelming dividing so we see that each time we call the function of the reference can increases by 1 and if we were to look at the details of the refers we see another frame object being added to the and the 1st with with each the
14:57
question so if you're going to write a c extension and work with Python objects then 1st you need to signals of Python when your program starts working with a certain object and that to I and to increase the reference count on that object by 1 their right to drink keeping it memory while you represented by that 1 otherwise if the reference count of 0 then Python Wilfried object and when your program tries to access the object which is now in a piece of memory that has been released that's to no SAS and your program will crash will like hopefully or just like Richard will happen you also need to take care of telling Python and when you're done working with a certain object by documenting its reference count by 1 and if you don't do your part in determining that reference count its reference count can never decrease to 0 it will never be gleaned from memory so that's a memory leak the Python CGI provides 2 macros to communicate when you're starting to work with the object and when you're finished working with an object point height in graph on an object increases the reference count by 1 and calling accurate places a reference count by 1 also if the reference can reached age is culture the deallocation
16:13
functions for the type and so what happens when you forget to find relevant objects you need to work on remember that I typed object structure function pointers that defines the Python API for my type well this is what I had defined as the deallocation functions for Python to call in the reference count of the hash table object reaches 0 it does 2 important things the first one is the printer debugging so we can see what is being called and
16:43
and it also calls the free function that I defined from an object the at the height project staff and so that every function also has a nice printer and then calls the free table function provided by my so the programs to free the memory Mallet for that initially the set method on my hash table 1 has to objects returned to the hash table object itself you hear that term self now there are a lot of rules and accept options about in which situations the caller received the Holy is responsible for paying graphing arguments and I I barely scratched the surface however I think across a problem here because if a C function returns a reference to an object like self and then that reference must be owned by the function the idea function must have been playing the object must have been playing graphs inside the function but I had left that out and so let's see how this affects my program from 1st is that so the price to build a model OK can the output great about on window opens here to that of the village subdirectory so there's a lot of time and if we start with Python wrapper then we can import hash table my module and we can instantiate new hash table and we'll start selling some values not very creative rounded on so remember that the descent method right now returns the hash geologic itself so this is the string repres entation the river reparanda being returned to the Python wrapper on each bracket square represents 1 of our buckets and each dot represents an item in the linked list at that so that's that's more values 2 maps for alright so so there's the the eprint that every inside my cleanup functions and we help found to increase the reference count on that object on that has the logic of but all the other refers must have released their references to that object and its reference count has dropped to 0 and cleanup functions have been called so if we try to do anything else with a hash table object like certain other key value pair we can then have so but at that time
19:25
and and look the time of the year
19:30
adjustment build and so for the new version of the model and instantiating object the and start going some values so I we only started and time to start slipping going this at so far and keep going with this squares theory science that's great so that 3 for items and will try and strings the stranger you will How can we assessed again so looking at the so called so think that's all the problems but the other type of mistake you can make is forgetting to call tidak graph when you're done with an object and surrendered to that issue so remember that my Python hash table type structure it contains a pointer to another Python objects namely the hash function used to have so here's a snippet from the initialization function for a couple of hash tables we use some of the stuff but I set the objects hash function at either to and the hash function passed in by the user when the object was
20:43
and still initialized 4 2 Python's builtin hash
20:47
functions as a and I also increase the reference count on this hash function object because in each cell quite unlike any and every working with this function of for a while please don't clean it up so we say that each has table object owns a reference to the hash function object and conversely in the deallocation function for my life I tell my friend today that meant reference count on that hash
21:12
function of the all so here's a simple demo look at the reference count on Python's builtin hash function and so I had this function here hashtable stuff and all it does is initialized a hash table using the builtin hash function as the hash function so far has to object will own a reference to the builtin hash function object can it print of the reference count on
21:38
the function and then this program will just called to
21:42
table suffer properly so let's look at how that reference count controls and hash function changes and when
21:50
the bills that and then run the program it the so initially we start out with just 3 reference can be references to the builtin hash function objects each time we entered due hash stuff and instantiate new hash table that owns a reference to builtin hash function the reference count on that function object increases by 1 for and each time they do actually stop completes the hash table that was initialized inside of it goes out of scope so the reference count on the hash table drops to 0 which triggers the geolocation function for hash tables this function which triggers
22:32
applied decrypt on the bulletin hash function objects so after calling do have philosophical times we still just have a reference count of 3 on the builtin hash function object but let's just say that we have
22:49
forgotten to decrease that reference count and will run that again I just results and so this is without initially the reference can still 3 of their and he we introduce national self we instantiate new hash table which means reference to those and hash function object the reference count that function object increases by 1 2 4 to 5 2 6 because each and you hashtable of complete the hash detachable goes out of scope reference cannot table drops to 0 and it's the only reason functions it's called but no we did we release that reference that we all those and hash function so african do hash tables the couple times the reference count on the builtin hash function object has increased from 3 to 6 even the object to that of the objects that owns those last 3 references have themselves been 3 so this is a memory the other 3 x 2 references were owned by objects that Python has now cleaned up they no longer exist we've lost our opportunity to signals of life under that these references are needed anymore the reference count can never dropped to 0 so Python will never remove that function object from memory now we're talking about the builtin hash function here so it's not like we really want removed from memory but imagine a more memoryintensive object and a longer running program that created tons of these objects that could never be cleaned up eventually this type of
24:26
error will cause problems and so I went back and added that
24:31
I get this line and the retirement program and after all that I finally had a model that works really well enough so I wrote my very own Python hash function and if the item to hashes integer or float and we'll do this 1 thing I time and set of 1 and 2 otherwise were clearly hashing during so I take this 2nd thing and that
24:52
flow and and so also included a print statement so we can see what this function is called a prefect but over time and because of the code and I also went back and and it more print statements to my C wrapper module and the original C library and there's also prefixed by their the origin
25:11
and so now let's look at how the program so I started my level here with that also hash function loaded and we can import module the end of so we can see some print statements there there were inside in a hash table function and so instantiated new hashtable objects and will as a hash function will use that Boston played and hash function that and so the the wrapper model we see that it's the the underlying C library that's actually doing heavy hidden like mounting space for my data structure and getting everything set up the look at the object this and to create so that some value results are always squares here so we see that the the sea module and the set function that is is executing the Python code there in that rope in and it's content a simple and then it's also for the you know dealing with the underlying C library which actually does the hard work of like adding the demands of the the right linked lists and things so that's what values the whole story all strings always break everything's look so the underlying C library knew that it needed to resize the effect of time and it's bigger great also bonuses works the half the we can look up the values so we can continue the value of the key value pair associated with power now if we try to look up the values and the library that underlying library looks for those of you that I can find and the module takes care of returning onto the apple and when I finally quit the apple then again we see that the deallocation functions called from the scene model but it's on line library that's actually doing the heavy hitting like walking through my data structures and so I got the point that whole calling the functions from there in my C module and that code executing the hash function that we have found in the rubble and it's just all together and that was pretty cool very
27:35
options thank thank other questions and will 1st thanks for sharing and great June clarify is wouldn't think so at some point you had some blind in the in alluded to it I think I sold by the score X. there is that applied 1 of those different on to there's there's 2 different types of the pi x 2 x 2 x that graph 1 of growing area the pointers and whereas the other 1 if you if you can't you can't indeed yeah if you actually have a pointer there then it 1 and it would be it would be hard to make clear syntax like in the native but in dictionary you can initiate members with the brackets and so on I'm not an option but I know that like all the double under methods sort like related to see functions so probably that wouldn't work you just added to that hightech object structure function of function pointers and the thanks to talk will rules for you and this is is worse for legal but did you chose to experiment with new ways to do this and I'm willing to maximize the actual I had no I just fail that this form talking about the extremists analyzes of this like I respectively which to do and that this is 1 of the reasons for that but this is a very good example started to actually like a time for 1 more would what it was a victim of