Merken

Peeking into Python’s C API

Zitierlink des Filmsegments
Embed Code

Automatisierte Medienanalyse

Beta
Erkannte Entitäten
Sprachtranskript
and welcome to so that this is going to book the the key about the controlled by Thomas C. API thank so we were all here because we like Python programming language and talk a little about Python and the C program underlying the programming language by walking through how I look at the basics of making the like a C library callable from code and vice versa so region of the 1st time it's to the technical but
Sofia I'm an American software developer and and right now in Amsterdam and in the summer of 2014 added recursively which is kind of like a writer's workshop for programmers in New York City and in this talk comes at a very down the radical kind of project that I worked on while I was there and load and soon went to the site of the only so
I let's get started this is the story of how I shaved a yet because probably if you find yourself breaking out of my fancy API that you started out with a different problem and 1 that you find can solve using tools provided by an existing C code based on so for me this could was a hash
table implementation so this is probably review from you but does in brief hash table is a very powerful data structure for storing key value pairs and for associating genes with values such that every key maps to 1 value for present people we can call them dictionaries and this and powerful because very efficient no matter how many key-value pairs you have in a hash table the average time complexity of adding a key value pair looking up the values associated with the key or removing the value there is constant so that no 1 how does it achieved this amazing performance why under the hood hash table is just an array and were inequalities index in this array of buckets every value pair gets put into 1 of these buckets and how do we know which areas in which bucket wheel that's where the hash of hash table in hash function is just the mapping of any arbitrary input to a fixed set of values like the set of all positive integers when we wanna put at the end of value in the hash table we pass that he a hash function to convert to an integer and then use that number modulo of the size of the array to determine which bucket the key value pair should go on and works similarly for local news we calculate the hash keys got the bucket associated with that hash and neither will cover the value or remove the key value pair and so here's a picture things Wikipedia of phone stored as a hash table and so we calculate the hash value of each person's name entities that number to determine the bucket and the rate at which the phone number entries but what happens if the hash values of 2 keys result in them being put in the same bucket well that's for the collision on and their way to deal with it but 1 way is just to store links list that everybody in the array and so every item that gets assigned to that and it just gets tacked onto the linked list again at Wikipedia example here we using half to their results in a Johnson and sanity and both being assigned to the same index 152 so we just started a list containing both entries but if lots of items end up in the same bucket then our hashtable starts to look like just a bunch of linked lists and the performance of linked list is not nearly as good as those of a hash table especially when looking up removing items a look up or removal of a linked list and in the average case involves traversing the list which is as big of an operation and as we add more and more items to hash table it's inevitable that more and more entries up in the same in the same bucket consider has to with an underlying array of length 1 no matter what hash function you used all items are going to be stored in that 1 and only bucket and that's rapidly going to be a long linked list so In order to keep our average performance constant and will occasionally in increase the size of the the underlying array and redistribute and then provided that were using a decent hash function of the number of collisions should decrease because we're spreading out the same number of these as before among buckets and how do we know when to resize well if we keep track of the number of items currently in the hash table compared to the length of the underlying domain then we should resize 1 report a stream of items to length reaches a certain threshold will call this the maximal proportion and so we talked about 3 variable properties of hash tables we got the size of the underlying array the hash function and the maximum load proportion and all 3 can affect the performance of the year for example initial size helps determine how often you have to resize and that's going to be a costly operation the hash function impacts how many collisions you might have more and more complicated had functions will take longer to evaluate the maximal proportion will play a role in how long those linked lists might get for you resize so in order to explore how these affect performance by Riemann hash table implementation and it enables the user to this is the maximum load proportion and the initial size of the underlying array my library provides functions to initial a table with the given properties to add look up and removed key-value pairs and of integer float and string types and finally to free the memory mallet to store the data structures of the array going lists all your
strings that or whatever and and I also want to explore how different hash functions would affect performance so this is the signature of that function in my C implementation accepts a half argument on because my idea was that the user should do their own hashing of he and then they just pass that has value and when adding looking up or moving in and end of my library would find the opera everybody for that key value pair based on the past and if the user chose not to pass in of the key my my library just use this
function half functions also but if it's an integer you use the Internet if it's float around it down and use that integer of 2 strings just use the length of the string and that was inspired by the hash function that is no joke an early version of PHP used to store function names in the symbol table at times and so it is basically an awful hash
function and I that's as about to do some hardcore bit-shifting and string-manipulation in C. to experiment with writing my own hash functions just if I would
like to experiment but rather do
in place wouldn't it be nice if I could pull all hash functions in Python on then just call them from my
C hashtable after all find is so nice and right and I'm a lot faster writing Python and I writing C but you know under the
hood Python is actually just a really big complicated C program that processes those strings of whitespace sensitive code that we write and thankfully there's a well-documented API for bridging the gap between the Python programming language and
on the sea program and it is easy to use this API as including a simple line in your
C file and then the magic began so my goal
is to call a hash function that's written in pure python are from inside my see hash table library little disclaimers CAP I did change substantially diminish between 1 and 2 and 3 and all the code in my talk is found to so I started by wrapping everything I needed to use a hash table inside my hashtable library inside structure so I'm at that pointer to the actual data structure and some of those other properties associated with hash tables like the current load the initial size as this prior object pointed to hash function and so this 1 had the data that I wanted to my pipeline types but I needed to implement that API telling case on how to
manage objects of this type and the fact that this high object had thing which is a macro imported with Python and it expands to the bare minimum that you need to create a Python object which is a reference count and I chose to ignore that at time and a pointer
to and this high type object which is just the start function pointers defining how Python should manage objects of the hash table takes
so that's things like the class name and had a prince and make a string representation of the objects that how to initialize the lead and free the memory allocated to hold objects welcome back to these little later and there are more
than laughter so this point I had my basic types defined but needed some way to use this type from within Python code and so I created a module to contain the hash table type In order to initialize module you need to a high-minded in it find a function that has the name and your model so minds in a hash table and when a Python program imports module for the 1st time this is the function that's right again have left some stuff out but of no part of this line which initializes the type and it fills in world I type object but I think some compilers specific functions and here we initialize the model and this line as my new types the module dictionary so we can actually instantiate new objects the class and packaging there couple different ways to package Python modules but 1 simple way is just to write a set of top 5 file I'm telling type on the name of the module and what the files you Madam needs so this is the entire contents of my set of 5 when you run python so that title the creature of build directory and puts and the compiled file containing your extension that can be dynamically loaded into a Python program so on Unisys's shared object file I work on a Mac so my modules and hash table that so on Windows this would be a deal with the dot tidied extension so if you start up a Python interpreter or other a Python program in the same directory as the data so file that you can type import hash table and you hashtable stuff from Python so those except MIT program kept saying
faulting and I was were to look at a section of the API that's that I kind of been ignoring which is the section on reference
counting only 1 reason why Python is so nice is that it has it's a pretty high level language and it handles a lot of things for the programmer for example memory management and when you use that in a Python program python takes care of dealing with the OS to ensure that that that is stored in memory of however if I on only added to your programs memory eventually the program would run out of memory so it needs to know when it can remove data and once that that is being used in some fantasy program uses a method called reference counting that's know when it can safely free objects so that means it keeps track of the number of other things referring to a given object and when that reference count drops to 0 I think cleans up the unneeded object by calling the deallocation function that's defined for for its type and
so here to tools that can help us understand reference Council of it from the SIS model we have to get left out function and from garbage collection module there's get refers which returns a list of all things the currently on references to an you have written
function partial recounts and all it does is finds the objects from references to the argument
and an object it prints out how
many there are optionally can call itself and it can actually print out extra details about exactly which objects and references to
an object and so look and this work and so in a Python shall we start by
importing the module the tools that we need this system GC models and also that function registered you which I say about and so will start by instantiating an new object and we will see what the reference counting on the object to find and if we assign another willing to that object then will of the artist and again it's
really cool so what exactly is referring to this object believes that refers function constant got subject here with the 2 variable names we just made and they're both referring to object object at memory location think and so what exactly is that 1 of the
local namespace so there is again court so what happens to the reference count on object if we pass it as an argument to a function all use that function that showed you so 1st of all and we're gonna pass in the project and will have not we'll just call once and we will show the details about the refers this time so the so that local namespace but there is something new this frame object in there have now also owns a reference to from our object individually and if you fall again this time will have a call self recursively 100 times and will turn off the overwhelming dividing so we see that each time we call the function of the reference can increases by 1 and if we were to look at the details of the refers we see another frame object being added to the and the 1st with with each the
question so if you're going to write a c extension and work with Python objects then 1st you need to signals of Python when your program starts working with a certain object and that to I and to increase the reference count on that object by 1 their right to drink keeping it memory while you represented by that 1 otherwise if the reference count of 0 then Python Wilfried object and when your program tries to access the object which is now in a piece of memory that has been released that's to no SAS and your program will crash will like hopefully or just like Richard will happen you also need to take care of telling Python and when you're done working with a certain object by documenting its reference count by 1 and if you don't do your part in determining that reference count its reference count can never decrease to 0 it will never be gleaned from memory so that's a memory leak the Python CGI provides 2 macros to communicate when you're starting to work with the object and when you're finished working with an object point height in graph on an object increases the reference count by 1 and calling accurate places a reference count by 1 also if the reference can reached age is culture the deallocation
functions for the type and so what happens when you forget to find relevant objects you need to work on remember that I typed object structure function pointers that defines the Python API for my type well this is what I had defined as the deallocation functions for Python to call in the reference count of the hash table object reaches 0 it does 2 important things the first one is the printer debugging so we can see what is being called and
and it also calls the free function that I defined from an object the at the height project staff and so that every function also has a nice printer and then calls the free table function provided by my so the programs to free the memory Mallet for that initially the set method on my hash table 1 has to objects returned to the hash table object itself you hear that term self now there are a lot of rules and accept options about in which situations the caller received the Holy is responsible for paying graphing arguments and I I barely scratched the surface however I think across a problem here because if a C function returns a reference to an object like self and then that reference must be owned by the function the idea function must have been playing the object must have been playing graphs inside the function but I had left that out and so let's see how this affects my program from 1st is that so the price to build a model OK can the output great about on window opens here to that of the village subdirectory so there's a lot of time and if we start with Python wrapper then we can import hash table my module and we can instantiate new hash table and we'll start selling some values not very creative rounded on so remember that the descent method right now returns the hash geologic itself so this is the string repres entation the river reparanda being returned to the Python wrapper on each bracket square represents 1 of our buckets and each dot represents an item in the linked list at that so that's that's more values 2 maps for alright so so there's the the e-print that every inside my cleanup functions and we help found to increase the reference count on that object on that has the logic of but all the other refers must have released their references to that object and its reference count has dropped to 0 and cleanup functions have been called so if we try to do anything else with a hash table object like certain other key value pair we can then have so but at that time
and and look the time of the year
adjustment build and so for the new version of the model and instantiating object the and start going some values so I we only started and time to start slipping going this at so far and keep going with this squares theory science that's great so that 3 for items and will try and strings the stranger you will How can we assessed again so looking at the so called so think that's all the problems but the other type of mistake you can make is forgetting to call tidak graph when you're done with an object and surrendered to that issue so remember that my Python hash table type structure it contains a pointer to another Python objects namely the hash function used to have so here's a snippet from the initialization function for a couple of hash tables we use some of the stuff but I set the objects hash function at either to and the hash function passed in by the user when the object was
and still initialized 4 2 Python's built-in hash
functions as a and I also increase the reference count on this hash function object because in each cell quite unlike any and every working with this function of for a while please don't clean it up so we say that each has table object owns a reference to the hash function object and conversely in the deallocation function for my life I tell my friend today that meant reference count on that hash
function of the all so here's a simple demo look at the reference count on Python's built-in hash function and so I had this function here hashtable stuff and all it does is initialized a hash table using the built-in hash function as the hash function so far has to object will own a reference to the built-in hash function object can it print of the reference count on
the function and then this program will just called to
table suffer properly so let's look at how that reference count controls and hash function changes and when
the bills that and then run the program it the so initially we start out with just 3 reference can be references to the built-in hash function objects each time we entered due hash stuff and instantiate new hash table that owns a reference to built-in hash function the reference count on that function object increases by 1 for and each time they do actually stop completes the hash table that was initialized inside of it goes out of scope so the reference count on the hash table drops to 0 which triggers the geolocation function for hash tables this function which triggers
applied decrypt on the bulletin hash function objects so after calling do have philosophical times we still just have a reference count of 3 on the built-in hash function object but let's just say that we have
forgotten to decrease that reference count and will run that again I just results and so this is without initially the reference can still 3 of their and he we introduce national self we instantiate new hash table which means reference to those and hash function object the reference count that function object increases by 1 2 4 to 5 2 6 because each and you hashtable of complete the hash detachable goes out of scope reference cannot table drops to 0 and it's the only reason functions it's called but no we did we release that reference that we all those and hash function so african do hash tables the couple times the reference count on the built-in hash function object has increased from 3 to 6 even the object to that of the objects that owns those last 3 references have themselves been 3 so this is a memory the other 3 x 2 references were owned by objects that Python has now cleaned up they no longer exist we've lost our opportunity to signals of life under that these references are needed anymore the reference count can never dropped to 0 so Python will never remove that function object from memory now we're talking about the built-in hash function here so it's not like we really want removed from memory but imagine a more memory-intensive object and a longer running program that created tons of these objects that could never be cleaned up eventually this type of
error will cause problems and so I went back and added that
I get this line and the retirement program and after all that I finally had a model that works really well enough so I wrote my very own Python hash function and if the item to hashes integer or float and we'll do this 1 thing I time and set of 1 and 2 otherwise were clearly hashing during so I take this 2nd thing and that
flow and and so also included a print statement so we can see what this function is called a prefect but over time and because of the code and I also went back and and it more print statements to my C wrapper module and the original C library and there's also prefixed by their the origin
and so now let's look at how the program so I started my level here with that also hash function loaded and we can import module the end of so we can see some print statements there there were inside in a hash table function and so instantiated new hashtable objects and will as a hash function will use that Boston played and hash function that and so the the wrapper model we see that it's the the underlying C library that's actually doing heavy hidden like mounting space for my data structure and getting everything set up the look at the object this and to create so that some value results are always squares here so we see that the the sea module and the set function that is is executing the Python code there in that rope in and it's content a simple and then it's also for the you know dealing with the underlying C library which actually does the hard work of like adding the demands of the the right linked lists and things so that's what values the whole story all strings always break everything's look so the underlying C library knew that it needed to resize the effect of time and it's bigger great also bonuses works the half the we can look up the values so we can continue the value of the key value pair associated with power now if we try to look up the values and the library that underlying library looks for those of you that I can find and the module takes care of returning onto the apple and when I finally quit the apple then again we see that the deallocation functions called from the scene model but it's on line library that's actually doing the heavy hitting like walking through my data structures and so I got the point that whole calling the functions from there in my C module and that code executing the hash function that we have found in the rubble and it's just all together and that was pretty cool very
options thank thank other questions and will 1st thanks for sharing and great June clarify is wouldn't think so at some point you had some blind in the in alluded to it I think I sold by the score X. there is that applied 1 of those different on to there's there's 2 different types of the pi x 2 x 2 x that graph 1 of growing area the pointers and whereas the other 1 if you if you can't you can't indeed yeah if you actually have a pointer there then it 1 and it would be it would be hard to make clear syntax like in the native but in dictionary you can initiate members with the brackets and so on I'm not an option but I know that like all the double under methods sort like related to see functions so probably that wouldn't work you just added to that high-tech object structure function of function pointers and the thanks to talk will rules for you and this is is worse for legal but did you chose to experiment with new ways to do this and I'm willing to maximize the actual I had no I just fail that this form talking about the extremists analyzes of this like I respectively which to do and that this is 1 of the reasons for that but this is a very good example started to actually like a time for 1 more would what it was a victim of
Programmiersprache
Programmbibliothek
Maßerweiterung
Optimierung
Code
Computeranimation
Programmiergerät
Web Site
Verschlingung
Last
Maschinensprache
Hash-Algorithmus
Rechenschieber
Radikal <Mathematik>
Projektive Ebene
Softwareentwickler
Code
Resultante
Extrempunkt
Natürliche Zahl
Textur-Mapping
Komplex <Algebra>
Streaming <Kommunikationstechnik>
Nichtlinearer Operator
Lineares Funktional
Parametersystem
Schwellwertverfahren
Dicke
Schlüsselverwaltung
Kategorie <Mathematik>
Ein-Ausgabe
Bitrate
Elektronische Unterschrift
Konstante
Funktion <Mathematik>
Datenstruktur
Menge
Ganze Zahl
Automatische Indexierung
Festspeicher
Schwimmkörper
Ordnung <Mathematik>
Schlüsselverwaltung
Zeichenkette
Tabelle <Informatik>
Subtraktion
Hash-Algorithmus
Stoß
Implementierung
Zahlenbereich
Kombinatorische Gruppentheorie
Variable
Domain-Name
Weg <Topologie>
Ganze Zahl
Ungleichung
Mittelwert
Restklasse
Hash-Algorithmus
Programmbibliothek
Zeitkomplexität
Datenstruktur
Speicher <Informatik>
Implementierung
Leistung <Physik>
Tabelle <Informatik>
Stoß
Mailing-Liste
Binder <Informatik>
Data Dictionary
Zeichenkette
Mapping <Computergraphik>
Flächeninhalt
Last
Verkehrsinformation
Lineares Funktional
Dicke
Hash-Algorithmus
Versionsverwaltung
Symboltabelle
Bildschirmfenster
Computeranimation
Internetworking
Zeichenkette
Funktion <Mathematik>
Ganze Zahl
Hash-Algorithmus
Speicher <Informatik>
Zeichenkette
Programmiersprache
Sensitivitätsanalyse
Hash-Algorithmus
Optimierung
Code
Computeranimation
Zeichenkette
Objekt <Kategorie>
Kategorie <Mathematik>
Elektronische Publikation
Code
Computeranimation
Objekt <Kategorie>
Last
Hash-Algorithmus
Datentyp
Programmbibliothek
Optimierung
Datenstruktur
Zeiger <Informatik>
Gerade
Tabelle <Informatik>
Objekt <Kategorie>
Lineares Funktional
Extrempunkt
Hash-Algorithmus
Datentyp
Zeiger <Informatik>
Zählen
Makrobefehl
Computeranimation
Tabelle <Informatik>
Punkt
Compiler
Klasse <Mathematik>
Selbstrepräsentation
Code
Computeranimation
Informationsmodellierung
Hash-Algorithmus
Datentyp
Bildschirmfenster
Inhalt <Mathematik>
Optimierung
Maßerweiterung
Gerade
Modul
Umwandlungsenthalpie
Interpretierer
Lineares Funktional
Gebäude <Mathematik>
Elektronische Publikation
Modul
Data Dictionary
Objekt <Kategorie>
Menge
Festspeicher
Mereologie
Ordnung <Mathematik>
Verzeichnisdienst
Tabelle <Informatik>
Zeichenkette
Objekt <Kategorie>
Lineares Funktional
Höhere Programmiersprache
Programmiergerät
Weg <Topologie>
Festspeicher
Zählen
Zahlenbereich
Garbentheorie
Speicherverwaltung
Zählen
Optimierung
Computeranimation
Objekt <Kategorie>
Parametersystem
Lineares Funktional
Informationsmodellierung
Mailing-Liste
Speicherbereinigung
Modul
Computeranimation
Objekt <Kategorie>
Computeranimation
Demo <Programm>
Objekt <Kategorie>
Lineares Funktional
Informationsmodellierung
Speicheradresse
Physikalisches System
Modul
Computeranimation
Parametersystem
Lineares Funktional
Namensraum
Punkt
Graph
Rahmenproblem
Systemaufruf
Zählen
Computeranimation
Objekt <Kategorie>
Rechter Winkel
Festspeicher
Mereologie
Projektive Ebene
Optimierung
Maßerweiterung
Makrobefehl
Freeware
Stab
Ungerichteter Graph
Zählen
Term
Computeranimation
Poisson-Klammer
Informationsmodellierung
Flächentheorie
Wrapper <Programmierung>
Hash-Algorithmus
Datentyp
Bildschirmfenster
Gradientenverfahren
Optimierung
Datenstruktur
Zeiger <Informatik>
Funktion <Mathematik>
Lineares Funktional
Parametersystem
Mailing-Liste
Schlussregel
Modul
Konfiguration <Informatik>
Mapping <Computergraphik>
Objekt <Kategorie>
Teilmenge
Quadratzahl
Festspeicher
Projektive Ebene
Schlüsselverwaltung
Tabelle <Informatik>
Zeichenkette
Lineares Funktional
Graph
Gebäude <Mathematik>
Versionsverwaltung
Physikalische Theorie
Computeranimation
Objekt <Kategorie>
Informationsmodellierung
Quadratzahl
Hash-Algorithmus
Datentyp
Datenstruktur
Zeiger <Informatik>
Demo <Programm>
Tabelle <Informatik>
Zeichenkette
Objekt <Kategorie>
Lineares Funktional
Videospiel
Hash-Algorithmus
Zellularer Automat
Zählen
Computeranimation
Tabelle <Informatik>
Objekt <Kategorie>
Lineares Funktional
Demo <Programm>
Hash-Algorithmus
Optimierung
Zählen
Computeranimation
Tabelle <Informatik>
Objekt <Kategorie>
Lineares Funktional
Mathematisierung
Hash-Algorithmus
Gamecontroller
Optimierung
Zählen
Computeranimation
Demo <Programm>
Tabelle <Informatik>
Leck
Resultante
Lineares Funktional
Videospiel
Zählen
ROM <Informatik>
Computeranimation
Objekt <Kategorie>
Chiffrierung
Festspeicher
Mailbox
Hash-Algorithmus
Optimierung
Tabelle <Informatik>
Informationsmodellierung
Menge
Hash-Algorithmus
Benutzerfreundlichkeit
Optimierung
Gerade
Computeranimation
Fehlermeldung
Resultante
Hash-Algorithmus
Punkt
Hochdruck
Mengenfunktion
Benutzerfreundlichkeit
Raum-Zeit
Code
Computeranimation
Übergang
Demoszene <Programmierung>
Informationsmodellierung
Wrapper <Programmierung>
Hash-Algorithmus
Programmbibliothek
Inhalt <Mathematik>
Datenstruktur
Optimierung
Gerade
Leistung <Physik>
Soundverarbeitung
Lineares Funktional
Befehl <Informatik>
Mailing-Liste
Modul
Zeichenkette
Objekt <Kategorie>
Quadratzahl
Schlüsselverwaltung
Tabelle <Informatik>
Zeichenkette
Lineares Funktional
Punkt
Graph
Gemeinsamer Speicher
Schlussregel
Quick-Sort
Computeranimation
Konfiguration <Informatik>
Objekt <Kategorie>
Poisson-Klammer
Bildschirmmaske
Flächeninhalt
Datentyp
Zeiger <Informatik>
Datenstruktur

Metadaten

Formale Metadaten

Titel Peeking into Python’s C API
Serientitel EuroPython 2016
Teil 155
Anzahl der Teile 169
Autor Davis, Sophia
Lizenz CC-Namensnennung - keine kommerzielle Nutzung - Weitergabe unter gleichen Bedingungen 3.0 Unported:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen und nicht-kommerziellen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen und das Werk bzw. diesen Inhalt auch in veränderter Form nur unter den Bedingungen dieser Lizenz weitergeben
DOI 10.5446/21224
Herausgeber EuroPython
Erscheinungsjahr 2016
Sprache Englisch

Inhaltliche Metadaten

Fachgebiet Informatik
Abstract Sophia Davis - Peeking into Python’s C API Ever wondered how Python works under the hood? One way to learn about Python-the-C-program is by exploring the C API for writing Python bindings to native C libraries. In this talk, we will walk through a simple example of making a C library callable from Python code and vice versa. Along the way, we will encounter some essential features of Python: reference counting, memory management, and the inner- workings of objects and modules. ----- We all love Python. It’s so elegant and easy to use as a programming language that we forget about the giant, complicated C program executing our strings of white-space sensitive code. For many Python programmers, this side of Python is just a big black box. It works well, so thankfully we don’t *need* to go messing around inside... but what if you *want* to look into the inner workings of this powerful tool? One way to dive into the C-program-side of Python is by exploring the C API for writing Python bindings to native C libraries. In this talk I will explore the basics of this API as I recount my journey to make a simple C library callable from Python code, and allow C code to invoke objects defined in pure Python. Along the way, we will encounter some essential features of Python: reference counting, memory management, and the inner-workings of objects and modules.

Ähnliche Filme

Loading...