2 + 2 = 5: Monkey-patching CPython with ctypes to conform to Party doctrine

Video in TIB AV-Portal: 2 + 2 = 5: Monkey-patching CPython with ctypes to conform to Party doctrine

Formal Metadata

2 + 2 = 5: Monkey-patching CPython with ctypes to conform to Party doctrine
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
2 + 2 = 5: Monkey-patching CPython with ctypes to conform to Party doctrine [EuroPython 2017 - Talk - 2017-07-10 - PyCharm Room] [Rimini, Italy] A few weeks into your tenure as a software engineer at the Ministry of Truth you are assigned your first real feature request: write a context manager that can make “2 + 2” equal 5 at runtime. Your solution should be written only in Python (for maximum portability). Absurd? Perhaps, but you know better than to ask questions. You are no thought-criminal. In this talk I walk through the steps I took to modify the value of two plus two in CPython at runtime—using only Python and the ctypes module. What began for me as a silly and frivolous side project became an education in how the python data model works behind the scenes and how CPython compiles, optimizes, and executes python code. The goal of this talk is to provide an introduction to CPython internals while walking through the steps needed to monkeypatch integer addition to make “2 + 2” equal 5. The audience should come away with a better understanding of how python objects and types are represented in memory, how references are counted, and how python scripts are transformed into abstract syntax trees, compiled into code objects, and then executed by the CPython virtual stack machine. And because I’ve limited myself to using ctypes, these topics can be explored without familiarity with C as a prerequisite
So, please give a warm welcome to the speaker. Hi, thank you. I'm the lead developer for The Atlantic, the eponymous website for a 160-year-old politics and culture magazine headquartered in Washington DC. Today I'll be talking about how to manipulate the internals of CPython so you can make the interpreter evaluate 2 plus 2 equals 5. So, a bit of background on my topic: what I was originally trying to do was create a class decorator API that
looks something like this. Here you have the class, and you pass as its parent class a metaclass of sorts that tells it what it should be patching, and then you define the methods, and you'd be able to use super to call the original method. It turned out it really wasn't possible, or rather it was, but it was very inadvisable, because it required mutating the MRO, the method resolution order, of the classes in a way that can only be done by accessing the underlying C structs. But in the course of these attempts I became curious about just how far you can push the limits of manipulating CPython internals at runtime with ctypes: would it be possible, for instance, to make 2 plus 2 equal 5? It also seemed maybe slightly topical with recent events. It was one of those situations where I had gone down a rabbit hole; at several points I came very close, and the tantalizing nature of it kind of drove me forward until eventually I achieved it. In the process I learned a lot about how different parts of the Python interpreter work, so I'll go over those. I've structured the talk so that it recapitulates the order in which I attempted to solve the problem, and along the way I explain the different techniques and the underlying structures that inform them.
So, prior art and references: there's a library, forbiddenfruit, that does something sort of similar. It doesn't go to quite the same extent, but it does cover the first technique that I used to try to patch integer addition. The full code for this presentation is in python-doublescripts, which is at that URL there, and it has running unit tests and whatnot.
So let's say this is test-driven development: we're going to write a test case. A context manager is probably a good idea, because you don't want 2 plus 2 to equal 5 across the entire life span of the executable. That would certainly crash something; I don't know how it wouldn't. So we do it as a context manager and then assert that 2 plus 2 equals 5.
So, the naive approach. I'm calling it naive because it doesn't work, though in any other respect it would work. You take a reference to the old __add__ method, you define a new method that returns 5 if a and b are both 2 and otherwise calls the old add method, and then you try to set int.__add__ to the new add. You get a TypeError: can't set attributes of built-in/extension type. And if you try it on the __dict__, you get: 'dictproxy' object does not support item assignment.
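A minimal sketch of the naive attempt described above. The names old_add and new_add are illustrative rather than taken from the slides, and because the exact error messages differ between CPython versions the code only records that a TypeError was raised.

```python
# Sketch of the naive approach; old_add and new_add are illustrative names.
old_add = int.__add__


def new_add(a, b):
    """Return 5 for 2 + 2, otherwise defer to the original int.__add__."""
    if a == 2 and b == 2:
        return 5
    return old_add(a, b)


failures = []

# Attempt 1: plain attribute assignment on the built-in type.
try:
    int.__add__ = new_add
except TypeError as exc:
    failures.append(("attribute assignment", str(exc)))

# Attempt 2: item assignment on the read-only proxy around the type's dict.
try:
    int.__dict__["__add__"] = new_add
except TypeError as exc:
    failures.append(("dict item assignment", str(exc)))

for where, message in failures:
    print(where, "->", message)
```

Both attempts are rejected by the interpreter, which is what motivates reaching for ctypes in the rest of the talk.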
So, a quick crash course in ctypes. ctypes allows you to load any shared library. It comes with a convenience attribute for accessing the libpython C library by way of ctypes.pythonapi: any C functions exposed in that library can be accessed as attributes of ctypes.pythonapi. There are a few quirks with how it handles types. For instance, you set the attributes argtypes and restype on the functions so that ctypes knows how to pass arguments: whether it should convert, say, a number object in Python into an actual C int, or whether it should actually be passed as a Python object. In this case we're using Py_GetVersion. It returns a pointer to a null-terminated char array, so we use ctypes.c_char_p. ctypes also comes with a bunch of built-in types, like c_char_p, which is a pointer to a character, c_void_p, which is like a pointer to any address, et cetera. For argtypes you specify a list, and each element corresponds to the type of that positional argument. It also has a very useful type, py_object, which lets you pass Python objects into functions and get them back as Python objects instead of as some sort of abstract type that you can't use. Alternatively you could set the restype to c_void_p, for instance. That just gives you the address, and you can't do much with it, but it's just an example.
Another really powerful feature of ctypes is the Structure type. For this talk, the only thing we'll really be concerned with on the Structure class is the _fields_ attribute. It's a list of 2-tuples: the first item is the name of the attribute, the second item is the type. These provide a way to create Python objects that act like C structs and can be passed as C structs into C functions, and otherwise act more or less like normal Python objects. So here I've listed the struct definition for the base Python object, PyObject, which has two fields. The first is ob_refcnt, which is the reference count, and its type is Py_ssize_t, which
represents the size of the block in memory that can be read or written in a single operation. That's not entirely exact (in Python 3 it defaults to the 64-bit version, you know, on 32-bit machines), but that's kind of a minor detail. The reference count is defined in the C code as Py_ssize_t, so we follow that, and then the next variable in the struct is a pointer to the object's type.
A quick recap of how structs are conventionally used in a lot of programming language interpreters. Structs are basically contiguous blocks of memory, and a struct definition is just a list of fields that says these first however-many bits should be used as this type of object, the next set of bits is used as that type of object. Because of that, you can have a base type and pass that around, and then, as long as the memory is allocated for more things to be added to it, you can upcast it to something more specific. So, for instance, you get an object, you look at the ob_type, it's an integer type, and then you call a method to sort of populate the integer, and now you have a fully hydrated integer struct with a pointer to the number that it represents.
So in order to actually be able to use this, we need a way to turn our Python objects into ctypes structures. In Python, the built-in function id returns the object's address in memory, so it's very convenient for this purpose, because the from_address class method on structures takes an address in memory and returns a struct for it. A side note: ctypes has a POINTER function that transforms a type into a pointer to that type. I will use that later, and it'll make more sense in context. So here I'm getting the reference count of a PyObject. You can see in the first case it's 7 and in the second case it's 8. The difference is that sys.getrefcount holds an extra reference to the argument inside the getrefcount function. Right, so now we try to override int add: we will return to our original naive approach of patching integer
addition via the dict. If we could mutate __add__ in the dict, we'd achieve our goal. But, as we saw, __dict__ isn't a regular dict; it's some sort of proxy for a dict, so we need to do something about that. It turns out that the underlying C struct has a pointer to an ordinary mutable dict. Another side note on a useful feature of ctypes structures: like I mentioned about how, say, PyIntObject can extend PyObject, you can do that with the class inheritance syntax in Python. So here we have class DictProxy, and because it extends PyObject, it starts with the PyObject attributes as the first fields, and then the next field is the dict. When we use that, we can create a function that lets us mutate the class dict. There's a little bit of trickery to this; it's functionality that I sort of borrowed from the forbiddenfruit library. You populate the DictProxy struct, then you create a temporary dictionary and you set an item in it, with a key, whose value is the proxied dict, and the thing you get back is a regular Python dictionary that you can change. So we make it mutable, we change the __add__ method, and it doesn't work. I mean, it does in a sense: you can call the int.__add__ method and you get the desired result, but in every other case it doesn't.
Why doesn't overriding __add__ suffice? In the C code, if you look at the function that adds numbers, it's looking at the slot function for the type. So I have a quick look here at what a PyTypeObject is. This again extends PyObject and has extra attributes. One of them is a list of methods to use if the thing is a number. In the case of int, it is a number, and it has a couple of functions, so we have nb_add, nb_subtract, nb_multiply, et cetera. The first thing there: the syntax is a little bit odd for defining types in C. They're functions, or rather,
pointers to functions. The "PyObject *" is the return type, so it's returning a pointer to a PyObject; binaryfunc is the name of the thing (it's a binary function); the star means it's a pointer; and the next set of parentheses holds the arguments. We can set that up with ctypes by using the CFUNCTYPE function, which takes the return type and then the argument types. So we duplicate that there, and we use a Structure to represent the PyNumberMethods struct. Here we have the field for nb_add, and we specify its type as binaryfunc. We put it all together: we define the PyTypeObject, copying over the attributes that we had over here, we populate it with from_address(id(int)), and then we try calling nb_add and we get what we would expect: 2 plus 2 is 4. Ignore the top function there.
So again, what we're trying to write is a context manager, so we don't blow everything up in the process of testing. The tricky thing with the original approach, of grabbing the original add function and then using it again, is that it wouldn't work if you were to just use nb_add, because when you change it, the thing you're referencing then points to the changed thing. So you need to get the address of the original function and then create a new binary function that points to that address. That's what we do here in the first two lines: the old nb_add address, and the old nb_add. We define a new add function, and then we replace the function and call the original the same way we did in that naive approach. We get pretty close, but that doesn't work either: 2 plus 2 still isn't 5.
So let's use the dis module to see what's going on. It lets you pass it either a code object or a string of code, and it'll give you back diagnostic information about the instructions that Python uses to represent it. So here we have a global variable, two, equal to the number 2, and a function, add_two_plus_two, and then
it returns two plus two. You can see how it interprets that: load the global two, load the global two again, add the last two things, and then return the value. So that's the BINARY_ADD instruction opcode. But in the CPython code there's a little optimization where it checks if both of the things are ints, and if they are, it does the addition in C as a sort of speed optimization, so there's never actually a call to int's add function. So how do you fix that?
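The disassembly step can be sketched as follows. This is an illustration rather than the speaker's slide: the names two and add_two_plus_two follow the description above, and the opcode names are collected programmatically because they vary by version (CPython 3.11 and later emit BINARY_OP where older versions emit BINARY_ADD).

```python
import dis

two = 2  # a global variable, so the addition is not folded at compile time


def add_two_plus_two():
    return two + two


# Collect the opcode names instead of hard-coding the expected listing,
# since instruction names differ between CPython versions.
opnames = [instruction.opname for instruction in dis.get_instructions(add_two_plus_two)]
print(opnames)

# The human-readable listing, comparable to what the slide showed.
dis.dis(add_two_plus_two)
```

Whatever the version, the listing shows the global being loaded twice followed by a single binary-add style instruction, which is where the interpreter's fast path for exact ints would kick in.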
Well, you change the class to something other than int; call it Int2. So we define Int2, which extends int and has its own add function, so it's not an int, not exactly anyway.
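A minimal sketch of such a subclass. The name Int2 is a guess at what the speaker called it, and the body is an assumption about what "its own add function" looked like. The last part shows that ordinary Python refuses to retype an existing int, which is why the struct-level approach is needed.

```python
class Int2(int):
    """Hypothetical int subclass whose addition makes 2 + 2 equal 5."""

    def __add__(self, other):
        if self == 2 and other == 2:
            return 5
        return int.__add__(self, other)


print(Int2(2) + Int2(2))  # uses Int2.__add__
print(2 + 2)              # plain ints are unaffected

# Retyping the existing int object through normal Python is rejected.
class_assignment_error = None
try:
    (2).__class__ = Int2
except TypeError as exc:
    class_assignment_error = exc
print(type(class_assignment_error).__name__)
```

Instances created as Int2 behave as desired, but literal 2s are still plain ints, so the remaining problem is changing the type of the existing object.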
And then we go to set it: TypeError, __class__ assignment only for heap types. Of course, this is something we can get around by manipulating the structs. So here we have a function that sets the type. I was sort of unsure about whether I should include fully functional code, with the things that increase references and decrease them, or remove them for simplicity; I opted to keep them in. But the important thing here is grabbing the old type, populating the PyObject from the new thing, and overriding the ob_type, and then having a sort of context manager that lets us change 2 to be an Int2 and then back to an int afterwards. So here, with override_type(2, Int2), we evaluate two plus two and it's 5, the same result we got before. We define a variable two, then we override the type of 2 to Int2, and yeah, 5. But if we do just the literal 2 plus 2, not the variable two plus the variable two, we get 4.
The final obstacle is something called peephole optimization. It's called peephole optimization because it sort of looks through the compiled code in little windows and tries to find bits of code that it can fold together, or simplify, or turn into something faster. One of the things it does is look for two literal integers being added together, and if it finds them, it combines them. So if we use the dis module, and instead of using a variable two like we did last time we use a literal 2, then we see that the bytecode just has a 4. If you were to open up the .pyc file, you'd see a 4 there; there would be no 2 and 2. So that kind of prevents us from doing what we want to do. In CPython this is performed by the C function PyCode_Optimize. It doesn't occur in eval, which is why eval of 2 plus 2 works, but not when it's defined in a Python function or in the interpreter. So this is the craziest part: the way to get around it, which is to disable Python code optimization with what's called a trampoline
function. You basically take the memory where the PyCode_Optimize function is, you overwrite the first few assembly instructions to jump to a new address, and that new address is a no-op function that just returns the code unchanged and increments the reference. You go from here, and success: you run the test and our test passes. So that's sort of my talk.
A quick plug, because we're allowed to do this: I'm hiring. At The Atlantic we're looking for DevOps, we're looking for full-stack, we're looking for front-end developers, and we just opened an office in London. James Fallows, if you follow the publication, moved to London and is opening the London office, and a few staffers are moving with him, so I think there will be room for a developer in that office, and then also obviously plenty of room in Washington DC with the American team. Here's the contact information, and again that URL for getting the code, which has a lot more detail. I sort of showed a simplified version of PyTypeObject; the code has a fully fledged one, where every attribute matches whatever it is in Python 2 and Python 3, with the differences between both figured in. Thank you.
We have plenty of time for questions, so if you have any questions please raise your hand; we have a microphone for recording them.
Could you show the code which you used to patch the assembly code to jump into the trampoline function? Maybe; I'm not sure I understand the question. In the last slide you showed us that it works; could you show how you did it? Sure, I can pull that up.
So, in order to paper over the differences between Windows and Linux and Mac, there's a lot of boilerplate constant stuff here, but the main thing is there's a function called mprotect. Where it exists is different on Windows than on Linux and Unix. You give it an address in memory and a length, and you tell it how you want to change what that memory allows. By default, the executable memory of Python that gets loaded is not writable, but you can use this function to make it writable. So the first thing you do is figure out where the function is and how long a span you need for the jump instruction, and then you make it writable there; that's the mprotect stuff. Here you define your no-op, which increments the reference, because that's what it does in the original CPython code. That may be superstitious, but it made sense to do it. And if you're overriding a function in this way, you can't return pointers to Python objects in the normal way, so I just return the address, which it interprets correctly as a pointer. This quaternary func is a function that takes four Python objects and returns something; that's the signature of PyCode_Optimize. The first one is the code, and then there's, like, the locals and the frame, et cetera.
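The no-op replacement can be sketched with a ctypes callback. This is an assumption-laden illustration, not the code from the repository: the names quaternaryfunc and noop_optimize are made up, the last three parameters are given neutral names because only the first (the code) matters here, and the callback is simply called directly rather than patched into the interpreter.

```python
import ctypes

# Assumed C-level shape: a function taking four PyObject pointers and
# returning a pointer, declared here as a raw address (c_void_p).
quaternaryfunc = ctypes.CFUNCTYPE(
    ctypes.c_void_p,
    ctypes.py_object,
    ctypes.py_object,
    ctypes.py_object,
    ctypes.py_object,
)

ctypes.pythonapi.Py_IncRef.argtypes = [ctypes.py_object]
ctypes.pythonapi.Py_IncRef.restype = None


@quaternaryfunc
def noop_optimize(code, second, third, fourth):
    """Hand the first argument back untouched, with one extra reference."""
    ctypes.pythonapi.Py_IncRef(code)
    return id(code)  # in CPython, id() is the object's address


# Call the callback directly with placeholder arguments.
sample = b"not real bytecode"
address = noop_optimize(sample, (), (), b"")
returned = ctypes.cast(address, ctypes.py_object).value
print(returned is sample)
```

Returning the address instead of a py_object mirrors the remark above that the pointer has to be returned as a plain address; the extra reference is deliberately never released.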
And then there's the override. If it's not an x86 architecture it throws an error, because I didn't write it for ARM and the other machine languages. You get the pointer to the old function and to your new function. The jump instruction will be 5 bytes: the first one is the jump opcode, which is E9, and the next 4 are a relative offset to the new address. So you set those five bytes as readable, executable and writable. You find the offset between the two (this bit here was from me just testing it), and then you combine them all together into a list of opcodes. The multiplication operator with ctypes lets you create arrays of things, so here we have an unsigned byte, and multiplying it by 5 gives us an array of 5 of these things. Then we do from_address and we get back an array that has 5 bytes, one for each of the bytes we'll overwrite, and then we just use the slice syntax to replace it. Once you do that, that patches it to be a jump instruction to the new function.
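The byte layout of that jump can be sketched in isolation. This example only builds the five bytes and writes them into an ordinary scratch buffer; it does not touch executable memory. The names build_jump, scratch and target are illustrative, and the target address is made up. On x86 the 32-bit offset of an E9 jump is measured from the end of the 5-byte instruction.

```python
import ctypes
import struct

JMP_REL32 = 0xE9  # x86 opcode: jump by a signed 32-bit relative offset
JUMP_SIZE = 5     # 1 opcode byte + 4 offset bytes


def build_jump(source, target):
    """Encode a jump placed at address `source` that lands on `target`."""
    offset = target - (source + JUMP_SIZE)  # relative to the next instruction
    return struct.pack("<Bi", JMP_REL32, offset)


# An ordinary writable buffer stands in for the start of the patched function.
scratch = ctypes.create_string_buffer(16)
source = ctypes.addressof(scratch)
target = source + 0x1000  # made-up destination, for illustration only

patch = build_jump(source, target)

# Multiplying a ctypes type by an int gives an array type; from_address
# overlays it on existing memory, and slice assignment writes the bytes.
view = (ctypes.c_ubyte * JUMP_SIZE).from_address(source)
view[:] = list(patch)

print(scratch.raw[:JUMP_SIZE].hex())  # e9fb0f0000
```

In the real patch the same overlay would be placed on the first bytes of the target function after its memory had been made writable.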
Hello, and thank you very much for the talk. Basically I have two questions: what kept you going, because there's always a certain point where you might say "I can't keep doing that", and how long did that take? It actually didn't take all that long, but I was pretty determined. I think this particular bit of code is interesting because this is how a lot of rootkits and things work: they sort of set memory writable and then, like,
change actual assembly instructions in memory in order to, you know, exploit some sort of kernel vulnerability or something like that. Delving into that was interesting in its own right, and that kept me going along that route, and then once I knew how to do it, I sort of transferred it to this. So it wasn't so much that I was determined to get 2 plus 2 to equal 5; it was that the things I had to do in order to accomplish it, and the things I learned, were interesting enough on their own that that kind of kept me at it.
Thanks for the talk. My question is, apart from having fun here: have you ever had to use such stuff in production? No, although there is one case where I have considered using something like this. Going back to that original attempt to do the patching, the patch class where you can just call super to call the original, instead of having to do something where you pass the original function as the first argument, which is what a lot of patching things do: that would have been useful for me. I've been working on this sort of combination of tools, and pip does this thing where it does a hard reset for any editable repositories. It would be really great if I could patch into that, so that instead of resetting and potentially destroying your changes in an editable package that's in the src folder, it prompts you and says "this branch is dirty, there are uncommitted changes, do you want to do something with that?", as a way to make it easier for our developers to sync up their code with the latest changes. So that was the one place, and it's not exactly production, right, because it would only be run within the span of a developer updating some of the requirements on their own development machine, as opposed to, say, production, where you build from scratch or something like that, but a developer wants, at times, to just update the one package. So that's where I kind of looked into this. It never actually happened, but it's a possibility.
You mentioned that the peephole optimizer runs quite early, and that the twos do not even show up in the .pyc files. Is it the peephole optimizer removing these twos? So do you need to disable the peephole optimizer right as the interpreter starts in order to do this? I'm confused about when things actually run: when does this optimizer run, and what's the reason the 4 shows up in the .pyc and not the twos? Is that the same
thing, or two different parts? Sure. So the .pyc is sort of like a cache of the compiled bytecode. If the .pyc was generated without having run this disable-peephole-optimization thing, it's not going to work; the .pyc will trump it. There's an environment variable you can set to not write the .pyc files. And so actually, in the test runner, I set sys.dont_write_bytecode = True, just so that there aren't any .pyc files to sort of mess things up and cause that to occur. So that's how it connects to the .pyc file. Otherwise, if there isn't a .pyc file yet when it executes, and it's set to write the .pyc file, it'll use whatever is in memory, whether the peephole optimizer is enabled or not, and then generate the bytecode and save it to disk. So hypothetically, if you ran this once and it generated a .pyc file that had the two twos as distinct constants, then it would stay that way the next time you ran it.
Hi there. You had a really interesting hack there. In the previous slide you had something about how mprotect works, where you were defining some constants, and then you had some for Microsoft. Should I pull that up? Yes, please do. Where do
these come from? Yes. So, some of the constants, for read, write and execute privileges, are in the sys/mman.h include file that's on Linux and Unix systems. They're bit masks, and you know, the benefit of bit masks is that you have 1, you have 2, and then you can OR them together, and the first bit is 1, the second bit is 1, and so they're both on. Microsoft does this weird thing with this function where, I don't even know, all of the values are offset by one in the list of bits, and they also have extra values that combine them, instead of just ORing them, which would be the sensible thing to do. Anyway, these I found from a microsoft.com explanation of how mprotect works on Windows. It seemed like it would be horrible to debug, but in a test environment I followed this and it just worked on mine, because the functions are pretty similar; the differences are kind of minuscule. So once I had it working on Linux and Unix, I just had to make a few tweaks to get it working on Windows: the flags that you pass are slightly different, and the order of arguments is different, but in every other respect it's the same.
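The two flag conventions can be contrasted in a short sketch. The numeric values are the ones published in Linux's sys/mman.h and in Microsoft's memory-protection constants documentation; the helper name rwx_flags is hypothetical and is not taken from the talk's code.

```python
import sys

# POSIX-style flags (values as defined in sys/mman.h on Linux):
# independent bits that are combined with bitwise OR.
PROT_READ = 0x1
PROT_WRITE = 0x2
PROT_EXEC = 0x4

# Windows memory-protection constants: each combination of rights has its
# own dedicated value rather than being an OR of individual bits.
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40


def rwx_flags(platform=sys.platform):
    """Return the 'readable, writable and executable' flag for a platform."""
    if platform.startswith("win"):
        return PAGE_EXECUTE_READWRITE
    return PROT_READ | PROT_WRITE | PROT_EXEC


print(hex(rwx_flags("linux")))  # 0x7
print(hex(rwx_flags("win32")))  # 0x40
```

Keeping the platform check in one helper is one way to confine the boilerplate the speaker mentions to a single place.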
If there are no more questions, just a quick note: in the EuroPython app you can rate the talk and leave a comment to thank the speaker. If you have any constructive feedback for the speaker, please put that in there. Thank you.