We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

2 + 2 = 5: Monkey-patching CPython with ctypes to conform to Party doctrine

00:00

Formal Metadata

Title
2 + 2 = 5: Monkey-patching CPython with ctypes to conform to Party doctrine
Title of Series
Number of Parts
160
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
2 + 2 = 5: Monkey-patching CPython with ctypes to conform to Party doctrine [EuroPython 2017 - Talk - 2017-07-10 - PyCharm Room] [Rimini, Italy] A few weeks into your tenure as a software engineer at the Ministry of Truth you are assigned your first real feature request: write a context manager that can make “2 + 2” equal 5 at runtime. Your solution should be written only in Python (for maximum portability). Absurd? Perhaps, but you know better than to ask questions. You are no thought-criminal. In this talk I walk through the steps I took to modify the value of two plus two in CPython at runtime—using only Python and the ctypes module. What began for me as a silly and frivolous side project became an education in how the python data model works behind the scenes and how CPython compiles, optimizes, and executes python code. The goal of this talk is to provide an introduction to CPython internals while walking through the steps needed to monkeypatch integer addition to make “2 + 2” equal 5. The audience should come away with a better understanding of how python objects and types are represented in memory, how references are counted, and how python scripts are transformed into abstract syntax trees, compiled into code objects, and then executed by the CPython virtual stack machine. And because I’ve limited myself to using ctypes, these topics can be explored without familiarity with C as a prerequisite
95
Thumbnail
1:04:08
102
119
Thumbnail
1:00:51
Web-DesignerWebsiteSoftware developerBitInterpreter (computing)Social classComputing platformComputer animationLecture/Conference
Inheritance (object-oriented programming)MathematicsScripting languageSoftware testingError messageDefault (computer science)Data structureCase moddingField (computer science)Pointer (computer programming)Type theoryLetterpress printingFunction (mathematics)Object (grammar)SequenceExecution unitStrutBinary fileCountingCASE <Informatik>Element (mathematics)Inheritance (object-oriented programming)Run time (program lifecycle phase)Parameter (computer programming)Unit testingTest-driven developmentSocial classProgramming languageDefault (computer science)Object (grammar)Type theoryPointer (computer programming)32-bitBitFunctional (mathematics)QuicksortLimit (category theory)AbstractionEvent horizonFlow separationInterpreter (computing)Library (computing)Point (geometry)Block (periodic table)Data structureRegular graphField (computer science)Natural numberNormal (geometry)Proxy serverAddress spaceCrash (computing)Attribute grammarInterior (topology)CodeContext awarenessOvalSemiconductor memoryElectronic mailing listOperator (mathematics)NumberImage resolutionOrder (biology)Optical disc driveSimilarity (geometry)Instance (computer science)Patch (Unix)Presentation of a groupDoubling the cubeDifferent (Kate Ryan album)Extension (kinesiology)Power (physics)Data dictionaryProcess (computing)MereologySet (mathematics)IntegerAdditionData managementSingle-precision floating-point formatRevision controlVirtual machineScripting languageSoftware testingCycle (graph theory)Inference1 (number)PiComputer-assisted translationClassical physicsStructural loadRight angleFunction (mathematics)Video gameEntire functionTwitterWordDampingMultiplication signComputer animation
Function (mathematics)Pointer (computer programming)Object (grammar)StrutData structureFile formatContext awarenessContent (media)Letterpress printingOpcodeType theoryAbstract syntax treeParameter (computer programming)String (computer science)OpcodeToken ringInformationNumberPointer (computer programming)Object (grammar)Type theoryBinary codeAttribute grammarContext awarenessCodeStructural loadField (computer science)Data structureQuicksortAddress spaceInterior (topology)Functional (mathematics)Data managementSoftware testingSocial classAdditionMathematical optimizationLine (geometry)Module (mathematics)InferenceSystem callElectronic mailing listNumber theoryProcess (computing)XMLComputer animation
FlagMathematicsLetterpress printingCodeMathematical optimizationLie groupCompilerFunction (mathematics)Revision controlScripting languageAbstract syntax treeType theorySemiconductor memoryMathematical optimizationAddress spaceLetterpress printingData structureFunctional (mathematics)BitPiInformationInterior (topology)QuicksortDebuggerRevision controlAttribute grammarObject (grammar)Function (mathematics)ResultantDifferent (Kate Ryan album)WindowSoftware testingCodeMessage passingContext awarenessIntegerSoftware developerMiniDiscModule (mathematics)Interpreter (computing)Set (mathematics)Office suiteSocial classMemory managementStaff (military)MereologyData managementCovering spaceMultiplication signComputer fileBytecodeTuple
Multiplication signCodeLecture/Conference
Shared memoryFunctional (mathematics)Assembly languageStapeldateiPatch (Unix)Lecture/Conference
IcosahedronCAN busMenu (computing)MIDINormed vector spaceMachine codeThree-valued logicAddress spaceLocal ringBoilerplate (text)Function (mathematics)Frame problemLogical constantWindowFunctional (mathematics)Computer architectureDefault (computer science)Hydraulic jumpObject (grammar)Electronic mailing listPointer (computer programming)OpcodeMultiplicationExecution unitOperator (mathematics)Error messageSemiconductor memoryPoint (geometry)LengthProgram slicingCodeOrder (biology)Different (Kate Ryan album)QuicksortBranch (computer science)BytecodeArray data structureSpeicherschutzType theoryMathematical optimizationArmComputer animation
Point (geometry)Lecture/Conference
Software engineeringCodeBitSemiconductor memoryRootkitQuicksortMathematicsSet (mathematics)TupleKernel (computing)Right angleOrder (biology)RoutingVulnerability (computing)Lecture/ConferenceSource code
CASE <Informatik>Product (business)Set (mathematics)Mathematical optimizationMathematicsRight angleBranch (computer science)Interpreter (computing)Software developerEndliche ModelltheoriePatch (Unix)Functional (mathematics)Virtual machineSocial classCodeDampingParameter (computer programming)Multiplication signQuicksortComputer fileCombinational logicHookingSynchronizationComputer animationMeeting/InterviewLecture/Conference
Computer fileMathematical optimizationCache (computing)BitSemiconductor memoryQuicksortMultiplication signBytecodeIntegrated development environmentSoftware testingSign (mathematics)Slide ruleLine (geometry)WritingLecture/Conference
Menu (computing)Normal (geometry)Normed vector spaceVacuumMIDILogical constantForm (programming)Line (geometry)Auditory maskingCategory of beingPhysical systemOrder (biology)Functional (mathematics)BitElectronic mailing listSpeicherschutzWindowComputer fileParameter (computer programming)Integrated development environmentDifferent (Kate Ryan album)FlagSoftware testingData miningArmSummierbarkeitComa BerenicesWritingComputer animation
FeedbackMobile appMobile WebBit rateLecture/Conference
Transcript: English(auto-generated)
Please give a warm welcome to Frankie. Hi, thank you. So I'm the lead developer, platform developer for the atlantic.com, the eponymous website for 106-year-old politics and culture magazine
headquartered in Washington, D.C. Today I'm gonna be talking about how to manipulate the internals of CPython so that you can make the interpreter evaluate two plus two to equal five. So a bit of background on my topic. Originally what I was trying to do was create a class decorator API that looks something like this. So here you have a class and then you pass
as its parent class a metaclass of sorts that tells it it should be patching and then you define the methods and you'd be able to use super to call the original method. It turned out it really wasn't possible, or rather it was but it was very inadvisable because it required you to mutate the MRO or method resolution order of the classes
in a way that can only be done by accessing the underlying C structs. But in this semi-failed attempt, I became curious about just how far you could push the limits of manipulating CPython internals at runtime with C types. Would it be possible, for instance, to make two plus two equal five? And it also seemed maybe slightly topical
with recent events. So it was one of those situations where I'd gone down a rabbit hole, several points I came very close and so the tantalizing nature of it kind of drove me forward to eventually achieve it and also in the process I learned a lot about how different parts of the Python interpreter work.
And so I'll go over those, I'll sort of structure the talk so that it kind of recapitulates the order that I attempted to solve the problem and then along the way explain the different techniques and underlying structures that inform it.
So prior art and reference, there's a library Forbidden Fruit that does something sort of similar. It doesn't go to quite the same extent, but it does cover the first technique that I used to try to patch integer addition. And the full code for this presentation is in Python double scripts,
which is at that URL there and it has running unit tests and whatnot. So let's say this is test-driven development. We're gonna write a test case. Probably a context manager's a good idea because you don't wanna set two plus two to equal five across the entire lifespan of the executable.
That'll certainly crash something. I mean, I don't know how it wouldn't. So do it as a context manager and then assert that two plus two equals five. So a naive approach, and it's only naive because it doesn't work, but I mean, in any other respect it would work, is you take a reference to the old
underscore underscore add method. You define a new method that if A and B are both two, return five, otherwise call the old add method. And then you try and maybe set in underscore underscore add equals in add. You get a type error. Can't set attributes of built-in extension type int. You try and set it on the dict.
You get dict proxy object does not support item assignment. So quick kind of crash course in C types. C types allows you to load any shared library, but it comes with a convenience attribute for accessing the libPython C library by ways of C types dot Python API.
Any C functions that are exposed in the library can be accessed as attributes of C types dot Python API. There are a few quirks with how it handles types. For instance, you set the attributes arg types and res type on the functions so that the C types module knows how to pass arguments. That is like how to convert say a number,
an int object in Python into an actual C int or whether it should actually be a Python object int. So in this case, we're using py get version. It returns a pointer to a null terminated char array. So we use C types dot C char P. It also comes with a bunch of built-in types like C char P, which is a pointer to a char,
C void P, which is like a pointer to any address, et cetera. So arg types, when you specify it, it's a list. Each element corresponds to the type of that positional argument. It also has this very useful type py object, which lets you pass Python objects into functions
and get them back as Python objects instead of as some sort of abstract type that you can't use. Alternatively, you could set the res type to C void P, for instance, as another type. That would just give you the address, and so you wouldn't be able to do much, but just as an example. So another really powerful feature of C types is the structure type.
For this talk, the only thing we'll really be concerned with the structure class is the underscore fields underscore attribute. It's a list of two tuples. The first item is the name of the attribute, and the second item is the type. These provide a way to create Python objects that act like C structs and can be passed as C structs
into C functions, and otherwise act more or less like normal Python objects. So here I've listed the struct definition for the base Python object py object, which is two fields. OB ref count, which is the reference count, and it's a py size, s size t, which represents the size of a block in memory
that can be read or written in a single operation. Though not entirely in Python 3, it defaults to the 64-bit version, even on 32-bit machines, but that's kind of a minor detail. So the reference count's defined in the C code as py s size t, so we followed that, and then the next variable
in the struct is a pointer to the object's type. Like a quick recap for how structs are conventionally used in large programming language interpreters. Because structs are basically contiguous blocks of memory, and it's just a list of objects that say this first number of bits should be used
as this type of object, the next set of bits should be used as that type of object, you can use the same, you can have a base type and pass that around, and then as long as the memory is allocated for more things to be added to it, you can upcast it to something more specific. So for instance, you get a py object,
look at the ob type, oh it's an integer type, and then you call the method to sort of populate the integer, and now you have a full hydrated integer field with a pointer to the number that it represents, for instance. And so, in order to actually be able to use this, we need a way to be able to turn our Python
objects into C type structures. In C Python, the built-in function id returns the object's address in memory, so it's very convenient for this purpose because the from address class method on structures takes an address in memory and returns a struct for it.
A side note, C types has a pointer function that transforms a type into a pointer of that type, we'll use that later and it'll make more sense in context. So here I'm getting the reference count of the py object, you can see in the first case it's seven, and in the second case it's eight, the difference is that in sys get ref count there's an extra reference for the argument
inside the get ref count function. All right, so now let's try to override int add. We'll return to our original naive approach of patching integer addition in the dict. So if we could mutate the add item in the dict,
we would be able to achieve our goal now, but since __dict__ isn't a regular dict, and it's some sort of proxy for a dict, we need to do something with that. And it turns out that the underlying C struct has a pointer to an ordinary mutable dict. So another side note, a useful feature
of C type structures is that like I mentioned about how like py int object can extend py object, you can do that with the class inheritance syntax in Python. So here we have class dict proxy, and because the struct starts with the py object attributes, those are the first fields,
and then the next field is the dict. When we use that, we can create a function that lets us mutate the class. It uses a little bit of trickery. This is the functionality that I sort of borrowed from the forbidden fruit library. You populate the dict proxy, then you create a temporary dictionary
and you set a key, set an item with the key none and the value that dict and return it, and that the thing you get back is a regular Python dictionary that you can change. So we make it mutable, we change the __add method, and it doesn't work. I mean, it does in a sense. You can call the __add method
and you get the desired result, but in every other case, it doesn't. So why doesn't overriding __add suffice? In the C code, if you look at the function that adds numbers, it's doing something where it's looking at the slot function for the int type.
So kind of a quick look here at what a py type object is. And so this is, again, extends the py object and has extra attributes. One of them is a list of methods if the thing is a number. In the case that it is a number,
it has a couple of functions. So we have number add, number subtract, number multiply, et cetera. For the first thing there, the syntax is a little bit odd for defining types in C that are functions
or pointers to functions. The py object star is the return type, so it's returning a pointer to a py object. Star binary func is the name of the thing. It's a binary, binary func is the name and it's a pointer. And then the next parentheses are the arguments. So we can set that with C types
by using the C func type function, which takes return type and then arg types as a list of arguments. So we duplicate that there. And we use a structure to represent the py number struct so here we have the fields for number add and we specify as the type the binary function.
All right, so we put it all together. We define the py type object, copying over the attributes that we had over here. And we populate it with from address id int and then we try and call number add and we get what we would expect, two plus two is four.
So ignore the top function there. So again, we're gonna try and do it as a context manager so we don't blow everything up in the process of testing. We get the tricky thing with like the original approach of grabbing the original add function and then using it again is that it wouldn't work
if you were to just use nbadd because when you change it, the thing you're then referencing is still a pointer to that change thing. So you need to get the address of the original function and then create a new binary function that points to that address. So that's what we do here in these first two lines of old nbadd address and old nbadd.
We define our new add function and then we replace the function and call the original the same way we did in that sort of naive approach. We get pretty close. If you just do two plus two, that doesn't work but eval two plus two does.
So let's use the dis module to see what's going on. Dis lets you pass it either a code object or a string of code and it'll give you back diagnostics and information about the tokens that the abstract syntax tree that Python uses to represent it. So here we have a global variable two equals the number two
and then a function add two plus two and then inside return two plus two and you can see in the how it interprets that is load the global two, load the global two again, add the last two things and then return the value.
So that binary add instruction op code. In the C Python code, there's a little kind of optimization where it checks if both of the things are ints and if they are ints, then it does the addition in C as a like sort of speed optimization.
So it doesn't actually call underscore underscore add. So how do you fix that? Well, you change the class of two to something other than int. Call it int two. So define int two extends int, has its own add function so it's not an int, not exactly anyway
and then we go to set it, type error, class assignment only for heap types. But of course, this is something we can get around by manipulating the structures. So here we have a function that sets the type. There's a little bit of, I was like sort of unsure about whether I should include
like fully functional code that had things that increased references and decreased them. For simplicity, remove them. I opted to keep them in but the important thing here is the grabbing of the old type, populating the pie object from the new thing and then overwriting the OB type and then having a sort of context manager
to let us change two to be an int two and then back to an int afterwards. So here's where we are. With override type two int two, we evaluate two plus two, it's five. So same results we got before. We define a variable two and then we override type two
to int two and print it, we get a five. But if we do just two plus two, not the variable two plus the variable two, we get four. Why? The final obstacle is something called peephole optimization. It's called a peephole optimization because it sort of looks through the abstract syntax tree
in little windows and tries to find bits of code that it can fold together or simplify or turn into something faster. So one of the things it does is it looks if there are two literal integers being added together and if they are, it combines them. So if we use that disk module and instead of using a variable two
like we did the last time, we use a literal two, then we see that in the byte code, it just has a four. And if you were to look at the pyc file, if you were to analyze a pyc file, you would see a four there. There would be no two and two. So that's why that kind of prevents us from doing what we want it to do.
In CPython, this is performed by the C function pycodeoptimize. It doesn't occur in eval, which is why eval two plus two works, but not when it's defined in a Python function or in an interpreter. So this is the craziest part of the way to get around this, which is to disable
Python code optimization with what's called a trampoline function. You basically take the memory where the pycodeoptimize function is, you overwrite the first few assembly instructions to jump to a new address. That new address is a no op function that just returns the code unchanged
and increments the reference and you go from here and success. You run tests and our test passes. So yeah, that's sort of my talk.
Quick kind of thing, because we're allowed to do this. We're hiring at the Atlantic. We're looking for DevOps. We're looking for full stack. We're looking for front end developers. We just opened an office in London. James Fallows, if you follow the publication, James Fallows has moved to London
and he's opening the London office and a few staff writers are moving with him. So I think there'll be room for developer in that office and then also obviously plenty of room in Washington, DC or with our marketing team in New York. And contact information and again, that URL for getting the code,
which has a lot more detail. Like I sort of showed a simplified version of PyType object. It has like a fully fleshed out, like every attribute matches, whatever it is in Python two and Python three and has the differences between both figured into it. Thank you.
We've got plenty of time for questions. So if you have any questions, please raise your hand. We have a microphone for recording them.
Can you show us the code which you use to patch the assembly code to introduce the trampoline function? Mm-hmm. Could you, maybe I'm not sure I understand the question.
Which code? In the last slide, you showed us the one before, yes. Oh, this. Yes, how did you do it? Heh heh. Sure, I can pull that up.
What's that ring here?
So in order to paper over the differences between Windows and Linux and Mac,
there's like a lot of sort of boilerplate constant stuff here. But the main thing is there's a function called mprotect. Where it exists is different on Windows and in Linux and Unix and OS 10. But you give it an address in memory and a length
and you tell it how you wanna change what that memory does. So by default, the executable memory of Python that gets loaded into memory is not writable. But you can use this function to make it writable. So the first thing you do is you figure out where the function is and how long you need to set the jump instruction for.
And then you make it writable there. So that's the mprotect stuff here. You define your no op, which is to increment the reference because that's what it does in the original C Python code. So it seemed, you know, maybe it's superstitious, but it made sense to do it.
And if you're overwriting a function in this way, you can't return pointers to Python objects in the normal way. So I just return the address, which it interprets correctly as a pointer. We have this quaternary func, a function that takes four Python objects and doesn't return anything.
That's what the Pycode optimizes. The first one is the code, and then there's like the locals and the frame, et cetera. And then here's the override. So if it's not an x86 architecture, it throws an error because I didn't write it for ARM or any of the other, you know, machine languages.
You get the pointer to the old function and to your new patched function. You change, the jump instruction will be five bytes. The first one is the jump byte code, which is E9. And then the next four is a relative offset to the new address. So you set those five bytes as readable, executable,
and writable. You find the offset between the two. And then this was from me kind of just testing it. And then you combine them all together into a list of opcodes.
The multiplication operator with C types lets you create arrays of things. So here we have like an unsigned byte and multiplied by five will give us an array of five of these things. And then we do from address. We get back an array that has five bytes each for the different instructions we're gonna overwrite.
And then we just use the slice syntax to replace it. And then once you do that, that patches it to be a jump instruction of the new function.
Hello, and thank you very much for the talk. So it's basically, I have two questions. What kept you going? Because there's always a certain point where you might say, I can't keep doing that. And how long did that take you?
It actually didn't take all that long. I was pretty determined. I think this particular bit of code is interesting because this is how a lot of rootkits and things work is that they'll sort of set memory writable and then change actual assembly instructions in memory
in order to exploit some sort of kernel vulnerability or something like that. And so kind of delving into that was interesting in its own right. And so that kind of kept me going along that route. And then once I knew how to do it, then I sort of transferred it to this.
So it wasn't so much that I was determined to get two plus two to equal five so much as that the things that I had to do in order to accomplish it and the things I had to learn were interesting enough on their own that that kind of kept me going.
Thanks for the talk. My question is, besides having fun, have you ever had the need to do such stuff in production or no?
No, although there is one case where I have considered using something like this. If you want to change, so going back to that original attempt to do the patch class where you can just call super to call the original instead of having to do something weird like pass the original function as the first argument,
which is what a lot of patching things do, that would have been useful for, I've been working on this combination of pip tools and pip does this thing where it does a reset hard for any editable repositories
and it'd be really great if I could hook into that and instead of have it reset and potentially destroy your changes in an editable package that's in the SRC folder, it prompts you and says this branch is dirty, there are uncommitted changes, do you want to do something with that
as a way to make it easier for our developers to sync up their code with the latest changes. On our GitHub, so that was like the one place where it's not exactly production, right, because it would only be run within the span of like a developer updating some requirements
on their own development machine as opposed to say in production where you'd build from scratch or something like that. But a developer wants to save time and just update the one package. So that's where I kind of looked into this. It never has actually happened, but it's a possibility.
Thanks for the talk. You mentioned that the people optimizer runs quite early
and that the twos do not even show up in the by C file. Is it the people optimizer removing these twos or is it, do you need to disable the people optimizer right at the interpreter's start or can you do this on the spot? I'm confused on when things actually run.
When does this optimizer run and what's the reason this four shows up in a by C and not the two? Is that the same or are those two different passes? Sure, so the by C is sort of like a cache of the compiled byte code. So if the by C was generated without having run
this people disable optimization thing, you're gonna, it's not gonna work. Like the by C is gonna trump it. There's like an environment variable you can set to not use the by C files. But, and so actually in the run tests, like the sort of test runner,
I set sys.don'twritebytecode equals true just so that there aren't any by C files to sort of mess things up and cause that to occur. But yeah, so that's sort of how it connects with the by C files. Otherwise, if there isn't a by C file yet, when it executes and it's set to write the by C file,
it'll use whatever is in memory. So either the people optimizer is enabled or not, and then generate the byte code and save it to disk after that. So hypothetically, if you ran this once and it generated a by C file that had the twos distinct,
then you would, it would stay that way the next time you ran it with by C files. Hi there, you had a really interesting Hackish talk. In the previous file, you had some Microsoft,
I know, hate talk, something like that. It seemed interesting. How easy was it to find? A bit higher, narrow, higher up the file. You have some constants, and then you shit talk in Microsoft. Oh, these here? Okay, yeah. Oh, I know. Yeah, so these, where do these come from? Yeah. Yeah, so for the,
the constants for read, write, and execute privileges, those are in this sys mman.h include file that's on Linux Unix systems. And then Microsoft does this thing where they have bit masks.
So, you know, like the benefit of bit mask is like you have one and you have two, and then you can and them, and then you have like the byte, the first byte is one and the second byte is one, and so they're both on. Microsoft does this weird thing with this function where beside, like even though all of the values are offset by one in the list of bits,
they also have extra properties that combine them instead of just like anding them, which would be the sensible thing to do. But anyway, these I found from like a Microsoft.com you know, like explanation of how mprotect works on Windows. Okay, because it seems like horrible to be bugging
in a test environment this. It actually like, I followed the spec and it just worked. I mean, because the functions are pretty similar, they're like the differences are kind of minuscule, so once I had it working on Linux Unix, then I just had to like make a few tweaks to get it to run on Windows. Like the flags that you pass are slightly different,
the order of arguments is different, but in every other respect it's the same. Okay, if there are no more questions, just a quick note on the EuroPython app in your mobile.
You can just rate the apps, you can add comments, you can thank the speaker, you can add any feedback you want, any constructive feedback for the speaker. So please thank again Frankie.