Exploring Python Bytecode

Video in TIB AV-Portal: Exploring Python Bytecode

Formal Metadata

Exploring Python Bytecode
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Anjana Vakil - Exploring Python Bytecode

Do you ever wonder what your simple, beautiful Python code looks like to the interpreter? Are you starting to get curious about those `.pyc` files that always pop up in your project, and you always ignore? Would you like to start investigating your Python code's performance, and learn why some programs you write run faster than others, even if the code looks more or less the same? Have you simply fallen so completely in love with Python that you're ready to peer deep inside its soul? If you, like me, answered "yes" to any of these questions, join me in an illuminating adventure into the world of Python bytecode!

Bytecode is the "intermediate language" that expresses your Python source code as machine instructions the interpreter (specifically CPython, the "standard" interpreter) can understand. Together we'll investigate what that means, and what role bytecode plays in the execution of a Python program. We'll discover how we simple humans can read this machine language using the `dis` module, and inspect the bytecode for some simple programs. We'll learn the meaning of a few instructions that often appear in our bytecode, and we'll find out how to learn the rest. Finally, we'll use bytecode to understand why a piece of Python code runs faster if we put it inside of a function.

When you go home, you'll be able to use bytecode to get a deeper understanding of your Python code and its performance. The adventure simply starts here; where it ends is up to you!
[Opening remarks inaudible.] I hope you guys are excited about bytecode, because I am. Can everybody hear me OK? Great. So, who am I? Well, my name is Anjana and I'm a Pythonaholic; I've been addicted to Python for probably about three years now. Right now I'm supported by Mozilla through Outreachy to do some testing work for them. But what I want to talk to you about today is some explorations into the core of Python that I started doing while I was a participant at the Recurse Center, which is a really cool programming community in New York City where you're allowed to just follow whatever excites you about programming. So today I'd like to tell you a little bit about an adventure that I had that involves getting started with Python bytecode. I'm by no means an expert in it, but I just wanted to bring you along on my first encounters with it and show you why I think it's really cool.

While I was at the Recurse Center I came across this puzzle; I think of it as a Python puzzle. It turns out that Python code runs faster if you stick it inside of a function and then call that function. Maybe you guys are already familiar with this; I was not. For example, say we have a rather lengthy for loop that does nothing useful: it just evaluates a variable i for each i in a rather long range. If we run that directly in a Python module, it takes quite a bit longer than if we stick it inside this run_loop function and then call that function once. To me this was puzzling, because looking at the source code I don't see any real meaningful difference. In fact, all I see in the inside-function version on the right is that, if anything, Python should have more work to do, because it's going to create a function and then call it. So I couldn't really understand from looking at the source code why the right-hand side would be so much faster. It turns out that looking at the bytecode can give us a little bit more insight than looking at the source code for certain types of Python puzzles like this one, and it all has to do with what happens when we run Python code.

This is something I hadn't really thought too much about: what happens when I actually execute a Python program? Today I'm just talking about CPython; a lot of this is implementation detail specific to the CPython interpreter, which hopefully a lot of you are using. Differences between CPython and other interpreters are also really fascinating, but not the topic today.

So when we're using CPython to run some code, we start out with our beautiful, Pythonic, easy-to-read, nicely indented source code that looks fantastic, and that gets compiled by a part of CPython called the compiler. It gets turned into a parse tree, an abstract syntax tree, a control flow graph; what those are doesn't really matter for our purposes right now, they're all just different abstractions of what we want to do. The important part is that it ultimately gets compiled down to bytecode, which obviously we'll be talking a bit more about in a moment. That bytecode, whatever it is for now, gets passed to the interpreter and is what the interpreter actually runs, the interpreter being a virtual machine that performs operations on a stack of objects. So the interpreter executes that bytecode, and then you get out whatever awesome stuff your Python program is designed to do. OK, so this bytecode: what is it?
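
A minimal sketch of the pipeline just described, using only built-ins and the standard `ast` module. The one-line program, the file name `<example>`, and the variable names are illustrative assumptions, not taken from the talk's slides:

```python
import ast

# A hypothetical one-line program to push through the pipeline.
source = "spam = 1 + 2\n"

# Step 1: the source is parsed into an abstract syntax tree.
tree = ast.parse(source)
print(type(tree).__name__)      # Module

# Step 2: the compiler turns the source into a code object;
# the raw bytecode lives in its co_code attribute as a bytes string.
code = compile(source, "<example>", "exec")
print(type(code.co_code))

# Step 3: the interpreter executes the code object.
namespace = {}
exec(code, namespace)
print(namespace["spam"])        # 3
```

The raw bytes in `co_code` are not printed here because their exact values differ between CPython versions.
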
Well, as we saw, it sits in an in-between place, between your source code and the effects of your program. So in one sense it's an intermediate representation of your program, and in fact it's the representation that the interpreter itself sees. The interpreter unfortunately doesn't get to look at your beautiful, readable, Pythonic source code; it only gets to see this bytecode.

If we think about the interpreter as a virtual machine, we can think about bytecode as the machine code for that virtual machine. When we think of languages that are more traditionally considered compiled, we think of taking source code and translating it into machine instructions for an actual physical machine. In this case it's pretty much the same idea; it's just that the machine is virtual: it's the Python interpreter instead of an actual physical machine.

And since the virtual machine, the CPython interpreter that we're dealing with, is basically a stack machine, the bytecode that we give it is a series of instructions for what to do: which objects to add onto that stack, how to move them around, which operations to perform on objects that are already on it, how to pop things off and return them back to us. So the bytecode is a series of instructions.
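
To make the stack-machine idea concrete, here is a toy sketch. It is not how CPython is implemented: the `run` function and the three-instruction vocabulary are assumptions made for illustration, with instruction names borrowed from the ones that come up later in the talk.

```python
def run(instructions):
    """Execute a list of (operation name, argument) pairs on a toy stack machine."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            # Push the constant onto the top of the stack.
            stack.append(arg)
        elif opname == "BINARY_ADD":
            # Pop the top two items, add them, push the result.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            # Pop the top of the stack and hand it back to the caller.
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction: {opname}")
    raise ValueError("program ended without RETURN_VALUE")


# A hypothetical program equivalent to `return 2 + 3`.
program = [
    ("LOAD_CONST", 2),
    ("LOAD_CONST", 3),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
result = run(program)
print(result)  # 5
```

The real interpreter handles far more operations and many more details, but the basic loop of "read an instruction, push or pop objects, move on" is the same idea.
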
Another interesting thing: if you've ever wondered about those .pyc files that pop up all over the place when you're importing Python modules, these are actually caches of the bytecode. This is the bytecode that the compiler has already spit out. The nice thing about this caching mechanism is that, since we saw that from source code to execution we have those two steps, compilation and interpretation, if we haven't updated the source code since the last time we ran the program we can skip the first part: we can reuse the bytecode that we already compiled before. So that's what those .pyc files are. And if you've ever tried to open one of those up, they're gobbledygook; they're not meant for us measly humans to understand. So how can we humans read this code that's intended to be read by CPython?

Well, there's a really handy module called dis, which has a fun name; it stands for disassembler. So we're disassembling bytecode, and the documentation is right up there with the link. This allows us to analyze certain types of objects and read the bytecode for that object in a way that we humans can understand, instead of looking at the bytes themselves, which isn't that helpful to us. For example, say I have a really simple function called hello that returns a greeting in Basque (can somebody help me pronounce it? I'll try anyway). If we dis this function hello, we get our first peek at disassembled bytecode. These two white lines at the bottom here are our really, really simple bytecode. We just have two instructions here, and without really knowing what all these numbers are, what the columns are, what we're looking at, we can get a sense for what's happening: we're loading some kind of constant string onto the stack and returning it.

So let's break it down. What exactly are we looking at here? What does it mean when we see the output of dis? We have a series of rows, where each row in the output is an instruction to the interpreter. On the left-hand side, a lot of the time, we'll see a line number: that's the line in our source code. This is just for us, to help us know how the source code lines up with the bytecode. Not every instruction will have a line number; you can see here that the RETURN_VALUE line doesn't have one, because sometimes more than one instruction can fit on one source code line, so we only see the line number on the instruction that starts that line.

Next to that we can see an offset in bytes: how far into this string of bytes this particular operation is. That's not super interesting, from my perspective, for us humans. What is interesting is the next thing, which is this string LOAD_CONST; "load constant" is what it stands for, and that's the name of the operation. We'll look at some more of those and see what we can find out about all the different possible operations you could encounter when you're reading disassembled bytecode.

If the operation in question takes arguments, which not all of them do, then you'll see some information about the arguments on the right-hand side, in those last two columns. First we see the argument index. What exactly that means, an index into what object, depends on the operation: there are a few different places where CPython keeps track of the different values, like constants or variable names, that you would need to carry out a particular operation. That's something you can look up in the documentation. What's more interesting for our purposes now is the value of that argument, which you can see on the right in parentheses. This is CPython giving you, a silly human, a little hint about what it is that this operation is operating on. So, some operations.
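
A small sketch of the kind of session described above. The greeting string `"Kaixo!"` is an assumed stand-in for whatever was on the slide. `dis.get_instructions` is used in addition to `dis.dis` so that the columns (offset, operation name, argument index, argument hint) can be read as attributes:

```python
import dis


def hello():
    return "Kaixo!"  # assumed stand-in for the Basque greeting on the slide


# Human-readable listing, one row per instruction.
# The exact rows depend on the CPython version.
dis.dis(hello)

# The same information as objects: byte offset, operation name,
# argument index, and the parenthesised hint about the argument.
instructions = list(dis.get_instructions(hello))
for ins in instructions:
    print(ins.offset, ins.opname, ins.arg, ins.argrepr)
```

On the CPython version used in the talk this shows a `LOAD_CONST` followed by a `RETURN_VALUE`; newer versions may add or merge instructions, so the listing is best read rather than memorised.
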
We've already seen LOAD_CONST, which takes an argument c and pushes it onto the top of the stack (TOS). Then there are things like BINARY_ADD, which takes whatever is already on the top of the stack, the top two items, adds them together, and puts the result on top of the stack. And then there are things like CALL_FUNCTION, whose argument is a bit strange: the argument tells it how many positional and keyword arguments the function is expecting, so that it knows how many objects to take off the top of the stack and in which order to pass them to the function. There are a ton of these; I would not be able to cover them all even if I had an hour or more up here. But they're all conveniently documented in the documentation for the dis module; that's the link at the top of the page here.

These operation names are just for us humans; CPython doesn't care. It has a number for each of them, of course, which is called the opcode of the operation. If you're curious about the correspondence between a name and a code for a given operation, you can use dis.opmap: this opmap is a dictionary where you can look up a particular operation name and find out its code. And if you already know the code, you can use dis.opname, which is an indexed list, a sequence of all the operation names, so you can find out which code corresponds to which name.
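
A brief sketch of the two lookups just mentioned. `dis.opmap` and `dis.opname` are part of the standard `dis` module; the numeric value printed for an opcode is not shown here because it can change between CPython versions:

```python
import dis

# Name -> numeric opcode (a dictionary).
load_const_code = dis.opmap["LOAD_CONST"]
print(load_const_code)

# Numeric opcode -> name (a list indexed by opcode).
print(dis.opname[load_const_code])  # LOAD_CONST
```
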
So now we have a basic idea of how this dis function works and how we can disassemble some bytecode. What can we use it on? Let's try to dis something.

As we've seen, we can dis a function. Here's a nice little Pythonic example where we're adding spam and eggs, and if we dis that, we see we have an ever so slightly more complex sort of thing here: we're loading two things onto the stack and then doing a BINARY_ADD on them. Cool, I'm starting to get comfortable with this.

What else can we dis? How about a class? Here's a really simple class, Parrot. It's got one attribute called kind, Norwegian Blue (this is a Monty Python reference), and a method is_dead which always returns True. When we pass that Parrot class to dis, we see that it disassembles each of the methods on that class, including the constructor. So here we've got a few new operation names: in the disassembly of dunder init here we have STORE_ATTR. Cool, so we're starting to get familiar with some of these new operations. In my experience a lot of the time they're self-explanatory, but if you're ever curious about what an operation does, just go to the dis documentation; it's all laid out.

Another thing we can disassemble, if we're using Python 3.2 or newer, is a string that contains valid Python code. So we don't have to actually put the code in a module; we can just use dis to disassemble the string directly. It gets compiled to a code object, and that code object gets disassembled. Here we're assigning spam and eggs on one line, which is a cool thing Python has, and we see a new thing, UNPACK_SEQUENCE, also a pretty self-explanatory operation.

OK, what about an entire module? Let's have a really simple module called knights.py. It has one line, which prints the string "Ni!". I can actually disassemble that straight from the command line by passing the -m flag and the dis module, and then it disses the entire contents of knights.py. Cool, now we see how we're calling this function print, and we see the argument to CALL_FUNCTION, which is that number of positional and keyword arguments that I was talking about before. What we can gather from this is that we're loading this constant and calling the function print on it.

What about another way to dis the module? Well, as we saw, we can dis code strings. So what if we read in the module using open().read()? Now we have the whole content of the module as a string, and we can dis that. Cool, it's basically the same thing as last time; there's one small difference around the return there, but essentially we're getting the same functionality.

Another way we can dis a module is by importing it and then dissing the imported module object. In this case knights.py got a little more complicated: we added this function is_flesh_wound, which always returns True. As you'll notice, when I import it, the whole module gets executed and it prints, but in the disassembled bytecode we don't see any mention of the printing part; all we see is is_flesh_wound. So when you dis a module this way, by importing it, it's only going to disassemble the functions in the module. Anything else that's just at the script level is not going to get put in the output of dis. That's just something to know about the different ways of using dis.

OK, is there anything else we can dis? Nothing! When we pass no arguments to dis, we're not really dissing nothing; we're dissing the last traceback, the last error, which is a cool thing. Let's say I tried to print this variable spam, which I had forgotten to assign. I get a NameError, of course, and if I call dis like this with no arguments, I can see the bytecode that tells me exactly where that error came from. You see the arrow to the left of the operation names there: that indicates that when I loaded print, that was fine, it found print OK, but when I loaded spam I had a problem.

So these are some different things that we can dis, which, if you're like me, is just fun: you can spend lots of time dissing everything you can get your hands on just to see what it does. Apparently it can also help you in solving some puzzle challenges from one of the sponsors. But other than that, why do we care about doing this? Why would we want to do this if we're not at a conference where we get a free USB [inaudible] if we solve a problem?

Well, as we saw, when we use dis.dis() with no arguments, that's a really useful debugging tool, because the error messages that we get from Python, although they're usually wonderful, sometimes don't tell us everything we need to know. For example, let's say I had a line in some really complicated mathematical code that has two division operations on the same line: ham divided by eggs, and ham divided by spam. That gives me a ZeroDivisionError, and it tells me what line in my code the ZeroDivisionError came from, but it doesn't tell me whether it was eggs or spam that gave me the zero. If I dis the traceback, I can actually see that, OK, we loaded ham, we loaded eggs, we did BINARY_TRUE_DIVIDE, and there was no problem. Aha, OK, so eggs was fine. Then we loaded ham again, we loaded spam, and then we did that divide, and that little arrow says that's where the problem was. So I know that the problem in my complex mathematical computation is spam, and that's what I have to go back and fix. So this can be a really cool debugging tool for certain situations.

It can also be a helpful tool for solving puzzles, not just the kind the sponsor has but also the kind that I mentioned at the beginning, where we have this for loop which takes a lot longer outside of a function than inside, and yet in the source code it looks pretty much identical. So let's try to get a little bit more insight here by dissing the outside_function module and the run_loop function from the inside_function module, and see how they compare.

OK, so first outside_function. We now know a few different ways of dissing a module; I'm going to choose the open().read() method to get a string, call it outside, and then dis that. So this is now what CPython sees when we run outside_function. OK, I don't understand all of this, and I don't necessarily need to; I can get a general sense of what's going on. We're loading this range function, we've got a really big number that we're loading, and then we have these new things, GET_ITER and FOR_ITER; that's our for loop there, so that's what that looks like. Cool. And then inside of that we're storing i, I guess, each time we go through the loop, and then we're loading it, because that's all our really, really useful for loop does. OK, that seems to make some sense.

Let's see how it compares to inside. From the inside_function.py file, all we care about is this run_loop function, so I'm going to import that, and I'm going to call it inside, just for convenience and symmetry, and then I'm going to dis inside. At first glance this looks pretty much the same as what we just saw. So let's switch back and forth and see if we notice anything: inside, outside.

OK, so what differences do we notice? Well, first, on the left-hand side we notice that some of the line numbers are different. That's because we had one extra line in the inside_function file, the function definition. That's pretty unimportant. What else have we got? With the range function, in one case it's LOAD_NAME, loading it as a name, and in the other case it's loading it as a global. Maybe there's some difference there, but we're only doing that once, so it's probably not that big a deal. What we probably care more about is what happens inside the iteration, so after that FOR_ITER. Here we see that when we're doing it inside, we're using something called STORE_FAST and LOAD_FAST; outside, it's STORE_NAME and LOAD_NAME, see at 16 and 19 there. I don't know what this means. STORE_FAST sounds like it would be faster, and LOAD_FAST sounds like it would be faster, but I don't know why, or what these do. So how can I find out?
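
The comparison above can be reproduced with a short sketch. Instead of the talk's two files, this assumes the module-level loop is held in a source string and the function is defined inline, and it uses a much smaller range than the talk's big number; the variable names are illustrative:

```python
import dis

# Module-level version of the loop, compiled from a string
# (standing in for the talk's outside_function file).
outside_source = "for i in range(10):\n    i\n"
outside_code = compile(outside_source, "<outside>", "exec")


# Function version of the same loop
# (standing in for run_loop in the talk's inside_function file).
def run_loop():
    for i in range(10):
        i


outside_ops = [ins.opname for ins in dis.get_instructions(outside_code)]
inside_ops = [ins.opname for ins in dis.get_instructions(run_loop)]

# Operation names that appear in only one of the two versions.
print(sorted(set(outside_ops) - set(inside_ops)))
print(sorted(set(inside_ops) - set(outside_ops)))
```

On any CPython version the module-level loop stores `i` with a `*_NAME` operation and the function stores it with a `*_FAST` operation, although the exact operation names in the two printed lists vary from version to version.
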
Well, I can investigate by going into the dis documentation, where it has a list of all the different operation names and tells you what they do. I just copied this over here. OK, STORE_NAME: namei is an index into the co_names of the code object, and LOAD_NAME is using co_names again. So it looks like STORE_NAME has to look something up with an index and then go find the actual value, and maybe that's something that could possibly be slowing it down. Whereas STORE_FAST and LOAD_FAST are using something else, called co_varnames, and we aren't seeing that looking up of indices and whatever, so that might have something to do with it. This is starting to get me on the right path.

If you're really interested in digging in, if the dis documentation hasn't answered all your questions, you can go right to the beating heart of CPython and dig deeper into ceval.c, which is where the CPython interpreter processes all these different opcodes. There's a really cool talk by Allison Kaptur called "A 1,500 Line Switch Statement Powers Your Python". This is true: there is a huge switch statement that tells CPython what to do with all the different operation codes that you might encounter.

If we look at the actual code for those operations, LOAD_FAST and LOAD_NAME, we see that LOAD_FAST is a little thing, like 10 lines, and it involves a lookup into an array of fast locals, which sounds fast because it is. LOAD_NAME, on the other hand, first of all is more code: it's longer, it's more complicated, about 50 lines, and it involves a dictionary lookup, which is quite a bit slower.

So it turns out that one of the main speed differences here, which is a little bit tangential to the bytecode discussion, is this. When you have code inside of a function, then because when you define a function you know how many variables you need in that function, CPython can just assign a fixed-length array, and when you need to look up something in that function you can just index into that array and get it really quickly. Whereas when you have it in the global scope, it doesn't know: you might assign new variables all the time, so it keeps things in a dictionary, and looking things up in a dictionary is slower.

There's another thing called opcode prediction, which makes it even faster if you combine certain operations together, because CPython can predict what's coming next and can save some steps, for common operations that always go together, by predicting them in advance. The combination FOR_ITER and STORE_FAST happens to be one of these predicted combinations, so it moves a lot faster than combining FOR_ITER and STORE_NAME.

If you're curious, I strongly suggest you check out this really cool Stack Overflow conversation, "Why does Python code run faster in a function?", and Allison Kaptur's talks, which talk a bit more about how we can start exploring this giant switch statement that tells CPython how to interpret all these different operations.
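
A rough sketch for trying the puzzle yourself, using the standard `timeit` module. The loop length, the repeat count, and the names `outside_code` and `run_loop` are assumptions chosen so the example finishes quickly; actual timings depend on the machine and the CPython version, so none are shown:

```python
import timeit

# Module-level loop, compiled once so only execution is timed.
loop_source = "for i in range(100_000):\n    i\n"
outside_code = compile(loop_source, "<outside>", "exec")


# The same loop inside a function.
def run_loop():
    for i in range(100_000):
        i


# Module-level names are recorded in co_names (looked up in a dictionary);
# function locals are recorded in co_varnames (slots in a fixed-size array).
print(outside_code.co_names)
print(run_loop.__code__.co_varnames)  # ('i',)

# Run each version 20 times; a fresh dict serves as the module namespace.
outside_time = timeit.timeit(lambda: exec(outside_code, {}), number=20)
inside_time = timeit.timeit(run_loop, number=20)
print(f"module level: {outside_time:.3f}s, inside function: {inside_time:.3f}s")
```

The size of the gap between the two timings, if any, is something to measure on your own interpreter rather than assume.
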

