
Exploring our Python Interpreter

Formal Metadata

Title
Exploring our Python Interpreter
Title of Series
Part Number
32
Number of Parts
169
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Stephane Wirtel - Exploring our Python Interpreter
During the last CPython sprints at PyCon US (Montreal), I started to contribute to the CPython project and I wanted to understand the beast. In this case, there is only one solution: trace the code from the beginning. From the command line to the interpreter, we will take part in an adventure. The idea is just to show how CPython works for a new contributor.
-----
During my last CPython sprint, I started to contribute to the CPython code and I wanted to understand the beast. In this case, there is only one solution: trace the code from the beginning. From the command line to the interpreter, we will take part in an adventure.
* Overview of the structure of the project and the directories.
* From the Py_Main function to the interpreter.
* The techniques used for the Lexer, the Parser, the generation of the AST and, of course, of the Bytecodes.
* We will see some bytecodes with the dis module.
* How the VM works: it's a stack machine.
* The interpreter and its main loop of the Virtual Machine.
The idea is just to show how CPython works for a new contributor to CPython. From the command line, we will learn that Python is a library and that we can embed it in a C project. In fact, we will go from the Py_Main function to the ceval.c file of the interpreter. But there is no magic in the CPython code: we will travel through the lexer and the parser of CPython, and, why not, through the AST for one Python expression. After the AST, we will visit the Compiler and the Bytecodes for the interpreter. Of course, we will learn that there is the peepholer, a component that optimises some basic instructions. And of course, the interpreter: this virtual machine is really interesting for the newbie, because it's a big stack where the bytecodes are executed one by one, in the ceval.c file.
Transcript: English (auto-generated)
Will you join me in welcoming our speaker, Stéphane Wirtel, and his talk about the Python interpreter. Thank you. I think I'm at the wrong conference because, in fact, this talk is just about exploring the Ruby interpreter. No, sorry.
Just kidding. Okay, my name is Stéphane Wirtel, or in French, Stéphane Wirtel. I come from Belgium, where we have some beers, and FOSDEM, and Python at FOSDEM. Okay, of course, I'm a Python lover, since the first of the decade. No, okay, it's not really important.
I'm not a CPython core dev, just that. This is just an introduction. And yes, I'm just a small contributor to CPython and Gunicorn, if you know the Gunicorn project, and, of course, CPython. I'm a nominated member of the PSF, the Python Software Foundation.
Of course, you can become a member of the PSF too, and for two or three years I have been a member of the EuroPython Society, the organizer of the EuroPython event. Welcome. So, just a reminder: this is just an introduction, and I'm not a core dev, okay?
Just a contributor. If you want to contribute, just create a patch and send it. So, about the schedule. The schedule will be really simple. We have how to start with CPython: not how to write a print statement or anything like that,
just how we can read the source code and try to understand it. Then we will have a small question: what is the result in Python of this expression, two plus two? And a small summary, okay? So, how can we start?
That's a good question, sorry. First, we have the developer's guide. In fact, when you want to start to develop CPython, and that was my case at PyCon in Montreal, sometimes you can find some sprints.
There was a sprint on CPython, and I came with my bag and my laptop, and I just asked some developers: hello, I would like to help you with CPython. The first thing they said: can you read the developer's guide, the devguide for short? Yeah, of course I can. This document, it's just a small document, sorry.
Here, please. No, it's not the direct. It's just the developer's guide. If you want to start to hack on CPython,
you can read this documentation. You will read how to make a clone of the repository, how to become a core dev, and how to, for example, add a new keyword to the syntax of Python. For that there is a small checklist, in fact a checklist with 20 points to verify, okay?
Just that. Yeah, how to become a, sorry. The devguide is really interesting because we have some explanation about the tracker, the buildbots, the Python developer FAQ, the PEPs, and you can read everything.
So, that's really interesting because you have the getting started section, how to compile Python, how you can help, for example, with the documentation, with the source code, or with everything, how to write a test and how to run the tests on several platforms. Okay, so, back to my presentation.
So, the devguide will explain the quick start, the grammar, how to change the syntax, and the design of the CPython compiler. Yes, we have the source code in Python. There will be the lexer, the parser, the peepholer, and the bytecode at the end with the virtual machine.
So, you can find the documentation at this location. When you start, okay, I'm sorry, I have a question, not for you, but for me: when you go to a sprint and you have an issue, how can you fix it? How?
I don't have time, but you can send a message to this mailing list, Python Mentors. Python Mentors is just a big mailing list where you can send a request and you will get a response from Guido van Rossum, Brett Cannon, R. David Murray, Victor Stinner, myself, and, of course, other developers.
That's really interesting because you can discuss the solution for your issues, okay? In my case, I wanted to modify the interpreter, just the lexer, because I found an issue, an error. And when I sent my message, my mail,
I received a response from Guido where he told me: sorry, but, in fact, in Python, there is not one parser, but two parsers for the syntax, okay? So, of course, you want to start, but where to start? You have the mailing lists.
We have the mailing lists: we have the announce list, the bugs announce list. In fact, when you create a new bug in the bug tracker of Python, there is a mailing list for that. You can follow it just to receive notifications. When you want to discuss a bug, you have one mailing list. This mailing list is just mapped
to the bug tracker, to Roundup. If you need some help, you have the mentorship mailing list. If you want to discuss one big point in the core of Python, you have the python-dev mailing list. If you have an awesome idea,
for example, the FAT Python project, which tries to improve the performance of Python, you can submit something to the python-ideas mailing list, and you will see whether you get good feedback or not. In this case, it did not. And yes, today we discussed performance
with asyncio, with uvloop, and the rest. If you are interested in the speed of Python, you can discuss it on this mailing list. That's very useful. It's a mailing list where we discuss the internal parts of Python, okay? Not how to use it best,
or what the best practices for performance are. Okay. So, how to contribute. Firstly, that's really simple. You go to the bugs.python.org website, and you create a user account. With this user account, you have to sign
a contributor agreement with the PSF, because the owner of the source code is the PSF. I'm looking for something. Yeah, sorry, excuse me.
So, step two: how can I improve CPython? Firstly, you have the documentation. We have good documentation, but we have some missing tutorials. For example, asyncio. We have the documentation about asyncio,
but we don't have a tutorial about it: how to start with asyncio, how to use it, and the rest. We have a reference in the documentation. If you want to contribute, that's the right place. Yes, of course, you can create some issues, file them, if you find a bug, of course.
Or if you want a new feature. I'm going to show a feature, a small feature. We need some reviewers, yeah. If you look at the source code of Python, we only have 10 commits per day.
It's not really big. If you want to contribute, just review the patches, and we will be happy, okay? You will receive a nice message, a thank you, because that's really interesting and important for us. Sometimes, you can create a patch and propose it. Just create an issue, propose a patch, and the rest.
Ah yes, and sometimes, that's really interesting: you have created your patch, and you can wait six months for a review, because we only have two or three reviewers. If you want to contribute, it's a good place and a good time.
Yes, of course, the process is really slow. I have some open issues, and they have been open for two years. Sometimes, that's really difficult for me: why are my patches not merged into the source code?
We don't have time, sorry. Yeah, the last point: we are trying to migrate Python to GitHub. Bye bye, Mercurial. Yes, yes, yes, you can create an account. You can use your account on GitHub and just create a pull request. I prefer that.
Usually, when you create a patch, firstly, you download the branch. You create your patch, and you create a diff. This diff file, you upload it to the bug tracker. If there is a new version, your patch is just outdated.
So, okay, and now, what can we do? Firstly, when you start, maybe we can look at the directories of Python and try to understand them. In fact, how can we find information? Firstly, with the documentation, the Doc directory, just the manual of Python,
where you will find the syntax, the reference of the language, and the reference of the library. You can buy some books. David Beazley or Doug Hellmann have some good books. That's a good reference. The Grammar directory is just the grammar,
where the grammar is defined. It's just a text file, just that. If you want to modify it, you have the Grammar directory and the Parser directory, because if you add a new keyword, you have to lex it and improve the parser
and the AST and the bytecode, of course. You have the Lib directory just for the Python library, the Python modules. For example, you have telnetlib. If you want to modify it, it's just in this directory. The Modules directory is the C part of Python.
For the objects, for example, if you want to learn the implementation of the dictionary of Python, you can go into the Objects directory. We have the Programs directory. It's just the Python executable,
because Python is a small executable. And Python is a library. You can load the library if you want to embed Python in your software. Okay, about the documentation, we have the reference for the language, the reference for the library, the reference for the C API.
If you want to learn, you can read the documentation. And sincerely, we want a small tutorial. Who is an expert in asyncio? Okay, we have a new fix for you. So, just, yes?
Another guy, yeah? No? Oh, shit. No, it's really boring, because you have Victor Stinner and Andrew, they are discussing at the table near the lunch, and they are trying to improve the documentation about asyncio. They want to create a tutorial.
So, I have one question. Just one. Really, just one? What's the result of this expression? Okay, four. It's not very difficult. But for me, I don't want to know this value. I prefer to see the command line,
the lexer, the parser, the interpreter, the compiler, everything about that. And when you start to modify Python, okay, you have the Python part, but I'm just interested in the C part. When you execute the command line, you have that.
Firstly, we have the command line, of course. The command line is just executed by the python.c file. The python.c file will load the Python library. You can try on Windows, OS X, or Linux, and you will get the same result. Okay? If you want to embed Python in your software,
because you have developed software in C++, just use the Python library. Okay, the Python, I don't know. When you execute the source code, automatically we initialize CPython, and we try to load some modules and read the source code,
convert it to an AST and execute it. So, the lexer. The lexer is defined, if you are interested, and of course it is the topic of this talk, in tokenizer.c. The tokenizer will take a string, a Python string,
and will convert it into some keywords, some expressions. You have the first tool, tokenize, that's good. For example, we take x = 2 + 2, and we have this result.
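The tokens for this statement can also be listed from Python itself. This is a minimal sketch, assuming the standard-library tokenize module as a stand-in for the C tokenizer in tokenizer.c. The pure-Python module also reports NEWLINE and ENDMARKER tokens, so its count may differ from the one shown on the slide.

```python
import io
import tokenize

SOURCE = "x = 2 + 2\n"

# generate_tokens() expects a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(SOURCE).readline))

# Each token carries a type (shown here by name) and the matched text.
pairs = [(tokenize.tok_name[tok.type], tok.string) for tok in tokens]

for type_name, text in pairs:
    print(f"{type_name:<10} {text!r}")
```

The same listing is available from the shell with python -m tokenize, which reads a file or standard input.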
Where is my mouse? We have six tokens. Each token has one type and one value. Okay? You can learn with that, if you want to disassemble Python. Yeah, you know that with Python 3.5,
we have a new keyword, two new keywords, async and await. In fact, they are not real keywords in Python. Whether it is a keyword or not depends on the context in your code. The parser is really smart about that. If we make an example,
async = True, in this case it's just a name, not a keyword. If we try, I don't have my code, no, sorry. Yes, if we check, it's not in the keyword list of Python.
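Whether a word is reserved can be checked with the standard-library keyword module. A small sketch: on the Python 3.5 discussed here, async is reported as an ordinary name, but async and await became fully reserved keywords in Python 3.7, so the answer depends on the interpreter that runs the snippet.

```python
import keyword
import sys

# True on Python 3.7 and later, False on 3.5 and 3.6 where "async"
# was still accepted as an ordinary identifier.
async_is_reserved = keyword.iskeyword("async")

print(sys.version_info[:2], "async reserved:", async_is_reserved)
print("'if' reserved:", keyword.iskeyword("if"))
```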
It's just, yeah, a name. So, now about the parser. You have your tokens. You can convert them into an AST. The AST is just that. Okay? For this expression, x = 2 + 2, we have a Module, and we have a body. In the body, we have an assignment, and the target is just a Name, where the id is equal to x.
And we have the Add, where we add the two numbers. For the compiler, you have the AST, and you would like to convert it to bytecode. Yeah. Just to execute this source code.
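The tree described here can be inspected with the standard-library ast module. This is a sketch, not the code from the slides. The variable names are chosen for illustration, and the exact text printed by ast.dump varies between Python versions.

```python
import ast

tree = ast.parse("x = 2 + 2")

# Module -> body -> Assign(targets=[Name(id='x')], value=BinOp(..., Add, ...))
print(ast.dump(tree))

assign = tree.body[0]        # the assignment statement
target = assign.targets[0]   # the Name node on the left-hand side
operation = assign.value     # the BinOp node on the right-hand side

# compile() accepts the tree directly and returns a code object.
code = compile(tree, "<example>", "exec")
namespace = {}
exec(code, namespace)
print(namespace["x"])
```

Passing the tree to compile() mirrors the step from the AST to bytecode that the compiler performs internally.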
You compile: you have your tree, the AST, and you can convert it to bytecode. With this module, we can see the bytecode. If you want more information, you can read this documentation, this PEP. Yeah, I know. Okay, for the bytecode, in the C part,
we have a definition, yeah? The bytecode is just a compact numeric code. One byte. Not a word, just one byte. The bytecode is portable, and the bytecode is followed by one parameter
or by several parameters. Yes, it's used by the virtual machine, in this case, the software interpreter. For the bytecode, when we have this empty file and try to convert it, we receive some bytecode.
That bytecode does almost nothing: LOAD_CONST 0, and RETURN_VALUE. Just an empty module. When you create a new, empty file, the interpreter will execute it. If you try with this function and convert it, you will get the resulting bytecode, okay?
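Both cases, the empty file and a function, can be reproduced with the dis module mentioned in the abstract. A minimal sketch: greet is a hypothetical function used only for illustration, and the opcode names printed differ between CPython versions, so no expected listing is given here.

```python
import dis


def opnames(obj):
    """Return the opcode names of a function or code object."""
    return [ins.opname for ins in dis.get_instructions(obj)]


# An empty source file still compiles to a code object that returns None.
empty = compile("", "<empty>", "exec")
dis.dis(empty)


def greet(name):  # hypothetical function, used only for illustration
    return "hello " + name


dis.dis(greet)
```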
You have the bytecode. After that, we try to optimize the bytecode with the peepholer. For example, we have x = 2 + 2. The system will convert it to four, okay?
You don't want to add two plus two at run time. Another example: with if 1: print hello, we have this bytecode. With if 0, there's nothing. We remove the dead code, okay?
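Both optimizations can be observed from Python by looking at the compiled instructions. This is a sketch under two assumptions: the helper name instructions is made up for illustration, and the compiler pass that performs each optimization has moved between CPython versions, so the snippet only checks the visible effect and not which pass produced it.

```python
import dis


def instructions(source):
    """Compile a snippet as a module and return its instructions."""
    return list(dis.get_instructions(compile(source, "<example>", "exec")))


folded = instructions("x = 2 + 2")
unfolded = instructions("x = a + b")
dead = instructions("if 0:\n    print('hello')\n")

# The addition opcode is BINARY_ADD or BINARY_OP depending on the version.
ADD_OPS = {"BINARY_ADD", "BINARY_OP"}

print("addition left in 'x = 2 + 2':", any(i.opname in ADD_OPS for i in folded))
print("addition left in 'x = a + b':", any(i.opname in ADD_OPS for i in unfolded))
print("'hello' left in 'if 0' block:", any(i.argval == "hello" for i in dead))
```

Only the version with variables keeps an addition instruction, because its operands are not known at compile time.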
All of that is done via the peepholer. So, now we have the peepholer, we have the bytecode; how do we interpret it? The interpreter is just a virtual stack machine. This virtual machine will execute the bytecode. It's just a stack. We push an element, we pop it. We execute something, we pop it, okay?
So, a small example, where we try to create a small interpreter. Maybe there is a bug, I didn't test it. But an interpreter is just this: we have a stack, a pointer to the current instruction, and we run, we read each instruction.
In this case, I just create a small bytecode example. Firstly, I push five, then push three, and push ten. And the rest is just add, add, and pop. When I read the bytecode via the interpreter,
I push five on the stack, push three, push 10. With add, I pop the last two elements, and I get 13. After that, I add again: I take the two values on the stack
and get 18. It's just add. Then I do the pop, and I empty the stack. So, do you remember this instruction? Just add. We have the bytecode, okay? And, yeah, we have the bytecode.
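The small interpreter described here can be sketched in a few lines. This is not the code from the slides: the opcode names PUSH, ADD and POP and the function run are assumptions chosen to match the walk-through (push 5, push 3, push 10, add, add, pop).

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a toy stack machine."""
    stack = []
    popped = []
    pc = 0  # index of the current instruction
    while pc < len(program):
        opcode, argument = program[pc]
        if opcode == "PUSH":
            stack.append(argument)
        elif opcode == "ADD":
            # Pop the two topmost values and push their sum.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "POP":
            popped.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        pc += 1
    return popped, stack


PROGRAM = [
    ("PUSH", 5),
    ("PUSH", 3),
    ("PUSH", 10),
    ("ADD", None),  # 3 + 10 gives 13
    ("ADD", None),  # 5 + 13 gives 18
    ("POP", None),  # removes 18 and leaves the stack empty
]

popped, stack = run(PROGRAM)
print(popped, stack)
```

The loop has the same shape as the real one: fetch the instruction at the current position, dispatch on the opcode, update the stack, and move on.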
Just add, huh? And now, we have the C part, the code executed by the virtual machine when you execute the bytecode. For example, we have LOAD_FAST. Here, in the example. LOAD_FAST will execute this code. Just, okay, I'm going to push something
on the stack of Python, in the stack frame, in the frame. For LOAD_CONST, I get the value from the constants, and for the other loads from the globals or from the locals, and I just push it on the stack. When I use BINARY_ADD,
the opcode, the system will check if it's a string. If it's not a string, okay, maybe it's a number. If it's a string, we do a concatenation, and if it's a number, we just add the two values.
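The opcodes named here can be seen in the bytecode of an ordinary function. A sketch with a hypothetical function add: newer CPython versions rename or fuse some of these instructions (for example BINARY_OP instead of BINARY_ADD), so the snippet prints the names instead of assuming a fixed listing.

```python
import dis


def add(a, b):  # hypothetical function, used only for illustration
    c = a + b
    return c


# Local variables are read and written with the LOAD_FAST / STORE_FAST family.
names = [ins.opname for ins in dis.get_instructions(add)]
print(names)

# The same addition opcode serves numbers and strings.
print(add(2, 2))
print(add("py", "thon"))
```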
And after that, we just push the result on the stack. For STORE_FAST, we have the source code of Python. Okay? So, for the rest, and just for fun, if you read the source code of Python,
no, I think it's sleeping. Yeah, I'm sleeping. In the source code of Python, we have the evaluation, what time? Four minutes, okay. We have this function, PyEval_EvalFrameEx. The system will read the code and evaluate it.
The main function is just this function, PyEval_EvalFrameEx, a function with 2,000 lines of code. Just one function, okay? And there is a hack in the function, because sometimes, some C compilers do not support a feature,
and we create the default, where in the default, there is another switch for the next bytecode. Okay? So, I have a summary. The summary is just: we need to improve the documentation, review some patches, and try to fix the issues
if you have any problem with Python, and just that. That's really fun. No, so silly, yeah. I like, because when you put, it's on my key, because I'm not a core developer, but I try to add, I am a contributor to CPython.
Yes, good. Good for you, Ged. No, a small example: I wanted to show you an issue, not an issue, a small functionality for me. Come on, where are you?
No, bye-bye. Okay, come on. I just modified CPython, with a small patch. Sometimes, I want to learn the bytecode, okay? And I would like to create a small debugger where you see on the left the source code,
the Python source code, and on the right the bytecode. Okay, I would like to create that. Here's my example. Bye, yeah, come on. Okay, this is the latest version of Python, the version of today. If you print hello, there is just hello.
There is a feature in Python that you don't know, okay? If you want to see the bytecode of CPython, this feature has been in the source code of Python
for two or three years. It's not new, okay? If you define LLTRACE, you will get the bytecode and the value of the argument, okay? So, what's the next point? Yeah, I know, two minutes.
Where's my, sorry, my mouse. Yeah, that's all. Thank you very much, Stéphane. Can I just say, it's absolutely wonderful to know that someone can go from not being a contributor
to going to a conference and becoming one. So, three years ago, Stéphane could not have given that talk, and now he really is an actual contributor to Python. Anyone can do it. Not that I'm saying Stéphane isn't a wonderful person. But it makes it real, you know. So, is there a quick question before we move on to the next talk? Go on, one question, you can have the honor
of being the person that asked that question. It will make you special. Let's do it. Yes? Is the documentation available in different languages, and is help needed in those translations? I'm sorry? The documentation, is it available in different languages,
like in French, in Italian, Spanish, or is it only in English? No, the documentation is just in English, of course, because those are France. But I know that in France, there is a group, the AFPy, and they try to translate it into French. No, if you want, you can download the documentation,
of course, it's just a clone of the repository, and try to translate it. And for the last two or three years, you have a feature in Sphinx where you can translate, where you can create a French or Italian part of the documentation.
Okay? All right, join me in thanking Stéphane once again. Thank you very much. Thank you very much.