The Hidden Power of the Python Runtime
Formal Metadata
Number of Parts | 130
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/49994 (DOI)
Transcript: English (auto-generated)
00:06
So, our next speaker is Elisabetta and she works for JetBrains and works on the PyCharm IDE, which a lot of people probably know. She's working on the Python debugger and the data science tools in that application.
00:21
And she's going to tell us about the hidden power of the Python runtime: how to retrieve useful information from the Python runtime and build tools with it. So, Elisabetta, could you share your screen? Great. Thank you very much. I hope everybody can see my slides. So, today we will learn a lot of new things about the Python runtime.
00:43
As you've already heard, I'm a software developer at JetBrains. I'm working on the PyCharm IDE, the most popular Python IDE. My Twitter handle is down there. And feel free to join the Discord room; there is a link to the slides there.
01:01
Python is a very simple and beautiful language, but a very big part of its power is hidden from users and available only during execution, at runtime. You might not even know about it, but you already use this power every day. For example, we all run tests every day.
01:20
The two most popular ways to do it are the built-in module unittest and the popular test framework pytest. When your code raises an AssertionError, unittest just shows you that an AssertionError was raised and where it was raised. But if your code raises an AssertionError under pytest,
01:43
it shows you not only the location of the error, but also the values of the variables you tried to compare. Where does pytest get this information? From the Python runtime, of course. Today we will learn how the Python runtime works, how you can get a lot of useful and interesting information from it,
02:03
and how different development tools can use this information to make developers' lives better. Let's start by learning the basic concepts. When you want to use some objects in your Python program, you usually create them explicitly.
02:21
For example, you use an assignment statement to create a variable, and keywords like def or class to create functions and classes. But when the Python interpreter executes your code, it creates not only the objects you declare explicitly. It also creates a lot of utility objects which contain information about the current execution state.
02:44
The most important of them is the stack frame object. A stack frame object represents a program scope. It contains information like the corresponding code object, the local and global variables of the current scope, and a lot of other data.
03:04
Frames are stored in a stack-like structure. The bottom-most frame is called the module frame. When the Python interpreter executes the program, for example here, execution is on line 7, and the interpreter calls the function foo.
03:22
It creates a new stack frame and puts it on top of the other frames. Then it executes the function, and when it's about to leave, it removes this newly created frame from the stack and returns some data to the previous frame, and execution in the previous frame continues.
03:45
This runtime machinery and the concept of a call stack are the same in many different languages. The major difference between them and Python is that not many languages expose this runtime information out of the box.
04:03
Python does, which means you can access the frame object right in your code and work with it like any other Python object. In the next part, we will learn what you can do with it. You can get a frame object with the function sys._getframe.
04:26
It takes an optional argument, depth: the number of calls below the top of the stack. So if you want the current frame, you pass 0, which is also the default. Once you have this frame object, you can inspect the interesting data stored inside it.
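A minimal sketch of the depth argument in action. The helper names are hypothetical; note that the leading underscore in sys._getframe marks it as a CPython implementation detail rather than a guaranteed API.

```python
import sys

def current_function_name():
    # depth 0 (the default) is the frame of the function running right now
    return sys._getframe(0).f_code.co_name

def caller_name():
    # depth 1 is one call below the top: the frame of whoever called us
    return sys._getframe(1).f_code.co_name

def outer():
    return caller_name()

print(current_function_name())  # current_function_name
print(outer())                  # outer
```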
04:47
First of all, a frame object contains a dictionary of local variables (f_locals), where the keys are the variable names, stored as strings, and the values are the variables' objects. In addition, a frame contains the global variables (f_globals).
05:03
Global means global for the current module. That's great, but to be honest, we don't need a frame to get these dictionaries, because the built-in functions locals() and globals() return the same information. The good news is that this is not the only interesting information stored inside a frame object.
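To make the two dictionaries concrete, here is a small sketch (the function name is hypothetical) that reads f_locals and f_globals from its own frame. It also deletes the frame variable before leaving the scope, the precaution the talk recommends a little later.

```python
import sys

def inspect_own_frame(a, b):
    total = a + b
    frame = sys._getframe()
    try:
        # f_locals maps local variable names (strings) to their objects
        local_view = dict(frame.f_locals)
        local_view.pop("frame", None)  # don't keep a reference to the frame itself
        # f_globals is the dictionary of the module this code runs in
        same_globals = frame.f_globals is globals()
    finally:
        # Drop the reference to our own frame to avoid a reference cycle
        del frame
    return local_view, same_globals

local_view, same_globals = inspect_own_frame(1, 2)
print(local_view["total"], same_globals)  # 3 True
```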
05:26
In addition, a frame object contains a link to the current code object (f_code), which also stores a lot of interesting data. A code object represents a chunk of executable code, but it differs from a function object
05:41
because it doesn't contain a reference to the global execution environment. The easiest way to create a code object is to call the built-in function compile, as you can see here. You can then evaluate it by calling the built-in function eval. Here we evaluated the result of this a + b piece of code.
06:06
A very, very small piece of code. Okay. What does a code object know about your code? First of all, it knows the file name where it was created (co_filename). It knows the name of the function or module where it was defined (co_name).
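The compile/eval example and the code-object attributes described here can be sketched as follows. The source string and the "<example>" file name are arbitrary choices for illustration; dis prints the bytecode, whose exact instructions vary between Python versions.

```python
import dis

# compile(source, filename, mode): build a code object without running it
code = compile("a + b", "<example>", "eval")

# eval() needs a global environment, which the code object itself doesn't carry
print(eval(code, {"a": 2, "b": 3}))  # 5

print(code.co_filename)  # <example>
print(code.co_name)      # <module>
print(code.co_names)     # ('a', 'b')

dis.dis(code)  # print the bytecode instructions, "just for fun"
```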
06:24
It also knows the names of the variables used inside this piece of code. In addition, it contains the compiled bytecode: the bytecode instructions generated by the interpreter. So if you want, you can use the standard module dis to disassemble it
06:44
and read the instructions generated by your Python interpreter, just for fun. Let's return to our frame object. In addition to the code object, it also contains, for example, the line number currently being executed (f_lineno), a link to the tracing function (f_trace),
07:02
which we'll discuss later, a link to the previous frame (f_back), and a lot of other things. Speaking of the previous frame: as you remember, frames are stored in a stack-like data structure, and this is very useful, for example, for printing a traceback.
07:22
When an exception is raised in your code, Python shows you this beautiful text; you have probably seen it many, many times. The link to the previous frame is exactly what lets the Python interpreter print this information for you.
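A minimal sketch of walking that chain of previous-frame links, roughly what a traceback printer does (the helper names are hypothetical):

```python
import sys

def call_stack_names():
    """Return function names from our caller down to the bottom-most frame."""
    names = []
    frame = sys._getframe(1)         # start at the caller of this function
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back         # follow the link to the previous frame
    return names

def inner():
    return call_stack_names()

def outer():
    return inner()

print(outer()[:2])  # ['inner', 'outer']
```

When run as a script, the last entry is the module frame, named '<module>'.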
07:42
I've mentioned the most important data stored inside frame and code objects, but there is also the standard module inspect, which has a lot of functions for inspecting frame objects and your program's scope.
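As one example of the inspect module, inspect.stack() returns FrameInfo records with readable fields such as function and lineno. This is a small sketch with hypothetical helper names; since the records hold frame references, the sketch deletes them explicitly before returning.

```python
import inspect

def describe_caller():
    """Return (function name, line number) of the code that called us."""
    stack = inspect.stack()          # FrameInfo records, innermost first
    caller = stack[1]                # index 0 is this function itself
    result = (caller.function, caller.lineno)
    del stack, caller                # the records hold frame references
    return result

def some_function():
    return describe_caller()

name, line = some_function()
print(name)  # some_function
```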
08:00
The most important thing to remember, if you decide to work with frame objects, is that you should explicitly delete the frame variable when you leave the scope. If you don't, there will be a link from the local variables dictionary
08:20
to your frame object: the frame holds this local variable, and the local variable holds the frame. That is a reference cycle, and it's bad for memory management, because, as you know, Python uses reference counting for memory management,
08:43
and this cycle will be removed only by the garbage collector, which happens much later. So it's better to delete this local variable explicitly. Okay, now we know a lot about the Python runtime
09:01
and how to get this information in Python. Let's learn how different development tools use this information and how we can use it in our everyday lives. As you remember, at the beginning of this talk I mentioned AssertionError, which is very useful with pytest,
09:26
because it shows you the real values you compared in your assert statement. Let's try to understand how pytest does it. Where does pytest get this information? Every exception object has a link to a traceback object.
09:44
It's stored in the __traceback__ attribute, and a traceback object has a link to the corresponding frame object (tb_frame). And, as we already know, a frame object knows everything about your current program state.
10:00
So what can we do? Let's define a function that takes an exception object as an argument and tries to get the names and values of the variables used in the assert statement on the line where the exception was raised.
10:26
We have the exception object, so we can get the traceback object, the frame object, and the corresponding code object. From the traceback we can get, for example, the line number where the exception was raised, and from the code object even the source code,
10:42
that is, the source code of our code object as a string, with the help of the module inspect: we call the inspect.getsource function. Now we have the source code as a string and the line number, and, again with the help of the standard module ast,
11:01
we can even find the names of the variables used on this line of code. I don't show this function here because it's quite big, but it's rather simple: you just walk the abstract syntax tree and find the variables you're interested in. You can find this code in my repository with code samples.
11:25
Once we know the variable names and have the frame object, as you remember, we can find these variables in the local variables dictionary and just print them. Now, how can we use this function?
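A compact sketch of such a function, assuming the failing statement fits on one line. This is not the speaker's repository code (which isn't reproduced here); explain_assertion and its fallback are hypothetical. It uses inspect.getsourcelines, a sibling of inspect.getsource that also returns the starting line number. When no source is available, or the line can't be parsed on its own (for example, a multi-line assert), it falls back to all local variables of the failing frame.

```python
import ast
import inspect

def explain_assertion(exc):
    """Print and return the names and values used on the line that raised exc."""
    tb = exc.__traceback__
    while tb.tb_next is not None:        # innermost entry: where it was raised
        tb = tb.tb_next
    frame, lineno = tb.tb_frame, tb.tb_lineno
    scope = {**frame.f_globals, **frame.f_locals}
    try:
        lines, first = inspect.getsourcelines(frame.f_code)
        # Module-level code reports first == 0; functions report their def line
        source_line = lines[lineno - max(first, 1)].strip()
        tree = ast.parse(source_line)
        names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    except (OSError, IndexError, SyntaxError):
        # No source available, or the line isn't a complete statement on its
        # own: fall back to all local variables of the failing frame.
        names = set(frame.f_locals)
    values = {name: scope[name] for name in sorted(names) if name in scope}
    for name, value in values.items():
        print(f"{name} = {value!r}")
    return values

def check(x, y):
    assert x == y

try:
    check(2, 3)
except AssertionError as error:
    found = explain_assertion(error)     # prints x = 2 and y = 3
```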
11:41
If an AssertionError was raised inside our code, we can pass the exception object to this function, and it will print the variable names and their values. If, for example, you want to log some errors and understand some exceptions without running your code under pytest,
12:03
you can use our new function. Of course, pytest's implementation is much more powerful, but our small prototype shows how it works. The second tool we're going to consider today is the debugger.
12:23
As I've already said, I've been working on PyCharm's debugger for several years; that's why I know so much about the Python runtime. Let's learn how debuggers work. Modern Python debuggers are based on two main functions:
12:42
the tracing function and the frame evaluation function. A tracing function is set for a frame and receives all the events that happen in your program. So while your program is being executed, events are passed to this function, and inside it you can analyze these events,
13:03
and depending on them, the debugger decides whether to suspend the program at this place, continue execution, or, for example, step into a function. As you can see, the trace function takes three arguments, and the frame object is one of them.
13:22
The frame evaluation function is executed before entering a new frame, and again, you can see it takes a frame object as an argument. A debugger based on the frame evaluation function can be implemented the following way.
13:41
You insert breakpoint code right into the code object of the function, and when execution reaches this place, it just calls the breakpoint code and the debugger stops there. So there is no need to analyze every event in a tracing function. You can just quickly stop at the place
14:00
you're interested in. If you'd like to know more about this topic, check out my PyCon US talk. It was about the new frame evaluation API, which appeared in Python 3.6, though parts of it are also true for older versions of Python.
14:20
Okay. What is interesting for us today is that both of these functions take a frame object as an argument, which means we can get a lot of information from it. For example, thanks to the frame object, the debugger knows the file name and line number where the Python interpreter is executing our code
14:44
and can decide whether to suspend the program here: does the user have a breakpoint at this place or not? In addition, the debugger can use the local variables dictionary to show variable values to the user.
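A toy sketch of the tracing-function approach, using the standard sys.settrace hook. The "breakpoint" is a line offset chosen by hand, and instead of suspending the program the sketch only records what a debugger would show: file name, line number, and local variables. make_breakpoint_tracer is a hypothetical helper, not how PyCharm's debugger is actually structured.

```python
import sys

def make_breakpoint_tracer(func, line_offset, hits):
    """Hypothetical toy debugger: snapshot locals each time a given line runs.

    line_offset counts lines from the `def` line of func (0 = the def itself).
    """
    code = func.__code__
    target_line = code.co_firstlineno + line_offset

    def global_tracer(frame, event, arg):
        # Called with event "call" whenever a new Python frame is entered
        if frame.f_code is not code:
            return None              # not our function: don't trace its lines
        return local_tracer          # trace "line" events inside func

    def local_tracer(frame, event, arg):
        if event == "line" and frame.f_lineno == target_line:
            # A real debugger would suspend here; we only record what it would show
            hits.append((frame.f_code.co_filename, frame.f_lineno,
                         dict(frame.f_locals)))
        return local_tracer

    return global_tracer

def compute(n):
    total = 0
    for i in range(n):
        total += i                   # our "breakpoint" line (offset 3)
    return total

hits = []
sys.settrace(make_breakpoint_tracer(compute, 3, hits))
try:
    result = compute(3)
finally:
    sys.settrace(None)

print(result)                        # 3
print([h[2]["i"] for h in hits])     # [0, 1, 2]
```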
15:04
It can also use the f_back attribute to show the stack frames to the user: starting from the current frame, we iterate through the links to the previous frames and show the user the stack for the current location
15:22
where the debugger is suspended. Great. Now we know how a debugger uses this runtime information. Let's move on to the next tool: code coverage. Code coverage shows you which lines
15:40
of your code base were executed. It's useful, for example, to run your tests with code coverage: it helps you understand which lines of your code are covered by tests and which are not, so you can improve the quality of your tests,
16:03
and that will make your project much more stable. The most popular code coverage library is coverage.py. Look, they have an extremely cute mascot, a sleepy Python. coverage.py also uses a tracing function,
16:21
which we've already seen in the section about debuggers. And again, as we know, it takes a frame object as one of its arguments, so it can use the file name and line number to get the location being executed,
16:43
record it somewhere, and later show it to you in a coverage report. Pretty simple, and really cool. The next tool is actually a group of tools: tools for runtime typing.
17:03
Here is a list of the most popular ones. What do they do? PyAnnotate by Dropbox: you run your code base with this tool, and it will record all the function calls inside your code base,
17:21
record the types of your functions' arguments, and later generate type annotations for every argument of every function that was called. MonkeyType by Instagram is very similar. It also records the types of your functions' arguments
17:42
and later generates stub files, which your text editor or IDE can use to help you write higher-quality code. And there is "Collect runtime information" in PyCharm.
18:01
It's very similar to the previous tools, but it's integrated with the debugger. When you run the debugger with this option enabled, it records the types of your arguments, and later suggests this information when you generate a docstring right inside PyCharm.
18:23
Let's try to understand how these tools work. PyAnnotate and MonkeyType are both based on a profiling function. You can see it's very similar to a tracing function, at least in its arguments.
18:41
The main difference between a profile function and a tracing function is that a trace function receives every event in the program, including each executed line, while a profile function receives only call and return events. So it is invoked only when we call a function (enter a new scope) or return from one.
19:01
It's logical to use it here instead of a tracing function, because we are interested only in call events: we want to record the types of function arguments, and other events don't matter.
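A minimal sketch of recording argument types with the standard sys.setprofile hook, in the spirit of PyAnnotate and MonkeyType but not their actual code. The observed store and record_argument_types are hypothetical; the argument names come from the code object, and their values from the frame's local variables.

```python
import sys
from collections import defaultdict

# Hypothetical store: (filename, function name, first line) -> {arg: {type names}}
observed = defaultdict(lambda: defaultdict(set))

def record_argument_types(frame, event, arg):
    """Profile function: receives call/return events, never per-line events."""
    if event != "call":
        return
    code = frame.f_code
    # The first co_argcount names in co_varnames are the positional parameters
    arg_names = code.co_varnames[:code.co_argcount]
    key = (code.co_filename, code.co_name, code.co_firstlineno)
    for name in arg_names:
        value = frame.f_locals[name]
        observed[key][name].add(type(value).__name__)

def repeat(text, times):
    return text * times

sys.setprofile(record_argument_types)
try:
    repeat("hi", 2)
    repeat(b"x", 3)
finally:
    sys.setprofile(None)

key = next(k for k in observed if k[1] == "repeat")
print({name: sorted(types) for name, types in observed[key].items()})
# {'text': ['bytes', 'str'], 'times': ['int']}
```

A real tool would then turn these records into stub files, annotations, or docstrings.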
19:21
PyCharm's "Collect runtime information" is integrated with the debugger, and as we already know, the debugger has access to the frame object, so it can get this information from the frame as well. Okay. In each of these tools, we have access to the frame object.
19:40
How can we get information about the types of our arguments? That's quite simple, to be honest. From the code object, we can get the argument names of the function that was called. And after that, again,
20:01
we can find these variables by name in the local variables dictionary, which gives us access to the objects and therefore their types. We also know the file name and line number where the call happened.
20:21
We know the variable names and their types, so we can record them and later show them in a stub file, as type annotations, or inside a docstring. This is also really cool: it helps you generate type annotations automatically
20:44
just by running your code. This is very useful. Okay. We've learned some interesting facts about different popular tools. But let's try to create something new, something that didn't exist before.
21:05
There are two ways to execute tasks concurrently inside one Python process: threads and asynchronous tasks. You can start a new thread with the help of the standard module threading.
21:21
You can do it like this. For synchronization between threads, there are synchronization objects, and the most fundamental among them is the Lock object. A thread can acquire a lock, which means that the following block of code will be executed by this thread and only this thread
21:42
until it releases the lock. Locks are also context managers, so you can use them with the with keyword. Running more than one thread and using locks can sometimes lead to a deadlock.
22:02
A deadlock is a situation where you're waiting for resources that can never be released. The easiest way to reproduce it is the following. Take two threads and two locks. The first thread acquires the first lock, and the second thread acquires the second lock.
22:24
After that, the first thread wants to acquire the second lock, but it's unavailable, so it starts to wait. And the second thread wants to acquire the first lock, but it's also unavailable, so it starts to wait too.
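The two-thread, two-lock scenario can be reproduced as follows. One deliberate change keeps the demo from hanging: the second acquire uses a timeout, so each thread reports whether it got the lock instead of waiting forever. The barrier ensures both threads hold their first lock before trying the second, and keep holding it until both attempts are over, so both attempts fail deterministically.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
checkpoint = threading.Barrier(2)    # lines the two threads up
results = {}

def worker(name, first, second):
    with first:                      # locks are context managers
        checkpoint.wait()            # now each thread holds one lock
        # A plain second.acquire() would block forever here: a deadlock.
        # The timeout exists only so that this demo terminates.
        acquired = second.acquire(timeout=0.5)
        results[name] = acquired
        checkpoint.wait()            # keep holding `first` until both have tried
    if acquired:
        second.release()

threads = [threading.Thread(target=worker, args=("t1", lock_a, lock_b)),
           threading.Thread(target=worker, args=("t2", lock_b, lock_a))]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))  # [('t1', False), ('t2', False)]
```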
22:41
The problem is that they will wait forever, because this situation can't be resolved without interrupting the program. This is really sad, because of course we don't want this situation in our program. And the second problem is that it's really hard
23:02
to detect deadlocks in big projects, but we will try to help people who are fighting with deadlocks. As you remember, we used the sys._getframe function, which returns a frame object for the current thread, but there is also sys._current_frames,
23:24
which returns the topmost stack frame of each thread. What does it mean? It means we can create our own tool, a fault handler. This tool will live in a separate thread and print the tracebacks of all the threads
23:41
in the process at some interval. If the tracebacks of some threads don't change for some time, we can see that they're stuck somewhere, look at their tracebacks, and understand,
24:03
okay, they're acquiring locks in a different order, and quickly fix it. It's a pretty simple idea, and we could implement it, but there is a problem: this tool is already implemented in the standard library.
24:20
There is the module faulthandler with the function dump_traceback, which dumps the tracebacks of all threads into a file. It's implemented natively in C, and since it already exists, everybody can use it to find the location of deadlocks in their code base.
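A sketch comparing the hand-made approach with the standard one. dump_all_threads is a hypothetical one-shot version of the tool described above (a real watchdog would call it periodically from its own thread); the "waiter" thread simply blocks on an event to stand in for a stuck thread. faulthandler writes to a real file descriptor, so the sketch dumps into a temporary file and reads it back.

```python
import faulthandler
import sys
import tempfile
import threading
import traceback

def dump_all_threads():
    """Format the current stack of every thread (our hand-made fault handler)."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    # sys._current_frames() maps each thread id to that thread's topmost frame
    for thread_id, frame in sys._current_frames().items():
        header = f"Thread {names.get(thread_id, thread_id)}:\n"
        chunks.append(header + "".join(traceback.format_stack(frame)))
    del frame  # one of the frames is our own; drop it to avoid a reference cycle
    return "\n".join(chunks)

stop = threading.Event()
waiter = threading.Thread(target=stop.wait, name="waiter")  # a "stuck" thread
waiter.start()

report = dump_all_threads()
print(report)

# The standard-library tool does the same natively; it needs a real file descriptor
with tempfile.TemporaryFile("w+") as f:
    faulthandler.dump_traceback(file=f, all_threads=True)
    f.seek(0)
    native_report = f.read()
print(native_report)

stop.set()
waiter.join()
```

For the periodic variant, faulthandler.dump_traceback_later(timeout, repeat=True) dumps the tracebacks every timeout seconds.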
24:41
Okay, but as you remember, there is a second way to execute tasks concurrently in Python: asynchronous tasks. From the user's point of view,
25:00
asynchronous tasks are very similar to real threads. For example, the asyncio module has very similar synchronization objects, including asynchronous locks. That means there is a place where we can apply our knowledge of the Python runtime and create a tool that will help us
25:23
to detect asynchronous deadlocks. It will work in a very similar way. There is a function, asyncio.all_tasks, which returns all the unfinished tasks in the current loop, and each task has a method, get_stack, which returns the list of stack frames for that task.
25:45
So this is how we can implement our asynchronous fault handler. We start it in a separate thread, and in an infinite loop, at some interval, we dump the stack traces of all the tasks in this loop.
26:03
We can do it this way: we iterate over the tasks and print their tracebacks. And again, if at some moment we notice that some tasks are stuck somewhere and their tracebacks don't change,
26:20
we can look at these stack traces and try to understand why it happens. Are they waiting for some locks that will never be released? That's really great: we implemented our own asynchronous fault handler, which will help us detect asynchronous deadlocks.
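A self-contained sketch of the idea, built on asyncio.all_tasks and Task.get_stack. It departs from the talk in two labeled ways. First, the sampler runs as a coroutine on the same event loop instead of in a separate thread, because asyncio's task APIs are not designed to be called from other threads. Second, it takes two samples instead of looping forever. dump_tasks and the deadlocking worker are hypothetical; a task whose stack is identical in both samples is reported as possibly stuck.

```python
import asyncio
import traceback

async def worker(first, second, ready):
    async with first:                  # asyncio locks are async context managers
        ready.append(True)
        while len(ready) < 2:          # wait until both workers hold one lock
            await asyncio.sleep(0)
        async with second:             # never succeeds: an asynchronous deadlock
            pass

def dump_tasks(loop):
    """Return {task name: formatted stack} for every unfinished task in loop."""
    dumps = {}
    for task in asyncio.all_tasks(loop):
        frames = task.get_stack()      # frames of the task's coroutine
        summary = traceback.StackSummary.extract((f, f.f_lineno) for f in frames)
        dumps[task.get_name()] = "".join(summary.format())
    return dumps

async def main():
    lock_a, lock_b = asyncio.Lock(), asyncio.Lock()
    ready = []
    tasks = [asyncio.create_task(worker(lock_a, lock_b, ready), name="t1"),
             asyncio.create_task(worker(lock_b, lock_a, ready), name="t2")]
    snapshots = []
    for _ in range(2):                 # two samples, some interval apart
        await asyncio.sleep(0.1)
        snapshots.append(dump_tasks(asyncio.get_running_loop()))
    # A task whose stack did not change between samples looks stuck
    stuck = [name for name in ("t1", "t2")
             if snapshots[0][name] == snapshots[1][name]]
    for task in tasks:                 # break the deadlock so the demo can finish
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return stuck, snapshots[-1]

stuck, last_dump = asyncio.run(main())
print(stuck)               # ['t1', 't2']
print(last_dump["t1"])
```

An unchanged stack is only a hint: a task legitimately waiting on slow I/O looks the same, so the printed stacks still need a human look.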
26:44
Okay, today we've learned a lot. We've learned that the Python runtime is very powerful: Python lets you easily get a stack frame object and the corresponding code object and inspect them. We've also learned that there are a lot of development tools
27:02
which use this information, help you write your code, and make your life much easier. I hope that after today's talk you will start using runtime development tools, or start using them more often
27:21
if you already use them. And maybe, who knows, you will create something new: your own tool that uses Python runtime information. Here are some links: the repository with the code samples I showed you today, and a blog post based on this talk. Also, feel free to contact me by email or on Twitter.
27:43
And of course, come to the Discord channel; I'll be there, ready to answer your questions. Thank you very much for your attention. Thank you very much, Elisabetta, for the nice talk. Excellent. Let me play the applause.
28:06
So, we don't have any questions in the Q&A, and I don't see any in the chat, but I have a question. Since you work on these debugging tools, you must know the differences between Python versions. Has much changed in recent Python versions
28:22
in terms of frame access, or in the debugging tools in general? In debugging tools in general, I think... I know for sure that the frame evaluation function signature was changed in Python 3.9.
28:42
It got a new argument, but that didn't affect the debugger; it still works. Yeah, but I think nothing else changed. So debuggers still work with different versions of Python.
29:00
And, of course, the Python debugger works with different versions of Python, so if you don't use it yet, give it a try. Debuggers are really cool tools that help you find bugs in your program and also help you understand your program's execution. Okay, thank you very much.
29:21
Excellent. Thanks again for the talk and, yes, enjoy the remaining part of the conference. Thank you.