Python Debugger Uncovered
Formal Metadata
Part Number | 100
Number of Parts | 119
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/19963 (DOI)
Production Place | Berlin
EuroPython 2014
Transcript: English (auto-generated)
00:15
Our next speaker is Dmitry Trofimov, and his talk is Python Debugger Uncovered.
00:23
Dmitry is a developer of the PyCharm IDE. Dmitry. Hi, my name is Dmitry. Today, I will speak about Python Debugger.
00:42
First, I will introduce myself and this talk. I work for JetBrains. For the last four years I have been developing PyCharm, and the debugger is among the things that I work on. Often when I tell people that I develop a Python debugger, they say,
01:03
wow, that's kind of hard or complex. But I'd like to show you in this talk that developing a Python debugger is quite an easy task. There is no rocket science here. But in the first place, why do we need debuggers actually?
01:22
As Brian Kernighan wrote in his book, debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. And what we do normally, we always, or often, but I think rather always, write our code as cleverly as possible.
01:43
And as a result, we have bugs that spoil our code, and then we need to find those bugs and we have problems. The only thing that can actually save us is a good debugging tool. And we'll look at how to implement one.
02:02
So, there are a lot of Python debuggers, but I'd roughly divide them into two big groups. The first one is Python debuggers that are implemented in Python itself. Those are PDB, the PyCharm debugger, and the PyDev debugger.
02:20
Actually, the PyCharm debugger is a fork of the PyDev debugger. It was forked four years ago, and it has gained a lot of new features. And now we're in a very strange situation where we develop the PyCharm debugger separately, and Fabio, the maintainer of PyDev, develops his separately, and we exchange fixes and backport features.
02:43
But we're now in the process of ending this situation, and I'll talk about that at the end of my presentation. So, Python debuggers implemented in Python. The main advantage of such debuggers is that they are platform independent.
03:01
They can run on CPython, Jython, PyPy, and IronPython. That is because they're written in pure Python. But the problem with these debuggers is that they can be broken by user code. Because they run on the same Python interpreter, you can just write something like sys.modules.clear(), and the debugger will evaporate from memory and won't work anymore.
03:28
The second group of debuggers are those that are implemented in C. The main debuggers are winpdb and Wing. They work only for CPython, but they don't interfere with user code.
03:42
So they work better for cases like debugging gevent or Twisted code. But that is actually not a problem for Python debuggers written in Python, because all those cases can be solved. You just need to do some extra work for that.
04:01
So, how do we implement a Python debugger? Actually, many languages provide debugger developers with some kind of API for developing a debugger. And Python is a bit different in this case. It provides only one function for developing a debugger.
04:21
This function is called sys.settrace. And if we look into the Python documentation, we see that you can set a trace function, and it will be called every time an event occurs in the Python interpreter. An event can be a call of a function, execution of a line, a return from a function, or an exception.
04:42
Or if you use some kind of C bindings, then it can be a call of a C function, a return from a C function, or a C exception. So, when we want to implement our debugger and we see this documentation, and we realize that this is the only function that we can use,
05:02
we can get a bit scared, because it's very primitive and we feel very constrained. But, as we know, constraint breeds creativity. So, as I will show you, because of the fact that we are constrained a lot,
05:20
we can implement a lot more features than actually exist in normal languages like Java and C++ and so on. So, trace function. How can we use it? Here we implement a simple trace function. That is a function with three arguments.
05:41
The first one is frame. That is actually the top of our call stack. This is the context which has the local variables. Then there is the event that we get, and some arg. The arg depends on the event. And we just print which event we get and on which line we get it.
06:02
The line that we can access from the frame. So, we import our sys module, set trace, and then we have some simple code. We iterate within a range and print the division of some arithmetic expression. Let's look what we have.
06:22
So, after we have set our trace function on the first line, that will be line number 9. Actually, it will be line number 13, but we have call of the function that is declared on line number 9. We have a call.
06:43
Then we have execution of line number 10. Then we have execution of line number 11. And after that, we get the first output. The iteration continues. We get to line number 10, then line number 11, and we again get our output.
07:07
And on the third iteration, we have an exception. So, execution terminates. And you see that we have in this short program, we have all four Python events that are possible.
07:27
Call, line, exception, and return. So, actually, that shows us that developing debugger with the help of trace function is possible. So, let's develop a simple console debugger.
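The trace-function experiment described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the code from the slides: the names `trace`, `divide_all` and `events` are illustrative, and the arithmetic is chosen only so that the third iteration divides by zero.

```python
import sys

events = []  # (event name, line number) pairs recorded by the trace function


def trace(frame, event, arg):
    # frame: the stack frame being executed
    # event: 'call', 'line', 'return' or 'exception'
    # arg:   depends on the event (return value, exception info, ...)
    events.append((event, frame.f_lineno))
    print(event, "at line", frame.f_lineno)
    return trace  # keep receiving events for this frame


def divide_all(n):
    for i in range(n, -1, -1):
        print(12 // i)  # raises ZeroDivisionError once i reaches 0


sys.settrace(trace)
try:
    divide_all(2)
except ZeroDivisionError:
    pass
finally:
    sys.settrace(None)  # always switch tracing off again
```

Running it prints a line per event interleaved with the program's own output, and the recorded list contains all four event kinds mentioned in the talk.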
07:45
To test it, we use a simple program that is simply a Sieve of Eratosthenes. That's a function that gets N as a parameter and prints all the primes between 2 and N.
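A sample program of this kind might look like the sketch below. The function name `print_primes` and the variable names are assumptions; the original file was not shown in the transcript. Returning the list in addition to printing is an addition made here for convenience.

```python
def print_primes(n):
    """Print every prime between 2 and n (inclusive) and return them as a list."""
    multiples = set()  # numbers already crossed out as composite
    primes = []
    for i in range(2, n + 1):
        if i not in multiples:
            print(i)  # i was never crossed out, so it is prime
            primes.append(i)
            multiples.update(range(i * i, n + 1, i))
    return primes
```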
08:05
And here is our command-line debugger. It's actually only 23 lines, and it works. I'll just tell you how it is implemented, and then I'll show it at work. It has two main parts. The first part is the trace function.
08:23
As we've seen before, we get events. And here, instead of just printing all the events, we have a breakpoint. There is only one breakpoint. And if the current line equals the breakpoint line, we do a simple thing.
08:43
We just read the input from console, and we handle commands from console. There are three possible commands. The first command is frame. If we get it, we just print all variables. The second command is C.
09:01
It's for continue, inspired by GDB. And we continue to the next execution of the breakpoint. And any other string is just treated like any expression that is evaluated. And the second part of this simple debugger is the main part.
09:21
Here we get from the command-line arguments the line of the breakpoint and the file to be debugged. And then we set our trace function as the trace function and execute this file. So let's look at how it works. So you see, we execute our simple debugger. We set our breakpoint at line number six.
09:43
And we pass eratosthenes.py as our sample program. So our debugger stops somewhere. And we type frame. And we see that we have three variables. Our index, our multiples set, which is empty now.
10:03
We continue. And we get to this our first prime. We print our frame. Again, we see that our index is three. And the multiple set is all even numbers. So we can continue, continue, continue.
10:22
And if we don't want to read all this frame, we can... No, that doesn't work. Print multiples. We can print just the things that we are interested in.
10:45
So, voila, we have our simple console debugger implemented in just 23 lines of Python code. But this debugger is not quite full-fledged. It can be used, actually.
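The console debugger described above, with one breakpoint and the commands frame, c and "anything else is an expression", could be sketched like this. It is not the speaker's 23-line version; `make_trace`, `debug_file`, the prompt text and the injectable `read_command` parameter are assumptions made so that the sketch is easy to try out.

```python
import sys


def make_trace(filename, breakpoint_line, read_command=input):
    """Build a trace function with a single breakpoint at filename:breakpoint_line."""
    def trace(frame, event, arg):
        if frame.f_code.co_filename != filename:
            return None  # do not trace code from other files
        if event == "line" and frame.f_lineno == breakpoint_line:
            while True:
                command = read_command("(debug) ")
                if command == "c":        # continue until the next hit
                    break
                elif command == "frame":  # dump the variables of the frame
                    print(frame.f_locals)
                else:                     # anything else is an expression
                    try:
                        print(eval(command, frame.f_globals, frame.f_locals))
                    except Exception as exc:
                        print("error:", exc)
        return trace
    return trace


def debug_file(filename, breakpoint_line, read_command=input):
    """Run the given file under the trace function."""
    with open(filename) as source:
        code = compile(source.read(), filename, "exec")
    sys.settrace(make_trace(filename, breakpoint_line, read_command))
    try:
        exec(code, {"__name__": "__main__", "__file__": filename})
    finally:
        sys.settrace(None)


if __name__ == "__main__" and len(sys.argv) == 3:
    # hypothetical usage: python simple_debugger.py <breakpoint line> <file>
    debug_file(sys.argv[2], int(sys.argv[1]))
```

As the talk goes on to note, a design like this has only one breakpoint and evaluates whatever the user types inside the debugged process.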
11:02
It's not very convenient because we can set only one breakpoint. And normally we run our program and we need to place breakpoints somehow interactively. And actually here you can pass while true as an expression that will hang the whole process.
11:23
And it's not very convenient to read our frames in console view. We need some kind of tree for that or some kind of UI. So thinking about that, we came to the idea that we need a visual debugger. And if we need visual debugger, we need some multitasking.
11:45
And for this, we need some kind of architecture. And this architecture can look something like that. On the left you see our debugger interface. And on the right you see the Python process that is being debugged.
12:03
They communicate with each other over a socket connection, which allows us to run them on different machines and enables remote debugging, for example. Our debugger interface sends breakpoints and commands to the Python process and gets back events, threads, frames, and evaluated values.
12:25
So in the Python process, to handle this communication, we need two threads for reading and writing. We use threads here because we don't need actually performance. And GIL is not a problem for us, but we need cross-platform work.
12:45
We need this to work on Jython and all versions of Python. So we just use threads and it works well. And here we have a reader thread, a writer thread, and the user thread.
13:01
So if we talk about communication, then we need to define a protocol. The protocol for this communication will be quite simple. Every message is just a single line, terminated by a line separator.
13:22
And all the data inside this line is separated by tabs. The first field is the command ID, the second is the command type, and then we have different arguments that depend on the command type. So command types can be set breakpoint, resume, get thread, get frame, evaluate expression.
13:46
And for example, for a get thread message, we generate some ID, and the response to it will have the same ID. And this lets us know that this is the response to this very request.
14:05
So this is a very simple protocol, but it's very powerful, actually. We can get responses to our requests, or we can get none. But if we want responses, we can match each response exactly to the request it answers. So it's very simple and very powerful.
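The line-and-tab message format described above can be illustrated with a small sketch. The command type names such as `GET_THREADS` and the helper names are hypothetical; the talk does not spell out the actual wire format used by the PyCharm or PyDev debuggers, and this sketch assumes that arguments contain no tabs or newlines.

```python
def encode_message(command_id, command_type, *args):
    """One message per line; fields are separated by tabs."""
    fields = [str(command_id), command_type] + [str(a) for a in args]
    return "\t".join(fields) + "\n"


def decode_message(line):
    """Split a received line into (command id, command type, argument list)."""
    command_id, command_type, *args = line.rstrip("\n").split("\t")
    return int(command_id), command_type, args


# A request and its response share the same command ID, which is how the
# receiver knows which request a response belongs to.
request = encode_message(7, "GET_THREADS")
response = encode_message(7, "GET_THREADS", "MainThread", "ReaderThread")
```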
14:22
So on the side of the IDE, we will not go into the IDE details. We'll focus on the Python code, but we assume that the IDE creates a server socket for us and launches the script that is being debugged. On the command line, it passes the socket address
14:46
and passes the sample program as an argument to our debugger. So let's look how our code will be. It's quite simple also. The main code looks like that.
15:02
We initiate our debugger, and first of all, we make a socket connection. The socket connection is very simple. We create a socket and connect it to the server socket, which is already open on the side of the IDE. Next, we initialize our network communication.
15:20
It's very simple. We just create a writer thread and a reader thread, and start them. What can the reader thread look like? In a loop, until it is killed, it reads data from the socket and looks for a line separator.
15:43
If it finds a line separator, it assumes that it has the whole message, which can be parsed. Then we parse the message just by splitting it by tabs. We read the first element as the ID of the command, the second as the type of the command,
16:02
and we put this in our processing queue. The writer is implemented in the same way. So, what's next? The next is a bit more interesting. We run our program. To do that, first, we set our trace function,
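A reader thread along the lines just described might be sketched as follows. The class name `ReaderThread`, the `connect_to_ide` helper and the tuple layout placed on the queue are assumptions for illustration, not the actual debugger code.

```python
import queue
import socket
import threading


def connect_to_ide(host, port):
    """Connect to the server socket that the IDE is assumed to have opened."""
    return socket.create_connection((host, port))


class ReaderThread(threading.Thread):
    def __init__(self, sock, commands):
        super().__init__(daemon=True)
        self.sock = sock
        self.commands = commands  # queue.Queue consumed by the debugger
        self.killed = False

    def run(self):
        buffer = b""
        while not self.killed:
            data = self.sock.recv(1024)
            if not data:  # the other side closed the connection
                break
            buffer += data
            while b"\n" in buffer:  # a complete line is a complete message
                line, buffer = buffer.split(b"\n", 1)
                command_id, command_type, *args = line.decode("utf-8").split("\t")
                self.commands.put((int(command_id), command_type, args))
```

Keeping the buffer as bytes and decoding only complete lines avoids problems when a message arrives split across several `recv` calls.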
16:23
and then we wait for a command from the IDE to start, because at this point, we need to be sure that all the data from the IDE has arrived and that breakpoints are set. And when we get the command from the IDE to start,
16:41
we execute our file. And the most interesting part is our trace function. It's actually also very simple. We handle line events here. We take the line number and the file name from the frame, and we see if we have breakpoints for this file,
17:02
and if we do have a breakpoint for this line, then we just send a message to our IDE that we need to suspend, and we wait at this point in a loop for a resume message from the IDE.
17:20
So, execution is suspended here. We don't execute commands anymore. We just wait for the message from the IDE to resume. And if we don't have breakpoints for this file, we don't trace this context, because that optimizes our tracing a lot.
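The breakpoint-handling trace function described above could look roughly like this. The `Debugger` class, the `SUSPEND` message name and the use of `threading.Event` in place of an explicit wait loop are assumptions of this sketch.

```python
import threading


class Debugger:
    def __init__(self, send):
        self.send = send                 # callable that hands a message to the writer
        self.breakpoints = {}            # file name -> set of line numbers
        self.resume = threading.Event()  # set by the reader thread on a resume command

    def trace(self, frame, event, arg):
        lines = self.breakpoints.get(frame.f_code.co_filename)
        if not lines:
            return None  # no breakpoints in this file: do not trace this frame
        if event == "line" and frame.f_lineno in lines:
            self.resume.clear()
            self.send("SUSPEND", frame.f_code.co_filename, frame.f_lineno)
            self.resume.wait()  # the user thread blocks here until resumed
        return self.trace
```

Returning `None` for files without breakpoints is the optimization mentioned in the talk: the interpreter then stops delivering line events for that frame.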
17:48
So, I can show you what it looks like. The font is a bit small, but I think it's okay. Here we have our Eratosthenes sample program, and we debug it, and we just stop.
18:12
So, it looks strange on this screen, I think.
18:22
Okay. So, actually, that's it. We just implemented a visual debugger that communicates with interface. But we lack now very important features. The first one is conditional breakpoints. It's the ability to set... Let me close this.
18:59
It's a demo effect. I don't know. I see this the first time.
19:07
I can show... Wait a second.
19:24
Okay. So, we need to implement conditional breakpoints, exception breakpoints, step over, step into, smart step into. We need to make it work on Python 2.4 to 3.4. And we really like to implement multi-process debugging.
19:42
And I'll show you now that it's also very, very, very simple. So, how do we implement our conditional breakpoints? We just enhance our trace function so that it checks whether we have any condition. A condition is just a Python expression that is evaluated to true or false.
20:04
If we have a condition expression, we evaluate it. If it is true, we stop; if it is false, we don't stop on this breakpoint. So, voila, we have conditional breakpoints. Exception breakpoints. To implement exception breakpoints, we need to handle the exception event somehow.
20:26
And we do that very simply. We get our exception type from the arguments and we see if we have an exception breakpoint for this exception type. If we do, we suspend. If we don't, we don't suspend.
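Both checks can be sketched as small helper functions that a trace function would call. The function names are hypothetical, and two behaviours are assumptions of this sketch rather than something stated in the talk: a condition that fails to evaluate does not stop, and exception breakpoints also match subclasses.

```python
def should_stop(condition, frame):
    """Return True if a breakpoint with this condition should suspend in frame."""
    if condition is None:
        return True  # unconditional breakpoint
    try:
        return bool(eval(condition, frame.f_globals, frame.f_locals))
    except Exception:
        return False  # assumption: a broken condition never stops


def is_exception_breakpoint_hit(event, arg, exception_breakpoints):
    """For 'exception' events, arg is (exception type, value, traceback)."""
    if event != "exception":
        return False
    exc_type, exc_value, exc_traceback = arg
    # assumption: a breakpoint on a base class also catches its subclasses
    return issubclass(exc_type, tuple(exception_breakpoints))
```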
20:41
So, step into, step over, smart step into and run to line. These functions are very simple. I'd like to show you whether you, maybe you don't know what is smart step into.
21:12
I'll show you. Okay, now it's okay.
21:21
So, step spy. Okay, so step into is just going inside the function that is executed.
21:50
Step over is skipping over the execution of the function and going to the next line. And smart step into is just the possibility to step into a selected function.
22:05
And go to line is actually going to the specific line that you can select in your editor. So, how is it implemented? It's very simple. It's totally simple. Step into is just resume and stop on the next line.
22:23
Step over is just step into, but we stop on the next line in the same frame. We remember which frame it was when we received the step over message. Then execution goes somewhere inside.
22:40
And when we return to this very stack frame, we stop there. And smart step into is just step into, but we stop on the line of the selected function. And run to line is just a temporary breakpoint which we remove after we reach it.
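The four stepping commands described above reduce to a small decision function consulted on every line event. The `Stepper` class and its attribute names are assumptions made for this sketch; it only captures the logic as described, not the real implementation.

```python
class Stepper:
    def __init__(self):
        self.mode = None             # None, "step_into", "step_over" or "smart_step_into"
        self.stop_frame = None       # frame in which step over was requested
        self.target_function = None  # function name chosen for smart step into
        self.run_to = None           # (file name, line): temporary breakpoint

    def should_suspend(self, frame, event):
        if event != "line":
            return False
        if self.mode == "step_into":
            return True  # stop on the very next line, wherever it is
        if self.mode == "step_over":
            return frame is self.stop_frame  # only lines of the remembered frame
        if self.mode == "smart_step_into":
            return frame.f_code.co_name == self.target_function
        if self.run_to == (frame.f_code.co_filename, frame.f_lineno):
            self.run_to = None  # temporary breakpoint: remove it once reached
            return True
        return False
```

One case the sketch leaves out: if the remembered frame returns during a step over, a practical debugger would have to stop in the caller instead.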
23:03
So, these four features are implemented in just, I don't know, a couple of lines each. So, what about support for Python 2.4 and all versions of Python and all interpreters? How?
23:21
Actually, that's not the best part of the code. Because when you need to support all versions, your code very quickly gains lines like this. You need to handle all the differences in the standard library, but that's okay actually.
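The kind of lines meant here typically look like the sketch below. It shows a generic compatibility module collecting version differences in one place; the specific names chosen are illustrative and are not taken from the debugger's sources.

```python
import sys

IS_PY3 = sys.version_info[0] >= 3

try:
    import queue               # Python 3 name
except ImportError:
    import Queue as queue      # Python 2 name

try:
    import _thread as thread   # Python 3 name
except ImportError:
    import thread              # Python 2 name

if IS_PY3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
```

The rest of the code then imports `queue`, `thread` and `text_type` from this one module instead of repeating the checks.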
23:40
You can collect it in only one file and it will not spoil the rest. So, multiprocess debugger. That is the point that I like the most because this feature shows us how constraints that we have in Python,
24:03
only one API function to implement debugger, allows us to make something better than in different languages. For example, you cannot debug multiple processes in Java easily, but Python, due to its dynamic nature and due to the fact that we implement all by hand, allows us to do that.
24:30
If we go to the Python standard library documentation again, we see that all new processes, in the end, use functions of the os module, like execve and spawnve, and there are a dozen of them, maybe.
24:50
First of all, the fork function is executed normally, and then one of these functions. So, what can we do in Python?
25:01
We can just monkey-patch them, and we do it this way. We take the os module function and replace it with our new exec function. And in our new exec function, we call the original function with patched arguments.
25:28
And the patching of arguments is very simple. If it is a Python executable, we leave it and then add our debugger script in front of the real arguments, along with the host and port that we already have.
25:44
And what happens in practice is that our new process that is about to launch is first launched inside the debugger, the debugger connects to the IDE, and then the debugger code executes this new process. So, we have debugging of a new process just like debugging of a new thread, actually.
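The monkey-patching idea can be sketched as follows. The debugger script path, the `--host` and `--port` flags, and the helper names are all hypothetical; only `os.execv` and `os.execve` are patched here, since the spawn family takes an extra leading mode argument and would need its own wrapper.

```python
import os

DEBUGGER_SCRIPT = "/path/to/debugger.py"  # hypothetical location of the debugger
HOST, PORT = "127.0.0.1", 5678            # hypothetical IDE address


def is_python(executable):
    return os.path.basename(executable).lower().startswith("python")


def patch_args(args):
    """Put the debugger script and connection details in front of the real arguments."""
    args = list(args)
    if not args or not is_python(args[0]):
        return args  # not a Python process: leave the arguments alone
    return [args[0], DEBUGGER_SCRIPT, "--host", HOST, "--port", str(PORT)] + args[1:]


def patch_exec(name):
    """Replace os.<name> with a wrapper that calls the original with patched args."""
    original = getattr(os, name)

    def new_exec(path, args, *rest):
        return original(path, patch_args(args), *rest)

    setattr(os, name, new_exec)


def install():
    for name in ("execv", "execve"):
        patch_exec(name)
```

Nothing is patched until `install()` is called. A real implementation would also need to handle invocations such as `python -m` or `python -c`, which this sketch ignores.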
26:10
What we have learned today, we just saw that it's very simple to trace Python code,
26:20
and it's very, very simple to make a simple console debugger. And also, it's very simple to implement a real visual debugger. But what for? I encourage you actually to contribute. There are a lot of features that can be implemented in this field.
26:47
And they can be implemented by you or with your help. I'm not saying that we are giving up or that we will stop developing debuggers. We actually do a lot of work, but if you help to solve your daily problems, it will be great.
27:03
And the best places to look for the sources are, first, the link to the debugger in the PyCharm open source repository, and second, the link to the PyDev debugger on GitHub.
27:23
But there is one more thing that I'd like to mention. Now there is work in progress on a merged version of the PyCharm and PyDev debuggers. The repository is already created; it's called PyDev.debugger.
27:41
It has no code yet, because the repository was created just last week, and it has some development branches. But stay tuned: in a short time, we will get a merged version of the PyDev and PyCharm debuggers with the union of the different features that both IDEs have.
28:07
And also documentation will be there, so it will be possible to contribute to this project and to learn how it is all implemented. So that's all. If you have any questions, you can ask.
28:31
I will start with a simple question. Is there a console client for your debugger agent, or are you aware of anything like that?
28:41
Not yet, but I think that after we establish this merged debugger with PyDev, we will make one. Now for a harder one. Have you considered data watchpoints, and how hard would those be in a garbage-collected language?
29:02
We considered implementing that, but we have not evaluated the performance problems that can arise. I think it's definitely worth trying, but I cannot say anything about the real production limitations of that feature.
29:30
Is it possible for the debugger to modify the flow of the program? Can I skip the execution of single lines or suppress exceptions? Actually, it's not possible.
29:42
In Python, it is partly possible, so you can hack bytecode that you get, but it won't work in all cases. As for suppressing, I think you cannot suppress the exception that is raised and not caught.
30:14
I have a question. It's nice when you run your programs in development environment and you can run them with your debugger, but when you run programs in production, they are usually not instrumented, but they still fail sometimes,
30:25
and you want to troubleshoot and debug them. So, what's the current state of Python debugging for the un-instrumented processes? So, that's actually the first one.
30:40
Stay tuned, I hope it will arrive soon. It's not yet there, but it is the first on the list. Okay, any more questions from anyone? Okay, so that will be the end of the session then, so thank you Dmitry. Thank you.