
Visual debugger for Jupyter Notebooks: Myth or Reality?


Formal Metadata

Title
Visual debugger for Jupyter Notebooks: Myth or Reality?
Subtitle
Understand how Python debuggers work and how to build Visual Debugger for Jupyter Notebooks
Author
Elizaveta Shashkova
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract
Many Python developers like Jupyter Notebooks for their flexibility: they are very useful for interactive prototyping, scientific experiments, visualizations and many other tasks. There are different development tools which make working with Jupyter Notebooks easier and smoother, but all of them lack a very important feature: a visual debugger. Since the Jupyter kernel is an ordinary Python process, it seems reasonable to use one of the existing Python debuggers with it. But is it really possible? In this talk we'll try to understand how a Python debugger should be changed to work with Jupyter cells, and how these changes are already implemented in the PyCharm IDE. After that we'll look into the whole Jupyter architecture and try to understand which bottlenecks currently prevent the creation of a universal Jupyter debugger. This talk requires basic knowledge of Jupyter Notebooks and an understanding of Python functions and objects. It will be interesting for people who want to learn the internals of the tools they use every day. It might also be an inspiration for people who want to implement a visual debugger in their favourite IDE.
Transcript: English (auto-generated)
Hello, everyone. My name is Elizaveta Shashkova, and today I want to tell you about a visual debugger for Jupyter Notebooks. First, let me introduce myself. I'm a software developer at JetBrains, I'm working on the PyCharm IDE, and currently I'm focused on the debugger and data science tools. We always write code with bugs. But a productive developer is not a developer who writes code without bugs, but a developer who can quickly find and fix them. And a visual debugger is a tool which can help you do that really efficiently. Visual debuggers for Python files exist in almost every IDE nowadays, but they usually can't work with Jupyter Notebooks, because a Jupyter Notebook doesn't contain only Python source code: it's a sequence of cells with different types of content, including Python source code. And exactly like code in Python files, code in Jupyter Notebooks may contain bugs.
The most popular ways to find bugs in Jupyter Notebooks nowadays are either print statements or the command-line debugger, ipdb. To be honest, both of these ways are not very convenient. Print statements require modifying the code inside your cell and rerunning the cell to get additional information. And the ipdb debugger, which is based on the built-in pdb debugger, produces a lot of output during a debug session and requires you to remember all its commands to evaluate some variable or to put a breakpoint. Also, there are some visual wrappers for ipdb, like PixieDebugger, for example. But they all have the same limitations as ipdb: you can't add a breakpoint during program execution, for example. You need to wait for the program to suspend and ask you for the next command.
So you can see that the whole Jupyter ecosystem lacks a very important tool: a visual debugger. And the good news is that recently a visual debugger for Jupyter Notebooks was implemented in PyCharm Professional, by me. Today I'll try to explain how it was done. So the answer to the question from the title is, of course, reality, because otherwise my talk wouldn't exist. As I've already said, usual Python files and Jupyter Notebooks have at least one thing in common: both of them contain Python source code.
Debuggers for Python already exist, so let's learn how they work and which parts we can reuse to build our Jupyter debugger. Most Python debuggers are based on the built-in tracing function, which allows you to monitor program execution. You can define your custom tracing function, pass it to the settrace function in the sys module, and it will report all the events happening in your program. As you can see, a tracing function takes three arguments: frame, event, and arg. Frame is an object which contains information about the current place in the program, event is the event which happened at this place, and arg is an argument of this event. We defined a simple tracing function which prints the line number and the event which happened on that line. Let's check how it works on a simple example: a function greet_neighbors which sends greetings to our neighbors.
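The exact code from the slides isn't in the transcript, but a minimal sketch of the idea could look like this (the function bodies and names are illustrative):

```python
import sys

def trace_function(frame, event, arg):
    # Print the line number and the event which happened on that line.
    print(frame.f_lineno, event)
    return trace_function          # keep tracing lines inside this frame

def greet_neighbors():
    neighbors = ["Mars", "Venus"]
    for planet in neighbors:
        print("Hi " + planet)
    return None

sys.settrace(trace_function)       # register the tracing function
greet_neighbors()
sys.settrace(None)                 # stop tracing
```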
On the first line, when we call our function, a call event arrives, because Python called the function greet_neighbors. Then Python executes the second line, so a line event arrives for the second line. Then the interpreter executes lines three and four, we receive the corresponding line events, and the output "Hi Mars" appears. After that, during the second loop iteration, Python executes lines three and four again, and "Hi Venus" appears in the output. And after that we return from the function, so Python executes line five, and a line event and a return event arrive for line five. Okay, so how can we use this tracing function to implement breakpoints in our debugger?
On each program event, the tracing function receives a frame object which contains not only the line number, as we've seen in our example, but also the file name of the current code, which is stored in the code object. A breakpoint also has a file name and a line where the user put it. So on each event we can compare the breakpoint's file name with the frame's file name and the breakpoint's line number with the frame's line number, and if these values are equal, we can suspend our program at this place.
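As a rough sketch (again illustrative; suspend_program stands in for whatever a real debugger does to stop and talk to its front end):

```python
import sys

# A hypothetical breakpoint registry: file name -> set of line numbers.
breakpoints = {"example.py": {4}}

def suspend_program(frame):
    # Placeholder: a real debugger would block here and talk to its front end.
    print("suspended at", frame.f_code.co_filename, frame.f_lineno)

def debugger_trace(frame, event, arg):
    if event == "line":
        file_name = frame.f_code.co_filename
        if frame.f_lineno in breakpoints.get(file_name, ()):
            suspend_program(frame)
    return debugger_trace

sys.settrace(debugger_trace)
```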
Cool, so this is how Python debuggers work and how we can use a tracing function to implement breakpoints. But the execution of Python code in Jupyter notebooks differs from usual Python files. In the next part, let's learn how Jupyter executes Python code and what we should change in an existing Python debugger to implement breakpoints in Jupyter files. You browse your Jupyter notebook in a front end. For example, to support Jupyter notebooks in PyCharm, we implemented our own custom front end which works similarly to the default one.
So when you run the first cell in your notebook, it starts an IPython kernel and establishes a connection to it. The IPython kernel is in fact a Python process which works similarly to an interactive shell: it runs in a loop and waits for the next command to execute. So when you execute your cell, the front end sends its source code to the IPython kernel; the kernel compiles it to a code object, executes it, and sends the result back to the Jupyter notebook. The most interesting part for us here is how the kernel executes this code. For every cell execution, the kernel generates a unique name for the cell and passes this name as the file name for the generated code object. Usually the kernel hides this information from users, but it stores all these generated code objects in its internals.
That's why, when you define some function in the first cell and execute it, you can then call this function in another cell: the IPython kernel has saved it for you. To implement breakpoints in usual Python files, we can use the pair (file name, line number) to define a place in the source code, because this pair uniquely identifies a location, a breakpoint, or some source code position. But in Jupyter notebooks it doesn't work, because each cell is a separate code snippet with its own line numbers inside, and all cells are located in the same file. So we can't reuse the same pair for Jupyter breakpoints. But we already know that the IPython kernel generates all the necessary information during cell execution: an executed cell has a generated file name and its internal line numbers. So we can use this pair to define a unique location of our code.
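Conceptually, what the kernel does for each cell is close to this sketch (the generated name format is illustrative; real IPython uses its own caching compiler and naming scheme):

```python
# One iteration of the kernel's execute loop, heavily simplified.
user_namespace = {}

cell_source = "a = 1\nprint(a)"
generated_name = "<cell-3-1a2b3c>"                    # unique per executed cell
code_obj = compile(cell_source, generated_name, "exec")
exec(code_obj, user_namespace)

# Later, frame.f_code.co_filename inside this cell's code is "<cell-3-1a2b3c>",
# and frame.f_lineno counts lines from the start of the cell, not of the .ipynb file.
```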
Great. But the problem is that this generated information is available only in the IPython kernel and not in our IDE. When the debugger sends some message to the IDE, for example about a suspension ("I'm suspended in some place"), this message contains the generated file name. But the IDE doesn't know which cell is suspended, because it can't find its source code: the name was generated on the IPython kernel side. In the IDE we can introduce some cell identifiers, for example, to find cell locations in the editor. But we still need a source mapping between these two things: the cell identifier in the editor and the generated file name. I spent a lot of time trying to understand how to implement it, and the solution turned out to be quite simple. There are two things which helped during the implementation.
Firstly, in the IDE, as I've already said, we have a custom Jupyter front end. That means we control all cell execution inside our Jupyter notebook: for example, we can track all the cells which were executed during the session, or send some additional commands. The second thing which helped to implement it was the silent cell execution supported by Jupyter. It means that you can execute some code in silent mode: it will be executed in the context of the IPython kernel, but it won't be added to the kernel history and it won't increase the execution counter.
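With the standard jupyter_client package, silent execution looks roughly like this (the connection file path is illustrative):

```python
from jupyter_client import BlockingKernelClient

# Connect to an already running kernel using its connection file.
kc = BlockingKernelClient(connection_file="kernel-12345.json")
kc.load_connection_file()
kc.start_channels()

# silent=True, store_history=False: the code runs inside the kernel,
# but it is not added to the history and the execution counter does not grow.
kc.execute("import os; os.getpid()", silent=True, store_history=False)
```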
How did we use it to support the source mapping? Before sending the real cell code, we can send several utility commands in silent mode. For example, we can silently send a command which patches the function for name generation and saves the currently generated name to the debugger instance. Also, we can silently send information about the currently executed cell ID and save this value to the debugger instance as well. That means that when the cell starts executing, we already know all the necessary information about the mapping, because it's saved in our debugger instance. And inside our Jupyter tracing function we can do the following: when execution is suspended inside code with some generated name, we map this generated name to the cell identifier which is stored in our debugger instance, and then send a message to the IDE side. Now this message doesn't contain a generated name, it contains a cell identifier from the editor. After that the IDE can quickly find the cell and its source code in the editor and highlight the suspended line. That's how Jupyter breakpoints were implemented.
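Put together, the idea looks something like the sketch below (all names are illustrative, not PyCharm's actual code):

```python
import sys

class JupyterDebugger:
    def __init__(self):
        self.name_to_cell = {}     # generated file name -> cell id in the editor
        self.breakpoints = {}      # cell id -> set of line numbers

    def register_cell(self, generated_name, cell_id):
        # Filled in by the silent utility commands sent before the real cell runs.
        self.name_to_cell[generated_name] = cell_id

    def send_suspend_message(self, cell_id, line):
        # In the real debugger this message goes to the IDE over the debugger connection.
        print(f"suspended in cell {cell_id}, line {line}")

    def trace(self, frame, event, arg):
        generated_name = frame.f_code.co_filename
        cell_id = self.name_to_cell.get(generated_name)
        if event == "line" and frame.f_lineno in self.breakpoints.get(cell_id, ()):
            # Report the cell identifier, not the generated name, so the IDE
            # can find the cell in the editor and highlight the suspended line.
            self.send_suspend_message(cell_id, frame.f_lineno)
        return self.trace

debugger = JupyterDebugger()
sys.settrace(debugger.trace)
```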
Now we know how the IPython kernel works and how we can define a source mapping for Jupyter breakpoints. But we still have two separate entities, the IDE and the IPython kernel, and they should be able to communicate somehow. As I've already said, we have an IDE instance with our custom Jupyter front end implemented in it, and an IPython kernel which executes our commands. But for a debugger it isn't enough to send just source code for execution or some commands in silent mode. We also need to send a lot of utility information, something like "the user added a breakpoint in cell number three, line number two". Or, when the debugger is suspended, it should be able to send a message like "I'm suspended at some place in this cell". That means the debugger needs some additional communication channel with the IPython kernel. When I started to think about it, I realized that there are two possible solutions to this problem. The first one is to establish an additional connection to the IPython kernel, and the second one is to reuse the existing Jupyter channels.
The first one is the simplest, and it's the first thing that came to my mind, but it has some limitations, and the reason for these limitations is the Jupyter architecture. Here is a more detailed scheme of the Jupyter communication model. This is the front end. When the front end connects to the IPython kernel, it doesn't connect directly: it connects via a kernel proxy, which in turn connects to the IPython kernel over its sockets. If we want to bypass the Jupyter messaging architecture, we can only establish a direct socket connection to the IPython kernel, and of course that isn't always possible. For example, if your kernel is located far, far away in some cloud, you can't connect to it without the proxy. So the solution with an additional connection is currently implemented in PyCharm Professional, but it works only if you can establish a direct connection to the IPython kernel.
But Jupyter already has a rich messaging architecture; maybe we can try to reuse it. Indeed, the IPython kernel has five sockets. Here I'm showing the three most important of them: the one which sends cells for execution, the one which sends output back to the front end, and the one which requests user input. It would be possible to reuse some of them in our debugger, but there is another serious limitation in Jupyter. The IPython kernel runs a Tornado event loop in the main thread, which processes execution events. Also, there is a second event loop in a separate thread, which processes output commands. Each of these event loops is single-threaded. That means that if some command has started to execute, the event loop is busy, and the following commands will be executed only when that execution is finished. So any following messages with debug information which we send to the same channel will be blocked. But the problem is that debug information should be sent exactly during cell execution; it's useless when the execution is already finished. That means that in the current Jupyter architecture it's impossible to reuse the existing channels for sending debug information.
But wait: everybody knows that ipdb works in both local and remote cases, and it doesn't require any additional connection. How does it do that? If you remember the workflow with the ipdb debugger, you'll understand how it works. To call the ipdb debugger inside your Jupyter notebook, you need to add a call to the set_trace function inside your cell; after that the debugger starts, suspends, and asks you for some command. You type a command, the debugger receives it, performs some actions, and asks you for the next command. You type again, the debugger receives it, and so on. So an ipdb debug session is in fact a sequence of request/reply commands which the kernel sends to the front end and back. And it works this way because it's based on the built-in input function, so it can reuse the existing input channel for sending debug commands. It works, but it has some limitations.
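For reference, starting it in a cell looks roughly like this (a sketch; the function body is made up):

```python
from IPython.core.debugger import set_trace

def greet_neighbors():
    neighbors = ["Mars", "Venus"]
    set_trace()                  # suspends here; commands are read via input()
    for planet in neighbors:
        print("Hi " + planet)

greet_neighbors()
```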
For example, if you have started to execute some long-running cell under the debugger and realize that you forgot to put a breakpoint in some important place, you have no chance to do it with ipdb. You need to wait for the program to suspend and ask you for the next command, and only after that can you add your breakpoint or execute some stepping command. That's okay for a command-line debugger, but we can't reuse the same technique in our visual debugger, because in a visual debugger we want the ability to put a breakpoint even while the program is running, and to make the program suspend at the place where we added this breakpoint. That's why, in the current implementation in PyCharm Professional, I decided to establish an additional connection and send all debug utility commands separately from the Jupyter channels.
Well, in this part we've learned how the Jupyter debugger sends its utility commands and why it was implemented this way. We've also learned how ipdb works inside. So it looks like our visual debugger is now ready. Let me remind you how we built it today. Firstly, we defined a tracing function which can work with the code generated by the IPython kernel. Secondly, we created a mapping between the editor and the generated code for cells; we used silent cell execution and features of our custom front end to implement it. And after that we established a debugger connection for sending commands from the IDE side to the IPython kernel and back. So today we've learned how the Jupyter visual debugger is implemented. And that means it's time for the most entertaining part for you, and the most horrifying part for me:
a live demo in PyCharm. Okay, can you see anything? Yeah, you can see everything. Here it is. Well, this is a Jupyter notebook in PyCharm. You can see the cells are located on the left side, and the Jupyter notebook preview is located on the right side. You can work with the cells as if they were located in one Python file. And it's important to notice that we don't convert the Jupyter notebook to a Python file: it is still a real Jupyter notebook with the .ipynb extension, which is located in your project on disk. We just show our custom presentation of it, so you can work with it as if it were one big Python file, and you can use both the features of the usual Python editor in PyCharm and the features of Jupyter notebooks.
For example, when you type code, you can use the same code completion, and it works. Or, for example, here we have some function: you can quickly navigate to the function declaration or to any variable declaration, and even if this declaration was in another cell, PyCharm will navigate you to the correct place. But you can also use the notebook features. For example, you can run a cell, and you can see it was executed and the output appeared here in the notebook; it's stored in the Jupyter notebook, so it works exactly like your default front end for Jupyter notebooks. There are also many other actions: for example, in PyCharm 2019.2 it will be possible to run all cells in your notebook, restart the kernel, clear outputs, and do a lot of other things.
Well, we came here to check that our visual debugger works. Let's put a breakpoint, we put it here on the second line, and run the debug cell action. The debugger is suspended. As you remember, we defined a tracing function, it established the source mapping between the editor and the kernel, then it sent a command to our editor, and we found the place where it should be suspended. You can see variable values here; you can expand them, check their values, and, for example, resume the program. Great, simple breakpoints work. Let's look at the next cell. This is the greet_neighbors function, which we've already seen today when we discussed tracing functions. Let's put a breakpoint here and debug this cell as well.
During the talk I had time to discuss only breakpoints, but a very important part of every debugger is stepping commands, and they were implemented in PyCharm as well. Let's check that it really works. I can press "step into" here, and the debugger steps into the function declaration in this cell. Here we can also execute stepping commands. You can see the values and how they change, and after that we can step again and continue our stepping commands in the cell where it was executed. Great, so stepping in the current cell works quite well.
Let's consider a more complicated code sample. There is a lot of code, but it's quite simple. We have a list of planets, and we iterate over these planets: for each planet we print its name, search for its neighbors, the left one and the right one, if they exist, call the same greet_neighbors function after that, and sleep for two seconds, because we like sleeping.
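The cell being described probably looks roughly like the following reconstruction (the exact demo code isn't in the transcript, and greet_neighbors is redefined here just to keep the sketch self-contained):

```python
import time

def greet_neighbors(neighbors):
    for neighbor in neighbors:
        print("Hi " + neighbor)

planets = ["Mercury", "Venus", "Earth", "Mars",
           "Jupiter", "Saturn", "Uranus", "Neptune"]

for i, planet in enumerate(planets):
    print(planet)
    neighbors = []
    if i > 0:                     # the left neighbor, if it exists
        neighbors.append(planets[i - 1])
    if i < len(planets) - 1:      # the right neighbor, if it exists
        neighbors.append(planets[i + 1])
    greet_neighbors(neighbors)
    time.sleep(2)                 # because we like sleeping
```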
Well, let's execute our cell under the debugger. Okay, the execution has started, and you can see the output appears, but I forgot to put a breakpoint here. Let's add a breakpoint, and yes, we added a breakpoint and the debugger suspended exactly inside our cell. This is the thing that isn't currently possible in ipdb: you can't add a breakpoint while the code is running, but in PyCharm, with our visual debugger, we can do it. We can also inspect things here. For example, we can check where we stopped: we can select this planet expression, evaluate it, and see that we're stopped on the planet Jupiter, great. Also, we can execute "step into" again where we're calling the greet_neighbors function, and we navigate to the correct place where the function was defined, even though it was defined in another cell.
Okay, we can resume our execution and remove the breakpoint, great. We have checked the neighbors for Jupiter, but I would like to learn what the neighbors of Uranus are, and I don't want to do a lot of stepping commands and press resume many, many times. For that, I can put a breakpoint and then set a condition for my breakpoint: I can say, suspend my program only if the name of the planet is Uranus, okay? And start our debugging session again. Let's just hide this. Let's start our debug session and check: the output is starting to appear, but we are waiting for our condition to trigger. Okay, it's okay. We are suspended. Let's check that we are suspended in the correct place: the current planet name is Uranus, yeah, so that means the condition for our breakpoint really worked. And we can check the names of the neighbors of Uranus.
The neighbors are Saturn and Neptune, that's correct. Great. So we can add breakpoints even during a debug session. If you do a lot of data science work, and you might do a lot of it when you work with Jupyter Notebooks, you might work a lot with NumPy arrays or pandas data structures, and you know it's sometimes quite difficult to inspect the values in these data science arrays. You can click here on your variable, and it will be opened in a nice window as a table, and you can inspect its values, type some slices, and do anything you want with your data. Okay, so we've checked that the visual debugger for Jupyter Notebooks really works.
I didn't cheat you during my talk, and that's excellent news. Let's go back. Okay, during my talk today we learned how to build a visual debugger for Jupyter Notebooks. And that means that now, after my talk, you can implement a visual debugger for Jupyter Notebooks in your favorite IDE, if for some reason it's not PyCharm; and if it is PyCharm, I've already implemented it for you, so you can try it right now. Thank you very much for coming to my talk.
Now I'm ready to answer your questions. Thanks for the talk, I've got one question: when you register a settrace function, the program runs slower. Oh, sorry? Is the program running slower when you have the settrace function? I didn't get it. Your debugger is based on the settrace function, which does a lot of things; when it is activated, the program runs slower. Well, it's activated when you pass your function to settrace, and it's activated in the next frame which is called. So, as you can see in this example,
where was it? Here it is. You can see we set this, and it will be applied to the next frame which is executed. So here we're calling the next function, greet_neighbors, we are entering the next frame, and the tracing is activated in this function. And the tracing function should return, where is it? Here it is. It should return a tracing function for the current frame, or it can return None, and then tracing will be stopped. And is it possible to unregister the trace function, to revert the effect of registering it? Yeah, you should call sys.settrace(None), and it will be unregistered.
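A small sketch of what is being described (illustrative, not the code shown on the slides):

```python
import sys

def global_trace(frame, event, arg):
    # Called for the 'call' event of every frame entered after sys.settrace();
    # whatever it returns becomes the local trace function for that frame.
    print("entering", frame.f_code.co_name)
    return local_trace

def local_trace(frame, event, arg):
    print(event, "at line", frame.f_lineno)
    return local_trace        # keep tracing; returning None stops tracing this frame

def greet_neighbors():
    print("Hi Mars")

sys.settrace(global_trace)    # register: takes effect in the next frame that is called
greet_neighbors()
sys.settrace(None)            # unregister the tracing function
```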
So when you push the play button, that's what it does? Something similar, yeah; if you don't have breakpoints, maybe. Okay, thanks. Thank you. I have another question. To connect to the kernel, to add a new connection, did you have to modify the kernel and build a custom kernel, or is it all done in the settrace function? No, no, it's only the settrace function. We're connecting by silently executing a command which connects to our debugger, and we are storing the debugger instance in some internal state, so it doesn't modify the kernel; we're just setting this tracing function. And do you have an idea of the performance impact of this settrace function? Yes, as usual for debuggers, it of course makes your program execution a bit slower, and sometimes much slower if you have a lot of computation. But I think it's still faster than adding print statements and rerunning your cell many, many times. On average it's slower, but usually you don't even notice it. Thank you.
Any more questions? Thank you very much for coming. You can always find me at the PyCharm booth during the whole conference, so feel free to come and ask me any questions about my talk, about PyCharm, or about anything you want.
Thank you.