Let’s Build a Python Profiler in 25 LOC
Formal Metadata

Title: Let's Build a Python Profiler in 25 LOC
Number of Parts: 132
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/44905 (DOI)
Language: English
Transcript: English(auto-generated)
00:04
OK, so this is going to be somewhat of a hands-on talk. I'm Noam Elfenbaum. I work at BlueVine, which is a fintech startup in Israel. I help organize PyWeb-IL, the monthly meetup,
00:22
and PyCon Israel. It's kind of new, only three years old. So you're welcome to hop on the next one. And you can find me online on Twitter and at noamelf.com. So what is a program profiler? A profile is something we do on a program
00:43
in order to understand what its resource consumption is. And we can use it in production to monitor our service. And we can also use it to debug performance problems that we see locally when we're working. So Python, being a batteries-included language,
01:02
has a built-in profiler in the standard library. Actually, it has two of them. One is a pure Python implementation, and after it, someone contributed cProfile, which is just a C-extension version of the same profiler.
01:20
It's just faster. So we're going to start with a demo, which is the easiest way to understand stuff. So we have this module in our production, and someone's noticed it's being kind of slow. What it does: it does power twice on a number it receives,
01:48
it counts the digits of that number by converting it to a string and taking its length, and then it just does the calculation and prints it out. So let's run it live.
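A minimal sketch of what the module described here might look like — the names super_power and count_digits, and the inputs, are my guesses, not the talk's exact code:

```python
def super_power(x, y):
    # "does power twice": raise x to y, then raise the result to y again
    return (x ** y) ** y

def count_digits(n):
    # count digits by converting to a string and taking the length;
    # for huge ints, the int-to-str conversion dominates the cost
    return len(str(n))

def main():
    n = super_power(12345, 16)  # far smaller inputs than the live demo's
    print(count_digits(n))

main()
```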
02:07
So we're running profiling demo. So it takes quite some time to run. We're running it locally. We're trying to debug. And it will come up soon.
02:21
OK, so it takes 9.1 seconds to run. I'll make the screen a bit larger. In terms of the brightness — maybe I can do it in white colors. Let me think.
02:42
Let me think if I can fix that. Don't really want to get into changing the color of the terminal, but try to work it out like that. I hope you see well enough. Maybe we can turn up the front lights here. Oh, just for the number?
03:03
You can see here I can highlight it like this. So it took 9.1 seconds to run. So what we're going to do is we're going to run it under the profiler. So we're going to run Python with the -m flag
03:21
to run the cProfile module, and then use the -s flag to sort the output by the total time each function took to run. So let's run the code again. It'll take about the same length of time, but this time we'll see the results. Now, before we see it, what do you guys think?
03:43
Which part of the program is making it slow? Is it the superpower or the count digits? Who thinks the superpower? OK, and who thinks it's the count digit? OK, oh, guys, you're good, I must say. I did it in another conference in Israel, and no one liked it.
04:00
It was a different ratio. So we're going to see it as well here. So apparently, counting digits is very expensive in Python, because we need to convert a very large number into a string. And actually, the length method itself is not so expensive, but the conversion into a string is very expensive.
04:21
And we can see it takes 8 seconds — this is the count digits — and the superpower only takes 200 milliseconds. So good job, guys. Let's continue.
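The invocation just described, sketched both as the command line and as the equivalent programmatic call (the module and function names are placeholders, not the talk's actual files):

```python
# Command-line form, as used in the demo:
#   python -m cProfile -s tottime profiling_demo.py
#
# The same profiler can also be driven from inside a program:
import cProfile

def count_digits(n):
    # stand-in for the demo's expensive digit-counting function
    return len(str(n))

# run the statement under cProfile, sorting the report by total time
cProfile.runctx("count_digits(7 ** 1000)", globals(), locals(), sort="tottime")
```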
04:40
Now, we want to create an optimized version of it. We want to write it much faster. So I've created an optimized version. Do you think, can anyone think what it does differently? Yeah? Not modulo, something else? Right, exactly, it's the log function.
05:01
I hope you didn't see when I moved it for a second there. So exactly — just use the log10 function, and then it can run much, much faster. Let's try it now. It took 439 milliseconds, which is much, much better for us. But the output is much bigger.
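The optimization described here can be sketched as follows (one caveat I'm adding: for some inputs, floating-point rounding in log10 can in principle be off by one, so the string version remains the safe reference):

```python
import math

def count_digits_slow(n):
    # original approach: convert the int to a string and take its length
    return len(str(n))

def count_digits_fast(n):
    # optimized approach: log10 avoids the costly int-to-str conversion
    return int(math.log10(n)) + 1

print(count_digits_fast(12345))  # -> 5
```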
05:22
You just see this very large output because of the imports — there are a lot of function calls when you do an import, so you get a very big output. At the top, you see that the superpower is now the most expensive function. OK, cool. So it looks like magic, right?
05:42
When you look at it, how does Python do that? I know it's so nice to have, but when I saw it, I was wondering what the mechanisms are that help it do it. So Python has pretty easy access to its stack from anywhere you are in the Python
06:01
execution of your program. You can see the whole stack of how you got there. And so what most profilers do is run as part of your Python process and set up a trigger to capture the stack, and then they can later do the analysis on it. So let's see how we access the process call stack.
06:25
So we can easily access the stack using the sys._getframe() method. And then we'll just run over it, convert it to something that we can access easily with the traceback module's extract_stack, and print it out.
06:42
So let's run it. Here we have some more functions that help us just see the hierarchy of the code when we run it. So this is the stack access example. This is the actual code here. And when we run it, we'll see that the first line is
07:05
in the module level. And it shows we were calling bar. The second line here is in the bar function. And we're calling the show stack function here. And inside the show stack, we see the for loop.
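The stack-access demo just described can be sketched like this — the function names show_stack and bar follow the talk's description; the exact printing format is my assumption:

```python
import sys
import traceback

def show_stack():
    # sys._getframe() gives the frame we are currently executing in;
    # traceback.extract_stack turns it and its callers into FrameSummary
    # objects, with the most recent call last
    stack = traceback.extract_stack(sys._getframe())
    for entry in stack:
        print(f"{entry.name} (line {entry.lineno}): {entry.line}")
    return [entry.name for entry in stack]

def bar():
    return show_stack()

names = bar()
```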
07:23
So we can really easily, just by doing syscoin frame, access Python stack. Now we're missing when it will be called. We need some trigger to call it. So Python, in general, profilers have two types of triggers.
07:41
One is a deterministic trigger, which is then later a deterministic profiler. It's triggered on every function or line call. And then you can see the exact execution path. And we have statistical profile, which is triggered on a certain time interval. So let's see how someone builds a deterministic profiler.
08:05
So Python lets you specify a callback that gets run whenever a line or a function is being executed. You can do it using sys.setprofile or sys.settrace. The difference is that sys.settrace
08:21
can call the handler on every line that is executed, while sys.setprofile is called only when a function is called or returns. Now, when it's called, we'll record the stack. So let's see an example of that.
08:41
So we're using sys.setprofile. Here is our handler. The handler receives the current frame, an event, and the arg itself. We'll see the output later when we run it. In general, we'll see that the event is a call to a function.
09:02
The argument is the function name. And the line is something that we created here — it's just what line it's on. Now we're going to set the profiler just like that, using the sys module's setprofile and our handler. And we're going to call print hello world just for fun.
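The setprofile example being described might look roughly like this sketch (the printed format is my assumption; the event names are the real ones the interpreter passes):

```python
import sys

events = []

def handler(frame, event, arg):
    # event is 'call', 'return', 'c_call', 'c_return', or 'c_exception';
    # for C-level events, arg is the built-in function object being called
    name = arg.__name__ if event.startswith("c_") else frame.f_code.co_name
    events.append((event, name))
    print(event, name, "line", frame.f_lineno)

sys.setprofile(handler)   # install the handler as the profile callback
print("hello world")      # print is implemented in C: fires c_call/c_return
sys.setprofile(None)      # uninstall the profiler
```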
09:23
We can execute whatever code we want. So let's see how that goes. OK, so we're going to do Python set profile. And we're going to run it. Now we can see the output.
09:42
So maybe see the code so it will be easier to understand. So this is the code and this is the output. So you see that the first event that was received is a c_call. It's a c_call because Python's print function is actually implemented in C. And we see the line itself,
10:03
print hello world. Now we get another event, c_return, which is the function return — it's finished. So think about when you want to create a timer: you get the time it started, the time it ended. And then you get another event, which is just the return of the whole executable,
10:22
the whole program. So now we have our deterministic profiler. We know how to trigger it. But how do statistical profilers work? What they do, like we said, is sample the program on a given interval
10:41
instead of on every line or function call — an interval that we decide in advance. You can implement it in several ways, but one of the ways is to implement it using an OS interrupt, an OS signal. So let's see the code that uses it.
11:06
So we have our handler. It's very similar to the handler we used in our deterministic profiler. When the handler is called, it receives the signal number, which is less interesting for us, and the current frame with the call stack.
11:20
We extract the call stack, and then we can print it to screen. In a real profiler, it would capture the results for later analysis. Now, this is the interesting part for us. This is how we set the actual signal. So we use the signal SIGPROF.
11:41
It's a signal that is meant for profiling, actually. And we put the handler on it, so when the signal is delivered, the handler will fire. And we put a timer on it, so the OS will send the signal on every interval that we set.
12:01
This is the first interval, and this is all the intervals after it. And we also need to do something else: we need to remove the timer when the program exits. So this is a pretty nice module, atexit, that Python has that I wasn't familiar with. It lets you execute something at exit.
12:20
When the interpreter exits, it will execute this function. And we'll just set the timer to zero. Now, this is the explanation. It's nice. OK, so the last part, we're going to start the,
12:41
we're going to call the function that starts the sampler, that sets the trigger and starts the sampler. And then we're going to make our calculation. And we're going to print it to screen. Now, let's run it in our terminal.
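The signal-based sampler being described can be sketched like this (Unix-only, since it relies on SIGPROF and setitimer; the function names and the workload are my assumptions):

```python
import atexit
import signal
import traceback

def handler(signum, frame):
    # called by the OS on every SIGPROF tick; print the sampled stack
    for entry in traceback.extract_stack(frame):
        print(f"  {entry.name}:{entry.lineno}")
    print("---")

def start_sampler(interval=0.005):
    # fire SIGPROF after `interval` seconds of CPU time, then repeatedly
    signal.signal(signal.SIGPROF, handler)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    # cancel the timer when the interpreter exits
    atexit.register(lambda: signal.setitimer(signal.ITIMER_PROF, 0, 0))

start_sampler()
total = sum(i * i for i in range(2_000_000))  # CPU-bound work to sample
print(total)
```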
13:01
OK, so this is our code. And it's running. So now you're going to see something which is pretty interesting, right? We can see that we only get the len(str(...)) part, like before. We only get this part of the code being sampled.
13:24
And I guess you know why it is like that, right? The answer from the audience was right — it's just the longest one. Since we're doing statistical profiling, we don't see everything that is being executed; we're just sampling on a certain interval.
13:41
So what you get is less inclusive. It won't impact performance as much. But you also get less accurate analysis later. So now we know how to do both. So now for the practical question, when do we use which profiler?
14:03
Someone say? When will we use the statistical one? And when will we use the deterministic one? What are the use cases? Which use cases are better for which profiler? Yes?
14:25
You wouldn't use it. OK, good instinct. OK, cool. Yeah, so let's, exactly. So that's a big part of the answer. So the statistical profiler, since it's
14:40
controllable and predictable — you know the interval you're going to sample at, and you know what the effect on your production system will be. So it's much more suitable for production monitoring. But it's less accurate. So when you debug locally, you might want to use the deterministic profiler.
15:02
It introduces a fixed amount of latency, but you don't really mind when you're doing it locally. Yeah, good instinct. So, for the most interesting part of our lecture: we want to build a profiler of our own.
15:22
So we have all the parts. We just need to merge them together. Let's see. OK. So we already know how to set an interval, right? How to set a sampling interval. So we're going to set the signal,
15:42
set the timer on the signal, and register the timer's removal once the program exits. Now, the sample method itself is a little different, because we want to collect the data, right? So what we're going to do is extract the stack. And then we're going to do a nice trick where
16:03
we join the stack, right? We join the stack layers into a single string separated by semicolons. And then we count each time we arrived at that path, so we know how many times we spent there. Here we have a global variable.
16:21
We keep it there. This is a naive profiler, right? So we can use global variables and stuff. And when the program finishes, we can call format_stats and output it in a format that the flame graph
16:41
tool to show it very nicely on screen. And this is its format. It just joins the frames with semicolons, like we said, and then the count of how many times we spent there. So let's go back to our command line. Oh, so first, I need to prove to you that it's actually
17:02
25 lines, right? So we wanted 25 lines of code. And I was pretty generous with the spacing — it could be shorter if you're interested — so you won't say I'm cheating. You see we have the imports on different lines, right?
17:22
We have two spaces here. So there's a lot more. We can put in a lot more stuff. And let's use it. So in order to use it, I've created a demo program. What the demo program does is a simple calculation.
17:41
It has two functions here. The main function calls the calc function, and it does power. I'm using power all the time because it takes some time for the CPU to execute. So we're doing power 100,000 and then power 200,000. And then we're calling the start method on the profiler.
18:03
We're running the code and formatting the stats, printing them to stdout. So let's run our code. Okay, so this is the format. And I'm not sure if it was clear earlier, so now you can see it more clearly.
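Putting the pieces together, the roughly 25-line statistical profiler described in this section might look like the following sketch (Unix-only; start and format_stats follow the talk's description, everything else — the interval, the demo's calc and main — is my assumption):

```python
import atexit
import collections
import signal
import traceback

# stack-path string ("main;calc;...") -> number of times we sampled it
stats = collections.defaultdict(int)

def sample(signum, frame):
    # join the stack frames into one semicolon-separated string, count it
    stack = traceback.extract_stack(frame)
    stats[";".join(entry.name for entry in stack)] += 1

def start(interval=0.005):
    signal.signal(signal.SIGPROF, sample)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    atexit.register(lambda: signal.setitimer(signal.ITIMER_PROF, 0, 0))

def format_stats():
    # flamegraph.pl input format: one "stack count" pair per line
    return "\n".join(f"{path} {count}" for path, count in stats.items())

def calc(exponent):
    return 7 ** exponent  # CPU-heavy big-int exponentiation

def main():
    calc(100_000)
    calc(200_000)

start()
main()
print(format_stats())
```

The printed output can be piped straight into Brendan Gregg's flamegraph.pl to get the SVG diagram shown later in the talk.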
18:21
So we see that for this whole stack, this is the number of samples we spent there, okay? So we've been in main and calc, when calc was called with 200,000, 79 times.
18:41
And we've been in calc 100,000, 24 times, which makes sense, because 200,000 is a larger number, right? So now we can take this code — which is pretty clear when we're just using it for a very simple program, but it won't be in a larger program, right? So in order to really review it,
19:01
we can output it into flame graph and then open it in our browser. And now the flame graph tool — I'll give the reference to the guy who invented it later — I really like it. It's a really simple format and you get really nice output.
19:22
So it's actually an SVG that shows you a nice diagram of the relative parts of the program where time has been spent — thanks. So we can see that main had 91 samples, right? 100% of the samples.
19:41
The calc 100,000 had 24% of the samples and the calc 200,000 had 75% of the samples, which is pretty clear, right? Makes sense. So this is a simple program, but if we take a more complex one,
20:02
for example, downloading PEPs onto our local hard drive, which is still simple, but it's complex enough for our talk. So what this program does: it composes the URL, it goes to python.org,
20:23
and downloads the PEP using urlopen, reading the response and saving it to memory, and then writing it to a temporary file on disk. And the main function loops through four PEPs,
20:41
in this example, runs save_pep, and says it's accomplished. We're outputting these messages to stderr, because we want to use stdout for something else — the pipe. And that's it. And now we can call our code,
21:00
go to start, run the main function, and then print out the stats. So let's see. So like I said earlier, I'm counting on the WiFi here — so that's good, actually, I'm surprised; I thought I'd need an ethernet cable. So what we see is that we have an output
21:21
that we can't really understand anything from, and here comes the flame graph tool to the rescue — we'll run everything again. But in the end, we receive a very nice diagram. Again, it shows how much time
21:41
each part of the program was sampled — the count of the samplings. So what we can do is really look into the diagram and start to understand. Actually, it's also nice just for understanding the code — if you want to understand the standard library or stuff like that — to just see the execution path.
22:01
And if you want to focus on a certain place, for example, we can click on it, and then we'll have everything zoomed out and we'll only zoom on this certain part and see where it spent its time. Tokenizing stuff, opening them,
22:20
we won't really get into that, but it's something that you can do with your own code and really get a good notion of where you've spent time. Another nice thing about it, by the way, if we're here already, it works really nicely with grep. So for example, I can do like a grep on the results
22:44
and remove something. For example — let's see what we can remove that will be interesting for us.
23:01
So we can remove any connect — any place we have a connect method. And then we receive a much simpler diagram, right? The diagram will be smaller; it will be kind of funny because I just removed a random string. But you can do it yourself if you want to remove, for example, a certain library —
23:20
you don't want to see its output, so you can just do grep -v and then it will be gone. Okay, so we saw how it works. I think we've come to the references part. So the flame graph tool is built by Brendan Gregg, and it's really nice — he just built it himself,
23:42
I think at Netflix. And you can use it for all types of things — you can create this output from whatever you want to profile, and then you'll see it. Julia Evans had a really nice blog post about Ruby and Python profilers, which inspired this talk.
24:03
And, which is also pretty interesting, Nylas, which is a San Francisco company, implemented their own performance monitoring service — just a simple script — and they're using it for their whole production environment. At my company, we're using New Relic, which is really nice, but it has some problems,
24:23
for example, how long it keeps our data — I think two weeks or something like that. And you need to pay for everything if you want to get more, right? And you have the Python profiler docs, and these docs can be found on GitHub.
24:40
Okay, thank you, any questions? For questions, we have five minutes. If you have a question, please line up at the microphone. Over there, don't be shy.
25:02
Feel free. If everything was very clear, that's also okay. Yes. Excellent, very good. Thank you. So, let's start, thanks.
25:21
How would you profile your profiler? Whoa, it's very meta — how would you profile your profiler? Actually, you do see the profiler code in your profiling, it's pretty funny. We might be able to see it here as well.
25:41
That's pretty fun here, you see? Here we can see the extract_stack — that's what we use. So, it does show up in our profiling. How would you profile it? I don't know. You can see it's part of the profiler output, and I think you probably want to keep it out, so you'd probably do grep -v and kick it out of your profiling.
26:03
Okay, thanks. And I will now collect you off the screen. And so, I really like the talk. What about, I don't know, profiling a multi-processing
26:21
type of thing with multiple processes and things? Yeah, so what Nyleth did, that's how they do it, they do it for their real workload. So, they have another service which takes all the outputs of the different processes separately, and then merge them in together,
26:40
and then you can see, across your whole system, all your processes, where the time is being spent. So, you just need to create another service for it. They just created another web server. Okay, thanks. What is the next part of the talk? Huh?