Profiling the unprofilable
Formal Metadata

Title: Profiling the unprofilable
Title of Series: EuroPython 2016
Part Number: 110
Number of Parts: 169
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/21125 (DOI)
Transcript: English (auto-generated)
00:00
I'd like to introduce Dimitri Trofimov, who's the team lead and a developer on the PyCharm team, and is going to talk about profiling. Thank you. Hi. You are brave people who are interested in profiling
00:21
and aren't afraid of talks marked as advanced. Actually, when I saw this talk in the schedule marked as advanced, I was a bit scared myself. It won't be that hard, I hope. So, first, I'll briefly introduce myself. My name is Dimitri Trofimov.
00:40
I work for JetBrains. I'm the team lead and a developer for the PyCharm IDE. My talk won't be about PyCharm directly, but I will use its debugger as a case study for profiling and optimization. If you want to discuss anything about PyCharm, just come to the JetBrains booth in the expo hall to talk with the team.
01:03
Being involved in the development of PyCharm, I have done a lot of different things. But the runtime aspects of Python, like debugging, profiling, and execution, interest me most. Today, I want to show you how using a statistical profiler can help to optimize a program.
01:22
And this program, as I've said already, will be a Python debugger. I will try to stay at a high level, using the debugger as an example, and touch its details only if necessary. So, let's begin. The best theory is inspired by practice.
01:43
The best practice is inspired by theory, said Donald Knuth. I like this saying. What I'm going to show today is inspired by practice. It was a real problem, and to some extent, still is. And the approach, the solution to it, that I will show later,
02:02
was also real. It was actually done at some point, and if you're interested, you can later look into the code. But what is also very interesting is that when preparing for this talk, I tried to rationalize things, and to look at the process which happened in the past
02:23
from a slightly more theoretical perspective, as if I were doing it again, but more in the right way. That actually opened up some knowledge for me, and gave me some ideas that I will implement in the future, and I hope that you'll find something interesting in this talk too.
02:42
So, as happens quite often in our software development work, we start with an issue ticket in the bug tracker. The issue says the debugger gets really slow,
03:00
and it provides a code sample, so we see clearly that this issue is about the Python debugger in PyCharm. The PyCharm debugger is the part of PyCharm that's written in Python. It's the same debugger that's used in the PyDev IDE.
03:24
That's an open source project that is maintained by Fabio Zadrozny, the author of PyDev, and also by the PyCharm team. To understand better how the debugger works, I recommend listening to the recording of my talk at EuroPython 2014.
03:41
It's called Python Debugger Uncovered. But now I will remind you of some basic concepts. The PyCharm debugger consists of two parts. The part on the IDE side, or the visual part, is responsible for the interaction with the user. It communicates with the second part
04:01
that lives in the Python process. This second part, the Python part, receives breakpoints and commands via a socket connection and sends back data if needed; the data can be the values of variables, stack traces, and notifications about breakpoint hits.
04:21
So that's a Python application with some threads, IO, and a separate event loop, and it's always running in the background of the process. All of that can lead to some performance overhead. And the core of the Python debugger
04:42
is the trace function. That is the window through which the debugger looks at the user code and sees what's happening there. Python provides an API for tracing code.
05:01
It is a function called sys.settrace. It takes a trace function as an argument. The trace function is then executed on every event that happens in the user program: an event like a line execution, a function call, an exception, or a return from a function.
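A minimal sketch of this API is below; the breakpoint registry here is a made-up stand-in for the debugger's far more elaborate bookkeeping:

```python
import sys

# Hypothetical breakpoint registry: a set of (filename, line) pairs.
breakpoints = set()
hits = []

def trace_dispatch(frame, event, arg):
    # Called for 'call', 'line', 'exception' and 'return' events.
    if event == "line":
        key = (frame.f_code.co_filename, frame.f_lineno)
        if key in breakpoints:
            hits.append(key)  # a real debugger would suspend the thread here
    # Returning the trace function itself keeps tracing this scope.
    return trace_dispatch

def demo():
    total = 0
    for i in range(3):
        total += i          # we register a "breakpoint" on this line
    return total

breakpoints.add((demo.__code__.co_filename,
                 demo.__code__.co_firstlineno + 3))
sys.settrace(trace_dispatch)
result = demo()
sys.settrace(None)
print(result, len(hits))  # → 3 3
```

The line inside the loop executes three times, so the "breakpoint" is hit three times, which already hints at why tracing every line is expensive.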
05:22
There are a lot of checks that the trace function performs. For example, it checks whether there is a breakpoint for a given line, and if there is, it generates a suspend event. So I think you've got an idea of what the debugger looks like. There are some threads doing communication with the IDE in the background, and there is a trace function
05:41
that gets events about executed lines. So let's go back to the issue ticket. When the code is executed normally, it runs for three seconds. In debug mode without a breakpoint,
06:00
it executes for 12 seconds. But in debug mode with a breakpoint, it executes for 18 minutes. That's very long.
06:28
Let's reproduce this issue and see whether it actually exists. So we open PyCharm, and we have this code. And, so as not to wait 18 minutes, we will reduce the code snippet.
06:44
About this code snippet: it's just a simple function with one iteration through a range. The only interesting thing is that the range is quite big,
07:01
and we have an increment here. So let's reproduce this issue. We just run it. It was fast. Then we debug it. It was a bit slower, but also fast. And then we place a breakpoint, and we...
07:21
Then it works. Yes. So the issue exists. Let's analyze it. We have three different cases here:
07:40
normal run, debug without breakpoints, and debug with a breakpoint. And actually, as we can place a breakpoint on different lines, there are three more cases: debug with a breakpoint in the function, debug with a breakpoint in the same file
08:00
but not in that particular function, and debug with a breakpoint in some other file. But testing shows that the last case actually behaves the same as debug without any breakpoints at all. A breakpoint in some other file doesn't affect performance, so we won't look at that case.
08:23
So basically, we have four different cases. And in the two cases with a breakpoint in the function or a breakpoint in the same file, debugging works slowly.
08:41
William Edwards Deming, the famous engineer, statistician, and management consultant, said: you can't improve what you can't measure. So before we do anything else, profiling or optimization, we should be able to measure the performance of the thing we want to make faster. In our case, the core of the sample code is the iteration.
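A minimal version of such a measurement, using the time module as the talk does (the loop body is a hypothetical stand-in for the issue ticket's sample), might look like:

```python
import time

def iterate(n):
    # Stand-in for the issue ticket's sample: a big range with an increment.
    total = 0
    for _ in range(n):
        total += 1
    return total

start = time.time()          # wall-clock seconds from the time module
iterate(1_000_000)
elapsed = time.time() - start
print(f"iteration took {elapsed:.3f} s")
```

Running the same measurement under the plain interpreter and under the debugger gives the numbers to compare.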
09:06
So we use the time module to record how many seconds it took for the iteration to complete. That will be our simple measurement. And after we apply this measurement to our cases, we see that the two cases
09:24
with debug with breakpoints actually run 100 times slower than a normal run. Which is a bit sad, but who knows? Maybe in this particular case, with this example,
09:41
it's not possible to do any better. So we need to compare this with something, with some program which does the same thing and has more or less the same functionality. We choose pdb for that. Although it is less functional than the PyCharm debugger, it is sufficient for our comparison.
10:02
You can place a breakpoint and pdb will stop at it. It is also written in Python, so it is in the same class. It wouldn't make any sense to compare with something written in C, because that's a different kind of application. And pdb is in the standard library,
10:20
so it sounds natural to take it as a performance standard. And now we can do some benchmarking. After we take pdb as a standard, we can apply the same measurement to it. Then we can compare the results with our debugger, which now becomes the baseline in benchmarking terms.
10:42
And what we see is that pdb, while being a bit faster, still suffers from the same problem: in the cases where a breakpoint is set, its performance drops dramatically.
11:00
But still it is a bit faster. It takes five seconds instead of nine. So we can try to reach its performance. And the first thing we need to do to make the code faster is to find the bottleneck. It doesn't make sense at all to optimize parts of the code that don't influence the overall performance.
11:21
And the part that influences the overall performance the most is called the bottleneck. So let's find it. And the best way to do that is profiling. Profiling is a way to look at your code from a different perspective, to find out what runs and how long it takes.
11:41
A profile is a set of statistics that describes how often and for how long various parts of your program executed. A tool that can generate such statistics for a given program is called a profiler. Let's use a Python profiler. But first we need to choose one.
12:02
So let's learn about the Python profilers available. If you're looking for a Python profiler, you'll find several of them. The most obvious choices are cProfile, yappi, and line_profiler. cProfile is part of the Python standard library. It is written in C. The Python documentation says about it:
12:22
cProfile is recommended for most users; it's a C extension with reasonable overhead that makes it suitable for profiling long-running programs. The yappi profiler is almost the same as cProfile, but in addition, it is able to profile separate threads.
12:43
line_profiler is very different from the two previous profilers. It provides statistics not about the functions that are executed but about the lines inside the functions. Although written in C, it has rather high overhead because it traces every line. So cProfile is the default choice,
13:02
and we don't need the features of yappi and line_profiler, at least yet. Let's use cProfile. And we do that in PyCharm. For that, our sample code will be changed a bit,
13:20
because we need to use debugging and profiling at the same time. So we will set up the debugger from the source code and place a breakpoint here. And what we do now is profile it, and we continue.
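Outside the IDE, an equivalent cProfile run can be sketched from a plain script (the workload function is a hypothetical stand-in for the sample code):

```python
import cProfile
import io
import pstats

def main_work():
    # Hypothetical stand-in for the sample code from the issue ticket.
    total = 0
    for _ in range(100_000):
        total += 1
    return total

profiler = cProfile.Profile()
profiler.enable()
main_work()
profiler.disable()

# Print the five entries with the largest cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The pstats report is the textual equivalent of the call graph PyCharm shows.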
13:45
So the task is started. We wait until it finishes. And after it finishes, we see... no, sorry.
14:04
That is not what I wanted to show. Let's do that one more time. We continue, the task is started, and we wait until it finishes.
14:20
Yes. And we look at the call graph. We see a lot of calls here, but if we look closer, we'll see that all of them actually take zero milliseconds. Those are internal calls of the debugger.
14:44
And the calls that took most of the time, there are actually two of them, are user code: that's our function and the main work. So basically, what we are seeing here is that cProfile didn't show us any useful information.
15:02
Is our debugger unprofilable? Or should we use yappi or line_profiler then? Actually, if we do, we'll see that they don't show anything either. So why is that? Why doesn't it work?
15:23
Okay, to answer this question, we need to learn a bit about how cProfile, yappi, and line_profiler work. cProfile provides deterministic profiling of Python programs. What does deterministic profiling mean?
15:41
There are actually two major types of profilers: tracing profilers, also called deterministic profilers, and sampling profilers, also called statistical profilers. Tracing profilers trace the events of the running program. An event can be a function call or the execution of a line.
16:03
That is the same as we had with the trace function in our debugger. The disadvantage of such profilers is that, as they trace all the events, they add significant overhead to the execution. As for debugging, Python provides an API for profiling.
16:20
The function responsible for that is called sys.setprofile. It is almost the same as settrace, with the only difference that the function we pass to setprofile isn't called for every line. It's called only for function calls.
16:40
All these profilers use the setprofile or settrace function to set up the profiling. And that's why they profile only the user code. And our debugger, which also uses settrace, turns out to be out of the scope of setprofile. So all these profilers aren't applicable in our case.
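The difference between the two hooks can be seen directly: a setprofile callback never receives 'line' events, which is exactly what makes it blind to per-line work. A small sketch:

```python
import sys

profile_events = []

def profile_callback(frame, event, arg):
    # setprofile callbacks receive 'call'/'return' (plus 'c_call'/'c_return'
    # for built-ins), but never 'line' events, unlike settrace callbacks.
    profile_events.append(event)

def f():
    x = 1
    x += 1
    return x

sys.setprofile(profile_callback)
f()
sys.setprofile(None)

print(sorted(set(profile_events)))
```

Since a tracing profiler installed via setprofile only ever sees function boundaries, all the time spent inside the debugger's trace function is invisible to it.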
17:04
So is our debugger unprofilable? Actually, there is another type of profiler, called sampling or statistical profilers. Such profilers operate by sampling.
17:21
A sampling profiler captures the target program's call stack at regular intervals. Sampling profilers are typically less specific and sometimes not very accurate. But they allow the program to run at its full speed.
17:41
So they have less overhead, which in some cases makes them actually much more accurate than tracing profilers. Finding a statistical profiler for Python is not as easy as finding a tracing profiler, as there is no obvious choice. But if you search long enough,
18:01
you'll find several statistical Python profilers as well: statprof, plop, Intel VTune Amplifier, and vmprof. Let's have a closer look at them to choose the one that we'll use to profile our debugger. statprof is a sampling profiler written in pure Python. It's open source.
18:21
It doesn't work, unfortunately, on Windows, only on Mac and Linux. It works, but it's quite minimal. And the last time it was updated was long ago. plop, or Python Low-Overhead Profiler, is written in pure Python. So it's funny, but it's not as low-overhead
18:41
as it could be. And it doesn't work on Windows either. And its main page on GitHub says that it's a work in progress and pretty rough around the edges. So not our choice. Intel VTune Amplifier is very accurate,
19:03
has low overhead, but it is proprietary and not open source. You need to buy a license to use it, which may not be the worst thing, but it isn't suitable in my case as it doesn't work on Mac OS X. And vmprof is a lightweight statistical profiler that works for Python 2.7, Python 3, and even PyPy.
19:24
This profiler was developed by the PyPy team and presented a year ago at EuroPython 2015. And since then it has been actively developed and has reached a stable state.
19:41
It is written in C, so it has really low overhead. It's open source and free. And it's actually great that it's open source, because it allowed me, for example, to add a line profiling feature to it while preparing for this talk a week ago, which would have been impossible if it weren't open source.
20:02
So it seems that it's the profiler of our choice. Let's try to use vmprof to profile our debugger. And we do that again in PyCharm.
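vmprof can also be driven from a plain script rather than from PyCharm. A hedged sketch, assuming `pip install vmprof` and its documented file-descriptor based enable/disable API (the workload function is illustrative):

```python
import os
import tempfile

def main_task():
    total = 0
    for _ in range(100_000):
        total += 1
    return total

try:
    import vmprof            # third-party package; may not be installed
except ImportError:
    vmprof = None

if vmprof is None:
    print("vmprof not installed; running unprofiled")
    main_task()
else:
    path = os.path.join(tempfile.gettempdir(), "debugger.prof")
    with open(path, "w+b") as fd:
        vmprof.enable(fd.fileno())  # sampling: the program runs at ~full speed
        main_task()
        vmprof.disable()
    print("profile written to", path)
```

The resulting file can then be inspected with vmprof's own viewers; PyCharm's profiler UI renders the same data as a call tree.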
20:20
So we'll use another run configuration for that, with the same source code. And we press the profile button and we continue. We wait until the main task finishes. Yes, and after it finishes,
20:41
we see that we have a call tree here. That is actually a nice feature of a sampling profiler: it provides you with a call tree where you can see how your program was executed, with timings. And we see here that most of the time
21:01
was taken by our trace function. That is the trace function of our debugger. So that is the bottleneck. Our trace function itself is the bottleneck. Not everything else, not the threads, not the IO. It's the trace function.
21:22
So we found the bottleneck. What should we do next? To make our program faster, we need to optimize it. And optimization can occur at a number of levels. Typically, the higher levels have greater impact.
21:40
Optimization can proceed via refinement from higher to lower levels. At the highest level, the design may be optimized to make the best use of the available resources and expected use. The architectural design of a system highly affects its performance. But in our case, we're a bit limited in our design decisions,
22:01
as we need to comply with the settrace API contract. So this optimization level isn't available to us. Given an overall design, a good choice of efficient algorithms and data structures, and efficient implementations of those algorithms and data structures, comes next. Let's see whether we can make
22:21
an algorithmic optimization. To find a way to optimize our debugger algorithmically, let's ask ourselves a question: why does debugging without breakpoints work so much faster than with breakpoints in the executed file?
22:41
If we look into the code, we will find out that in case there are no breakpoints in the current file, the trace function returns None, while if there are any, it returns itself. So in the middle of this function, we get the breakpoints of the file.
23:00
And if there are none, then we just return None. And if we refer to the documentation again, we see in the last sentence that the local trace function should return a reference to itself, or another function, for further tracing of that scope, or None to turn off tracing of that scope.
23:21
So actually, if we don't have breakpoints for the file, we turn off tracing for the scope altogether. That's why it works very fast. And why don't we do the same, but for functions, not for files? So we can add a little change.
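Applying the documentation's rule at function granularity might look like this sketch (the `breakpoints` mapping and the names are illustrative, not the debugger's actual code):

```python
# Maps file names to the set of function names containing breakpoints.
breakpoints = {"demo.py": {"slow_function"}}

def trace_dispatch(frame, event, arg):
    code = frame.f_code
    file_breaks = breakpoints.get(code.co_filename)
    if not file_breaks:
        return None                    # no breakpoints in this file at all
    if code.co_name not in file_breaks:
        return None                    # NEW: none in this function either
    # ... per-line breakpoint checks happen only for this function ...
    return trace_dispatch
```

Returning None from the 'call' event turns off line tracing for that whole scope, so functions without breakpoints run at near-native debug speed.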
23:41
We store the name of the function where the breakpoint is placed. And then, if we don't have breakpoints for a function, there is no need to trace it. We just return None. If we measure the performance of this optimization,
24:01
we see that our function started to run in 110 milliseconds instead of nine seconds, which is a big deal. Beyond general algorithms and their implementation, concrete choices at the source code level
24:20
can make a significant difference. So our next optimization will be at the source level. But to make such an optimization effectively, we need to go down to the level of source lines, and for that, line profiling can be useful.
24:41
But line_profiler won't help us in this case, as it is implemented with a trace function. Instead, we use a special mode of the vmprof profiler, which was introduced there recently. It enables capturing line statistics from stack traces.
25:01
Let's use it and see how it works. We will again run it in PyCharm. We'll use another run configuration for that, with line profiler mode enabled. And we use the same source. And we press the profile button. And we continue.
25:29
After it finishes, we see our trace dispatch function. And now what we can do is go to the source.
25:41
And in the source, we see a heat map that shows us which line took most of the time. And it's very strange, but most of the time was taken by this particular line: 20%, 330 hits out of nearly 1,500.
26:07
Actually, what that line does is check whether we need to trace this particular thread or not. And that's it. And we can see that the two lines at the beginning
26:21
are not related at all to this line. So what we can do is move this line to the beginning of the function. Let's do that. So we'll just put it here. And also, if we're thinking about how to optimize this source, we can remember that getattr
26:41
is not the optimal way to check whether an object has an attribute, because getattr does a lot of different things. So how can we rewrite this? We can write it... oh, no, it's not very convenient to write it.
27:08
Okay, I won't type, because my setup doesn't allow me to. So we rewrite it this way. We just check whether this attribute,
27:21
which is used just as a marker, is in the __dict__ of the object. And after we check the performance of this, we'll see that this source optimization actually gained us one second.
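The difference can be sketched like this (the marker attribute name is made up; the debugger uses its own):

```python
class ThreadInfo:
    pass

info = ThreadInfo()
info.do_not_trace = True      # hypothetical marker attribute

# General-purpose lookup: getattr walks the instance dict, the class MRO,
# descriptors, and __getattr__ hooks.
slow = getattr(info, "do_not_trace", None) is not None

# The optimized check: a plain membership test on the instance dict only.
fast = "do_not_trace" in info.__dict__

print(slow, fast)  # → True True
```

Since the marker is only ever set directly on the instance, the `__dict__` membership test is sufficient and skips all the extra lookup machinery.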
27:41
There are several low-level optimizations which aren't available for Python. Being interpreted, Python doesn't have build, compile, and assembly phases. Runtime optimization is possible in principle, for example a JIT, but that's available now only for PyPy and not for CPython.
28:06
So what to do? Has the optimization reached its limit? Actually, if all high-level optimizations are already done and Python doesn't permit us to go deeper,
28:21
we need to go beyond Python. Maybe we should rewrite everything in C to improve the performance. But in that case, we would lose compatibility with Python implementations other than CPython. For example, Jython, IronPython, and PyPy would become incompatible. And having two implementations of the debugger,
28:41
one in Python and one in C, would make adding new features a lot harder. Now, what if we could just leave our Python code as it is, but still optimize it a bit at a lower level?
29:03
A solution exists, and it's called Cython. Cython is a static compiler for Python which gives the combined power of Python and C. Here is an example of a program written in Cython. Note that it looks exactly like normal Python code,
29:20
except for the declaration of variables in the second and third lines. These declarations carry type information, which allows the Cython compiler to generate more efficient code. So this basically provides us with another level of possible optimization, inaccessible before, namely compile-time optimization.
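A sketch of what such a Cython function might look like (an assumed example in the spirit of the slide, not its exact code):

```cython
def count_even(int n):
    cdef int i          # static C type declarations: the Cython compiler
    cdef int total = 0  # can now generate efficient C for the loop below
    for i in range(n):
        if i % 2 == 0:
            total += 1
    return total
```

Without the cdef lines this is plain Python; with them, the loop compiles down to a C loop over machine integers.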
29:43
Let's add Cython type information to our trace function implementation. After we compile our trace function with Cython as a native extension and measure its performance, we learn that it made our debugger more than twice as fast: four seconds instead of nine.
30:05
So now we can compare all three optimizations combined with the baseline, our initial version of the debugger, and with pdb, our goal. And we see that we have reached the goal and actually done even better.
30:20
Yay, happiness. But to temper our happiness, I will say that after we compiled our debugger with Cython, it became native code, which can't be profiled well with vmprof anymore. So it is unprofilable again, ironically. But there are still ways to profile it,
30:40
which lie outside the scope of this talk today. As for the issue: we managed to double the performance for the sample code from the issue ticket. And we made it better than pdb.
31:00
But still, in this particular case, it worked slower than a normal run. And maybe it is possible to make it work even faster, given the constraints of the settrace API and so on. So we'll leave that issue open for a while.
31:21
Conclusion: use profilers to find bottlenecks in your code. There are different profilers; each has its own advantages. Learn about them. Start optimizing things from the higher levels to the lower ones. And to optimize Python at a lower level, use Cython. So, that's all for today. Thank you for listening.
31:42
There are links to the vmprof profiler and the debugger if you are interested in looking into the code. Actually, this line profiling feature was added to vmprof recently. So, it's not available in PyCharm yet. But it will be available via a plug-in.
32:03
I will publish it this week, I hope. So, thank you very much. Thank you very much, Dimitri, for this great talk.
32:20
So, the floor is open for questions. May I ask you to wait for us to give you the microphone, just because we are recording everything. Thanks. Hi there. My biggest issue is memory profiling. Can you help me with that? Actually, in this particular case,
32:43
memory profiling wasn't an issue. If you're interested in memory profiling, I can recommend looking at vmprof, because it supports memory profiling. The only thing it doesn't support yet is profiling of native memory allocations.
33:01
But that's actually quite a hard problem in Python. So, if you have pure Python code, vmprof can profile your memory. And actually, in Python 3.5, there is an API for memory profiling. I don't remember exactly what it's called. So, you can look at that also.
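The standard-library API the speaker seems to be reaching for is most likely `tracemalloc` (an assumption on my part; it was added in Python 3.4). A minimal sketch:

```python
import tracemalloc

tracemalloc.start()

# Allocate something measurable.
blocks = [bytes(1000) for _ in range(1000)]

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)                        # top allocation sites by source line

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current} B, peak: {peak} B")
tracemalloc.stop()
```

Like vmprof's memory mode, it only sees Python-level allocations, not native ones made directly through the C allocator by extension modules.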
33:27
Questions? Hi, I'm Kowalski. And I wanted to ask maybe a naive question.
33:44
But isn't writing the code in Cython somehow also rendering it incompatible with other Python implementations?
34:01
Yes, that's a great question, by the way. Yes, it does. If you just add a cdef into your Python source, it won't be compatible anymore. But what you can do, and what we did in the PyCharm debugger,
34:21
is that we made these Cython optimizations optional. So, the only change that you need to make in your Python source for it to be Cython-compilable is to add these cdef definitions at the beginning.
34:41
So, we used a little template language. In our source, these cdef definitions are commented out, so the source runs as normal Python. But to build the Cython extension, we uncomment these lines,
35:00
and the source becomes Cython-compilable. I can show you; actually, it's better to see than to say.
35:26
So, here we have a custom template, a small language. And it says: if it is Cython, then we have this header.
35:40
If it's not Cython, then it's normal Python. So, this source works for all Python implementations. And if we need to compile it, we do it with a setup.py, where we uncomment this, in the Cython case.
36:01
Any more questions? Well, if not, please join me in thanking Dimitri again. Thank you very much. Thank you.