Profiling the unprofilable

Video in TIB AV-Portal: Profiling the unprofilable

Formal Metadata

Profiling the unprofilable
Title of Series
Part Number
Number of Parts
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
Dmitry Trofimov - Profiling the unprofilable When a program is not fast enough, we call on the profiler to save us. But what happens when the program is hard to profile, like for instance the Python debugger? In this talk we're going to dive deep into Vmprof, a Python profiler, and see how it helps us find out why a debugger can be slow. Once we find the culprit, we'll use Cython to optimise things. ----- Profiling is the main way to find slow parts of your application, and it's often the first approach to performance optimisation. While there are quite a few profilers, many of them have limitations. In this talk we're going to learn about the new statistical profiler for Python called Vmprof that is actively being developed by the PyPy team. We'll see how it is implemented and how to use it effectively. We will apply it to an open source project, the PyDev.Debugger, a popular debugger used in IDEs such as PyDev and PyCharm, and with the help of Cython, which we'll also dig into, we'll work on optimising the issues we find. Whether it's a Python debugger, a web application or any other kind of Python development you're doing, you'll learn how to effectively profile and resolve many performance issues.
I'd like to introduce Dmitry Trofimov, who is a team lead and developer of PyCharm, and he is going to talk about profiling. I hope the people who are interested in profiling are not afraid of talks marked as advanced.

When I saw this talk in the schedule marked as advanced, I was scared of it myself, and I hope it won't be that hard. First I'll briefly introduce myself. My name is Dmitry Trofimov and I work for JetBrains on the PyCharm IDE. My talk won't be about PyCharm directly, but I will use its debugger as a case study for profiling and optimization. If you want to discuss anything about PyCharm, just come to the JetBrains booth. I've been involved in the development of PyCharm and have done a lot of different things, but the runtime aspects of Python, like debugging, profiling and code execution, interest me the most.

Today I want to show you how a statistical profiler can help to optimize a program, and this program, as has been said already, will be a Python debugger. I will try to stay at a high level, using the debugger as an example, and go into details only if necessary.

So let's begin. "The best theory is inspired by practice. The best practice is inspired by theory," said Donald Knuth. What I'm going to show today is inspired by practice. It was a real problem, and to some extent it still is, and the solution to it that I'll show later was also real: it was actually implemented at some moment, and if you're interested you can look into the code later. But what is also very interesting is that, when preparing for this talk, I tried to rationalize things and retell
the process as it happened in the past from a more theoretical perspective, as if I did it again but in the right way. That gave me some new knowledge and some ideas that I will implement in the future, and I hope that you will find something in this talk too.

This happens quite often in our software development work: we start with an issue in the bug tracker. The issue says "Debugger gets really slow" and it provides a code sample. So we see clearly that the issue is about the Python debugger, the PyCharm debugger, that is, the part of PyCharm written in Python. It is the same debugger that is used in PyDev. It is an open-source project maintained by Fabio Zadrozny, the author of PyDev, and also by the PyCharm team.

To understand better how the debugger works, I recommend listening to the recording of my talk at EuroPython 2014 called "Python Debugger Uncovered". For now I will just recall some basic concepts. The PyCharm debugger consists of two parts. The part on the IDE side, the visual part, is responsible for the interaction with the user. It communicates with the second part, which lives in the Python process. The second part, the Python part, receives breakpoints and commands over a socket connection and sends some data back. The data can be values of variables, stack traces, notifications about breakpoint hits and so on. The Python part handles the communication with the IDE in a separate event loop, in a background thread of the process, and all of that can lead to some performance overhead. The core of the debugger is the trace function, which is the window through which the debugger looks into the user's code to see what is happening there. Python provides
an API for tracing code: a function called sys.settrace. It takes a trace function, and this function is then executed on every event that happens in the user's program, such as the execution of a line, a function call or an exception. Inside the trace function there are a lot of
checks that this function performs. For example, it checks whether there is a breakpoint for the current line, and if there is, it generates a suspend event. So I think you've got an idea of what the debugger looks like: there is some communication with the IDE in the background, and there is a trace function that gets events, including line events.
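The mechanism described above can be illustrated with a minimal sketch. It assumes a toy breakpoint table and a toy `sample` function; the names `BREAKPOINTS`, `hits` and `trace_dispatch` are chosen for illustration and this is not the PyCharm debugger's code. Only `sys.settrace` and `sys.gettrace` are real standard-library calls.

```python
import sys

# Hypothetical breakpoint table: {filename: {line numbers}}.
BREAKPOINTS = {}
hits = []  # (function name, line number) for every breakpoint hit


def trace_dispatch(frame, event, arg):
    """Called by the interpreter for 'call', 'line', 'return' and 'exception' events."""
    if event == "line":
        lines = BREAKPOINTS.get(frame.f_code.co_filename, ())
        if frame.f_lineno in lines:
            # A real debugger would suspend the thread and notify the IDE here.
            hits.append((frame.f_code.co_name, frame.f_lineno))
    # Returning the function itself keeps tracing the current scope.
    return trace_dispatch


def sample():
    total = 0
    for i in range(3):
        total += i  # the breakpoint goes on this line
    return total


# Register a breakpoint on the `total += i` line (third line after `def`).
BREAKPOINTS[sample.__code__.co_filename] = {sample.__code__.co_firstlineno + 3}

previous = sys.gettrace()
sys.settrace(trace_dispatch)
try:
    result = sample()
finally:
    sys.settrace(previous)

print(result, hits)
```

Every line executed inside `sample` goes through `trace_dispatch`, which is why the cost of the checks inside the trace function matters so much.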
So let's go back to the initial issue. When the code is run normally, it runs for 3 seconds. In debug mode with a breakpoint it executes for 18 minutes, which is far too long.
The first part of working on an issue is to check whether it actually exists. So we open PyCharm, and here we have the code. Since the original executes for 18 minutes, we will use a reduced version: it is not exactly the code from the issue, just its essence. It is a simple function with one loop over a range, and the only interesting thing here is that we have an increment. So let's reproduce the issue. If we just run it, it is fast. Then we debug it without breakpoints, and it is still fast. Then we place a breakpoint and debug, and it is slow. Yes, so the issue exists.

Let's analyze it. We have three different cases here: a normal run, debug without breakpoints, and debug with a breakpoint. And as we can place a breakpoint on different lines, there are actually three more cases: debug with the breakpoint in the function, debug with the breakpoint in the same file but not in that particular function, and debug with a breakpoint in some other file. Testing shows that the last case behaves the same as debug without breakpoints: a breakpoint in some other file doesn't affect performance at all, so we won't look at it. So basically we have four different cases, and in two of them, with the breakpoint in the function and with the breakpoint in the file, the debugger works slowly.

A famous management consultant said: "You can't improve what you can't measure." So before we do anything else, be it profiling or optimization, we should be able to measure the performance of the thing we want to make faster.
In our case this is the code of the sample, and the metric is duration. So we use the time module to record how many seconds it took for the operation to complete. That will be our simple measurement.
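Such a measurement might look like the following minimal sketch. The `measure` helper and the `sample` function are illustrative stand-ins, not the code from the issue, and `time.perf_counter` is used here as one reasonable clock for measuring durations.

```python
import time


def measure(func, *args, **kwargs):
    """Return (result, elapsed seconds) for a single call of func."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def sample(n=100_000):
    # Stand-in for the code sample from the issue: a loop with an increment.
    total = 0
    for _ in range(n):
        total += 1
    return total


result, seconds = measure(sample)
print(f"sample() returned {result} in {seconds:.4f} s")
```

Running the same helper under a normal run, under debug without breakpoints and under debug with a breakpoint gives directly comparable numbers for each case.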
After we apply this measurement to all the cases, we see that the two cases with breakpoints actually work a hundred times slower than a normal run. But maybe in this particular case, with this example, it is not possible to do better. So we need to compare this with something, with some program which does the same thing and has more or less the same functionality. So we choose pdb. Although it is less functional than the PyCharm debugger, it is sufficient for a comparison: you can place a breakpoint and pdb will stop. It is also written in Python, so it is in the same class; it wouldn't make sense to compare with something written in C, which is meant for a different application. And it is in the standard library, so it seems natural to take it as a performance standard.

Now we can do benchmarking. Having taken pdb as the standard, we can apply the same measurement to it and then compare the results with our debugger, which now becomes the baseline in terms of benchmarking. What we see is that pdb, being a bit faster, still suffers from the same problem: in the case with a breakpoint its performance also drops dramatically. But still pdb is faster than our debugger, so it is a target we can try to reach.

The first thing we need to do to make the code faster is to find the bottleneck. It doesn't make sense at all to optimize parts of the code that don't influence the overall performance, and the part that influences the overall performance the most is called the bottleneck. So let's find it, and the best way to do that is profiling. Profiling is a way to look at your code from a different perspective, to find out what calls what and how long it takes. A profile is a set of statistics that describes how often and for how long various parts of your program execute. A tool that can generate such statistics for a given program is called a profiler. Let's use a Python profiler, but first we need to
choose one. So let's learn about the Python profilers that are available. If you are looking for a Python profiler, you will find several of them. The most obvious ones are cProfile, yappi and line_profiler. cProfile is part of the Python standard library and is written in C. The Python documentation says about it: cProfile is recommended for most users; it's a C extension with reasonable overhead that makes it suitable for profiling long-running programs. yappi is almost the same as cProfile, but in addition it is able to profile separate threads. line_profiler is different from the two previous profilers: it provides statistics not about functions but about lines inside the functions. It also has rather high overhead, because it traces every line.

As cProfile is the default choice and we don't need the features of yappi and line_profiler, at least not yet, let's use cProfile. We do that in PyCharm. For that, the sample code has to be changed a bit, because we need to use debugging and profiling at the same time, so we start the debugger from the source code. We place the breakpoint here, and what we do now is profile and continue. So the test has started running... and after it finished we see... no, sorry, that is not what I wanted to show. Let's do that one more time. Continue, it has started, and it should finish. Yes. And we look at the call graph. We see a lot of calls here, but if we look closer we will see that almost all of them take 0 milliseconds. That is the code of the debugger itself. The calls that take most of the time are actually just two: our function and the main module. So basically, what we're seeing here
is that cProfile shows no useful information. Is our debugger unprofilable? Or should we use yappi or line_profiler? Actually, if we do, we will see that they don't show anything either.
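The effect can be reproduced outside the debugger. The following is a minimal sketch under stated assumptions: `expensive_check`, `trace_dispatch` and `workload` are toy stand-ins rather than debugger code, and the behaviour shown relies on CPython suppressing profile events while a trace function is running.

```python
import cProfile
import io
import pstats
import sys


def expensive_check(frame):
    # Stand-in for the per-line checks a debugger performs.
    return sum(range(200))


def trace_dispatch(frame, event, arg):
    expensive_check(frame)
    return trace_dispatch


def workload():
    total = 0
    for i in range(1000):
        total += i
    return total


profiler = cProfile.Profile()
previous = sys.gettrace()
sys.settrace(trace_dispatch)
try:
    result = profiler.runcall(workload)
finally:
    sys.settrace(previous)

buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats()
report = buffer.getvalue()
print(report)
```

The report lists `workload`, whose time is inflated by the tracing, but neither `trace_dispatch` nor `expensive_check` appears in it, even though they do most of the work.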
So why is that? To answer
this question, we need to learn a bit about how cProfile, yappi and line_profiler work. cProfile provides deterministic profiling of Python programs. What does that mean? There are actually two major types of profilers: tracing profilers, also called deterministic profilers, and sampling profilers, also called statistical profilers. Tracing profilers trace the events of the running program. An event can be a function call or the execution of a line. This is the same as what we had with the trace function in the debugger. The disadvantage of such profilers is that, as they trace all the events, they add significant overhead to the execution.

As for debugging, Python provides an API for profiling. The function responsible for that is called sys.setprofile. sys.setprofile is almost the same as sys.settrace. The only difference is that the function we pass there, the profile function, isn't called for every line; it is called only for function calls. All these profilers use the sys.setprofile function to set up the profiling, and that is why the debugger's trace function, which is driven by sys.settrace, turns out to be out of the scope of the profiler. So all these profilers are unusable in our case. So is our debugger unprofilable?

Actually, there is another type of profilers, called sampling or statistical profilers. A sampling profiler probes the target program's call stack at regular intervals. Sampling profiles are typically less specific and less numerically accurate, but they allow the target program to run at nearly full speed. So they have less overhead, which in some cases makes them much more accurate than tracing profilers.

Finding a statistical profiler for Python is not as easy as finding a tracing one, as there is no obvious choice. But if you search a bit, you will find several statistical profilers: statprof, plop, Intel VTune Amplifier and vmprof. Let's have a closer look at them to choose the one that will be used to
profile our debugger. statprof is a simple profiler written in pure Python. It is open source. Unfortunately it doesn't work on Windows; elsewhere it works, but it is quite minimal.

plop, the Python Low-Overhead Profiler, is written in pure Python, so it's funny, but its overhead is not as low as it could be. It doesn't work on Windows either, and its description says that it is a work in progress and pretty rough around the edges. So it is not our choice.

Intel VTune Amplifier is very accurate and has low overhead, but it is proprietary and not open source. You need to buy a license to use it, which is maybe not the worst thing, but it isn't suitable in my case as it doesn't work on OS X.

And vmprof. vmprof is a lightweight statistical profiler that works for Python 2.7, Python 3 and even PyPy. This profiler was developed by the PyPy team and presented a year ago at EuroPython 2015. Since then it has been actively developed and has actually reached a stable state. It has really low overhead, and it is open source and free. And it is really great that it is open source, because that allowed me, for example, to add line profiling to it during the preparation of this talk, which would have been impossible if it weren't open. So it seems that vmprof is the profiler of choice. Let's try to use vmprof to profile our debugger, and we
do that again in PyCharm. We use another run configuration with the same source, we press the profile button and we continue. We wait until the test finishes. Yes. And after it has finished, we see that we have a call tree here. That is what I like about sampling profilers: they provide you with a call tree, so you can see how your program was actually executed, in dynamics. And we see here that most of the time was taken by the trace_dispatch function, which is the trace function of the debugger. So we have found the bottleneck.
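The sampling idea that statistical profilers rely on can be sketched in pure Python. This is a toy illustration under stated assumptions, not how vmprof is implemented: the `Sampler` class and the `busy` function are hypothetical names, and sampling is done from a helper thread via `sys._current_frames()`, a CPython-specific call.

```python
import collections
import sys
import threading
import time


class Sampler:
    """Toy statistical profiler: samples one thread's stack at intervals."""

    def __init__(self, thread_id, interval=0.001):
        self.thread_id = thread_id
        self.interval = interval
        self.counts = collections.Counter()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            frame = sys._current_frames().get(self.thread_id)
            if frame is not None:
                # Attribute the sample to the innermost Python function.
                self.counts[frame.f_code.co_name] += 1
            time.sleep(self.interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()


def busy(seconds):
    # CPU-bound loop that runs for roughly `seconds`.
    deadline = time.perf_counter() + seconds
    n = 0
    while time.perf_counter() < deadline:
        n += 1
    return n


with Sampler(threading.get_ident()) as sampler:
    busy(0.3)

for name, count in sampler.counts.most_common(5):
    print(f"{count:6d}  {name}")
```

Because the sampler only looks at the stack from the outside, it needs neither `sys.settrace` nor `sys.setprofile`, so code running inside a trace function is as visible to it as any other code.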
What should we do next? To make the program faster, we need to optimize it. Optimization can occur at a number of levels. Typically the higher levels have greater impact, so optimization can proceed by refinement from higher to lower.

At the highest level, the design may be optimized to make the best use of the available resources and the expected use. The architectural design of a system highly affects its performance. But in our case we are limited in our design decisions, as we need to comply with the sys.settrace API contract. So this optimization level isn't available to us.

Given an overall design, a good choice of efficient algorithms and data structures, and an efficient implementation of these algorithms and data structures, comes next. Let's see whether we can make an algorithmic optimization, that is, find a way to optimize our debugger algorithmically. Let's ask ourselves a question: why does debug without breakpoints work so much faster than debug with breakpoints in the executed file? If we go into
the code, we will find out that when there are no breakpoints in the current file, the trace function returns None, while if there are any, it returns itself. So in the middle of this function we get the breakpoints for the file, and if there are none, we just return None. And if we refer
to the documentation again, we see in the last sentence that the local trace function should return a reference to itself, or to another function for further tracing in that scope, or None to turn off tracing in that scope. So if we don't have breakpoints for the file, we turn off tracing for the scope altogether, and that is why it works fast. So why don't we do the same for the functions of the file? We can
add a little change: we store the name of the function where the breakpoint is placed, and then, if we don't have breakpoints for a function, there is no need to trace it and we just return None.
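The change described above can be sketched as follows. This is a minimal illustration under stated assumptions: the registry `FUNCTIONS_WITH_BREAKPOINTS`, the event counters and the two toy functions are hypothetical, and a real debugger would key breakpoints by file and code object rather than by bare function name.

```python
import sys

# Hypothetical breakpoint registry: function names that contain a breakpoint.
FUNCTIONS_WITH_BREAKPOINTS = {"with_breakpoint"}
events = {"naive": 0, "optimized": 0}


def naive_trace(frame, event, arg):
    events["naive"] += 1
    return naive_trace  # keep tracing every scope


def optimized_trace(frame, event, arg):
    events["optimized"] += 1
    if event == "call" and frame.f_code.co_name not in FUNCTIONS_WITH_BREAKPOINTS:
        return None  # no breakpoints here: turn off tracing for this scope
    return optimized_trace


def without_breakpoint():
    total = 0
    for i in range(1000):
        total += i
    return total


def with_breakpoint():
    return without_breakpoint() + 1


def run_traced(trace_func):
    previous = sys.gettrace()
    sys.settrace(trace_func)
    try:
        return with_breakpoint()
    finally:
        sys.settrace(previous)


results = [run_traced(naive_trace), run_traced(optimized_trace)]
print(events)
```

The optimized trace function still sees the call into `without_breakpoint`, but by returning None it receives no line events for the loop inside it, which is where the naive version spends its time.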
If we measure the performance after this optimization, we see that our function now runs in 110 ms instead of 9 seconds, which is ideal.

Beyond general algorithms and their implementation, concrete source-code-level choices can make a significant difference. So our next optimization will be at the source level. But to make such an optimization effectively, we need to go through the source at the line level. For that, a line profiler could be useful, but line_profiler wouldn't help us in this case, as it is implemented with a trace function. Instead we use a special mode of vmprof, which was introduced recently and which enables capturing line statistics from stack traces. Let's use it and see how it works. We will again run it in PyCharm. We use another run configuration for that, with line profiling enabled, and we use the same source. We press the profile button and continue. After it has finished, we see the trace_dispatch function, and now what we can do is go to the source, and in the source view it shows us the lines that take most of the time.
And, strangely, most of the time is taken by these particular lines: about 20 percent, 330 hits out of nearly one and a half thousand. What that tells us is that the check whether we need to trace this particular function or not comes too late: those two lines at the beginning are not related to that check at all. So what we can do is move the check to the beginning of the function. Let's do that. We'll just put it here.

Also, thinking about how to optimize the source, we can remember that getattr is not the optimal way to check whether an object has an attribute, because getattr does a lot of different things. So we can write it another way: we just check directly whether the attribute is in the object's dictionary.
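The two ways of checking for an attribute can be compared with a small sketch. The `Frame` class and the attribute name `is_marked` are purely illustrative and do not come from the debugger's sources; whether the dictionary check is actually faster depends on the interpreter version, so it should be measured rather than assumed.

```python
import timeit


class Frame:
    """Stand-in object; the attribute name below is purely illustrative."""

    def __init__(self, marked):
        if marked:
            self.is_marked = True


def check_getattr(obj):
    # Goes through the full attribute lookup protocol.
    return getattr(obj, "is_marked", None) is not None


def check_dict(obj):
    # Looks only at the instance dictionary.
    return "is_marked" in obj.__dict__


marked, plain = Frame(True), Frame(False)
same_answers = all(check_getattr(o) == check_dict(o) for o in (marked, plain))

for check in (check_getattr, check_dict):
    seconds = timeit.timeit(lambda: check(plain), number=200_000)
    print(f"{check.__name__}: {seconds:.3f} s")
```

The dictionary check is only equivalent for plain instance attributes: it ignores class attributes, properties and `__slots__`, so it is a safe replacement only where the attribute is known to be set on the instance.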
After we check the performance, we will see that the source optimization actually gained us one second.

There are several low-level optimizations which aren't available for Python. Being an interpreter, Python doesn't have a build or compile phase. Some runtime optimization is possible in Python, for example a JIT, just-in-time compilation, but it is available now only in alternative implementations such as PyPy. So what do we do? Has our optimization reached its limits?

Actually, if all the higher-level optimization is already done and Python doesn't permit us to go deeper, we need to go beyond Python. We could rewrite everything in C to improve the performance, but in that case we would lose compatibility with Python implementations other than CPython: for example Jython, IronPython and PyPy would become incompatible. And having two implementations of the debugger, one in Python and one in C, would make adding new features a lot more challenging. What if we could just leave our Python code as it is and still optimize it a bit? Well, a solution exists, and it is called Cython. Cython is a static compiler for Python which gives you the combined power of Python and C. That is an
example of a program written in Cython. It looks exactly like normal Python code, except for the declarations of variables in the second and third lines. These declarations add type information which allows Cython to generate more efficient code. So this basically provides us with another level of possible optimization that was inaccessible before, namely compiler optimization. Let's add
Cython type information to our trace function implementation. So after
we compile the trace function with Cython as a native extension and measure the performance, we will see that it made our debugger more than twice as fast: four seconds instead of nine. So now we can compare the three optimizations combined with the baseline, the initial version of our debugger, and with pdb, and we see that we have reached the goal and actually done even better than pdb.

But to be honest, I will say that after we compiled our debugger with Cython it became native code, which can't be profiled with vmprof anymore. So it is unprofilable again, ironically. There are still ways to profile it, but we will leave them out of the scope of this talk.

As for the issue: we managed to double the performance for the sample code from the ticket and made it better than pdb, but still, in this particular case, it works slower than a normal run. Maybe it is possible to make it even faster given the constraints of the sys.settrace API, and maybe there are other ways to optimize it, so we will leave that issue open for a while.

Conclusion. Use profilers to find bottlenecks in your code. There are different profilers and each has its own advantages, so learn about them. Start optimizing from the higher levels and move to the lower ones. And to optimize Python at the lower level, use Cython. That's all for today, thank you for listening. These are the links for vmprof, Cython and the debugger, if you're interested in looking at the code. Actually, the line profiling feature was added to vmprof by me only recently, so it is not available in PyCharm yet, but it will be, and I will publish it this week, I hope. Thank you very much.

Thank you very much, Dmitry, for this great talk. The floor is open for questions. Please raise your hand first and they will give you the microphone, because everything is being recorded.

Hi, what about memory profiling? What can help me do that?

Actually, in this particular case memory profiling wasn't an issue. There are
different memory profilers. I can recommend vmprof, because it supports memory profiling. The only thing it doesn't support is profiling of native memory allocation, but that is actually quite a hard problem in Python. So if you allocate memory in your Python code, vmprof can profile your memory. And actually in Python 3.5 there is an API for memory profiling. I can't remember what this thing is called, but you can look at it as well.

Other questions?

Hi, thanks for the talk. Maybe a newbie question about Cython: when you write the cdef declarations in the source itself, doesn't the code become incompatible with the other Python implementations?

Yes, that's a great question, by the way. Yes, it does. If you just add a cdef into your Python source, it won't be compatible anymore. But what we did in the PyCharm debugger is that we made this Cython optimization optional. The only change you need to make in the Python source for it to be compilable by Cython is to add these cdef definitions at the beginning. So we use a little template language: in our Python sources the cdef definitions are commented out, so the source runs as normal Python source. But to build the Cython extension we uncomment these lines and the source becomes Cython-compatible. I can show you, actually, it's better to see it. So here we have something like this. It's a custom template, a small language, and it says: if it's Cython, then we have the cdef; if it's not Cython, then it's normal Python. So the source works for all Python implementations, and if we need to compile it, we do that with a build script in which we uncomment these branches for the Cython case.

Any more questions? No? Then thank you again, Dmitry, for this talk.
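The template approach from the last answer can be sketched as a small preprocessor. This is a hedged illustration only: the marker comments `# IFDEF CYTHON`, `# ELSE` and `# ENDIF`, the function `enable_cython_sections` and the `add` example are assumptions made for this sketch, and the debugger's actual build script may work differently.

```python
def enable_cython_sections(source):
    """Uncomment IFDEF CYTHON branches and drop the plain-Python ELSE branches."""
    output = []
    state = None  # None: ordinary code, "cython": IFDEF branch, "python": ELSE branch
    for line in source.splitlines():
        marker = line.strip()
        if marker == "# IFDEF CYTHON":
            state = "cython"
        elif marker == "# ELSE" and state == "cython":
            state = "python"
        elif marker == "# ENDIF" and state is not None:
            state = None
        elif state == "cython":
            # Remove the leading "# " while keeping the indentation.
            indent = line[: len(line) - len(line.lstrip())]
            output.append(indent + line.lstrip()[2:])
        elif state == "python":
            continue  # the plain-Python fallback is not needed in the .pyx file
        else:
            output.append(line)
    return "\n".join(output)


# The template is valid Python as written: the Cython branch is only a comment.
TEMPLATE = """\
# IFDEF CYTHON
# cdef int add(int a, int b):
# ELSE
def add(a, b):
# ENDIF
    return a + b
"""

pyx_source = enable_cython_sections(TEMPLATE)
print(pyx_source)
```

The same file therefore serves two purposes: interpreters other than CPython import it unchanged, while the build step feeds the transformed text to Cython.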