Profiling the unprofilable
Formal Metadata

Title: Profiling the unprofilable
Title of Series: EuroPython 2016
Part Number: 110
Number of Parts: 169
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/21125 (DOI)
Transcript: English (auto-generated)
00:00
I'd like to introduce Dimitri Trofimov, who's the team lead and a developer on the PyCharm team, and is going to talk about profiling. Thank you. Hi. You are brave people who are interested in profiling
00:21
and aren't afraid of talks marked as advanced. Actually, when I saw this talk in the schedule marked as advanced, I was a bit scared myself. It won't be that hard, I hope. So, first, I'll briefly introduce myself. My name is Dimitri Trofimov.
00:40
I work for JetBrains. I'm the team lead and a developer for the PyCharm IDE. My talk won't be about PyCharm directly, but I will use its debugger as a case study for profiling and optimization. If you want to discuss anything about PyCharm, just come to the JetBrains booth in the expo hall to talk with the team.
01:03
Being involved in the development of PyCharm, I have done a lot of different things. But the runtime aspects of Python, like debugging, profiling, and execution, interest me most. Today, I want to show you how using a statistical profiler can help to optimize a program.
01:22
And this program, as I've said already, will be a Python debugger. I will try to stay at a high level, using the debugger as an example, and touch its details only if necessary. So, let's begin. The best theory is inspired by practice.
01:43
The best practice is inspired by theory, said Donald Knuth. I like this saying. What I'm going to show today is inspired by practice. It was a real problem, and to some extent, still is. And the approach, the solution to it, that I will show later,
02:02
was also real. It was actually done at some point, and if you're interested, you can later look into the code. But what is also very interesting is that when preparing for this talk, I tried to rationalize things, and to look at the process which happened in the past
02:23
from a slightly more theoretical perspective, as if I were doing it again, but more in the right way. That actually opened up some knowledge for me, and gave me some ideas that I will implement in the future, and I hope that you'll find something interesting in this talk too.
02:42
So, as happens quite often in our software development work, we start with an issue ticket in the bug tracker. The issue says the debugger gets really slow,
03:00
and it provides a code sample, so we see clearly that this issue is about the Python debugger in PyCharm. The PyCharm debugger is the part of PyCharm that's written in Python. It's the same debugger that's used in the PyDev IDE.
03:24
That's an open source project that is maintained by Fabio Zadrozny, the author of PyDev, and also by the PyCharm team. To understand better how the debugger works, I recommend listening to the recording of my talk at EuroPython 2014.
03:41
It's called Python Debugger Uncovered. But now I will remind you of some basic concepts. The PyCharm debugger consists of two parts. The part on the IDE side, or the visual part, is responsible for the interaction with the user. It communicates with the second part
04:01
that lives in the Python process. This second part, the Python part, receives breakpoints and commands via a socket connection and sends back data if needed; the data can be the values of variables, stack traces, and notifications about breakpoint hits.
04:21
So that's a Python application with some threads, IO, and a separate event loop, and it's always running in the background of the process. All of that can lead to some performance overhead. And the core of the Python debugger
04:42
is the trace function. That is the window through which the debugger looks at the user code and sees what's happening there. Python provides an API for tracing code.
05:01
It is a function called sys.settrace. It takes a trace function as an argument. The trace function is then executed on every event that happens in the user program: an event like a line execution, a function call, an exception, or a return from a function.
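A minimal sketch of this API is below; the breakpoint registry here is a made-up stand-in for the debugger's far more elaborate bookkeeping:

```python
import sys

# Hypothetical breakpoint registry: a set of (filename, line) pairs.
breakpoints = set()
hits = []

def trace_dispatch(frame, event, arg):
    # Called for 'call', 'line', 'exception' and 'return' events.
    if event == "line":
        key = (frame.f_code.co_filename, frame.f_lineno)
        if key in breakpoints:
            hits.append(key)  # a real debugger would suspend the thread here
    # Returning the trace function itself keeps tracing this scope.
    return trace_dispatch

def demo():
    total = 0
    for i in range(3):
        total += i          # we register a "breakpoint" on this line
    return total

breakpoints.add((demo.__code__.co_filename,
                 demo.__code__.co_firstlineno + 3))
sys.settrace(trace_dispatch)
result = demo()
sys.settrace(None)
print(result, len(hits))  # → 3 3
```

The line inside the loop executes three times, so the "breakpoint" is hit three times, which already hints at why tracing every line is expensive.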
05:22
There are a lot of checks that the trace function performs. For example, it checks whether there is a breakpoint for a given line, and if there is, it generates a suspend event. So I think you've got an idea of what the debugger looks like. There are some threads doing communication with the IDE in the background, and there is a trace function
05:41
that gets events about executed lines. So let's go back to the issue ticket. When the code is executed normally, it runs for three seconds. In debug mode without a breakpoint,
06:00
it executes for 12 seconds. But in debug mode with a breakpoint, it executes for 18 minutes. That's very long.
06:28
Let's reproduce this issue and see whether it actually exists. So we open PyCharm, and we have this code. And, so as not to wait 18 minutes, we will reduce the code snippet.
06:44
About this code snippet: it's just a simple function with one iteration through a range. The only interesting thing is that the range is quite big,
07:01
and we have an increment here. So let's reproduce this issue. We just run it. It was fast. Then we debug it. It was a bit slower, but also fast. And then we place a breakpoint, and we...
07:21
Then it works. Yes. So the issue exists. Let's analyze it. We have three different cases here:
07:40
normal run, debug without breakpoints, and debug with a breakpoint. And actually, as we can place a breakpoint on different lines, there are three more cases: debug with a breakpoint in the function, debug with a breakpoint in the same file
08:00
but not in that particular function, and debug with a breakpoint in some other file. But testing shows that the last case actually behaves the same as debug without any breakpoints at all. A breakpoint in some other file doesn't affect performance, so we won't look at that case.
08:23
So basically, we have four different cases. And in the two cases with a breakpoint in the function or a breakpoint in the same file, debugging works slowly.
08:41
William Edwards Deming, the famous engineer, statistician, and management consultant, said: you can't improve what you can't measure. So before we do anything else, profiling or optimization, we should be able to measure the performance of the thing we want to make faster. In our case, the core of the sample code is the iteration.
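A minimal version of such a measurement, using the time module as the talk does (the loop body is a hypothetical stand-in for the issue ticket's sample), might look like:

```python
import time

def iterate(n):
    # Stand-in for the issue ticket's sample: a big range with an increment.
    total = 0
    for _ in range(n):
        total += 1
    return total

start = time.time()          # wall-clock seconds from the time module
iterate(1_000_000)
elapsed = time.time() - start
print(f"iteration took {elapsed:.3f} s")
```

Running the same measurement under the plain interpreter and under the debugger gives the numbers to compare.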
09:06
So we use the time module to record how many seconds it took for the iteration to complete. That will be our simple measurement. And after we apply this measurement to our cases, we see that the two cases
09:24
with debug with breakpoints actually run 100 times slower than a normal run. Which is a bit sad, but who knows? Maybe in this particular case, with this example,
09:41
it's not possible to do any better. So we need to compare this with something, with some program which does the same thing and has more or less the same functionality. We choose pdb for that. Although it is less functional than the PyCharm debugger, it is sufficient for our comparison.
10:02
You can place a breakpoint and pdb will stop at it. It is also written in Python, so it is in the same class. It wouldn't make any sense to compare with something written in C, because that's a different kind of application. And pdb is in the standard library,
10:20
so it sounds natural to take it as a performance standard. And now we can do some benchmarking. After we take pdb as a standard, we can apply the same measurement to it. Then we can compare the results with our debugger, which now becomes the baseline in benchmarking terms.
10:42
And what we see is that pdb, while being a bit faster, still suffers from the same problem: in the cases where a breakpoint is set, its performance drops dramatically.
11:00
But still it is a bit faster. It takes five seconds instead of nine. So we can try to reach its performance. And the first thing we need to do to make the code faster is to find the bottleneck. It doesn't make sense at all to optimize parts of the code that don't influence the overall performance.
11:21
And the part that influences the overall performance the most is called the bottleneck. So let's find it. And the best way to do that is profiling. Profiling is a way to look at your code from a different perspective, to find out what runs and how long it takes.
11:41
A profile is a set of statistics that describes how often and for how long various parts of your program executed. A tool that can generate such statistics for a given program is called a profiler. Let's use a Python profiler. But first we need to choose one.
12:02
So let's learn about the Python profilers available. If you're looking for a Python profiler, you'll find several of them. The most obvious choices are cProfile, yappi, and line_profiler. cProfile is part of the Python standard library. It is written in C. The Python documentation says about it:
12:22
cProfile is recommended for most users; it's a C extension with reasonable overhead that makes it suitable for profiling long-running programs. The yappi profiler is almost the same as cProfile, but in addition, it is able to profile separate threads.
12:43
line_profiler is very different from the two previous profilers. It provides statistics not about the functions that are executed but about the lines inside the functions. Although written in C, it has rather high overhead because it traces every line. So cProfile is the default choice,
13:02
and we don't need the features of yappi and line_profiler, at least yet. Let's use cProfile. And we do that in PyCharm. For that, our sample code will be changed a bit,
13:20
because we need to use debugging and profiling at the same time. So we will set up the debugger from the source code and place a breakpoint here. And what we do now is profile it, and we continue.
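Outside the IDE, an equivalent cProfile run can be sketched from a plain script (the workload function is a hypothetical stand-in for the sample code):

```python
import cProfile
import io
import pstats

def main_work():
    # Hypothetical stand-in for the sample code from the issue ticket.
    total = 0
    for _ in range(100_000):
        total += 1
    return total

profiler = cProfile.Profile()
profiler.enable()
main_work()
profiler.disable()

# Print the five entries with the largest cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The pstats report is the textual equivalent of the call graph PyCharm shows.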
13:45
So the task is started. We wait until it finishes. And after it finishes, we see... no, sorry.
14:04
That is not what I wanted to show. Let's do that one more time. We continue, the task is started, and we wait until it finishes.
14:20
Yes. And we look at the call graph. We see a lot of calls here, but if we look closer, we'll see that all of them actually take zero milliseconds. Those are internal calls of the debugger.
14:44
And the calls that took most of the time, there are actually two of them, are user code: that's our function and the main work. So basically, what we are seeing here is that cProfile didn't show us any useful information.
15:02
Is our debugger unprofilable? Or should we use yappi or line_profiler then? Actually, if we do, we'll see that they don't show anything either. So why is that? Why doesn't it work?
15:23
Okay, to answer this question, we need to learn a bit about how cProfile, yappi, and line_profiler work. cProfile provides deterministic profiling of Python programs. What does deterministic profiling mean?
15:41
There are actually two major types of profilers: tracing profilers, also called deterministic profilers, and sampling profilers, also called statistical profilers. Tracing profilers trace the events of the running program. An event can be a function call or the execution of a line.
16:03
That is the same as we had with the trace function in our debugger. The disadvantage of such profilers is that, as they trace all the events, they add significant overhead to the execution. As for debugging, Python provides an API for profiling.
16:20
The function responsible for that is called sys.setprofile. It is almost the same as settrace, with the only difference that the function we pass to setprofile isn't called for every line. It's called only for function calls.
16:40
All these profilers use the setprofile or settrace function to set up the profiling. And that's why they profile only the user code. And our debugger, which also uses settrace, turns out to be out of the scope of setprofile. So all these profilers aren't applicable in our case.
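The difference between the two hooks can be seen directly: a setprofile callback never receives 'line' events, which is exactly what makes it blind to per-line work. A small sketch:

```python
import sys

profile_events = []

def profile_callback(frame, event, arg):
    # setprofile callbacks receive 'call'/'return' (plus 'c_call'/'c_return'
    # for built-ins), but never 'line' events, unlike settrace callbacks.
    profile_events.append(event)

def f():
    x = 1
    x += 1
    return x

sys.setprofile(profile_callback)
f()
sys.setprofile(None)

print(sorted(set(profile_events)))
```

Since a tracing profiler installed via setprofile only ever sees function boundaries, all the time spent inside the debugger's trace function is invisible to it.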
17:04
So is our debugger unprofilable? Actually, there is another type of profiler, called sampling or statistical profilers. Such profilers operate by sampling.
17:21
A sampling profiler captures the target program's call stack at regular intervals. Sampling profilers are typically less specific and sometimes not very accurate. But they allow the program to run at its full speed.
17:41
So they have less overhead, which in some cases makes them actually much more accurate than tracing profilers. Finding a statistical profiler for Python is not as easy as finding a tracing profiler, as there is no obvious choice. But if you search long enough,
18:01
you'll find several statistical Python profilers as well: statprof, plop, Intel VTune Amplifier, and vmprof. Let's have a closer look at them to choose the one that we'll use to profile our debugger. statprof is a sampling profiler written in pure Python. It's open source.
18:21
It doesn't work, unfortunately, on Windows, only on Mac and Linux. It works, but it's quite minimal. And the last time it was updated was long ago. plop, or Python Low-Overhead Profiler, is written in pure Python. So it's funny, but it's not as low-overhead
18:41
as it could be. And it doesn't work on Windows either. And its main page on GitHub says that it's a work in progress and pretty rough around the edges. So not our choice. Intel VTune Amplifier is very accurate,
19:03
has low overhead, but it is proprietary and not open source. You need to buy a license to use it, which may not be the worst thing, but it isn't suitable in my case as it doesn't work on Mac OS X. And vmprof is a lightweight statistical profiler that works for Python 2.7, Python 3, and even PyPy.
19:24
This profiler was developed by the PyPy team and presented a year ago at EuroPython 2015. And since then it has been actively developed and has reached a stable state.
19:41
It is written in C, so it has really low overhead. It's open source and free. And it's actually great that it's open source, because it allowed me, for example, to add a line profiling feature to it while preparing for this talk a week ago, which would have been impossible if it weren't open source.
20:02
So it seems that it's the profiler of our choice. Let's try to use vmprof to profile our debugger. And we do that again in PyCharm.
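vmprof can also be driven from a plain script rather than from PyCharm. A hedged sketch, assuming `pip install vmprof` and its documented file-descriptor based enable/disable API (the workload function is illustrative):

```python
import os
import tempfile

def main_task():
    total = 0
    for _ in range(100_000):
        total += 1
    return total

try:
    import vmprof            # third-party package; may not be installed
except ImportError:
    vmprof = None

if vmprof is None:
    print("vmprof not installed; running unprofiled")
    main_task()
else:
    path = os.path.join(tempfile.gettempdir(), "debugger.prof")
    with open(path, "w+b") as fd:
        vmprof.enable(fd.fileno())  # sampling: the program runs at ~full speed
        main_task()
        vmprof.disable()
    print("profile written to", path)
```

The resulting file can then be inspected with vmprof's own viewers; PyCharm's profiler UI renders the same data as a call tree.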
20:20
So we'll use another run configuration for that, with the same source code. And we press the profile button and we continue. We wait until the main task finishes. Yes, and after it finishes,
20:41
we see that we have a call tree here. That is actually a nice feature of a sampling profiler: it provides you with a call tree where you can see how your program was executed, with timings. And we see here that most of the time
21:01
was taken by our trace function. That is the trace function of our debugger. So that is the bottleneck. Our trace function itself is the bottleneck. Not everything else, not the threads, not the IO. It's the trace function.
21:22
So we found the bottleneck. What should we do next? To make our program faster, we need to optimize it. And optimization can occur at a number of levels. Typically, the higher levels have greater impact.
21:40
Optimization can proceed via refinement from higher to lower levels. At the highest level, the design may be optimized to make the best use of the available resources and expected use. The architectural design of a system highly affects its performance. But in our case, we're a bit limited in our design decisions,
22:01
as we need to comply with the settrace API contract. So this optimization level isn't available to us. Given an overall design, a good choice of efficient algorithms and data structures, and efficient implementations of those algorithms and data structures, comes next. Let's see whether we can make
22:21
an algorithmic optimization. To find a way to optimize our debugger algorithmically, let's ask ourselves a question: why does debugging without breakpoints work so much faster than with breakpoints in the executed file?
22:41
If we look into the code, we will find out that in case there are no breakpoints in the current file, the trace function returns None, while if there are any, it returns itself. So in the middle of this function, we get the breakpoints of the file.
23:00
And if there are none, then we just return None. And if we refer to the documentation again, we see in the last sentence that the local trace function should return a reference to itself, or another function, for further tracing of that scope, or None to turn off tracing of that scope.
23:21
So actually, if we don't have breakpoints for the file, we turn off tracing for the scope altogether. That's why it works very fast. And why don't we do the same, but for functions, not for files? So we can add a little change.
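Applying the documentation's rule at function granularity might look like this sketch (the `breakpoints` mapping and the names are illustrative, not the debugger's actual code):

```python
# Maps file names to the set of function names containing breakpoints.
breakpoints = {"demo.py": {"slow_function"}}

def trace_dispatch(frame, event, arg):
    code = frame.f_code
    file_breaks = breakpoints.get(code.co_filename)
    if not file_breaks:
        return None                    # no breakpoints in this file at all
    if code.co_name not in file_breaks:
        return None                    # NEW: none in this function either
    # ... per-line breakpoint checks happen only for this function ...
    return trace_dispatch
```

Returning None from the 'call' event turns off line tracing for that whole scope, so functions without breakpoints run at near-native debug speed.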
23:41
We store the name of the function where the breakpoint is placed. And then, if we don't have breakpoints for a function, there is no need to trace it. We just return None. If we measure the performance of this optimization,
24:01
we see that our function started to run in 110 milliseconds instead of nine seconds, which is a big deal. Beyond general algorithms and their implementation, concrete choices at the source code level
24:20
can make a significant difference. So our next optimization will be at the source level. But to make such an optimization effectively, we need to go down to the level of source lines, and for that, line profiling can be useful.
24:41
But line_profiler won't help us in this case, as it is implemented with a trace function. Instead, we use a special mode of the vmprof profiler, which was introduced there recently. It enables capturing line statistics from stack traces.
25:01
Let's use it and see how it works. We will again run it in PyCharm. We'll use another run configuration for that, with line profiler mode enabled. And we use the same source. And we press the profile button. And we continue.
25:29
After it finishes, we see our trace dispatch function. And now what we can do is go to the source.
25:41
And in the source, we see a heat map that shows us which line took most of the time. And it's very strange, but most of the time was taken by this particular line: 20%, 330 hits out of nearly 1,500.
26:07
Actually, what that line does is check whether we need to trace this particular thread or not. And that's it. And we can see that the two lines at the beginning
26:21
are not related at all to this line. So what we can do is move this line to the beginning of the function. Let's do that. So we'll just put it here. And also, if we're thinking about how to optimize this source, we can remember that getattr
26:41
is not the optimal way to check whether an object has an attribute, because getattr does a lot of different things. So how can we rewrite this? We can write it... oh, no, it's not very convenient to write it.
27:08
Okay, I won't type, because my setup doesn't allow me to. So we rewrite it this way. We just check whether this attribute,
27:21
which is used just as a marker, is in the __dict__ of the object. And after we check the performance of this, we'll see that this source optimization actually gained us one second.
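The difference can be sketched like this (the marker attribute name is made up; the debugger uses its own):

```python
class ThreadInfo:
    pass

info = ThreadInfo()
info.do_not_trace = True      # hypothetical marker attribute

# General-purpose lookup: getattr walks the instance dict, the class MRO,
# descriptors, and __getattr__ hooks.
slow = getattr(info, "do_not_trace", None) is not None

# The optimized check: a plain membership test on the instance dict only.
fast = "do_not_trace" in info.__dict__

print(slow, fast)  # → True True
```

Since the marker is only ever set directly on the instance, the `__dict__` membership test is sufficient and skips all the extra lookup machinery.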
27:41
There are several low-level optimizations which aren't available for Python. Being interpreted, Python doesn't have build, compile, and assembly phases. Runtime optimization is possible in principle, for example a JIT, but that's available now only for PyPy and not for CPython.
28:06
So what to do? Has the optimization reached its limit? Actually, if all high-level optimizations are already done and Python doesn't permit us to go deeper,
28:21
we need to go beyond Python. Maybe we should rewrite everything in C to improve the performance. But in that case, we would lose compatibility with Python implementations other than CPython. For example, Jython, IronPython, and PyPy would become incompatible. And having two implementations of the debugger,
28:41
one in Python and one in C, would make adding new features a lot harder. Now, what if we could just leave our Python code as it is, but still optimize it a bit at a lower level?
29:03
A solution exists, and it's called Cython. Cython is a static compiler for Python which gives the combined power of Python and C. Here is an example of a program written in Cython. Note that it looks exactly like normal Python code,
29:20
except for the declaration of variables in the second and third lines. These declarations carry type information, which allows the Cython compiler to generate more efficient code. So this basically provides us with another level of possible optimization, inaccessible before, namely compile-time optimization.
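A sketch of what such a Cython function might look like (an assumed example in the spirit of the slide, not its exact code):

```cython
def count_even(int n):
    cdef int i          # static C type declarations: the Cython compiler
    cdef int total = 0  # can now generate efficient C for the loop below
    for i in range(n):
        if i % 2 == 0:
            total += 1
    return total
```

Without the cdef lines this is plain Python; with them, the loop compiles down to a C loop over machine integers.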
29:43
Let's add Cython type information to our trace function implementation. After we compile our trace function with Cython as a native extension and measure its performance, we learn that it made our debugger more than twice as fast: four seconds instead of nine.
30:05
So now we can compare all three optimizations combined with the baseline, our initial version of the debugger, and with pdb, our goal. And we see that we have reached the goal and actually done even better.
30:20
Yay, happiness. But to temper our happiness, I will say that after we compiled our debugger with Cython, it became native code, which can't be profiled well with vmprof anymore. So it is unprofilable again, ironically. But there are still ways to profile it,
30:40
which lie outside the scope of this talk today. As for the issue: we managed to double the performance for the sample code from the issue ticket. And we made it better than pdb.
31:00
But still, in this particular case, it worked slower than a normal run. And maybe it is possible to make it work even faster, given the constraints of the settrace API and so on. So we'll leave that issue open for a while.
31:21
Conclusion: use profilers to find bottlenecks in your code. There are different profilers; each has its own advantages. Learn about them. Start optimizing things from the higher levels to the lower ones. And to optimize Python at a lower level, use Cython. So, that's all for today. Thank you for listening.
31:42
There are links to the vmprof profiler and the debugger if you are interested in looking into the code. Actually, this line profiling feature was added to vmprof recently. So, it's not available in PyCharm yet. But it will be available via a plug-in.
32:03
I will publish it this week, I hope. So, thank you very much. Thank you very much, Dimitri, for this great talk.
32:20
So, the floor is open for questions. May I ask you to wait for us to give you the microphone, just because we are recording everything. Thanks. Hi there. My biggest issue is memory profiling. Can you help me with that? Actually, in this particular case,
32:43
memory profiling wasn't an issue. If you're interested in memory profiling, I can recommend looking at vmprof, because it supports memory profiling. The only thing it doesn't support yet is profiling of native memory allocations.
33:01
But that's actually quite a hard problem in Python. So, if you have pure Python code, vmprof can profile your memory. And actually, in Python 3.5, there is an API for memory profiling. I don't remember exactly what it's called. So, you can look at that also.
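The standard-library API the speaker seems to be reaching for is most likely `tracemalloc` (an assumption on my part; it was added in Python 3.4). A minimal sketch:

```python
import tracemalloc

tracemalloc.start()

# Allocate something measurable.
blocks = [bytes(1000) for _ in range(1000)]

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)                        # top allocation sites by source line

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current} B, peak: {peak} B")
tracemalloc.stop()
```

Like vmprof's memory mode, it only sees Python-level allocations, not native ones made directly through the C allocator by extension modules.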
33:27
Questions? Hi, I'm Kowalski. And I wanted to ask maybe a naive question.
33:44
But isn't writing the code in Cython somehow also rendering it incompatible with other Python implementations?
34:01
Yes, that's a great question, by the way. Yes, it does. If you just add a cdef into your Python source, it won't be compatible anymore. But what you can do, and what we did in the PyCharm debugger,
34:21
is that we made these Cython optimizations optional. So, the only change that you need to make in your Python source for it to be Cython-compilable is to add these cdef definitions at the beginning.
34:41
So, we used a little template language. In our source, these cdef definitions are commented out, so the source runs as normal Python. But to build the Cython extension, we uncomment these lines,
35:00
and the source becomes Cython-compilable. I can show you; actually, it's better to see than to say.
35:26
So, here we have a custom template, a small language. And it says: if it is Cython, then we have this header.
35:40
If it's not Cython, then it's normal Python. So, this source works for all Python implementations. And if we need to compile it, we do it with a setup.py, where we uncomment this, in the Cython case.
36:01
Any more questions? Well, if not, please join me in thanking Dimitri again. Thank you very much. Thank you.