Pythonic code vs. performance
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 132 | |
Author | ||
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this | |
Identifiers | 10.5446/45000 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
| |
Keywords |
EuroPython 20185 / 132
2
3
7
8
10
14
15
19
22
27
29
30
31
34
35
41
44
54
55
56
58
59
61
66
74
77
78
80
81
85
87
91
93
96
98
103
104
105
109
110
111
113
115
116
118
120
121
122
123
125
127
128
129
130
131
132
00:00
Mathematical optimizationFormal languageCodePattern languagePairwise comparisonJava appletBitSoftware developerSoftwareComputer animation
00:41
Local ringDemo (music)Mathematical optimizationComputer animation
01:24
Semiconductor memoryMultiplication signDemo (music)BitBasis <Mathematik>Process (computing)Computer animation
02:02
Java appletFormal languageClient (computing)CodeBefehlsprozessorStatisticsComputer animation
02:40
BefehlsprozessorWebsiteStructural load2 (number)System callData conversionSingle-precision floating-point formatSemiconductor memoryComputer animation
03:18
Product (business)Software testingBasis <Mathematik>Semiconductor memorySoftwareIntegrated development environmentScaling (geometry)LeakDifferent (Kate Ryan album)outputDivisorFile systemDatabaseOperator (mathematics)Function (mathematics)Computer animation
03:56
Operations researchSound effectMultiplication signCartesian coordinate systemoutputOperator (mathematics)Limit (category theory)Latent heatFunction (mathematics)NumberService (economics)File systemComputer fileDatabase transactionOpen setDatabaseMobile appComputer animation
04:33
DatabaseCodeLimit (category theory)ImplementationResultantData structureComputer animation
05:11
Proof theoryInternet service providerCodeSoftware testingCASE <Informatik>Set (mathematics)Computer animation
05:49
CASE <Informatik>Software testingIntegrated development environmentIdentifiabilityBenchmarkProduct (business)CodeComputer animation
06:38
Point (geometry)Standard deviationProfil (magazine)MereologyLibrary (computing)Functional (mathematics)Demo (music)BefehlsprozessorComputer animation
07:21
User profileSemiconductor memoryLibrary (computing)Profil (magazine)Computer configurationMultiplication signBitDemo (music)ResultantComputer animation
07:59
Operating systemLevel (video gaming)Semiconductor memoryLibrary (computing)Process (computing)BefehlsprozessorCodeMultiplication signDemo (music)Computer fontComputer animation
08:34
Computer fontComputer animation
09:10
Presentation of a groupProfil (magazine)Functional (mathematics)BefehlsprozessorElectronic mailing list2 (number)
09:49
Normed vector spaceProfil (magazine)Functional (mathematics)String (computer science)Object-oriented programmingComputer animationSource code
10:32
Normed vector spacePoint (geometry)Functional (mathematics)2 (number)NumberSystem callMultiplication signComputer fileLine (geometry)Source code
11:15
Multiplication signFunctional (mathematics)CumulantSystem callParameter (computer programming)Profil (magazine)CASE <Informatik>BefehlsprozessorSource codeComputer animation
11:53
Multiplication signBefehlsprozessorSpeicherbereinigungHTTP cookieProfil (magazine)Point (geometry)2 (number)Source code
12:33
Multiplication signPoint (geometry)Functional (mathematics)String (computer science)ProgrammschleifeBitInformationMeasurementResultantProfil (magazine)
13:53
IntegerElectronic mailing listTotal S.A.Semiconductor memoryPoint (geometry)Line (geometry)Standard deviationProcess (computing)Library (computing)Profil (magazine)ResultantCodeSpeicherbereinigungProgrammschleifeFunctional (mathematics)BefehlsprozessorMultiplication signSign (mathematics)Parameter (computer programming)Computer configurationMusical ensemble
17:28
CodeLine (geometry)Functional (mathematics)Loop (music)Greatest common divisorResultantBinary codeModule (mathematics)Library (computing)Variable (mathematics)Modulo (jargon)Dependent and independent variablesDivisorComputer animation
18:38
Line (geometry)Loop (music)ResultantSystem callSemiconductor memoryProfil (magazine)CodeDisassemblerBefehlsprozessorComputer animation
19:50
Electronic mailing listDifferent (Kate Ryan album)Semiconductor memoryCache (computing)ResultantProcess (computing)Multiplication signSystem callVirtual machineDivisor
21:19
Electronic mailing listBitMultiplication signGame theoryScaling (geometry)
22:17
Normed vector spaceSet (mathematics)Multiplication signFunctional (mathematics)Constructor (object-oriented programming)Key (cryptography)Data dictionaryData typeTupleImplementationPairwise comparisonElectronic mailing listBoolean algebraData conversion
24:05
Electronic mailing listFunctional (mathematics)Subject indexingComputer configurationCASE <Informatik>Data dictionaryComputer animation
25:09
Normed vector spaceCASE <Informatik>BitElement (mathematics)Position operatorImplementationElectronic mailing listMereology2 (number)Loop (music)Heegaard splittingMultiplication signOperator (mathematics)Point (geometry)TwitterComputer animation
27:15
Element (mathematics)TwitterComputer configurationPosition operatorMultiplication signImplementation
27:51
Line (geometry)Variable (mathematics)TupleComputer animation
28:51
Data storage deviceStructural loadMultiplication signProcess (computing)ResultantCodeTupleComputer animation
30:05
String (computer science)Constructor (object-oriented programming)IterationFile formatNumberInterior (topology)CASE <Informatik>Loop (music)2 (number)Computer animation
32:15
State diagramDiscrete element methodInterior (topology)Loop (music)CodeBitProgrammschleifeIterationDifferent (Kate Ryan album)Parameter (computer programming)Variable (mathematics)Object-oriented programmingComputer animation
35:12
Point (geometry)Run time (program lifecycle phase)Semiconductor memoryObject (grammar)Revision controlParameter (computer programming)Structural loadCellular automatonMultiplication signDifferent (Kate Ryan album)Functional (mathematics)NumberRegular graphBitVariable (mathematics)View (database)Code
37:48
PredictionUser profileDemo (music)MereologyProfil (magazine)Point (geometry)Multiplication signSystem callTerm (mathematics)CodeForm (programming)2 (number)FeedbackLine (geometry)Total S.A.Parameter (computer programming)Latent heatField (computer science)Semiconductor memoryGraphical user interfaceFunctional (mathematics)Loop (music)QR codeProcess (computing)Cartesian coordinate systemLink (knot theory)Right angleRepository (publishing)Electronic mailing listLibrary (computing)WebsiteSpacetimeSlide ruleResource allocationMathematical optimizationObject (grammar)CASE <Informatik>Different (Kate Ryan album)Computer fileSoftware testingQuicksortLevel (video gaming)Musical ensemblePredictabilityStatisticsResultantComputer animation
Transcript: English(auto-generated)
00:05
Thanks. Hi, my name is Lukasz Konkol and I will speak about performance optimization and profiling as an examples I will use comparison of Pythonic code and patterns Inherited from other languages like C, C++, Java
00:27
Yeah, I will try Okay, let me introduce myself a bit I'm working as senior Python developer at STX next. It's Python software house located in Poland
00:40
I have also gave my talks at the Python UK last year and PYSS, which is Python conference in San Sebastian in Spain And I have also shared my knowledge during Local meetups in Poznań in Poland, but what's most relevant for the topic
01:05
I'm really interested in a Performance optimizations, and I'm really fascinated in it Let's check up the agenda. I will first introduce you shortly
01:20
to the topic of the performance aspects then we would have a live demo and Then I will summarize it. I talk shortly and at the end We would have some time for questions But before we get started
01:41
Let me get to know you a little bit better Who have ever used any profiling tool? Okay Who is using that frequently like on daily basis in daily job Okay, not that much
02:02
And who is just starting a journey in Python Okay And who has used any other languages and then Python like CC plus plus Java Okay, that's much Okay, so let's get started
02:22
What are the base aspects of performance? First of all, it's CPU We all want our code to be as fast as possible That's what our client and our end user most cares about Google statistics says that
02:42
Then that more than the half of the users will abandon the website if it takes more than three seconds to load and moreover each single second calls customer satisfaction to drop by about 15% and conversion to drop by
03:03
around 5 to 10% Another thing is memory The truth is that we don't care about the memory until we run out of it and I must admit It's fair approach because why should we care about something that is not affecting us?
03:22
but We should be still aware that any memory leak can affect our software for example in Is On daily basis in the testing environments. We don't use as much fixtures as we would have in production So the scale makes a really difference here
03:44
Last factor is input output of operations And for example database operations or file system this may massively influence the first aspect so the
04:02
So the time of loading an application or running But it also has another side effects There are specific limit limitations or on the simultaneous Input output operations For example database has transactions which may lock and hold each other
04:24
File system has limits on number of open of simultaneously open files And moreover some services like Google App Engine provides resources like database up to certain quota Then each read and write to database over that limit costs you real money
04:49
What should we do to optimize our code in a good way? First we have to plan and predict our approach Implementation and the data structure which we will use
05:04
And predict which which one will give us the best results Then we should profile our code It would be good to provide a few proofs of concepts Of course if we can we should use some fake data just to indicate which one would be
05:24
the best But that's not all once we ship our code We should monitor the performance in living ecosystem Keep in mind That you will not be able to test each data set
05:45
You can be almost sure that your end-user Will have some edge case you could not invent on your on your testing Environment And note that this may be also related to the
06:03
Tech stack the user is using Once you have production benchmarks collected you should identify bottlenecks and quick wins And it really depends on each specific case
06:23
You should then take a look into optimizing your code again You should first look into quick wins and bottlenecks But should we optimize Everything of course not there is no point in optimizing
06:44
something that will give you like 5% gain and it will cost you a few weeks or even months of work So about profiling tools there are a lot of them available in a piping I
07:06
Will just introduce a few of them, which I will use during the demo First one is C profile it's to inspect the CPU usage as divided by functions
07:21
That's part of standard library And it's most accurate of the available tools Next one is memory profiler. That's third party library. It's available on piping It's better than other options, but it's still not perfect
07:44
Results may vary a bit which I will present you later during the demo And it also takes some time to profile with the memory Next one is sis. It's built in library It provides a provides us with
08:03
low level operating system API which we can use to to inspect for example CPU usage of the process or the memory and Last tool is this It's also built in our library We can use it to disassemble the Python code
08:23
and Inspect it on lower level Okay, so now it's time for demo is the process font size, okay
08:54
Okay, okay, so let's start with the first example
09:15
First I will show you the usage of the tools. I I will I will use during the presentation
09:23
So first one is the function which I will use profile the CPU usage So I'm creating the list I'm creating the second list and then I'm deleting the first list and returning the
09:45
Second list Okay, and how do we profile it? We can we should import to the C profile Then import the function which we will use and
10:00
Provide it as a string So it can get evaluated
10:38
Okay, so we can see that
10:42
Six functions has been called during during the execution of the main function It took zero point zero two seconds And we got number of calls of each function Here in the last column. We have the file name line number and the function name
11:04
Yeah, we got the number of calls of the of that function we have total time of all the calls Then we got time per one single call average time Then we got the cumulative time Which is from the start up to the end of this function?
11:25
Yeah, and cumulative time per call, okay We can also pass some additional arguments to C profile But I will not do it for now. I just want to show you the simple cases of
11:46
CPU profiling, okay Another Another tool which we can use for profiling CPU is time it but it's not so accurate as C profile mostly because of
12:06
Garbage collector not being crumb in Time it so let's run the second example Okay, we got one second
12:27
one point eighty five seconds it's The time of running the 100 retries on This example. So what we provide here is the real function to
12:43
Measure measure and the setup string which also will be evaluated before running the The real function to to profile Time it is also available
13:01
With the dash M time it then we should provide the setup string and Then the string with the real function to profile this will gives us a bit more readable
13:36
Result so we got the information that we have run
13:41
100 loops best of three of the of these loops is 18 point five milliseconds And we have also nice plug-in for for time it in IPython
14:02
We should just import Import the function and then we use percent sign time it
14:21
CPU profile and here we will have even more user-friendly way of showing the results We will also have the standard deviation
14:42
Yeah, so we got average time here and then the standard deviation Here It has been run in 100 loops and it has been around seven times. Okay, so that's about
15:00
the performance CPU performance and Now let's go to the memory profiling so on the function We need to profile We should call the decorator
15:22
We get we got We should import memory profiler first then Use the profile decorator from this library And we have a few optional parameters like precision. Yeah, that takes a bit longer
16:01
it probes the memory usage of the process after each line So here we got the code we're profiling we got the starting memory usage And we got increment and the total memory usage here. So here we can see that
16:29
first list took About one point eighty five megabytes second one to one point eighty seven megabytes And then when we release actually the least a
16:44
We get back one megabyte of memory What that may mean? That We got a garbage collector didn't run yet or there is one more thing
17:00
Some small integers are being cached by by Python So these are just few of the reasons, but there are definitely more and moreover there is We should also keep in mind that it's just
17:21
Probing so it might not be a hundred percent accurate accurate, okay? Another example will be greatest common divider So
17:40
We will disassemble it and see what the result will be okay All these codes are listed in documentation of this library So we got here the we're setting up the loop
18:05
We're loading the Y variable And then if Y is false we will jump to line 24 So over here
18:26
Okay in next line we're loading X and Y We're calling binary modulo function, then we store it in temp variable Next line is assignment of Y to X
18:43
So we're loading loading Y and storing it in X Next line is assignment of temp to Y We're loading temp and storing Y and we got the end of the loop, so we're jumping back to
19:03
the second second line over here and Then we're loading X and returning it. So that's the simple example of disassembled Python code We'll also use it later for
19:23
more interesting examples Okay another thing is what what Another thing I want to talk about is the reliability of profiling so we'll call
19:44
both CPU profiling and memory profiling three times and see the results
20:00
Yeah, note that We got three different times Keep in mind that it might depend on many factors like another processes Being run in the background. I Tried I tried to isolate that virtual machine as much as I could
20:24
But But it still differs between the calls About the memory. We also got different results here First
20:41
First reason behind it is as I said before Python caching and It's mostly visible in difference between first call and the other tools So here we got different result and here we got the same result at the end The increment still differs, but it's because of caching
21:07
In Python, okay another
21:21
Example is Creating lists by list comprehension by appending the result and by extending the list
21:44
so definitely the fastest is List comprehension Then four times slower is Creating least by appending each subsequent item
22:00
and a bit faster is extending the least so whether you would have the possibility to extend use it because it's Small game but in the scale it can really differ
22:21
next example is comparison of data types Tuples lists sets and the dictionaries. So the fastest is the least
22:41
Then we got sets Tuple is much slower than Least and set and dictionary is Even more slower is the slowest of these collections, but we should keep in mind that
23:01
Dictionaries are Keys and values pairs. So the construction of it takes time moreover. There is a hashing function which is hashing the keys and We can see also the size of of these collections. So smallest one is is tuple
23:23
Then we got least set and dictionary Set is much larger than than tuples and the least It's almost four times larger
23:41
but the reason behind it is that It's just optimized dictionary in the implementation so it just have keys and Dummy values and It's using just just the the keys of the of the dictionary and that's how set is implemented
24:11
Okay, next example is using the Values of the two lists iterating by indexes
24:25
Zip function and Using a dictionary so zip function is the fastest one
24:43
then we got dictionary and Then we got iterating using indexes of two lists, so zip looks like the best option, but It's really not so usable in most cases
25:04
but Dictionary is still fine next example is Checking if element is contained within the list So first one is just in keyword
25:24
Second one is running through all the list and checking if the element is present And the last case is a binary search over the list So we're splitting the list in
25:40
two equal parts and comparing if The element is present in left or on or in the right part of the list then doing that recursively Note that List must be sorted before running that operation So I
26:01
Compare all these checks For two cases first one is positive case, so we will find this element in the list and Second one is that we will not find this element in the list. Okay, so
26:35
First one is in keyword So we can see that it runs two times slower if element is not present in the list
26:46
So the implementation here is a bit optimized for loop over the elements Then we can see the for loop over the over the elements implemented in Python
27:02
it's in positive case that's 1.6 seconds and for loop is 2.5 seconds and it also keeps the the Trend of Positive to negative ratio, so it's still two times slower and
27:24
last one binary binary search It is definitely the fastest one. So in keyword is Pythonic way to check if element is present, but if we really care about performance and really need to run it fast
27:43
binary search is the best option here, even if the Implementation is more complex, but it's still not that scary, right? It's just Less than 15 lines So if we really care about performance, we should use more complex solutions
28:03
But which are more efficient okay, the next one is swapping the variables first example shows the swapping using temporary variable and
28:23
second one is swapping tuple so assigning tuple of y x to x y Okay, so
28:43
swapping Using temporary variable appear to be faster now Okay That's not what I expected that might be as some Some process running in the background which is interrupting that yeah again, but different is not so
29:07
So we can now probably if I run it again It will finally give expected results. No. Yeah. Yeah, it's it's finally finally gives the smaller
29:20
Value for swapping tuples, so let's see how it's How it looks in this assembled code We got Here it's it's swapping tuples. So we're loading x and y and x then rotating these two
29:40
Storing in storing x and storing y and then returning known While here we got load fast store fast load fast so fast and again load fast or fast, it's Three times load and store instead of to store and load and store
30:11
Next example is efficiency of string construction. First one is f string Introduced in Python 3.6 and next one is formatted string and the last one is
30:28
Percent percent formatted string so f strings is are definitely fastest
30:45
Then we see that Percent formatted strings are in the middle and Format is longest. Yeah, it's nice to use but it's slow But now in Python 3.6. We got f strings. So let's use it
31:09
okay, so we got nested loop here and What I will do is iterate
31:20
over the smaller Portion of data in outer loop then Iterate over larger amount of data in outer loop and then split it equally between outer and inner loop so in all these cases we got the same number of
31:45
iterations, okay, so maybe a Let's guess which one will be the fastest so the first first example who thinks that it will be the fastest
32:03
Okay, second example. Okay, and the last one Okay Not everyone plays a bet but okay Let's run it. Okay. So this one is the fastest one. So second one
32:35
So iterating over more items in outer loop
32:40
Which I will expect otherwise, but it's up here. It's not Yeah, and there is not much difference between The first and the last example Yeah Let's see at the disassembled code because it may explain a bit more
33:04
So here we're starting first outer loop Here we got we start we're starting In a loop here. We're finishing the inner loop and
33:21
Here we're finishing outer loop. So It's really a What we would expect is that iterations over inner loop will be faster But it appears that the iteration over outer loop is faster
33:46
why is that because here we're We're jumping To the inner loop and here we're jumping to outer loop
34:05
So if we jump over here frequently It's a bit slower than jumping over here Not sure if that's
34:21
If if that explains it, but yeah It's it's pretty pretty hard to explain on that disassembled code But The fact is that More
34:40
loops in outer loop is more efficient than Than in the inner loop. Okay next example Using global variable versus using parameter So again who thinks that the global variable would be more efficient
35:04
Okay Who thinks that parametrized variable more efficient, okay Let's see we're frequently using
35:21
The Global variable But it's not so efficient as parametrized version it's We will see their Reason over here in disassembled code, so their only difference here is that
35:44
In global global variable is loaded with load global instead of load fast over here So that's the only difference And it appears twice Where we load this global variable or X parameter of the function
36:10
Okay Next example is slots who is familiar with slots Okay, so I will explain it a bit deeper so slots are the least that
36:26
Parameter names that Object will be restricted to so if we got slots X over here We cannot dynamically assign a cell of dot Y for example
36:40
It will just throw the runtime error. We will not be able to do that okay, so But that sounds like an Restriction for us, but what does it mean from for from performance point of view?
37:02
It's Definitely faster with slots and as we will see in the second It consumes a much less memory than just the regular object without slots So we can see that it's three times more memory here used
37:23
The reason behind it is that objects without slots Will over allocate the memory So that we can we can see it over here, it's three times more That may of course differ based on
37:43
Different different number of slots if we for example gives a give here like five five five variable names or ten it will differ Okay, so that would be all from demo part let's summarize it
38:08
Or plea okay, so first Predict what would be the best solution based on your experience?
38:22
Based on the examples you just saw or just based on your hunch Predict but to have preliminary data and don't trust it at all always profile your code
38:41
Predictions might be misleading When I was preparing examples for this demo, I encountered a lot of surprises We have also seen seen one. I have to rerun this three times because Data data was Not perfect enough. I would say
39:02
So things do not do not behave as we expect And Yeah, we should always profile our code Even if you think it's fast enough or fastest Just check it for your own
39:22
For your own peace of mind, right and the last thing calculate return on investment So try to find the best ratio between possible profit and optimization cost There is no point in spending weeks on optimization, which will give you in unnoticeable improvement
39:46
Hey, thank you for for your attention You can find slides on the left side in the link or QR code. I Will also put it in on our Python website. I
40:01
Have also pushed the code snippets to GitHub links also available here It's Repository is private for now. I will make it public right after the talk and I also appreciate any feedback about my talk so I can improve myself to give better talks
40:22
There is a link on the right side to the feedback form. It's simple anonymous form I would really appreciate if you spend a few seconds to share your opinion Thank you very much
40:43
So we have time for a couple of questions, I see their hands Thank you for the talk I was wondering is there any easy way in C profiler to give kind of like a person's text So there was total time there was cumulative time. Is there anything where you just give a parameter so that
41:07
One line or one allocation took let's say 80% of the total time something like that Just not to calculate the whole thing. Yes, so that might be not so visible in the examples
41:20
but there is total time of all the calls or total time of or time per call for the function you can also Give a parameter to C profile to sort the results by specific column Okay, so we can sort for by total time or time per call and also one short question
41:42
So in terms of that slot the your example didn't allocate anything to a particular slot So let's say if you allocate a hundred megs or like read a file Of the same size with slot and without slot. Do you think will make a difference? Yeah, definitely because
42:02
It's it's always good to profile the specific case because that depends what you're putting in the in this variable but in general Object without slot is over allocated so it will just allocate some some memory Right after the object to have some space to some dynamic assignments
42:26
Thank you Who's okay?
42:42
I've got two questions actually. The first one is are there any graphic tools for the for the profiling libraries you've shown? Yes, there are graphic tools. I didn't list them because I wanted simplest tool which will be most accurate but there are many tools you can just google it and
43:01
Yeah, there are a lot of them. I checked like week or two weeks ago There were a lot of them. All right. Thanks. And the second question is Since the profiling you've shown us is not really reliable and not hundred percent reliable Should we rerun it multiple times to get a better?
43:22
Understanding of how the how the application performs. Yeah. Yeah when I run the memory profiler a few times in the loop, for example It gives it stabilizes with with the time So first run is like a bit inaccurate But then it's stabilized and gives at least the total memory usage on the stable level then increment also
43:47
Stabilized like we see we saw in the example with when I run three times the same function with with memory profiler Total total memory for the process has stabilized in the second and third
44:07
line but Increment still did not stabilize but it will stabilize at some point Because it probes the memory
44:20
By inserting the the probes into the code Okay, we have to stop here. Thank you very much for a talk