We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Pythonic code vs. performance

00:00

Formal Metadata

Title
Pythonic code vs. performance
Title of Series
Number of Parts
132
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Idiomatic Python is beautiful. If you’re new to Python, this talk is for you because I’m going to reveal the charm of python in front of you. I’ll present how boilerplate code can be replaced with idiomatic python. If you’re experienced python developer, this talk is also for you because I’ll compare the performance of the idiomatic code, both from CPU and memory point of view. Some of these results may surprise you.
Keywords
35
74
Thumbnail
11:59
Mathematical optimizationFormal languageCodePattern languagePairwise comparisonJava appletBitSoftware developerSoftwareComputer animation
Local ringDemo (music)Mathematical optimizationComputer animation
Semiconductor memoryMultiplication signDemo (music)BitBasis <Mathematik>Process (computing)Computer animation
Java appletFormal languageClient (computing)CodeBefehlsprozessorStatisticsComputer animation
BefehlsprozessorWebsiteStructural load2 (number)System callData conversionSingle-precision floating-point formatSemiconductor memoryComputer animation
Product (business)Software testingBasis <Mathematik>Semiconductor memorySoftwareIntegrated development environmentScaling (geometry)LeakDifferent (Kate Ryan album)outputDivisorFile systemDatabaseOperator (mathematics)Function (mathematics)Computer animation
Operations researchSound effectMultiplication signCartesian coordinate systemoutputOperator (mathematics)Limit (category theory)Latent heatFunction (mathematics)NumberService (economics)File systemComputer fileDatabase transactionOpen setDatabaseMobile appComputer animation
DatabaseCodeLimit (category theory)ImplementationResultantData structureComputer animation
Proof theoryInternet service providerCodeSoftware testingCASE <Informatik>Set (mathematics)Computer animation
CASE <Informatik>Software testingIntegrated development environmentIdentifiabilityBenchmarkProduct (business)CodeComputer animation
Point (geometry)Standard deviationProfil (magazine)MereologyLibrary (computing)Functional (mathematics)Demo (music)BefehlsprozessorComputer animation
User profileSemiconductor memoryLibrary (computing)Profil (magazine)Computer configurationMultiplication signBitDemo (music)ResultantComputer animation
Operating systemLevel (video gaming)Semiconductor memoryLibrary (computing)Process (computing)BefehlsprozessorCodeMultiplication signDemo (music)Computer fontComputer animation
Computer fontComputer animation
Presentation of a groupProfil (magazine)Functional (mathematics)BefehlsprozessorElectronic mailing list2 (number)
Normed vector spaceProfil (magazine)Functional (mathematics)String (computer science)Object-oriented programmingComputer animationSource code
Normed vector spacePoint (geometry)Functional (mathematics)2 (number)NumberSystem callMultiplication signComputer fileLine (geometry)Source code
Multiplication signFunctional (mathematics)CumulantSystem callParameter (computer programming)Profil (magazine)CASE <Informatik>BefehlsprozessorSource codeComputer animation
Multiplication signBefehlsprozessorSpeicherbereinigungHTTP cookieProfil (magazine)Point (geometry)2 (number)Source code
Multiplication signPoint (geometry)Functional (mathematics)String (computer science)ProgrammschleifeBitInformationMeasurementResultantProfil (magazine)
IntegerElectronic mailing listTotal S.A.Semiconductor memoryPoint (geometry)Line (geometry)Standard deviationProcess (computing)Library (computing)Profil (magazine)ResultantCodeSpeicherbereinigungProgrammschleifeFunctional (mathematics)BefehlsprozessorMultiplication signSign (mathematics)Parameter (computer programming)Computer configurationMusical ensemble
CodeLine (geometry)Functional (mathematics)Loop (music)Greatest common divisorResultantBinary codeModule (mathematics)Library (computing)Variable (mathematics)Modulo (jargon)Dependent and independent variablesDivisorComputer animation
Line (geometry)Loop (music)ResultantSystem callSemiconductor memoryProfil (magazine)CodeDisassemblerBefehlsprozessorComputer animation
Electronic mailing listDifferent (Kate Ryan album)Semiconductor memoryCache (computing)ResultantProcess (computing)Multiplication signSystem callVirtual machineDivisor
Electronic mailing listBitMultiplication signGame theoryScaling (geometry)
Normed vector spaceSet (mathematics)Multiplication signFunctional (mathematics)Constructor (object-oriented programming)Key (cryptography)Data dictionaryData typeTupleImplementationPairwise comparisonElectronic mailing listBoolean algebraData conversion
Electronic mailing listFunctional (mathematics)Subject indexingComputer configurationCASE <Informatik>Data dictionaryComputer animation
Normed vector spaceCASE <Informatik>BitElement (mathematics)Position operatorImplementationElectronic mailing listMereology2 (number)Loop (music)Heegaard splittingMultiplication signOperator (mathematics)Point (geometry)TwitterComputer animation
Element (mathematics)TwitterComputer configurationPosition operatorMultiplication signImplementation
Line (geometry)Variable (mathematics)TupleComputer animation
Data storage deviceStructural loadMultiplication signProcess (computing)ResultantCodeTupleComputer animation
String (computer science)Constructor (object-oriented programming)IterationFile formatNumberInterior (topology)CASE <Informatik>Loop (music)2 (number)Computer animation
State diagramDiscrete element methodInterior (topology)Loop (music)CodeBitProgrammschleifeIterationDifferent (Kate Ryan album)Parameter (computer programming)Variable (mathematics)Object-oriented programmingComputer animation
Point (geometry)Run time (program lifecycle phase)Semiconductor memoryObject (grammar)Revision controlParameter (computer programming)Structural loadCellular automatonMultiplication signDifferent (Kate Ryan album)Functional (mathematics)NumberRegular graphBitVariable (mathematics)View (database)Code
PredictionUser profileDemo (music)MereologyProfil (magazine)Point (geometry)Multiplication signSystem callTerm (mathematics)CodeForm (programming)2 (number)FeedbackLine (geometry)Total S.A.Parameter (computer programming)Latent heatField (computer science)Semiconductor memoryGraphical user interfaceFunctional (mathematics)Loop (music)QR codeProcess (computing)Cartesian coordinate systemLink (knot theory)Right angleRepository (publishing)Electronic mailing listLibrary (computing)WebsiteSpacetimeSlide ruleResource allocationMathematical optimizationObject (grammar)CASE <Informatik>Different (Kate Ryan album)Computer fileSoftware testingQuicksortLevel (video gaming)Musical ensemblePredictabilityStatisticsResultantComputer animation
Transcript: English(auto-generated)
Thanks. Hi, my name is Lukasz Konkol and I will speak about performance optimization and profiling as an examples I will use comparison of Pythonic code and patterns Inherited from other languages like C, C++, Java
Yeah, I will try Okay, let me introduce myself a bit I'm working as senior Python developer at STX next. It's Python software house located in Poland
I have also gave my talks at the Python UK last year and PYSS, which is Python conference in San Sebastian in Spain And I have also shared my knowledge during Local meetups in Poznań in Poland, but what's most relevant for the topic
I'm really interested in a Performance optimizations, and I'm really fascinated in it Let's check up the agenda. I will first introduce you shortly
to the topic of the performance aspects then we would have a live demo and Then I will summarize it. I talk shortly and at the end We would have some time for questions But before we get started
Let me get to know you a little bit better Who have ever used any profiling tool? Okay Who is using that frequently like on daily basis in daily job Okay, not that much
And who is just starting a journey in Python Okay And who has used any other languages and then Python like CC plus plus Java Okay, that's much Okay, so let's get started
What are the base aspects of performance? First of all, it's CPU We all want our code to be as fast as possible That's what our client and our end user most cares about Google statistics says that
Then that more than the half of the users will abandon the website if it takes more than three seconds to load and moreover each single second calls customer satisfaction to drop by about 15% and conversion to drop by
around 5 to 10% Another thing is memory The truth is that we don't care about the memory until we run out of it and I must admit It's fair approach because why should we care about something that is not affecting us?
but We should be still aware that any memory leak can affect our software for example in Is On daily basis in the testing environments. We don't use as much fixtures as we would have in production So the scale makes a really difference here
Last factor is input output of operations And for example database operations or file system this may massively influence the first aspect so the
So the time of loading an application or running But it also has another side effects There are specific limit limitations or on the simultaneous Input output operations For example database has transactions which may lock and hold each other
File system has limits on number of open of simultaneously open files And moreover some services like Google App Engine provides resources like database up to certain quota Then each read and write to database over that limit costs you real money
What should we do to optimize our code in a good way? First we have to plan and predict our approach Implementation and the data structure which we will use
And predict which which one will give us the best results Then we should profile our code It would be good to provide a few proofs of concepts Of course if we can we should use some fake data just to indicate which one would be
the best But that's not all once we ship our code We should monitor the performance in living ecosystem Keep in mind That you will not be able to test each data set
You can be almost sure that your end-user Will have some edge case you could not invent on your on your testing Environment And note that this may be also related to the
Tech stack the user is using Once you have production benchmarks collected you should identify bottlenecks and quick wins And it really depends on each specific case
You should then take a look into optimizing your code again You should first look into quick wins and bottlenecks But should we optimize Everything of course not there is no point in optimizing
something that will give you like 5% gain and it will cost you a few weeks or even months of work So about profiling tools there are a lot of them available in a piping I
Will just introduce a few of them, which I will use during the demo First one is C profile it's to inspect the CPU usage as divided by functions
That's part of standard library And it's most accurate of the available tools Next one is memory profiler. That's third party library. It's available on piping It's better than other options, but it's still not perfect
Results may vary a bit which I will present you later during the demo And it also takes some time to profile with the memory Next one is sis. It's built in library It provides a provides us with
low level operating system API which we can use to to inspect for example CPU usage of the process or the memory and Last tool is this It's also built in our library We can use it to disassemble the Python code
and Inspect it on lower level Okay, so now it's time for demo is the process font size, okay
Okay, okay, so let's start with the first example
First I will show you the usage of the tools. I I will I will use during the presentation
So first one is the function which I will use profile the CPU usage So I'm creating the list I'm creating the second list and then I'm deleting the first list and returning the
Second list Okay, and how do we profile it? We can we should import to the C profile Then import the function which we will use and
Provide it as a string So it can get evaluated
Okay, so we can see that
Six functions has been called during during the execution of the main function It took zero point zero two seconds And we got number of calls of each function Here in the last column. We have the file name line number and the function name
Yeah, we got the number of calls of the of that function we have total time of all the calls Then we got time per one single call average time Then we got the cumulative time Which is from the start up to the end of this function?
Yeah, and cumulative time per call, okay We can also pass some additional arguments to C profile But I will not do it for now. I just want to show you the simple cases of
CPU profiling, okay Another Another tool which we can use for profiling CPU is time it but it's not so accurate as C profile mostly because of
Garbage collector not being crumb in Time it so let's run the second example Okay, we got one second
one point eighty five seconds it's The time of running the 100 retries on This example. So what we provide here is the real function to
Measure measure and the setup string which also will be evaluated before running the The real function to to profile Time it is also available
With the dash M time it then we should provide the setup string and Then the string with the real function to profile this will gives us a bit more readable
Result so we got the information that we have run
100 loops best of three of the of these loops is 18 point five milliseconds And we have also nice plug-in for for time it in IPython
We should just import Import the function and then we use percent sign time it
CPU profile and here we will have even more user-friendly way of showing the results We will also have the standard deviation
Yeah, so we got average time here and then the standard deviation Here It has been run in 100 loops and it has been around seven times. Okay, so that's about
the performance CPU performance and Now let's go to the memory profiling so on the function We need to profile We should call the decorator
We get we got We should import memory profiler first then Use the profile decorator from this library And we have a few optional parameters like precision. Yeah, that takes a bit longer
it probes the memory usage of the process after each line So here we got the code we're profiling we got the starting memory usage And we got increment and the total memory usage here. So here we can see that
first list took About one point eighty five megabytes second one to one point eighty seven megabytes And then when we release actually the least a
We get back one megabyte of memory What that may mean? That We got a garbage collector didn't run yet or there is one more thing
Some small integers are being cached by by Python So these are just few of the reasons, but there are definitely more and moreover there is We should also keep in mind that it's just
Probing so it might not be a hundred percent accurate accurate, okay? Another example will be greatest common divider So
We will disassemble it and see what the result will be okay All these codes are listed in documentation of this library So we got here the we're setting up the loop
We're loading the Y variable And then if Y is false we will jump to line 24 So over here
Okay in next line we're loading X and Y We're calling binary modulo function, then we store it in temp variable Next line is assignment of Y to X
So we're loading loading Y and storing it in X Next line is assignment of temp to Y We're loading temp and storing Y and we got the end of the loop, so we're jumping back to
the second second line over here and Then we're loading X and returning it. So that's the simple example of disassembled Python code We'll also use it later for
more interesting examples Okay another thing is what what Another thing I want to talk about is the reliability of profiling so we'll call
both CPU profiling and memory profiling three times and see the results
Yeah, note that We got three different times Keep in mind that it might depend on many factors like another processes Being run in the background. I Tried I tried to isolate that virtual machine as much as I could
But But it still differs between the calls About the memory. We also got different results here First
First reason behind it is as I said before Python caching and It's mostly visible in difference between first call and the other tools So here we got different result and here we got the same result at the end The increment still differs, but it's because of caching
In Python, okay another
Example is Creating lists by list comprehension by appending the result and by extending the list
so definitely the fastest is List comprehension Then four times slower is Creating least by appending each subsequent item
and a bit faster is extending the least so whether you would have the possibility to extend use it because it's Small game but in the scale it can really differ
next example is comparison of data types Tuples lists sets and the dictionaries. So the fastest is the least
Then we got sets Tuple is much slower than Least and set and dictionary is Even more slower is the slowest of these collections, but we should keep in mind that
Dictionaries are Keys and values pairs. So the construction of it takes time moreover. There is a hashing function which is hashing the keys and We can see also the size of of these collections. So smallest one is is tuple
Then we got least set and dictionary Set is much larger than than tuples and the least It's almost four times larger
but the reason behind it is that It's just optimized dictionary in the implementation so it just have keys and Dummy values and It's using just just the the keys of the of the dictionary and that's how set is implemented
Okay, next example is using the Values of the two lists iterating by indexes
Zip function and Using a dictionary so zip function is the fastest one
then we got dictionary and Then we got iterating using indexes of two lists, so zip looks like the best option, but It's really not so usable in most cases
but Dictionary is still fine next example is Checking if element is contained within the list So first one is just in keyword
Second one is running through all the list and checking if the element is present And the last case is a binary search over the list So we're splitting the list in
two equal parts and comparing if The element is present in left or on or in the right part of the list then doing that recursively Note that List must be sorted before running that operation So I
Compare all these checks For two cases first one is positive case, so we will find this element in the list and Second one is that we will not find this element in the list. Okay, so
First one is in keyword So we can see that it runs two times slower if element is not present in the list
So the implementation here is a bit optimized for loop over the elements Then we can see the for loop over the over the elements implemented in Python
it's in positive case that's 1.6 seconds and for loop is 2.5 seconds and it also keeps the the Trend of Positive to negative ratio, so it's still two times slower and
last one binary binary search It is definitely the fastest one. So in keyword is Pythonic way to check if element is present, but if we really care about performance and really need to run it fast
binary search is the best option here, even if the Implementation is more complex, but it's still not that scary, right? It's just Less than 15 lines So if we really care about performance, we should use more complex solutions
But which are more efficient okay, the next one is swapping the variables first example shows the swapping using temporary variable and
second one is swapping tuple so assigning tuple of y x to x y Okay, so
swapping Using temporary variable appear to be faster now Okay That's not what I expected that might be as some Some process running in the background which is interrupting that yeah again, but different is not so
So we can now probably if I run it again It will finally give expected results. No. Yeah. Yeah, it's it's finally finally gives the smaller
Value for swapping tuples, so let's see how it's How it looks in this assembled code We got Here it's it's swapping tuples. So we're loading x and y and x then rotating these two
Storing in storing x and storing y and then returning known While here we got load fast store fast load fast so fast and again load fast or fast, it's Three times load and store instead of to store and load and store
Next example is efficiency of string construction. First one is f string Introduced in Python 3.6 and next one is formatted string and the last one is
Percent percent formatted string so f strings is are definitely fastest
Then we see that Percent formatted strings are in the middle and Format is longest. Yeah, it's nice to use but it's slow But now in Python 3.6. We got f strings. So let's use it
okay, so we got nested loop here and What I will do is iterate
over the smaller Portion of data in outer loop then Iterate over larger amount of data in outer loop and then split it equally between outer and inner loop so in all these cases we got the same number of
iterations, okay, so maybe a Let's guess which one will be the fastest so the first first example who thinks that it will be the fastest
Okay, second example. Okay, and the last one Okay Not everyone plays a bet but okay Let's run it. Okay. So this one is the fastest one. So second one
So iterating over more items in outer loop
Which I will expect otherwise, but it's up here. It's not Yeah, and there is not much difference between The first and the last example Yeah Let's see at the disassembled code because it may explain a bit more
So here we're starting first outer loop Here we got we start we're starting In a loop here. We're finishing the inner loop and
Here we're finishing outer loop. So It's really a What we would expect is that iterations over inner loop will be faster But it appears that the iteration over outer loop is faster
why is that because here we're We're jumping To the inner loop and here we're jumping to outer loop
So if we jump over here frequently It's a bit slower than jumping over here Not sure if that's
If if that explains it, but yeah It's it's pretty pretty hard to explain on that disassembled code But The fact is that More
loops in outer loop is more efficient than Than in the inner loop. Okay next example Using global variable versus using parameter So again who thinks that the global variable would be more efficient
Okay Who thinks that parametrized variable more efficient, okay Let's see we're frequently using
The Global variable But it's not so efficient as parametrized version it's We will see their Reason over here in disassembled code, so their only difference here is that
In global global variable is loaded with load global instead of load fast over here So that's the only difference And it appears twice Where we load this global variable or X parameter of the function
Okay Next example is slots who is familiar with slots Okay, so I will explain it a bit deeper so slots are the least that
Parameter names that Object will be restricted to so if we got slots X over here We cannot dynamically assign a cell of dot Y for example
It will just throw the runtime error. We will not be able to do that okay, so But that sounds like an Restriction for us, but what does it mean from for from performance point of view?
It's Definitely faster with slots and as we will see in the second It consumes a much less memory than just the regular object without slots So we can see that it's three times more memory here used
The reason behind it is that objects without slots Will over allocate the memory So that we can we can see it over here, it's three times more That may of course differ based on
Different different number of slots if we for example gives a give here like five five five variable names or ten it will differ Okay, so that would be all from demo part let's summarize it
Or plea okay, so first Predict what would be the best solution based on your experience?
Based on the examples you just saw or just based on your hunch Predict but to have preliminary data and don't trust it at all always profile your code
Predictions might be misleading When I was preparing examples for this demo, I encountered a lot of surprises We have also seen seen one. I have to rerun this three times because Data data was Not perfect enough. I would say
So things do not do not behave as we expect And Yeah, we should always profile our code Even if you think it's fast enough or fastest Just check it for your own
For your own peace of mind, right and the last thing calculate return on investment So try to find the best ratio between possible profit and optimization cost There is no point in spending weeks on optimization, which will give you in unnoticeable improvement
Hey, thank you for for your attention You can find slides on the left side in the link or QR code. I Will also put it in on our Python website. I
Have also pushed the code snippets to GitHub links also available here It's Repository is private for now. I will make it public right after the talk and I also appreciate any feedback about my talk so I can improve myself to give better talks
There is a link on the right side to the feedback form. It's simple anonymous form I would really appreciate if you spend a few seconds to share your opinion Thank you very much
So we have time for a couple of questions, I see their hands Thank you for the talk I was wondering is there any easy way in C profiler to give kind of like a person's text So there was total time there was cumulative time. Is there anything where you just give a parameter so that
One line or one allocation took let's say 80% of the total time something like that Just not to calculate the whole thing. Yes, so that might be not so visible in the examples
but there is total time of all the calls or total time of or time per call for the function you can also Give a parameter to C profile to sort the results by specific column Okay, so we can sort for by total time or time per call and also one short question
So in terms of that slot the your example didn't allocate anything to a particular slot So let's say if you allocate a hundred megs or like read a file Of the same size with slot and without slot. Do you think will make a difference? Yeah, definitely because
It's it's always good to profile the specific case because that depends what you're putting in the in this variable but in general Object without slot is over allocated so it will just allocate some some memory Right after the object to have some space to some dynamic assignments
Thank you Who's okay?
I've got two questions actually. The first one is are there any graphic tools for the for the profiling libraries you've shown? Yes, there are graphic tools. I didn't list them because I wanted simplest tool which will be most accurate but there are many tools you can just google it and
Yeah, there are a lot of them. I checked like week or two weeks ago There were a lot of them. All right. Thanks. And the second question is Since the profiling you've shown us is not really reliable and not hundred percent reliable Should we rerun it multiple times to get a better?
Understanding of how the how the application performs. Yeah. Yeah when I run the memory profiler a few times in the loop, for example It gives it stabilizes with with the time So first run is like a bit inaccurate But then it's stabilized and gives at least the total memory usage on the stable level then increment also
Stabilized like we see we saw in the example with when I run three times the same function with with memory profiler Total total memory for the process has stabilized in the second and third
line but Increment still did not stabilize but it will stabilize at some point Because it probes the memory
By inserting the the probes into the code Okay, we have to stop here. Thank you very much for a talk