
Everything You Always Wanted to Know About Memory in Python But Were Afraid to Ask


Formal Metadata

Title
Everything You Always Wanted to Know About Memory in Python But Were Afraid to Ask
Title of Series
Part Number
95
Number of Parts
119
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Place
Berlin

Content Metadata

Subject Area
Genre
Abstract
Piotr Przymus - Everything You Always Wanted to Know About Memory in Python But Were Afraid to Ask Have you ever wondered what happens to all the precious RAM after running your 'simple' CPython code? Prepare yourself for a short introduction to CPython memory management! This presentation will try to answer some memory related questions you always wondered about. It will also discuss basic memory profiling tools and techniques. ----- This talk will cover basics of CPython memory usage. It will start with basics like objects and data structures representation. Then advanced memory management aspects, such as sharing, segmentation, preallocation or caching, will be discussed. Finally, memory profiling tools will be presented.
Keywords
Transcript: English(auto-generated)
Now we have a talk from Piotr, Piotr Przymus, a difficult name, sorry Piotr.
Everything you always wanted to know about memory in Python but were afraid to ask. Have fun, and I give the word to Piotr. Thank you very much.
Thank you very much. At the beginning I have to give you some bad news: this talk won't cover all the topics about Python memory, because the subject is too complex and I had to choose something. But I will try to do my best, and I will probably run out of time.
So if you have any questions, please catch me during lunch, because after lunch I'm going back home. Okay, a few words about me. I'm still a PhD student and I work as a research assistant at Nicolaus Copernicus University. My main scientific interests are databases and GPU computing.
I try to combine these two, and I have also done some work with data mining. I have at least eight years of Python experience and have done several projects with Python; I list three of them here.
I was responsible for preparing parts of a trading platform for an asset management company, mostly backtesting and trading algorithms. I was also responsible for preparing muscle biomonitoring analysis and data mining software
for laboratory use, which we are now thinking about commercializing. And for my PhD thesis I prepared a simulator of a heterogeneous processing environment for evaluating database query scheduling algorithms. I mention these projects on purpose, because all of them had something in common:
they were all memory-intensive and long-running, and during these computations their memory usage tended to grow. At some point I decided that I had to learn more about how Python manages memory, what the sizes of the different types are,
and what the strategies for allocating the different containers are. That is what the first two sections cover, and later I will say a few things about memory profiling tools. So let's start.
Here is some basic stuff. I have a C and C++ background and I also teach these languages to students, and after the first two months or so they already know the sizes of the different types in C or C++. In Python this knowledge isn't required of you,
so you don't really have to care about sizes, up to a point. But when your application is large enough and allocates a lot of memory, you start to wonder what the sizes of the different Python types are. I won't go into detail on this table;
I'll just point out some interesting things. One is that long in Python 2 and int in Python 3 are limited only by your memory, so as long as you have enough memory, you can allocate an arbitrarily large number.
Note also that these types are pretty large compared to C or C++, because there is the overhead of the garbage collector and other bookkeeping.
It is also worth noting how strings and unicode are represented in memory: there is a fairly large header, and you pay two bytes for each element of your string or unicode. The same goes for tuples, where the header is even larger
and you pay four or eight bytes for each element, depending on the platform. You can check this yourself: in Python you have sys.getsizeof, you can pass anything to it and you'll get the size in bytes, with some restrictions.
All the built-in objects will return correct results, but for third-party libraries it might return some crazy stuff, so be aware of that. Under the hood it calls the object's __sizeof__ method and adds the garbage collector overhead
if the object is managed by the garbage collector. So let's do something more interesting. Here is a fun example, because creating lists is fun. We create two lists of the same size:
in one, the numbers come from this formula, and in the other, exactly the same numbers plus something more. Do you think there will be any difference in the memory allocated for listing one and listing two?
Since this is a fun example, yes, there is a difference: as you can see, the size of the first list is actually less than half of the second one. This is because of object interning. So what's that?
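The kind of measurement described here can be sketched with sys.getsizeof plus an object count (a hypothetical reconstruction, not the exact listings from the slides; the sharing behavior is CPython-specific):

```python
import sys

# Two lists of equal length: small values hit CPython's integer cache,
# large values produce a fresh object for every element.
small = [i % 7 for i in range(1000)]
large = [i % 7 + 10**10 for i in range(1000)]

# The lists' own pointer arrays are the same size...
print(sys.getsizeof(small) == sys.getsizeof(large))   # True

# ...but the number of distinct objects behind the pointers differs wildly:
print(len({id(x) for x in small}))   # 7: shared, cached small ints
print(len({id(x) for x in large}))   # 1000: one fresh object per element
```

So the real memory cost hides behind the pointers, which is exactly why sys.getsizeof alone understates the difference between the two listings.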
In Python there is a general rule for creating objects: when we create an object and assign it to a variable, the object is created once and the variable just points to it; variables themselves do not hold the memory. Interning of objects is an exception to this rule.
It exists mainly for performance optimizations and it is highly implementation-specific. All the examples in this presentation are from CPython, they may change over time,
and there has already been at least one change in CPython's object interning behavior. So what is object interning? Often-used objects are preallocated, so every time we write a = 0,
the zero isn't created again; one shared object is used everywhere. Here is code that shows this: we assign zero to a and to b, and a is b returns True,
and of course the values are equal too. In the next example we assign a large number, a is b returns False, and the values are of course still equal. Someone actually showed me an amusing quiz about this two days ago
with a similar question, but again, this is highly implementation-specific. So let's say a bit more about object interning. First of all, a warning, and I will say this more than once: this is implementation-dependent.
It may change in the future and probably will, and it is not documented in the Python documentation for programmers; if you want a reference for these values, you have to consult the source code. So, in CPython 2.7 to 3.4
we have object interning for integers in the range -5 to 256. We also have interning for strings and unicode in Python 2 and Python 3: the empty string is interned,
as are all strings of length one, with the restriction that for unicode only the Latin-1 characters are interned. The empty tuple is another example of an object that is shared.
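A minimal sketch of these rules (CPython-specific; the objects are constructed at runtime from strings, because identical literals inside one code object can be shared by constant folding, which would mask the effect):

```python
# Small integers in the cached range [-5, 256] are shared objects;
# constructing them at runtime still returns the cached instance.
a, b = int("256"), int("256")
c, d = int("257"), int("257")
print(a is b)   # True: one shared object
print(c is d)   # False: two distinct objects with equal values

# Single-character Latin-1 strings and the empty tuple are shared too:
print(chr(97) is chr(97))       # True: cached one-character string
print(tuple([]) is tuple([]))   # True: the empty-tuple singleton
```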
Okay, now something a little different, but still interning: string interning. We start with a simple example. We create two strings that are almost the same, then append the missing letter to the first one and test whether they are the same object,
and we get False, of course. But if we use intern, this is Python 2 here, and try the same thing, we get True. So let's use it for something evil. We create a large list of such strings,
and as we can see we use 57 megabytes of resident memory; if we do the same with intern, we actually reduce the memory usage. But what is really happening when we use intern?
String interning, this is almost the Wikipedia definition, is a method of storing only one copy of each distinct string, with the caveat that the strings should be immutable. In Python 2 we have the built-in function intern,
and in Python 3 it was moved into the sys module, so we have sys.intern. If we call this function, the string is entered into the table of interned strings and we get back a reference to the interned string. That might be the same string we passed in, if it was already interned,
or it might be a copy of it. When is this useful? The documentation tells us: we gain a little performance on dictionary lookups, and some Python names are automatically interned; in particular, the dictionaries that hold module, class,
and instance attributes have interned keys. And, as in the previous example, we can reduce the space used if we have a lot of identical strings in our code.
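A small sketch of the mechanics in Python 3 (in Python 2, intern is a builtin). Strings built at runtime are not interned automatically, so equal strings can be distinct objects until you intern them:

```python
import sys

# Build two equal strings at runtime so they are NOT automatically interned:
a = "-".join(["memory", "profiling"])
b = "-".join(["memory", "profiling"])
print(a == b)   # True: equal values
print(a is b)   # False: two separate copies in memory

# sys.intern enters the string into the interned-strings table and
# returns the canonical copy, so duplicates can be collapsed:
print(sys.intern(a) is sys.intern(b))   # True
```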
Okay. Let's say something more about mutable containers. There are different mutable containers in Python, lists, dictionaries and sets, and behind the scenes there is a strategy for allocating them.
A good strategy tries to prepare for growth and shrinkage. To prepare for growth, the container slightly over-allocates memory, so that each time we append an element to a list we don't have to reallocate memory from the system;
we leave room for growth. We also have to remember to shrink the allocated memory for a mutable container sometimes. This reduces the number of expensive function calls like realloc and memcpy, and of course an optimal layout is used for performance reasons.
Let's start with a very, very simple example: a list. The first time we put an element into the list, we get an allocation, but not for one element: for four elements. After that, if we append or change something in the list,
the appends are free in terms of memory operations: we can put in another element, and another, and another. When we put in the fifth element, Python has to realloc: it reallocates the array backing our list
and again leaves room for several more elements. How exactly does this work? Lists in Python are represented as a fixed-length array of pointers, which just point to the objects, and by design the array is over-allocated.
At the beginning the over-allocation is proportionally large, but for big lists it drops below this percentage. Okay, some performance considerations due to the memory operations involved when using lists:
when we append at the end of the list, the operation is cheap, but if we insert in the middle or at the beginning, the memory has to be copied or shifted to perform the operation. It is also important to note that for one-, two-, and five-element lists
we waste a lot of space, so if we have a large number of small lists, we over-allocate for many more elements than we use. Here is the overhead of the allocated array, the price you pay per element on different architectures.
And the shrinkage of a list happens when the number of elements in use drops below half of the allocated space. Okay, let's talk about allocation for dictionaries and sets.
It's pretty similar, but here we over-allocate when we reach two-thirds of the capacity of the dictionary or set. For small dictionaries and sets the capacity is quadrupled,
and when the dictionary or set is big enough it is only doubled, so as not to exhaust memory. The actual used size for the object is then calculated and the memory allocated, and the shrinkage of a dictionary or set happens when a large number of keys is removed from it.
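Both growth strategies can be observed with sys.getsizeof: the reported size stays flat across several insertions and then jumps at each over-allocation (a CPython-specific sketch; the exact step sizes differ between versions):

```python
import sys

# Watch a list's allocated size as elements are appended: the size
# stays constant for several appends, then jumps at each realloc.
lst, lst_sizes = [], []
for i in range(32):
    lst.append(i)
    lst_sizes.append(sys.getsizeof(lst))
print(lst_sizes)

# Same idea for a dict: it is resized when it gets too full.
d, dict_sizes = {}, []
for i in range(32):
    d[i] = None
    dict_sizes.append(sys.getsizeof(d))
print(dict_sizes)

# Far fewer distinct sizes than insertions shows over-allocation at work:
print(len(set(lst_sizes)), len(set(dict_sizes)))
```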
Okay, another example. We can represent the same data in different ways: we can use an old-style class, a new-style class, a class with __slots__, namedtuples, tuples, lists, and dictionaries.
I recreated an example from PyCon 2010 for current versions of Python, but with more objects and more fields, so you can see how much memory they take for storing the same data
just by defining different types. As you can see, with some restrictions, because when you put __slots__ into your class you accept a lot of restrictions on that class, you can get a real memory reduction: those classes use noticeably less memory.
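A minimal sketch of the __slots__ trade-off (the class and attribute names are mine, and the exact byte counts vary across CPython versions):

```python
import sys

class Point:                      # ordinary class: per-instance __dict__
    def __init__(self, x, y):
        self.x, self.y = x, y

class SlottedPoint:               # __slots__: fixed attribute layout,
    __slots__ = ("x", "y")        # no per-instance __dict__
    def __init__(self, x, y):
        self.x, self.y = x, y

p, s = Point(1, 2), SlottedPoint(1, 2)

print(hasattr(p, "__dict__"))     # True: a dict is allocated per instance
print(hasattr(s, "__dict__"))     # False: one of the restrictions you accept

# The slotted instance avoids the dict's footprint entirely:
print(sys.getsizeof(s) < sys.getsizeof(p) + sys.getsizeof(p.__dict__))  # True
```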
Some notes on the garbage collector and reference counting in Python. As you probably all know, Python has a garbage collector, and it collects objects when their reference count drops to zero. Some operations increment the reference counter
and some operations decrement it. But there is a warning in the official documentation: if you override the __del__ method you can run into problems, because while Python's garbage collector can deal with cycles in object references,
when the objects in a cycle define __del__ methods, Python cannot guess a correct order in which to run them, so the cycle won't be deallocated from your memory.
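A sketch of the cycle collector at work, using a weakref to observe when the cycle is actually reclaimed. Note that the __del__ caveat above applies to Python 2; since Python 3.4 (PEP 442), cycles whose objects define __del__ are collected as well:

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.partner = None

gc.disable()                     # keep the demonstration deterministic

a, b = Node(), Node()
a.partner, b.partner = b, a      # a reference cycle: refcounts never reach zero
probe = weakref.ref(a)

del a, b                         # our references are gone, the cycle lives on
print(probe() is None)           # False: refcounting alone cannot free it

gc.collect()                     # the cycle detector finds and frees the cycle
print(probe() is None)           # True

gc.enable()
```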
Okay, I have some more time, so I'll talk about some tools you can use for Python memory profiling. Let's start with psutil. It's pretty simple:
it's a cross-platform API for system utilities. To get information about the current process's memory, you can just create a psutil Process object for your process, query its memory information,
and turn that into a dictionary, and then return those simple values. For most of the examples I use code like this, because it is the most reliable for the purpose. Another tool is memory_profiler,
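The psutil check just described might look roughly like this (psutil.Process().memory_info() is the documented psutil call; the helper name and the stdlib resource fallback, which reports peak rather than current usage, are my additions):

```python
import os
import sys

def rss_bytes():
    """Resident memory of the current process, in bytes."""
    try:
        import psutil                      # cross-platform, preferred
        return psutil.Process(os.getpid()).memory_info().rss
    except ImportError:
        import resource                    # Unix-only stdlib fallback
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is KiB on Linux but bytes on macOS, and it is the
        # peak RSS, not the current one, so this is a rough approximation.
        return peak * 1024 if sys.platform.startswith("linux") else peak

print(rss_bytes() // 1024, "KiB resident")
```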
and it recommends using psutil, so it's good to have psutil as a dependency; it will work faster. memory_profiler can work in three different modes: as a line-by-line profiler, as a memory usage monitor,
and as a debugger trigger. Let's start with the line-by-line profiler. You put a @profile decorator on the function you want to profile in your code, and then you run it with a command like this, with the name of your script.
You then get line-by-line memory usage, and the increment in memory usage for each line; here we see that the for loop is the main memory contributor. The second way to use memory_profiler is as a monitor of memory usage over time.
So we just monitor the process's memory usage over time, and you can use it with any type of process, not only Python. But if you want to use it with Python code, you put the @profile decorator
on the functions you want to track and run it with the Python option. Here I ran a simulation, and here is the result: I get a plot, with the connect function marked as the one doing the operations here,
so I can see that connect is probably responsible for the growth from here, and the run function marked there, which as we can see doesn't change our memory usage much. The third option for memory_profiler
is to use it as a debugger trigger: we set a threshold of used memory and run our process, and we are dropped into the debugger when we reach the memory we set as the threshold.
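The three modes map to command lines roughly like this (a usage sketch; my_script.py is a placeholder, and the exact flags depend on the memory_profiler version installed):

```shell
# line-by-line mode: decorate the target function with @profile, then
python -m memory_profiler my_script.py

# monitoring mode: sample the process's memory usage over time and plot it
mprof run my_script.py
mprof plot

# debugger mode: drop into pdb once memory usage crosses a threshold (in MiB)
python -m memory_profiler --pdb-mmem=100 my_script.py
```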
Another tool is objgraph. It's a nice tool for visualizing object references in Python, and for small projects it's pretty cool, because you get plots like this one. It's a good tool for finding reference cycles in your code,
but if your project is large, the generated plot will be huge and it will be hard to track anything in it. Still, with some code manipulation, all of which is in the objgraph tutorial, you can track down object reference cycles
quite easily with it. Okay, the next two tools will probably be covered more in the third talk of this session: Heapy and Meliae. They are heap analysis tools. They are quite similar, with some differences
that are described well on the project pages. So let's see what we can do with them. We can run some code and take a heap snapshot here, do some more memory-intensive operations, and take another heap snapshot.
Then we can do some arithmetic on those heaps and get results like these: we see that we allocated a lot of integers and lists with this one operation. Another option is the combination of Meliae and RunSnakeRun:
you can use Meliae to dump all the objects in your code, then open the memory dump with runsnakemem and get an interactive plot like this, where you can zoom in and out to see
how the memory is allocated for the different objects. Okay, this is almost the end of my talk. You can also use different malloc implementations with Python.
It's pretty easy, and you will find many blog entries about using different memory allocators and the problems involved. With very little change to your code you can sometimes gain, for example,
better return of process memory to the system. But it can also work against you; it depends on your application type. If you want to use a different malloc implementation,
you of course have to install the libraries, and then you run Python with LD_PRELOAD set to the path of the library you want to use, and you can get different results. I prepared some small tests: my code ran several steps,
and I used malloc, jemalloc, and tcmalloc with the same code, and you can see how the memory changes in the different cases. As you can see, plain malloc is actually pretty good now with Python; a few years ago there were some problems, but now it works pretty well.
With jemalloc you get pretty much the same results, and for different applications you might gain something. With tcmalloc, for this example, you actually end up with a little more memory allocated and not returned to the system. But again, this depends on the type
of your application. Some other useful tools: you can always build Python in debug mode; you can use Valgrind with Python, which cooperates with it pretty well; and you can use the experimental extension for gdb.
And for most web developers, one of these tools is probably more convenient, because it is a WSGI middleware version of the CherryPy-based memory tool, so you can just plug it into your WSGI stack and get some memory profiling.
So, to summarize: try to understand the underlying memory model better, pay attention to hotspots, and use profiling tools. Seek and destroy, that is actually the hardest part: try to find the root cause and fix the memory leak. The next talk will probably be about this.
There are also some quick, sometimes dirty, solutions: you can delegate memory-intensive operations to another process, process the data there, collect the results, and then kill or stop that process; and you can restart a process
if it generates too much memory overhead. You can also go for low-hanging fruit, like __slots__, or try different memory allocators. Here are some great references that I used when preparing this presentation, so give them a try. Some of them are outdated, like this one, but they give great insight into Python memory.
So thank you very much. Okay, I think we have some time for questions.
So please come to the microphone over there, near the video team, if you have questions. Hello, I've sometimes experienced that I created many objects in Python, then removed all the references to them and even forced a garbage collection run, but the system memory still wasn't freed.
Does it just take some time, or why does Python sometimes not free memory? It actually depends on the version of your Python interpreter, that's one thing, and secondly, sometimes the memory allocator has problems returning the memory. It's a little more complicated, but you can try the different memory allocation libraries
that I showed and see whether that helps with your problem. Okay, thanks. Yeah, of course. Do you have any hints? You showed Heapy and so on, which I already knew. Do you have any hints on how to debug off-heap memory problems?
What I've sometimes experienced using psycopg2 is that the whole process was using something like four gigabytes of memory, but Heapy only showed very little heap memory, so I guess it was related to something outside the heap. So the question is?
The question is: do you have any hints on how to debug off-heap memory problems like this? I don't know if you have tried the debug version of Python; compile it in debug mode and then you can see the objects that weren't deallocated by the garbage collector,
and you can use Valgrind if you want to go low-level. We can talk about it in a moment. Okay. Thank you very much. And now...