
Is it me or Python memory management?


Formal Metadata

Title
Is it me or Python memory management?
Number of Parts
131
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract
Have you ever wondered if Python memory management is playing tricks on you? Starting small, everything runs smoothly. But as your application scales, complexity grows, and memory issues rear their heads. You ask yourself, "Is it me or Python memory management?" In this talk, we'll show you how Python memory works, provide tools to analyze memory usage, and share practical optimization tips. Whether you're a seasoned Python developer or just starting on your Python journey, this talk is designed to provide you with techniques to overcome Python memory management challenges and write more efficient, memory-conscious code.
Transcript: English (auto-generated)
Thank you, everyone. So, is it me or Python memory management? Today we're going to talk about Python memory management, as has already come up three times today. Sometimes we build our application, we run it, and everything goes well.
And after some time, we start to see performance decrease, and then we want to understand the causes. Today we're going to dive into some of these issues and try to get some answers. My name is Liza Shoa, and I'm here with Yulia Badabash.
We are both cloud engineers at Nord Cloud. We work with different cloud providers and build cloud solutions, and in our daily life we work a lot with Python. Sometimes we face issues related to our code's performance. So let's talk about performance.
But before we dive into the computing world of performance, let's try to understand what performance means in the human world. I have here a fictional character called Bob (I hope there's no Bob out there; it's not you), and Bob wants to be an athlete. There are some things that could affect Bob's
performance. Although Bob knows the technique and knows how to swim, there is stuff in the external environment that could affect him. For example, maybe he can't sleep because he has a talk at EuroPython. Maybe the weather is bad today and could affect his health.
Maybe he's low on vitamin D, and maybe he's not sticking to his diet. The point is that he's not taking care of his hardware, his body, and that could hurt his performance, even though his technique during execution is fine.
A lot of the time when we talk about software, and when we talk about Python (there will be great talks today and through the end of this conference about optimization in Python), we try to shave off microseconds and find those little bits of time to save. But we should not forget that there is no software
without hardware. Python lives inside our laptops and computers, and performance depends on the hardware. So let's try to understand some of the components of hardware, specifically the communication between computing units and memory units.
First of all, computing units: the units responsible for executing operations. In interviews, people often show you some code and ask you to say which version is faster just by reading the instructions. But speed also depends on how the computing unit executes those instructions. One type of computing unit is the CPU, known as the brain of the PC.
It is very good at serial processing, has low latency, and can do a handful of operations at the same time with a couple of cores. But right now there is also a big trend toward the GPU, the Graphics Processing Unit, which
offers high throughput and is very good at parallel processing. You'll see GPUs used a lot in machine learning; libraries like TensorFlow and PyTorch all support them. A GPU has many cores and can do thousands
of operations at the same time, although those operations are simpler. So if the CPU is the brain, I'd say the GPU is the soul of the PC. We talked about processing, but we have to put the data somewhere.
So let's talk about storage. Among the components of hardware, the memory units are the types of storage: they are used for storing data and information, they are characterized by latency and architecture, and examples are HDD, SSD, RAM, and cache.
The first ones, HDD and SSD, are long-term storage. Imagine you run your Python program and modify a file; tomorrow you want that file to still be modified, so you're definitely doing that on this type of storage. They are slower at reads and writes,
much slower than RAM. And RAM, as we will see, is slower than cache. Then we have cache: L1, L2, and L3. It stores the data most frequently used by the CPU, so if you use some data a lot, it ends up in the cache. It is fast but small compared to RAM,
and it often stores the most frequently executed instructions. There is also the concept of cache hits and misses. The CPU goes and looks in the cache; if the data is there, that's a hit. If it's not, the CPU either has to compute it or look in a slower storage unit,
and that's a miss. The execution of your program can depend on this: a lot of misses means it ends up less efficient. One of these memory units is quite special for us, and it
is the RAM. The RAM is short-term memory: it stores the data and objects currently used by the program. It is special for us because it's where the Python private heap space is located; we're going to dive into this concept. If we look at how the CPU reaches the RAM,
we have the CPU, the cache in between, and then the RAM. As we move along this line, things get slower to retrieve, so it's not efficient to always go to RAM. Now let's move into the Python world and find the Python abstraction layer
over the hardware. Here we have the RAM, and the RAM is special, as I said, because it holds the Python private heap space. The name "private" means that Python manages this heap space, not you; some other programming languages let you do that manipulation yourself and chase performance
gains there. The part of memory managed by Python is the private heap space. You can think of it as the place where things get saved while Python runs. As we know, in Python everything is an object, and I mean that.
We have lists, dictionaries, and sets, and they are all objects stored in the private heap space. And here we have two keywords: Python's memory manager and the garbage collector. The garbage collector
will clean things up for you and remove objects that are no longer used, and the memory manager will allocate space in the heap. In between Python and the RAM, the memory manager is like a friendly robot doing this cleanup for us.
You can think about the memory manager like this: I live in a small apartment, so I would love a robot that puts the things I want in the right places, and once it notices that I'm no longer using something, invokes the garbage collector to
remove it and free up that space for something new. That's how you can think about memory management. The memory manager works inside the private heap space together with the garbage collector. Yulia will talk more about the garbage collector, but the memory manager basically allocates and frees space
depending on the state of the application. So there are a lot of things related to hardware that we learned today; this is just a recap of the first part. The most important thing is that code is not isolated: it runs on a machine, and these things are also related to performance.
The main components we saw were computing units and memory units, how they communicate, and some of the types of memory. Also, all objects are stored in the private heap space, and the private heap space is managed by our little friend, the memory manager. So this is it for the first part,
and Yulia will talk more about some of these concepts. Thank you. So, as Liza already mentioned, that is what goes on outside Python and how it can influence our performance. Now let's talk about what actually goes on inside Python: how Python interacts with our RAM
and what helps it do this. As we know, Python is a high-level programming language, which means it takes on the overhead of managing memory for us, allocating memory and freeing memory. Basically, this is done based on two strategies.
The first strategy is reference counting. Maybe a lot of you have already heard of this, because the term comes up very often. Another term that everyone knows is the garbage collector. But why are these things important? First of all, the reference count is basically a field that records
how many different places reference our object, and whether it is in use right now. Importantly, it is a property of the object itself; that's why in Python everything is an object. We have object representations for code, functions, and numbers, and also for container data types
such as list, set, tuple, and so on. When you execute your program, Python parses your code and keeps classes, functions, methods, and every constant as objects in the private heap space.
It will recall them and use them when needed. But why is this important? If we have a look at CPython, we have a representation, and at the top we have one important thing: PyObject. PyObject is the base from which all objects inherit
two fields: the reference count (ob_refcnt) and the type (ob_type). Based on the values of these fields,
Python knows what to do. For example, PyFloatObject, PyModuleObject, PyLongObject, and PySetObject all inherit these properties. And here we can also see PyVarObject, which is used for objects with a length. For example, for lists and tuples, if you want to know the length,
you can call the len() function; it basically reads this size field (ob_size), which PyListObject and PyTupleObject inherit from PyVarObject. These objects are used for keeping other objects, and for lists the size can change during execution.
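As a small, hedged sketch of the fields just described (assuming CPython): ob_refcnt can be observed indirectly with sys.getrefcount(), which reports one extra reference for the argument it receives, and len() reads the size of variable-sized objects.

```python
import sys

x = []                       # a new list object on the private heap
r1 = sys.getrefcount(x)      # includes one extra ref: the call argument

y = x                        # a second label for the same object
r2 = sys.getrefcount(x)

print(r2 - r1)               # 1: one more reference than before

# len() of a variable-sized object reads its size field (ob_size)
print(len((1, 2, 3)))        # 3
```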
Let's look at this more concretely. Here we have a small example with a statement that assigns a float to our variable a. What basically happens inside Python? The object will be represented
with three properties: value, type, and reference count. Value is the value it has. Type tells Python how to treat this object; for example, it knows it's a float, what methods it supports, and
how memory should be laid out for it, because for some types Python uses a fixed memory layout: depending on the type, it knows how much memory the object needs. And the reference count indicates that right now one reference points to
this object. These two fields, type and reference count, are inherited from PyObject. Also, Python does not pass raw memory pointers around; it uses variables, which basically serve as labels on the pointers, and through variables
we reference our objects in memory. For example, here we assign the same value to b, and a and b can end up referencing the same object, even though they were assigned in different statements, because it already exists in memory.
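Whether two names really end up pointing at the same object is a CPython implementation detail. One case where it reliably happens is the small-integer cache (roughly -5 to 256); a rough sketch:

```python
# CPython caches small integers, so separately created equal values
# share one object; larger integers generally get fresh objects.
# int("...") is used so the values are built at runtime, avoiding
# compile-time constant folding.
a = int("7")
b = int("7")
print(a is b)          # True: both names label the cached int 7

big_a = int("1000")
big_b = int("1000")
print(big_a is big_b)  # False: two distinct objects with equal value
```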
We also see that the reference count increases. But let's have a closer look at what goes on inside memory during execution. Say we have two statements: 3 assigned to a, and then 6 assigned to a.
During execution, when we assign 3 to a, Python creates the object in memory: value 3, type int, reference count 1. When it executes the next statement, it does not change the old object; instead, it creates a new object with value 6, type int, reference count 1. For the old object, we
see that the reference count decreases, because it's no longer in use. With immutable data types, Python will not modify the object in place; instead it creates a new one and abandons the old one. Now, what about a more complicated, mutable object? For example, a list is a mutable object, and it is represented by PyListObject.
A list is represented with the following fields. It has the reference count, which indicates that one variable references our object. Size (ob_size) is how many items are in the list. And allocated is how much memory we have reserved for the list.
A list can allocate more memory than it currently needs; this is for optimization purposes, so that Python does not have to reallocate memory on every append: at some point it allocates extra capacity. What is also important is the vector of pointers (ob_item). It holds the address of the memory
where we keep an array of pointers, which point to the objects included in the list. That's why we can have ints, floats, object instances, and a lot of different stuff in one list: it's possible because,
at the low level, we don't keep the objects themselves in the array, we keep just pointers to them. For example, here we keep pointers to the integer 1 and the integer 2. This gives us great possibilities, but sometimes it can also bite us.
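Both behaviours, over-allocation and the pointer array, can be glimpsed from pure Python. A sketch assuming CPython's sys.getsizeof accounting:

```python
import sys

# Over-allocation: the list's reported size grows in jumps,
# not on every single append, because spare slots are reserved.
sizes = []
lst = []
for i in range(10):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))
print(sizes)  # several consecutive appends share the same size

# Pointer array: two slots can hold the very same object.
item = [1, 2]
pair = [item, item]
print(pair[0] is pair[1])  # True: both slots store the same pointer
```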
For example, take our example again: a list of 1 and 2 is assigned to b. Again, we create an object of type list in our memory, and we have this picture in memory. After that, we append the list to itself.
We can see that the list's reference count increased to two, because now the list references itself, and the size increased as well. So right now we have two references to our list: from b and from itself. After some execution, we realize we don't need our list anymore,
and we decide to remove our reference to it. We might think it has been removed. But actually no: the reference count just decreased by one, and there is still a reference in our code, because the list references itself. This situation
is called a cycle of references, and it can cause problems with memory. Zero is the important value for the garbage collector: it relies on reference counts, and when a count reaches zero,
it means the object can be collected. In a cycle, the reference count never reaches zero, because the self-reference doesn't give it the possibility to. So we can end up with a memory state like this: the reference count is zero
for the obsolete int object, and we still have the new int object, which is fine for us. But we also have the old list, which, after this part of the execution, we don't need anymore.
So what do we actually do with these old objects? What happens to them? Basically, the garbage collector takes care of it. The garbage collector runs alongside your application and takes care of our memory.
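A minimal sketch of the reference cycle just described, and of the garbage collector cleaning it up:

```python
import gc

gc.collect()            # start from a clean slate

b = [1, 2]
b.append(b)             # the list now references itself: a cycle
del b                   # drop our only external reference...
# ...refcounting alone cannot free the list, because the cycle
# keeps its reference count above zero.

freed = gc.collect()    # the cycle detector finds and frees it
print(freed >= 1)       # True: at least the cyclic list was collected
```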
For example, it collects your old objects and also solves the problem of circular references. And we can interact with it: we can trigger a collection,
enable it or disable it, and set thresholds if, for example, you want it to run more often. For this you use the gc module. With gc it's also possible to check which objects refer to your object, which can be used for debugging.
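A few of those gc controls, sketched below; the threshold values here are arbitrary examples, not recommendations:

```python
import gc

print(gc.isenabled())       # automatic collection is on by default
print(gc.get_threshold())   # allocation thresholds per generation

gc.disable()                # pause automatic collection...
gc.enable()                 # ...and resume it

gc.set_threshold(1000, 15, 15)   # example: collect generation 0 less often

x = {"key": "value"}
container = [x]
# get_referrers lists the objects that reference x (a debugging aid):
# the container list above is among them.
print(any(r is container for r in gc.get_referrers(x)))  # True
```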
But as we know, the garbage collector can also be quite expensive, because it keeps checking all the objects we have. That's why the garbage collector has so-called generations. A generation basically represents a disjoint set,
a disjoint list, of objects. In Python we have generation 0, generation 1, and generation 2. The idea is that the generation your object is in determines how often the garbage collector checks
whether it needs to collect it and release its memory because it's no longer in use. Generation 0 is where all newly created objects start; this generation keeps the objects we expect will not stay long in memory.
For example, we call a function with some parameters; after execution we don't need those parameters or the objects created inside. To keep the program fast, these kinds of temporary, short-lived objects stay in generation 0.
If they survive some time in generation 0 and some garbage collection cycles, they are promoted to generation 1. There is no big difference in generation 1; the difference is just that the garbage collector checks these objects less often, because it understands
that these objects may be retained for a longer time, so there is less point in checking them or reclaiming their memory eagerly. And finally we have generation 2, which holds the objects that survive generation 1.
These are considered long-lived objects, for example global variables that never leave your program: you never need to free their memory, and the garbage collector does not need to check them constantly.
That's why generation 2 mostly keeps the long-lived objects that we don't need to check so often. Now, back to our memory state: we have this nice picture with the list type, and our old list that we don't need anymore.
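The per-generation bookkeeping described above can be inspected from the gc module. A hedged sketch (the exact numbers vary by CPython version):

```python
import gc

print(gc.get_threshold())   # how often each generation is scanned
print(gc.get_count())       # tracked allocations per generation right now

junk = [[i] for i in range(1000)]   # many brand-new generation-0 objects
print(gc.get_count())               # the generation-0 count has grown

gc.collect(0)   # explicitly collect only the youngest generation
```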
We also have our old int that we don't need anymore, and the new int. When the garbage collector runs, it sees the reference count of 0
and knows the old int is no longer in use: there are no references, so it removes it. A reference count of 0 means we don't need it anymore, and it's removed. But we still have the list, and even though we don't use it and we understand that nothing of ours refers to it, it still has a nonzero reference count.
That's why Python has another strategy, called mark and sweep. What does it mean? At this point, the garbage collector collects all global variables and all local variables, and it uses them to determine which objects are reachable and which are not.
Some objects are reachable from our program and some are not. The collector uses the global and local variables as roots and follows the references inside the Python private heap, marking objects as reachable.
After going through all these references, it knows which objects are reachable and which are not. Reachable means we can reach the object from a global or local variable. If an object is not reachable, it just removes it.
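The reachability behaviour can be observed with a weak reference, which lets us watch an object die without keeping it alive. A sketch, assuming CPython; the Node class is a made-up example:

```python
import gc
import weakref

class Node:
    """Hypothetical example object that can reference itself."""

n = Node()
n.self_ref = n          # a cycle: the object references itself
probe = weakref.ref(n)  # observe the object without owning a reference

del n                   # no global or local variable reaches it now
print(probe() is not None)  # True: the cycle keeps its refcount above zero

gc.collect()            # not reachable from any root, so it is freed
print(probe() is None)  # True: the cycle was detected and collected
```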
Not reachable means no global or local variable leads to the object through any chain of references. And we can see that this strategy helps remove our obsolete list as well, because no variable reaches that object. So, to recap what to keep in mind about Python:
when you work with Python, this is maybe just a first step toward using it better, but it gives you a better understanding. You don't only need to think about how you write your code line by line; you can also think about its execution.
Remember that everything is an object in Python. Even if you have obsolete code that you think you'll never use and that won't consume memory, it will consume memory, at least RAM, because your functions and classes are kept inside Python as objects. Objects are stored in the private heap. And who manages this private heap? The memory manager, a part of CPython
with abstraction layers that help manage memory. The memory manager relies on reference counting and the garbage collector to reclaim memory. If you want to learn more, some good sources
are the memory management documentation for Python, CPython internals, High Performance Python, and also good talks about immortal objects and pointers in Python. Thank you for your attention. Wow.
So thank you very much for these beautiful insights into Python memory management. I think the last time I heard about heap management or heap memory was when I did embedded programming, so quite interesting. I've not seen questions on Discord, but we are now entering the Q&A session, so feel free to come to the microphone
and ask your questions to the speakers, or you can talk to us after, of course. But we have the option of asking the speakers now. And if there are no questions from the audience, I have at least one question.
I remember that when you compile C code, there will be some optimization. So in Python, can I expect that the code will run the same way, or that memory will be managed the same way every time? Or are there some optimization mechanisms?
No, not always. Can you repeat your question? So, are there any optimization methods, just-in-time, in memory? Can I expect that the code, or the memory, will be managed the same way every time, or is there some optimization behind the scenes?
I think it's a good question. (Please, some speakers still have a Q&A session, and it's impolite not to be quiet; when you want to leave the room, please do it quietly.)
Thank you. So, one of the most important things to understand when you work with Python is, as I mentioned, that the PyObject type is used to decide how to allocate memory for your object. That's why, if you check a lot of articles,
they mention that you need to understand data structures: depending on the data structure and how you interact with it, Python allocates memory for it. For example, tuples are better suited for objects that will not change over the execution, because Python knows it doesn't need to reallocate memory;
that's why a tuple can be better than a list in that case. A list is more flexible, because Python knows you will insert values, and it takes that into account. So when you're writing your code, check your data structures. Check the cleanliness of your code, and don't keep unnecessary global variables, because global variables always
retain memory. It's also good to use __slots__. What else? And always do memory profiling before optimizing. Thank you very much. And no speaker leaves the stage without cookies; I hope you love cookies.
So here are the cookies for the speaker. Thank you very much for your time.
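To close, the speakers' final tips (prefer tuples for fixed data, use __slots__, and profile before optimizing) can be sketched in a few lines; the class names here are made-up examples:

```python
import sys
import tracemalloc

# Tuples are leaner than lists for data that never changes.
print(sys.getsizeof((1, 2, 3)) < sys.getsizeof([1, 2, 3]))  # True

# __slots__ drops the per-instance __dict__, saving memory per object.
class PointSlots:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

p = PointSlots(1, 2)
print(hasattr(p, "__dict__"))  # False: fixed slots instead of a dict

# Profile before optimizing: tracemalloc shows current and peak usage.
tracemalloc.start()
points = [PointSlots(i, i) for i in range(1000)]
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(current > 0)  # True: the allocations above were traced
```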