Fun with cPython memory allocator

Transcript
Hello everyone, today I want to talk a little bit about Python's memory, with some disclaimers first. I'll have code examples on my slides, and everything was executed on Ubuntu 12.04, 64-bit, with CPython 2.7. This is a somewhat older setup; it comes straight from my company's production environment. If you run this code on other systems, other architectures, or other Python implementations, you will get different results, and what I say might be invalid there. I'm not an expert in CPython internals. I have used Python for a long time, but I don't often dive into its guts, so at some level this talk is me not quite knowing what I'm doing. Everything I touch here is way more complicated than I have time to get into, so this is oversimplified, and I'm honestly not even sure everything here is true.
This is a case study, a report from the battlefield, written mostly to keep my sanity intact. I'm a web developer; I work for a company where we run long-lived worker processes: under our normal setup we want them to keep living for a long time. For some legacy reasons we have a couple of request handlers, and actually one resource that, if you hit it over HTTP, will crunch some numbers, generate reports, and show them to very important people. During this request maybe half a gigabyte of memory gets allocated just for this report. That's fine: our capacity planning includes it. We know from our business that this request happens only maybe two or three times a day, so it's OK if one process is temporarily holding this memory; other processes should be unharmed. But for some reason this half a gigabyte of memory is never released, which links to the question we had after the previous talk. For some time that was OK for us, we just restarted the processes periodically, but then we wanted to dig deeper, learn the reason, and improve our capacity planning, to
improve resource utilization and get rid of this problem. We just wanted this memory to be released and for our application to get back to its baseline. After spending many hours fruitlessly trying to pin down what it is that causes this, I reduced our code to something like this. We
have a function that allocates a list of roughly 100,000 strings, where each of these strings is roughly 5 kilobytes. After that we run a report over it, and we keep a small amount of memory for the gathered results, to be displayed as a summary later on. The code doesn't need the big list anymore; in the real code it was a local variable at the beginning of a function that would just go out of scope, but here we delete it explicitly, and afterwards we report what's left. The report function prints memory usage, the resident set size. Here's the output of this program: after allocating the big list of strings we have half a gigabyte of memory in use, and after the delete, which drops the reference count to zero, the memory stays exactly where it was. This is not the behavior I expected, and I like Python precisely because I don't have to deal with memory: I don't have to manually deallocate or even think about it, I just leave it to the interpreter and go on about my business, and my employer's business, the features they want implemented. At first sight I thought: well, maybe there is some kind of hidden cyclic dependency, maybe I just need to push the garbage collector to work a little harder. So I introduced forced garbage collection into this small program.
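The reduced test case can be sketched roughly like this; the function names and the /proc-based RSS reading are my own reconstruction (Linux-specific), not the speaker's exact code:

```python
import gc

def rss_kb():
    # Current resident set size in KB, read from /proc (Linux-specific)
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return -1

def report(label):
    print("%-18s RSS: %d KB" % (label, rss_kb()))

def run(count=100000, size=5000):
    report("start")
    # ~count strings of ~size bytes each: about half a gigabyte by default
    big = ["x" * size for _ in range(count)]
    report("after allocation")
    summary = len(big)   # only a small result is kept for later display
    del big              # drop the last reference to the big list
    gc.collect()         # forcing a collection does not help either
    report("after del")
    return summary
```

With the default sizes, on a setup like the one described in the talk, the RSS after `del` stays near its peak instead of dropping back.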
It didn't help at all; actually, importing the gc module caused the program to use even more memory. At this point I seriously started to question my own sanity. I needed a rest to gather my thoughts, and after resting I decided I needed some friends, someone or something to help me debug this. I was thinking: where is the memory leak? I must have a memory leak somewhere; this code is silently sending strings into outer space and never reducing its memory. So I tried to find a couple of friends on the internet, mostly in the form of tools I could use to debug memory usage, and I'll describe some of them. I found most of those tools unusable for a person who is in desperate need of knowing what's happening right now. If you have time to get to know them, they're kind of nice, but the documentation is mostly horribly convoluted, and the output of those programs is really complicated.
The only tool I found that I didn't need a PhD for, that worked for me and that I could understand, is Guppy, and its memory-related part called Heapy. You can find it on SourceForge, and you can also pip install it. It works as advertised: there is some documentation, there are some examples, but most importantly it is largely self-explanatory, so you don't actually need that much documentation. Since then I have also received an e-mail from Victor Stinner, who implemented a very nice module for Python 3 called tracemalloc. I haven't yet had time to test it on my code, but it looks genuinely helpful, and it is well documented.
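tracemalloc (in the standard library since Python 3.4) attributes allocations to source lines; a minimal use looks like this:

```python
import tracemalloc

tracemalloc.start()

# Allocate something noticeable so it shows up in the statistics
data = ["x" * 5000 for _ in range(1000)]

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)  # file and line, plus size and count of live allocations

tracemalloc.stop()
```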
So how does Guppy work, what does it do? You import hpy from guppy, you instantiate it, and then you request a snapshot of the heap. The result behaves like a list, so you can slice it, and if you print it, it displays a nice overview of what's happening in your program. I introduced this into my small case study to see what is happening, and whether I'm sane or not.
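The Heapy calls look roughly like this; guppy is a third-party package (Python 2 era; `guppy3` is the Python 3 port), so this sketch guards against it being absent:

```python
try:
    from guppy import hpy   # pip install guppy (or guppy3 on Python 3)
except ImportError:
    hpy = None               # tool not installed; skip the demonstration

if hpy is not None:
    h = hpy()
    heap = h.heap()   # snapshot of every object the interpreter knows about
    print(heap)       # prints a table: count and total size per type
    print(heap[0])    # the result is sliceable: drill into the biggest row
else:
    print("guppy/heapy not installed")
```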
And that's the output it gave me. You can see that after the allocation of the big list we have half a gigabyte represented as strings; you can see it as the bold red number there. The count is roughly what we expected, around 100,000 and a bit more; the rest is just a CPython implementation detail, it keeps a lot of strings in memory on its own. After deallocating, we can see that roughly 100,000 strings go away and the memory used by strings drops to only about 800 kilobytes. Heapy reports memory as seen from inside the Python interpreter, so if you look at the total after deallocating, it claims that the memory used by objects Python is aware of is only one and a half megabytes. So where did the rest go? I believe the answer is easy for anyone who recently finished a university course on operating systems or memory management, but for me that was almost 10 years ago, so it was not clear at all. So, a refresher: there is a
phenomenon called memory fragmentation. What is it? If you think about memory from the point of view of the interpreter, memory appears to Python as a continuous address space, and this address space has the property of growing and shrinking on one side only, let's say the right side. If your program allocates some memory, it sits there; then you request more memory and, like in my test case, it gets added on the right. But when only the big chunk of memory on the left is released, it cannot actually be returned to the operating system, because that would require the little allocations sitting above it to move so the address space could shrink from the end. This gave me what felt like a really good handle on what the problem could be. I thought: I will go and relentlessly move all the small allocations so they happen before the big allocation, preparing the memory layout so that the big allocation can be freely released to the system afterwards. That never happened. I
mean, I did it, I rearranged my allocations, but the memory was still not released to the system, and I still wasn't sure I was sane. Now, this part
will overlap a little with the previous presentation. At this point I had to go out into the wild internet, into the vast plains of undocumented interpreter implementation details, and learn how a process actually uses system calls to get the memory it wishes to use. The basic lesson learned is that Python
doesn't use the brk system call, or rather malloc (which is actually a C library function, not a system call, but for our purposes here let me call it one), directly for small objects, because it's too costly. As I understand it, calling a function that lives in kernel space, separate from your program, means the kernel has to do all the work required to protect itself from your potentially malicious program; of course we know our programs aren't malicious, but the kernel doesn't. So there's a lot of overhead, and CPython implements a more sophisticated allocator of its own, pymalloc, on top of malloc, the simple system library call.
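One consequence is visible from Python itself: object sizes can be inspected, and whether pymalloc serves a request from its own arenas or falls back to the C allocator depends on the request crossing a small-object threshold (256 bytes in the CPython 2.7 era, 512 in later versions; an implementation detail, not a guarantee):

```python
import sys

# sys.getsizeof reports the full byte size of the object itself.
# Small objects below pymalloc's threshold come from its arenas;
# larger ones go straight to the underlying C allocator.
for n in (10, 100, 1000, 100000):
    s = "x" * n
    print("string of %6d chars -> %6d bytes" % (n, sys.getsizeof(s)))
```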
It comes with a couple of interesting optimizations.
One of them is called free lists. The Python interpreter runs your code highly dynamically, and it is actually true that if you look at your code, classes and instances are just some dictionaries and lists, with perhaps some special semantics, some special way of describing them, attached; from the perspective of a program running in memory, it's just lists and dictionaries. And if you were at the previous talk, you've seen that there is quite a bit of memory overhead to dictionaries and lists. So Python tries to keep those objects close at hand: it doesn't immediately release your list when it goes away, it tries to keep a handful of lists and dictionaries ready to be reused, because they will be reused. Every function you call will be represented as an object called a frame, which holds a list of the variables inside it, and so on and so on. So for a handful of the most common types there is a special stash where objects are kept after you've freed them. From what I've been told, this speeds up code execution. It also gives us the possibility to play with this a little and see how we can abuse it. To check whether
what I'd read on the internet has any relation to the interpreter installed on my system, I devised a little free-lists torture test which allocates lists of growing length. The alloc function allocates a list that has some strings in it, of length i, so each list in the end is a little bit bigger than the one before. Immediately after making each list, we take every element of this newly created list and put it in another one, and then we release the newly created one. The effect is that in memory, next to each surviving list, there are pools of freed lists with a similar memory footprint: a shorter list will be freed near the shorter lists, a slightly bigger one nearby, and so on and so on.
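My reconstruction of that stress test (the names are my own) looks something like this:

```python
def freelist_torture(n=200):
    # Build lists of growing length; each freshly built list is copied
    # element by element into a survivor and then released, so freed
    # list objects of every size end up interleaved with live ones.
    survivors = []
    for i in range(n):
        fresh = ["s" * 100 for _ in range(i)]   # list of length i
        copy = [item for item in fresh]          # similar memory footprint
        del fresh                                # released, kept "close at hand"
        survivors.append(copy)
    return survivors
```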
The memory usage will not drop after deleting the big variable, because of exactly this behavior. But at this point I decided that, you know, the whole premise of my company having a request that allocates half a gigabyte of memory is actually completely ridiculous, and I decided to offload this work to a subprocess: fork, do the memory-hungry work in a short-lived process, and with that free myself from the immediate danger of going mad trying to divine CPython's internal memory management. So if you ever have this kind of problem, here are a couple of recommendations. First of all, try to make better use of memory: if some objects will live longer than others, try to allocate them in order of longevity, the longest-living objects first, and then the rest. If you have no ability to do that, try to offload the memory-intensive work to a subprocess and then let the operating system take care of reclaiming memory and cleaning up after you. This is the lazy man's solution, and I'm lazy, so that's what I use. As I just said, there are also other implementations of malloc. The one I've tried, and found that it actually helped with my problem, is jemalloc; it can be loaded using the LD_PRELOAD environment variable.
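The "lazy" recommendation, offloading the hungry work to a child process so the operating system reclaims everything when it exits, can be sketched like this (the names are mine, not from the talk):

```python
import multiprocessing

def crunch(count, result_queue):
    # The memory-hungry part lives entirely in the child process
    big = ["x" * 5000 for _ in range(count)]
    result_queue.put(len(big))  # only the small summary crosses back

def run_report(count):
    queue = multiprocessing.Queue()
    worker = multiprocessing.Process(target=crunch, args=(count, queue))
    worker.start()
    result = queue.get()
    worker.join()
    # The child has exited: the OS reclaims its memory unconditionally
    return result
```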
I tried it with my example, and you can see that it helps, but it has drawbacks. In the output you can see what we've seen before, just copied here, and then the same run using jemalloc. First of all, the overall peak memory usage is bigger: jemalloc is much more sophisticated than glibc's malloc, it implements its own memory allocation algorithm in quite a sophisticated way, so it actually has a bigger overhead here; but it is then much easier for Python to return this memory to the system. So this would work for my case as well. On the other hand, I didn't want to replace malloc for everything, because I feared that would require me to re-test most of my program to see whether I had broken some other part of it. You can also see that after this, the GC run releases some more memory, so it's even better. Jemalloc is used by some big names in the industry; I think Facebook is using it for at least part of their systems, and they're heavily involved in its development.
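The LD_PRELOAD trick is usually done straight from the shell (`LD_PRELOAD=/path/to/libjemalloc.so python report.py`); the same thing sketched from Python, with a hypothetical script name and an assumed library path that varies by distribution and jemalloc version:

```python
import os
import subprocess
import sys

def run_with_jemalloc(cmd, libpath="/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"):
    # Ask the dynamic linker to load jemalloc before anything else;
    # every malloc/free in the child process then goes through jemalloc.
    # The library path above is an assumption, not a universal default.
    env = dict(os.environ)
    env["LD_PRELOAD"] = libpath
    return subprocess.call(cmd, env=env)

# e.g. run_with_jemalloc([sys.executable, "report.py"])  # report.py is hypothetical
```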
My time's running out, so a couple of conclusions. The first conclusion is that sometimes memory is not what you think it is, and sometimes you have to go back to school to remember where your memory might be hiding. The other is that glibc's malloc is not necessarily the best of breed. A funny story: I tried to show this problem to a friend of mine, so I brought the program home from work, and at home, as you can see, the problem was nonexistent, the memory was released immediately. The reason being that my machine at home runs a newer system with a newer version of malloc, so that's a nice feature to have. And as I mentioned, memory-intensive work works best in subprocesses. One more caveat of the kind mentioned here: if you're using any kind of C extension, it might not be using Python's memory allocator, which means it might break Python's ability to release memory back to the system, because it will allocate memory in its own, possibly complicated, ways. We've actually seen this cause harm in practice. That's it.
Thank you. So, any questions?
Q: Hi, thanks for the talk. First, a comment on replacing malloc: it's a good stopgap solution for now, but remember that malloc is going to improve over time; the glibc people are going to make it better and better, and all the other tools are going to take advantage of that. Ten years from now, the default malloc under your program will have improved even without you touching anything. And it's not only malloc: Python's own memory management is actually improving a lot; as was said, Victor Stinner has written a very good module called tracemalloc that allows you to debug memory, which is really, really nice. Now my question: in your case you had requests that allocated a bunch of memory, and this was happening over and over again; was it building up, like a memory leak, accumulating more and more memory over time?
A: Then maybe I did a bad job describing the problem. We had a couple of web worker processes, and our capacity plan said they were allowed a baseline memory of, let's say, 50 megabytes of RAM, but we hadn't planned for them to be half a gigabyte all the time. It was OK for them to go and eat memory once in a while, provided there weren't too many of them doing it concurrently. With this problem we saw that once they had eaten the memory, they kept it at that high peak and never released it, and that caused our capacity planning to fail, which wasn't pleasant. It was not increasing over time: it would just peak at half a gigabyte and stay there, and no other request would even touch it. You could see that this memory was being reused internally by Python but never returned to the system
and that was no good for us.
Q: One last question. There is this memory fragmentation in the Python memory area; would it be possible for Python to defragment that area, to move objects around and reallocate its memory?
A: No, that doesn't happen in CPython; it's not possible. Once objects are allocated, they stay put. You can actually see that by running id() on an object: it will just give you a raw pointer value. I might not be correct about every detail, but objects are not moved around. If you want that, I think Java has this feature, so you could try running your program on Jython, and that might help; it also has a far better garbage collector. Although, as far as I checked last time, it didn't have this ability to move objects around either, it is much more sophisticated than what we have in CPython.
Host: OK, I think that's all, thank you very much.
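The id() observation from that answer is easy to check: in CPython (an implementation detail, not a language guarantee), id() returns the object's address, and it stays constant for the object's lifetime:

```python
x = [1, 2, 3]
address = id(x)     # in CPython this is the object's memory address
x.append(4)         # mutating the object does not move it
assert id(x) == address
print(hex(id(x)))
```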

Metadata

Formal Metadata

Title Fun with cPython memory allocator
Series title EuroPython 2014
Part 113
Number of parts 120
Author Paczkowski, Tomasz
License CC Attribution 3.0 Unported:
You may use, modify, and reproduce, distribute, and make the work or its content publicly available in unchanged or modified form for any legal purpose, provided you credit the author/rights holder in the manner they specify.
DOI 10.5446/20042
Publisher EuroPython
Release year 2014
Language English
Production place Berlin

Content Metadata

Subject area Computer Science
Abstract Tomasz Paczkowski - Fun with cPython memory allocator Working with Python does not usually involve debugging memory problems: the interpreter takes care of allocating and releasing system memory and you get to enjoy working on real world issues. But what if you encounter such problems? What if your program never releases memory? How do you debug it? This talk describes some of the lesser known properties of cPython memory allocator and some ways to debug memory-related problems, all this based on real events. ----- Working with Python does not usually involve debugging memory problems: the interpreter takes care of allocating and releasing system memory and you get to enjoy working on real problems. But what if you encounter such problems? What if your program never releases memory? How do you debug it? I will tell a story of one programmer discovering such problems. The talk will take listeners on a journey of issues they can encounter, tools they can use to debug the problems and possible solutions to seek out. There will also be a brief mention of general memory management principles. cPython uses a combination of its own allocator, `malloc`, and `mmap` pools to manage memory of Python programs. It usually is smart enough, but there are some darker corners that are not well known by an average Joe Programmer (read: me). There are tools that can help debug memory problems, but those are also relatively unknown, and tend to have documentation that one might find lacking. I will describe one such tool, called `guppy`, which I have found particularly helpful.
Keywords EuroPython Conference
EP 2014
EuroPython 2014
