Post-Mortem Debugging with Heap-Dumps
Formal Metadata
Title |
Series title |
Part | 75
Number of parts |
Author |
License | CC Attribution 3.0 Unported: You may use, modify, reproduce, distribute, and make publicly available the work or its content, in unchanged or modified form, for any legal purpose, provided you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/19943 (DOI)
Publisher |
Year of publication |
Language |
Production location | Berlin
Content Metadata
Subject area |
Genre |
Abstract |
Keywords |
Transcript: English (automatically generated)
00:15
For the next talk, we welcome Anselm Kruis with the topic of post-mortem debugging with heap dumps.
00:23
Welcome. Hello, everybody. So it's probably a well-known problem that every serious program is buggy.
00:41
And there are various methods to handle it. My talk is about post-mortem debugging for Python. Just very briefly: I'm working for science and computing, and I'm a senior software architect. In my spare time, I'm doing some Python hacks.
01:03
So some program failures occur very infrequently. And they are sometimes also hard to reproduce. And think of a large compute cluster where once a day a job dies.
01:22
You have no access to these jobs. And in this case, it's a very common approach and a very old approach to use some kind of post-mortem analysis to find the cause of a failure. The classical approach is to create a core dump.
01:44
And the core dump is a file. And you can later load it in the debugger and analyze it. Unfortunately, Python has no usable core dumps yet. So I thought it could be a chance for a little project.
02:00
And when I started, I didn't know whether it would work out. But well, actually, it works to some extent. So there's a lot of previous work. And core dumps date back to the origins of computers. So the oldest reference I found to the classical core dumps
02:21
is in the programmer's manual for the SHARE Operating System from 1959. That's the second operating system ever created. So it's really old. And today, almost every operating system has a feature to create a dump of the memory of a program
02:44
that caused some fault condition. People have used these operating-system-level dumps to analyze interpreted languages and interpreted programs running within a native code interpreter.
03:02
And so on the internet, you can find various reports of people trying this for Python, with mixed results, and a few projects. And I think I got most of them. You can find them in the reference section.
03:22
Actually, it's complicated and highly dependent on the implementation, the compiler options, the compiler version, the operating system and so on. So it's not really practical. Then it's, of course, possible to move the feature
03:43
to create a dump from the operating system into the interpreter. And there are some reports about dump features for interpreted languages. And the most prominent example is probably Java. The IBM implementation of Java directly
04:05
supports some kind of Java heap dumps. And you can later debug the Java program. Well, for Python, there's also some ongoing work. In 2012, Eli Finer released the pydump module.
04:25
Its idea is to catch an exception, pickle the traceback, and then use the pdb post-mortem function to analyze the unpickled traceback. In theory, this works well. But in practice, almost every serious traceback
04:42
contains some unpicklable objects. And well, it fails with an unpickling error. And now we come to pyheapdump. That's the module I created. The name pydump was already used.
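The problem described here can be reproduced with the standard library alone. The following is a minimal sketch, not code from pydump or pyheapdump: the helper `pickle_error` and the function `failing` are made up for illustration. It shows that the standard pickler rejects traceback and frame objects, while plain data goes through.

```python
import pickle
import sys


def pickle_error(obj):
    """Return the name of the exception raised by pickling obj, or None."""
    try:
        pickle.dumps(obj)
    except Exception as exc:  # the pickler raises different error types
        return type(exc).__name__
    return None


def failing():
    # Hypothetical stand-in for the buggy application code.
    return {}["missing"]


try:
    failing()
except KeyError:
    tb = sys.exc_info()[2]

tb_error = pickle_error(tb)              # traceback objects are rejected
frame_error = pickle_error(tb.tb_frame)  # so are frame objects
data_error = pickle_error({"combo_length": 14})  # plain data pickles fine
print(tb_error, frame_error, data_error)
```

Any dump tool built on pickling therefore needs its own way of representing tracebacks and frames.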
05:01
So I had to choose a different one. And I used pyheapdump. It's still experimental work. It's currently 2.7 only, because the sPickle library I depend on is 2.7 only. But it's possible to port it, and there's also
05:20
an experimental port of this sPickle library. So maybe we will get it for Python 3. The building blocks and the basic idea are similar to pydump: some exception handling code, some serialization of the dump, and glue code to insert the dump
05:45
into the debugger. And indeed, I used a few lines of code from Eli Finer's pydump module, because when I found it, I thought, well, perhaps I can improve it. It turned out that just a few lines were left.
06:05
So it's time for a little demonstration. Think of the following situation. You installed a little Python game for a partner or your kids or a customer. And then she or he complains
06:21
about some crashes occurring every now and then. And you have to catch the bug now. And please note, I used the game block fortress, and I introduced the bug, so the upstream version is perfectly OK. And here we are.
06:43
So first, we have to instrument our Python installation to create the dump. So it's as simple as a pip install pyheapdump. It's already installed.
07:01
I didn't want to depend on the network, yes. Then I created a little .pth file. Here it is in the Python installation. You probably know what this kind of file does and how it works.
07:22
If Python finds, during startup, a file with the extension .pth in its site-packages directory, it, well, let's say, executes this file in some sense, yes. So it imports the pyheapdump package and then registers
07:47
the dump-on-unhandled-exceptions handler. It's registered as sys.excepthook. So let's play the game.
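The mechanism behind this step can be sketched with the standard library. This is not the pyheapdump implementation: `make_dump_hook` and the file name `crash.dump` are assumptions for illustration, and the record written here is only the formatted traceback, not a full heap dump. In a .pth file, a line that starts with `import` is executed at startup, so a line such as `import crashdump_setup` (a hypothetical module) could perform the registration.

```python
import pickle
import sys
import traceback


def make_dump_hook(dump_path):
    """Build an excepthook that writes a small crash record to dump_path."""

    def hook(exc_type, exc_value, tb):
        record = {
            "exception": repr(exc_value),
            "traceback": traceback.format_exception(exc_type, exc_value, tb),
        }
        with open(dump_path, "wb") as fh:
            pickle.dump(record, fh)
        # Still print the usual traceback to stderr.
        sys.__excepthook__(exc_type, exc_value, tb)

    return hook


# Registration: the interpreter calls sys.excepthook for every exception
# that nobody handled, just before the program terminates.
# sys.excepthook = make_dump_hook("crash.dump")
```

Keeping the hook in a factory makes the dump location configurable and lets the hook be called directly, for example from a test.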
08:10
Should be, should work out, yes. OK, so I have tried a little bit to find a solution
08:21
where the game reliably crashes. And here is the crash. And we got a message that we have a heap dump file here. That's fine. So we can now load this into the debugger. It's fairly simple.
08:44
We simply call python -m pyheapdump. And there's a nice help option. So you can have some arguments. The idea is simply to tell Python which file to debug.
09:09
And I want to use the PyDev debugger that is included. So it's the debugger for the Eclipse PyDev plugin. And it has a nice remote debugging feature.
09:21
And therefore, it's well suited for this application. And I have to tell Python where the debugger module is actually located. This is a long line here, yes. So let's debug it.
09:42
So then I have to go to the debugger, yes. And you see, here we have a message: the debugger shows me the exception that was raised. An AttributeError: ball object
10:01
has no attribute add_bonus. Obviously, that's the reason for the crash. But we could also ask why it happened. So we have, can you read it? I think, yes, it should be possible.
10:20
So you see here, there are some more or less complicated conditions when this add_bonus method is called. So we can look into the variables. And we see here, for instance, self. So here are all the values we need, yes.
10:45
And we can find, OK, the combo length is 14. That's true, and some other values, yes. And this complicated condition is the reason why the crash occurs only rarely.
11:01
Fine. Actually, I introduced this code here. You see, we can also look at other frames. And we get all the variables. We can, for instance here, look into the, oh, that's an interesting thing.
11:22
We can look into the objects. And here we have an interesting object, because the game object was actually not really picklable. For some reason, it probably depends
11:40
on some resources or something like that. So we get a surrogate object instance. And it doesn't hinder debugging, because objects inserted by the fault-tolerant unpickler of the pyheapdump module
12:03
have all the attributes the original object would have. So we are still able to analyze the problem. OK, so much for the demonstration. And back to the presentation.
12:25
So the application of the pyheapdump module is very simple. You have to set up some exception handler. There are various ways to do it. Usually, the most common and comfortable way
12:41
is the function dump_on_unhandled_exceptions. It can register itself as sys.excepthook; in the sys module, there's a hook that is called if an unhandled exception occurs
13:01
in the main thread. Or it can work as a decorator for a function. So if this function raises an exception, the handler will be called. And there are also some low-level functions available. It's documented in the manual of pyheapdump.
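The decorator variant can be illustrated with a generic sketch. This is not the pyheapdump function: `dump_on_exception`, `collect`, and `play_round` are hypothetical names, and the "dump" here is just a callback that receives the exception information.

```python
import functools
import sys


def dump_on_exception(dump_func):
    """Decorator factory: call dump_func(type, value, tb) if the wrapped
    function raises, then re-raise the original exception."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                dump_func(*sys.exc_info())
                raise

        return wrapper

    return decorator


collected = []


def collect(exc_type, exc_value, tb):
    # A real handler would serialize frames; this one only records the type.
    collected.append(exc_type.__name__)


@dump_on_exception(collect)
def play_round(combo_length):
    if combo_length > 10:
        raise AttributeError("no attribute 'add_bonus'")
    return combo_length
```

Re-raising after the dump keeps the program's behavior unchanged apart from the extra file, which matters when the decorator is added to production code.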
13:23
Then you have to instruct your customer or the operator to send you any heap dump files, yes? And then you have to wait. And if you have good luck, you have to wait forever because your program is not buggy. And finally, you analyze it using a common debugger.
13:45
So how does this work? Yes, that's a complicated question. So let's divide it into some simpler questions. And the first question is, what is the content of a heap dump file?
14:03
Let's have a quick, oh, I have to finish debugging here, so simply finish, yes, yes.
14:20
So actually, the file is in a kind of MIME message format. And the idea was to make a file that a human could read to some degree. And so it contains some headers with information
14:41
about the Python process that created the file. And then there's a large binary part. And it contains the real information.
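A file of this general shape can be produced with the standard `email` package. The sketch below rests on assumptions: the header names `X-Python-Version` and `X-Platform` are invented, the payload is a zlib-compressed pickle, and the binary part is base64-encoded. The actual pyheapdump file layout may differ.

```python
import pickle
import sys
import zlib
from email import message_from_bytes
from email.mime.application import MIMEApplication


def build_dump_message(state):
    """Wrap a compressed pickle of state in a MIME message with headers."""
    payload = zlib.compress(pickle.dumps(state))
    msg = MIMEApplication(payload, _subtype="octet-stream")  # base64 body
    msg["X-Python-Version"] = sys.version.split()[0]  # hypothetical header
    msg["X-Platform"] = sys.platform                  # hypothetical header
    return msg.as_bytes()


def read_dump_message(raw):
    """Return (headers, state) from bytes made by build_dump_message."""
    msg = message_from_bytes(raw)
    headers = dict(msg.items())
    body = msg.get_payload(decode=True)  # undo the base64 encoding
    # Only unpickle data from a trusted source.
    return headers, pickle.loads(zlib.decompress(body))


raw = build_dump_message({"exception": "AttributeError", "combo_length": 14})
headers, state = read_dump_message(raw)
```

The headers stay readable in a text editor, so a person can check which interpreter wrote the file without unpickling anything.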
15:03
So what is in this binary blob? The content is a compressed pickle of a dictionary. And the dictionary contains the traceback of the exception, the stack frames of selected or all Python threads,
15:23
or in case of Stackless Python, of all tasklets. And then the transitive closure of the objects reachable from the frames or tasklets. And optionally, you can also include the sources of the code objects from all these frames,
15:41
and some other interesting objects, like the process ID and the path module, because if you create a dump, for instance, on a Linux system and analyze it on a Windows system, you sometimes need to know how to interpret the paths and the file names of the source code
16:04
files in the code objects. And so you need the path module then, and thread IDs and things like that. How does pyheapdump create this content? Well, the basic idea is very simple: create a dictionary with the content and pickle it.
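Why the path module matters can be shown with a small sketch. The helper names `describe_paths` and `basename_from_dump` are hypothetical. The dump records which path flavor the crashing process used, and the analysis side loads that flavor explicitly instead of relying on its own `os.path`.

```python
import importlib
import os


def describe_paths():
    """Record which path flavor the dumping process uses."""
    return {"path_module": os.path.__name__, "sep": os.sep}


def basename_from_dump(filename, path_module_name):
    """Interpret filename with the path module of the system that wrote it."""
    path_module = importlib.import_module(path_module_name)
    return path_module.basename(filename)


# A Windows file name is only split correctly by ntpath, on any platform.
windows_name = basename_from_dump(r"C:\game\ball.py", "ntpath")
posix_name = basename_from_dump("/opt/game/ball.py", "posixpath")
wrong_flavor = basename_from_dump(r"C:\game\ball.py", "posixpath")
```

Both `ntpath` and `posixpath` can be imported on every platform, which is what makes cross-platform analysis of the recorded file names possible.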
16:22
And there's a challenge here. You can't pickle all classes. Yes, the kind of data that is picklable in Python is fairly limited. Yes, it's typical data objects and objects designed to be pickled, yes, but surely not everything.
16:44
And the second challenge is multithreading, because think of several threads running at the same time and changing the state of your program. And if you have an exception, it could depend on the states, not only of the threads that
17:02
actually had the exception, but also on other threads. So how can we pickle arbitrary objects? Yes, you probably all know a little bit about pickling. And pickling is, on one hand, it's a data format.
17:24
Yes, you can read the exact description of all these little opcodes and details in the source of the CPython library. And there's also a standard implementation. Yes, it's the pickle or the cPickle module.
17:43
So the basic idea of the standard implementation is to serialize data in a portable way, portable between different Python versions. And it's really fast. There's a second implementation of the pickler, the sPickle
18:00
module, which science and computing created and released as open source in 2010. And the idea of this module is to serialize well-behaved objects, but not between different Python versions.
18:22
It's fairly slow, because it's written entirely in Python. And pyheapdump builds on this sPickle library and adds some additional features. And the important feature is fault-tolerant pickling and unpickling.
18:42
So basic idea is we are not required to serialize and restore the data in an exact way, because we are not interested in continuing the program.
19:01
We just want to look at its state. So it's enough to preserve the state in a way that is useful for analysis, but that's all we need. So we have enough additional freedom to handle problematic objects.
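The freedom described here can be illustrated in a much simplified form. The sketch below is not the sPickle or pyheapdump mechanism: it only checks the top-level variables of a frame, and it replaces anything the standard pickler rejects with a plain dictionary that keeps the type name and the repr. The function `tolerant_copy` and the marker key `__surrogate__` are made up for this example.

```python
import pickle
import threading


def tolerant_copy(variables):
    """Return a picklable copy of a mapping of variable names to values.

    Values that cannot be pickled are replaced by a surrogate dictionary
    that keeps enough information to be useful in a debugger.
    """
    result = {}
    for name, value in variables.items():
        try:
            pickle.dumps(value)
            result[name] = value
        except Exception:
            result[name] = {
                "__surrogate__": True,
                "type": type(value).__name__,
                "repr": repr(value),
            }
    return result


# A lock cannot be pickled, the integer can.
frame_locals = {"combo_length": 14, "lock": threading.Lock()}
restored = pickle.loads(pickle.dumps(tolerant_copy(frame_locals)))
```

A per-object substitution deep inside the object graph, as the talk describes, needs hooks inside the pickler itself. This top-level version only shows the principle that an approximate copy is good enough for inspection.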
19:22
So the second challenge, multithreading. There's no perfect solution possible. In an ideal world, we would have a method to stop all other threads,
19:43
write our pickle, or create a pickle, at least. Yes, and then let everything go on again. But we can make some best-effort solution. And it's indeed possible to block other threads, as long as you don't release the GIL.
20:03
Yes, there's the sys.setcheckinterval function. And if you set it to a very long interval, or in Stackless, you could use an atomic context manager, then you effectively block other threads.
20:24
And then you can block the threads, make a copy of all the local variables of the frames you're interested in. And then you can actually pickle these copied frame objects.
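A best-effort version of this step can be sketched for Python 3, where `sys.setswitchinterval` took over the role of the Python 2 function `sys.setcheckinterval`. The names `long_switch_interval` and `snapshot_all_threads` are assumptions, and this is not the pyheapdump code. The sketch only makes thread switches unlikely: any call that releases the GIL, such as blocking I/O, still lets other threads run.

```python
import sys
import threading
from contextlib import contextmanager


@contextmanager
def long_switch_interval(seconds=100.0):
    """Temporarily ask the interpreter to switch threads very rarely."""
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield
    finally:
        sys.setswitchinterval(old)


def snapshot_all_threads():
    """Copy file, line, function name and locals of every thread's frames."""
    snapshot = {}
    with long_switch_interval():
        # sys._current_frames() maps thread ids to their topmost frame.
        for thread_id, frame in sys._current_frames().items():
            frames = []
            while frame is not None:
                code = frame.f_code
                frames.append((code.co_filename, frame.f_lineno,
                               code.co_name, dict(frame.f_locals)))
                frame = frame.f_back
            snapshot[thread_id] = frames
    return snapshot


snapshot = snapshot_all_threads()
own_frames = snapshot[threading.get_ident()]
```

Copying the locals into plain dictionaries while the other threads are held back gives a consistent snapshot that can be pickled afterwards at leisure.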
20:46
Pickling by itself could release the GIL, because if you pickle a class that has a custom __reduce__ or custom __getstate__ method or something like that,
21:01
this method could actually make a call to an external C function, and it could release the GIL. And then you get the whole mess. So a short, final note about the debugger support.
21:20
pdb and pydevd already support post-mortem debugging. pdb has a nice API function named post_mortem. In PyDev, you need to hack around a little bit in the internals. PyDev supports the inspection of additional stack frames
21:44
which do not belong to a thread, so-called custom frames. And that's a really useful feature here because if you have multiple threads, we can use these custom frames to make it very simple to access the other threads besides the thread that
22:08
caused the exception. Yes, and PyDev should probably add some API for advanced debugger features like this post-mortem debugging or adding custom frames into the debugger console.
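For the pdb side, the post-mortem entry point mentioned in the talk is `pdb.post_mortem(tb)`, which opens an interactive prompt on the frame where the exception was raised. The sketch below drives the same machinery with scripted commands so it can run unattended. The helper `scripted_post_mortem` is an assumption for illustration, and it works on a live traceback, not on a traceback restored from a dump file.

```python
import io
import pdb
import sys


def crash():
    divisor = 0
    return 1 / divisor


def scripted_post_mortem(tb, commands):
    """Run pdb post-mortem on tb, feeding commands instead of a keyboard."""
    output = io.StringIO()
    debugger = pdb.Pdb(stdin=io.StringIO(commands), stdout=output,
                       nosigint=True, readrc=False)
    debugger.reset()
    # This mirrors what pdb.post_mortem(tb) does for interactive use.
    debugger.interaction(None, tb)
    return output.getvalue()


try:
    crash()
except ZeroDivisionError:
    tb = sys.exc_info()[2]

# Print the local variable of the crashing frame, then quit the debugger.
session_output = scripted_post_mortem(tb, "p divisor\nq\n")
print(session_output)
```

In interactive use, the three lines inside `scripted_post_mortem` collapse to a single call of `pdb.post_mortem(tb)`.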
22:21
It has all these features, but they're not accessible. Future goals. Well, actually, at the moment, it is useful. We are already using pyheapdump in one of our products.
22:40
But up to now, we got just very few dumps that were caused by real bugs. That's probably due to the testing and quality assurance we have for this product. So just very few bugs.
23:01
And the open questions: the memory usage. How reliable is this concept? And very, very important, security. Yes, these dump files contain an enormous amount of information. So probably, you have to handle them carefully.
23:24
And always, if you pickle or especially unpickle something, you're running code. A pickle file is a program. So you have to be sure that you can trust the source of this code.
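The warning can be made concrete with a harmless example. In the sketch below, `Payload` and `RestrictedUnpickler` are illustrative names. The pickle asks the unpickler to call `eval` on a string, and a restricted unpickler refuses to resolve any global name. Such a restriction is only one possible mitigation and would also reject most real dump contents.

```python
import io
import pickle


class Payload:
    """Pickles as an instruction to call eval("6 * 7") when loaded."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


malicious = pickle.dumps(Payload())
result = pickle.loads(malicious)  # runs eval during loading


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to look up any class or function."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            "global %s.%s is forbidden" % (module, name))


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


try:
    restricted_loads(malicious)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

plain = restricted_loads(pickle.dumps({"combo_length": [1, 2, 3]}))
```

Plain containers and numbers need no global lookups, so they still load, while anything that names a callable is rejected before it can run.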
23:42
Yes. And probably, I'll ask Fabio to provide some better APIs for pydevd. And in the long term, I plan to support Python 3. Probably not every version, but 3.3 or 3.4 and onwards.
24:05
So then, thanks to my boss, who approved the publication of this code, and my colleague, Tanya, for testing, and my wife, Insta, for a lot of patience and many hours in the evening when I did some of this.
24:26
And here are the references. Probably the slides will be available somewhere. So many thanks for your kind attention. And I think we have another two minutes for questions.
24:45
So if there are questions, please go ahead. Take this microphone, it's OK. Thanks for the great effort you've done. And the question is, that's probably only my impression, that it kind of duplicates the efforts done for the Sentry
25:03
project. Because actually, what you could do is to extend a bit their reporting of the exception to include also other threads, if it's needed, actually. Because most of the time, what you need is just a traceback, right, with local variables
25:20
and some source code to understand the problem. But a core dump is a bit of overhead most of the time. Basically, for most of the projects, you just need to report an exception to the Sentry server. And you can cover most bugs.
25:43
Do you agree with that? Yes, and with pyheapdump, it's also possible, I didn't show it, to limit the information that is included to just the local thread. And it really depends on your application, what you need.
26:03
Yes, and there are also other solutions, like look at Django or something like that, where you get a very nice one. The thing is that Sentry receives any exceptions. It doesn't matter. It can be any application. So it just needs some implementation,
26:22
or it needs open-end. It can be local, so it's relaxed exception, right? Yeah, yes, that's surely possible. But it depends on the infrastructure and the situation you have, yes. OK, we're running out of time. If there's a group photo afterwards,
26:40
please join us outside. Thank you very much, Anselm, for your presentation. Thank you.