Apport - Automatic Application Crash Reporting for openSUSE
Formal Metadata
Number of Parts | 70
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/39531 (DOI)
FOSDEM 2009 / 70
Transcript: English (auto-generated)
00:04
My name is Jan Blunck. I'm currently working as a technology architect for the L3/maintenance/security department at Novell, so I'm responsible for openSUSE and the enterprise products.
00:24
And I will talk about application crash reporting. So what is this all about? Well, it's all about bugs and finding bugs, and this is actually a copy of the logbook.
00:44
It tells the story of the first bug that was found: it was actually a moth inside a relay of one of the very old computers. Back then people kept logbooks for their computers, writing down what they were doing
01:07
and what problems they were finding. Nowadays I don't know anybody who keeps a logbook for their own computer, and most people don't even read their log messages, at least not usually.
01:28
And most application crashes are not logged anywhere on the system. Most of the time we don't even write out core files.
01:43
So users usually don't realise that an application has a problem and is segfaulting or whatever. So what do the others have? Microsoft has the Windows Error Reporting service.
02:05
I think it has existed since Windows XP; that was the first release that included it. It's also available for ISVs: you can register with the Winqual program, and the only thing that you
02:33
need is a valid VeriSign certificate. On the other hand, the program itself
02:43
is actually free, even for ISVs on Windows. Windows Error Reporting collects certain information about the application
03:03
itself; as far as I know it saves the core file and uploads it to a server. The Mac has it as well. They also have a problem reporter, but I could not really find out whether this
03:27
is also available to other companies or ISVs on the Mac; I'm not sure. Even the iPhone has it: when you connect your iPhone to iTunes, the crash reports
03:48
are automatically downloaded from the iPhone and sent to Apple so they can analyse the bugs. They also do it system-wide, so you can see kernel problems in the crash reports there too.
04:05
Even Sun has it. I didn't find a really good picture of the Solaris brand, so I chose the Solarium. And they are cheating, because they use DTrace for it.
04:24
We could use SystemTap to generate these reports as well, but we don't. So what does Linux have? Thanks to Ubuntu we have Apport, an application crash reporting system which
04:46
is kernel-supported while running in user space. Apport was ported to Fedora, so it's also available there. And thanks to Google's Summer of Code, a student was interested
05:08
in porting Apport to openSUSE as well, so we have had application crash reporting since openSUSE 11.1 too. So what does Apport do?
05:23
Apport is basically a collection of Python code that the kernel calls automatically when an application crashes. Instead of writing the core file to disk, the kernel calls this application, and
05:43
Apport then collects information about the crash: potential user information, the process environment and the operating system. I will go into more detail later, but it's a two-step process. The
06:05
first step is basically to get the dump, get the /proc maps, and get all the information that you cannot gather later. Everything about the process environment is collected right away and written out
06:23
to disk. When the user is notified, you can then collect additional information. So it runs in multiple steps.
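The first, time-critical step can be sketched in Python, the language Apport itself is written in. This is a minimal, Linux-only illustration and not Apport's code. It reads the /proc entries that vanish once the crashed process is reaped. The key names (`ProcCmdline`, `ProcEnviron`, `ProcMaps`, `ExecutablePath`) are modelled on Apport's report fields. The choice of which environment variables to keep is an assumption.

```python
import os

# Assumed subset of variables worth keeping; the talk only mentions the active PATH.
KEPT_ENV_VARS = ("PATH", "LANG", "SHELL")


def snapshot_process(pid: int) -> dict:
    """Collect the /proc data that cannot be gathered once the process is gone."""
    base = f"/proc/{pid}"
    report = {}

    # The command line is stored as NUL-separated arguments.
    with open(os.path.join(base, "cmdline"), "rb") as f:
        args = [a.decode(errors="replace") for a in f.read().split(b"\0") if a]
    report["ProcCmdline"] = " ".join(args)

    # The environment is a list of NUL-separated "KEY=value" entries; keep only a few.
    with open(os.path.join(base, "environ"), "rb") as f:
        entries = [e.decode(errors="replace") for e in f.read().split(b"\0") if e]
    report["ProcEnviron"] = "\n".join(
        e for e in entries if e.split("=", 1)[0] in KEPT_ENV_VARS
    )

    # Memory mappings are needed later to map addresses to shared objects.
    with open(os.path.join(base, "maps")) as f:
        report["ProcMaps"] = f.read()

    report["ExecutablePath"] = os.readlink(os.path.join(base, "exe"))
    return report
```

For example, `snapshot_process(os.getpid())` inspects the running Python interpreter itself. Reading another user's `/proc/<pid>/environ` requires privileges. A handler started by the kernel through core_pattern has those privileges because it runs as root.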
06:43
It notifies the user through a small applet. There is an applet for Qt 4 and one for GTK, both written in Python
07:03
as well. What the applets do is very simple: they put an inotify watch on the directory where the application crash reports are stored. Then you get a pop-up and can do the additional steps.
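The applets watch the crash directory with inotify. Python's standard library has no inotify binding, so the sketch below uses simple polling to show the same idea. The `/var/crash` path comes from later in the talk. The function names and the five-second interval are illustrative assumptions.

```python
import os
import time


def new_reports(directory: str, seen: set) -> list:
    """Return paths of *.crash files not seen before, and remember them."""
    found = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".crash") and name not in seen:
            seen.add(name)
            found.append(os.path.join(directory, name))
    return found


def watch(directory: str = "/var/crash", interval: float = 5.0) -> None:
    """Poll forever; a real applet would use inotify and show a pop-up instead."""
    # Reports that already exist at start-up are not announced again.
    seen = set(os.listdir(directory)) if os.path.isdir(directory) else set()
    while True:
        if os.path.isdir(directory):
            for path in new_reports(directory, seen):
                print(f"New crash report: {path}")
        time.sleep(interval)
```

An inotify-based watcher avoids the polling delay and the repeated directory scans. The filtering logic would stay the same.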
07:23
Optionally, the applets let you send the crash report to the developers. At the moment we only have one central server, but I will go into more detail later.
07:43
It is possible to support multiple servers, for example one openSUSE server, one Firefox server and one server for GTK, and to upload the reports into different databases.
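Routing reports to several crash databases could look like the hypothetical sketch below. The talk does not describe how the openSUSE port chooses a server. The routing table, the package names and every target except crashdb.opensuse.org are made-up placeholders. The `Package` field is assumed to hold "name version", as in Apport reports.

```python
# Hypothetical routing table: package name -> crash database location.
CRASH_DATABASES = {
    "MozillaFirefox": "crashdb.example.org/firefox",
    "gtk2": "crashdb.example.org/gtk",
}
DEFAULT_DATABASE = "crashdb.opensuse.org"


def select_database(report: dict) -> str:
    """Pick the upload target from the report's Package field ("name version")."""
    package = report.get("Package", "").split(" ", 1)[0]
    return CRASH_DATABASES.get(package, DEFAULT_DATABASE)
```

Falling back to the distribution server means a report is never left without a target. A package only needs an entry when its upstream runs its own database.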
08:01
So what does it look like? This is the GTK applet, because I'm one of the very few people in-house who use GNOME; most people use KDE, but I use this one. The only purpose of the application shown here is to segfault.
08:24
Then you get this pop-up notification. You can press "Report problem", and then you can send the report or have a look at its contents.
08:46
The report itself looks like this. It's just a plain text file.
09:05
Here you can see the output of /proc maps; this was a Compiz crash. The reports are structured as key-value pairs: this is the command line, this is the distro release, and this is all information that you can gather afterwards.
09:24
This is the process environment, i.e. which PATH was active at that moment. And this is the core file. And here's even more information:
09:41
the build IDs and load addresses of the shared objects that were loaded at that time, the package dependencies, and which files in the package dependencies were modified, so that
10:00
just by looking at that list you can see whether something was modified; it's a very long list. Then it dumps out information from GDB: the stack trace, StacktraceTop, which
10:21
is the five topmost frames, and the thread stack traces. So what happens with the reports?
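A report in this key-value layout can be read back with a few lines of Python. The sketch follows the convention Apport uses for plain-text fields: `Key: value` on one line, or `Key:` followed by continuation lines that each start with a single space. Binary fields such as the compressed core dump are not decoded here.

```python
def parse_report(text: str) -> dict:
    """Parse 'Key: value' lines; lines starting with a space continue the last key."""
    report = {}
    key = None
    for line in text.splitlines():
        if line.startswith(" ") and key is not None:
            # Continuation line: strip exactly one leading space, join with newlines.
            value = line[1:]
            report[key] = f"{report[key]}\n{value}" if report[key] else value
        elif ":" in line:
            key, _, value = line.partition(":")
            report[key] = value.strip()
        # Blank or malformed lines are ignored in this sketch.
    return report
```

Continuation lines are checked first. A stack frame such as ` main () at segv.c:9` therefore stays part of `StacktraceTop`, even though it contains a colon.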
10:43
As I said, we have a crash database server. At the moment it only supports openSUSE, so no enterprise products or anything like that, and only starting with 11.1, for various other reasons.
11:03
You can find the crash database server at crashdb.opensuse.org. As I said, application-specific servers are also possible. I have a local version of the database here.
11:21
This is what it looks like; I tried to integrate it into the common openSUSE website look and feel. You can search for specific reports, for example all reports from 11.1.
11:41
That's only because I don't use Factory yet, otherwise you would see a few entries here. And you can see that this is the report we just looked at on disk, and this is what it looks like when it's uploaded.
12:02
It gets a UUID. You can see when the crash happened and when it was uploaded. And here you can see that the core dump itself is removed. Oh no, here it is. So that's a bug.
12:20
Usually it should be removed; the others don't have a core dump. So this is what it looks like. Free-text search is also available, so you can search for specific words in the crash reports.
12:49
What is missing, and what I'm working on at the moment, is the further processing of crash reports. Ideally, when someone sends a crash report, duplicates of it would be detected automatically and connected together.
13:10
I want to gather all the reports, so I don't want to prevent uploading, but in the database it's easier to look at the reports if you have the duplicates as well.
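Duplicate detection is described as work in progress, so the following is only one plausible approach and not the crash database's implementation. It derives a signature from the executable, the signal and the function names in StacktraceTop, then groups reports that share it. The `Signal` key and the frame format are assumptions modelled on the report shown earlier.

```python
from collections import defaultdict


def crash_signature(report: dict) -> str:
    """Join executable, signal and top-frame function names into one key."""
    functions = [
        line.split(" (", 1)[0].strip()  # "crash (p=0x0) at segv.c:4" -> "crash"
        for line in report.get("StacktraceTop", "").splitlines()
        if line.strip()
    ]
    parts = [report.get("ExecutablePath", "?"), report.get("Signal", "?")] + functions
    return ":".join(parts)


def group_duplicates(reports: list) -> dict:
    """Map each signature to the list of reports that share it."""
    groups = defaultdict(list)
    for report in reports:
        groups[crash_signature(report)].append(report)
    return dict(groups)
```

Function arguments are dropped from the signature because they differ between otherwise identical crashes, for example different pointer values. Every report is still kept, which matches the goal of not preventing uploads.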
13:24
Also on the roadmap is searching for available fixes or workarounds, i.e. adding a feedback channel that says something like "your problem is fixed by maintenance update such-and-such".
13:43
And searching for regressions. Regression detection would also be interesting: if the backtrace is found and the database server thinks that the bug is already fixed in a specific version,
14:04
and then the bug shows up again in a later maintenance update, you could detect that automatically. What happens if my application crashes,
14:21
I click on "Send report", but I don't have internet access at that moment? Is the report stored or is it lost? It's stored: it's saved under /var/crash and you can send it later. At the moment you have to do that from the command line and give it the full path,
14:42
so the Apport command line interface, apport-cli -c, and then the report file. Would it also be possible to send only the backtraces and not the core, because some people may not want to send the memory of the process? Yes. I don't know why it happened here,
15:03
probably because of the way I uploaded this crash report, but the normal applet removes the core dump, and I have a slide on that as well. Oh, and this one is about anonymizing the reports.
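The anonymization described on this slide can be sketched as a filter over the report dictionary. The account name and the real name from the GECOS field become the literal string `username`, and the core dump and the working directory are dropped. Two parts of this are assumptions. `ProcCwd` is assumed to be the key that holds the working directory. Matching on word boundaries is a simplification of whatever the real code does.

```python
import re

# Keys removed entirely; "ProcCwd" is an assumed name for the working directory.
DROPPED_KEYS = {"CoreDump", "ProcCwd"}


def anonymize(report: dict, account: str, gecos: str) -> dict:
    """Return a copy of report with personal names replaced by 'username'."""
    real_name = gecos.split(",", 1)[0].strip()  # GECOS: "Full Name,room,phone,..."
    names = [n for n in (real_name, account) if n]
    clean = {}
    for key, value in report.items():
        if key in DROPPED_KEYS:
            continue
        for name in names:
            # Word boundaries avoid rewriting unrelated words that merely contain the name.
            value = re.sub(r"\b" + re.escape(name) + r"\b", "username", value)
        clean[key] = value
    return clean
```

The result is a new dictionary, so the original report stays available for the user to review before anything is sent.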
15:36
So it's not only the core file that might contain sensitive information,
15:43
but also user names and such things. The account name and the GECOS fields are replaced by the string "username". The current working directory from /proc is also removed,
16:04
because that is information most people don't want to send. Nevertheless, users always have the possibility to review the report before it is sent, and they should really do that,
16:21
because in certain situations you just don't want to send the report automatically. Hi, so you generate the backtraces on the client side; do you have full debug info available when doing that? No, but this is one reason why we only have it since 11.1:
16:43
I enabled build IDs for openSUSE, and that was not enough for generating backtraces; you also need unwind info. So we built everything with asynchronous unwind tables
17:02
and we don't strip the unwind tables. For C++, or any application written in a language that supports exceptions, the unwind tables were needed anyway.
17:21
For applications that don't use exceptions, the asynchronous unwind tables are usually quite small, so they don't bloat the application binary. I think the distribution grew by roughly 5% or so,
17:42
which was only a problem for the live CDs and not for the DVD, where there was enough space available. And we can also retrace a report
18:02
without the core: I parse the report, extract the addresses from the stack trace, and look up the correct symbol information in the debug info files.
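Retracing without the core comes down to turning raw addresses into symbol names. In the sketch below, the debug info lookup by build ID is reduced to a dictionary of pre-extracted symbol tables. The talk notes that getting this data into a database is still missing. The build ID, symbol names and addresses are made up for illustration.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# (start offset, size, name), sorted by start; hypothetical data keyed by build ID.
SymbolTable = List[Tuple[int, int, str]]


def lookup(symbols: SymbolTable, load_base: int, address: int) -> Optional[str]:
    """Map an absolute address to 'name+0xoffset' for an object loaded at load_base."""
    offset = address - load_base
    starts = [start for start, _, _ in symbols]
    i = bisect.bisect_right(starts, offset) - 1  # last symbol starting at or before offset
    if i < 0:
        return None
    start, size, name = symbols[i]
    if offset >= start + size:
        return None  # address falls in a gap between symbols
    return f"{name}+{offset - start:#x}"


def retrace(frames: List[Tuple[str, int, int]], debuginfo: Dict[str, SymbolTable]) -> List[str]:
    """frames: (build ID, load address, absolute address) triples taken from a report."""
    result = []
    for build_id, load_base, address in frames:
        symbols = debuginfo.get(build_id)
        name = lookup(symbols, load_base, address) if symbols else None
        result.append(name or f"?? ({address:#x})")
    return result
```

The load addresses and build IDs of the shared objects are exactly what the report records. That is why build IDs had to be enabled before retracing could work.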
18:21
This application exists, and it's written in Python as well, like Apport itself. The problem is getting the debug info data for the build IDs into a database, which is actually missing at the moment,
18:43
as is uploading the retraced backtrace to the server, which is also not implemented yet. That is because we created our own server.
19:02
I have a second question, if I may. You've created a separate crashdb.opensuse.org rather than using the bug tracker, so won't you end up with a parallel bug tracker where people have to check two places? Due to internal political complexity,
19:27
the idea of extending Bugzilla so that it supports uploading crash reports was dropped immediately, because it is close to impossible
19:41
to get certain extensions in. It seems like you could use crashdb as the upload target but have it file bugs for you. Would that be possible? Yes, but that would still mean communicating with or scripting Bugzilla. It would be possible, and it is also planned
20:02
to add support in Bugzilla for linking back to crashdb, because then it would be much easier: your reports in Bugzilla would say that a crash report for this bug exists in the crash database. But I think it was much easier
20:21
to come up with a basic implementation this way than to start working with the Bugzilla stuff. But it is planned. And what else is planned? Notifications. We have a notification service for the Build Service called Hermes,
20:40
and the idea is to connect the crash database with the notification service so that the responsible developer gets a notification about the application crash. That will annoy them.
21:01
So how does it work? Technically speaking, it is a kernel patch which has been in the upstream kernel for I don't know how long. It's the sysctl
21:20
called core_pattern, which was originally intended for specifying a special format for the core file name, but Andi Kleen extended it to support piping into an application. So instead of writing the core to disk,
21:41
the kernel pipes the core dump into the Apport application itself and passes it the process ID, the core size limit and such things. What Apport then does is write
22:01
the crash report out to disk; here you can see it is usually under /var/crash. This is how the file name is set up: it is the name of the binary and the user ID
22:21
under which the application was running. There it is picked up by the notification applets. For GNOME it is supported by the GNOME settings daemon, which simply monitors the directory,
22:41
and for KDE it is a module for the KDE daemon. Do you use a unique ID for the files? Because in the use case I'm thinking of,
23:00
when you go with your computer for maybe a month without network access, I'm sure you will end up squashing crash reports. Yes; the interesting thing is there is a cron job which removes crash reports that are older
23:20
than one week. At the moment it is expected that you send the report within that time, and after that you can always download the raw report, as it was sent upstream, from the database server. But this file
23:40
name format is also used to detect duplicates: when your Compiz or your Pidgin is constantly crashing, you don't want to fill the disk with 100 duplicate reports that are all named differently. So I think this was the simplest way to do that.
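Putting the last few paragraphs together: the kernel runs the program named after a leading `|` in /proc/sys/kernel/core_pattern and feeds it the core on standard input. It can pass the PID, signal and core size limit through the `%p`, `%s` and `%c` specifiers, with a setting along the lines of `|/usr/share/apport/apport %p %s %c`. The sketch below is not Apport's code, and it makes three simplifications:

- The executable path and UID are passed in as parameters; a real handler would read them from /proc/<pid>.
- The header keys are illustrative, and the raw core is appended after them instead of being encoded into the report.
- The one-week cleanup stands in for the cron job mentioned above.

```python
import os
import time
from typing import BinaryIO, List, Optional

ONE_WEEK = 7 * 24 * 3600


def report_path(crash_dir: str, executable: str, uid: int) -> str:
    """Same binary + same user -> same file, so repeated crashes don't pile up."""
    return os.path.join(crash_dir, f"{executable.replace('/', '_')}.{uid}.crash")


def handle_core(argv: List[str], core: BinaryIO, crash_dir: str,
                executable: str, uid: int) -> str:
    """Handle one crash; argv is [prog, pid, signal, core_limit] as set by %p %s %c."""
    pid, signal, core_limit = (int(arg) for arg in argv[1:4])
    path = report_path(crash_dir, executable, uid)
    with open(path, "wb") as out:  # overwrites an older report for the same binary
        # Illustrative header keys; the raw core bytes follow the "CoreDump:" line.
        header = (f"ExecutablePath: {executable}\nPid: {pid}\n"
                  f"Signal: {signal}\nCoreLimit: {core_limit}\nCoreDump:\n")
        out.write(header.encode())
        # Copy the core from the pipe in chunks instead of holding it all in memory.
        while chunk := core.read(1 << 16):
            out.write(chunk)
    return path


def remove_old_reports(crash_dir: str, max_age: float = ONE_WEEK,
                       now: Optional[float] = None) -> List[str]:
    """Delete *.crash files older than max_age seconds; return their names."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(crash_dir)):
        path = os.path.join(crash_dir, name)
        if name.endswith(".crash") and now - os.path.getmtime(path) > max_age:
            os.remove(path)
            removed.append(name)
    return removed
```

Deterministic names such as `_usr_bin_compiz.1000.crash` give the cheap duplicate suppression described above. The trade-off is the one raised in the question: a second crash of the same binary replaces the earlier report.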
24:01
But that is how it is done in Ubuntu and in Fedora as well, and I think we'll stick to that naming in openSUSE too. It would be an idea to extend that, or at least to make it configurable,
24:21
but at the moment it is not. You could use a unique name for the file and then put that same information in the first line. Yes, something like that. So, a question. Oh, DrKonqi.
24:46
Yes. No, it's quite easy: the desktop-specific applications
25:01
and crash handlers hook into the segfault handler, so those applications never actually segfault anymore, and therefore Apport is not called. It's very simple, but you can disable the desktop-specific crash handler
25:22
and then use Apport. I extended, and I think I have a slide about that later, I extended this step of Apport to be more flexible for openSUSE,
25:42
because I know that the KDE project is very proud of DrKonqi, so I made it possible to call DrKonqi from your crashing application's handler as well. So
26:01
that would enable us to disable the crash handler in the KDE applications while the user experience stays the same, because DrKonqi is still called. So it's very, very flexible,
26:20
and it integrates quite well with GNOME and Google Breakpad and whatever. And it's very good, because at the moment the GNOME Breakpad implementation is broken for openSUSE, so it segfaults itself, but I can
26:42
get the reports of GNOME's segfaulting crash handler: I can capture them and report them as well. So at the moment it's good to have it. Which technologies are used? As I said, it's the
27:01
core_pattern feature, with the piping support that has been upstream since 2.6.24; the linker feature for build IDs, basically a toolchain feature which is new in openSUSE 11.1; the compiler feature for asynchronous unwind tables,
27:21
to be able to correctly produce a backtrace without full debug info; and package management features: you need the libzypp bindings for Python, which are
27:41
only available on 11.1, because the libzypp bindings for 11.0 are broken and nobody wants to fix them. So what's in there for developers? As I said, it's very flexible, so you can add your own
28:01
hooks, which are called from the applet: when you click the "Report problem" button, it starts all these hooks, which gather data. So you can easily hook in just by adding another file; it's searching
28:21
for, or rather executing, all the Python files that are there. You can also do that per package, with package-specific hooks, and all these hooks need to implement an add_info function
28:41
which then adds certain information to the report. It's also possible to delete certain information from the report, so the hooks are very powerful: you can execute arbitrary Python code.
29:01
So this is what it looks like. This is an example hook, an Ubuntu example: you just define add_info, and then you can create a new
29:20
key in your report and put the information in it, so this one adds a lot of files. Here's another example, which is quite
29:40
interesting, because with it you can disable report generation, or the sending of the report upstream, if you detect certain things. This one looks for specific
30:00
things in the backtrace, and if they show up, it says the crash report is likely invalid. So it sets an unreportable reason, and you are not able to send the report upstream or to the server.
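A package hook combining both examples might look like the sketch below. It is hypothetical and not one of the hooks on the slides. The attached files, the library name and the report keys are illustrative assumptions, except `UnreportableReason`, which is the tag shown in the talk.

```python
# Hypothetical files this package wants attached, as report key -> path.
ATTACHMENTS = {
    "ProcCpuinfo": "/proc/cpuinfo",
    "SegvConfig": "/etc/segv.conf",
}
# Hypothetical marker: crashes inside this third-party library are not our bug.
UNSUPPORTED_LIBRARY = "libthirdparty.so"


def add_info(report):
    """Attach files to the report and veto sending if the backtrace looks invalid."""
    for key, path in ATTACHMENTS.items():
        try:
            with open(path, errors="replace") as f:
                report[key] = f.read()
        except OSError as exc:
            # A missing file is recorded instead of aborting the whole hook.
            report[key] = f"Error reading {path}: {exc.strerror}"
    if UNSUPPORTED_LIBRARY in report.get("Stacktrace", ""):
        # With this key set, the report can be saved but not sent upstream.
        report["UnreportableReason"] = (
            f"The crash happened in {UNSUPPORTED_LIBRARY}, which this package does not ship."
        )
```

Marking the report unreportable rather than deleting it keeps the information available locally, which mirrors the behaviour described for non-genuine packages.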
30:24
Then there's something which is unique to openSUSE: the developer mode. You can enable it simply by putting "developer mode" in the config file and enabling it there.
30:42
This generates crash reports also for unpackaged applications, meaning applications that are not officially signed with the openSUSE build key, and also
31:00
applications like my segfault application, which doesn't come in a package, so the packaging system doesn't know about it. Here you can also see the unreportable reason tag, "This is not a genuine SUSE package", so you cannot send
31:22
the report upstream, but you can save it and then look at it or do whatever you want with it. Another thing which is unique
31:40
to the openSUSE Apport version is an "appcrash invoke" environment variable: if the crashing application has this environment variable set, Apport will call
32:01
the crash handler that is defined there. Here you can see this application crashed and it had a file name set; here it was invoke.sh, which is just a shell script, so you can run arbitrary code there.
32:21
You can also see the security feature: my user name is of course not "username", it was replaced. So this thing was run, and here you can see that the output of what you run is simply attached to the
32:41
application report. By that means it's very easy to add other crash reporting handlers as well. And when I say it's unique to the openSUSE version,
33:01
that's just because I haven't been able to push this upstream yet. The collaboration with Ubuntu, with Martin Pitt, is quite good, and he's taking patches from us as well, but of course he wants
33:20
me to prepare the patch rather than just tell him where the branch is and let him do the merging work. But this will be upstream in a few weeks or so, I hope. So, the list of missing features is quite long:
33:43
proper integration with Bug Buddy, DrKonqi and kerneloops. The problem is that since I only use GNOME, that would be the easier part for me, but somebody should do the KDE
34:02
stuff; I hope I can find somebody in-house or in the community. Then a rewrite of the core application, which is located under /usr/share/apport; this is the application that is run by the kernel.
34:22
I really want it to have minimal requirements, and I think requiring Python at that point is not optimal, so I want to rewrite at least that part, probably in C, or in C++, I don't know.
34:42
And then something to disable Apport per process. I have a question: when you say rewriting, do you mean rewriting it as an executable again, or as a library? No, as a standalone application, but only the application that is called from the kernel,
35:01
so the initial gathering of the report, which is very simple work: it only writes out the core dump and does all this /proc file stuff. That's exactly the opposite of what Windows does. In the Windows world, when you
35:21
crash an application, what you do is link to a static library, because when your application is crashing it might very well be because your system is unstable due to hardware or memory, and the last thing you want to do is start something new; you want to run something that you already have in memory.
35:42
No, no, the interesting thing is that this runs in a separate address space. It's not like the segfault handling or segfault handler implementations; you start Apport, which is a new application. So if my problem was that I was running out of memory
36:01
and that was the reason I was crashing, you are making things worse, not better. Yes, but Python, you know... you have to die one death. No, I mean when you rewrite it in C, it would make sense for it to be a library, a static library.
36:21
Yes, probably, but I don't want to link it or have it loaded multiple times, and if you have it loaded as a library and you don't want to run another application during the crash, then
36:40
I have to be sure that the code of the library, the text, is not modified during the application crash. So I really want to run in a separate address space, and that's why it makes sense to start another application,
37:01
because the worst problem that you have with DrKonqi and Bug Buddy is that if the application goes crazy, it's not able to start the segfault handler anymore, or the segfault handler itself crashes and dies. And getting this right is
37:21
very, very complex. What you can do is map the part of the code that you need to run during the crash, lock it and map it read-only... no, not read-only, but so that it's not touchable by the
37:41
application itself. But that would be a lot of hacking for a really small benefit, I think. That's exactly what the Windows world does; that's exactly what the Windows world has been doing for 20 years.
38:02
Yes, OK. There are some other things that are missing, also in the crash report database: as I said, the connections to Novell, and automatic report analysis. I'm currently
38:21
working on the report analysis and the report retracer. So that's it. There are some people involved: Nikolaj, who did the initial port; Martin, for merging our patches; and
38:42
Andreas Bauer and Marcus Rückert, for helping me with Rails. And here are the wiki pages, Apport and Apport for developers, if you're interested.
39:00
That's it. So, questions? Anybody? About the developer mode: you said that if some piece of software is not packaged by openSUSE and signed by SUSE,
39:21
it will not be reported unless you enable developer mode. What if I develop some software, say commercial software, that is not signed by SUSE, but I want it to have reporting features? The plan is, with the
39:41
connection to the openSUSE Build Service, that if the openSUSE Build Service has a reporter or bug ID set, our crashdb server also accepts that report.
40:00
That's not an option for software not built in the openSUSE Build Service. You can always add your own crash database server, so it is possible to have multiple servers. Yes, but in developer mode it doesn't send,
40:21
it only saves. Yes, but in a package hook you can override all of this, so that's possible. You have to finish now, sorry. Finish your discussion.
40:40
Good. OK, thanks.