Bugzilla – Bug 138568
detaching ltrace -p crashes host process
Last modified: 2006-05-30 22:55:08 UTC
How to reproduce: run xclock & and then ltrace -p `pidof xclock`, then press <ctrl-c>. The shell reports "[1]+ Trace/breakpoint trap xclock" and xclock is gone.
Still reproducible?
no, apparently fixed meanwhile.
no, still happens
Try xclock -update 1 & followed by ltrace -p $(pidof xclock). It only happens if xclock has woken up from its event loop at least once while ltrace was attached (i.e. after about one second of attach time).
Philipp is ill. Stefan, could you reassign this to an appropriate developer?
Looks like a race condition; at least on my X20 it does not always reproduce instantly. It can take a few tries, but a few for-loops that automatically (re)start xclock, start ltrace and kill ltrace reproduce it well enough.
I found that ltrace-fix_-p_bug.diff is the culprit. Its header says:
    ----
    This patch fixes a couple of problems with the '-p' option: break-points where not being inserted after attaching to the process. Now they are.
    ================================================================================
    ----
This patch sets proc->pid in open_program() (which is called from open_pid) just before it calls breakpoints_init(), which causes that routine not only to set but also to enable all breakpoints immediately at that point.
The problem is that the patch does not match the behaviour of ltrace 0.3.36:
1) ltrace 0.3.36 does indeed enable breakpoints when attaching to a program with -p (not enabling breakpoints would mean that -p traces no library calls). It does so in the routine enable_all_breakpoints() after open_program is called. Breakpoints work fine, and this patch is superfluous in 0.3.36 and also in our patched version of it.
2) The real problem, however, is that the patch causes the breakpoints to be added twice. This is not a problem while ltrace runs, but it is a problem when ltrace detaches, because it then has to restore the original program instructions at the breakpoint locations. Unfortunately, when it does so, it restores only the code that was put in place by the first round of enabling the breakpoints. Hence the original program code is not restored when ltrace detaches, and the program faults at the next breakpoint.
The patch is actually part of ltrace-0.4 (in changed form), where it is indeed necessary: ltrace-0.4 does not even run without it. Yet even with it, ltrace-0.4 still does not enable breakpoints properly when attaching to an updating xclock.
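The double-insertion problem from point 2) can be sketched with a toy model. This is not ltrace code: the struct, the function names, the trap byte value and the in-memory "text" array are all hypothetical, and the model assumes one plausible mechanism, namely that each enable round saves whatever byte is currently at the address before writing the trap.

```c
#include <assert.h>
#include <stddef.h>

#define TEXT_SIZE 16
#define TRAP_BYTE 0xCC  /* stand-in trap opcode; the real value is arch-specific */

/* Hypothetical, simplified breakpoint record (not ltrace's real struct). */
struct toy_bp {
    size_t addr;
    unsigned char saved;  /* byte found at addr when the trap was written */
};

/* Save the byte currently at addr, then overwrite it with the trap. */
void toy_bp_enable(unsigned char *text, struct toy_bp *b)
{
    assert(b->addr < TEXT_SIZE);
    b->saved = text[b->addr];
    text[b->addr] = TRAP_BYTE;
}

/* Write the saved byte back, as done on detach. */
void toy_bp_disable(unsigned char *text, const struct toy_bp *b)
{
    assert(b->addr < TEXT_SIZE);
    text[b->addr] = b->saved;
}

/* Returns the byte left at the breakpoint address after attach + detach.
 * enable_rounds == 1 models the intended path, 2 models the bug. */
unsigned char attach_detach(unsigned char original, int enable_rounds)
{
    unsigned char text[TEXT_SIZE] = {0};
    struct toy_bp b = { 4, 0 };

    text[b.addr] = original;
    for (int i = 0; i < enable_rounds; i++)
        toy_bp_enable(text, &b);   /* second round saves the trap itself */
    toy_bp_disable(text, &b);
    return text[b.addr];
}
```

With one enable round the original byte comes back after detach; with two rounds the saved byte is already the trap, so the trap stays in the program text, which would match the Trace/breakpoint trap seen after detaching.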
The patch was one of many patches added with this changelog entry:
    -------------------------------------------------------------------
    Tue May 10 16:10:04 CEST 2005 - pth@suse.de
    - Incorporate RH patch for biarch support
    - Incorporate all pathches from IBM for biarch ltrace on ppc64
    -------------------------------------------------------------------
To me, this patch looks like a hack to make a newer ltrace version work, probably the version which IBM development used for biarch support on ppc64. It is certainly wrong for our ltrace-0.3.36, so I am going to remove it. I've started mbuilds to have packages for testing with SLES10 below /mounts/mbuild/vesalius-bk-3/
OK, the message of the patch certainly only applies to the ppc/ppc64 port. There, a hunk from ltrace-ppc_secure_PLTs.diff, which looks up _start in this example output, somehow removes all previous breakpoints which have been added with add_library_symbol:
    DEBUG: elf.c:291: add_library_symbol(): addr: 0x10018600, symbol: "XCreatePixmap"
    DEBUG: elf.c:291: add_library_symbol(): addr: 0x10018604, symbol: "printf"
    DEBUG: elf.c:291: add_library_symbol(): addr: 0x10018608, symbol: "XftFontClose"
    DEBUG: elf.c:291: add_library_symbol(): addr: 0x1001860c, symbol: "__gmon_start__"
    DEBUG: elf.c:291: add_library_symbol(): addr: 0x10018610, symbol: "XftDrawPicture"
    WARNING: Couldn't find symbol "_start" in file "/proc/3857/exe"
    DEBUG: elf.c:291: add_library_symbol(): addr: 0x100027f0, symbol: "_start"
    WARNING: using e_entry from elf header (0x100027f0) for address of _start
    DEBUG: breakpoints.c:96: enable_all_breakpoints(): Enabling breakpoints for pid 3857...
    enable_breakpoint(3857,0x100027f0)
(No more breakpoints are added, only this one for _start.)
Since that is messed up somehow, I looked at what can be done and saw this in the patch:
    -        proc = open_program(filename);
    -        proc->pid = pid;
    +        proc = open_program(filename,pid);
    +        proc->breakpoints_enabled = -1;
         }
Passing pid causes open_program to enable the breakpoints. After I had seen this in ltrace.h, things became clear:
    int breakpoints_enabled; /* -1:not enabled yet, 0:disabled, 1:enabled */
Enabling the breakpoints and then claiming that they are not enabled yet does not work well, since enable_all_breakpoints later enables them again. Simply removing the "-" so that breakpoints_enabled is set to 1 (enabled now) fixes the crash. But since the whole patch only introduces a not-so-good side effect on the other architectures, where -p already works, I'll also add an %ifarch ppc ppc64 around the patch.
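The interaction between the flag and the second enabling round can be sketched as a toy model. Only the three flag values come from the ltrace.h comment quoted above; the struct, the toy_ function names and the insert counter are hypothetical, and the sketch assumes that the later enable step skips its work when the flag already reads 1 (which is what the one-character fix relies on).

```c
#include <assert.h>

/* Flag values as documented in the quoted ltrace.h comment. */
enum { BP_NOT_YET = -1, BP_DISABLED = 0, BP_ENABLED = 1 };

/* Hypothetical stand-in for the traced-process record. */
struct toy_proc {
    int breakpoints_enabled;
    int insert_count;  /* how often the traps were written into the target */
};

/* Models the patched open_program(filename, pid): traps go in right away. */
void toy_open_program(struct toy_proc *p)
{
    p->insert_count++;
}

/* Models the later enable_all_breakpoints() call.
 * Assumption: it does nothing when the flag already says "enabled". */
void toy_enable_all(struct toy_proc *p)
{
    if (p->breakpoints_enabled == BP_ENABLED)
        return;
    p->insert_count++;
    p->breakpoints_enabled = BP_ENABLED;
}

/* Returns how many times the traps were written during attach,
 * given the value the patch stores in the flag after open_program. */
int toy_attach(int flag_after_open)
{
    assert(flag_after_open >= BP_NOT_YET && flag_after_open <= BP_ENABLED);

    struct toy_proc p = { BP_NOT_YET, 0 };
    toy_open_program(&p);
    p.breakpoints_enabled = flag_after_open;
    toy_enable_all(&p);
    return p.insert_count;
}
```

In this model toy_attach(-1) writes the traps twice (the behaviour of the original patch), while toy_attach(1) writes them once (the behaviour after changing -1 to 1).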
Submitted an updated ltrace with this incremental change to ltrace-fix_-p_bug.diff:
     -        proc = open_program(filename);
     -        proc->pid = pid;
     +        proc = open_program(filename,pid);
    -+        proc->breakpoints_enabled = -1;
    ++        proc->breakpoints_enabled = 1;
              }
And the spec file has this diff:
    +# ltrace-fix_-p_bug.diff has a side effect which causes a regression
    +# for non-ppc archs. What it addresses works on all other archs
    +# and ltrace mainline does things better than this patch. This patch
    +# is only a kludge which only helps on ppc/ppc64, and even there,
    +# it's not getting it done as it should, and it hurts all others.
    +# See the header of the patch and bug 138568 for more information:
    +%ifarch ppc ppc64
     %patch10 -p1
    +%endif
Changelog:
    - fix crash of every process traced with -p on ltrace exit (138568)
To summarize: the crash is fixed on ppc/ppc64 and on all other archs. On ppc/ppc64 the messages
    --- SIGSTOP (Stopped (signal)) ---
    --- SIGSTOP (Stopped (signal)) ---
occur when attaching to a process with -p, but that is the hard-to-avoid side effect of the kludge, which is now only active on ppc/ppc64 in order to have -p working at all there.
I have now found that in ltrace-0.4 the kludge is fixed up in exactly the same way.
RPMs for verifying are in: /work/built/mbuild/cube-bk-1