|
Bugzilla – Full Text Bug Listing |
| Summary: | strace: bogus ERESTARTSYS return ... | ||
|---|---|---|---|
| Product: | [openSUSE] SUSE Linux 10.1 | Reporter: | Michael Meeks <mmeeks> |
| Component: | Kernel | Assignee: | Andreas Kleen <ak> |
| Status: | RESOLVED WONTFIX | QA Contact: | E-mail List <nld10-bugs-qa> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | federico |
| Version: | Beta 7 | ||
| Target Milestone: | Beta 8 | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | Other | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
Seemingly the problem only manifests itself under strace; so - perhaps an strace issue ? Yes, ptrace sometimes funny stuff like this. But it has been doing this forever so how is it suddenly critical? so - the problem occurs with default kernel too. Andreas: is it fair to characterize your reply as: "strace never works, we know it's broken & don't care much" ;-) if so, that would tally with my strace experience of it hosing my system / applications in strange ways. To be fair, I've never bothered filing a bug before because it was not hard to pin the blame quickly. In which case - I'll downgrade this to 'Normal' - it's a shame not to have an strace the works reliably though. It usually works - there are just these nasty corner cases If you can give me a simple self contained reproducer it might be able to do something about it. But it's low priority for CODE10. It's somewhat critical right now because we are using an strace-based tool to do login profiling: http://primates.ximian.com/~federico/news-2006-03.html#login-time-2 ... and the program that gets the ERESTARTSYS really doesn't like it. We can have a temporary hack to "goto again" if we get ERESTARTSYS, just to make the program not break when being run under the profiling tool. Andi, do you have any idea of why ptrace does this? I have some guesses, but I won't touch this right now - far too risky. ... Reopening to keep as a reference. Ok, but you have to keep it then. I tried to reproduce this on a current kernel on ia64, but the my little test app doesn't see any ERESTARTSYS. It shows up in the strace output, but for me the read syscall just resumes after the SIGCHLD signal handler returns. It probably only happens in complex situations - rescheduling at the right time etc. Thanks for looking Olaf :-) the complex scenario FWIW is a full gnome-session on login, running under strace having threaded the activation server (custom libbonobo RPMs from http://go-oo.org/~michael/packages/). That lets us interleave the launch of 10 or so applets concurrently: so - quite some system load etc. either way while 100% reproducible in this setup, admittedly a difficult setup to reproduce & no (visible) problems without strace, though that may be a timing issue. Thanks. Federico, our team is not going to fix this realistically. Andreas, I'm going to reassign and mark resolved later instead of wontfix because you seem to agree its a bug. Hopefully thats ok. This is for later. From my side it's a WONTFIX wontfix |
I issue a 'read' syscall on a pipe: static ReadResult read_data (GString *str, gint fd, GError **error) { gssize bytes; gchar buf[4096]; again: bytes = read (fd, buf, 4096); if (bytes == 0) return READ_EOF; else if (bytes > 0) { g_string_append_len (str, buf, bytes); return READ_OK; } else if (bytes < 0 && errno == EINTR) goto again; else if (bytes < 0) { g_set_error (error, G_SPAWN_ERROR, G_SPAWN_ERROR_READ, _("Failed to read data from child process (%s)"), g_strerror (errno)); return READ_FAILED; } else return READ_OK; } I get an error message back from this method: "Failed to read from child pipe (Unknown error 512)" I notice #define ERESTARTSYS 512 in the relevant include file. the strace [ *yes* I appreciate it's supposed to show erestartsys messages ] is: 6778 1142002038.398155 waitpid(6789, <unfinished ...> 6778 1142002038.446495 <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 6789 6778 1142002038.450078 read(43, <unfinished ...> 6778 1142002039.277533 <... read resumed> 0xb4999aa4, 8) = ? ERESTARTSYS (To be restarted) 6778 1142002039.277786 waitpid(6789, <unfinished ...> 6778 1142002039.278148 <... waitpid resumed> NULL, 0) = -1 ECHILD (No child processes) 6778 1142002039.282166 close(43 <unfinished ...> 6778 1142002039.284476 <... close resumed> ) = 0 6778 1142002039.286248 close(45 <unfinished ...> the waitpid/close etc. are the "bad things happened" case from above. I'm reliably informed that ERESTARTSYS should not get to user-space, and the call should be re-started in the kernel: I have: Linux linux 2.6.16-rc5-git9-2-smp #1 SMP Tue Mar 7 14:56:34 UTC 2006 i686 i686 i386 GNU/Linux kernel-smp-2.6.16_rc5_git9-2 Interestingly, this is not an SMP machine - no idea why a yum upgrade would install (what is apparently) an SMP kernel. HTH.