Bugzilla – Bug 128037
100% CPU burn ... (gamin)
Last modified: 2007-10-12 13:39:08 UTC
After doing nothing in particular 'top' shows:

27859 michael 25 0 41896 14m 10m R 21.5 1.5 164:08.18 gnome-panel
27584 michael 25 0 44588 15m 11m R 20.2 1.6 165:08.54 nautilus
27603 michael 25 0 20320 6792 6044 R 19.6 0.7 163:33.39 gnome-settings-

gnome-settings-daemon:

poll([{fd=32, events=POLLIN}, {fd=17, events=POLLIN}, {fd=3, events=POLLIN}, {fd=26, events=POLLIN|POLLPRI}, {fd=28, events=POLLIN|POLLPRI}, {fd=29, events=POLLIN|POLLPRI}, {fd=30, events=POLLIN|POLLPRI}, {fd=37, events=POLLIN|POLLPRI}, {fd=36, events=POLLIN|POLLPRI}, {fd=35, events=POLLIN|POLLPRI}, {fd=31, events=POLLIN, revents=POLLNVAL}], 11, -1) = 1
ioctl(3, FIONREAD, [0]) = 0
poll([{fd=32, events=POLLIN}, {fd=17, events=POLLIN}, {fd=3, events=POLLIN}, {fd=26, events=POLLIN|POLLPRI}, {fd=28, events=POLLIN|POLLPRI}, {fd=29, events=POLLIN|POLLPRI}, {fd=30, events=POLLIN|POLLPRI}, {fd=37, events=POLLIN|POLLPRI}, {fd=36, events=POLLIN|POLLPRI}, {fd=35, events=POLLIN|POLLPRI}, {fd=31, events=POLLIN, revents=POLLNVAL}], 11, -1) = 1
ioctl(3, FIONREAD, [0]) = 0
poll([{fd=32, events=POLLIN}, {fd=17, events=POLLIN}, {fd=3, events=POLLIN}, {fd=26, events=POLLIN|POLLPRI}, {fd=28, events=POLLIN|POLLPRI}, {fd=29, events=POLLIN|POLLPRI}, {fd=30, events=POLLIN|POLLPRI}, {fd=37, events=POLLIN|POLLPRI}, {fd=36, events=POLLIN|POLLPRI}, {fd=35, events=POLLIN|POLLPRI}, {fd=31, events=POLLIN, revents=POLLNVAL}], 11, -1) = 1
ioctl(3, FIONREAD, [0]) = 0
poll([{fd=32, events=POLLIN}, {fd=17, events=POLLIN}, {fd=3, events=POLLIN}, {fd=26, events=POLLIN|POLLPRI}, {fd=28, events=POLLIN|POLLPRI}, {fd=29, events=POLLIN|POLLPRI}, {fd=30, events=POLLIN|POLLPRI}, {fd=37, events=POLLIN|POLLPRI}, {fd=36, events=POLLIN|POLLPRI}, {fd=35, events=POLLIN|POLLPRI}, {fd=31, events=POLLIN, revents=POLLNVAL}], 11, -1) = 1
ioctl(3, FIONREAD, [0]) = 0
poll([{fd=32, events=POLLIN}, {fd=17, events=POLLIN}, {fd=3, events=POLLIN}, {fd=26, events=POLLIN|POLLPRI}, {fd=28, events=POLLIN|POLLPRI}, {fd=29, events=POLLIN|POLLPRI}, {fd=30, events=POLLIN|POLLPRI}, {fd=37, events=POLLIN|POLLPRI}, {fd=36, events=POLLIN|POLLPRI}, {fd=35, events=POLLIN|POLLPRI}, {fd=31, events=POLLIN, revents=POLLNVAL}], 11, -1) = 1
ioctl(3, FIONREAD, [0]) = 0

Looks like a runaway g_idle or something. Suspending gnome-settings-daemon - the other 2 keep belting away; likewise for nautilus; likewise the panel. ie. there is some independent stimulus that causes this bad behavior.
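One detail in the strace above is worth illustrating: every poll() call has an infinite timeout (-1) yet returns 1 at once, and the only entry with revents set is fd=31 with POLLNVAL, which poll(2) reports for a descriptor that is not open. A main loop that keeps such a descriptor in its poll set never blocks. The Python sketch below reproduces only that mechanism on Linux; the function name poll_closed_fd is made up for illustration, and the thread does not establish that this is the actual root cause of the bug.

```python
import os
import select


def poll_closed_fd(timeout_ms=1000):
    """Poll a descriptor that was closed after being registered.

    Returns (fd, events). poll(2) flags a descriptor that is not open with
    POLLNVAL and returns immediately instead of waiting for the timeout.
    """
    r, w = os.pipe()
    poller = select.poll()
    poller.register(r, select.POLLIN)
    os.close(r)  # the fd number stays in the poll set but is no longer open
    try:
        events = poller.poll(timeout_ms)
    finally:
        os.close(w)
    return r, events


if __name__ == "__main__":
    fd, events = poll_closed_fd()
    for polled_fd, revents in events:
        if revents & select.POLLNVAL:
            # A real main loop would have to drop this fd from its poll set
            # here; otherwise every iteration returns immediately.
            print("fd %d is not open (POLLNVAL)" % polled_fd)
```

A loop that ignores POLLNVAL and polls again with the same set behaves like the trace: one immediate return per iteration and nothing to read.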
Amazingly - this stopped (apparently) for no reason; and then it was mono's turn to burn 100% of the CPU doing (essentially) nothing:

[pid 775] time( <unfinished ...>
[pid 7390] gettimeofday( <unfinished ...>
[pid 775] <... time resumed> NULL) = 1129194102
[pid 7390] <... gettimeofday resumed> {1129194102, 797411}, NULL) = 0
[pid 775] time( <unfinished ...>
[pid 7390] gettimeofday( <unfinished ...>
[pid 775] <... time resumed> NULL) = 1129194102
[pid 7390] <... gettimeofday resumed> {1129194102, 799790}, NULL) = 0
[pid 775] time( <unfinished ...>
[pid 7390] gettimeofday( <unfinished ...>
[pid 775] <... time resumed> NULL) = 1129194102
[pid 7390] <... gettimeofday resumed> {1129194102, 799880}, NULL) = 0
[pid 775] time( <unfinished ...>
[pid 7390] gettimeofday( <unfinished ...>
[pid 775] <... time resumed> NULL) = 1129194102
[pid 7390] <... gettimeofday resumed> {1129194102, 803702}, NULL) = 0
[pid 775] time( <unfinished ...>
[pid 7390] gettimeofday( <unfinished ...>
[pid 775] <... time resumed> NULL) = 1129194102
[pid 7390] <... gettimeofday resumed> {1129194102, 811756}, NULL) = 0
[pid 775] time( <unfinished ...>
[pid 7390] gettimeofday( <unfinished ...>
[pid 775] <... time resumed> NULL) = 1129194102
[pid 7390] <... gettimeofday resumed> {1129194102, 819719}, NULL) = 0
[pid 775] time( <unfinished ...>
[pid 7390] gettimeofday( <unfinished ...>
[pid 775] <... time resumed> NULL) = 1129194102
[pid 7390] <... gettimeofday resumed> {1129194102, 831794}, NULL) = 0
[pid 775] time( <unfinished ...>
[pid 7390] gettimeofday( <unfinished ...>
[pid 775] <... time resumed> NULL) = 1129194102
[pid 7390] <... gettimeofday resumed> {1129194102, 839817}, NULL) = 0
[pid 775] time( <unfinished ...>
[pid 7390] gettimeofday( <unfinished ...>
[pid 775] <... time resumed> NULL) = 1129194102
[pid 7390] <... gettimeofday resumed> {1129194102, 847726}, NULL) = 0
[pid 775] time( <unfinished ...>
[pid 7390] gettimeofday( <unfinished ...>
[pid 775] <... time resumed> NULL) = 1129194102
Amazingly - the in-process panel applets also become unresponsive - clicking on it etc. does nothing at all - seems to be dead in a poll() syscall - but presumably not the mainloop syscall; possibly some outgoing CORBA call not responding - unclear. Unfortunately, after enough gdb / strace calls & killall -9 strace / gdbs - the kernel became acutely confused; ps auwx showing gnome-panel in the 'T'race state without any strace/gdb processes attached: very odd/broken - re-booted.
Interestingly; evolution is dumping loads of debug:

end from FAM server connection
invalid length 10794
invalid length 10794
invalid length 10794
invalid length 10794
end from FAM server connection
invalid length 10794
invalid length 10794
invalid length 10794
invalid length 10794
end from FAM server connection

but of course could be entirely un-related.
Not sure, but maybe your FAM backend was dead. FAM is File Alteration Monitor.
- Do you have a running fam daemon (if yes, fam is used as the FAM backend)?
- Do you have the gamin package installed (if yes and the fam daemon is not running, gamin is used as the FAM backend)?
- Did you already update gnome-vfs2 and libgda from YOU? The update fixes the fam/gamin wrapper.
I don't have a running fam daemon; I have a gam_server running - and there are only 2 YOU updates I haven't installed, both marked optional: the NVIDIA graphics driver and the 'File Alternation Monitoring Daemon' which (from the description) sounds dangerous & unnecessary :-)
It means, that your GNOME is using gamin for FAM service. And "end from FAM server connection" means, that your gam_server was probably dead. And maybe it caused 100% load (waiting for response?). You don't need fam YOU, it only adds missing package for people not running KDE or GNOME. The needed YOU is gnome-vfs2 and libgda, which fixes the wrapper.
ah - so, I killed that daemon to see if it was causing the panel lockup & it had no effect; so I suspect those messages are just related to that & are a red-herring :-)
(In reply to comment #7)
> ah - so, I killed that daemon to see if it was causing the panel lockup & it
> had no effect; so I suspect those messages are just related to that & are a
> red-herring :-)

I have seen similar behaviour using SuSE 10 x86_64. However, I also noticed from the message logs the following:

Oct 20 16:00:54 anat0098 kernel: gam_server[18065]: segfault at 0000000000500000 rip 00002aaaaadbfec7 rsp 00007ffffff0e4a0 error 4
Oct 20 16:00:56 anat0098 kernel: gam_server[18081]: segfault at 0000000000500000 rip 00002aaaaadbfec7 rsp 00007fffffe849e0 error 4
Oct 20 16:00:57 anat0098 kernel: gam_server[18087]: segfault at 0000000000500000 rip 00002aaaaadbfec7 rsp 00007fffff857490 error 4
Oct 20 16:00:58 anat0098 gam_server: *** glibc detected *** double free or corruption (fasttop): 0x000000000051eba0 ***

Now gam_server is provided by gamin, and there would appear to be a bloody great big BUG in it - so why is this bug assigned priority P5 - none??????? I have applied all the latest patches (gnome-vfs, etc) and still have the problem. So what's the solution????

Cheers, jon
Are you able to reproduce this bug? It never happened to me. You can try to rebuild gamin with debug information, or install the version from OpenSuSE together with its debuginfo subpackage. Then run your application, get the PID of your gam_server, attach to it in gdb, wait for the crash and provide a backtrace (bt). There is an easy work-around for this bug - run the fam service on boot - but we should debug gamin, too.
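The attach-and-wait steps above can be scripted. The sketch below is only an illustration in Python for Linux: the helper names find_pids and gdb_backtrace_argv are made up for this example, and it assumes gdb is installed and that you have permission to attach to the process. It looks up the PID through /proc/<pid>/comm and builds a gdb command line that attaches, continues until the process stops (for example on the segfault), and then prints a backtrace.

```python
import os
import subprocess


def find_pids(name, proc_root="/proc"):
    """Return the sorted PIDs whose /proc/<pid>/comm equals name."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process went away while we were scanning
    return sorted(pids)


def gdb_backtrace_argv(pid):
    """Build a gdb command line that attaches, waits for a stop, prints bt."""
    # -p attaches to a running process, -batch exits once the commands are
    # done, and each -ex runs one gdb command in order.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "continue", "-ex", "bt"]


if __name__ == "__main__":
    for pid in find_pids("gam_server"):
        # Blocks until gam_server stops on a signal, then prints the backtrace.
        subprocess.call(gdb_backtrace_argv(pid))
```

Without the debuginfo package mentioned above, the backtrace will mostly show raw addresses.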
Jonathan, have you pulled down the SL 10.0 updates? we fixed a double free in an update of gamin.
Ah, I see above you did pull them down. Any better success if you install fam instead of gamin? gamin appears to be dying upstream in favour of direct inotify support. We could either backport or go back to using fam (although dnotify apparently is not that great).
Hi, Sorry for not replying sooner - I got fed up with this problem and eventually switched over to the dark side (KDE). Shame since I love the clean lines of Gnome. If you can provide me with some simple instructions on how to switch to using fam instead of gam, I would be more than happy to try Gnome again. Thanks for taking a look into this bug(?). I take it that it is a bug in gamin? Cheers, Jon.
How to switch to fam:
- Install fam+fam_daemon.
- Run fam by default in the runlevel editor.
- You can uninstall gamin just now.
gnome-vfs2 and libgda have a fam/gamin wrapper, which prefers fam if the fam daemon is running.
(reassign to my ximian address to work around bugzilla.novell.com stupidity)
This is being fixed by killing off gamin (which is no longer maintained upstream) in favor of fam and inotify.
Marcus, we're not using gamin any more for GNOME so I'm re-assigning to you in case you want to maintain it and moving it to you in pdb, otherwise I'd consider the bug closed and the package dead.
could we stop this rumor that gamin is no longer maintained upstream? upstream doesn't know anything about that. he just doesn't add new features. but it is still maintained with regard to bugfixing.
even if this is no longer used in 10.1 you maybe should fix this bug. but that is your decision as it is your package.
Well, upstream GNOME de-facto dropped it because inotify support went directly into gnome-vfs so there isn't that much of a difference. We could ship an update that used only dnotify i suppose. Robert, Dan?
Oh, we could just release a patch script that installs fam instead too.
I hadn't realized this was a 10.0 bug... If we want to fix this for 10.0 users, I'd say we just drop it in favor of fam, as none of us has any clue about the internals of gamin, and it seems silly to spend a lot of time trying to figure it out since we're not using it going forward.
I agree with Dan.
We cannot switch in a released product from gamin to fam. So, what can be done for 10.0? Fix inotify? Or remove it again?
Why can't we? No application uses gamin directly, and the wrapper will automatically switch to fam. There are problems to solve:
- the patch script should ensure that fam and fam-server are installed (I guess that some magic in YOU can do it).
- the patch script has to start fam by default.
We cannot do this with patches on 10.0. And we really should fix bugs instead of inventing hacks around them.
aj: The problem is not with inotify, it is with gamin.
Isn't this the inotify support *in* gamin?
I don't think we know for sure whether the bug is in the inotify module in gamin or the core gamin code. But I was responding to "Fix inotify? Or remove it again?" -- the bug is not inotify.
I see - I should have written: "Fix inotify support in gamin or remove inotify support from gamin?"
Very few complaints since; 10.1 is out and 10.2 almost. I don't think it's worth fixing at this point.
libgda FAM->gnome-vfs migration patch upstreamed as http://bugzilla.gnome.org/show_bug.cgi?id=486021