Bugzilla – Bug 131322
inotify support broken
Last modified: 2006-01-10 16:38:30 UTC
It seems the kernel randomly sends IN_DELETE_SELF for files that are definitely never deleted. It seems to have something to do with NFS mounts, but it does not depend on automount or on whether your own home directory is on NFS. The attached test application runs into the assert in the IN_DELETE_SELF branch after a while of randomly causing I/O (find &; ldconfig are good triggers; starting the yast2 package manager sometimes helps too), and it then reports an IN_DELETE_SELF on /etc/exports, which is definitely never touched, let alone deleted. Also note the false IN_IGNORED for an inotify wd that was already successfully removed. Something in the kernel state machine seems broken. We can trigger it on two machines (g76.suse.de and portia.suse.de) but not on another (oldboy.suse.de), even though g76 and oldboy run exactly (100%) the same kernel.
Created attachment 55935: test app
Created attachment 56009: better test application
Robert, could you please take a look at this one?
Chris: Yup. Dirk: Can you retry with a 2.6.14 kernel (now in kotd)? A related inotify fix went in.
2.6.14-rc5-2-default can't trigger it. But maybe it has something to do with SMP. Does rc5 already contain your inotify fix?
I believe it went in before -rc5 but am not 100% sure. Did you trigger it with -rc4?
Okay ... I am not able to reproduce the incorrect event on /etc/exports. Am I correct in assuming that /etc/exports is local, but that you just happen to have (unrelated) NFS mounts? My first thought is that the event is correct. Why is it impossible that the file is being touched? Recall that inotify works on the inode level, so the pathname can still exist, yet if the _inode_ behind it is removed, you will get IN_DELETE_SELF. For example: "mv dog /etc/exports" will trigger the IN_DELETE_SELF. As for the IN_IGNORED -- you are supposed to receive IN_IGNORED when a watch is removed, even if you removed it yourself. But you do not need to remove the watch, because IN_DELETE_SELF => watch removal. Maybe that is what is confusing, that the delete event itself is triggering IN_IGNORED? Going back to the original issue ... it is very hard to picture inotify sending incorrect events because it is essentially a set of hooks in the VFS. Such incorrect events would more likely imply VFS bugs than inotify bugs -- and that is not likely.
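The event sequence described in the comment above (IN_DELETE_SELF followed by IN_IGNORED for the same watch) can be sketched with a small decoder. This is only an illustration, not the attached test application: the buffer below is synthetic rather than captured from a kernel, the helper name decode_events is hypothetical, and only a subset of the mask bits from <linux/inotify.h> is listed.

```python
import struct

# Subset of the event bits defined in <linux/inotify.h>.
IN_MASK_NAMES = {
    0x00000100: "IN_CREATE",
    0x00000200: "IN_DELETE",
    0x00000400: "IN_DELETE_SELF",
    0x00000800: "IN_MOVE_SELF",
    0x00008000: "IN_IGNORED",
}

# Fixed header of struct inotify_event: int wd; uint32_t mask, cookie, len;
HEADER = struct.Struct("iIII")


def decode_events(buf):
    """Yield (wd, [flag names], name) for each event in a raw read() buffer.

    Hypothetical helper for illustration only.
    """
    offset = 0
    while offset + HEADER.size <= len(buf):
        wd, mask, _cookie, name_len = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        # The optional name is NUL-padded to name_len bytes.
        raw_name = buf[offset:offset + name_len]
        name = raw_name.rstrip(b"\0").decode(errors="replace")
        offset += name_len
        flags = [n for bit, n in IN_MASK_NAMES.items() if mask & bit]
        yield wd, flags, name


# Synthetic buffer (an assumption, not real kernel output): watch 1 sees
# IN_DELETE_SELF, then IN_IGNORED because the watch is dropped with the inode.
sample = HEADER.pack(1, 0x400, 0, 0) + HEADER.pack(1, 0x8000, 0, 0)
events = list(decode_events(sample))
for wd, flags, name in events:
    print(wd, flags, repr(name))
```

A consumer written along these lines would treat IN_IGNORED as the point where the watch descriptor stops being valid, whether the watch was removed explicitly or implicitly.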
Well, I don't know which daemon would delete /etc/exports and recreate it with exactly the same stat data (including device and inode number). Is there a way to debug who deleted a file? It seems stunning to me that random files on my root partition are deleted and instantly recreated (without inotify ever reporting the create event) with the same dates, same inode number and same path. But I can try to find out.
Which particular commit are you referring to? Is it this one: http://www.kernel.org/git/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=blobdiff;h=9fbaebfdf40bef99428d0feaa50d9b69cba234d0;hp=a37e9fb1da589fa1457929e2079e176220ad34c0;hb=8d3b35914aa54232b27e6a2b57d84092aadc5e86;f=fs/inotify.c
Unfortunately, there is no way to debug who deleted the file via a standard interface. If you are willing to recompile your kernel, you could add a printk() in the right spot, but that is a pain. If the inode number stays the same, that sounds as if the file is not truly being deleted. Hrm. I would really like to see reproduction (or lack thereof) on 2.6.14 final. Just to be sure. I am still running the test app, with no activity.
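The inode argument above can be checked from user space with a short sketch. It assumes a scratch directory instead of the real /etc/exports; the file names and the identity helper are hypothetical and chosen only to mirror the "mv dog /etc/exports" example earlier in the thread. It shows that rewriting a file in place keeps its (device, inode) pair, while renaming another file over it gives the path a new inode.

```python
import os
import tempfile


def identity(path):
    """Return the (device, inode) pair that identifies the file behind path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for /etc/exports and the file moved over it.
    target = os.path.join(tmp, "exports")
    other = os.path.join(tmp, "dog")
    for path in (target, other):
        with open(path, "w") as f:
            f.write("original\n")

    before = identity(target)

    # Rewriting in place truncates and refills the same inode.
    with open(target, "w") as f:
        f.write("changed\n")
    after_rewrite = identity(target)

    # Equivalent of "mv dog exports": the path now points at dog's inode,
    # and the old inode loses its last link.
    os.replace(other, target)
    after_replace = identity(target)

print("same inode after in-place rewrite:", before == after_rewrite)
print("same inode after rename-over:", before == after_replace)
```

Under these assumptions, a file whose device and inode numbers are unchanged after an IN_DELETE_SELF was not replaced by a rename of this kind, which is the reporter's point.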
Do you run the SMP kernel? I can trigger it on 2.6.14rc2-smp, which is the most current SUSE kernel version I can find right now on dist.suse.de.
I do run SMP kernels, but I was running your test on UP. I will try the test tomorrow on SMP. If it turns out to be an SMP race, it might be hard to track down, but at least it is a tangible problem that I can fix. ;-) Also - kotd has 2.6.14.
Stephan has to comment on whether this is gone with the newer kernel; I isolated this problem on his machine. In any case the issue is there with the current 10.0 kernel, so related patches should imho be backported (10.0 uses inotify in many user space applications, both GNOME and KDE).
FYI, the above-mentioned host g76.suse.de now has a new, fixed IP and is named klempnerei.suse.de. After a reboot it is running kernel 2.6.14-20051031230013-smp. So far I haven't experienced the problem.
I think this can be considered as fixed for the current kernel. Maybe this is something which could be backported to the 10.0 kernel (for Beagle/newer KDE packages)?
This is fixed for the current kernel, yes. I am going to mark as resolved -- I don't want to push a kernel update unless Beagle / something else in SUSE 10.0 is hitting a specific problem.
Why else do you think this was reported against SL 10.0?
Let's test whether the patch that went into 2.6.14/2.6.15 backported to our 10.0 kernel fixes this - and if it does, please get it into the 10.0 branch so that our next update kernel contains this fix.
I back-ported all of the inotify fixes between 2.6.13 and 2.6.15-rc5. This bug is fixed -- I believe -- by the fs/dcache.c hunk of the patch.
Created attachment 60580: Inotify fixes for 2.6.13. Backport of fixes from 2.6.15-rc5 to 2.6.13.
If this fixes the issues, let's add it to our 10.0 kernel.
*** Bug 139580 has been marked as a duplicate of this bug. ***
*** Bug 140785 has been marked as a duplicate of this bug. ***
Hi, people. It's been around 15 days and this problem is still not solved by any (YOU) update. I don't know about you, but since I have had this problem I can't use any editor that is based on Kwrite/Kate, which includes, among others, Kdevelop and Kile. This means I cannot work until this problem is solved. If it is solved, when can I expect the fix to be delivered? Thanks for the help.
Robert, could you add your patches to our 10.0 kernel, build a test kernel and let Hugo test it? If it fixes the problem, we'll add this to our 10.0 kernel and the patch will go out with our next security update for the kernel - we're not releasing an extra kernel for this issue.
Alright. I built i586 UP test kernels with the inotify updates applied. You can grab them here: http://primates.ximian.com/~rml/misc/ Everyone experiencing this problem: Confirm or deny if this fixes it. If not, I don't think we are looking at an inotify problem.
Created attachment 61968: updated version of the inotify-updates patch
Hugo, could you report back on this, please?
I installed it at 9:50 AM GMT+0 (Lisbon) time. At the end of the day I will report what happened, or earlier if the problem shows up in the meantime.
Just one thing: is it possible to also provide the kernel-source rpm? Otherwise I can't reinstall the ATI drivers, so I'm currently without 3D acceleration.
Put up source packages in the same place: http://primates.ximian.com/~rml/misc/
Well, it is now 16:50 GMT+0, so 7 hours have passed, during which I always had Kile and Kdevelop open without the problem showing up. So it seems that you have succeeded in correcting this bug. Thanks a lot.
Thanks for the testing. Robert, please add the patch to our 10.0 CVS.
Let me just say one thing. The kernel-nongpl provided in http://primates.ximian.com/~rml/misc/ doesn't contain anything, so updating means, among other things, losing wireless on a Centrino laptop, which happened to me.
Yes, this is because I did not build any binary-only modules. Our update kernel will contain all the requisite drivers.
Ok, but can a kernel rpm with everything be found anywhere, so that one doesn't have to give up some functionality?
Andreas: Submitted to 10.0 CVS
Hugo: If we rebuild the kotd for 10.0 (I don't know how often we sync these), the kernels will show up at ftp://ftp.suse.com/pub/projects/kernel/kotd/10.0-i386/SL100_BRANCH Otherwise, you will need to wait for the update kernel.
Robert: Ok, thanks again.