Bugzilla – Bug 113379
suspend to disk complains about cifsd not stopped
Last modified: 2007-11-15 19:25:50 UTC
If you have a CIFS share mounted (mount -t cifs //server/bla /mnt), suspend to disk complains: "Restarting tasks... Strange, cifsd not stopped". Unmounting the CIFS share does not help either; only removing the kernel module makes suspend work again.
Steve, I guess it's okay for you to paste your mail here for the others: Steve French wrote: Is suspend enabled in SLES9? Is this (in SLES10) swsusp2? I don't have a SLES10 system installed nearby (I will check around some more). For older distros, I had trouble working through the userspace tools needed to do this (i.e. suspend) with the stock kernel 2.6.13-rc6. My guess is that cifsd is either blocking signals that the new suspend utils want to send to it, or that the network stack cannot suspend while a tcp peek is pending (cifsd is the demultiplexing thread, waiting for responses from the server, so it is almost always blocked in tcp_peek).
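To illustrate the second guess above, here is a minimal user-space sketch, not kernel code and not the CIFS implementation. It assumes a receive loop that peeks at a socket with a short timeout so that it can notice a freeze request between reads; a loop that blocked in the peek indefinitely would never reach that check. The names `demux_loop`, `try_freeze`, `freeze_requested` and `frozen` are hypothetical and chosen only for this example.

```python
import socket
import threading


def demux_loop(sock, freeze_requested, frozen, poll_interval=0.05):
    """Toy stand-in for a demultiplexing thread that peeks at a socket.

    The socket timeout is what lets the loop come back and check the
    freeze request; with a fully blocking peek it would never get there.
    """
    sock.settimeout(poll_interval)
    while True:
        if freeze_requested.is_set():
            frozen.set()          # acknowledge the freeze request and stop
            return
        try:
            data = sock.recv(4, socket.MSG_PEEK)   # peek, do not consume
        except socket.timeout:
            continue              # nothing arrived; re-check the freeze flag
        if not data:
            return                # peer closed the connection
        sock.recv(len(data))      # consume what was peeked


def try_freeze(timeout=1.0):
    """Start the loop on an idle socket pair and ask it to freeze.

    Returns True if the loop acknowledged the request within `timeout`.
    """
    a, b = socket.socketpair()
    freeze_requested = threading.Event()
    frozen = threading.Event()
    worker = threading.Thread(
        target=demux_loop, args=(a, freeze_requested, frozen), daemon=True
    )
    worker.start()
    freeze_requested.set()
    ok = frozen.wait(timeout)
    worker.join(timeout)
    a.close()
    b.close()
    return ok
```

Calling `try_freeze()` on this idle connection should report that the loop stopped, because the peek times out and the flag is checked again. This only models the idea of a wait that can be interrupted by a freeze request; how the kernel thread is actually woken is a separate matter.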
We don't use suspend2. cifs is pretty ugly code, and it seems it has no suspend support. Oops. Try attached patch.
Created attachment 47950: should help...
is this supposed to be in recent kernel RPMs now?
Probably not without any feedback on whether it helps... ;-)
Well, I actually don't have the time and computing resources to patch and compile a kernel myself right now ;-)
I'd really like to know if the patch above helps...
Then add it to our kernel package and let Björn test it as soon as we have prebuilt packages. He doesn't have the build power and time to create his own kernel.
Is the patch in our KOTD?
No, it is not. I do not have the build power and time to create a SUSE kernel :-(.
(In reply to comment #4)
> is this supposed to be in recent kernel RPMs now?
This has been in the cifs-2.6.git development tree on kernel.org for quite a while and seems to test out ok. I plan to push a similar patch to mainline kernel in the next few days.
Status from needinfo -> assigned. (What is the name of the game being played in this report? Blaming Steve for his code and making fun of external people who can't compile a whole kernel privately? This game sucks.)
The name is "don't want to commit a patch no one ever tested, and do not have cifs here to test". If it sucks for you, sorry; evidently the bug is not important enough. It is likely to be fixed in SUSE 10.1 because the patch is already in cifs-2.6, so it is going to get to Linus and to us eventually.
What "sucks for me" are smug comments like #10, repeating me and saying that *you* don't have build power etc. and it sucks to rant on code of people who are in the cc of this bug. Right on if you think this is the right way to do your work and to treat other people.
Andreas/ Olaf: Could one of you please add the suggested fix to the current kernel tree to allow Bjoern some testing? Steve: Any objections?
In principle, I agree with Pavel - we cannot just commit a patch to CVS without making sure it actually fixes the problem; and we do not always have the time to build test kernels. That said, I triggered some mbuild jobs for test kernels including this patch:
kalman-okir-420 kernel-default: IN PROGRESS - 10.0-i386: not started yet
kalman-okir-421 kernel-smp: IN PROGRESS - 10.0-i386: not started yet
kalman-okir-422 kernel-default: IN PROGRESS - 10.0-x86_64: not started yet
kalman-okir-423 kernel-smp: IN PROGRESS - 10.0-x86_64: not started yet
I'll provide you with a download location once they're finished.
Kernels will show up in ftp://ftp.suse.de/private/okir/113379 within the next 30 minutes or so
thanks, suspend to disk works with that kernel and no new problems came up so far.
The patch is in the 10.0 branch now. Steve, can we expect to pick up this fix from upstream in SL10.1?
One last comment on the conversation tone in this bugzilla. Exchanges like this do not make me happy. I think it was not Pavel's intention to make fun of anyone by saying he doesn't have the time and resources to build test kernels. It is simply a matter of fact that Suse R&D does not always have the time to build test kernels for all customers; and that is particularly true for 10.0 because it was scheduled back to back with SLES9 service pack 3.
The bug doesn't seem to be fully fixed. Occasionally (in about one out of twenty suspend-to-disk attempts) it still comes up with the "cifsd not stopped" message.
Does it happen under high load or on an idle system? Anyway, I'm not able to debug cifs. Do we have someone who can work on that?
No, the system is quite idle. Running suspend to disk again directly after the failed try then worked fine, without any change in the system's state.
Someone with cifs experience needs to fix this. Or perhaps we can just leave it as-is: suspend *is* allowed to fail after all.
Lars, can you or someone from IBM look at this one? I mean, there are cifs updates every other day anyway...
Presumably this is due to a place in the cifsd thread where it is temporarily blocked uninterruptibly (on certain error conditions) instead of in the normal location (in which it would be waiting on a tcp socket read). I will try to reproduce it, but I will need to install Suse workstation 10 on something in which all of the hardware/drivers support suspend. It gets past cifs on my AthlonXP/64 motherboard with Suse workstation 10 but hits problems with another driver, so I will need to probe further on this or reinstall on a different test machine. I do agree with Pavel though: suspend is allowed to fail from time to time if the message is obvious enough to the user, no data is lost, and retry works.
I don't agree. I'd like to "fire and forget" when suspending, so it must not fail ;-) The problem is that the error is only in the logs, and there is no easy way to present the error to the user (I would not like to grep the logs for suspicious lines and present them in a popup box ;-) So if we can fix this in the future, this would really be nice.
Any idea where in the suspend code I could find the code that walks the list of blocked processes and retries on ones that are blocked in states that cannot be woken up? The cifsd thread only has a few cases in which it sleeps for a second or more (and those are out-of-memory cases when we cannot get memory from the pool for reading in the tcp response from the server).
Steve, See freeze_processes() in kernel/power/process.c
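As a rough illustration of the behaviour discussed in the comments above (an occasional "not stopped" report that goes away on retry), here is a toy model in Python. It is not the code in kernel/power/process.c and makes no claim about how freeze_processes() is written; it only assumes that the freezer asks every task to stop, waits a bounded amount of time, and reports whatever has not stopped by then. `Task`, `freeze_all`, `uninterruptible_ticks` and `max_ticks` are hypothetical names for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    # Ticks left in a sleep that ignores freeze requests (e.g. a
    # second-long back-off after a failed memory allocation).
    uninterruptible_ticks: int = 0
    frozen: bool = False

    def tick(self, freeze_requested):
        """Advance the task by one time step."""
        if self.uninterruptible_ticks > 0:
            self.uninterruptible_ticks -= 1   # still asleep, cannot react
        elif freeze_requested:
            self.frozen = True                # notices the request and stops


def freeze_all(tasks, max_ticks):
    """Ask all tasks to freeze; return the names of those not stopped in time."""
    for _ in range(max_ticks):
        for task in tasks:
            task.tick(freeze_requested=True)
        if all(task.frozen for task in tasks):
            return []
    return [task.name for task in tasks if not task.frozen]


tasks = [Task("bash"), Task("cifsd", uninterruptible_ticks=5)]
first_try = freeze_all(tasks, max_ticks=3)    # cifsd is still in its sleep
second_try = freeze_all(tasks, max_ticks=3)   # the sleep has ended by now
```

In this model the first attempt reports `cifsd` as not stopped and an immediate retry succeeds, which matches the symptom reported above. The real freezer also thaws tasks after a failed attempt, which the sketch leaves out for brevity.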
The last update is from 2005. Can you check whether this is fixed in openSUSE 10.3 and close the bug?
Reassigning to kernel-maintainers; this is not Samba.
Well -- it used to be samba -- cifsd not stopped. But I believe I had a patch, and it is probably even upstream, so this is probably long fixed (in opensuse-10.3). Bjoern, can you verify that?
This has been fixed and merged into mainline for over two years, and is in current SuSE releases. There was a useful, loosely related fix (which addressed the problem of suspend when the client was trying to reconnect after the connection to the server was lost) which went in a year later; it has been in mainline for over a year and is also in current SuSE releases. CIFS version 1.44 or later looks like it contains all suspend-related fixes. There are probably various other long-fixed CIFS bugzilla entries in the Novell bugzilla which could also be closed. Lars (or one of the SuSE kernel guys) and I should go through the list one by one to verify and close out some of the old, long-fixed ones.
Okay, so this one can be closed.