Bug 113379 - suspend to disk complains about cifsd not stopped
Status: RESOLVED FIXED
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Kernel
Version: Beta 3
Hardware: Other All
Priority: P5 - None, Severity: Normal
Target Milestone: ---
Assignee: E-mail List
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-08-26 17:09 UTC by Bjoern Jacke
Modified: 2007-11-15 19:25 UTC
CC List: 2 users

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
should help... (328 bytes, patch)
2005-08-29 10:11 UTC, Pavel Machek

Description Bjoern Jacke 2005-08-26 17:09:44 UTC
If you have a CIFS share mounted (mount -t cifs //server/bla /mnt), suspend to
disk complains:

Restarting tasks... Strange, cifsd not stopped

Unmounting the CIFS share does not help either; only removing the kernel module
makes suspend work again.
Comment 1 Bjoern Jacke 2005-08-27 00:53:51 UTC
Steve, I guess it's okay for you to paste your mail here for the others:

Steve French wrote:
Is suspend enabled in SLES9? Is this (in SLES10) swsusp2? I don't have a
SLES10 system installed nearby (I will check around some more).

For older distros - I had trouble working through the userspace tools
needed to do this (ie suspend) with the stock kernel 2.6.13-rc6.

My guess is that cifsd is either blocking signals that the new suspend utils
want to send to it, or that the network stack cannot suspend while a tcp peek
is pending (cifsd is the demultiplexing thread, waiting for responses from the
server, so it is almost always blocked in tcp_peek).
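
Steve's second hypothesis, a demultiplexing thread that sits in a blocking peek and never notices a freeze request, can be illustrated with a small userspace model. The sketch below is Python rather than kernel C, and the names `poll_once` and `freeze_requested` are hypothetical. It only shows the general idea of bounding the wait so the loop can check for a freeze request between peeks; it is not the attached patch.

```python
import socket


def poll_once(sock, freeze_requested, timeout=0.05):
    """Run one iteration of a toy demultiplexing loop (model only).

    Returns "frozen" if a freeze was requested, "data" if bytes are
    waiting on the socket, "closed" if the peer closed the connection,
    and "idle" if the bounded wait expired with nothing to read.
    """
    # Check the freeze request first, so a pending suspend is never
    # starved by a long wait on the network.
    if freeze_requested:
        return "frozen"

    # Bound the wait instead of blocking forever in the peek.
    sock.settimeout(timeout)
    try:
        # MSG_PEEK looks at pending bytes without consuming them.
        header = sock.recv(4, socket.MSG_PEEK)
    except socket.timeout:
        return "idle"
    return "data" if header else "closed"
```

A caller would invoke `poll_once` in a loop and stop reading as soon as it returns "frozen". The design choice is the bounded wait: a thread that only ever blocks indefinitely has no point at which it can notice that suspend wants it to stop.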
Comment 2 Pavel Machek 2005-08-29 10:10:17 UTC
We don't use suspend2.

cifs is pretty ugly code, and it seems it has no suspend support. Oops. Try
the attached patch.
Comment 3 Pavel Machek 2005-08-29 10:11:30 UTC
Created attachment 47950
should help...
Comment 4 Bjoern Jacke 2005-09-04 01:26:52 UTC
is this supposed to be in recent kernel RPMs now?
Comment 5 Forgotten User ZhJd0F0L3x 2005-09-04 11:35:35 UTC
Probably not without any feedback on whether it helps... ;-)
Comment 6 Bjoern Jacke 2005-09-05 08:10:40 UTC
Well, I actually don't have the time and computing resources to patch and
compile a kernel myself right now ;-)
Comment 7 Pavel Machek 2005-09-06 21:05:51 UTC
I'd really like to know if the patch above helps...
Comment 8 Lars Müller 2005-09-06 23:14:36 UTC
Then add it to our kernel package and let Björn test it as soon as we have
prebuilt packages. He doesn't have the build power and time to create his own
kernel.
Comment 9 Lars Müller 2005-09-19 11:11:55 UTC
Is the patch in our KOTD?
Comment 10 Pavel Machek 2005-09-19 14:13:59 UTC
No, it is not. I do not have the build power and time to create a SUSE kernel :-(.
Comment 11 Steve French 2005-09-19 15:33:07 UTC
(In reply to comment #4)
> is this supposed to be in recent kernel RPMs now?

This has been in the cifs-2.6.git development tree on kernel.org for quite a
while and seems to test out ok.  I plan to push a similar patch to mainline
kernel in the next few days.
Comment 12 Bjoern Jacke 2005-10-11 06:37:00 UTC
status from needinfo -> assigned (what is the name of the game which is played
in this report? Blaming Steve for his code and making fun of external people who
can't compile a whole kernel privately? This game sucks.)
Comment 13 Pavel Machek 2005-10-11 08:17:34 UTC
The name is "don't want to commit a patch no one ever tested, and do not have
cifs here to test". If it sucks for you, sorry, evidently the bug is not
important enough.

It is likely to be fixed in SUSE 10.1 because the patch is already in cifs-2.6,
so it is going to get to Linus and to us eventually.
Comment 14 Bjoern Jacke 2005-10-11 08:36:47 UTC
What "sucks for me" are smug comments like #10, repeating me and saying that
*you* don't have build power etc., and it sucks to rant about the code of people
who are on the CC of this bug. Right on if you think this is the right way to do
your work and to treat other people.
Comment 15 Lars Müller 2005-10-11 10:53:29 UTC
Andreas/ Olaf: Could one of you please add the suggested fix to the current
kernel tree to allow Bjoern some testing?

Steve: Any objections?
Comment 16 Olaf Kirch 2005-10-11 11:17:47 UTC
In principle, I agree with Pavel - we cannot just commit a patch to  
CVS without making sure it actually fixes the problem; and we do not  
always have the time to build test kernels.  
  
That said, I triggered some mbuild jobs for test kernels including  
this patch:  
  
kalman-okir-420 kernel-default: IN PROGRESS  
 - 10.0-i386: not started yet  
kalman-okir-421 kernel-smp: IN PROGRESS  
 - 10.0-i386: not started yet  
kalman-okir-422 kernel-default: IN PROGRESS  
 - 10.0-x86_64: not started yet  
kalman-okir-423 kernel-smp: IN PROGRESS  
 - 10.0-x86_64: not started yet 
 
I'll provide you with a download location once they're finished. 
Comment 17 Olaf Kirch 2005-10-11 13:17:35 UTC
Kernels will show up in ftp://ftp.suse.de/private/okir/113379 within the 
next 30 minutes or so 
Comment 18 Bjoern Jacke 2005-10-12 08:43:43 UTC
thanks, suspend to disk works with that kernel and no new problems came up so far.
Comment 19 Olaf Kirch 2005-10-12 08:54:34 UTC
The patch is in the 10.0 branch now. Steve, can we expect to
pick up this fix from upstream in SL10.1?
Comment 20 Olaf Kirch 2005-10-12 08:59:36 UTC
One last comment on the conversation tone in this bugzilla.
Exchanges like this do not make me happy. 

I think it was not Pavel's intention to make fun of anyone by saying he
doesn't have the time and resources to build test kernels. It is simply
a matter of fact that Suse R&D does not always have the time to build
test kernels for all customers; and that is particularly true for 10.0
because it was scheduled back to back with SLES9 service pack 3.
Comment 21 Bjoern Jacke 2005-10-20 16:19:43 UTC
The bug doesn't seem to be fully fixed. Occasionally (in about one out of twenty suspend-to-disk attempts) it still comes up with the "cifsd not stopped" message.
Comment 22 Pavel Machek 2005-10-21 11:12:44 UTC
Is it under high load or on an unloaded system?

Anyway, I'm not able to debug cifs. Do we have someone who can work on that?
Comment 23 Bjoern Jacke 2005-10-21 19:18:11 UTC
No, the system is quite idle. Running suspend to disk again directly after the failed attempt then worked fine, without any change to the system's state.
Comment 24 Pavel Machek 2005-10-22 18:25:39 UTC
Someone with cifs experience needs to fix this. Or perhaps we can just leave it as-is: suspend *is* allowed to fail after all.
Comment 25 Hubert Mantel 2005-10-24 08:58:24 UTC
Lars, can you or someone from IBM look at this one? I mean, there are cifs updates every other day anyway...
Comment 26 Steve French 2005-10-24 17:17:38 UTC
Presumably this is due to a place in the cifsd thread where it is temporarily blocked uninterruptibly (on certain error conditions) instead of in the normal location (where it would be waiting on a tcp socket read). I will try to reproduce it, but I will need to install Suse workstation 10 on a machine on which all of the hardware and drivers support suspend. Suspend gets past cifs on my AthlonXP/64 motherboard with Suse workstation 10 but hits problems with another driver, so I will need to probe further or reinstall on a different test machine.

I do agree with Pavel though - suspend is allowed to fail from time to time if the message is obvious enough to the user, no data is lost, and retry works.
Comment 27 Forgotten User ZhJd0F0L3x 2005-10-24 17:35:17 UTC
I don't agree, I'd like to "fire and forget" when suspending, so it must not fail ;-)

The problem is that the error is only in the logs, and there is no easy way to present the error to the user (I would not like to grep the logs for suspicious lines and present them in a popup box ;-)
So if we can fix this in the future, this would really be nice.
Comment 28 Steve French 2005-10-24 18:24:54 UTC
Any idea where in the suspend code I could find the code that walks the list of blocked processes and retries on ones that are blocked in states that cannot be woken up? The cifsd thread only has a few cases in which it sleeps for a second or more (and those are out-of-memory cases, when we cannot get memory from the pool for reading in the tcp response from the server).
Comment 29 Dave Kleikamp 2005-10-24 21:36:41 UTC
Steve,
See freeze_processes() in kernel/power/process.c
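
As a rough mental model of the retry behaviour Steve asks about, the sketch below simulates a freezer that keeps asking tasks to stop until they all have or a deadline passes. It is a Python toy, not the code in kernel/power/process.c; `Task`, `busy_ticks`, and `freeze_all` are hypothetical names, and the tick counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    busy_ticks: int = 0   # ticks left in an uninterruptible sleep (model only)
    frozen: bool = False

    def tick(self, freeze_requested):
        """Advance one tick; freeze only when not in uninterruptible sleep."""
        if self.busy_ticks > 0:
            self.busy_ticks -= 1      # cannot react to the request yet
        elif freeze_requested:
            self.frozen = True


def freeze_all(tasks, max_ticks):
    """Ask every task to freeze, retrying each tick until the deadline.

    Returns the names of tasks that were still running when the
    deadline passed (an empty list means the freeze succeeded).
    """
    for _ in range(max_ticks):
        for task in tasks:
            if not task.frozen:
                task.tick(freeze_requested=True)
        if all(task.frozen for task in tasks):
            return []
    return [task.name for task in tasks if not task.frozen]
```

In this model a task that sleeps uninterruptibly for longer than the deadline makes one attempt fail, while an immediate retry succeeds once the sleep has ended. That is consistent with the behaviour reported in comments 21 and 23, although the model says nothing about where cifsd actually sleeps.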
Comment 30 Pavel Machek 2007-11-15 17:32:25 UTC
The last update is from 2005. Can you check whether this is fixed in openSUSE 10.3 and close the bug?
Comment 31 Lars Müller 2007-11-15 18:11:13 UTC
Reassigning to kernel-maintainers; this is not Samba.
Comment 32 Pavel Machek 2007-11-15 18:14:19 UTC
Well -- it used to be samba -- cifsd not stopped. But I believe I had a patch, and it is probably even upstream, so this is probably long fixed (in opensuse-10.3). Bjoern, can you verify that?
Comment 33 Steve French 2007-11-15 19:23:21 UTC
This has been fixed and merged into mainline for over two years, and is in current SuSE releases. A useful, loosely related fix (which addressed the problem of suspend when the client was trying to reconnect after the connection to the server was lost) went in a year later; it has been in mainline for over a year and is also in current SuSE releases. CIFS version 1.44 or later looks like it contains all suspend-related fixes.

There are probably various other long fixed CIFS bugzilla entries in the Novell bugzilla which could also be closed.  Lars (or one of the SuSE kernel guys) and I should go through the list one by one to verify and close out some of the old, long fixed ones.
Comment 34 Pavel Machek 2007-11-15 19:25:50 UTC
Okay, so this one can be closed.