Bug 137095 - Cifs modules freezes the system
Summary: Cifs modules freezes the system
Status: RESOLVED WONTFIX
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Network
Version: unspecified
Hardware: x86-64 SuSE Linux 10.0
Priority: P5 - None
Severity: Critical
Target Milestone: ---
Assignee: Steve French
QA Contact: Adrian Schröter
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-12-06 01:46 UTC by Tazio Ceri
Modified: 2007-02-01 13:14 UTC

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
hwinfo information (216.99 KB, text/plain)
2005-12-06 16:09 UTC, Tazio Ceri
Requested syslog (886 bytes, text/plain)
2005-12-06 16:21 UTC, Tazio Ceri
500 lines of /var/log/messages (40.30 KB, text/plain)
2005-12-06 16:22 UTC, Tazio Ceri

Description Tazio Ceri 2005-12-06 01:46:32 UTC
I apologize for not having followed up in Bugzilla; I was really busy at work.
Here is the original report:

I use the cifs module (instead of smbfs) to access Samba shares. If the
connection is fast, over 6 MByte/s on a 100 Mbit LAN, I sometimes get a total
freeze. It happens mainly when connecting to a Debian Stable PC; with other servers I usually get many stalls, but nothing else happens. When it freezes, the machine stops answering ping too.
The only solution is to reboot.
Booting with nmi_watchdog=1 does not change the situation.
How do I attach the hwinfo information?
Comment 1 Michael Gross 2005-12-06 14:24:36 UTC
Please execute `hwinfo > config.txt'. This file can be attached here with the attachment feature (see below; the direct link for this bug is https://bugzilla.novell.com/attachment.cgi?bugid=137095&action=enter). Please also attach 500 lines of your syslog in /var/log/messages using the same method. Thanks.
Comment 2 Michael Gross 2005-12-06 14:30:31 UTC
One note: please always choose the correct component and version. The openSUSE category is only for bugs in the wiki at opensuse.org. Changing component.
Comment 3 Tazio Ceri 2005-12-06 16:09:08 UTC
Created attachment 59926 [details]
hwinfo information
Comment 4 Tazio Ceri 2005-12-06 16:21:12 UTC
Created attachment 59928 [details]
Requested syslog

This is not 500 lines, but it covers the period from mounting the share to the crash.
Comment 5 Tazio Ceri 2005-12-06 16:22:43 UTC
Created attachment 59929 [details]
500 lines of /var/log/messages

These are my last 500 lines of /var/log/messages
Comment 6 Michael Gross 2005-12-07 13:40:07 UTC
Normally, if such a thing happens, the kernel oopses or at least outputs some useful information. Please check your syslog for such errors, or reproduce the problem and see what it logs. If there is a hang without any logged information (rather unlikely), we will have little chance of fixing this.
Comment 7 Michael Gross 2005-12-19 17:38:52 UTC
Sorry, but I have to close this bug until the required information can be provided. In that case, please reopen this report.
Comment 8 Per Öberg 2006-04-11 06:19:30 UTC
Note: this is my first bug report/edit ever. Please have patience with me if I don't do everything correctly at once.

I think I have had this problem too. The problem is not that my computer freezes; the problem is that the volume of log messages generated by the cifsd user-space program completely overwhelms my computer. Killing cifsd gives me back complete control of my computer, but leaves the mounts in an I/O-error type of state which seems unrecoverable.

"uname -a" says:
Linux mg 2.6.13-15.8-default #1 Tue Feb 7 11:07:24 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux

The log messages look normal up until messages like the following appear (these are the messages that flood the log, and they are basically all that I see):
----------------------
Mar 15 09:54:48 mg kernel:  CIFS VFS: No task to wake, unknown frame rcvd!
Mar 15 09:54:48 mg kernel: Received Data is: : dump of 37 bytes of data at 0xffff81002ff0fe80
Mar 15 09:54:48 mg kernel:
Mar 15 09:54:48 mg kernel:  3a000000 424d53ff 00000032 c0418000 . . . : ÿ S M B 2 . . . . . A À
Mar 15 09:54:48 mg kernel:  00000000 00000000 00000000 43600001 . . . . . . . . . . . . . . ` C
Mar 15 09:54:48 mg kernel:  06cd0064 0000020a d . Í . .
----------------------
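
As a reading aid, the dump above can be decoded by hand. The following is a minimal sketch in C, assuming the words were printed as 32-bit little-endian values on this x86_64 machine (the ASCII column suggests so) and assuming the standard SMB1 header layout after a 4-byte transport length prefix. The struct and function names are made up for illustration and are not taken from the cifs source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DUMP_LEN 37 /* "dump of 37 bytes" in the log line */

/* The ten 32-bit words exactly as printed in the log excerpt. */
static const uint32_t dump_words[10] = {
    0x3a000000, 0x424d53ff, 0x00000032, 0xc0418000,
    0x00000000, 0x00000000, 0x00000000, 0x43600001,
    0x06cd0064, 0x0000020a,
};

/* Hypothetical summary struct holding the fields of interest. */
struct smb_summary {
    uint32_t payload_len; /* length from the 4-byte transport prefix */
    uint8_t  command;
    uint32_t status;
    uint8_t  flags;
    uint16_t flags2;
    uint16_t tid, pid, uid, mid;
    uint8_t  word_count;
};

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Undo the word-wise printing: the lowest byte of each word comes first. */
static void words_to_bytes(const uint32_t *words, size_t nbytes, uint8_t *out)
{
    for (size_t i = 0; i < nbytes; i++)
        out[i] = (uint8_t)(words[i / 4] >> (8 * (i % 4)));
}

/* Parse the transport prefix and the fixed 32-byte SMB1 header. */
static int parse_frame(const uint8_t *buf, size_t len, struct smb_summary *s)
{
    static const uint8_t magic[4] = { 0xff, 'S', 'M', 'B' };
    const uint8_t *smb = buf + 4;

    if (len < DUMP_LEN || memcmp(smb, magic, sizeof(magic)) != 0)
        return -1;
    s->payload_len = ((uint32_t)buf[1] << 16) | ((uint32_t)buf[2] << 8) | buf[3];
    s->command    = smb[4];
    s->status     = le32(smb + 5);
    s->flags      = smb[9];
    s->flags2     = le16(smb + 10);
    s->tid        = le16(smb + 24);
    s->pid        = le16(smb + 26);
    s->uid        = le16(smb + 28);
    s->mid        = le16(smb + 30);
    s->word_count = smb[32];
    return 0;
}

/* Convenience wrapper: decode the frame from the log excerpt. */
static int decode_logged_frame(struct smb_summary *s)
{
    uint8_t bytes[DUMP_LEN];

    words_to_bytes(dump_words, sizeof(bytes), bytes);
    return parse_frame(bytes, sizeof(bytes), s);
}

#ifdef SMB_DUMP_DEMO /* build with -DSMB_DUMP_DEMO to print the fields */
int main(void)
{
    struct smb_summary s;

    if (decode_logged_frame(&s) != 0)
        return 1;
    printf("len=%u cmd=0x%02x status=0x%08x flags=0x%02x flags2=0x%04x\n",
           (unsigned)s.payload_len, (unsigned)s.command, (unsigned)s.status,
           (unsigned)s.flags, (unsigned)s.flags2);
    printf("tid=0x%04x pid=0x%04x uid=0x%04x mid=0x%04x wct=%u\n",
           (unsigned)s.tid, (unsigned)s.pid, (unsigned)s.uid,
           (unsigned)s.mid, (unsigned)s.word_count);
    return 0;
}
#endif
```

Under these assumptions the frame reads as a response (flags 0x80) to command 0x32 (Transaction2) with status 0 and MID 0x06cd. "No task to wake" would then presumably mean that no pending request with that MID was waiting when the reply arrived.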

The bug is generally triggered when using Emacs/AUCTeX with preview and source specials. The operations involved in saving and compiling touch about 10 files at once, and this is when the bug shows up.

Since I have a solution, I thought I'd stop here. The solution for me was to download the latest kernel and make a diff of the linux-xxx/fs/cifs directory. Since the functions called by file.c, connect.c and inode.c have changed somewhat since the kernel I use (2.6.13-15.8-default), I had to edit them slightly:
1) Change kzalloc( to kcalloc(1, in connect.c
2) Change filemap_write_and_wait to filemap_fdatawrite followed by filemap_fdatawait in inode.c and file.c
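
The two substitutions above can be illustrated with a small user-space mock-up in C. This is only a sketch, not kernel code: struct address_space here is a stand-in with invented fields, the two filemap_* functions are stubs, calloc stands in for the kernel allocator, and the compat_* names are hypothetical. It assumes the newer helper amounts to "start writeback, then wait if that succeeded"; the real kernel helper may differ in detail.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the kernel structure; the fields exist only for this mock. */
struct address_space {
    int write_started; /* set once writeback has been started */
    int write_waited;  /* set once we have waited for writeback */
    int fail_write;    /* knob: make the write step fail */
};

/* Stub for the older primitive that starts writeback. */
static int filemap_fdatawrite(struct address_space *mapping)
{
    if (mapping->fail_write)
        return -5; /* arbitrary negative error code for the mock */
    mapping->write_started = 1;
    return 0;
}

/* Stub for the older primitive that waits for writeback to finish. */
static int filemap_fdatawait(struct address_space *mapping)
{
    mapping->write_waited = 1;
    return 0;
}

/*
 * Item 2: replace one call to the newer helper with the two older calls,
 * waiting only if starting the writeback succeeded.
 */
static int compat_write_and_wait(struct address_space *mapping)
{
    int rc = filemap_fdatawrite(mapping);

    if (rc == 0)
        rc = filemap_fdatawait(mapping);
    return rc;
}

/*
 * Item 1: a zeroing allocation of one object of `size` bytes, written as a
 * one-element calloc. In the kernel this corresponds to using
 * kcalloc(1, size, flags) where newer code calls kzalloc(size, flags).
 */
static void *compat_zalloc(size_t size)
{
    return calloc(1, size);
}

#ifdef COMPAT_DEMO /* build with -DCOMPAT_DEMO to run a quick self-check */
int main(void)
{
    struct address_space mapping = { 0, 0, 0 };
    unsigned char *buf = compat_zalloc(8);

    assert(compat_write_and_wait(&mapping) == 0);
    assert(buf != NULL && buf[0] == 0);
    free(buf);
    return 0;
}
#endif
```

Wrapping each substitution in a small helper would keep the back-ported file.c, connect.c and inode.c closer to the upstream text, which may make the diff easier to redo after a kernel update.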

This solves the problem I had. The operations that triggered the bug now seem to generate a single log message, exactly the same as shown above, and then recover. However, when the bug is triggered, the save-and-compile does not complete its operation, so some of the index files are left empty. This is a minor problem, though.

I guess what I'm asking is that you upgrade the cifs part of the SUSE kernel to the content of the latest kernel release, so that I won't have to do this each time the kernel is updated.

Comment 9 Michael Gross 2006-04-11 12:13:37 UTC
Reassigning to the kernel maintainers.
Comment 10 Lars Marowsky-Bree 2006-04-11 15:31:01 UTC
Jeremy, can you help?
Comment 11 Olaf Kirch 2006-04-11 18:23:09 UTC
For CIFS problems, it's usually best to involve Steve French directly.
Comment 12 Chris L Mason 2006-04-11 18:36:51 UTC
Whoops, mid-air collision with Olaf.  Steve will need to see this reproduced on a 10.1 kernel.
Comment 13 Per Öberg 2006-04-12 05:46:16 UTC
So what you are telling me is that I should try to compile the 10.1 kernel on my 10.0 system to check whether the behaviour is also unacceptable in 10.1?

Comment 14 Andreas Jaeger 2007-02-01 13:14:47 UTC
Still in NEEDINFO after 7 months. Marking as WONTFIX.