Bug 113707

Summary: Samba server locks machine up after large data transfer
Product: [openSUSE] SUSE LINUX 10.0 Reporter: Phil Stopford <phil>
Component: Basesystem Assignee: The 'Opening Windows to a Wider World' guys <samba-maintainers>
Status: RESOLVED INVALID QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None    
Version: Beta 2   
Target Milestone: ---   
Hardware: i686   
OS: SuSE Pro 9.3   
Whiteboard:
Found By: Other Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description Phil Stopford 2005-08-29 00:04:06 UTC
Whether the transfer is a single large file or many small files, Samba will
eventually lock the server up hard. You cannot log in via SSH or otherwise - the
only recovery is to pull the power cord. This affects SuSE 9.1, 9.2 and 9.3, and
persists after upgrading SuSE 9.3 Pro to Samba 3.0.20rc1.

I have not yet been able to test SuSE 10.0, but will in the next couple of days.
Comment 1 Jeremy Allison 2005-08-29 16:19:08 UTC
This is almost certainly not a Samba problem. We are a *userspace* application.
The machine locking up like that sounds like a hardware problem to me. At the
very worst it would be a kernel bug. I'd suggest downloading and running
memtest86 on this box. See here:

http://www.memtest86.com/

for details.

Jeremy.
Comment 2 Phil Stopford 2005-08-30 21:05:06 UTC
This has occurred on multiple boxes - desktops, laptops, etc. and across both
wireless and wired networks. The same hardware running Windows doesn't show an
issue. The issue is also client independent: Windows, Mac or SuSE. Given these
factors, I'm not at all convinced that a hardware fault is to blame.

Logs don't show anything obvious - the transfer just stalls and the server fails
to respond to any input made locally or otherwise.
Comment 3 Jeremy Allison 2005-08-30 21:26:38 UTC
Ok, then it's a kernel bug - there is *NO WAY* an smbd server as a user
application can lock up the box so you have to reboot. But you're going to have
to do a lot more triage to even show evidence of a kernel bug given the
vagueness of your report.
Jeremy.
Comment 4 Phil Stopford 2005-08-30 21:38:31 UTC
If I had a starting point, I'd possibly be able to do something. With no obvious
information in the logs, no error on screen and no information to hand to help
me make a start (or even get a remote login set up), as a regular user I'm left
to make 'vague reports'. I can only do what I can do *shrug*
Comment 5 Lars Müller 2005-09-06 17:11:06 UTC
Phil: Can you attach a null modem cable to the server, configure a serial console
and provide a mem and task dump of the machine when the problem occurs again?

I suggest using screen to attach to the serial console, as it makes it possible
to grab all data (ctrl+a + ctrl+h).

Also enable sysrq (in /etc/sysctl.conf, or set
/etc/sysconfig/sysctl:ENABLE_SYSRQ="yes").

If you have this and have

serial --unit=0 --speed=38400 --word=8 --parity=no --stop=1
terminal --timeout=5 serial console

kernel ... console=tty0 console=ttyS0,38400

in /boot/grub/menu.lst of grub, then it's possible to send sysrqs via the serial
console (e.g. sync ctrl+a + ctrl+s + s).
Comment 6 Lars Müller 2005-09-06 17:16:41 UTC
screen example command line missing:

screen /dev/ttyS0 38400,cs8
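
Before relying on sysrq over the serial line, it may help to confirm that the
running kernel was really booted with the console=ttyS0,... argument from the
menu.lst snippet above. The following is only a minimal sketch: the function
name has_serial_console is hypothetical, and it assumes the kernel command
line is available as a plain string (for example from /proc/cmdline).

```shell
#!/bin/sh
# has_serial_console CMDLINE
# Hypothetical helper, not part of any SUSE or Samba tooling.
# Succeeds (exit status 0) if CMDLINE contains an argument of the form
# console=ttyS<digit>..., and fails (exit status 1) otherwise.
has_serial_console() {
    # Pad with spaces so that the first and last arguments match as well.
    case " $1 " in
        *" console=ttyS"[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On the affected server, has_serial_console "$(cat /proc/cmdline)" should
succeed if the GRUB entry took effect. If it fails, the kernel is not writing
to the serial port at all, which would be worth ruling out before debugging
sysrq itself.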
Comment 7 Phil Stopford 2005-09-07 00:14:51 UTC
I'll give it a whirl, but it will very likely be Monday before I get time. The
problem surfaces after a couple of large transfers so the only tricky thing
would be getting a serial console set up, but hopefully your instructions will
help me out there.

The tightly compressed beta schedule of SuSE makes it very tricky (for me at
least) to get time to look at these things before another release appears, so I
cannot predict whether it will be beta 4 or RC1 that gets the testing.
Comment 8 Phil Stopford 2005-09-12 19:35:51 UTC
I have now been shifting data via samba on 10.0beta4 for the full day.
Conservatively, between 40 and 50 GB of data has been moved, consisting of
several large ~2 GB files and lots of smaller files. No problems have been seen.
Clients have been Windows, SUSE 9.3 and SUSE 10.0, with a single Windows client
pulling a full 15 GB of data without issue.

9.3 does continue to show problems, but I'm uncertain how to proceed - the sysrq
instructions above don't seem to dump data to the serial console and the serial
console does not seem to pass/take input from the keyboard at that console.
There does appear to be information in /var/log/messages, however that might not
be usable/relevant. Advice would be welcome, assuming 9.3 is a maintenance
target for SuSE.

I am not changing status of this bug as 9.3 is affected, but won't complain if
this is changed. :)
Comment 9 Lars Müller 2005-09-19 11:11:07 UTC
lmuelle@gab:~> cat /proc/sys/kernel/sysrq 
1

If you get a '0', please run

echo "1">/proc/sys/kernel/sysrq

as user root.  See /usr/src/linux/Documentation/sysrq.txt for more details on
sysrq handling.  Please check if Alt+Sysrq+s results in syslog messages.
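
The check above can be wrapped in a small script so it is easy to repeat
after each reboot. This is a rough sketch only: the function name
sysrq_status is hypothetical, and the treatment of values other than 0 and 1
as a bitmask of permitted functions is an assumption that holds for newer
kernels and may not apply to the kernel shipped with 9.3.

```shell
#!/bin/sh
# sysrq_status [FILE]
# Hypothetical helper: prints how the sysrq setting stored in FILE is
# interpreted. FILE defaults to /proc/sys/kernel/sysrq.
sysrq_status() {
    file="${1:-/proc/sys/kernel/sysrq}"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    value=$(cat "$file")
    case "$value" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        # Empty or non-numeric content: do not guess.
        ''|*[!0-9]*) echo "unknown"; return 1 ;;
        # Assumption: on newer kernels, values above 1 are a bitmask
        # that allows only some sysrq functions.
        *) echo "restricted" ;;
    esac
}
```

Called without an argument, sysrq_status reads the live setting. If it prints
"disabled", the echo command shown above (run as root) should switch sysrq on
until the next reboot.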

Comment 10 Lars Müller 2006-03-15 13:41:45 UTC
No additional information has been provided for quite some time.