Bug 106103 - 10.0 beta kernel deadlocking
Summary: 10.0 beta kernel deadlocking
Status: RESOLVED FIXED
Duplicates: 105779
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Kernel
Version: Beta 2
Hardware: Other All
Priority: P5 - None  Severity: Blocker
Target Milestone: ---
Assignee: Jens Axboe
QA Contact: E-mail List
 
Reported: 2005-08-22 10:01 UTC by Ruediger Oertel
Modified: 2005-08-25 10:03 UTC (History)

Found By: Other


Attachments
serial console sysrq dump (138.09 KB, text/plain)
2005-08-22 10:02 UTC, Ruediger Oertel
dmesg of all four machines affected in our office (15.95 KB, text/plain)
2005-08-23 10:03 UTC, Ruediger Oertel

Description Ruediger Oertel 2005-08-22 10:01:27 UTC
Over the weekend, basically every 10.0 beta machine (all x86_64) 
got stuck. I was able to reproduce another hang within 15 minutes on fatou; 
I'll attach the sysrq-t output.
Comment 1 Ruediger Oertel 2005-08-22 10:02:15 UTC
Created attachment 46854
serial console sysrq dump
Comment 2 Ruediger Oertel 2005-08-22 10:02:43 UTC
Raised severity to blocker, according to aj.
Comment 3 Thomas Schmidt 2005-08-22 11:19:47 UTC
Host "bessel" also froze, but it has a 32-bit CPU with hyperthreading.
Comment 4 Chris L Mason 2005-08-22 15:54:19 UTC
Jens, reiserfs is waiting on the disk. 
Comment 5 Chris L Mason 2005-08-22 15:55:38 UTC
Rudi, could you please try mounting barrier=none? 
Comment 6 Jens Axboe 2005-08-22 15:59:49 UTC
Lovely...
Comment 7 Ruediger Oertel 2005-08-22 16:01:06 UTC
ok, added to params on galerkin and rebooted. 
added to params on fatou, I'll reboot before I leave. 
 
Thomas will do the same on "bessel". 
 
Comment 8 Jens Axboe 2005-08-22 18:29:40 UTC
Did beta1 work fine, btw? I'd also like a full dmesg from both of these systems.
Comment 9 Ruediger Oertel 2005-08-23 10:00:58 UTC
No, I've been having "machine gets stuck" problems with the more recent 
kernels, but at first I blamed the nvidia driver, until I saw the first 
backtrace. 
 
"barrier=none" did not seem to work, however: 
> dmesg | grep barrier 
Bootdata ok (command line is root=/dev/sda2 vga=0x31a selinux=0  splash=silent 
resume=/dev/sda1  console=tty0 console=ttyS0,57600 splash=silent showopts 
barrier=none) 
Kernel command line: root=/dev/sda2 vga=0x31a selinux=0  splash=silent 
resume=/dev/sda1  console=tty0 console=ttyS0,57600 splash=silent showopts 
barrier=none 
reiserfs: using flush barriers 
reiserfs: using flush barriers 
 
Comment 10 Ruediger Oertel 2005-08-23 10:03:35 UTC
Created attachment 47137
dmesg of all four machines affected in our office
Comment 11 Jens Axboe 2005-08-23 10:11:06 UTC
Rudi, you need to use barrier=none as a mount parameter!
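
To illustrate the distinction: a mount parameter lives in the options field of the mount (the fourth column of an fstab-style line, or the argument to mount -o), not on the kernel command line. The following is only a minimal sketch; the helper name has_barrier_none and the idea of checking an fstab-format file are assumptions made for illustration, not something posted in this bug. Whether a running kernel also lists the option in /proc/mounts depends on the filesystem and kernel version, so the sketch takes the file to inspect as an argument.

```shell
#!/bin/sh
# Hypothetical helper (not from this bug report): succeed if the entry for
# the given mount point in an fstab-format file lists barrier=none among
# its mount options (fourth whitespace-separated field).
has_barrier_none() {
    mountpoint=$1
    mounts_file=${2:-/etc/fstab}
    awk -v mp="$mountpoint" '
        $2 == mp {
            # Split the comma-separated option list and look for an exact match.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "barrier=none") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$mounts_file"
}
```

For example, "has_barrier_none / /etc/fstab && echo ok" would print ok only if the root entry carries the option.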
Comment 12 Jens Axboe 2005-08-23 10:13:21 UTC
Can you try barrier=none on one system, and on another machine do:

# echo 2 > /sys/block/sda/queue/iosched/max_depth

for sda and any other hard drive that is mounted with barriers enabled?
I'd like to see whether each of these settings allows the machine to work.
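
Applying that setting to several drives can be sketched as a small loop. This is a hedged illustration only: the function name set_max_depth and the SYSFS_ROOT override (handy for dry runs against a scratch directory) are assumptions, and the max_depth tunable exists only when the active I/O scheduler exposes it, so devices without it are skipped.

```shell
#!/bin/sh
# Hypothetical helper (not from this bug report): write a queue depth to
# queue/iosched/max_depth for each named block device.
# Usage: set_max_depth DEPTH DEVICE...
set_max_depth() {
    depth=$1
    shift
    sysfs_root=${SYSFS_ROOT:-/sys}
    for dev in "$@"; do
        knob="$sysfs_root/block/$dev/queue/iosched/max_depth"
        if [ -w "$knob" ]; then
            echo "$depth" > "$knob"
            echo "$dev: max_depth set to $depth"
        else
            # Tunable missing or not writable (other scheduler, not root).
            echo "$dev: no writable max_depth tunable, skipped" >&2
        fi
    done
}
```

Run as root, "set_max_depth 2 sda hda" would attempt the write for both devices. The value is not persistent and has to be set again after a reboot.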
Comment 13 Ruediger Oertel 2005-08-23 22:33:09 UTC
galerkin now has: echo 2 > /sys/block/hda/queue/iosched/max_depth 
 
fatou will get the barrier=none mount options (on next reboot) 
 
  
Comment 14 Jens Axboe 2005-08-24 10:50:37 UTC
*** Bug 105779 has been marked as a duplicate of this bug. ***
Comment 15 Jens Axboe 2005-08-24 10:51:15 UTC
Rudi, setting back to NEEDINFO as the needed info hasn't been posted yet :)
Comment 16 Ruediger Oertel 2005-08-25 09:45:51 UTC
galerkin and fatou are still running: 
galerkin:~ # uptime 
 11:44am  up 2 days 17:44,  1 user,  load average: 1.17, 1.05, 1.04 
galerkin:~ # uname -a 
Linux galerkin 2.6.13-rc6-git13-2-default #1 Sun Aug 21 18:48:53 UTC 2005 
x86_64 x86_64 x86_64 GNU/Linux 
fatou:~ # uptime 
 11:45am  up 1 day 11:10,  7 users,  load average: 1.54, 0.63, 0.48 
fatou:~ # uname -a 
Linux fatou 2.6.13-rc6-git12-2-smp #1 SMP Sun Aug 21 00:13:36 UTC 2005 x86_64 
x86_64 x86_64 GNU/Linux 
 
Comment 17 Jens Axboe 2005-08-25 10:03:58 UTC
Perfect, thanks for testing! The fix was checked in yesterday, closing.