Bugzilla – Bug 106103
10.0 beta kernel deadlocking
Last modified: 2005-08-25 10:03:58 UTC
Over the weekend, basically every 10.0 beta machine (all x86_64) got stuck. I've been able to get another hang within 15 minutes on fatou; I'll attach the sysrq-t output.
Created attachment 46854 [details] serial console sysrq dump
Raising to blocker, according to aj.
Host "bessel" also froze, but it is a 32-bit CPU with hyperthreading.
Jens, reiserfs is waiting on the disk.
Rudi, could you please try mounting with barrier=none?
Lovely...
OK, added to the boot params on galerkin and rebooted. Also added to the params on fatou; I'll reboot it before I leave. Thomas will do the same on "bessel".
Did beta1 work fine, btw? I'd also like a full dmesg from both of these systems.
No, I've been having "machine gets stuck" problems with more recent kernels, but I blamed the nvidia driver at first, until I saw the first backtrace. "barrier=none" did not seem to work, however:

> dmesg | grep barrier
Bootdata ok (command line is root=/dev/sda2 vga=0x31a selinux=0 splash=silent resume=/dev/sda1 console=tty0 console=ttyS0,57600 splash=silent showopts barrier=none)
Kernel command line: root=/dev/sda2 vga=0x31a selinux=0 splash=silent resume=/dev/sda1 console=tty0 console=ttyS0,57600 splash=silent showopts barrier=none
reiserfs: using flush barriers
reiserfs: using flush barriers
Created attachment 47137 [details] dmesg of all four machines affected in our office
Rudi, you need to use barrier=none as a mount parameter!
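For anyone following along: the point above is that barrier=none has to appear in the filesystem's mount options (for example in the options field of /etc/fstab), not on the kernel command line. A minimal sketch for spotting reiserfs entries that still lack it is below. The function name and the idea of checking /etc/fstab are assumptions for illustration, not something taken from this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool mentioned in this bug):
# print the mount point of every reiserfs entry in an fstab-format file
# whose option field does not contain barrier=none.
reiserfs_missing_barrier_none() {
    fstab_file="${1:-/etc/fstab}"
    # fstab fields: device, mount point, fs type, options, dump, pass
    awk '$1 ~ /^#/ { next }
         $3 == "reiserfs" {
             n = split($4, opts, ",")
             found = 0
             for (i = 1; i <= n; i++)
                 if (opts[i] == "barrier=none") found = 1
             if (!found) print $2
         }' "$fstab_file"
}
```

Running `reiserfs_missing_barrier_none` with no argument reads /etc/fstab; an empty result means every reiserfs entry already carries the option. After remounting or rebooting, the "reiserfs: using flush barriers" line quoted above should presumably no longer show up in dmesg for those filesystems.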
Can you try barrier=none on one system, and on another machine run:

# echo 2 > /sys/block/sda/queue/iosched/max_depth

for sda and any other hard drive that is mounted with barriers enabled? I'd like to see if both of these settings will allow the machine to work.
galerkin now has:

echo 2 > /sys/block/hda/queue/iosched/max_depth

fatou will get the barrier=none mount options (on next reboot)
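If several disks need the max_depth workaround, the echo above can be looped. This is only a sketch: the function name and the optional sysfs root argument are hypothetical (the latter exists so the loop can be tried against a scratch directory), and it touches every block device that exposes max_depth rather than only the ones mounted with barriers, which is a simplification of what was asked for.

```shell
#!/bin/sh
# Hypothetical helper: write a depth value to iosched/max_depth for every
# block device that exposes that file. Needs root on a real system.
set_max_depth() {
    depth="$1"
    sysfs_root="${2:-/sys}"   # assumption: overridable only for dry runs
    for f in "$sysfs_root"/block/*/queue/iosched/max_depth; do
        # Skip devices whose I/O scheduler has no writable max_depth file
        # (this also covers the case where the glob matched nothing).
        [ -w "$f" ] || continue
        echo "$depth" > "$f"
        # Read the value back so the result is visible.
        echo "$f: $(cat "$f")"
    done
}
```

Calling `set_max_depth 2` as root would apply the same value used on galerkin to each such device. The setting is not persistent, so it would have to be reapplied after a reboot.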
*** Bug 105779 has been marked as a duplicate of this bug. ***
Rudi, setting back to NEEDINFO as the needed info hasn't been posted yet :)
galerkin and fatou are still running:

galerkin:~ # uptime
 11:44am up 2 days 17:44, 1 user, load average: 1.17, 1.05, 1.04
galerkin:~ # uname -a
Linux galerkin 2.6.13-rc6-git13-2-default #1 Sun Aug 21 18:48:53 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux

fatou:~ # uptime
 11:45am up 1 day 11:10, 7 users, load average: 1.54, 0.63, 0.48
fatou:~ # uname -a
Linux fatou 2.6.13-rc6-git12-2-smp #1 SMP Sun Aug 21 00:13:36 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux
Perfect, thanks for testing! The fix was checked in yesterday, closing.