Bug 159435 - attempt to mount software raid1 device hangs
Summary: attempt to mount software raid1 device hangs
Status: RESOLVED FIXED
Duplicates: 158732 159818 159828 160068 160135 160938 161246 162712 162727
Alias: None
Product: SUSE Linux 10.1
Classification: openSUSE
Component: Kernel
Version: Beta 8
Hardware: x86-64 Other
Priority: P5 - None    Severity: Critical
Target Milestone: ---
Assignee: Neil Brown
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-03-20 08:20 UTC by Scott Burson
Modified: 2006-04-10 23:22 UTC
CC: 5 users

See Also:
Found By: Beta-Customer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
patch to fix problem (483 bytes, patch)
2006-03-23 03:35 UTC, Neil Brown

Description Scott Burson 2006-03-20 08:20:48 UTC
This is in a fresh install of Beta8 on x86_64 SMP.  I have a partition on a software raid1 pair on two SATAII drives (Silicon Image 3114 controller chip, built into Tyan S2882-D motherboard).  The raid1 device appears to be configured correctly on boot, and `/proc/mdstat' looks normal, but attempting to mount the device hangs -- on one occasion the `mount' process hung in the driver; I rebooted and tried again, and this time the whole kernel hung.  `reiserfsck' on the partition reports no problems.
Comment 1 Chris L Mason 2006-03-20 15:24:22 UTC
Please collect the output from sysrq-t and sysrq-p over a serial or network console.

Neil, please take a look.
Comment 2 Neil Brown 2006-03-21 09:14:36 UTC
I think I know this one.  The sysrq-p will show reiserfs in journal_read (among other things).
There is a bug with the RW_BARRIER handling in raid1 such that if the first write is a BARRIER write, raid1 doesn't cope.  Unfortunately it seems that reiserfs typically does a barrier write as its first write (marking the journal clean or something like that).

I'm not yet sure exactly what the bug in raid1 is that is causing this, so I don't have a fix yet, but you can expect one in a day or so.
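[A possible stop-gap while waiting for a fixed kernel, assuming the diagnosis above is right and the kernel's reiserfs build supports the `barrier=` mount option: mount with barriers disabled so that the first journal write is no longer a BARRIER. This weakens power-loss safety and should be reverted once the fix lands; the device and mount point below are hypothetical examples, not taken from this report.]

```
# hypothetical /etc/fstab line; /dev/md0 and /data are placeholder examples
/dev/md0  /data  reiserfs  defaults,barrier=none  1 2
```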
Comment 3 Chris L Mason 2006-03-21 18:10:22 UTC
*** Bug 159818 has been marked as a duplicate of this bug. ***
Comment 4 Chris L Mason 2006-03-21 18:50:30 UTC
*** Bug 159828 has been marked as a duplicate of this bug. ***
Comment 5 Thomas Fehr 2006-03-22 12:03:06 UTC
Could this problem also lead to a failing mount of reiserfs on a raid5?

I have a case in the YaST2 logfiles where the mounting of a freshly created
reiserfs on a raid5 failed. There were no suspicious kernel messages.
Comment 6 Chris L Mason 2006-03-22 20:34:54 UTC
*** Bug 160135 has been marked as a duplicate of this bug. ***
Comment 7 Neil Brown 2006-03-23 03:35:01 UTC
Created attachment 74582
patch to fix problem

The attached patch should fix the problem.
Please try and confirm.

(It really is an embarrassingly silly bug.)
Comment 8 Scott Burson 2006-03-23 08:17:26 UTC
Yep, that fixed it!  Thanks!!!
Comment 9 LTC BugProxy 2006-03-23 16:30:32 UTC
---- Additional Comments From thinh@us.ibm.com (prefers email via th2tran@austin.ibm.com)  2006-03-23 11:24 EDT -------
SuSE team,
Would this fix be in Beta 9?
Thanks. 
Comment 10 Chris L Mason 2006-03-23 18:31:11 UTC
*** Bug 160068 has been marked as a duplicate of this bug. ***
Comment 11 Neil Brown 2006-03-24 00:05:33 UTC
Yes, this fix should be in the next Beta - Beta-9.
Comment 12 Neil Brown 2006-03-24 22:24:49 UTC
*** Bug 158732 has been marked as a duplicate of this bug. ***
Comment 13 Chris L Mason 2006-03-27 16:59:46 UTC
*** Bug 160938 has been marked as a duplicate of this bug. ***
Comment 14 Michael Gross 2006-03-29 12:29:56 UTC
*** Bug 161246 has been marked as a duplicate of this bug. ***
Comment 15 LTC BugProxy 2006-04-03 03:20:11 UTC
----- Additional Comments From wangzyu@cn.ibm.com  2006-04-02 23:16 EDT -------
SLES10 Beta9 fixes this defect.
Comment 16 LTC BugProxy 2006-04-03 13:30:42 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |FIXEDAWAITINGTEST
         Resolution|                            |FIX_BY_DISTRO

------- Additional Comments From thinh@us.ibm.com (prefers email via th2tran@austin.ibm.com)  2006-04-03 09:29 EDT -------
Great.
Mark this as CLOSED. Fixed in SLES10 Beta 9.

IBM bug submitters from the other duplicate bugs,
Please verify and close those bugs as well.

Thanks.
Comment 17 Michael Gross 2006-04-03 16:16:28 UTC
This obviously still is an issue and not limited to software RAID.
Comment 18 Michael Gross 2006-04-03 16:17:03 UTC
*** Bug 162712 has been marked as a duplicate of this bug. ***
Comment 19 Michael Gross 2006-04-03 16:18:59 UTC
*** Bug 162727 has been marked as a duplicate of this bug. ***
Comment 20 Forgotten User QtBI7gWTIh 2006-04-03 16:41:06 UTC
On my system, an i386 SMP board with pure SCSI and an Adaptec controller, SLES10 is not installable. When I configure RAID5 drives I have problems formatting the drives. When I configure RAID1 drives I have the problems described in Bug #162727.
Comment 21 Neil Brown 2006-04-04 00:36:40 UTC
(In reply to comment #17)
> This obviously still is an issue and not limited to software RAID.
> 

Could you please clarify what you mean?  If there is something that is not related to software RAID, then it is possibly a different bug and needs a new bugzilla entry.
Comment 22 Neil Brown 2006-04-04 00:38:00 UTC
(In reply to comment #20)
> On my system, an i386 SMP board with pure SCSI and an Adaptec controller,
> SLES10 is not installable. When I configure RAID5 drives I have problems
> formatting the drives. When I configure RAID1 drives I have the problems
> described in Bug #162727.

Which beta of SLES10 are you trying to install?
What problems do you have with raid5?
Comment 23 Forgotten User QtBI7gWTIh 2006-04-04 06:34:15 UTC
Re comment #20:

I use SLES10 Beta9.

The problem is that the partitions can't be formatted.

My way to install SLES10 Beta9:

I install SL10.1 on a single partition, then I create the raid1 and raid5 partitions and format those partitions.

I start a new SLES10 installation and only mount the partitions:

/root
/
/usr
/var
/srv
/data
/dat1
..
then I have to wait ....

I think YaST2 has problems creating the fstab.

After the first reboot, the system can't start correctly.

I have to boot with CD1 again, start the "installed system", then adapt /etc/sysconfig/kernel (add raid1 and raid5), run mk_initrd, and reboot the system; then I can install the rest.
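[The recovery step described above can be sketched as a shell snippet. This is an editorial illustration, not part of the original report: on a real SUSE system the file is /etc/sysconfig/kernel and the initrd is rebuilt with mk_initrd afterwards; here the edit is demonstrated on a temporary copy with an example module list so the commands can run anywhere.]

```shell
# Sketch: add raid1 and raid5 to INITRD_MODULES so the initrd can
# assemble the arrays at boot. Demonstrated on a temp copy of the file;
# the "ata_piix reiserfs" starting list is a made-up example.
cfg=$(mktemp)
printf 'INITRD_MODULES="ata_piix reiserfs"\n' > "$cfg"

# Append raid1 and raid5 inside the quoted list unless already present.
grep -q 'raid1' "$cfg" || \
    sed -i 's/^INITRD_MODULES="\(.*\)"/INITRD_MODULES="\1 raid1 raid5"/' "$cfg"

result=$(cat "$cfg")
echo "$result"
rm -f "$cfg"
# On the real system, follow the edit with: mk_initrd, then reboot.
```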
Comment 24 Martin Wilck 2006-04-04 12:12:23 UTC
Please note bug #159828. It wasn't directly related to MD/RAID in the first place as far as I could see. I haven't retested with beta9 so far.
Comment 25 Neil Brown 2006-04-06 07:38:06 UTC
(In reply to comment #24)
> Please note bug #159828. It wasn't directly related to MD/RAID in the first
> place as far as I could see. I haven't retested with beta9 so far.
> 

The 'dmesg' trace shows that a raid1 was active on the system.  The call trace
looked very similar to the known MD/RAID1 bug.  If you didn't have any
ext3 on the raid1, then please reopen that bug; the 'resolved as duplicate'
must have been wrong in that case.
Comment 26 Neil Brown 2006-04-10 23:22:58 UTC
I'm closing this bug again as it is resolved.

If you are still having problems, either re-open this bug or open a different one.

Thanks,
NeilBrown