Bug 118833 - Install to Software Raid fails
Summary: Install to Software Raid fails
Status: RESOLVED WONTFIX
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: X11 3rd Party
Version: RC 1
Hardware: Other All
: P5 - None : Major
Target Milestone: ---
Assignee: Neil Brown
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-26 16:16 UTC by David Rigler
Modified: 2006-06-20 03:13 UTC

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
content of var/log subdirectory (139.50 KB, application/x-gzip)
2005-09-27 15:15 UTC, David Rigler
output of hwsetup (1.41 KB, application/octet-stream)
2005-09-27 15:16 UTC, David Rigler

Description David Rigler 2005-09-26 16:16:51 UTC
I used the disk editor during install to create 3 mirrored partitions.

sda1+sdb1 = md0 = 128M @ MP /boot FS ext2
sda2+sdb2 = md1 = 256M @ MP <swap>
sda3+sdb3 = md2 = 8G @ MP / FS ReiserFS

After creating these and installing packages from CD1, the install reboots.
While booting from the hard drive for the first time it says:
....
md: raid 1 personality registered as nr 3
waiting for device /dev/md1 to appear: ok
no record for 'md1' in database
Attempting manual resume
Kernel panic - not syncing: IO error reading memory image
Comment 1 Michael Radziej 2005-09-26 16:29:26 UTC
Please attach /var/log/YaST2 and hwinfo.
See http://www.opensuse.org/Bug_Reporting_FAQ#YaST

Kernel panic suggests kernel bug
Comment 2 Lars Marowsky-Bree 2005-09-27 08:20:27 UTC
Full boot up messages would be required. But this looks slightly more like a
problem with swsuspend; trying to resume from a software raid1 doesn't appear to
be working?
Comment 3 David Rigler 2005-09-27 15:15:12 UTC
Created attachment 50949 [details]
content of var/log subdirectory
Comment 4 David Rigler 2005-09-27 15:16:09 UTC
Created attachment 50950 [details]
output of hwsetup
Comment 5 David Rigler 2005-09-27 15:18:32 UTC
(In reply to comment #1)
> Please attach /var/log/YaST2 and hwinfo.
> See http://www.opensuse.org/Bug_Reporting_FAQ#YaST
> 
> Kernel panic suggests kernel bug

attached. 
this comes from booting a live cd, mounting /dev/sda3
Comment 6 David Rigler 2005-10-04 14:37:42 UTC
hello !! hello !! anyone home 
Comment 7 Neil Brown 2005-10-17 23:30:12 UTC
My best guess as to the problem here is that the partitions haven't been
marked as type FD - Linux raid autostart.

If they have, then it would seem to be a 'YaST' bug, probably not running
'raidautorun' in the right place in the 'init' script in initrd.

What is happening is that the init script is asking the kernel to check
to see if there is a 'resume' image on /dev/md1 to resume from; the kernel
tries to read from /dev/md1 and gets a read-error, and so it panics.
This is probably because /dev/md1 hasn't been assembled properly yet.

Arguably the kernel should not panic at this point but should just fail
to resume.  However, that wouldn't fix the root problem, which is that
md1 isn't being assembled at this point.

So: please check and report whether the partitions that the mirrored
pairs are made from are set to type FD or not.

Thanks,
NeilBrown
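
One way to run the type check requested above is sketched here. It assumes `fdisk -l` output in which type 0xFD partitions carry fdisk's label "Linux raid autodetect"; the function name `list_raid_autodetect` is hypothetical and not part of any SUSE tool.

```shell
# Hypothetical helper: read `fdisk -l` output on stdin and print the device
# names of all partitions whose type is shown as "Linux raid autodetect" (0xFD).
list_raid_autodetect() {
    awk '/^\/dev\// && /Linux raid autodetect/ { print $1 }'
}

# Usage (as root, on the installed system or from a rescue/live CD):
#   fdisk -l /dev/sda | list_raid_autodetect
```

Any member partition missing from the printed list would not be picked up by the kernel's RAID autodetection.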
Comment 8 Thomas Fehr 2005-10-18 08:08:10 UTC
From the logs I can see that all partitions are correctly created with 
partition id 0xFD. 
But is it at all possible to resume from a raid device?
/boot/grub/menu.lst contains kernel parameter "resume=/dev/md1", please try
with either "resume=/dev/sda2" or with no resume entry at all.
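
The two variants suggested above could be produced with something like the following sketch. The helper names are hypothetical; both functions only filter text from stdin to stdout, so the result should be reviewed before it replaces the real menu.lst.

```shell
# Hypothetical helpers: both read boot loader configuration text on stdin
# and write the modified text to stdout; nothing is edited in place.

# Replace the value of every resume= kernel parameter with device $1.
set_resume_param() {
    sed "s|resume=[^ ]*|resume=$1|g"
}

# Remove every resume= kernel parameter together with the space before it.
drop_resume_param() {
    sed 's| resume=[^ ]*||g'
}

# Usage:
#   set_resume_param /dev/sda2 < /boot/grub/menu.lst > /tmp/menu.lst.new
```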
Comment 9 Neil Brown 2005-10-18 11:19:45 UTC
Resume from md 'should' work (though I've never tried it).

From the logs I can see that everything you would expect to be needed has 
happened (raid1.ko has been loaded, /dev/md1 has been created), except for the
actual assembly of the raid device.

There might be some slightly interesting issues if the array was dirty and
needed a resync, but that shouldn't stop it from working.

Presumably the 'init' script explicitly loads raid1.ko.  Does it
run "raidautorun" as well?  Could it?  Should it?
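
Whether an array has actually been assembled can be checked from a shell with a sketch like this one. It assumes the usual /proc/mdstat layout, where an assembled array appears as a line starting with "mdN : active"; the function name `md_is_active` is hypothetical.

```shell
# Hypothetical helper: succeed if array $1 (e.g. "md1") is listed as active
# in the mdstat file $2 (defaults to /proc/mdstat).
md_is_active() {
    grep -q "^$1 : active" "${2:-/proc/mdstat}"
}

# Usage:
#   md_is_active md1 || echo "md1 has not been assembled yet"
```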
Comment 10 Thomas Fehr 2005-10-18 11:45:17 UTC
If one of the modules "raid0 raid1 raid5 linear multipath" is loaded in initrd,
raidautorun should also be put into initrd and executed after module loading.
 
This seems to work in general; otherwise we could never use any md device as
root partition, and this is quite common and so far I did not hear of any 
problems with this.
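
The rule described above can be illustrated with a small sketch. This is not the actual mkinitrd code; the function name `needs_raidautorun` and the way the module list is passed in are assumptions made for the example.

```shell
# Hypothetical helper, not the actual mkinitrd code: succeed if any module
# name in the whitespace-separated list $1 is one of the md personalities
# named above, i.e. the initrd would also need raidautorun.
needs_raidautorun() {
    for module in $1; do
        case "$module" in
            raid0|raid1|raid5|linear|multipath) return 0 ;;
        esac
    done
    return 1
}

# Usage:
#   needs_raidautorun "ext2 reiserfs raid1" && echo "add raidautorun"
```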
Comment 11 Neil Brown 2005-10-21 01:15:45 UTC
I had a look into the mysteries of 'mkinitrd' and it only causes raidautorun
to be run *after* resume has been attempted.  So resuming from a raid array
currently cannot work.
If we wanted to make it work with current technology, I would probably
do something like this in the init script created by mkinitrd.

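# If the resume device is an md array, resume from one of its RAID1 members
# instead, because the array is not assembled yet at this point.
case "$resumedev" in
  /dev/md* )
    realdev=$(
      mdadm -Es -cpartitions |
        while read -r a b c d
        do
          if [ "x$a" = "xARRAY" ] && [ "x$b" = "x$resumedev" ] && [ "x$c" = "xlevel=raid1" ]
          then
            # The line after the ARRAY line is assumed to list the members,
            # e.g. "devices=/dev/sda2,/dev/sdb2"; print the first one.
            IFS=',=' read -r a b c
            echo "$b"
            break
          fi
        done
    )
    # Without a usable member device, disable resume altogether.
    if [ -n "$realdev" ]; then resumedev=$realdev; else resume_mode= ; fi
    ;;
esac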

That should probably go at the start of the udev_discover_resume function.
If not this, then at least it should check if resumedev looks like
/dev/md*, and ignore it if it does.
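
The simpler fallback mentioned in the last sentence could look roughly like this. It is a sketch only: the function name `filter_resumedev` is hypothetical, and how the init script reacts to an empty resume device is left to the caller.

```shell
# Hypothetical helper: print the resume device $1 unchanged, or print
# nothing if it is an md array, so the caller can skip the resume attempt.
filter_resumedev() {
    case "$1" in
        /dev/md*) ;;
        *) printf '%s\n' "$1" ;;
    esac
}

# Usage inside an init script (variable name taken from the snippet above):
#   resumedev=$(filter_resumedev "$resumedev")
```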

Who is responsible for making changes to "mkinitrd" ??

NeilBrown

Comment 12 Thomas Fehr 2005-10-24 10:19:05 UTC
Hannes Reinecke is doing most of the changes in mkinitrd.
I am pretty sure he will accept patches.

I added him to CC:
Comment 13 Hannes Reinecke 2005-10-24 11:18:09 UTC
Hmm. Why do we have to check for 'raid1' partitions only? Shouldn't we consider all raid levels?
Comment 14 Thomas Fehr 2005-10-24 11:29:02 UTC
Where do we check only for raid1?
The check for raid I see in mkinitrd contains:

    if has_any_module raid0 raid1 raid5 linear multipath; then

Raid1 is the only raid level that can be used for the files in /boot but that
is nothing that should bother mkinitrd.
Comment 15 Neil Brown 2006-04-27 02:14:04 UTC
Hannes:  This bug has been sitting around unloved for some months.

Do you know if the important issues have been resolved?

Can we close it?
Comment 16 Hannes Reinecke 2006-04-27 07:54:26 UTC
I doubt we'll fix this for 10.0. This change is a bit intrusive and would have to be tested thoroughly. And as this bothered no one during the beta test, I don't think it's important enough.
So we'll mark this as not supported.
Comment 17 Neil Brown 2006-06-20 03:13:12 UTC
Ok, I'll mark it as WONTFIX for now.  If it comes back to haunt us, so be it.