Bug 731230 - cannot boot machine with md raid disks.
Summary: cannot boot machine with md raid disks.
Status: RESOLVED FIXED
Duplicates: 731135
Alias: None
Product: openSUSE 12.1
Classification: openSUSE
Component: Basesystem
Version: Final
Hardware: x86-64
OS: openSUSE 12.1
Priority: P2 - High
Severity: Major
Target Milestone: Milestone 1
Assignee: Frederic Crozat
QA Contact: opensusebugs mailing list
 
Reported: 2011-11-18 02:28 UTC by kenneth zadeck
Modified: 2017-08-11 18:44 UTC
CC: 8 users



Description kenneth zadeck 2011-11-18 02:28:06 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1

Under openSUSE 11.4 we were able to mount a RAID file system during system boot. Under 12.1, this causes the boot to fail. The likely cause of this failure is a missing dependency in the startup order. The value of the sixth field of the fstab entry had been 2, but apparently that is no longer good enough.

We can work around the problem by setting the mount options for the RAID partition to noauto and having the partition mounted after boot finishes by a cron job. This is a hack, but it works.
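A minimal sketch of that workaround, assuming the RAID partition is /dev/md2 mounted on /srv (device name and mount point are illustrative, not taken from this report):

```
# /etc/fstab -- noauto keeps the boot sequence from waiting on the array;
# fsck pass field set to 0 so the boot-time fsck is skipped as well
/dev/md2  /srv  ext4  noauto,acl,user_xattr  0 0

# root's crontab -- mount the partition once the system is up
@reboot  /bin/mount /srv
```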

Reproducible: Always

Steps to Reproduce:
1. Create a RAID file system that you want to mount at boot time.
2. Reboot the machine.

Actual Results:
The system drops into single-user mode because the file system cannot be properly fscked or mounted. Mounting it from single-user mode and then running "init 5" works.
Comment 1 Andreas Jaeger 2011-11-21 12:01:28 UTC
Does it work if you boot with SysV init? (You can toggle at the boot prompt with F5.)

Please attach /etc/fstab.

How does the mounting fail? Does it hang, or does it just not mount the partition?
Comment 2 Andreas Löbel 2011-11-22 12:31:26 UTC
Hi,

Same problem here; I have several systems with md devices like this:

/dev/md0             swap                 swap       defaults              0 0
/dev/md1             /                    ext4       acl,user_xattr        1 1
/dev/md2             /usr/local           ext4       defaults              1 2

Everything worked well under openSUSE 11.4. I did a "zypper dup" against the new 12.1 repositories. Now the boot process hangs, but can be continued manually with Ctrl-D. I think systemd tries to mount /dev/md2 before it has been assembled:

<6>[    0.000000] Initializing cgroup subsys cpuset
<6>[    0.000000] Initializing cgroup subsys cpu
<5>[    0.000000] Linux version 3.1.0-1.2-desktop (geeko@buildhost) (gcc version 4.6.2 (SUSE Linux) ) #1 SMP PREEMPT Thu Nov 3 14:45:45 UTC 2011 (187dde0)
<6>[    0.000000] Command line: root=/dev/md1 splash=verbose quiet vga=0x31a
...
<6>[   10.170018] md: bind<sda2>
<6>[   10.292817] md: bind<sda4>

<6>[   10.774111] device-mapper: uevent: version 1.0.3
<6>[   10.774200] device-mapper: ioctl: 4.21.0-ioctl (2011-07-06) initialised: dm-devel@redhat.com

<30>[   10.794925] swapon[762]: swapon: /dev/md0: read swap header failed: Invalid argument
<29>[   10.795248] systemd[1]: dev-md0.swap swap process exited, code=exited status=255
<29>[   10.809986] systemd[1]: Unit dev-md0.swap entered failed state.
<6>[   10.838103] md: bind<sdb4>
<30>[   10.864658] boot.lvm[629]: Reading all physical volumes.  This may take a while...
<30>[   10.864814] boot.lvm[629]: Activating LVM volume groups...
<6>[   10.904426] md/raid1:md2: active with 2 out of 2 mirrors
<6>[   10.904440] md2: detected capacity change from 0 to 444600942592
<30>[   10.912624] boot.md[620]: Starting MD RAID mdadm: /dev/md0 is already in use.
<6>[   10.947558]  md2: unknown partition table
<30>[   11.152105] systemd-fsck[768]: fsck.ext4: Invalid argument while trying to open /dev/md2
<30>[   11.152602] systemd-fsck[768]: /dev/md2:
<27>[   11.152606] systemd-fsck[768]: fsck failed with error code 8.
<28>[   11.152609] systemd-fsck[768]: Ignoring error.
<30>[   11.152612] systemd-fsck[768]: The superblock could not be read or does not describe a valid ext2
<30>[   11.153080] systemd-fsck[450]: filesystem.  If the device is valid and really contains an ext2
<30>[   11.153336] systemd-fsck[450]: filesystem (and not swap or ufs or something else), then the superblock
<30>[   11.153658] systemd-fsck[450]: is corrupt, and you might try running e2fsck with an alternate superblock:
<30>[   11.154032] systemd-fsck[450]: e2fsck -b 8193 <device>
<30>[   11.231812] boot.lvm[629]: No volume groups found
<30>[   11.232332] boot.lvm[629]: ..done
<6>[   11.405439] md: bind<sdb2>
<6>[   11.526691] md/raid1:md0: active with 2 out of 2 mirrors
<6>[   11.526705] md0: detected capacity change from 0 to 9179299840
<6>[   11.610110] EXT4-fs (md1): re-mounted. Opts: acl,user_xattr
<6>[   11.610283]  md0:
<30>[   11.655265] mount[798]: mount: /dev/md2 is already mounted or /usr/local busy
<29>[   11.655636] systemd[1]: usr-local.mount mount process exited, code=exited status=32
<29>[   11.675220] systemd[1]: Job klog.service/start failed with result 'dependency'.
<29>[   11.675227] systemd[1]: Job remote-fs.target/start failed with result 'dependency'.
<29>[   11.675233] systemd[1]: Job local-fs.target/start failed with result 'dependency'.
<30>[   11.675236] systemd[1]: Triggering OnFailure= dependencies of local-fs.target.
<29>[   11.675690] systemd[1]: Unit usr-local.mount entered failed state.
<29>[   12.246362] systemd[1]: md.service: control process exited, code=exited status=3
<29>[   12.274239] systemd[1]: Unit md.service entered failed state.
<30>[   12.282953] boot.md[844]: Not shutting down MD RAID - reboot/halt scripts do this...missing
<30>[   12.293363] systemd[1]: Startup finished in 4s 842ms 507us (kernel) + 7s 329ms 795us (userspace) = 12s 172ms 302us.
<30>[   21.087131] udevd[860]: starting version 173
<30>[   21.107736] boot.corefile[864]: Setting core dump file name
<6>[   21.117180] Adding 8964156k swap on /dev/md0.  Priority:0 extents:1 across:8964156k 
<30>[   21.169633] boot.localnet[926]: Using boot-specified hostname 'opt80'
<30>[   21.170340] boot.localnet[926]: Setting up hostname 'opt80'..done
<30>[   21.171719] boot.localnet[926]: Setting up loopback interface RTNETLINK answers: File exists
<30>[   21.172684] boot.localnet[926]: ..done
<14>[   21.276422] mtp-probe[1044]: checking bus 1, device 3: "/sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3"
<14>[   21.277233] mtp-probe[1044]: bus: 1, device: 3 was not an MTP device
<14>[   21.283651] mtp-probe[1050]: checking bus 2, device 3: "/sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.7"
<14>[   21.285257] mtp-probe[1050]: bus: 2, device: 3 was not an MTP device
<14>[   21.286139] mtp-probe[1054]: checking bus 2, device 4: "/sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.8"
<30>[   21.293286] systemd-fsck[921]: /dev/md2: clean, 27884/27140096 files, 55382685/108545152 blocks
<14>[   21.297437] mtp-probe[1054]: bus: 2, device: 4 was not an MTP device
<30>[   21.304346] boot.lvm[1056]: Waiting for udev to settle...
<13>[   21.320311] ifup[1139]: Service network not started and mode 'auto' -> skipping
<30>[   21.665124] boot.md[928]: Starting MD RAID ..done
<6>[   21.715251] EXT4-fs (md1): re-mounted. Opts: acl,user_xattr
<30>[   21.839334] boot.lvm[1056]: Scanning for LVM volume groups...
<6>[   21.905398] EXT4-fs (md2): mounted filesystem with ordered data mode. Opts: (null)


SysV init completely hangs and requires a Ctrl-C to continue:

<6>[    3.831689] EXT4-fs (md1): mounted filesystem with ordered data mode. Opts: acl,user_xattr
<6>[    3.919706] EXT4-fs (md1): re-mounted. Opts: acl,user_xattr

<30>[  163.765329] udevd[439]: starting version 173
<6>[  164.224244] input: Power Button as /devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input2
<6>[  164.224290] ACPI: Power Button [PWRB]
<6>[  164.224325] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
<6>[  164.224383] ACPI: Power Button [PWRF]
<6>[  164.248536] input: PC Speaker as /devices/platform/pcspkr/input/input4
Comment 3 Olivier P 2011-11-22 14:15:58 UTC
hi,

Same problem here: during the boot sequence, mounting / or /home fails with a superblock error, dropping me into the recovery console.

Exiting the console with ^D leads to a successful boot.

Here's my fstab:
% cat /etc/fstab
/dev/disk/by-id/ata-SAMSUNG_HD154UI_XXX-part4 swap                 swap       defaults              0 0
/dev/disk/by-id/ata-SAMSUNG_HD154UI_XXX-part4 swap                 swap       defaults              0 0
/dev/md1             /                    ext4       acl,user_xattr        1 1
/dev/md0             /boot                ext4       acl,user_xattr        1 2
/dev/md2             /home                ext4       acl,user_xattr        1 2
proc                 /proc                proc       defaults              0 0
sysfs                /sys                 sysfs      noauto                0 0
debugfs              /sys/kernel/debug    debugfs    noauto                0 0
devpts               /dev/pts             devpts     mode=0620,gid=5       0 0
Comment 4 kenneth zadeck 2011-11-22 15:01:00 UTC
Andreas Jaeger,

It is a holiday week in the US and I am out of town, so I cannot take that machine down to get the info until next week. However, the other two reports look very much like my failure; in particular, the fstab entry for /dev/md2 is very similar to mine.

I even think the console messages are similar, except that mine are, of course, in English.

The one difference between my system and Olivier P's is that my root file system is a conventional partition, and the md file system I am having trouble with is where we keep our svn server, so it is mounted with a 2 in column 6.

One thing that might be relevant is that my machine has 4 cores. It could be that with fewer cores the problem is not visible.

I will upload additional files next week if necessary, but it looks like this problem may not be hard to reproduce.

Kenny
Comment 5 Olivier P 2011-11-22 16:20:49 UTC
(In reply to comment #4)
> 
> The one thing that might be relevant is that my machine is a 4 core machine.  
> it could be that with fewer cores, the problem is not visible.
> 

I have 4 cores here as well (Athlon II X4).
Same bug here: https://bugzilla.novell.com/show_bug.cgi?id=731135
Comment 6 Ramon Juan Canto Serra 2011-11-22 17:22:15 UTC
Same bug here with systemd: a superblock error mounting md0 at boot, also on a 4-core processor (Core 2 Quad Q6600, 64-bit, 12.1). With traditional sysvinit it seems to work fine.

My fstab:

/dev/system/root        /                       ext4    acl,user_xattr 1 1 
/dev/md0                /boot                   ext4    acl,user_xattr 1 2 
/dev/system/swap        swap                    swap    defaults 0 0 
/dev/system/var         /var                    ext4    acl,user_xattr 1 2 
/dev/system/home        /home                   ext4    acl,user_xattr 1 2 
/dev/system/usrlocal    /usr/local              ext4    acl,user_xattr 1 2 
proc                    /proc                   proc    defaults 0 0 
sysfs                   /sys                    sysfs   noauto 0 0 
debugfs                 /sys/kernel/debug       debugfs noauto 0 0 
usbfs                   /proc/bus/usb           usbfs   noauto 0 0 
devpts                  /dev/pts                devpts  mode=0620,gid=5 0 0 
nfsserver:/             /srv/share              nfs4    defaults,bg 0 0 


/dev/md0 => 2-disk RAID 1, metadata version 0.90. I tried version 1.2 too, with the same results.

LVM sits on top of 2 more RAID 1 arrays; the first of them is on the same disks as md0. They work fine.

The workaround (setting md0 to noauto and mounting it manually from boot.local, for example) works.
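A hedged sketch of this boot.local variant, using the reporter's /dev/md0 on /boot (the explicit assemble step is an assumption and may be unnecessary if the array has already come up by then):

```
# /etc/fstab entry changed to noauto so systemd no longer races the array:
#   /dev/md0  /boot  ext4  noauto,acl,user_xattr  0 0

# /etc/init.d/boot.local -- runs late in the boot sequence
mdadm --assemble /dev/md0 2>/dev/null || true   # make sure the array is assembled
mount /boot                                     # mount via fstab's noauto entry
```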
Comment 7 Craig Rogers 2011-11-28 02:16:12 UTC
I believe that bug 731135 is for this same problem.

As far as I can tell, the essence of the problem is that systemd doesn't have an explicit md.service description and associated target files (see /lib/systemd/system/localfs.service for comparison), and doesn't know about the dependency that boot.md (or its systemd equivalent) *must* complete before boot.localfs (or equivalent) is started.

In the SysV init world, boot.md runs before boot.localfs because of the sequence numbers assigned to them in "/etc/init.d/boot.d".

The SysV init scripts also contain LSB comments that are supposed to document service startup and shutdown dependencies.  I think the LSB comments for boot.localfs are incorrect:  they seem to say that boot.localfs runs before boot.md, when it must be the other way around.
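For comparison, an LSB header of the kind described above might look like this (a sketch only; the actual contents of openSUSE's init scripts may differ). The key point is that boot.localfs would need boot.md in its Required-Start line for insserv to order the two correctly:

```
### BEGIN INIT INFO
# Provides:          boot.localfs
# Required-Start:    boot.udev boot.md
# Required-Stop:     $null
# Default-Start:     B
# Default-Stop:
# Short-Description: mount local filesystems
# Description:       Mounts local filesystems once MD RAID arrays are assembled.
### END INIT INFO
```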
Comment 8 Olivier P 2011-11-28 13:26:59 UTC
*** Bug 731135 has been marked as a duplicate of this bug. ***
Comment 9 kenneth zadeck 2011-11-28 14:50:12 UTC
Andreas, Olivier,

I am back from vacation. Is there any other information I can provide to help with this bug? It looks like others have supplied the logs you asked for.

Kenny
Comment 10 Craig Rogers 2011-11-28 23:07:16 UTC
Thinking about this problem some more, what is really desired, and what systemd should be designed to support, is a finer-grained system that tracks dependencies on a per-filesystem basis. For example, ordinary local filesystems could be mounted in parallel with each other, and in parallel with the mdadm instance that is assembling the RAID devices (/dev/md0, etc.). Once a new RAID device is available, an event of some sort would be generated so that systemd could then attempt to mount the filesystem on the new device. This could be viewed as an extension of mount-point ordering: there might be filesystems whose mount points lie on top of the RAID-based filesystem. It might also provide a better system-administration experience in case of partial drive or filesystem failure.

A compromise might be to split localfs mounts into two phases: one that mounts filesystems that don't depend on md drives, and one that mounts filesystems that do. Ultimately, I imagine, all this complexity will be hidden in the filesystems themselves, as with the RAID support integrated into BTRFS.

In the short term, I'd be satisfied to have openSUSE 12.1's systemd-based initialization build all the md devices, then mount the local filesystems, just like the SysV init package does.
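In systemd terms, the ordering asked for above can be expressed per mount unit. A hedged sketch (unit, device, and service names are illustrative, and later systemd versions derive this ordering automatically from udev device units rather than from an explicit md.service):

```
# /etc/systemd/system/usr-local.mount
[Unit]
Requires=md.service
After=md.service

[Mount]
What=/dev/md2
Where=/usr/local
Type=ext4

[Install]
WantedBy=local-fs.target
```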
Comment 11 Frederic Crozat 2011-11-29 10:16:27 UTC
Could people test the package from home:fcrozat:systemd / systemd?
Comment 12 Olivier P 2011-11-29 18:22:49 UTC
Hi,

This solved it:
sudo zypper ar "http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/home:fcrozat:systemd.repo"
sudo zypper dup

Thanks a lot.

Regards,
Olivier
Comment 13 Ramon Juan Canto Serra 2011-11-29 22:02:19 UTC
/dev/md0 is a RAID 1 with metadata version 0.90 (default is 1.0).

Tested:

Packages installed:

systemd|37-297.1|x86_64||home_fcrozat_systemd
systemd-presets-branding-openSUSE|0.1.0-29.1|noarch
systemd-sysvinit|37-297.1|x86_64||home_fcrozat_systemd


I rebooted a lot, and my system boots every time.

But on 3 occasions, checking with "systemctl --failed", an md.service error appears:

# systemctl status md.service

md.service - LSB: Multiple Device RAID
          Loaded: loaded (/etc/init.d/boot.md)
          Active: failed since Tue, 29 Nov 2011 21:52:32 +0100; 1min 51s ago
         Process: 706 ExecStart=/etc/init.d/boot.md start (code=exited, status=1/FAILURE)
          CGroup: name=systemd:/system/md.service


Despite that, /boot remains mounted correctly, and there are no more "fsck" and "SuperBlock" errors:

[   11.970454] systemd[1]: md.service: control process exited, code=exited status=1
[   11.997939] systemd[1]: Unit md.service entered failed state.
[   12.081093] md: bind<sda1>
[   12.178361] md/raid1:md0: active with 2 out of 2 mirrors
[   12.185430] md0: detected capacity change from 0 to 1003356160
[   12.225429]  md0: unknown partition table
[   12.856998] systemd-fsck[926]: boot: clean, 43/61312 files, 15060/244960 blocks
[   12.870665] systemd-fsck[929]: var: clean, 1845/1966080 files, 717746/7864320 blocks
[   12.884267] systemd-fsck[932]: home: clean, 423283/68911104 files, 164044109/275633152 blocks (check after 3 mounts)
[   12.904727] systemd-fsck[935]: usrlocal: clean, 290/655360 files, 86311/2621440 blocks (check on next mount)
[   13.082309] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[   13.103274] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[   13.135768] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[   13.273474] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: acl,user_xattr
Comment 14 Frederic Crozat 2011-11-30 09:09:19 UTC
OK, this will be handled by a maintenance update of systemd.

*** This bug has been marked as a duplicate of bug 724912 ***
Comment 15 Carlos Robinson 2011-12-07 13:39:07 UTC
Please, could one of you (reporters or assignees) add an entry for this issue on the "openSUSE:Most annoying bugs 12.1" wiki page?

http://en.opensuse.org/openSUSE:Most_annoying_bugs_12.1
Comment 16 Forgotten User uIDUwskfA7 2015-11-19 11:00:49 UTC
While booting from the HDD, it reports:

Failed features: boot.localfs
Skipped features: boot.md
Comment 17 Tomáš Chvátal 2017-08-11 18:44:34 UTC
12.1 is out of the support scope, and we currently test MD RAID boots in openQA, so they should be working just fine.

Please open a new bug if the problem is still present on current Leap or Tumbleweed releases.