Bugzilla – Bug 1054616
software RAID not initialized at boot after openSUSE-2017-847 patch applied
Last modified: 2017-09-27 07:59:38 UTC
I have a software RAID5 which was created about 6 years ago. It consists of 4 partitions on 4 separate SATA hard drives. The first partition in the RAID is a logical partition in an extended partition (sda6). The other three are primary partitions (sdb1, sdc1, sdd1).

My openSUSE Leap 42.2 system would not boot after installing patches. /boot was on a primary partition and / was on a logical volume on the RAID, along with other logical volumes for /home and data. Since Leap 42.3 had just been released, I tried installing it with root on a primary partition and /home on the logical volume. The installation and first boot went fine (pretty much all defaults except for partitioning).

There were updates suggested, so I installed all but openSUSE-2017-847, because I was suspicious when I saw systemd and dracut in the title. After applying those updates, I rebooted with no problems. I then applied the openSUSE-2017-847 update. On reboot, the /home partition was not found, so I had to remove it from /etc/fstab. After poking around, I discovered that there was no /proc/mdstat, indicating the RAID was never initialized, but if I ran "mdadm --assemble --scan" the RAID appeared.

Since then, I've reinstalled 42.3 a couple of times, played with scripting and looked at logs. I tried reversing some of the scripting changes that were introduced in the openSUSE-2017-847 update. In particular, I tried reversing the changes made to /usr/lib/dracut/modules.d/95udev-rules/module-setup.sh, /usr/lib/udev/rules.d/60-persistent-storage.rules and /usr/lib/udev/rules.d/61-persistent-storage.rules. Rebooting after those changes at least allowed the RAID to be assembled, but nothing associated with the RAID was created in /dev.
Here is the output of 'dmesg | grep -i raid':

[ 2.967352] raid6: sse2x1 gen() 4801 MB/s
[ 3.035334] raid6: sse2x1 xor() 4806 MB/s
[ 3.103349] raid6: sse2x2 gen() 8126 MB/s
[ 3.171334] raid6: sse2x2 xor() 8154 MB/s
[ 3.239348] raid6: sse2x4 gen() 8822 MB/s
[ 3.307343] raid6: sse2x4 xor() 3932 MB/s
[ 3.307345] raid6: using algorithm sse2x4 gen() 8822 MB/s
[ 3.307345] raid6: .... xor() 3932 MB/s, rmw enabled
[ 3.307346] raid6: using intx1 recovery algorithm
[ 8.673240] md/raid:md0: device sdc1 operational as raid disk 2
[ 8.673247] md/raid:md0: device sdb1 operational as raid disk 1
[ 8.673250] md/raid:md0: device sda6 operational as raid disk 0
[ 8.673251] md/raid:md0: device sdd1 operational as raid disk 3
[ 8.674295] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2

I was not seeing the last 5 lines prior to making the script changes. But I'm making these changes blindly since I don't know much about dracut or udev configurations.

Since then, another systemd patch has been released (openSUSE-2017-950), but installing that update hasn't changed the situation.

Initially I reported this problem on the openSUSE Install/Boot/Login Forum. It was suggested I submit a bug report and after another user posted with the same problem, I figured it was time to submit the bug.
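The manual check described in this report (look at /proc/mdstat, then fall back to "mdadm --assemble --scan") can be sketched as a small shell helper. This is only an illustration and not part of any patch: the function name active_md_arrays and its optional file argument are hypothetical, and it assumes the usual /proc/mdstat layout in which each array line looks like "md0 : active raid5 ...".

```shell
#!/bin/sh
# Hypothetical helper: print the names of md arrays marked "active"
# in an mdstat-format file (defaults to /proc/mdstat).
active_md_arrays() {
    mdstat="${1:-/proc/mdstat}"
    # A missing file means the md driver was never initialized,
    # which is the symptom described in this report.
    [ -r "$mdstat" ] || return 1
    # Array lines are assumed to look like: "md0 : active raid5 sdd1[3] ..."
    awk '$2 == ":" && $3 == "active" { print $1 }' "$mdstat"
}

if arrays=$(active_md_arrays) && [ -n "$arrays" ]; then
    echo "active arrays: $arrays"
else
    # Only a hint is printed; assembling requires root and is left to the user.
    echo "no active md arrays; try: mdadm --assemble --scan" >&2
fi
```

The helper deliberately does not run mdadm itself, so it is safe to call as an unprivileged user.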
Martin, can you take a look?
For a starter, please revert manual changes to udev rules and provide a serial console log (or, better even journalctl -b captured in emergency mode).
(In reply to Martin Wilck from comment #2)
> For a starter, please revert manual changes to udev rules and provide a
> serial console log (or, better even journalctl -b captured in emergency
> mode).

Sorry, I've been on vacation so I haven't had a chance to generate the journal output until this morning. After running journalctl -b, I noticed the following line:

Sep 08 09:15:01 linux mdadm[3327]: DeviceDisappeared event detected on md device /dev/md/linux:0

However, I believe the problem has been resolved with the openSUSE-2017-1005 systemd patch. After capturing the journal output, I applied the latest patches. Once I rebooted, the RAID appeared. I looked at the list of patches applied and made the assumption that the latest systemd patch fixed the problem.

I tried to retrace my steps in order to nail down the fix. I reinstalled openSUSE, captured the journal on first boot, installed all patches except the three related to systemd (847, 950 and 1005), rebooted and again captured the journal. I tried to apply only patch 847 by deselecting 950 and 1005 from the list of Software Updates in the panel tray. After pushing the Install Updates button, I found that all systemd patches had been applied and when I rebooted, the RAID was present. So I used snapper to roll back the patches and tried using YaST Online Update to mark 950 and 1005 as taboo, but again all the patches related to systemd were installed. I guess installing the most recent patch version regardless of the patch summary selection is a "feature".

I can't verify that openSUSE-2017-1005 fixed the problem, but I'm happy it has been resolved. I'll be applying the systemd patches to my primary openSUSE 42.3 system.

Thanks
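For anyone collecting the same information, md-related lines such as the DeviceDisappeared event can be pulled out of the boot journal with a simple filter. This is a minimal sketch: the function name md_journal_lines is hypothetical, and the search patterns are just the strings that appear in the logs quoted in this report.

```shell
#!/bin/sh
# Hypothetical filter: keep only journal lines that mention mdadm or the
# md/raid kernel messages. Reads from standard input.
md_journal_lines() {
    grep -i -E 'mdadm|md/raid'
}

# Example (run on the affected system):
#   journalctl -b --no-pager | md_journal_lines
```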
OK, closing bug. Feel free to reopen if this occurs again.