Bug 1054616

Summary: software RAID not initialized at boot after openSUSE-2017-847 patch applied
Product: [openSUSE] openSUSE Distribution
Reporter: Mark Elliott <emark1000>
Component: Basesystem
Assignee: Daniel Molkentin <daniel>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: daniel, emark1000, martin.wilck
Version: Leap 42.3
Target Milestone: ---
Hardware: x86-64
OS: openSUSE 42.3
See Also: http://bugzilla.opensuse.org/show_bug.cgi?id=1060226
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Mark Elliott 2017-08-20 01:50:50 UTC
I have a software RAID5 which was created about 6 years ago.  It consists of 4 partitions on 4 separate SATA hard drives.  The first partition in the RAID is a logical partition in an extended partition (sda6).  The other three are primary partitions (sdb1, sdc1, sdd1).  My openSUSE Leap 42.2 system would not boot after installing patches.  /boot was on a primary partition, and / was on a logical volume on the RAID along with other logical volumes for /home and data.

Since Leap 42.3 had just been released, I tried installing it with the root file system on a primary partition and /home on the logical volume.  The installation and first boot went fine (pretty much all defaults except for partitioning).  Updates were offered, so I installed all of them except openSUSE-2017-847, because I was suspicious when I saw systemd and dracut in its title.  After applying those updates, I rebooted with no problems.  I then applied the openSUSE-2017-847 update.  On reboot, the /home partition was not found, so I had to remove it from /etc/fstab.  After poking around, I discovered that there was no /proc/mdstat, indicating that the RAID was never initialized; however, running "mdadm --assemble --scan" made the RAID appear.
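Whether any md array was assembled at boot can be checked by reading /proc/mdstat, which only exists once the md driver is loaded. The following is a minimal Python sketch, assuming the usual mdstat layout where each array line starts with "mdN : active ..." or "mdN : inactive ..."; the helper names parse_mdstat and read_mdstat are hypothetical, not part of any openSUSE tool.

```python
import os
import re

# Array lines in /proc/mdstat typically look like:
#   "md0 : active raid5 sdd1[3] sdc1[2] sdb1[1] sda6[0]"
#   "md127 : inactive sda6[0](S)"
ARRAY_LINE = re.compile(r"^(md\S*)\s*:\s*(active|inactive)\b(.*)$")


def parse_mdstat(text):
    """Return {array_name: (state, [member devices])} parsed from mdstat text."""
    arrays = {}
    for line in text.splitlines():
        m = ARRAY_LINE.match(line)
        if not m:
            continue  # skip "Personalities", status lines and "unused devices"
        name, state, rest = m.groups()
        # Member tokens carry a slot suffix such as "[1]" or "[0](S)"; drop it.
        members = [tok.split("[", 1)[0] for tok in rest.split() if "[" in tok]
        arrays[name] = (state, members)
    return arrays


def read_mdstat(path="/proc/mdstat"):
    """Parse /proc/mdstat; return None if it is missing (md driver not loaded)."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return parse_mdstat(f.read())
```

If read_mdstat() returns None (no md driver) or an empty dict (driver loaded, no arrays), the array was not assembled at boot, and "mdadm --assemble --scan" is the manual workaround described above.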

Since then, I've reinstalled 42.3 a couple of times, experimented with the scripts and looked at the logs.  I tried reverting some of the script changes introduced by the openSUSE-2017-847 update, in particular the changes to /usr/lib/dracut/modules.d/95udev-rules/module-setup.sh, /usr/lib/udev/rules.d/60-persistent-storage.rules and /usr/lib/udev/rules.d/61-persistent-storage.rules.  Rebooting after those changes at least allowed the RAID to be assembled, but nothing associated with the RAID was initialized in /dev.  Here is the output of 'dmesg | grep -i raid':

[ 2.967352] raid6: sse2x1 gen() 4801 MB/s
[ 3.035334] raid6: sse2x1 xor() 4806 MB/s
[ 3.103349] raid6: sse2x2 gen() 8126 MB/s
[ 3.171334] raid6: sse2x2 xor() 8154 MB/s
[ 3.239348] raid6: sse2x4 gen() 8822 MB/s
[ 3.307343] raid6: sse2x4 xor() 3932 MB/s
[ 3.307345] raid6: using algorithm sse2x4 gen() 8822 MB/s
[ 3.307345] raid6: .... xor() 3932 MB/s, rmw enabled
[ 3.307346] raid6: using intx1 recovery algorithm
[ 8.673240] md/raid:md0: device sdc1 operational as raid disk 2
[ 8.673247] md/raid:md0: device sdb1 operational as raid disk 1
[ 8.673250] md/raid:md0: device sda6 operational as raid disk 0
[ 8.673251] md/raid:md0: device sdd1 operational as raid disk 3
[ 8.674295] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2

I was not seeing the last 5 lines prior to making the script changes.  But I'm making these changes blindly since I don't know much about dracut or udev configurations.
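The md/raid lines in the dmesg output above are what show whether the kernel actually started the array. A small Python sketch can pull them out of a saved dmesg capture, assuming the exact message wording quoted in this report; the function name summarize_md_dmesg is hypothetical.

```python
import re

# Kernel md/raid messages as quoted in the report, e.g.
#   "md/raid:md0: device sdc1 operational as raid disk 2"
#   "md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2"
MEMBER = re.compile(r"md/raid:(md[^:\s]+): device (\S+) operational as raid disk (\d+)")
ACTIVE = re.compile(
    r"md/raid:(md[^:\s]+): raid level (\d+) active with (\d+) out of (\d+) devices"
)


def summarize_md_dmesg(lines):
    """Group md/raid dmesg lines per array: members by slot and (level, working, total)."""
    arrays = {}
    for line in lines:
        m = MEMBER.search(line)
        if m:
            entry = arrays.setdefault(m.group(1), {"members": {}, "active": None})
            entry["members"][int(m.group(3))] = m.group(2)
            continue
        m = ACTIVE.search(line)
        if m:
            entry = arrays.setdefault(m.group(1), {"members": {}, "active": None})
            entry["active"] = tuple(int(g) for g in m.group(2, 3, 4))
    return arrays
```

The raid6 benchmark lines do not match either pattern, so an empty result means the kernel never started an array. For the log above, the sketch reports md0 at level 5 with all four members (sda6, sdb1, sdc1, sdd1) in slots 0 to 3.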

Since then, another systemd patch has been released (openSUSE-2017-950), but installing that update hasn't changed the situation.

Initially I reported this problem on the openSUSE Install/Boot/Login Forum.  It was suggested that I submit a bug report, and after another user posted about the same problem, I figured it was time to do so.
Comment 1 Daniel Molkentin 2017-08-30 12:40:28 UTC
Martin, can you take a look?
Comment 2 Martin Wilck 2017-08-30 21:37:50 UTC
For starters, please revert the manual changes to the udev rules and provide a serial console log (or, even better, journalctl -b captured in emergency mode).
Comment 3 Mark Elliott 2017-09-08 20:51:05 UTC
(In reply to Martin Wilck from comment #2)
> For starters, please revert the manual changes to the udev rules and provide a
> serial console log (or, even better, journalctl -b captured in emergency
> mode).

Sorry, I've been on vacation, so I didn't have a chance to generate the journal output until this morning.  After running journalctl -b, I noticed the following line:

Sep 08 09:15:01 linux mdadm[3327]: DeviceDisappeared event detected on md device /dev/md/linux:0

However, I believe the problem has been resolved by the openSUSE-2017-1005 systemd patch.  After capturing the journal output, I applied the latest patches.  When I rebooted, the RAID appeared.  I looked at the list of applied patches and assumed that the latest systemd patch fixed the problem.
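The DeviceDisappeared line quoted above comes from mdadm's monitor. To find such events in a full journalctl -b capture, a short Python filter is enough. This is a sketch assuming the message format shown in the report; the optional ", component device ..." suffix is an assumption about how mdadm reports per-disk events, and the function name mdadm_events is hypothetical.

```python
import re

# Matches mdadm monitor lines such as
#   "mdadm[3327]: DeviceDisappeared event detected on md device /dev/md/linux:0"
# with an optional (assumed) ", component device /dev/sdX" suffix.
EVENT = re.compile(
    r"mdadm\[\d+\]: (\w+) event detected on md device (\S+?)"
    r"(?:, component device (\S+))?$"
)


def mdadm_events(journal_lines):
    """Yield (event, md_device, component_or_None) from journalctl text lines."""
    for line in journal_lines:
        m = EVENT.search(line.rstrip())
        if m:
            yield m.group(1), m.group(2), m.group(3)
```

Running it over the saved journal lists every array event in boot order, which makes it easier to compare a failing boot with a working one.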

I tried to retrace my steps in order to pin down the fix.  I reinstalled openSUSE, captured the journal on first boot, installed all patches except the three related to systemd (847, 950 and 1005), rebooted and captured the journal again.  I tried to apply only patch 847 by deselecting 950 and 1005 from the list of Software Updates in the panel tray.  After I pushed the Install Updates button, I found that all the systemd patches had been applied, and when I rebooted, the RAID was present.  So I used snapper to roll back the patches and tried using YaST Online Update to mark 950 and 1005 as taboo, but again all the patches related to systemd were installed.  I guess installing the most recent patch version regardless of which patches are selected in the summary is a "feature".

I can't verify that openSUSE-2017-1005 fixed the problem, but I'm happy it has been resolved.  I'll be applying the systemd patches to my primary openSUSE 42.3 system.

Thanks
Comment 4 Martin Wilck 2017-09-11 07:36:02 UTC
OK, closing bug. Feel free to reopen if this occurs again.