Bugzilla – Bug 1212462
RAID Resync on every reboot with latest mdadm-4.1-150300.24.27.1 update
Last modified: 2024-01-23 08:54:03 UTC
One of my servers shows the issue that a RAID resync is started on every reboot since the mdadm-4.1-150300.24.27.1 update. Downgrading mdadm to the previous version mdadm-4.1-150300.24.24.2.x86_64 fixes the issue after two reboots.

The changelog of mdadm-4.1-150300.24.27.1 shows these new patches. The bug reports bsc#1205493 and bsc#1205830 are unfortunately not accessible for everyone.

* Mon Apr 24 2023 Coly Li <colyli@suse.de>
- Fixes for mdmon to ensure it runs at the right time in the right mount
  namespace. This fixes various problems with IMSM RAID arrays in 15-SP4
  (bsc#1205493, bsc#1205830)
  - mdmon: fix segfault
    0052-mdmon-fix-segfault.patch
  - util: remove obsolete code from get_md_name
    0053-util-remove-obsolete-code-from-get_md_name.patch
  - mdmon: don't test both 'all' and 'container_name'.
    0054-mdmon-don-t-test-both-all-and-container_name.patch
  - mdmon: change systemd unit file to use --foreground
    0055-mdmon-change-systemd-unit-file-to-use-foreground.patch
  - mdmon: Remove need for KillMode=none
    0056-mdmon-Remove-need-for-KillMode-none.patch
  - mdmon: Improve switchroot interactions.
    0057-mdmon-Improve-switchroot-interactions.patch
  - mdopen: always try create_named_array()
    0058-mdopen-always-try-create_named_array.patch
  - Improvements for IMSM_NO_PLATFORM testing
    0059-Improvements-for-IMSM_NO_PLATFORM-testing.patch

Here is some information about the server. I can probably only do limited testing on this server:

* openSUSE Leap 15.4
* all updates installed
* RAID 5 (fake RAID) configured in the BIOS of a Gigabyte H97-HD3 mainboard

/proc/mdstat after reboot:

mybox:~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md126 : active raid5 sda[2] sdb[1] sdc[0]
      5860528128 blocks super external:/md127/0 level 5, 128k chunk, algorithm 0 [3/3] [UUU]
      [================>....]  resync = 84.6% (2480653388/2930264064) finish=63.6min speed=117661K/sec

md127 : inactive sdc[2](S) sdb[1](S) sda[0](S)
      7560 blocks super external:imsm

unused devices: <none>

mybox:~ # cat /etc/mdadm.conf
DEVICE containers partitions
ARRAY metadata=imsm UUID=aa8b8c23:5079309c:9d892159:6fc14e0b
ARRAY /dev/md/Volume1_0 container=aa8b8c23:5079309c:9d892159:6fc14e0b member=0 UUID=2ad07e84:caabad99:89364b2b:c078ad58

mybox:~ # mdadm --detail --scan
ARRAY /dev/md/imsm0 metadata=imsm UUID=aa8b8c23:5079309c:9d892159:6fc14e0b
ARRAY /dev/md/Volume1 container=/dev/md/imsm0 member=0 UUID=07e8d0c0:507479ce:4328be27:9d7f16a7
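As a quick way to spot the symptom after each reboot, one could grep `/proc/mdstat` for a resync line. A minimal sketch (the helper name `mdstat_resync_state` is mine, not part of mdadm):

```shell
# Report whether an mdstat dump shows a resync in progress.
# Reads /proc/mdstat-style text on stdin; prints the resync progress
# if one is running, or "idle" otherwise.
mdstat_resync_state() {
    grep -o 'resync = [0-9.]*%' || echo idle
}
```

Usage after a reboot would be `mdstat_resync_state < /proc/mdstat`; printing anything other than "idle" right after boot reproduces the reported behavior.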
Do you mean that after the resync completes and you reboot the system, you then see the resync start again?
(In reply to Coly Li from comment #1)
> Do you mean that after the resync completes and you reboot the system, you
> then see the resync start again?

Yes, exactly. Sorry for the late reply.
(In reply to Björn Voigt from comment #2)
> (In reply to Coly Li from comment #1)
> > Do you mean that after the resync completes and you reboot the system,
> > you then see the resync start again?
>
> Yes, exactly. Sorry for the late reply.

Can I ask whether, after upgrading to the latest mdadm package and openSUSE Leap kernel, this issue still happens?

I am not able to tell exactly what happened in the original report. If the latest openSUSE Leap 15.5 kernel and mdadm package don't help, let me try to see what happens...

Many thanks.

Coly Li
(In reply to Coly Li from comment #3)
> Can I ask whether, after upgrading to the latest mdadm package and openSUSE
> Leap kernel, this issue still happens?
>
> I am not able to tell exactly what happened in the original report. If the
> latest openSUSE Leap 15.5 kernel and mdadm package don't help, let me try
> to see what happens...

It happened last at the end of November 2023, after I upgraded this machine from openSUSE Leap 15.4 to 15.5. Unfortunately it is a production machine. Rebuilding the RAID5 array takes around 8 hours and makes the machine unusably slow, so it is difficult to find a good time for testing again with the latest openSUSE Leap 15.5 kernel and mdadm packages.

Do you think that the newest mdadm package will only work reliably with the openSUSE-patched kernels? I use a self-compiled vanilla LTS 6.1.x kernel (currently 6.1.74), mostly for security reasons. The kernel configuration (.config) is a nearly 1:1 copy of the previous openSUSE Tumbleweed 6.1.x kernel.

If I find a good time for testing, please tell me which information you need.

I had the problem that the initrd systemd boot scripts can fail (probably because of a timeout) if mdadm from the initrd starts a RAID5 rebuild. The hint from the initrd on the display to copy a /run/*/*log* file for later analysis is not so easy to fulfill: the RAID5 array is not ready for mounting at this stage, and a VFAT-formatted USB device is not recognized (missing codepage 437 or a similar error).
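For reference, the "missing codepage 437" VFAT error usually points to NLS support that a trimmed self-built kernel configuration may lack. Assuming that is the cause here (an assumption, since only the error message is quoted above), the relevant .config options would be something like:

```
# Kernel .config options typically needed to mount VFAT
# (assumption: the USB mount failure is due to missing NLS support)
CONFIG_VFAT_FS=y
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
```

If these are built as modules (=m) instead, they would also need to be included in the initrd for the rescue-shell copy step to work.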
I wonder if this is also related to: https://bugzilla.suse.com/show_bug.cgi?id=1216381
(In reply to Santiago Zarate from comment #5)
> I wonder if this is also related to:
> https://bugzilla.suse.com/show_bug.cgi?id=1216381

The symptoms of both issues look similar (initrd mount timeouts, MD devices not ready). A difference is that this issue is related to a Gigabyte BIOS RAID5 setup, while https://bugzilla.suse.com/show_bug.cgi?id=1216381 looks related to a KVM virtual host setup.