Bugzilla – Bug 1210443
[Build 90.1] openQA test fails in mdadm: after failing/detaching a disk from a RAID 5 array, the md5sum of a file changes
Last modified: 2024-02-20 08:07:14 UTC
## Observation

openQA test in scenario sle-15-SP5-Online-aarch64-extra_tests_textmode@aarch64 fails in [mdadm](https://openqa.suse.de/tests/10912669/modules/mdadm/steps/8)

## Test suite description

Maintainer: QE Core, asmorodskyi, dheidler. Mainly console extra tests.

## Reproducible

Fails since (at least) Build [72.4](https://openqa.suse.de/tests/10528743)

## Expected result

Last good: [66.1](https://openqa.suse.de/tests/10339774) (or more recent)

## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=aarch64&distri=sle&flavor=Online&machine=aarch64&test=extra_tests_textmode&version=15-SP5)
Hello Coly,

Here comes the log:

Test 3: RAID 5
--------------
Creating disk image 1 of size 512MiB ...
# fallocate -l 536870912 disk1.img
Creating disk image 2 of size 512MiB ...
# fallocate -l 536870912 disk2.img
Creating disk image 3 of size 512MiB ...
# fallocate -l 536870912 disk3.img
Done!
# losetup /dev/loop41 disk1.img
# losetup /dev/loop42 disk2.img
# losetup /dev/loop43 disk3.img
# losetup -l
NAME        SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE                           DIO LOG-SEC
/dev/loop43         0      0         0  0 /var/tmp/mdadm_test/30307/disk3.img   0     512
/dev/loop41         0      0         0  0 /var/tmp/mdadm_test/30307/disk1.img   0     512
/dev/loop42         0      0         0  0 /var/tmp/mdadm_test/30307/disk2.img   0     512
# mdadm --create --verbose /dev/md1054 --level=5 --raid-devices=3 --size=522240 /dev/loop41 /dev/loop42 /dev/loop43
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 512K
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1054 started.
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md1054 : active raid5 loop43[3] loop42[1] loop41[0]
      1044480 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [=>...................]  recovery =  5.2% (27924/522240) finish=0.2min speed=27924K/sec
unused devices: <none>
Waiting for raid sync ...
Waiting for raid sync ...
Waiting for raid sync ...
Waiting for raid sync ...
      1044480 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md1054 : active raid5 loop43[3] loop42[1] loop41[0]
      1044480 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
# fdisk -l /dev/md1054
Disk /dev/md1054: 1020 MiB, 1069547520 bytes, 2088960 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes
# mkfs.ext4 /dev/md1054
mke2fs 1.46.4 (18-Aug-2021)
Creating filesystem with 261120 4k blocks and 65280 inodes
Filesystem UUID: c9b95669-25e2-4705-834d-986a9b65c644
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376
Allocating group tables: 0/8 done
Writing inode tables: 0/8 done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: 0/8 done
# mount /dev/md1054 /var/tmp/mdadm_test/30307/mnt
# dd if=/dev/urandom of=random_data.raw bs=100M count=1
1+0 records in
1+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 1.44299 s, 72.7 MB/s
01636b67bac6ddf273917ff409229258  random_data.raw
Copying random file 1 ...
# cp random_data.raw /var/tmp/mdadm_test/30307/mnt/random_1.raw
Copying random file 2 ...
# cp random_data.raw /var/tmp/mdadm_test/30307/mnt/random_2.raw
Copying random file 3 ...
# cp random_data.raw /var/tmp/mdadm_test/30307/mnt/random_3.raw
Copying random file 4 ...
# cp random_data.raw /var/tmp/mdadm_test/30307/mnt/random_4.raw
# md5sum /var/tmp/mdadm_test/30307/mnt/random_1.raw
01636b67bac6ddf273917ff409229258  /var/tmp/mdadm_test/30307/mnt/random_1.raw
# md5sum /var/tmp/mdadm_test/30307/mnt/random_2.raw
01636b67bac6ddf273917ff409229258  /var/tmp/mdadm_test/30307/mnt/random_2.raw
# md5sum /var/tmp/mdadm_test/30307/mnt/random_3.raw
01636b67bac6ddf273917ff409229258  /var/tmp/mdadm_test/30307/mnt/random_3.raw
# md5sum /var/tmp/mdadm_test/30307/mnt/random_4.raw
01636b67bac6ddf273917ff409229258  /var/tmp/mdadm_test/30307/mnt/random_4.raw
# mdadm /dev/md1054 --fail /dev/loop41
mdadm: set /dev/loop41 faulty in /dev/md1054
             State : active, degraded
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md1054 : active raid5 loop43[3] loop42[1] loop41[0](F)
      1044480 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
unused devices: <none>
# md5sum /var/tmp/mdadm_test/30307/mnt/random_1.raw
01636b67bac6ddf273917ff409229258  /var/tmp/mdadm_test/30307/mnt/random_1.raw
# md5sum /var/tmp/mdadm_test/30307/mnt/random_2.raw
01636b67bac6ddf273917ff409229258  /var/tmp/mdadm_test/30307/mnt/random_2.raw
# md5sum /var/tmp/mdadm_test/30307/mnt/random_3.raw
81d4b907b07858901d3e446492ccc0cb  /var/tmp/mdadm_test/30307/mnt/random_3.raw
Expected pattern "01636b67bac6ddf273917ff409229258" not found!

So far, the test can be reproduced easily on the aarch64 platform. Please refer to https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/data/qam/mdadm.sh for the test script.
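For anyone retrying this outside openQA, the logged steps can be condensed into a standalone sketch. This is my own condensation, not the linked mdadm.sh: loop devices are allocated dynamically instead of being hard-coded, and the run is gated behind an environment variable (`RUN_MDADM_REPRO` is my own guard) because it needs root and creates a real md array.

```shell
#!/bin/sh
# Condensed sketch of the logged reproduction steps (not the original
# mdadm.sh). Requires root plus fallocate, losetup, mdadm, mkfs.ext4.
set -u

repro() {
    work=$(mktemp -d) && cd "$work" || return 1

    # Three 512 MiB backing files on dynamically allocated loop devices.
    for i in 1 2 3; do
        fallocate -l 536870912 "disk$i.img"
    done
    loop1=$(losetup --find --show disk1.img)
    loop2=$(losetup --find --show disk2.img)
    loop3=$(losetup --find --show disk3.img)

    mdadm --create --verbose /dev/md1054 --level=5 --raid-devices=3 \
          --size=522240 "$loop1" "$loop2" "$loop3"

    # Wait for the initial resync to finish before writing any data.
    while grep -q recovery /proc/mdstat; do sleep 2; done

    mkfs.ext4 /dev/md1054
    mkdir mnt && mount /dev/md1054 mnt

    dd if=/dev/urandom of=random_data.raw bs=100M count=1
    sum=$(md5sum random_data.raw | cut -d' ' -f1)
    for i in 1 2 3 4; do cp random_data.raw "mnt/random_$i.raw"; done

    # Fail one member, then re-check the copies against the known sum.
    mdadm /dev/md1054 --fail "$loop1"
    for i in 1 2 3 4; do
        now=$(md5sum "mnt/random_$i.raw" | cut -d' ' -f1)
        [ "$now" = "$sum" ] || echo "MISMATCH on random_$i.raw: $now"
    done
}

if [ "${RUN_MDADM_REPRO:-0}" = 1 ]; then
    repro
else
    echo "Set RUN_MDADM_REPRO=1 to run (requires root and mdadm)."
fi
```

On the failing builds, the expectation is that the last loop prints a MISMATCH line for at least one copy after the `--fail`, matching the log above.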
Thanks for the detailed information. This is not a good thing; let me look at it after finishing my current bug.
(In reply to Coly Li from comment #2)
> Thanks for the detailed information. This is not a good thing; let me look
> at it after finishing my current bug.

It is not reproduced with 5.14.21-150500.46-default on an arm64 machine. I will do more testing later.

BTW, what is the kernel version or ISO build for this testing?
(In reply to Coly Li from comment #3)
> BTW, what is the kernel version or ISO build for this testing?

Maybe this info is somewhere in extra_tests_textmode, but IMHO the easiest way is to look at any LTP test in the kernel validation group [1].

* SLE-15-SP5-Online-aarch64-Build90.1-Media1.iso has 5.14.21-150500.48-default (50e397b) (looking at the kernel test [2]), from git branch SLE15-SP5-GA [3]
* SLE-15-SP5-Online-aarch64-Build93.1-Media1.iso has 5.14.21-150500.49-default (892448f) [4], obviously from the same git branch [5]

[1] https://openqa.suse.de/group_overview/116
[2] https://openqa.suse.de/tests/10905035#step/boot_ltp/62
[3] https://openqa.suse.de/tests/10905035#step/boot_ltp/80
[4] https://openqa.suse.de/tests/10925920#step/boot_ltp/62
[5] https://openqa.suse.de/tests/10925920#step/boot_ltp/80
(In reply to Petr Vorel from comment #4)
> (In reply to Coly Li from comment #3)
> > BTW, what is the kernel version or ISO build for this testing?
>
> Maybe this info is somewhere in extra_tests_textmode, but IMHO the easiest
> way is to look at any LTP test in the kernel validation group [1].

This is easier: the "Logs & Assets" section [2] of every test has a serial0.txt file, which contains the dmesg output with the obvious "Linux version ..." line: x86_64 [3], s390x [4], ppc64le [5], aarch64 [6].

> [1] https://openqa.suse.de/group_overview/116

[2] https://openqa.suse.de/tests/10904692#downloads
[3] https://openqa.suse.de/tests/10904692/file/serial0.txt
[4] https://openqa.suse.de/tests/10906464/file/serial0.txt
[5] https://openqa.suse.de/tests/10903819/file/serial0.txt
[6] https://openqa.suse.de/tests/10912669/file/serial0.txt
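For example, assuming serial0.txt has already been downloaded, the kernel version can be pulled out with a single grep. The sample log line below is fabricated for illustration, not copied from the linked files:

```shell
# Pull the kernel version out of a serial0.txt log. The here-document
# stands in for a real downloaded file; the timestamp and builder name
# are made up.
cat > serial0.txt <<'EOF'
[    0.000000] Linux version 5.14.21-150500.48-default (geeko@buildhost)
EOF

grep -o 'Linux version [^ ]*' serial0.txt
# prints: Linux version 5.14.21-150500.48-default
```

Against a real test, one would first fetch the log (e.g. from the [6] URL above) and then run the same grep on it.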
(In reply to Richard Fan from comment #1)
> So far, the test can be reproduced easily on the aarch64 platform. Please
> refer to
> https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/data/qam/mdadm.sh
> for the test script.

Hi Richard,

I ran the mdadm.sh script on my aarch64 machine (Apple M1 chip) for 1.5 days, which I guess should be 1000+ runs. No such md5sum issue was reported.

For this problem, does it only happen on one virtual machine, or does it also happen on other aarch64 virtual machines or real hardware?

Currently I don't have a clue, so I have to consider whether it is a hardware-related one.

Thanks.

Coly Li
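A soak run like the one described can be sketched as a simple retry loop that stops at the first failure, so the broken state stays in place for inspection. `MDADM_SH` and `RUNS` are placeholders of my own; the path assumes a local copy of the script linked in comment #1:

```shell
#!/bin/sh
# Run mdadm.sh repeatedly and stop at the first non-zero exit so the
# failed state can be inspected. MDADM_SH is a placeholder path to a
# local copy of the linked test script.
MDADM_SH=${MDADM_SH:-./mdadm.sh}
RUNS=${RUNS:-1000}

if [ ! -f "$MDADM_SH" ]; then
    echo "No script at $MDADM_SH; fetch mdadm.sh first."
else
    i=0
    while [ "$i" -lt "$RUNS" ]; do
        i=$((i + 1))
        if ! sh "$MDADM_SH"; then
            echo "Run $i failed; stopping here for inspection."
            break
        fi
    done
    echo "Finished after $i run(s)."
fi
```

A run that survives all iterations (as on the M1 machine above) ends with the "Finished" line and no failure message.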
We can accept this fix, but please submit it to the upstream first ;) as the last SR is still on hold because of that.
(In reply to Radoslav Tzvetkov from comment #7)
> We can accept this fix, but please submit it to the upstream first ;) as the
> last SR is still on hold because of that.

Hmm, I don't understand this comment. I just described that the problem was not reproduced; there is no fix yet.
(In reply to Coly Li from comment #6)
> (In reply to Richard Fan from comment #1)
> > So far, the test can be reproduced easily on the aarch64 platform. Please
> > refer to
> > https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/data/qam/mdadm.sh
> > for the test script.
>
> I ran the mdadm.sh script on my aarch64 machine (Apple M1 chip) for 1.5
> days, which I guess should be 1000+ runs. No such md5sum issue was reported.
>
> For this problem, does it only happen on one virtual machine, or does it
> also happen on other aarch64 virtual machines or real hardware?

The issue has been found on a virtual machine on the aarch64 platform so far; I didn't test it on bare metal. Can you please provide a debug kernel or mdadm package so I can reproduce the issue on my setup? Then I can collect the logs for your investigation.
Is it possible that https://bugzilla.suse.com/show_bug.cgi?id=1216381 is a result of this, although they are different test modules?
(In reply to Coly Li from comment #8)
> (In reply to Radoslav Tzvetkov from comment #7)
> > We can accept this fix, but please submit it to the upstream first ;) as
> > the last SR is still on hold because of that.
>
> Hmm, I don't understand this comment. I just described that the problem was
> not reproduced; there is no fix yet.

Hi Coly,

We have this one failing a bit more often [1]. If you let us know what other information we can provide for you to debug, we can make openQA collect it for us. In any case, we can also get the machine to fail and leave it running for you so you can debug live.

[1]: https://openqa.suse.de/tests/overview?arch=&flavor=&machine=&test=&modules=mdadm&module_re=&modules_result=failed&group_glob=&not_group_glob=&comment=&distri=sle&build=mdadm&version=15-SP5
(In reply to Santiago Zarate from comment #11)
> Hi Coly,
>
> We have this one failing a bit more often [1]. If you let us know what other
> information we can provide for you to debug, we can make openQA collect it
> for us. In any case, we can also get the machine to fail and leave it
> running for you so you can debug live.
>
> [1]: https://openqa.suse.de/tests/overview?arch=&flavor=&machine=&test=&modules=mdadm&module_re=&modules_result=failed&group_glob=&not_group_glob=&comment=&distri=sle&build=mdadm&version=15-SP5

Sorry, this comment was meant for https://bugzilla.suse.com/show_bug.cgi?id=1219073. Still, Coly, it would be good for you to give this a look if possible.
SUSE-SU-2024:0469-1: An update that solves 19 vulnerabilities, contains eight features and has 41 security fixes can now be installed.

Category: security (important)

Bug References: 1065729, 1108281, 1141539, 1174649, 1181674, 1193285, 1194869, 1209834, 1210443, 1211515, 1212091, 1214377, 1215275, 1215885, 1216441, 1216559, 1216702, 1217895, 1217987, 1217988, 1217989, 1218005, 1218447, 1218527, 1218659, 1218713, 1218723, 1218730, 1218738, 1218752, 1218757, 1218768, 1218778, 1218779, 1218804, 1218832, 1218836, 1218916, 1218948, 1218958, 1218968, 1218997, 1219006, 1219012, 1219013, 1219014, 1219053, 1219067, 1219120, 1219128, 1219136, 1219285, 1219349, 1219412, 1219429, 1219434, 1219490, 1219512, 1219568, 1219582

CVE References: CVE-2021-33631, CVE-2023-46838, CVE-2023-47233, CVE-2023-4921, CVE-2023-51042, CVE-2023-51043, CVE-2023-51780, CVE-2023-51782, CVE-2023-6040, CVE-2023-6356, CVE-2023-6531, CVE-2023-6535, CVE-2023-6536, CVE-2023-6915, CVE-2024-0565, CVE-2024-0641, CVE-2024-0775, CVE-2024-1085, CVE-2024-1086

Jira References: PED-4729, PED-6694, PED-7322, PED-7615, PED-7616, PED-7620, PED-7622, PED-7623

Sources used:
openSUSE Leap 15.5 (src): kernel-livepatch-SLE15-SP5-RT_Update_10-1-150500.11.5.1, kernel-source-rt-5.14.21-150500.13.35.1, kernel-syms-rt-5.14.21-150500.13.35.1
SUSE Linux Enterprise Live Patching 15-SP5 (src): kernel-livepatch-SLE15-SP5-RT_Update_10-1-150500.11.5.1
SUSE Real Time Module 15-SP5 (src): kernel-source-rt-5.14.21-150500.13.35.1, kernel-syms-rt-5.14.21-150500.13.35.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions, please reach out to maintenance coordination.
SUSE-SU-2024:0516-1: An update that solves 21 vulnerabilities, contains nine features and has 40 security fixes can now be installed.

Category: security (important)

Bug References: 1065729, 1108281, 1141539, 1174649, 1181674, 1193285, 1194869, 1209834, 1210443, 1211515, 1212091, 1214377, 1215275, 1215885, 1216441, 1216559, 1216702, 1217895, 1217987, 1217988, 1217989, 1218005, 1218447, 1218527, 1218659, 1218689, 1218713, 1218723, 1218730, 1218752, 1218757, 1218768, 1218778, 1218779, 1218804, 1218832, 1218836, 1218916, 1218948, 1218958, 1218968, 1218997, 1219006, 1219012, 1219013, 1219014, 1219053, 1219067, 1219120, 1219128, 1219136, 1219285, 1219349, 1219412, 1219429, 1219434, 1219490, 1219512, 1219568, 1219582, 1219608

CVE References: CVE-2021-33631, CVE-2023-46838, CVE-2023-47233, CVE-2023-4921, CVE-2023-51042, CVE-2023-51043, CVE-2023-51780, CVE-2023-51782, CVE-2023-6040, CVE-2023-6356, CVE-2023-6531, CVE-2023-6535, CVE-2023-6536, CVE-2023-6915, CVE-2024-0340, CVE-2024-0565, CVE-2024-0641, CVE-2024-0775, CVE-2024-1085, CVE-2024-1086, CVE-2024-24860

Jira References: PED-4729, PED-6694, PED-7322, PED-7615, PED-7616, PED-7618, PED-7620, PED-7622, PED-7623

Sources used:
openSUSE Leap 15.5 (src): kernel-livepatch-SLE15-SP5_Update_10-1-150500.11.5.1, kernel-source-5.14.21-150500.55.49.1, kernel-default-base-5.14.21-150500.55.49.1.150500.6.21.2, kernel-obs-build-5.14.21-150500.55.49.1, kernel-syms-5.14.21-150500.55.49.1, kernel-obs-qa-5.14.21-150500.55.49.1
SUSE Linux Enterprise Micro 5.5 (src): kernel-default-base-5.14.21-150500.55.49.1.150500.6.21.2
Basesystem Module 15-SP5 (src): kernel-source-5.14.21-150500.55.49.1, kernel-default-base-5.14.21-150500.55.49.1.150500.6.21.2
Development Tools Module 15-SP5 (src): kernel-obs-build-5.14.21-150500.55.49.1, kernel-source-5.14.21-150500.55.49.1, kernel-syms-5.14.21-150500.55.49.1
SUSE Linux Enterprise Live Patching 15-SP5 (src): kernel-livepatch-SLE15-SP5_Update_10-1-150500.11.5.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions, please reach out to maintenance coordination.
SUSE-SU-2024:0514-1: An update that solves 21 vulnerabilities, contains nine features and has 41 security fixes can now be installed.

Category: security (important)

Bug References: 1065729, 1108281, 1141539, 1174649, 1181674, 1193285, 1194869, 1209834, 1210443, 1211515, 1212091, 1214377, 1215275, 1215885, 1216441, 1216559, 1216702, 1217895, 1217987, 1217988, 1217989, 1218005, 1218447, 1218527, 1218659, 1218689, 1218713, 1218723, 1218730, 1218738, 1218752, 1218757, 1218768, 1218778, 1218779, 1218804, 1218832, 1218836, 1218916, 1218948, 1218958, 1218968, 1218997, 1219006, 1219012, 1219013, 1219014, 1219053, 1219067, 1219120, 1219128, 1219136, 1219285, 1219349, 1219412, 1219429, 1219434, 1219490, 1219512, 1219568, 1219582, 1219608

CVE References: CVE-2021-33631, CVE-2023-46838, CVE-2023-47233, CVE-2023-4921, CVE-2023-51042, CVE-2023-51043, CVE-2023-51780, CVE-2023-51782, CVE-2023-6040, CVE-2023-6356, CVE-2023-6531, CVE-2023-6535, CVE-2023-6536, CVE-2023-6915, CVE-2024-0340, CVE-2024-0565, CVE-2024-0641, CVE-2024-0775, CVE-2024-1085, CVE-2024-1086, CVE-2024-24860

Jira References: PED-4729, PED-6694, PED-7322, PED-7615, PED-7616, PED-7618, PED-7620, PED-7622, PED-7623

Sources used:
openSUSE Leap 15.5 (src): kernel-source-azure-5.14.21-150500.33.34.1, kernel-syms-azure-5.14.21-150500.33.34.1
Public Cloud Module 15-SP5 (src): kernel-source-azure-5.14.21-150500.33.34.1, kernel-syms-azure-5.14.21-150500.33.34.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions, please reach out to maintenance coordination.
Closing as fixed.