Bug 1210443 - [Build 90.1] openQA test fails in mdadm, after fail/detach a disk from raid 5, the md5sum of the file gets changed
Summary: [Build 90.1] openQA test fails in mdadm, after fail/detach a disk from raid 5...
Status: RESOLVED FIXED
Alias: None
Product: PUBLIC SUSE Linux Enterprise Server 15 SP5
Classification: openSUSE
Component: Basesystem
Version: unspecified
Hardware: aarch64 SLES 15
Priority: P2 - High    Severity: Normal
Target Milestone: ---
Assignee: Coly Li
QA Contact:
URL: https://openqa.suse.de/tests/10912669...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-04-14 02:27 UTC by Richard Fan
Modified: 2024-02-20 08:07 UTC (History)
5 users

See Also:
Found By: openQA
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---


Description Richard Fan 2023-04-14 02:27:45 UTC
## Observation

openQA test in scenario sle-15-SP5-Online-aarch64-extra_tests_textmode@aarch64 fails in
[mdadm](https://openqa.suse.de/tests/10912669/modules/mdadm/steps/8)

## Test suite description
Maintainer: QE Core, asmorodskyi, dheidler. Mainly console extra tests.


## Reproducible

Fails since (at least) Build [72.4](https://openqa.suse.de/tests/10528743)


## Expected result

Last good: [66.1](https://openqa.suse.de/tests/10339774) (or more recent)


## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=aarch64&distri=sle&flavor=Online&machine=aarch64&test=extra_tests_textmode&version=15-SP5)
Comment 1 Richard Fan 2023-04-14 02:34:17 UTC
Hello Coly,

Here is the log:

Test 3: RAID 5
--------------

Creating disk image 1 of size 512MiB ...
# fallocate -l 536870912 disk1.img
Creating disk image 2 of size 512MiB ...
# fallocate -l 536870912 disk2.img
Creating disk image 3 of size 512MiB ...
# fallocate -l 536870912 disk3.img
Done!
# losetup /dev/loop41 disk1.img
# losetup /dev/loop42 disk2.img
# losetup /dev/loop43 disk3.img
# losetup -l
NAME        SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE                           DIO LOG-SEC
/dev/loop43         0      0         0  0 /var/tmp/mdadm_test/30307/disk3.img   0     512
/dev/loop41         0      0         0  0 /var/tmp/mdadm_test/30307/disk1.img   0     512
/dev/loop42         0      0         0  0 /var/tmp/mdadm_test/30307/disk2.img   0     512
# mdadm --create --verbose /dev/md1054 --level=5 --raid-devices=3 --size=522240 /dev/loop41 /dev/loop42 /dev/loop43
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 512K
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1054 started.
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] 
md1054 : active raid5 loop43[3] loop42[1] loop41[0]
      1044480 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [=>...................]  recovery =  5.2% (27924/522240) finish=0.2min speed=27924K/sec
      
unused devices: <none>
Waiting for raid sync ...
Waiting for raid sync ...
Waiting for raid sync ...
Waiting for raid sync ...
      1044480 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] 
md1054 : active raid5 loop43[3] loop42[1] loop41[0]
      1044480 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      
unused devices: <none>
# fdisk -l /dev/md1054
Disk /dev/md1054: 1020 MiB, 1069547520 bytes, 2088960 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes
# mkfs.ext4 /dev/md1054
mke2fs 1.46.4 (18-Aug-2021)
Creating filesystem with 261120 4k blocks and 65280 inodes
Filesystem UUID: c9b95669-25e2-4705-834d-986a9b65c644
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376

Allocating group tables: 0/8   done                            
Writing inode tables: 0/8   done                            
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: 0/8   done
# mount /dev/md1054 /var/tmp/mdadm_test/30307/mnt
# dd if=/dev/urandom of=random_data.raw bs=100M count=1
1+0 records in
1+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 1.44299 s, 72.7 MB/s
01636b67bac6ddf273917ff409229258  random_data.raw
Copying random file 1 ...
# cp random_data.raw /var/tmp/mdadm_test/30307/mnt/random_1.raw
Copying random file 2 ...
# cp random_data.raw /var/tmp/mdadm_test/30307/mnt/random_2.raw
Copying random file 3 ...
# cp random_data.raw /var/tmp/mdadm_test/30307/mnt/random_3.raw
Copying random file 4 ...
# cp random_data.raw /var/tmp/mdadm_test/30307/mnt/random_4.raw
# md5sum /var/tmp/mdadm_test/30307/mnt/random_1.raw
01636b67bac6ddf273917ff409229258  /var/tmp/mdadm_test/30307/mnt/random_1.raw
# md5sum /var/tmp/mdadm_test/30307/mnt/random_2.raw
01636b67bac6ddf273917ff409229258  /var/tmp/mdadm_test/30307/mnt/random_2.raw
# md5sum /var/tmp/mdadm_test/30307/mnt/random_3.raw
01636b67bac6ddf273917ff409229258  /var/tmp/mdadm_test/30307/mnt/random_3.raw
# md5sum /var/tmp/mdadm_test/30307/mnt/random_4.raw
01636b67bac6ddf273917ff409229258  /var/tmp/mdadm_test/30307/mnt/random_4.raw
# mdadm /dev/md1054 --fail /dev/loop41
mdadm: set /dev/loop41 faulty in /dev/md1054
             State : active, degraded 
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] 
md1054 : active raid5 loop43[3] loop42[1] loop41[0](F)
      1044480 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
      
unused devices: <none>
# md5sum /var/tmp/mdadm_test/30307/mnt/random_1.raw
01636b67bac6ddf273917ff409229258  /var/tmp/mdadm_test/30307/mnt/random_1.raw
# md5sum /var/tmp/mdadm_test/30307/mnt/random_2.raw
01636b67bac6ddf273917ff409229258  /var/tmp/mdadm_test/30307/mnt/random_2.raw
# md5sum /var/tmp/mdadm_test/30307/mnt/random_3.raw
81d4b907b07858901d3e446492ccc0cb  /var/tmp/mdadm_test/30307/mnt/random_3.raw
Expected pattern "01636b67bac6ddf273917ff409229258" not found!

So far, the test can be reproduced easily on the aarch64 platform. Please refer to
https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/data/qam/mdadm.sh
for the test script.
Comment 2 Coly Li 2023-04-15 14:47:58 UTC
Thanks for the detailed information. This is not a good thing; let me look into it after I finish my current bug.
Comment 3 Coly Li 2023-04-18 16:02:16 UTC
(In reply to Coly Li from comment #2)
> Thanks for the detailed information. This is not a good thing; let me look
> into it after I finish my current bug.

It was not reproduced with 5.14.21-150500.46-default on an arm64 machine. I will do more testing later.

BTW, what kernel version or ISO build was used for this testing?
Comment 4 Petr Vorel 2023-04-18 16:42:16 UTC
(In reply to Coly Li from comment #3)
> BTW, what is the kernel version or ISO build for this testing?

Maybe this info is somewhere in extra_tests_textmode, but IMHO the easiest way is to look at any LTP test in the kernel validation group [1].

* SLE-15-SP5-Online-aarch64-Build90.1-Media1.iso has 5.14.21-150500.48-default (50e397b) (looking at kernel test [2]), from git branch: SLE15-SP5-GA [3]

* SLE-15-SP5-Online-aarch64-Build93.1-Media1.iso has 5.14.21-150500.49-default (892448f) [4], obviously from the same git branch [5]

[1] https://openqa.suse.de/group_overview/116
[2] https://openqa.suse.de/tests/10905035#step/boot_ltp/62
[3] https://openqa.suse.de/tests/10905035#step/boot_ltp/80
[4] https://openqa.suse.de/tests/10925920#step/boot_ltp/62
[5] https://openqa.suse.de/tests/10925920#step/boot_ltp/80
Comment 5 Petr Vorel 2023-04-18 16:51:09 UTC
(In reply to Petr Vorel from comment #4)
> (In reply to Coly Li from comment #3)
> > BTW, what is the kernel version or ISO build for this testing?
> 
> Maybe it's this info somewhere in extra_tests_textmode, but IMHO the easiest
> way is to look at any LTP test in kernel validation group [1].

This is easier: all tests have a serial0.txt file in their "Logs & Assets" section [2], which contains dmesg output with the obvious "Linux version ..." line: x86_64 [3], s390x [4], ppc64le [5], aarch64 [6]

> [1] https://openqa.suse.de/group_overview/116
[2] https://openqa.suse.de/tests/10904692#downloads
[3] https://openqa.suse.de/tests/10904692/file/serial0.txt
[4] https://openqa.suse.de/tests/10906464/file/serial0.txt
[5] https://openqa.suse.de/tests/10903819/file/serial0.txt
[6] https://openqa.suse.de/tests/10912669/file/serial0.txt
Comment 6 Coly Li 2023-04-26 14:14:23 UTC
(In reply to Richard Fan from comment #1)


> So far, the test can be reproduced easily on the aarch64 platform. Please
> refer to
> https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/data/qam/mdadm.sh
> for the test script.

Hi Richard,

I ran the mdadm.sh script on my aarch64 machine (Apple M1 chip) for 1.5 days, which should be 1000+ runs. No such md5sum issue was reported.

For this problem, does it only happen on one virtual machine, or does it also happen on other aarch64 virtual machines or real hardware?

Currently I don't have a clue, so I have to consider whether it is hardware related.

Thanks.

Coly Li
Comment 7 Radoslav Tzvetkov 2023-04-26 14:33:33 UTC
We can accept this fix, but please submit it upstream first ;) as the last SR is still on hold because of that.
Comment 8 Coly Li 2023-04-26 15:47:43 UTC
(In reply to Radoslav Tzvetkov from comment #7)
> We can accept this fix, but please submit it upstream first ;) as the last
> SR is still on hold because of that.

Hmm, I don't understand the comment. I just described that the problem was not reproduced; there is no fix yet.
Comment 9 Richard Fan 2023-04-27 06:42:27 UTC
(In reply to Coly Li from comment #6)
> (In reply to Richard Fan from comment #1)
> 
> 
> > So far, the test can be reproduced easily on the aarch64 platform. Please
> > refer to
> > https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/data/qam/mdadm.sh
> > for the test script.
> 
> Hi Richard,
> 
> I ran the mdadm.sh script on my aarch64 machine (Apple M1 chip) for 1.5
> days, which should be 1000+ runs. No such md5sum issue was reported.
> 
> For this problem, does it only happen on one virtual machine, or does it
> also happen on other aarch64 virtual machines or real hardware?
> 
> Currently I don't have a clue, so I have to consider whether it is
> hardware related.
> 
> Thanks.
> 
> Coly Li

So far, the issue has only been found on virtual machines on the aarch64 platform.
I didn't test it on bare metal.

Can you please provide a debug kernel or mdadm package so I can reproduce the issue on my setup? Then I can collect the logs for your investigation.
Comment 10 Santiago Zarate 2023-10-18 14:42:59 UTC
Is it possible that https://bugzilla.suse.com/show_bug.cgi?id=1216381 is a result of this, although they are different test modules?
Comment 11 Santiago Zarate 2024-01-24 14:09:24 UTC
(In reply to Coly Li from comment #8)
> (In reply to Radoslav Tzvetkov from comment #7)
> > We can accept this fix, but please submit it upstream first ;) as the
> > last SR is still on hold because of that.
> 
> Hmm, I don't understand the comment. I just described that the problem was
> not reproduced; there is no fix yet.

Hi Coly, we have this one failing a bit more often [1]. If you let me know what other information we can provide for debugging, we can make openQA collect it for us. In any case, we can also get the machine to fail and leave it running so you can debug live.


[1]: https://openqa.suse.de/tests/overview?arch=&flavor=&machine=&test=&modules=mdadm&module_re=&modules_result=failed&group_glob=&not_group_glob=&comment=&distri=sle&build=mdadm&version=15-SP5
Comment 12 Santiago Zarate 2024-01-24 14:41:06 UTC
(In reply to Santiago Zarate from comment #11)
> (In reply to Coly Li from comment #8)
> > (In reply to Radoslav Tzvetkov from comment #7)
> > > We can accept this fix, but please submit it upstream first ;) as the
> > > last SR is still on hold because of that.
> > 
> > Hmm, I don't understand the comment. I just described that the problem
> > was not reproduced; there is no fix yet.
> 
> Hi Coly, we have this one failing a bit more often [1]. If you let me know
> what other information we can provide for debugging, we can make openQA
> collect it for us. In any case, we can also get the machine to fail and
> leave it running so you can debug live.
> 
> 
> [1]: https://openqa.suse.de/tests/overview?arch=&flavor=&machine=&test=&modules=mdadm&module_re=&modules_result=failed&group_glob=&not_group_glob=&comment=&distri=sle&build=mdadm&version=15-SP5

Sorry, this comment was meant for https://bugzilla.suse.com/show_bug.cgi?id=1219073. Still, Coly, it would be good if you could take a look when possible.
Comment 24 Maintenance Automation 2024-02-14 16:30:03 UTC
SUSE-SU-2024:0469-1: An update that solves 19 vulnerabilities, contains eight features and has 41 security fixes can now be installed.

Category: security (important)
Bug References: 1065729, 1108281, 1141539, 1174649, 1181674, 1193285, 1194869, 1209834, 1210443, 1211515, 1212091, 1214377, 1215275, 1215885, 1216441, 1216559, 1216702, 1217895, 1217987, 1217988, 1217989, 1218005, 1218447, 1218527, 1218659, 1218713, 1218723, 1218730, 1218738, 1218752, 1218757, 1218768, 1218778, 1218779, 1218804, 1218832, 1218836, 1218916, 1218948, 1218958, 1218968, 1218997, 1219006, 1219012, 1219013, 1219014, 1219053, 1219067, 1219120, 1219128, 1219136, 1219285, 1219349, 1219412, 1219429, 1219434, 1219490, 1219512, 1219568, 1219582
CVE References: CVE-2021-33631, CVE-2023-46838, CVE-2023-47233, CVE-2023-4921, CVE-2023-51042, CVE-2023-51043, CVE-2023-51780, CVE-2023-51782, CVE-2023-6040, CVE-2023-6356, CVE-2023-6531, CVE-2023-6535, CVE-2023-6536, CVE-2023-6915, CVE-2024-0565, CVE-2024-0641, CVE-2024-0775, CVE-2024-1085, CVE-2024-1086
Jira References: PED-4729, PED-6694, PED-7322, PED-7615, PED-7616, PED-7620, PED-7622, PED-7623
Sources used:
openSUSE Leap 15.5 (src): kernel-livepatch-SLE15-SP5-RT_Update_10-1-150500.11.5.1, kernel-source-rt-5.14.21-150500.13.35.1, kernel-syms-rt-5.14.21-150500.13.35.1
SUSE Linux Enterprise Live Patching 15-SP5 (src): kernel-livepatch-SLE15-SP5-RT_Update_10-1-150500.11.5.1
SUSE Real Time Module 15-SP5 (src): kernel-source-rt-5.14.21-150500.13.35.1, kernel-syms-rt-5.14.21-150500.13.35.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 25 Maintenance Automation 2024-02-15 16:30:11 UTC
SUSE-SU-2024:0516-1: An update that solves 21 vulnerabilities, contains nine features and has 40 security fixes can now be installed.

Category: security (important)
Bug References: 1065729, 1108281, 1141539, 1174649, 1181674, 1193285, 1194869, 1209834, 1210443, 1211515, 1212091, 1214377, 1215275, 1215885, 1216441, 1216559, 1216702, 1217895, 1217987, 1217988, 1217989, 1218005, 1218447, 1218527, 1218659, 1218689, 1218713, 1218723, 1218730, 1218752, 1218757, 1218768, 1218778, 1218779, 1218804, 1218832, 1218836, 1218916, 1218948, 1218958, 1218968, 1218997, 1219006, 1219012, 1219013, 1219014, 1219053, 1219067, 1219120, 1219128, 1219136, 1219285, 1219349, 1219412, 1219429, 1219434, 1219490, 1219512, 1219568, 1219582, 1219608
CVE References: CVE-2021-33631, CVE-2023-46838, CVE-2023-47233, CVE-2023-4921, CVE-2023-51042, CVE-2023-51043, CVE-2023-51780, CVE-2023-51782, CVE-2023-6040, CVE-2023-6356, CVE-2023-6531, CVE-2023-6535, CVE-2023-6536, CVE-2023-6915, CVE-2024-0340, CVE-2024-0565, CVE-2024-0641, CVE-2024-0775, CVE-2024-1085, CVE-2024-1086, CVE-2024-24860
Jira References: PED-4729, PED-6694, PED-7322, PED-7615, PED-7616, PED-7618, PED-7620, PED-7622, PED-7623
Sources used:
openSUSE Leap 15.5 (src): kernel-livepatch-SLE15-SP5_Update_10-1-150500.11.5.1, kernel-source-5.14.21-150500.55.49.1, kernel-default-base-5.14.21-150500.55.49.1.150500.6.21.2, kernel-obs-build-5.14.21-150500.55.49.1, kernel-syms-5.14.21-150500.55.49.1, kernel-obs-qa-5.14.21-150500.55.49.1
SUSE Linux Enterprise Micro 5.5 (src): kernel-default-base-5.14.21-150500.55.49.1.150500.6.21.2
Basesystem Module 15-SP5 (src): kernel-source-5.14.21-150500.55.49.1, kernel-default-base-5.14.21-150500.55.49.1.150500.6.21.2
Development Tools Module 15-SP5 (src): kernel-obs-build-5.14.21-150500.55.49.1, kernel-source-5.14.21-150500.55.49.1, kernel-syms-5.14.21-150500.55.49.1
SUSE Linux Enterprise Live Patching 15-SP5 (src): kernel-livepatch-SLE15-SP5_Update_10-1-150500.11.5.1

Comment 26 Maintenance Automation 2024-02-15 16:30:35 UTC
SUSE-SU-2024:0514-1: An update that solves 21 vulnerabilities, contains nine features and has 41 security fixes can now be installed.

Category: security (important)
Bug References: 1065729, 1108281, 1141539, 1174649, 1181674, 1193285, 1194869, 1209834, 1210443, 1211515, 1212091, 1214377, 1215275, 1215885, 1216441, 1216559, 1216702, 1217895, 1217987, 1217988, 1217989, 1218005, 1218447, 1218527, 1218659, 1218689, 1218713, 1218723, 1218730, 1218738, 1218752, 1218757, 1218768, 1218778, 1218779, 1218804, 1218832, 1218836, 1218916, 1218948, 1218958, 1218968, 1218997, 1219006, 1219012, 1219013, 1219014, 1219053, 1219067, 1219120, 1219128, 1219136, 1219285, 1219349, 1219412, 1219429, 1219434, 1219490, 1219512, 1219568, 1219582, 1219608
CVE References: CVE-2021-33631, CVE-2023-46838, CVE-2023-47233, CVE-2023-4921, CVE-2023-51042, CVE-2023-51043, CVE-2023-51780, CVE-2023-51782, CVE-2023-6040, CVE-2023-6356, CVE-2023-6531, CVE-2023-6535, CVE-2023-6536, CVE-2023-6915, CVE-2024-0340, CVE-2024-0565, CVE-2024-0641, CVE-2024-0775, CVE-2024-1085, CVE-2024-1086, CVE-2024-24860
Jira References: PED-4729, PED-6694, PED-7322, PED-7615, PED-7616, PED-7618, PED-7620, PED-7622, PED-7623
Sources used:
openSUSE Leap 15.5 (src): kernel-source-azure-5.14.21-150500.33.34.1, kernel-syms-azure-5.14.21-150500.33.34.1
Public Cloud Module 15-SP5 (src): kernel-source-azure-5.14.21-150500.33.34.1, kernel-syms-azure-5.14.21-150500.33.34.1

Comment 27 Coly Li 2024-02-20 08:07:14 UTC
Close as fixed.