Bugzilla – Bug 1219734
multipathd daemon coredumps in iSCSI scenario on 15-SP6, but only on the ppc64le architecture
Last modified: 2024-02-23 09:42:07 UTC
Created attachment 872589 [details]
core.multipathd.0.32578db1435145c3868f66bbd10029d8.5639.1707398365000000.zst

## Observation

openQA test in scenario sle-15-SP6-Online-ppc64le-qa_kernel_multipath@ppc64le-virtio fails in
[multipath_iscsi](https://openqa.suse.de/tests/13458888/modules/multipath_iscsi/steps/209)

## Test suite description

Maintainer: pcervinka@suse.com

## Reproducible

Fails since (at least) Build [48.2](https://openqa.suse.de/tests/13339498)

## Expected result

Last good: [47.2](https://openqa.suse.de/tests/13319361) (or more recent)

## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Online&machine=ppc64le-virtio&test=qa_kernel_multipath&version=15-SP6)

We have an openQA multimachine scenario which tests basic multipath-over-iSCSI functionality. Unfortunately, the scenario started to fail since build 48.2, and only on the ppc64le architecture; the last working build was 47.2 (the failure was originally masked by another test issue). aarch64 and x86_64 don't have this issue.

We basically wait in a loop for the output of multipathd -k"show multipaths status", but we never get the status:

    [   53.317883] device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@redhat.com
    [   53.454391] iscsi: registered transport (tcp)
    [   53.470770] scsi host1: iSCSI Initiator over TCP/IP
    [   53.482436] scsi 1:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
    [   53.488296] scsi 1:0:0:0: Attached scsi generic sg1 type 12
    [   53.491714] scsi 1:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
    [   53.500527] scsi 1:0:0:1: Attached scsi generic sg2 type 0
    [   53.506151] sd 1:0:0:1: Power-on or device reset occurred
    [   53.506683] scsi 1:0:0:2: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
    [   53.508591] sd 1:0:0:1: [sda] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
    [   53.509216] sd 1:0:0:1: [sda] Write Protect is off
    [   53.509263] sd 1:0:0:1: [sda] Mode Sense: 69 00 00 08
    [   53.509798] sd 1:0:0:1: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    [   53.510707] scsi 1:0:0:2: Attached scsi generic sg3 type 0
    [   53.518792] sd 1:0:0:2: Power-on or device reset occurred
    [   53.519369] scsi 1:0:0:3: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
    [   53.521265] sd 1:0:0:2: [sdb] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
    [   53.524351] sd 1:0:0:2: [sdb] Write Protect is off
    [   53.524429] sd 1:0:0:2: [sdb] Mode Sense: 69 00 00 08
    [   53.525103] sd 1:0:0:2: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    [   53.527767] scsi 1:0:0:3: Attached scsi generic sg4 type 0
    [   53.549760] sd 1:0:0:3: Power-on or device reset occurred
    [   53.550643] sd 1:0:0:3: [sdc] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
    [   53.551231] sd 1:0:0:3: [sdc] Write Protect is off
    [   53.551317] sd 1:0:0:3: [sdc] Mode Sense: 69 00 00 08
    [   53.551794] sd 1:0:0:3: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    [   53.553511] sd 1:0:0:1: [sda] Attached SCSI disk
    [   53.559422] sd 1:0:0:2: [sdb] Attached SCSI disk
    [   53.562492] sd 1:0:0:3: [sdc] Attached SCSI disk
    [   53.752246] device-mapper: multipath service-time: version 0.3.0 loaded
    [   53.913680] multipathd[5646]: segfault (11) at 128 nip 1010726e0 lr 1010725e4 code 1 in multipathd[101060000+30000]
    [   53.913818] multipathd[5646]: code: e9490008 7be91f24 7c6a482a 2fa30000 419e01f0 4bff87a5 2f830001 409effc4
    [   53.913905] multipathd[5646]: code: e90100a8 2f880001 419efb1c e94100a0 <812a0128> 2f890000 409efb0c e9210080

The recorded core dump:

               PID: 5639 (multipathd)
               UID: 0 (root)
               GID: 0 (root)
            Signal: 11 (SEGV)
         Timestamp: Thu 2024-02-08 08:19:25 EST (15s ago)
      Command Line: /sbin/multipathd -d -s
        Executable: /sbin/multipathd
     Control Group: /system.slice/multipathd.service
              Unit: multipathd.service
             Slice: system.slice
           Boot ID: 32578db1435145c3868f66bbd10029d8
        Machine ID: 46f03c983bdb421badfd33670453683b
          Hostname: susetest
           Storage: /var/lib/systemd/coredump/core.multipathd.0.32578db1435145c3868f66bbd10029d8.5639.1707398365000000.zst (present)
      Size on Disk: 929.2K
           Message: Process 5639 (multipathd) of user 0 dumped core.

                    Stack trace of thread 5646:
                    #0  0x00000001010726e0 n/a (multipathd + 0x126e0)
                    #1  0x0000000101073270 n/a (multipathd + 0x13270)
                    #2  0x00007fffb996195c n/a (libmultipath.so.0 + 0x4195c)
                    #3  0x00007fffb99631c0 uevent_dispatch (libmultipath.so.0 + 0x431c0)
                    #4  0x000000010106b570 n/a (multipathd + 0xb570)
                    #5  0x00007fffb9411fd4 start_thread (libc.so.6 + 0xb1fd4)
                    #6  0x00007fffb94c4c58 __clone3 (libc.so.6 + 0x164c58)

                    ELF object binary architecture: PowerPC64
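For context, the check that hangs is essentially a poll of the multipathd interactive socket. A minimal sketch of the idea (the real openQA test module is written in Perl; the retry count and sleep interval below are illustrative assumptions, not the test's actual timeout):

    #!/bin/sh
    # Poll multipathd until it reports multipath status (sketch only).
    for i in $(seq 1 30); do
        out=$(multipathd -k'show multipaths status' 2>/dev/null)
        if [ -n "$out" ]; then
            echo "$out"
            exit 0
        fi
        sleep 5
    done
    echo "timed out waiting for multipathd status" >&2
    exit 1

On the broken builds the daemon has already segfaulted by this point, so such a loop never sees any output.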
Created attachment 872590 [details]
multipath_iscsi-journal.log
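To dig into the attached core, the standard systemd-coredump workflow should apply (a sketch; run on the affected machine, with the matching multipath-tools debuginfo package installed so the "n/a" frames resolve):

    # Show the recorded crash metadata, then open the core in gdb
    coredumpctl list multipathd
    coredumpctl info 5639
    coredumpctl gdb 5639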
Judging from the logs, the kernel is unchanged (6.4.0-150600.4) in both the working and the broken case, so this looks like a multipathd issue rather than a kernel one. Adding storage people to Cc.
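Since the kernel did not change, comparing userspace package versions between the last-good and first-bad images should narrow down the culprit (a sketch; the exact package set worth comparing is my assumption):

    # Run on both the 47.2 (good) and 48.2 (bad) systems and diff the output
    uname -r
    rpm -q kernel-default multipath-tools device-mapper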
I already had a quick chat with Martin on Slack, and he mentioned that there might be an upstream fix.
Please test with the packages from IBS sr#321656.
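In case it helps anyone reproducing this, pulling in packages from a submit request's staging repository usually looks like the following (a sketch; <REPO_URL> is a placeholder, not the actual location of sr#321656):

    # <REPO_URL> stands in for the staging repository of the submit request
    zypper ar -f <REPO_URL> multipath-staging
    zypper ref multipath-staging
    zypper in --from multipath-staging multipath-tools
    systemctl restart multipathd.service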
The latest build, 57.1, looks much better in openQA. I restarted the test 10 times and it passed every time.
Closing per comment 8.