Bug 1219734

Summary: multipathd daemon dumps core in iSCSI scenario on 15-SP6, but only on the ppc64le architecture
Product: [openSUSE] PUBLIC SUSE Linux Enterprise Server 15 SP6
Reporter: Petr Cervinka <pcervinka>
Component: Kernel
Assignee: Martin Wilck <martin.wilck>
Status: RESOLVED FIXED
QA Contact:
Severity: Normal
Priority: P3 - Medium
CC: hare, martin.wilck, pcervinka, rtsvetkov, tiwai
Version: unspecified
Target Milestone: ---
Hardware: PowerPC-64
OS: Other
URL: https://openqa.suse.de/tests/13458888/modules/multipath_iscsi/steps/209
Whiteboard:
Found By: openQA
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---
Attachments: core.multipathd.0.32578db1435145c3868f66bbd10029d8.5639.1707398365000000.zst
multipath_iscsi-journal.log

Description Petr Cervinka 2024-02-08 13:33:04 UTC
Created attachment 872589 [details]
core.multipathd.0.32578db1435145c3868f66bbd10029d8.5639.1707398365000000.zst

## Observation

openQA test in scenario sle-15-SP6-Online-ppc64le-qa_kernel_multipath@ppc64le-virtio fails in
[multipath_iscsi](https://openqa.suse.de/tests/13458888/modules/multipath_iscsi/steps/209)

## Test suite description
Maintainer: pcervinka@suse.com


## Reproducible

Fails since (at least) Build [48.2](https://openqa.suse.de/tests/13339498)


## Expected result

Last good: [47.2](https://openqa.suse.de/tests/13319361) (or more recent)


## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Online&machine=ppc64le-virtio&test=qa_kernel_multipath&version=15-SP6)


We have an openQA multimachine scenario which tests multipath over iSCSI to verify basic functionality. Unfortunately, the scenario started to fail on the ppc64le architecture only, beginning with build 48.2; the last working build was 47.2 (the failure was originally masked by another test issue). The aarch64 and x86_64 architectures don't have this issue.
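
For context, the client side of such a scenario roughly corresponds to the following steps (a hedged sketch; the portal address is hypothetical and the actual test code may differ):

# Discover and log in to the iSCSI target, then let multipathd assemble the map.
iscsiadm -m discovery -t sendtargets -p 10.0.2.1
iscsiadm -m node --login
multipath -ll   # should list one map built from the sdX paths seen in the kernel log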


We basically wait in a loop for the output of multipathd -k"show multipaths status", but the status never arrives.
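
A minimal sketch of that polling logic, assuming a 60-second timeout (the real openQA test module may use different timing and output matching):

#!/bin/bash
# Poll multipathd until it reports multipath status, or give up after 60s.
for _ in $(seq 1 60); do
    status=$(multipathd -k'show multipaths status' 2>/dev/null)
    if [ -n "$status" ]; then
        echo "$status"
        exit 0
    fi
    sleep 1
done
echo "timed out waiting for multipathd status" >&2
exit 1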


[   53.317883] device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@redhat.com
[   53.454391] iscsi: registered transport (tcp)
[   53.470770] scsi host1: iSCSI Initiator over TCP/IP
[   53.482436] scsi 1:0:0:0: RAID              IET      Controller       0001 PQ: 0 ANSI: 5
[   53.488296] scsi 1:0:0:0: Attached scsi generic sg1 type 12
[   53.491714] scsi 1:0:0:1: Direct-Access     IET      VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
[   53.500527] scsi 1:0:0:1: Attached scsi generic sg2 type 0
[   53.506151] sd 1:0:0:1: Power-on or device reset occurred
[   53.506683] scsi 1:0:0:2: Direct-Access     IET      VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
[   53.508591] sd 1:0:0:1: [sda] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[   53.509216] sd 1:0:0:1: [sda] Write Protect is off
[   53.509263] sd 1:0:0:1: [sda] Mode Sense: 69 00 00 08
[   53.509798] sd 1:0:0:1: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   53.510707] scsi 1:0:0:2: Attached scsi generic sg3 type 0
[   53.518792] sd 1:0:0:2: Power-on or device reset occurred
[   53.519369] scsi 1:0:0:3: Direct-Access     IET      VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
[   53.521265] sd 1:0:0:2: [sdb] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[   53.524351] sd 1:0:0:2: [sdb] Write Protect is off
[   53.524429] sd 1:0:0:2: [sdb] Mode Sense: 69 00 00 08
[   53.525103] sd 1:0:0:2: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   53.527767] scsi 1:0:0:3: Attached scsi generic sg4 type 0
[   53.549760] sd 1:0:0:3: Power-on or device reset occurred
[   53.550643] sd 1:0:0:3: [sdc] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[   53.551231] sd 1:0:0:3: [sdc] Write Protect is off
[   53.551317] sd 1:0:0:3: [sdc] Mode Sense: 69 00 00 08
[   53.551794] sd 1:0:0:3: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   53.553511] sd 1:0:0:1: [sda] Attached SCSI disk
[   53.559422] sd 1:0:0:2: [sdb] Attached SCSI disk
[   53.562492] sd 1:0:0:3: [sdc] Attached SCSI disk
[   53.752246] device-mapper: multipath service-time: version 0.3.0 loaded
[   53.913680] multipathd[5646]: segfault (11) at 128 nip 1010726e0 lr 1010725e4 code 1 in multipathd[101060000+30000]
[   53.913818] multipathd[5646]: code: e9490008 7be91f24 7c6a482a 2fa30000 419e01f0 4bff87a5 2f830001 409effc4 
[   53.913905] multipathd[5646]: code: e90100a8 2f880001 419efb1c e94100a0 <812a0128> 2f890000 409efb0c e9210080 


           PID: 5639 (multipathd)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Thu 2024-02-08 08:19:25 EST (15s ago)
  Command Line: /sbin/multipathd -d -s
    Executable: /sbin/multipathd
 Control Group: /system.slice/multipathd.service
          Unit: multipathd.service
         Slice: system.slice
       Boot ID: 32578db1435145c3868f66bbd10029d8
    Machine ID: 46f03c983bdb421badfd33670453683b
      Hostname: susetest
       Storage: /var/lib/systemd/coredump/core.multipathd.0.32578db1435145c3868f66bbd10029d8.5639.1707398365000000.zst (present)
  Size on Disk: 929.2K
       Message: Process 5639 (multipathd) of user 0 dumped core.
                
                Stack trace of thread 5646:
                #0  0x00000001010726e0 n/a (multipathd + 0x126e0)
                #1  0x0000000101073270 n/a (multipathd + 0x13270)
                #2  0x00007fffb996195c n/a (libmultipath.so.0 + 0x4195c)
                #3  0x00007fffb99631c0 uevent_dispatch (libmultipath.so.0 + 0x431c0)
                #4  0x000000010106b570 n/a (multipathd + 0xb570)
                #5  0x00007fffb9411fd4 start_thread (libc.so.6 + 0xb1fd4)
                #6  0x00007fffb94c4c58 __clone3 (libc.so.6 + 0x164c58)
                ELF object binary architecture: PowerPC64
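
The faulting instruction in the segfault log above, <812a0128>, appears to decode to lwz r9,296(r10); the reported fault address 128 (hex, i.e. 296) matches the displacement, which is consistent with a load through a NULL pointer in r10. To resolve the frames shown as n/a, the stored core can be opened roughly like this (a hedged sketch; assumes gdb and the matching multipathd debuginfo packages are installed on the test machine):

# coredumpctl is part of systemd; "debug" opens the stored core in gdb.
coredumpctl list multipathd
coredumpctl debug 5639
# Inside gdb:
#   (gdb) thread apply all bt
#   (gdb) info symbol 0x00000001010726e0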
Comment 1 Petr Cervinka 2024-02-08 13:53:57 UTC
Created attachment 872590 [details]
multipath_iscsi-journal.log
Comment 2 Takashi Iwai 2024-02-09 13:17:29 UTC
Judging from the logs, the kernel is unchanged (6.4.0-150600.4) in both working and broken cases.  So it's rather a multipathd issue.
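
A quick way to double-check that only userspace changed between the builds (a hedged sketch; the package names are assumptions about what the image ships) is to run the following on both the last-good and first-bad images and compare:

# Kernel and multipath-tools versions on this image.
uname -r
rpm -q multipath-tools
rpm -q --changelog multipath-tools | head -n 20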

Adding storage people to Cc.
Comment 3 Petr Cervinka 2024-02-09 13:28:59 UTC
I already had a quick chat with Martin on Slack and he mentioned that there might be an upstream fix.
Comment 6 Martin Wilck 2024-02-16 13:51:18 UTC
Please test with the packages from IBS sr#321656.
Comment 8 Petr Cervinka 2024-02-20 13:03:42 UTC
The latest build 57.1 looks much better in openQA. I restarted the test 10 times and it passes every time.
Comment 9 Martin Wilck 2024-02-23 09:42:07 UTC
Closing per comment 8.