Bug 1214432 - Kernel 5.14.21-150500.55.19 causes qlogic driver to fail
Summary: Kernel 5.14.21-150500.55.19 causes qlogic driver to fail
Status: NEW
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Kernel
Version: Leap 15.5
Hardware: x86-64 openSUSE Leap 15.5
Priority: P5 - None
Severity: Major
Target Milestone: ---
Assignee: openSUSE Kernel Bugs
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-21 10:00 UTC by Frank Gießler
Modified: 2023-08-28 09:25 UTC
CC List: 3 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Output of 'dmesg | grep qla2' (11.88 KB, text/plain)
2023-08-21 10:00 UTC, Frank Gießler
Complete dmesg of an orderly boot procedure (81.04 KB, text/plain)
2023-08-22 11:06 UTC, Frank Gießler

Description Frank Gießler 2023-08-21 10:00:36 UTC
Created attachment 868906
Output of 'dmesg | grep qla2'

The latest kernel update (5.14.21-150500.55.19) causes the QLogic driver to fail on our machine. Reverting to the previous kernel, 5.14.21-150500.55.12, resolves the issue. Reverting to the previous kernel-firmware-qlogic (20230724-150500.3.3.1) does NOT.

Let me know if you need more information.

Frank
Comment 1 Frank Gießler 2023-08-22 11:06:04 UTC
Created attachment 868935
Complete dmesg of an orderly boot procedure

I'm afraid it's more complicated than I first thought. It might not even be a kernel bug. The hardware is as follows:

o Motherboard Supermicro X7DB8
o QLogic ISP2422-based 4Gb Fibre Channel to PCI-X HBA
o 2 RAID controllers, Eonstore A08F-G2422, with 8 HDs each
o Controllers are configured as JBODs so that each HD is seen as a single LUN

If all goes well, i.e. with the previous kernel, the OS sees 16 HDs attached to the FC HBA. With the new kernel, only the HDs of one controller are seen. The other controller gives errors.

So it looks like one controller is faulty while the other is not. Except that both work with the previous kernel, and I can't find a difference in their configuration. But it is always the same controller that looks faulty.

The attached dmesg shows the boot procedure of the system:

Until time stamp 36.8 the system boots up with both controllers turned off. At 79.36 the 'faulty' controller is turned on. It performs a self test until time stamp 192.82 and then starts to spit out errors, ending in 'unable to reconnect'.

At time stamp 456.72 the 'good' controller is turned on, performs the self test until 570.something, and starts the disks.
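
For anyone who wants to slice the attached dmesg along these time stamps, here is a minimal Python sketch. It assumes the default dmesg prefix of the form "[  123.456789] message" (not the `dmesg -T` format), and the names `BOUNDARIES` and `split_by_phase` are made up for this illustration. Only the four exact time stamps mentioned above are used as boundaries.

```python
import bisect
import re

# Default dmesg prefix, e.g. "[  123.456789] message" (assumed format).
_TS = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")

# Time stamps taken from the description above (seconds since boot).
BOUNDARIES = [36.8, 79.36, 192.82, 456.72]


def split_by_phase(lines, boundaries, keyword="qla2"):
    """Group dmesg lines that contain `keyword` into time phases.

    `boundaries` must be sorted ascending. Phase 0 holds lines before the
    first boundary; phase i holds lines with
    boundaries[i-1] <= t < boundaries[i]; the last phase holds the rest.
    Lines without a recognizable time stamp are skipped.
    """
    phases = [[] for _ in range(len(boundaries) + 1)]
    for line in lines:
        match = _TS.match(line.rstrip("\n"))
        if not match or keyword not in match.group(2):
            continue
        t = float(match.group(1))
        # bisect_right counts how many boundaries are <= t.
        phases[bisect.bisect_right(boundaries, t)].append((t, match.group(2)))
    return phases
```

Calling `split_by_phase(open("dmesg.txt"), BOUNDARIES)` (the file name is a placeholder) returns five lists; with the boundaries above, index 3 would cover the window in which the 'faulty' controller reports its errors.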

It might be a simple hardware problem that is yet to be found. But the previous kernel seems to be more tolerant.
Comment 2 Frank Gießler 2023-08-28 09:25:50 UTC
The only difference between the two controllers was the number of the FC channel each was using. The apparently 'good' one was using channel 1, while the apparently 'faulty' one was using channel 0.
I have now reconfigured the 'faulty' one to use channel 1 as well and, although I don't see why this should make a difference, the previously 'faulty' one is now working too. If it weren't for the fact that the previous kernel had no problem with either controller, I'd vote for a hardware issue. Go figure.

Nevertheless, as far as I'm concerned the bug can be closed now.

Thanks,
Frank