Bug 1212277

Summary: System unable to boot after upgrade from Leap 15.4 to Leap 15.5
Product: [openSUSE] openSUSE Distribution
Reporter: Dominik Heidler <dheidler>
Component: Basesystem
Assignee: openSUSE Kernel Bugs <kernel-bugs>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Major
Priority: P5 - None
CC: daniel.wagner, tiwai
Version: Leap 15.5
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Bug Depends on: 1207827
Bug Blocks:
Attachments: rdsosreport.txt
hwinfo.txt
screenshot with error msg from dmesg in 15.5 live system

Description Dominik Heidler 2023-06-13 14:41:44 UTC
Created attachment 867541
rdsosreport.txt

After the upgrade to 15.5, the second of two identical M.2 SSDs (together they form a redundant BTRFS mirror) is no longer recognized, so I had to roll back.

See the attached logfile.


/dev/disk/by-id on 15.5 (only the first SSD shows up):
----------------------------------------
lrwxrwxrwx 1 root root 13 Jun 13 14:04 nvme-ADATA_SX6000LNP_2K34291CAD1G -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Jun 13 14:04 nvme-ADATA_SX6000LNP_2K34291CAD1G-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Jun 13 14:04 nvme-ADATA_SX6000LNP_2K34291CAD1G-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Jun 13 14:04 nvme-ADATA_SX6000LNP_2K34291CAD1G-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root 13 Jun 13 14:04 nvme-eui.00000000010000004ce00018dd8c9084 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Jun 13 14:04 nvme-eui.00000000010000004ce00018dd8c9084-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Jun 13 14:04 nvme-eui.00000000010000004ce00018dd8c9084-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Jun 13 14:04 nvme-eui.00000000010000004ce00018dd8c9084-part3 -> ../../nvme0n1p3
----------------------------------------




/dev/disk/by-id on 15.4 (both SSDs show up):
----------------------------------------
lrwxrwxrwx 1 root root   13 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K34291CAD1G -> ../../nvme0n1
lrwxrwxrwx 1 root root   13 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K34291CAD1G_1 -> ../../nvme0n1
lrwxrwxrwx 1 root root   15 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K34291CAD1G_1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root   15 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K34291CAD1G_1-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root   15 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K34291CAD1G_1-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root   15 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K34291CAD1G-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root   15 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K34291CAD1G-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root   15 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K34291CAD1G-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root   13 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K5129A4H1F2 -> ../../nvme1n1
lrwxrwxrwx 1 root root   13 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K5129A4H1F2_1 -> ../../nvme1n1
lrwxrwxrwx 1 root root   15 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K5129A4H1F2_1-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root   15 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K5129A4H1F2_1-part2 -> ../../nvme1n1p2
lrwxrwxrwx 1 root root   15 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K5129A4H1F2_1-part3 -> ../../nvme1n1p3
lrwxrwxrwx 1 root root   15 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K5129A4H1F2-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root   15 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K5129A4H1F2-part2 -> ../../nvme1n1p2
lrwxrwxrwx 1 root root   15 13. Jun 16:19 nvme-ADATA_SX6000LNP_2K5129A4H1F2-part3 -> ../../nvme1n1p3
lrwxrwxrwx 1 root root   13 13. Jun 16:19 nvme-eui.00000000010000004ce00018dd8c9084 -> ../../nvme1n1
lrwxrwxrwx 1 root root   15 13. Jun 16:19 nvme-eui.00000000010000004ce00018dd8c9084-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root   15 13. Jun 16:19 nvme-eui.00000000010000004ce00018dd8c9084-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root   15 13. Jun 16:19 nvme-eui.00000000010000004ce00018dd8c9084-part3 -> ../../nvme0n1p3
----------------------------------------



In the end, the dracut initqueue times out while waiting for the missing device:
----------------------------------------
[  137.214104] ikarus dracut-initqueue[430]: Warning: dracut-initqueue: timeout, still waiting for following initqueue hooks:
[  137.215219] ikarus dracut-initqueue[430]: Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-id\x2fnvme-ADATA_SX6000LNP_2K5129A4H1F2-part1.sh: "[ -e "/dev/disk/by-id/nvme-ADATA_SX6000LNP_2K5129A4H1F2-part1" ]"
[  137.215882] ikarus dracut-initqueue[430]: Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f1bd4e33a-22c6-4038-b232-622f42b53426.sh: "if ! grep -q After=remote-fs-pre.target /run/systemd/generator/systemd-cryptsetup@*.service 2>/dev/null; then
[  137.215882] ikarus dracut-initqueue[430]:     [ -e "/dev/disk/by-uuid/1bd4e33a-22c6-4038-b232-622f42b53426" ]
[  137.215882] ikarus dracut-initqueue[430]: fi"
[  137.216731] ikarus dracut-initqueue[430]: Warning: dracut-initqueue: starting timeout scripts
[  137.786111] ikarus dracut-initqueue[430]: Warning: dracut-initqueue: timeout, still waiting for following initqueue hooks:
[  137.786932] ikarus dracut-initqueue[430]: Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-id\x2fnvme-ADATA_SX6000LNP_2K5129A4H1F2-part1.sh: "[ -e "/dev/disk/by-id/nvme-ADATA_SX6000LNP_2K5129A4H1F2-part1" ]"
----------------------------------------
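The hook file names in these warnings encode the device path dracut is waiting for, with each "/" written as \x2f. As a minimal sketch (decode_hook_name is a name made up for illustration, and it assumes "/" is the only escaped character, as in the log above), the path can be recovered like this:

```shell
#!/bin/sh
# Hypothetical helper: turn a dracut "devexists-*.sh" hook file name
# back into the device path it waits for.
decode_hook_name() {
    local name
    name=${1##*/}            # drop the hooks directory
    name=${name#devexists-}  # drop the "devexists-" prefix
    name=${name%.sh}         # drop the ".sh" suffix
    # Assumption: only "/" was escaped (as \x2f) when the hook was created.
    printf '%s\n' "$name" | sed 's|\\x2f|/|g'
}
```

Applied to the first warning above, this yields /dev/disk/by-id/nvme-ADATA_SX6000LNP_2K5129A4H1F2-part1, i.e. the by-id link of the second SSD, which never appears on 15.5.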
Comment 1 Dominik Heidler 2023-06-13 14:42:02 UTC
Created attachment 867542
hwinfo.txt
Comment 2 Dominik Heidler 2023-06-13 15:27:11 UTC
Created attachment 867546
screenshot with error msg from dmesg in 15.5 live system
Comment 3 Dominik Heidler 2023-06-13 15:32:10 UTC
Both 15.4 and 15.5 list both SSDs in lspci, but on 15.5 there is an additional error message (see screenshot above).

The following error doesn't seem to be critical, as it appears on a working 15.4 system as well:
[    2.604070] nvme nvme0: failed to set APST feature (2)
[    2.613092] nvme nvme1: failed to set APST feature (2)



The error that is new on 15.5 IS critical: directly after it, only the partitions of nvme0 are listed, but not those of nvme1:

nvme nvme1: globally duplicate IDs for nsid 1
nvme nvme1: VID:DID 10ec:5763 model:ADATA SX6000LNP firmware:V9002s72
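
One way to confirm the duplicate identifiers is to compare the namespace IDs on a system where both drives still show up (here: 15.4). The following is only a sketch: it assumes the kernel exposes eui, nguid and uuid attribute files under /sys/block/nvmeXnY, and find_duplicate_nvme_ids is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: print every namespace identifier that is reported
# by more than one NVMe namespace, as "<attr> <value>: <dev> <dev> ...".
# $1: directory with the block device entries (default: /sys/block).
find_duplicate_nvme_ids() {
    local base attr dev f
    base="${1:-/sys/block}"
    for attr in eui nguid uuid; do
        for f in "$base"/nvme*n*/"$attr"; do
            [ -r "$f" ] || continue       # attribute missing or glob unmatched
            dev="${f%/*}"                 # .../nvme0n1/eui -> .../nvme0n1
            dev="${dev##*/}"              # .../nvme0n1     -> nvme0n1
            # Strip spaces and newline so differently formatted values compare equal.
            printf '%s %s %s\n' "$attr" "$(tr -d ' \n' < "$f")" "$dev"
        done
    done | sort | awk '
        { key = $1 " " $2; devs[key] = devs[key] " " $3; n[key]++ }
        END { for (k in n) if (n[k] > 1) print k ":" devs[k] }'
}
```

Called without arguments it reads /sys/block; any output line names an identifier shared by several namespaces. The 15.4 listing in the description already hints at this: the nvme-eui.00000000010000004ce00018dd8c9084 link points to nvme1n1 while its -partN links point to nvme0n1pN, which would fit both drives reporting the same EUI.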
Comment 4 Dominik Heidler 2023-06-13 15:37:00 UTC
Maybe this patch could fix that:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/nvme/host?h=v5.19-rc8&id=1629de0e0373e04d68e88e6d9d3071fbf70b7ea8

@Takashi: could we include that one?
Comment 5 Takashi Iwai 2023-06-13 16:18:23 UTC
(In reply to Dominik Heidler from comment #4)
> Maybe this patch could fix that:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/nvme/host?h=v5.19-rc8&id=1629de0e0373e04d68e88e6d9d3071fbf70b7ea8
> 
> @Takashi: could we include that one?

As far as I can see, this was already backported to the SLE15-SP5 GM kernel.
Comment 6 Dominik Heidler 2023-06-13 16:45:52 UTC
Hm - this patch is for PCI device ID 5762, but my devices have ID 5763:

01:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. Device [10ec:5763] (rev 01)
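
To make this comparison less error-prone, the vendor:device pair can be pulled out of the lspci -nn output and compared against the ID the patch covers. A rough sketch, where pci_id_from_lspci, is_quirked and QUIRKED_ID are names made up for illustration and 10ec:5762 is taken from the statement above rather than from the kernel source:

```shell
#!/bin/sh
# ID the linked patch is said to cover (per this comment, not verified
# against the kernel source).
QUIRKED_ID="10ec:5762"

# Read `lspci -nn` lines on stdin and print the [vendor:device] pair
# found on each line.
pci_id_from_lspci() {
    sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p'
}

# Succeed only if the given vendor:device ID equals the quirked one.
is_quirked() {
    [ "$1" = "$QUIRKED_ID" ]
}
```

For example, `lspci -nn | grep 'Non-Volatile' | pci_id_from_lspci` prints one ID per NVMe controller, and each can then be passed to is_quirked.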
Comment 9 Takashi Iwai 2023-06-13 17:01:24 UTC
Could you check with the kernel in OBS Kernel:SLE15-SP5 repo?
Comment 10 Takashi Iwai 2023-06-13 17:09:02 UTC
BTW, I see lots of entries in the blacklist whose patches have already been committed to the SLE15-SP5 branch.  Somehow the check for duplicated IDs didn't work.
I'm going to clean up.

Meanwhile, this revealed that some nvme-pci patches for bogus IDs haven't been backported yet:

80b2624094c8d369a3c6eab515e8f1564d2e5db2 # we don't ship bogus nvme id check
d5ceb4d1c50786d21de3d4b06c3f43109ec56dd8 # we don't ship bogus nvme id check
8d6e38f636ac063e8062a21e7616f7d9bf0df5d8 # we don't ship bogus nvme id check
9630d80655bfe7e62e4aff2889dc4eae7ceeb887 # we don't ship bogus nvme id check
74391b3e69855e7dd65a9cef36baf5fc1345affd # we don't ship bogus nvme id check

Daniel, could you check those?
Comment 11 Dominik Heidler 2023-06-19 11:04:42 UTC
With the kernel from Kernel:SLE15-SP5 this issue is gone.
The kernel version in the normal update repos doesn't contain the fix yet.


[root@ikarus ~]# uname -a
Linux ikarus 5.14.21-150500.137.gd7fcce4-default #1 SMP PREEMPT_DYNAMIC Wed Jun 14 06:38:39 UTC 2023 (d7fcce4) x86_64 x86_64 x86_64 GNU/Linux
[root@ikarus ~]# rpm -q kernel-default --changelog | grep 1207827
- nvme-pci: add bogus ID quirk for ADATA SX6000PNP (bsc#1207827).
Comment 12 Dominik Heidler 2023-08-02 14:31:34 UTC
Meanwhile, this is fixed in current 15.5.