Bug 1214570 - Booting from NVMe broken since Linux 6.4.11
Summary: Booting from NVMe broken since Linux 6.4.11
Status: RESOLVED DUPLICATE of bug 1214428
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
Version: Current
Hardware: Other Other
Priority: P5 - None
Severity: Normal
Target Milestone: ---
Assignee: openSUSE Kernel Bugs
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-24 11:19 UTC by Marius Kittler
Modified: 2023-08-24 14:18 UTC
CC List: 1 user

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Screenshot of boot problem (198.89 KB, image/jpeg)
2023-08-24 11:19 UTC, Marius Kittler
SMART values of NVMe (3.21 KB, text/plain)
2023-08-24 11:20 UTC, Marius Kittler

Description Marius Kittler 2023-08-24 11:19:24 UTC
Created attachment 868988
Screenshot of boot problem

When booting Linux 6.4.11 (provided by Tumbleweed's kernel-default package), I run into the following errors:

```
nvme nvme0: controller is down; it will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
nvme nvme0: Does your device have a faulty power saving mode enabled?
nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
```

This leads to further errors because the root filesystem is located on that NVMe drive. See the attached screenshot for details.

When booting 6.4.8 or 6.4.9, the problem does not occur. I don't have 6.4.10 installed, so I haven't tested that version; it is possible that the problematic change was already introduced there.

I have reinstalled the kernel-default package, which also rebuilt the initramfs, but the problem persisted.

When I add the kernel parameters suggested by the error message, the problem is gone. Since the error message also asks to report a bug, I'm doing that.
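A minimal Python sketch (an illustration, not part of the original report) for checking whether the two workaround parameters from the error message are active on the running kernel. It assumes the command line is read from `/proc/cmdline`; the helper names `parse_cmdline` and `has_nvme_workaround` are hypothetical.

```python
import shlex


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(cmdline):
        # Split only at the first "=", so "root=UUID=..." keeps "UUID=..." as value.
        name, sep, value = token.partition("=")
        # A later occurrence of the same name overwrites an earlier one.
        params[name] = value if sep else None
    return params


# Values suggested by the nvme driver's error message in this report.
WORKAROUND = {
    "nvme_core.default_ps_max_latency_us": "0",
    "pcie_aspm": "off",
}


def has_nvme_workaround(cmdline: str) -> bool:
    """True only if every workaround parameter is present with the expected value."""
    params = parse_cmdline(cmdline)
    return all(params.get(name) == value for name, value in WORKAROUND.items())
```

For example, `has_nvme_workaround(open("/proc/cmdline").read())` should return `True` only when the system was booted with both parameters. The sketch compares parameter names literally, whereas the kernel also treats dashes and underscores in parameter names as equivalent.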

Note that the NVMe drive itself seems fine. I'm not sure how to answer the error message's question about power saving. I have only run the short and long SMART self-tests, and both completed without errors. Since the system works fine with older kernel versions or with those kernel parameters, I don't think there is a severe problem with my system; this looks rather like problematic behaviour of the kernel.

I will attach the SMART output of my NVMe drive because it might be useful (it contains the exact model and firmware version).
Comment 1 Marius Kittler 2023-08-24 11:20:17 UTC
Created attachment 868989
SMART values of NVMe
Comment 2 Takashi Iwai 2023-08-24 11:29:55 UTC
Likely a dup of bsc#1214428 and bsc#1214397.

Try blacklisting the rtsx_pci driver.
Comment 3 Marius Kittler 2023-08-24 12:53:03 UTC
Thanks, it is probably the same as those issues indeed. I'll try blacklisting the module at the next opportunity via the `module_blacklist=rtsx_pci` kernel parameter. Or would it be `BrokenModules=rtsx_pci`? Different sources mention different parameters (https://wiki.archlinux.org/title/Kernel_module#Using_kernel_command_line_2 vs. https://en.opensuse.org/SDB:Linuxrc#Parameter_Reference).
Comment 4 Takashi Iwai 2023-08-24 13:32:08 UTC
For the already installed system, module_blacklist=xxx should work.
BrokenModules=xxx is for the installer image, IIRC.
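To make `module_blacklist=rtsx_pci` persistent instead of entering it at the boot prompt each time, it can be appended to the kernel command line in the bootloader configuration. The following Python sketch assumes the system boots via GRUB 2 configured through `/etc/default/grub` with a double-quoted `GRUB_CMDLINE_LINUX_DEFAULT="..."` line; the function name `add_kernel_param` is hypothetical.

```python
import re


def add_kernel_param(grub_default: str, param: str,
                     key: str = "GRUB_CMDLINE_LINUX_DEFAULT") -> str:
    """Return the /etc/default/grub text with `param` appended to `key`.

    Assumes the value is enclosed in double quotes, e.g. KEY="quiet splash".
    The text is returned unchanged if the parameter is already present
    or if no matching KEY="..." line exists.
    """
    # Match KEY="<value>" at the start of a line; the value cannot contain quotes.
    pattern = re.compile(rf'^({re.escape(key)}=")([^"]*)(")', re.MULTILINE)

    def _append(match):
        words = match.group(2).split()
        if param not in words:
            words.append(param)
        # Rebuild the line; runs of whitespace collapse to single spaces.
        return match.group(1) + " ".join(words) + match.group(3)

    # Only the first matching line is rewritten.
    return pattern.sub(_append, grub_default, count=1)
```

After writing the result back to `/etc/default/grub` as root, the GRUB configuration has to be regenerated (on openSUSE typically `grub2-mkconfig -o /boot/grub2/grub.cfg`) for the change to take effect. The sketch does not merge the new entry into an already existing `module_blacklist=` list.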
Comment 5 Marius Kittler 2023-08-24 14:16:22 UTC
Ah, that makes sense. Nevertheless, I've just specified both parameters. I can confirm that blacklisting that module mitigates the problem (and the system boots normally from the NVMe drive).
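To confirm after booting that the blacklist actually took effect, one can check both the kernel command line and the list of loaded modules. A minimal Python sketch, assuming the standard `/proc/cmdline` and `/proc/modules` formats (in the latter, the module name is the first whitespace-separated field of each line); `blacklisted`, `loaded_modules` and `is_blacklist_effective` are hypothetical helper names.

```python
def blacklisted(cmdline: str) -> set:
    """Module names listed in any module_blacklist=a,b,c token of the command line."""
    names = set()
    for token in cmdline.split():
        if token.startswith("module_blacklist="):
            # The value is a comma-separated list; skip empty entries.
            names.update(n for n in token.partition("=")[2].split(",") if n)
    return names


def loaded_modules(proc_modules: str) -> set:
    """Module names from /proc/modules text (first field of each non-empty line)."""
    return {line.split()[0] for line in proc_modules.splitlines() if line.strip()}


def is_blacklist_effective(cmdline: str, proc_modules: str,
                           module: str = "rtsx_pci") -> bool:
    """True if `module` is blacklisted on the command line and not currently loaded."""
    return module in blacklisted(cmdline) and module not in loaded_modules(proc_modules)
```

On the affected machine this could be called as `is_blacklist_effective(open("/proc/cmdline").read(), open("/proc/modules").read())`.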
Comment 6 Takashi Iwai 2023-08-24 14:18:23 UTC
OK, then let's make it dup.

*** This bug has been marked as a duplicate of bug 1214428 ***