Bugzilla – Bug 1214570
Booting from NVMe broken since Linux 6.4.11
Last modified: 2023-08-24 14:18:23 UTC
Created attachment 868988 [details]
Screenshot of boot problem

When booting Linux 6.4.11 (provided by Tumbleweed's kernel-default package) I run into the following errors:

```
nvme nvme0: controller is down; it will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
nvme nvme0: Does your device have a faulty power saving mode enabled?
nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
```

This leads to further errors as the root filesystem is located on that NVMe. See the attached screenshot for details.

When booting 6.4.8 or 6.4.9 the problem does not occur. I don't have 6.4.10 installed, so I haven't tested that version; the problematic change may already have been introduced there.

I have re-installed the kernel-default package, which also rebuilt the initramfs, but the problem persisted.

When I add the kernel parameters suggested by the error message, the problem is gone. Since the error message also asks for a bug report, I'm filing one.

Note that the NVMe itself seems fine. I'm not sure how to answer the question about power saving asked by the error message. I only ran the short and long SMART tests, and both completed without errors. Considering that the system works just fine under older kernel versions or with those kernel parameters, I don't think there is a severe problem with my system; this looks like problematic behaviour of the kernel.

I will attach the SMART output of my NVMe because it might be useful (it contains the exact model and firmware version).
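As a side note on the first message: both register values are all ones, which is typically what a read returns when a PCIe device has stopped responding, rather than a meaningful status. The following is a minimal Python sketch for picking such lines out of a saved kernel log. The function names are hypothetical, and the regular expression only assumes the message format quoted above.

```python
import re

# Matches the register dump in the "controller is down" line quoted in this report.
LINE_RE = re.compile(r"CSTS=(0x[0-9a-fA-F]+), PCI_STATUS=(0x[0-9a-fA-F]+)")


def parse_controller_down(line):
    """Return (csts, pci_status) as ints, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


def looks_unreachable(csts, pci_status):
    """True when both values read back as all ones (32-bit CSTS, 16-bit PCI status)."""
    return csts == 0xFFFFFFFF and pci_status == 0xFFFF


line = "nvme nvme0: controller is down; it will reset: CSTS=0xffffffff, PCI_STATUS=0xffff"
regs = parse_controller_down(line)
print(regs is not None and looks_unreachable(*regs))
```

This only classifies the log line; it says nothing about why the device dropped off the bus.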
Created attachment 868989 [details] SMART values of NVMe
Likely a dup of bsc#1214428 and bsc#1214397. Try blacklisting the rtsx_pci driver.
Thanks, it is indeed probably the same as those issues. I'll try blacklisting the module at the next opportunity via the `module_blacklist=rtsx_pci` kernel parameter. Or would it be `BrokenModules=rtsx_pci`? Different sources mention different parameters (https://wiki.archlinux.org/title/Kernel_module#Using_kernel_command_line_2 vs. https://en.opensuse.org/SDB:Linuxrc#Parameter_Reference).
For the already installed system, module_blacklist=xxx should work. BrokenModules=xxx is for the installer image, IIRC.
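To double-check after a reboot that the parameter actually reached the kernel, the command line can be inspected. Below is a minimal Python sketch; `blacklisted_modules` is a hypothetical helper, the sample command line is made up for illustration, and using only the last `module_blacklist=` entry is an assumption about how repeated entries are handled.

```python
def blacklisted_modules(cmdline):
    """Return the module names from the last module_blacklist= entry, or []."""
    modules = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "module_blacklist" and sep:
            # The value is a comma-separated list of module names.
            modules = [name for name in value.split(",") if name]
    return modules


# Hypothetical command line; on a running system, read /proc/cmdline instead.
sample = "root=/dev/nvme0n1p2 quiet module_blacklist=rtsx_pci"
print("rtsx_pci" in blacklisted_modules(sample))
```

On the installed system, the same function can be fed the contents of `/proc/cmdline`.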
Ah, that makes sense. I nevertheless just specified both parameters. I can confirm that blacklisting that module mitigates the problem (the system boots normally from the NVMe).
OK, then let's mark it as a dup.

*** This bug has been marked as a duplicate of bug 1214428 ***