Bugzilla – Bug 1218005
Leap 15.5 does not boot with any of the kernel-default from update repository
Last modified: 2024-02-15 16:30:39 UTC
Created attachment 871315 [details] rdosreport.txt from boot attempt with kernel version 5.14.21-150500.55.36.1

I have an Acer laptop that does not boot with any of the kernel-default packages from the update repository. The boot process ends with:

  Starting Dracut Emergency Shell…
  Warning: /dev/disk/by-uuid/B2C3-C97D does not exist
  Warning: /dev/disk/by-uuid/d733130e-1ad1-4ee3-be20-8f74d7763d88 does not exist

B2C3-C97D refers to /boot/efi in a working system.
d733130e-1ad1-4ee3-be20-8f74d7763d88 refers to / in a working system.

With the following kernel parameters the boot succeeds:

  nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

The default kernel 5.14.21-150500.53.2 works. The kernels from the update repository, e.g. 5.14.21-150500.55.36.1, do NOT work.
Created attachment 871316 [details] dmesg after successful boot with default kernel
There are a few older update kernels, e.g. 5.14.21-150500.55.31.1, 5.14.21-150500.55.28.1, etc. You can see them via "zypper se -s kernel-default". Could you try them and figure out which kernels still work and which don't, i.e. narrow down the regression range?

Note that it might be safer to increase the number of installable kernels beforehand by editing /etc/zypp/zypp.conf. Add more entries to the multiversion.kernels line, e.g.

  multiversion.kernels = latest,latest-1,latest-2,latest-3,running

This will allow 4 kernels to be kept on the system without purging.

Also, please provide the hwinfo and dmesg outputs from the working kernel, too.
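The zypp.conf edit described above could be scripted roughly as follows. This is only a sketch: `set_multiversion_kernels` is a hypothetical helper name, and it assumes the setting sits on a single uncommented line (it appends one if none is found). Try it on a copy of the file before touching /etc/zypp/zypp.conf.

```shell
# Hypothetical helper: set multiversion.kernels in a zypp.conf-style file.
# Usage: set_multiversion_kernels FILE VALUE
set_multiversion_kernels() {
    conf=$1
    value=$2
    tmp=$(mktemp) || return 1
    if grep -Eq '^[[:space:]]*multiversion\.kernels[[:space:]]*=' "$conf"; then
        # Rewrite the existing setting in place.
        sed -E "s|^[[:space:]]*multiversion\.kernels[[:space:]]*=.*|multiversion.kernels = $value|" "$conf" > "$tmp"
    else
        # No uncommented setting yet: keep the file and append one.
        cat "$conf" > "$tmp"
        printf 'multiversion.kernels = %s\n' "$value" >> "$tmp"
    fi
    # Write back through the original file so its permissions are kept.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

For example, as root: set_multiversion_kernels /etc/zypp/zypp.conf latest,latest-1,latest-2,latest-3,running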
If I remember correctly it started with the first of the update kernels. I will test again though.
There are tons of update kernels between *-150500.53.2 and *-150500.55.36. Apparently the update took only the latest one, and it was broken. You can install the older update kernels by specifying the version, e.g.

  zypper in --oldpackage kernel-default-optional-5.14.21-150500.55.31.1
Created attachment 871329 [details] rdosreport.txt from boot attempt with kernel version 5.14.21-150500.55.7.1
Created attachment 871330 [details] hwinfo for working kernel 5.14.21-150500.53
I was a bit unclear before. I had tested with the first kernel update that I could find which seems to be 5.14.21-150500.55.7.1. I have now retested with that kernel and it does not boot. I have added rdosreport.txt from this attempt. I also added hwinfo for the working kernel. The dmesg above is from the working kernel.
OK, thanks. So the breakage appeared already at the very first 15.5 update kernel.

To narrow it down further, let's try to swap the kernel modules. You can copy the kernel modules from the working kernel to the broken kernel, e.g.

  mkdir /lib/modules/5.14.21-150500.55.7-default/updates
  cp /lib/modules/5.14.21-150500.53-default/kernel/drivers/nvme/*/*.ko.zst /lib/modules/5.14.21-150500.55.7-default/updates
  depmod -a 5.14.21-150500.55.7-default
  dracut -f --kver 5.14.21-150500.55.7-default

and retest. This replaces only the nvme-* modules while keeping the rest. If this works, you can remove modules from */updates one by one to figure out which module broke.
I followed your instructions but the kernel still doesn't boot.
It means that it's triggered by changes in other parts, e.g. PCI core. Did you test only with pcie_aspm=off boot option?
I tested now with only pcie_aspm=off and the kernel booted up.
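When reporting results like this, it can help to confirm which options the kernel was actually booted with. A minimal sketch, assuming the command line is read from /proc/cmdline; `has_boot_param` is a hypothetical helper name, not an existing tool:

```shell
# Hypothetical helper: succeed if PARAM appears as a whole word in CMDLINE.
# Usage: has_boot_param "$(cat /proc/cmdline)" pcie_aspm=off
has_boot_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;   # parameter present as its own word
    esac
    return 1
}
```

For example: has_boot_param "$(cat /proc/cmdline)" pcie_aspm=off && echo "booted with pcie_aspm=off"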
Anything else I can provide to help troubleshoot this?
Could you verify whether the problem is present with the recent upstream kernels? Install kernel-default from the OBS Kernel:stable:Backport repo:

  http://download.opensuse.org/repositories/Kernel:/stable:/Backport/standard/

If it works, there may already be a fix upstream that we can backport to the SLE15-SP5 kernel.
Unfortunately I have another issue when trying to load the kernel from Backport. I can't add it to UEFI as it seems that the password I have saved does not match. This is not good at all, but it is a different problem.
Is Secure Boot disabled on BIOS?
I managed to get into UEFI by resetting the password. I have disabled Secure Boot, and the kernel from Backport boots up without any issues, as it seems.
(In reply to Anders Stedtlund from comment #16)
> I managed to get into UEFI by resetting the password. I have disabled secure
> boot and the kernel from backport do boot up without any issues as it seems.

OK, could you give the dmesg output from the recent upstream kernel, too?
Created attachment 871739 [details] dmesg after successful boot with 6.6.9-lp155.2.g61d1d44-default
Thanks. I still couldn't identify the cause (nor a possible upstream fix) yet.

Just to be sure, let's check whether the very latest SLE15-SP5 kernel still suffers from the problem. Please test the kernel in the OBS Kernel:SLE15-SP5 repo:

  http://download.opensuse.org/repositories/Kernel:/SLE15-SP5/pool/

Also, I'm building a test kernel with a few PCIe core patches reverted. It's being built in the OBS home:tiwai:bsc1218005 repo. Once the build finishes (it takes an hour or so), the package will appear at

  http://download.opensuse.org/repositories/home:/tiwai:/bsc1218005/pool/

If the above kernel from the OBS Kernel:SLE15-SP5 repo doesn't work, try mine later.

Note that those kernels are unofficial builds, hence you'd need to disable Secure Boot if it's turned on. Also, the kernel revisions may be lower than those of the official release kernels. You'd better increase the number of installable kernels beforehand by editing /etc/zypp/zypp.conf. Add more entries to multiversion.kernels, e.g.

  multiversion.kernels = latest,latest-1,latest-2,latest-3,running
I tested:
  kernel-default-5.14.21-150500.225.1.gcc7d8b6.x86_64
from:
  http://download.opensuse.org/repositories/Kernel:/SLE15-SP5/pool/
and:
  kernel-default-5.14.21-150500.1.1.ge47c72f.x86_64
from:
  http://download.opensuse.org/repositories/home:/tiwai:/bsc1218005/pool/

Both failed to boot.
Hmm, OK, then let's go back and verify the following:
- the 5.14.21-150500.53 kernel works as is, without any option
- the 5.14.21-150500.55.7 kernel boots only with pcie_aspm=off

If both are true, try to swap all modules of the latter kernel with those of the former in the following way:

% mv /lib/modules/5.14.21-150500.55.7-default /lib/modules/5.14.21-150500.55.7-default.old
% cp -a /lib/modules/5.14.21-150500.53-default /lib/modules/5.14.21-150500.55.7-default
% depmod -a 5.14.21-150500.55.7-default
% dracut -f --kver 5.14.21-150500.55.7-default

Then boot the *-55.7 kernel without extra options and check whether it boots. You'll get warnings about BTF, but those can be ignored.

If this boots, something in the modules is problematic. If it doesn't, some change in the built-in part of the kernel is problematic instead.
- 5.14.21-150500.53
  Boot OK!
- 5.14.21-150500.55.7
  Boot OK with pcie_aspm=off

Replace modules in 5.14.21-150500.55.7-default with modules from 5.14.21-150500.53-default.
Boot OK!
(In reply to Anders Stedtlund from comment #22)
> - 5.14.21-150500.53
> Boot OK!
> - 5.14.21-150500.55.7
> Boot OK with pcie_aspm=off
>
> Replace modules in 5.14.21-150500.55.7-default with modules from
> 5.14.21-150500.53-default.
> Boot OK!

That's an interesting result. So I was scratching the wrong surface. It might be that the previous test with module replacement didn't work properly.

The next step would be to identify which module actually breaks. It would be a great help for understanding what's going wrong.

You can copy the new modules back from the saved directory (*-55.7-default.old) to the *-55.7-default directory, piece by piece. For example, let's begin with the main suspect, the nvme driver modules:

% rm -r /lib/modules/5.14.21-150500.55.7-default/kernel/drivers/nvme
% cp -a /lib/modules/5.14.21-150500.55.7-default.old/kernel/drivers/nvme /lib/modules/5.14.21-150500.55.7-default/kernel/drivers/
% depmod -a 5.14.21-150500.55.7-default
% dracut -f --kver 5.14.21-150500.55.7-default

This puts all the nvme modules back to the *-55.7 versions. Retest with this.

If this breaks the boot, it's one (or more) of the nvme modules. You can then copy each *.ko.zst from the *-53-default directory in turn to narrow down the culprit.

OTOH, if replacing the nvme drivers doesn't break the boot, it's something else. Try replacing each directory until you hit the failure.
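Since the rm/cp pair above gets repeated for every directory being tested, it could be wrapped in a small function. This is a sketch only: `swap_module_dir` is a hypothetical name, it removes the target directory before copying, and it deliberately leaves depmod and dracut to be run by hand afterwards as in the comment above.

```shell
# Hypothetical helper: replace SUBDIR in the live module tree with the copy
# from a saved tree. depmod and dracut are left to the caller.
# Usage: swap_module_dir SAVED_TREE LIVE_TREE SUBDIR
swap_module_dir() {
    saved=$1
    live=$2
    sub=$3
    # Refuse an empty SUBDIR so the whole live tree is never removed.
    if [ -z "$sub" ] || [ ! -d "$saved/$sub" ]; then
        echo "no such directory: $saved/$sub" >&2
        return 1
    fi
    parent=$(dirname "$live/$sub")
    rm -rf "$live/$sub" &&
        mkdir -p "$parent" &&
        cp -a "$saved/$sub" "$parent/"
}
```

For example: swap_module_dir /lib/modules/5.14.21-150500.55.7-default.old /lib/modules/5.14.21-150500.55.7-default kernel/drivers/nvme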
Note that our interest is only about the actually loaded modules. You can check lsmod output on the working system, and check whether they are included in the directory you try to replace.
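The lsmod check could look roughly like this. It is a sketch under a few assumptions: the lsmod output of the working system was saved to a file first, module files end in .ko or .ko.<compression>, and `loaded_modules_in_dir` is a hypothetical helper name. File names are normalized because lsmod reports underscores where file names may use hyphens.

```shell
# Hypothetical helper: print the loaded modules that have a file under DIR.
# LSMOD_FILE holds saved `lsmod` output from the working system.
# Usage: loaded_modules_in_dir LSMOD_FILE DIR
loaded_modules_in_dir() {
    lsmod_file=$1
    dir=$2
    find "$dir" -type f -name '*.ko*' | while IFS= read -r path; do
        name=$(basename "$path")
        name=${name%%.ko*}                        # strip .ko, .ko.zst, ...
        name=$(printf '%s' "$name" | tr '-' '_')  # lsmod uses underscores
        # Skip the lsmod header line and compare against the first column.
        if awk -v m="$name" 'NR > 1 && $1 == m { found = 1 } END { exit !found }' "$lsmod_file"; then
            printf '%s\n' "$name"
        fi
    done | sort
}
```

For example: lsmod > /tmp/lsmod.txt; loaded_modules_in_dir /tmp/lsmod.txt /lib/modules/5.14.21-150500.53-default/kernel/drivers/pci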
This is the module that seems to be the culprit:

  /lib/modules/5.14.21-150500.55.7-default/kernel/drivers/pci/controller/vmd.ko.zst

If I copy vmd.ko.zst from 5.14.21-150500.53-default and keep all other modules from 5.14.21-150500.55.7-default, the kernel boots.
Thanks, that's great info! I didn't think of this stuff.
I'm building another test kernel with the revert of the problematic change in PCI/vmd. It's being built in the OBS home:tiwai:bsc1218005-2 repo. Once the build finishes, the package will appear at

  http://download.opensuse.org/repositories/home:/tiwai:/bsc1218005-2/pool/

Please give it a try later.
Also, yet another kernel is being built in OBS home:tiwai:bsc1218005-3 repo. This one is with more complete backports of PCI/vmd stuff instead of reverting the patch. Check this one later when you have time, too.
Unfortunately, none of those kernels boot.
OK thanks. Then it must be yet another patch touching PCI/vmd stuff. It's the only one left between *-53 and *-55.7. I'm building another kernel in OBS home:tiwai:bsc1218005-4 repo. Please give it a try later. BTW, such a problem might be dependent on the hot or cold boot. Make sure that you do cold boot after updating the kernel.
(In reply to Takashi Iwai from comment #30)
> OK thanks. Then it must be yet another patch touching PCI/vmd stuff. It's
> the only one left between *-53 and *-55.7.
>
> I'm building another kernel in OBS home:tiwai:bsc1218005-4 repo. Please
> give it a try later.
>
> BTW, such a problem might be dependent on the hot or cold boot. Make sure
> that you do cold boot after updating the kernel.

This kernel boots! Both hot and cold boot.

Btw, I had issues adding your latest repos, including this latest one. Could not find ./repo/repoinit.xml. YaST did not add them, so I had to go with zypper.
Thanks, finally nailed down. The problematic patch was the backport of the commit

  0a584655ef89541dae4d48d2c523b1480ae80284
  PCI: vmd: Fix secondary bus reset for Intel bridges

It's still not known which additional fix is missing (the backport of all PCI/vmd changes didn't seem to help, as in comment 20), so currently the patch is just reverted. The fix will likely be included in the regular update in February.

(In reply to Anders Stedtlund from comment #31)
> Btw, I had issue adding your latest repos, including this latest. Could not
> find ./repo/repoinit.xml. Yast did not add them, I had to go with zypper.

I don't know what's missing, but repomd.xml is present. Anyway, the repo is only for testing purposes and is not supposed to stay in your zypper repo list long term. Once you have installed the kernel, remove this repo (and keep the kernel until the next update).
Thank you for your support! Let me know if you want me to test anything before it becomes official. I think I will use kernels from Kernel:/stable:/Backport/standard/ for the time being.
SUSE-SU-2024:0469-1: An update that solves 19 vulnerabilities, contains eight features and has 41 security fixes can now be installed. Category: security (important) Bug References: 1065729, 1108281, 1141539, 1174649, 1181674, 1193285, 1194869, 1209834, 1210443, 1211515, 1212091, 1214377, 1215275, 1215885, 1216441, 1216559, 1216702, 1217895, 1217987, 1217988, 1217989, 1218005, 1218447, 1218527, 1218659, 1218713, 1218723, 1218730, 1218738, 1218752, 1218757, 1218768, 1218778, 1218779, 1218804, 1218832, 1218836, 1218916, 1218948, 1218958, 1218968, 1218997, 1219006, 1219012, 1219013, 1219014, 1219053, 1219067, 1219120, 1219128, 1219136, 1219285, 1219349, 1219412, 1219429, 1219434, 1219490, 1219512, 1219568, 1219582 CVE References: CVE-2021-33631, CVE-2023-46838, CVE-2023-47233, CVE-2023-4921, CVE-2023-51042, CVE-2023-51043, CVE-2023-51780, CVE-2023-51782, CVE-2023-6040, CVE-2023-6356, CVE-2023-6531, CVE-2023-6535, CVE-2023-6536, CVE-2023-6915, CVE-2024-0565, CVE-2024-0641, CVE-2024-0775, CVE-2024-1085, CVE-2024-1086 Jira References: PED-4729, PED-6694, PED-7322, PED-7615, PED-7616, PED-7620, PED-7622, PED-7623 Sources used: openSUSE Leap 15.5 (src): kernel-livepatch-SLE15-SP5-RT_Update_10-1-150500.11.5.1, kernel-source-rt-5.14.21-150500.13.35.1, kernel-syms-rt-5.14.21-150500.13.35.1 SUSE Linux Enterprise Live Patching 15-SP5 (src): kernel-livepatch-SLE15-SP5-RT_Update_10-1-150500.11.5.1 SUSE Real Time Module 15-SP5 (src): kernel-source-rt-5.14.21-150500.13.35.1, kernel-syms-rt-5.14.21-150500.13.35.1 NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
SUSE-SU-2024:0516-1: An update that solves 21 vulnerabilities, contains nine features and has 40 security fixes can now be installed. Category: security (important) Bug References: 1065729, 1108281, 1141539, 1174649, 1181674, 1193285, 1194869, 1209834, 1210443, 1211515, 1212091, 1214377, 1215275, 1215885, 1216441, 1216559, 1216702, 1217895, 1217987, 1217988, 1217989, 1218005, 1218447, 1218527, 1218659, 1218689, 1218713, 1218723, 1218730, 1218752, 1218757, 1218768, 1218778, 1218779, 1218804, 1218832, 1218836, 1218916, 1218948, 1218958, 1218968, 1218997, 1219006, 1219012, 1219013, 1219014, 1219053, 1219067, 1219120, 1219128, 1219136, 1219285, 1219349, 1219412, 1219429, 1219434, 1219490, 1219512, 1219568, 1219582, 1219608 CVE References: CVE-2021-33631, CVE-2023-46838, CVE-2023-47233, CVE-2023-4921, CVE-2023-51042, CVE-2023-51043, CVE-2023-51780, CVE-2023-51782, CVE-2023-6040, CVE-2023-6356, CVE-2023-6531, CVE-2023-6535, CVE-2023-6536, CVE-2023-6915, CVE-2024-0340, CVE-2024-0565, CVE-2024-0641, CVE-2024-0775, CVE-2024-1085, CVE-2024-1086, CVE-2024-24860 Jira References: PED-4729, PED-6694, PED-7322, PED-7615, PED-7616, PED-7618, PED-7620, PED-7622, PED-7623 Sources used: openSUSE Leap 15.5 (src): kernel-livepatch-SLE15-SP5_Update_10-1-150500.11.5.1, kernel-source-5.14.21-150500.55.49.1, kernel-default-base-5.14.21-150500.55.49.1.150500.6.21.2, kernel-obs-build-5.14.21-150500.55.49.1, kernel-syms-5.14.21-150500.55.49.1, kernel-obs-qa-5.14.21-150500.55.49.1 SUSE Linux Enterprise Micro 5.5 (src): kernel-default-base-5.14.21-150500.55.49.1.150500.6.21.2 Basesystem Module 15-SP5 (src): kernel-source-5.14.21-150500.55.49.1, kernel-default-base-5.14.21-150500.55.49.1.150500.6.21.2 Development Tools Module 15-SP5 (src): kernel-obs-build-5.14.21-150500.55.49.1, kernel-source-5.14.21-150500.55.49.1, kernel-syms-5.14.21-150500.55.49.1 SUSE Linux Enterprise Live Patching 15-SP5 (src): kernel-livepatch-SLE15-SP5_Update_10-1-150500.11.5.1 NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
SUSE-SU-2024:0514-1: An update that solves 21 vulnerabilities, contains nine features and has 41 security fixes can now be installed. Category: security (important) Bug References: 1065729, 1108281, 1141539, 1174649, 1181674, 1193285, 1194869, 1209834, 1210443, 1211515, 1212091, 1214377, 1215275, 1215885, 1216441, 1216559, 1216702, 1217895, 1217987, 1217988, 1217989, 1218005, 1218447, 1218527, 1218659, 1218689, 1218713, 1218723, 1218730, 1218738, 1218752, 1218757, 1218768, 1218778, 1218779, 1218804, 1218832, 1218836, 1218916, 1218948, 1218958, 1218968, 1218997, 1219006, 1219012, 1219013, 1219014, 1219053, 1219067, 1219120, 1219128, 1219136, 1219285, 1219349, 1219412, 1219429, 1219434, 1219490, 1219512, 1219568, 1219582, 1219608 CVE References: CVE-2021-33631, CVE-2023-46838, CVE-2023-47233, CVE-2023-4921, CVE-2023-51042, CVE-2023-51043, CVE-2023-51780, CVE-2023-51782, CVE-2023-6040, CVE-2023-6356, CVE-2023-6531, CVE-2023-6535, CVE-2023-6536, CVE-2023-6915, CVE-2024-0340, CVE-2024-0565, CVE-2024-0641, CVE-2024-0775, CVE-2024-1085, CVE-2024-1086, CVE-2024-24860 Jira References: PED-4729, PED-6694, PED-7322, PED-7615, PED-7616, PED-7618, PED-7620, PED-7622, PED-7623 Sources used: openSUSE Leap 15.5 (src): kernel-source-azure-5.14.21-150500.33.34.1, kernel-syms-azure-5.14.21-150500.33.34.1 Public Cloud Module 15-SP5 (src): kernel-source-azure-5.14.21-150500.33.34.1, kernel-syms-azure-5.14.21-150500.33.34.1 NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.