Bugzilla – Bug 1217344
Linux 6.6.x fails to boot
Last modified: 2024-06-25 18:01:11 UTC
The default 6.6.1 kernel installed on my Tumbleweed system does not boot. Upon selecting it in GRUB, I see:
"Loading Linux 6.6.1-1-default ..."
"Loading initial ramdisk ..."
and the system hangs on this screen forever. My workstation's diagnostic LEDs turn on in a pattern that, according to the service manual, indicates an "other type of failure".
The workstation is a Dell T5600.
CPU: 2x Intel Xeon E5-2680 (v1)
RAM: 8x 8 GB DDR3 1600 MHz
GPU: Nvidia NVS510, I'm using nouveau drivers.
My kernel cmdline:
root=UUID=5136bb35-acaa-4796-b463-8c9ef307025d splash=silent quiet mitigations=auto
The latest default 6.5.9 kernel works fine on this machine.
I'm available for all kinds of testing/debugging, including kernel recompilation with experimental patches and the like.
Could you try to boot without native graphics via the nomodeset boot option? If it boots, the problem is in the graphics driver. If it still doesn't boot, it's something else.
(In reply to Takashi Iwai from comment #1)
> Could you try to boot without native graphics via the nomodeset boot option?
> If it boots, the problem is in the graphics driver. If it still doesn't
> boot, it's something else.
Sorry, I forgot to mention I have already tried this, and it did not boot.
(In reply to Tommaso Fonda from comment #2)
> (In reply to Takashi Iwai from comment #1)
> > Could you try to boot without native graphics via the nomodeset boot option?
> > If it boots, the problem is in the graphics driver. If it still doesn't
> > boot, it's something else.
>
> Sorry, I forgot to mention I have already tried this, and it did not boot.
Hm, then keep nomodeset and also drop the quiet and splash=silent options. This will show more lines at boot, and we may see at which line it hangs.
If this doesn't show any useful lines, we might need to enable earlyprintk.
... or it might have something to do with CPU microcode loading. In that case, you can try passing the "dis_ucode_ldr" boot option to disable the CPU microcode loader.
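For anyone following along, the suggested change to the boot options can be sketched in shell. This is only an illustration: the starting command line is the one from the original report, and debug_cmdline is a hypothetical helper name, not an existing tool. On a real system the same edit is usually made by hand in the GRUB menu editor.

```shell
# Command line from the original report.
orig='root=UUID=5136bb35-acaa-4796-b463-8c9ef307025d splash=silent quiet mitigations=auto'

# Hypothetical helper: drop quiet and splash=..., then append nomodeset.
debug_cmdline() {
    out=''
    for arg in $1; do
        case "$arg" in
            quiet|splash=*|nomodeset) ;;        # dropped (nomodeset is re-added once below)
            *) out="${out:+$out }$arg" ;;       # everything else is kept in order
        esac
    done
    printf '%s nomodeset\n' "$out"
}

debug_cmdline "$orig"
# prints: root=UUID=5136bb35-acaa-4796-b463-8c9ef307025d mitigations=auto nomodeset
```

Further options such as dis_ucode_ldr can be appended to the resulting string in the same way.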
(In reply to Takashi Iwai from comment #3)
> Hm, then keep nomodeset and also drop the quiet and splash=silent options. This
> will show more lines at boot, and we may see at which line it hangs.
>
> If this doesn't show any useful lines, we might need to enable earlyprintk.
>
> ... or it might have something to do with CPU microcode loading. In that
> case, you can try passing the "dis_ucode_ldr" boot option to disable the CPU
> microcode loader.
nomodeset without quiet and splash=silent changes nothing.
Adding dis_ucode_ldr changes nothing.
I've tried enabling earlyprintk (not sure I did it the right way) by removing quiet and splash=silent and adding
earlyprintk=vga,keep
and
earlyprintk=serial,ttyS0,keep
and this too changed nothing. During this earlyprintk test, I forgot to add nomodeset, though.
From my previous comment it sounds like I added both the vga and serial earlyprintk parameters at the same time. This is not the case: I've tested one first, and then the other, to no avail.
The same occurs on Linux 6.6.2 too.
Sorry for the bump. Is there anything else we can try to debug this issue, or shall I start bisecting all the patches in the 6.5 -> 6.6 chain to find the problematic one? It sounds very time-consuming, but if it's my only option, I'll have to do it...
First, check whether the latest 6.7-rc still suffers from the same problem. If it boots, the problem is specific to 6.6.x, and we need to find an upstream fix.
If 6.7-rc still doesn't boot, it has to be reported upstream. But without a proper log, it's a bit difficult to know whom to report it to. I guess you'd be asked to perform a git bisect in such a case.
I also have a kernel failure with the latest Tumbleweed kernel. When installing, it runs dracut for every single core. The system boots, but when I want to log into KDE, it hangs forever. The console reports a failure to start powerdevil (for whatever reason). So already the kernel install procedure seems fishy.
Again, kernel 6.5.9 just works fine: it boots and KDE runs fine.
I have a Dell Latitude 7490.
So even if it starts, I think kernel 6.6 has trouble interacting with KDE. There seems to be something fundamentally wrong.
BTW, when installed for the first time, the system started and ran fine, even after reboot. I haven't done any install since then. Now I have updated to the latest Tumbleweed 2023-12-02. That runs fine with kernel 6.5.9 but does not work with kernel 6.6.
I think this is a critical bug.
(In reply to Takashi Iwai from comment #8)
> First, check whether the latest 6.7-rc still suffers from the same
> problem. If it boots, the problem is specific to 6.6.x, and we need to find
> an upstream fix.
>
> If 6.7-rc still doesn't boot, it has to be reported upstream. But
> without a proper log, it's a bit difficult to know whom to report it to. I
> guess you'd be asked to perform a git bisect in such a case.
Thanks, I will build 6.7-rc as soon as possible and let you know.
(In reply to Rigo Wenning from comment #9)
> I also have a kernel failure with the latest Tumbleweed kernel. When
> installing, it runs dracut for every single core. The system boots, but when
> I want to log into KDE, it hangs forever. The console reports a failure to
> start powerdevil (for whatever reason). So already the kernel install
> procedure seems fishy.
>
> Again, kernel 6.5.9 just works fine: it boots and KDE runs fine.
>
> I have a Dell Latitude 7490.
>
> So even if it starts, I think kernel 6.6 has trouble interacting with KDE.
> There seems to be something fundamentally wrong.
>
> BTW, when installed for the first time, the system started and ran fine,
> even after reboot. I haven't done any install since then. Now I have updated
> to the latest Tumbleweed 2023-12-02. That runs fine with kernel 6.5.9 but
> does not work with kernel 6.6.
>
> I think this is a critical bug.
If it boots at all, yours is likely a different issue. There have already been a few different boot problems with 6.6.x.
Make sure that your machine still has the boot problem with the latest kernel in the OBS Kernel:stable repo, too. If so, check whether it shows the very same symptom (no relevant messages even if you boot with the nomodeset boot option).
(In reply to Tommaso Fonda from comment #10)
> (In reply to Takashi Iwai from comment #8)
> > First, check whether the latest 6.7-rc still suffers from the same
> > problem. If it boots, the problem is specific to 6.6.x, and we need to find
> > an upstream fix.
> >
> > If 6.7-rc still doesn't boot, it has to be reported upstream. But
> > without a proper log, it's a bit difficult to know whom to report it to. I
> > guess you'd be asked to perform a git bisect in such a case.
>
> Thanks, I will build 6.7-rc as soon as possible and let you know.
There is already a kernel package for 6.7-rc available in the OBS Kernel:HEAD repo. It can be tested quickly, as can the kernel in the OBS Kernel:stable repo.
I suppose you've tested only the Leap standard kernels so far, right? I.e. you didn't build kernels by yourself?
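A rough sketch of how a kernel from the OBS Kernel:HEAD repo might be installed with zypper. The repository URL and the alias kernel-head are assumptions and should be checked against the OBS project page before use. install_head_kernel is a hypothetical helper name, and the ZYPPER variable exists only so the commands can be previewed without root.

```shell
# Assumed repository URL and alias: verify both before running for real.
REPO_URL='https://download.opensuse.org/repositories/Kernel:/HEAD/standard/'
REPO_ALIAS='kernel-head'
# Set ZYPPER=echo to print the commands instead of executing them.
ZYPPER=${ZYPPER:-sudo zypper}

# Hypothetical helper: add the repo, refresh it, and install its kernel-default.
install_head_kernel() {
    $ZYPPER addrepo --refresh "$REPO_URL" "$REPO_ALIAS" &&
    $ZYPPER refresh "$REPO_ALIAS" &&
    $ZYPPER install --from "$REPO_ALIAS" kernel-default
}
```

Running `ZYPPER=echo; install_head_kernel` lists the three zypper invocations without touching the system. The same pattern should apply to Kernel:stable with a different URL and alias.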
(In reply to Takashi Iwai from comment #12)
> There is already a kernel package for 6.7-rc available in the OBS Kernel:HEAD
> repo.
> It can be tested quickly, as can the kernel in the OBS Kernel:stable repo.
>
> I suppose you've tested only the Leap standard kernels so far, right? I.e.
> you didn't build kernels by yourself?
Nice to know! I'll test it later today. So far, I've only tested official Tumbleweed 6.6.x kernels. I did not build 6.6.x myself yet.
Kernel 6.6.4 installed. When booting, the system boots normally into the X login screen. When I try to log in to the KDE session, the background images of the Plasma screen appear, but the desktop remains empty with only the mouse pointer shown.
With kernel 6.5.9 the system starts normally. I will create another bug report, as you said it's unrelated and I did not find a similar bug.
(In reply to Rigo Wenning from comment #14)
> Kernel 6.6.4 installed. When booting, the system boots normally into the X
> login screen. When I try to log in to the KDE session, the background
> images of the Plasma screen appear, but the desktop remains empty with only
> the mouse pointer shown.
>
> With kernel 6.5.9 the system starts normally. I will create another bug
> report, as you said it's unrelated and I did not find a similar bug.
Then it's a different bug. Please create another one.
(In reply to Takashi Iwai from comment #12)
> There is already a kernel package for 6.7-rc available in the OBS Kernel:HEAD
> repo.
> It can be tested quickly, as can the kernel in the OBS Kernel:stable repo.
>
> I suppose you've tested only the Leap standard kernels so far, right? I.e.
> you didn't build kernels by yourself?
6.7-rc4 from Kernel:HEAD doesn't boot either. Bisection, here I come... I guess it makes no sense to report this upstream without any log, right? I shall bisect and find the problematic patch in advance.
Yeah, even if you can't chase it to the end, it'd be helpful to narrow down the regression range.
I'm interested in whether you can boot your locally built 6.5.x kernel properly. It might be something else, such as grub.
Below you can find some info on reducing the kernel build time:
https://docs.kernel.org/admin-guide/quickly-build-trimmed-linux.html
This will help especially for git bisection.
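As a minimal sketch, a bisection over this range could be started as follows, run from inside a kernel git tree. The tags v6.5 and v6.6 are assumptions: they presume the regression reproduces with the plain upstream releases rather than only with the 6.5.9 and 6.6.1 packages tested above. start_bisect is a hypothetical helper name, and the GIT variable only exists to allow a dry run.

```shell
# Set GIT=echo to print the commands instead of running them.
GIT=${GIT:-git}

# Hypothetical helper: mark the first argument as bad and the second as good.
# Defaults are the assumed upstream tags v6.6 (bad) and v6.5 (good).
start_bisect() {
    $GIT bisect start &&
    $GIT bisect bad "${1:-v6.6}" &&
    $GIT bisect good "${2:-v6.5}"
}

# After building and booting each kernel that git checks out, record the result:
#   git bisect good    # the kernel booted
#   git bisect bad     # the kernel hung after "Loading initial ramdisk ..."
# and leave bisect mode at the end with:
#   git bisect reset
```

Each good/bad answer roughly halves the remaining range, so the number of builds grows with the logarithm of the commit count, which is why a trimmed kernel configuration pays off here.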
(In reply to Takashi Iwai from comment #17)
> Yeah, even if you can't chase it to the end, it'd be helpful to narrow down
> the regression range.
>
> I'm interested in whether you can boot your locally built 6.5.x kernel
> properly. It might be something else, such as grub.
>
> Below you can find some info on reducing the kernel build time:
> https://docs.kernel.org/admin-guide/quickly-build-trimmed-linux.html
> This will help especially for git bisection.
I've been building my custom kernels for a long time, and my custom 6.5.x boots fine (just like TW's 6.5.x). I'll keep you updated.
Just as a guess, this might be related: https://bugzilla.kernel.org/show_bug.cgi?id=218173#c20
(In reply to Frank Krüger from comment #19)
> Just as a guess, this might be related:
> https://bugzilla.kernel.org/show_bug.cgi?id=218173#c20
Thanks, that looks suspicious, indeed. FWIW, the corresponding upstream commit is:
a1b87d54f4e45ff5e0d081fb1d9db3bf1a8fb39a
x86/efistub: Avoid legacy decompressor when doing EFI boot
Tommaso, could you try to revert the commit? It'll lead to a conflict in arch/x86/include/asm/efi.h, but it should be trivially resolvable.
Meanwhile I'm building a test kernel package with the revert in the OBS home:tiwai:bsc1217344 repo.
Once we confirm it's the same problem, you can join the upstream bugzilla entry to help them resolve the issue properly.
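A minimal sketch of the revert test, assuming a checkout of a 6.6.x tree that contains the suspect commit. revert_suspect is a hypothetical helper name, and the GIT variable is only there to allow a dry run.

```shell
# Set GIT=echo to print the command instead of running it.
GIT=${GIT:-git}
# Commit hash as given in the comment above.
SUSPECT=a1b87d54f4e45ff5e0d081fb1d9db3bf1a8fb39a

# Hypothetical helper: revert the suspect commit on the current branch.
revert_suspect() {
    $GIT revert --no-edit "$SUSPECT"
}

# If the revert stops on the conflict in arch/x86/include/asm/efi.h,
# resolve it in an editor and then finish with:
#   git add arch/x86/include/asm/efi.h
#   git revert --continue
```

The resulting tree can then be built and booted like any other test kernel to see whether the hang disappears.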
(In reply to Takashi Iwai from comment #20)
> (In reply to Frank Krüger from comment #19)
> > Just as a guess, this might be related:
> > https://bugzilla.kernel.org/show_bug.cgi?id=218173#c20
>
> Thanks, that looks suspicious, indeed.
> FWIW, the corresponding upstream commit is:
> a1b87d54f4e45ff5e0d081fb1d9db3bf1a8fb39a
> x86/efistub: Avoid legacy decompressor when doing EFI boot
>
> Tommaso, could you try to revert the commit?
> It'll lead to a conflict in arch/x86/include/asm/efi.h, but it should be
> trivially resolvable.
>
> Meanwhile I'm building a test kernel package with the revert in the OBS
> home:tiwai:bsc1217344 repo.
>
> Once we confirm it's the same problem, you can join the upstream
> bugzilla entry to help them resolve the issue properly.
Yes!!! Reverting that commit fixed the issue. I'll leave a message in the upstream bug report right now.
The upstream fix commit 50d7cdf7a9b1 has landed in Linus' tree:
efi/x86: Avoid physical KASLR on older Dell systems
I'm going to backport the fix.
(In reply to Takashi Iwai from comment #22)
> The upstream fix commit 50d7cdf7a9b1 has landed in Linus' tree:
> efi/x86: Avoid physical KASLR on older Dell systems
>
> I'm going to backport the fix.
JFYI: The fix is in kernel-default-6.6.7-1.1.g6869d09.x86_64 from Kernel:stable.
Let's close.