Bug 1213645 - Kernel 6.4.* hangs at boot
Summary: Kernel 6.4.* hangs at boot
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
Version: Current
Hardware: x86-64 openSUSE Tumbleweed
Priority: P5 - None, Severity: Normal
Target Milestone: ---
Assignee: openSUSE Kernel Bugs
QA Contact: E-mail List
Reported: 2023-07-25 13:32 UTC by Jakob Lorenz
Modified: 2024-06-25 17:51 UTC



Attachments
kernel-default-6.4.4-1.1 recovery mode log (965.36 KB, image/jpeg)
2023-07-25 13:32 UTC, Jakob Lorenz
boot with "initcall_debug" "3" (1.78 MB, image/jpeg)
2023-07-26 15:05 UTC, Jakob Lorenz
dmesg - Kernel 6.4.6-2.g7f751cb-default (254.39 KB, text/plain)
2023-07-26 15:12 UTC, Jakob Lorenz
output of dmidecode (17.87 KB, text/plain)
2023-07-26 15:53 UTC, Jakob Lorenz
Fix patch (1.22 KB, patch)
2023-07-26 18:03 UTC, Takashi Iwai
dmesg - Kernel 6.5.0-rc3-3.g9ba70bb-default (78.74 KB, text/plain)
2023-07-29 08:42 UTC, Jakob Lorenz

Description Jakob Lorenz 2023-07-25 13:32:40 UTC
Created attachment 868423
kernel-default-6.4.4-1.1 recovery mode log

kernel-default 6.4.* hangs at boot.

The screen only shows:

ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PC00.MHBR], AE_NOT_FOUND (20221020/psargs-330)
ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PTID.PBAR], AE_NOT_FOUND (20221020/dsfield-500)

When I boot in recovery mode, the last line it shows is:

[full log attached]
input: ETPS/2 Elantech Touchpad as /devices/platform/i8042/serio1/input/input2 


Device: TUXEDO InfinityBook S 15 Gen7
CPU: Intel i5-1240P
Comment 1 Takashi Iwai 2023-07-25 14:12:07 UTC
Could you try to drop "splash=silent" and "quiet" boot options, and add "initcall_debug" and "3" options, and check the boot log?
This should show the boot steps more verbosely.
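For illustration, the option change requested above can be sketched as a small Python helper that rewrites a kernel command line string. This is only a sketch: the function name `adjust_cmdline` and the sample command line are hypothetical, and the whitespace split is naive (it would mishandle quoted values that contain spaces). The resulting string would typically be entered by hand on the `linux` line of the boot loader entry.

```python
def adjust_cmdline(cmdline,
                   drop=("splash=silent", "quiet"),
                   add=("initcall_debug", "3")):
    """Return `cmdline` without the `drop` options and with `add` appended.

    Hypothetical helper: options are split on whitespace, so quoted
    values containing spaces are not supported.
    """
    # Keep every option that is not in the drop list.
    tokens = [tok for tok in cmdline.split() if tok not in drop]
    # Append each requested option once, preserving the given order.
    for opt in add:
        if opt not in tokens:
            tokens.append(opt)
    return " ".join(tokens)


# Sample (made-up) command line, before and after the change.
example = "root=UUID=abcd splash=silent quiet mitigations=auto"
print(adjust_cmdline(example))
```

Running the helper a second time on its own output leaves the string unchanged, so it is safe to apply repeatedly.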
Comment 2 Jakob Lorenz 2023-07-26 13:39:17 UTC
(In reply to Takashi Iwai from comment #1)
> Could you try to drop "splash=silent" and "quiet" boot options, and add
> "initcall_debug" and "3" options, and check the boot log?
> This should show the boot steps more verbosely.

Unfortunately, this results in the same output as in the attachment.
Are there other options I can try?
Comment 3 Takashi Iwai 2023-07-26 13:41:57 UTC
(In reply to Jakob Lorenz from comment #2)
> (In reply to Takashi Iwai from comment #1)
> > Could you try to drop "splash=silent" and "quiet" boot options, and add
> > "initcall_debug" and "3" options, and check the boot log?
> > This should show the boot steps more verbosely.
> 
> Unfortunately results in the same output as in the attachment.

The output must be different, even if it stops at some point.  When initcall_debug is passed, the kernel prints much more output.
And, please test with the normal boot, not the recovery entry.

> Are there other options I can try?
Comment 4 Takashi Iwai 2023-07-26 13:45:41 UTC
In any case, please try the latest kernel from the OBS Kernel:stable repo first.

Note that it's an unofficial build, hence you'd need to turn off Secure Boot if it's enabled in BIOS.
Comment 5 Jakob Lorenz 2023-07-26 15:05:10 UTC
Created attachment 868435
boot with "initcall_debug" "3"
Comment 6 Jakob Lorenz 2023-07-26 15:05:58 UTC
(In reply to Takashi Iwai from comment #3)
> The output must be different, even if it stops at some point.  When
> initcall_debug is passed, it'll show much more outputs.
> And, please test with the normal boot, not the recovery entry.

I checked it again and it produces the same output... I have attached it again anyway.
Comment 7 Jakob Lorenz 2023-07-26 15:12:11 UTC
(In reply to Takashi Iwai from comment #4)
> In anyway, please try the latest kernel in OBS Kernel:stable repo at first.
> 
> Note that it's an unofficial build, hence you'd need to turn off Secure Boot
> if it's enabled in BIOS.

Kernel-default from OBS Kernel:stable hangs for 30s, but boots.
However, I get many

[full log attached]
i915 0000:00:02.0: [drm] *ERROR* Error on Pipe B: 0x00000200

errors.
Comment 8 Jakob Lorenz 2023-07-26 15:12:56 UTC
Created attachment 868436
dmesg - Kernel 6.4.6-2.g7f751cb-default
Comment 9 Takashi Iwai 2023-07-26 15:36:28 UTC
OK, this is basically a firmware bug regarding TPM.

Recent kernels test the TPM IRQ, and this firmware doesn't work with that test.
The last few kernels had no proper timeout there, which caused the boot hang.  The newer kernel now has a fallback after too many missing IRQs, so it at least boots.

You can see it in the log:
[    1.380363] tpm tpm0: [Firmware Bug]: TPM interrupt storm detected, polling instead
[    1.380379] tpm tpm0: Consider adding the following entry to tpm_tis_dmi_table:
[    1.380385] tpm tpm0: 	DMI_SYS_VENDOR: TUXEDO
[    1.380389] tpm tpm0: 	DMI_PRODUCT_VERSION: Not Applicable

As a workaround, please try to pass the boot option
  tpm_tis.interrupts=0

This should work with the other 6.4.x kernels, too.
You can now drop the boot options added earlier for debugging.
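To check a saved dmesg log for the same symptom, a minimal Python sketch along the following lines could be used. It assumes the message format shown in the log excerpt above; the function name `scan_tpm_storm` and the regular expressions are illustrative and not part of any kernel tooling.

```python
import re

# Patterns modelled on the log lines quoted in this comment (assumed format).
STORM_RE = re.compile(r"tpm \S+: \[Firmware Bug\]: TPM interrupt storm detected")
DMI_HINT_RE = re.compile(r"tpm \S+: \s*(DMI_[A-Z_]+): (.+)$")


def scan_tpm_storm(log_text):
    """Return (storm_detected, dmi_hints) for the given dmesg text.

    dmi_hints maps the DMI field names printed by the driver
    (e.g. DMI_SYS_VENDOR) to the values from the log.
    """
    storm_detected = False
    dmi_hints = {}
    for line in log_text.splitlines():
        if STORM_RE.search(line):
            storm_detected = True
        match = DMI_HINT_RE.search(line)
        if match:
            dmi_hints[match.group(1)] = match.group(2).strip()
    return storm_detected, dmi_hints
```

Feeding it the text of the attached dmesg file (or the output of `dmesg`) would report whether the driver fell back to polling and which DMI fields it suggested for the quirk table.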
Comment 10 Takashi Iwai 2023-07-26 15:37:10 UTC
(In reply to Jakob Lorenz from comment #7)
> However, I get many
> 
> [full log attached]
> i915 0000:00:02.0: [drm] *ERROR* Error on Pipe B: 0x00000200
> 
> errors.

This is a different bug, and it's about the graphics stuff.
But let's first see whether the TPM workaround changes anything.
Comment 11 Takashi Iwai 2023-07-26 15:46:00 UTC
Also, could you give the output of dmidecode (run as root)?
If the command isn't present, install the dmidecode package.
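As a quick cross-check, the DMI strings that the TPM driver printed in the log are also exposed by the kernel under `/sys/class/dmi/id`. The Python sketch below reads a few of them; it is not a replacement for the full dmidecode output requested here, and the function name `read_dmi_fields` and the chosen field list are assumptions made for illustration.

```python
from pathlib import Path


def read_dmi_fields(base="/sys/class/dmi/id",
                    fields=("sys_vendor", "product_name", "product_version")):
    """Read selected DMI identification strings from sysfs.

    Returns a dict mapping each field name to its value, or to None
    if the file is missing or not readable by the current user.
    """
    result = {}
    for name in fields:
        try:
            result[name] = (Path(base) / name).read_text().strip()
        except OSError:
            result[name] = None
    return result
```

On the affected machine, `read_dmi_fields()` should return values matching the `DMI_SYS_VENDOR` and `DMI_PRODUCT_VERSION` lines printed by the TPM driver.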
Comment 12 Jakob Lorenz 2023-07-26 15:50:46 UTC
> As a workaround, please try to pass the boot option
>   tpm_tis.interrupts=0
> 
> This should work with the other 6.4.x kernels, too.
> You can drop other previous options for debugging now.

With this option, kernel 6.4.4-1-default boots without problems. Thanks for your help.
Will there be a permanent solution to the problem?
Comment 13 Takashi Iwai 2023-07-26 15:52:33 UTC
Please give the dmidecode output.  Then I can cook a patch to apply the workaround statically in the kernel.
Comment 14 Jakob Lorenz 2023-07-26 15:53:10 UTC
Created attachment 868438
output of dmidecode

Output of dmidecode attached
Comment 15 Takashi Iwai 2023-07-26 16:05:06 UTC
Thanks!

I'm now building a test kernel with the fix patch in the OBS home:tiwai:bsc1213645 repo.  Once the build finishes (it takes an hour or so), could you try that kernel without the boot option?

If it's confirmed to work, I'll submit the fix upstream and backport it to the TW kernel.
Comment 16 Jakob Lorenz 2023-07-26 17:03:39 UTC
Works perfectly, thank you for your quick help.
Comment 17 Takashi Iwai 2023-07-26 18:03:20 UTC
Good to hear!
I have now submitted the patch and backported it to the TW stable branch.
Until my fix gets merged and released, you can keep using my test kernel.
Comment 18 Takashi Iwai 2023-07-26 18:03:42 UTC
Created attachment 868439
Fix patch
Comment 19 Takashi Iwai 2023-07-29 07:16:47 UTC
Upstream asked about the behavior with the latest 6.5-rc kernel.
Could you quickly test whether the problem also occurs with the kernel from the OBS Kernel:HEAD repo?
Comment 20 Jakob Lorenz 2023-07-29 08:41:51 UTC
(In reply to Takashi Iwai from comment #19)
> The upstream asked about the behavior with the latest 6.5-rc kernel.
> Could you test quickly whether the problem hits on the kernel from OBS
> Kernel:HEAD repo?

On the first boot the kernel hung for 30s, but on the following boots it hung forever again :/
Comment 21 Jakob Lorenz 2023-07-29 08:42:48 UTC
Created attachment 868497
dmesg - Kernel 6.5.0-rc3-3.g9ba70bb-default
Comment 22 Takashi Iwai 2023-07-29 08:44:52 UTC
Thanks for the quick testing.  I should have mentioned that the failure is expected: Kernel:HEAD doesn't contain my fix patch, so this confirms that the problem persists in the latest upstream and that my fix is still necessary.