|
Bugzilla – Full Text Bug Listing |
| Summary: | Kernel boot crashes on Thinkpad P14s Gen 3 AMD | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Distribution | Reporter: | Takashi Iwai <tiwai> |
| Component: | Xen | Assignee: | Jürgen Groß <jgross> |
| Status: | NEW --- | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | jgross, santiago.zarate |
| Version: | Leap 15.6 | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
dmesg with crash of Leap 15.6 kernel
dmesg from TW kernel
Debug patch
dmesg from the patched 6.9.7 kernel
Debug patch V2
dmesg from the v2 patched 6.9.7 kernel
logs from xen and normal boots
acpidump output
hwinfo output |
||
Created attachment 875833 [details]
dmesg from TW kernel
Related report on the net:
https://forum.qubes-os.org/t/kernel-panic-during-installation-on-lenovo-thinkpad-p16s-gen-2-amd-7840u-with-780m-igpu/21950
Created attachment 875846 [details]
Debug patch
Could you try booting with this patch applied to your kernel? You also need to add "xen_mc_debug" to the kernel command line.
The kernel log should then contain more data to help narrow down the root cause.
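Before collecting the log, it may be worth confirming that the parameter actually reached the kernel. The following is only a minimal sketch of such a check: the helper name `has_kernel_param` is made up for illustration, and it splits on whitespace, so quoted values containing spaces are not handled. It treats `-` and `_` in parameter names as equivalent, as the kernel's own parameter matching does.

```python
def has_kernel_param(cmdline: str, name: str) -> bool:
    """Return True if `name` appears as a parameter in a kernel command line.

    Hypothetical helper for illustration only. Dashes and underscores in
    parameter names are treated as equivalent.
    """
    wanted = name.replace("-", "_")
    for token in cmdline.split():
        # "key=value" and bare "key" are both accepted.
        key = token.split("=", 1)[0].replace("-", "_")
        if key == wanted:
            return True
    return False


def running_kernel_has_param(name: str, path: str = "/proc/cmdline") -> bool:
    """Check the command line of the currently running kernel."""
    with open(path) as f:
        return has_kernel_param(f.read(), name)
```

On the booted test system, `running_kernel_has_param("xen_mc_debug")` should return `True` if the option was passed on the dom0 kernel line of the boot entry.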
Created attachment 875854 [details]
dmesg from the patched 6.9.7 kernel
The above is the log from the patched kernel. At this time, it was called with nomodeset, but it shouldn't matter. The bug happens right after modprobe of ucsi_acpi module.
As far as I understand, the second Oops ("BUG: unable to handle page fault for address: ffffc90040715100") happened at reading a byte value via ACPI_GET8(logical_addr_ptr) in acpi_ex_system_memory_space_handler().
(In reply to Takashi Iwai from comment #5)
> The above is the log from the patched kernel. At this time, it was called
> with nomodeset, but it shouldn't matter. The bug happens right after
> modprobe of ucsi_acpi module.
>
> As far as I understand, the second Oops ("BUG: unable to handle page fault
> for address: ffffc90040715100") happened at reading a byte value via
> ACPI_GET8(logical_addr_ptr) in acpi_ex_system_memory_space_handler().

This is to be expected, as establishing the mapping did fail due to a negative return value from the hypervisor when trying to update a PTE.
Created attachment 875905 [details]
Debug patch V2
Second try with more data being printed in the error case.
Can you please replace the first debug patch with this one?
Created attachment 875916 [details]
dmesg from the v2 patched 6.9.7 kernel
(In reply to Takashi Iwai from comment #8)
> Created attachment 875916 [details]
> dmesg from the v2 patched 6.9.7 kernel

Thanks, this is making things much more clear.

Seems as if the kernel is trying to map part of the MSI space (physical address range 0xfee00000 - 0xfeeff000). When running as dom0 this should not happen, as the hypervisor is owning this region and will deny mapping it.

Seems as if the ucsi driver needs to be made Xen aware.

Are you able to tell which I/O-resources are at physical address feec2000-feec2fff? Probably you should be able to find out when booting without Xen via "cat /proc/iomem" and/or "lspci -v". I'm pretty sure the region fee01000-feefffff should only be used as MSI space.
Created attachment 875936 [details]
logs from xen and normal boots
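The /proc/iomem lookup asked for above can also be done mechanically. The sketch below is illustrative only: the names `parse_iomem` and `overlapping` are made up here, and it assumes the usual "start-end : name" line layout with two spaces of indentation per nesting level. It lists every resource that overlaps a given physical address range. Note that /proc/iomem has to be read as root; unprivileged reads show all addresses as zero.

```python
import re

# One /proc/iomem line looks like "  fd300000-fd3fffff : PCI Bus 0000:03";
# the leading indentation (two spaces per level) encodes nesting.
LINE_RE = re.compile(r"^( *)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def parse_iomem(text):
    """Parse /proc/iomem-style text into (depth, start, end, name) tuples."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip anything that does not look like a resource line
        indent, start, end, name = m.groups()
        entries.append((len(indent) // 2, int(start, 16), int(end, 16), name.strip()))
    return entries


def overlapping(entries, lo, hi):
    """Return the entries whose [start, end] range intersects [lo, hi]."""
    return [e for e in entries if e[1] <= hi and e[2] >= lo]


def report(path, lo, hi):
    """Print all resources in `path` overlapping [lo, hi] (inclusive)."""
    with open(path) as f:
        hits = overlapping(parse_iomem(f.read()), lo, hi)
    for depth, start, end, name in hits:
        print(f"{'  ' * depth}{start:08x}-{end:08x} : {name}")
    if not hits:
        print(f"nothing listed for {lo:08x}-{hi:08x}")
```

For the range in question, this would be called as `report("/proc/iomem", 0xfeec2000, 0xfeec2fff)` on a boot without Xen. An empty result means no resource is registered there.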
There seems to be no BAR located in the area trying to be mapped. Could you please provide an acpidump?
Created attachment 875954 [details]
acpidump output
Created attachment 875955 [details]
hwinfo output
|
Created attachment 875832 [details]
dmesg with crash of Leap 15.6 kernel

When I boot a recent kernel (openSUSE Leap 15.6 or TW 6.9.x kernel) with Xen (Dom0) on the company's standard laptop (Thinkpad P14s Gen 3 AMD), it crashes with a kernel oops and cannot proceed with the boot.

After skimming the net, I found that it crashes when loading the ucsi_acpi driver, and blacklisting it indeed let the boot proceed further. (As a result, the touchpad and some USB functionality are missing, though.)

Below is a dmesg output after manually loading the ucsi_acpi module.

I checked with the 6.9.7 TW backport kernel, and it hits the same problem.