Bug 1202386

Summary: after upgrade, pc randomly freezes with kernel panic and segfault in power management
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Andrea Manzini <andrea.manzini>
Component: Kernel
Assignee: openSUSE Kernel Bugs <kernel-bugs>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: andrea.manzini, jslaby, malte, nicholas.yang, sw2000, tiwai
Version: Current
Target Milestone: ---
Hardware: x86-64
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments:
  extract from journalctl --system output
  stack trace of kernel fault
  hwinfo output
  new hwinfo after zypper update (kernel 5.19.1-1)
  Boot with debug kernel journalctl output
  Successful boot with USCI_ACPI patch
  Successful second boot with USCI_ACPI patch

Description Andrea Manzini 2022-08-15 09:30:26 UTC
Created attachment 860780 [details]
extract from journalctl --system output

After upgrading my laptop to openSUSE Tumbleweed 20220813, with kernel 5.19.0-1-default, I experience random kernel panics (the PC freezes with Caps Lock blinking), and in the system logs I find error messages about power management (see attachment).

 Hardware name: LENOVO 20YRS2D84A/20YRS2D84A, BIOS N37ET39W (1.20 ) 04/15/2022
Comment 1 Andrea Manzini 2022-08-15 09:59:44 UTC
Created attachment 860781 [details]
stack trace of kernel fault

Adding another stack trace from journalctl --system.
Comment 2 Takashi Iwai 2022-08-17 11:25:38 UTC
Just to make sure: does it still happen with 5.19.1?

Also, in which situation does this bug occur?  All of a sudden during normal operation?  Or when disconnecting something like a USB dock, or before/after suspend/resume?

Last but not least, please provide the hwinfo output.
Comment 3 Andrea Manzini 2022-08-17 12:17:32 UTC
Created attachment 860834 [details]
hwinfo output
Comment 4 Andrea Manzini 2022-08-17 12:19:27 UTC
The kernel panic occurs all of a sudden, within roughly 15 minutes of normal work; no device connect/disconnect and no suspend/resume. Please find the hwinfo output attached.

As it's my work laptop I had to restore an old snapshot, but I'll give 5.19.1 a try and report back ASAP.
Comment 5 Takashi Iwai 2022-08-18 08:08:06 UTC
*** Bug 1202507 has been marked as a duplicate of this bug. ***
Comment 6 Takashi Iwai 2022-08-18 08:09:07 UTC
bug 1202388 might be relevant, too.

Could you try to blacklist ucsi_acpi?
Comment 7 Jiri Slaby 2022-08-18 08:17:56 UTC
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor instruction fetch in kernel mode
> #PF: error_code(0x0010) - not-present page
> PGD 0 P4D 0
> Oops: 0010 [#1] PREEMPT SMP NOPTI
> CPU: 8 PID: 899 Comm: systemd-logind Tainted: P           OE     5.19.1-1-default #1 openSUSE Tumbleweed f3b44bfb672cdb9f235aff53b57724eba8b9411b
> Hardware name: LENOVO 20YQ001XZA/20YQ001XZA, BIOS N37ET39W (1.20 ) 04/15/2022
> RIP: 0010:0x0
> Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
> RSP: 0018:ffffb1a741f0fc78 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffffffff95a5de28 RCX: ffffffff95a5de28
> RDX: ffffb1a741f0fc80 RSI: 0000000000000004 RDI: ffff8a02404ec800
> RBP: 0000000000000004 R08: ffff8a02404ec838 R09: ffff8a0203f1b500
> R10: 0000000000000000 R11: 0000000000000001 R12: ffff8a0207cf0000
> R13: ffff8a02404ec838 R14: ffffffff95a5de28 R15: ffff8a02404ec800
> FS:  00007f6913f87b80(0000) GS:ffff8a094fe00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffffffffd6 CR3: 0000000103e1a004 CR4: 0000000000770ee0
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  power_supply_show_property+0xb3/0x230
>  dev_attr_show+0x15/0x40
>  sysfs_kf_seq_show+0xa0/0xe0
>  seq_read_iter+0x11f/0x450
>  new_sync_read+0xf7/0x180
>  vfs_read+0x144/0x190
>  ksys_read+0x63/0xe0
>  do_syscall_64+0x58/0x80
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f691472042c
> Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 27 31 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 8d 31 f8 ff 48
> RSP: 002b:00007ffc86fa8490 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> RAX: ffffffffffffffda RBX: 0000000000001001 RCX: 00007f691472042c
> RDX: 0000000000001001 RSI: 000055e7440c0560 RDI: 0000000000000018
> RBP: 000055e7440c0560 R08: 0000000000000000 R09: 00007f691481cb10
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000018
> R13: 0000000000001001 R14: ffffffffffffffff R15: 0000000000001000
>  </TASK>
> Modules linked in: snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device hid_logitech_hidpp af_packet cmac algif_hash algif_skcipher af_alg nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 bnep nft_reject btusb btrtl btbcm btintel hid_logitech_dj btmtk nft_ct bluetooth uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common hid_generic videodev mc usbhid ecdh_generic nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle nvidia_drm(POE) nvidia_modeset(POE) ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_raw nvidia_uvm(POE) iptable_security ip_set nfnetlink nvidia(POE) ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter qrtr dmi_sysfs snd_soc_dmic snd_hda_codec_realtek
>  snd_hda_codec_generic ee1004 iTCO_wdt snd_sof_pci_intel_tgl spi_nor snd_sof_intel_hda_common intel_pmc_bxt iTCO_vendor_support iwlmvm mtd soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda mei_wdt mei_hdcp mei_pxp intel_rapl_msr snd_sof_pci snd_sof_xtensa_dsp mac80211 snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core intel_tcc_cooling snd_soc_acpi_intel_match x86_pkg_temp_thermal intel_powerclamp snd_soc_acpi coretemp soundwire_bus libarc4 kvm_intel snd_soc_core snd_compress snd_pcm_dmaengine kvm snd_hda_intel irqbypass iwlwifi snd_intel_dspcfg snd_intel_sdw_acpi thinkpad_acpi snd_hda_codec iwlmei nls_iso8859_1 joydev ledtrig_audio efi_pstore nls_cp437 platform_profile snd_hda_core think_lmi i2c_i801 vfat spi_intel_pci int3403_thermal ac firmware_attributes_class cfg80211 wmi_bmof i2c_smbus spi_intel fat snd_hwdep rfkill snd_pcm mei_me thunderbolt igc snd_timer processor_thermal_device_pci_legacy mei
>  processor_thermal_device processor_thermal_rfim snd processor_thermal_mbox processor_thermal_rapl soundcore intel_rapl_common int340x_thermal_zone thermal intel_soc_dts_iosf int3400_thermal intel_pmc_core intel_hid acpi_thermal_rel sparse_keymap acpi_pad acpi_tad tiny_power_button fuse configfs ip_tables x_tables ext4 mbcache jbd2 sdhci_pci xhci_pci xhci_pci_renesas cqhci xhci_hcd ucsi_acpi crct10dif_pclmul sdhci nvme crc32_pclmul typec_ucsi crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd serio_raw roles mmc_core usbcore nvme_core typec wmi battery video pinctrl_tigerlake button l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efivarfs
> Unloaded tainted modules: acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 fjes():1 acpi_cpufreq():1 pcc_cpufreq():1 fjes():1
>  pcc_cpufreq():1 acpi_cpufreq():1 fjes():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 fjes():1 acpi_cpufreq():1 fjes():1 asus_ec_sensors():1 pcc_cpufreq():1 asus_ec_sensors():1 fjes():1 acpi_cpufreq():1
> CR2: 0000000000000000
Comment 8 Jiri Slaby 2022-08-18 08:19:48 UTC
Decoded call Trace:
> power_supply_show_property (drivers/power/supply/power_supply_sysfs.c:284)

So psy->desc->get_property is apparently -ENOMSG when called.

> dev_attr_show (drivers/base/core.c:2106)
> sysfs_kf_seq_show (fs/sysfs/file.c:59)
> seq_read_iter (fs/seq_file.c:230)
> new_sync_read (include/linux/fs.h:2052 (discriminator 1)) 
> vfs_read (fs/read_write.c:482)
> ksys_read (fs/read_write.c:620)
> do_syscall_64 (arch/x86/entry/common.c:50)
> entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
Comment 9 Jiri Slaby 2022-08-18 08:36:35 UTC
(In reply to Jiri Slaby from comment #8)
> Decoded call Trace:
> > power_supply_show_property (drivers/power/supply/power_supply_sysfs.c:284)
> 
> So psy->desc->get_property is apparently -ENOMSG when called.

No, CR2 (0xffffffffffffffd6) confused me. ->get_property is apparently NULL according to RAX and the second CR2.
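
The mix-up is easy to retrace with a little arithmetic. A minimal sketch in Python (used purely for illustration; the helper name as_signed64 is made up here and is not a kernel function): read as a signed 64-bit value, the first CR2 is -42, which happens to equal -ENOMSG on Linux, whereas RAX, RIP and the final CR2 in the oops are all zero, which is what a call through a NULL function pointer looks like.

```python
ENOMSG = 42  # Linux errno value (include/uapi/asm-generic/errno.h)


def as_signed64(value):
    """Interpret a 64-bit register value as a two's-complement signed integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value & (1 << 63) else value


# Values copied from the oops in comment 7.
cr2_in_register_dump = 0xFFFFFFFFFFFFFFD6
rax = 0x0000000000000000
rip = 0x0
cr2_at_end_of_oops = 0x0000000000000000

# The first CR2 only *looks* like an error code stored in a pointer.
print(as_signed64(cr2_in_register_dump) == -ENOMSG)

# The values that matter are all zero: the instruction fetch targeted NULL.
print(rax == rip == cr2_at_end_of_oops == 0)
```

Both checks print True for the values in this report.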
Comment 10 Jiri Slaby 2022-08-18 10:14:03 UTC
*** Bug 1202388 has been marked as a duplicate of this bug. ***
Comment 11 Jiri Slaby 2022-08-18 10:18:52 UTC
(In reply to Takashi Iwai from comment #6)
> Could you try to blacklist ucsi_acpi ?

This still holds ^^. Anyone?

Modules common to the three reporters:
aesni_intel
battery
button
configfs
crct10dif_pclmul
crc32c_intel
crc32_pclmul
cryptd
crypto_simd
dm_mod
dm_multipath
efivarfs
fuse
ghash_clmulni_intel
ip_tables
libcrc32c
mc
mmc_core
msr
nvidia_drm(POE)
nvidia_modeset(POE)
nvidia(POE)
nvidia_uvm(POE)
nvme
nvme_core
pinctrl_tigerlake
roles
scsi_dh_alua
scsi_dh_emc
scsi_dh_rdac
serio_raw
sg
typec
typec_ucsi
ucsi_acpi
usbcore
video
videodev
wmi
xhci_hcd
xhci_pci
xhci_pci_renesas
x_tables
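
A list like the one above can be produced by intersecting the "Modules linked in:" lines of the individual oops reports. The following is only a sketch in Python: the function names are made up, and the three input strings are shortened, invented stand-ins rather than the reporters' real module lists. Unlike the list above, it strips taint flags such as (POE) so that names compare equal regardless of flags.

```python
def parse_modules(modules_linked_in):
    """Split a 'Modules linked in:' payload into bare module names.

    Taint flags such as '(POE)' are stripped, so 'nvidia(POE)' and
    'nvidia' compare equal.
    """
    return {token.split("(", 1)[0] for token in modules_linked_in.split()}


def common_modules(*reports):
    """Return the sorted module names present in every report."""
    sets = [parse_modules(report) for report in reports]
    return sorted(set.intersection(*sets)) if sets else []


# Shortened, invented module lists standing in for the three oops reports.
report_a = "snd_usb_audio nvidia(POE) ucsi_acpi typec_ucsi battery igc"
report_b = "iwlwifi nvidia(POE) ucsi_acpi typec_ucsi battery"
report_c = "btusb nvidia(POE) typec_ucsi ucsi_acpi battery thunderbolt"

print(common_modules(report_a, report_b, report_c))
```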
Comment 12 Andrea Manzini 2022-08-18 12:14:18 UTC
As additional info, I have now been running kernel 5.19.1-1-default for 24 hours and still haven't observed the kernel panic. I'm attaching a new hwinfo, but at least for me the problem seems to be solved with the latest update.
Comment 13 Andrea Manzini 2022-08-18 12:15:44 UTC
Created attachment 860900 [details]
new hwinfo after zypper update (kernel 5.19.1-1)
Comment 14 Takashi Iwai 2022-08-18 12:28:53 UTC
Interesting.  Another report (bug 1202388) indicated that 5.19.1 still shows the problem, so I wouldn't bet on it yet :)
Comment 15 Andrea Manzini 2022-08-18 13:31:21 UTC
The report at https://bugzilla.suse.com/show_bug.cgi?id=1202388 seems to be about 5.19.0; the reporter also mentions a black screen, which could point to a problem with the NVIDIA drivers: I also had to update them and compile them against the new kernel to get graphical mode working on my laptop.
Comment 16 Andrea Manzini 2022-08-18 13:37:09 UTC
So I'd suggest these actions (at least they worked for me on the same hardware):

- run zypper dup and make sure the latest kernel 5.19.1 gets installed

- install the updated NVIDIA drivers following https://en.opensuse.org/SDB:NVIDIA_drivers

- use SUSE Prime ( https://en.opensuse.org/SDB:NVIDIA_SUSE_Prime ) to enable the NVIDIA graphics card, otherwise the external display does not get any signal
Comment 17 Takashi Iwai 2022-08-18 14:19:13 UTC
This reminded me of one basic question: could the bug rather be in the Nvidia driver?
I.e., can anyone reproduce the bug without the Nvidia driver at all?
Comment 18 Jiri Slaby 2022-08-18 14:36:04 UTC
(In reply to Takashi Iwai from comment #17)
> i.e. Can anyone reproduce the bug without Nvidia driver at all?

FWIW I've checked all in-kernel drivers, and all of them set get_property before registering the device. So it now looks very much like an nvidia issue...
Comment 19 Malte von der Lancken Wakenitz 2022-08-18 15:02:43 UTC
(In reply to Takashi Iwai from comment #6)
> bug 1202388 might be relevant, too.
> 
> Could you try to blacklist ucsi_acpi ?

I have blacklisted ucsi_acpi and subsequently booted twice (consecutively) successfully. I do not observe any null pointer dereferences in the logs.

The system seems to be stable so far.
Comment 20 Takashi Iwai 2022-08-18 15:18:00 UTC
FWIW, I'm building a test kernel with a debug patch that does NULL checks and some debug prints around the power supply driver in OBS home:tiwai:bsc1202386.
If it is simply a NULL op, the patched kernel avoids the crash (and prints the name of the problematic device).

Unfortunately, the package build has been stuck for hours, maybe because of the reduced build power in our build farm.  Anyway, once the build finishes, it'd be great if anyone could test it (under conditions similar to those in which you triggered the bug) and provide the dmesg output.
The test kernel will appear at:
  http://download.opensuse.org/repositories/home:/tiwai:/bsc1202386/standard/

Note that it's an unofficial build, hence disable Secure Boot beforehand.
Comment 21 Malte von der Lancken Wakenitz 2022-08-18 22:40:30 UTC
(In reply to Takashi Iwai from comment #20)
> FWIW, I'm building a test kernel with a debug patch that does NULL checks
> and some debug prints around the power supply driver in OBS
> home:tiwai:bsc1202386.
> If it were a simple NULL ops, the crash could be avoided (and show the
> problematic device name).
> 
> Unfortunately, the package build has been stuck for hours, maybe because of
> the reduced build power in our build farm.  Anyway, once the build
> finishes, it'd be great if anyone could test it (under conditions similar to
> those in which you triggered the bug) and provide the dmesg output.
> The test kernel will appear at:
>   http://download.opensuse.org/repositories/home:/tiwai:/bsc1202386/standard/
> 
> Note that it's an unofficial build, hence disable Secure Boot beforehand.

I have added the above repo to my normal repos. I see only i586 and i686 architecture kernels. But I have also never installed a "patched" debug kernel like this before. Am I missing something? If so could you perhaps point me towards some documentation that might help me understand what to do?
Comment 22 Jiri Slaby 2022-08-19 05:39:07 UTC
(In reply to Malte von der Lancken Wakenitz from comment #21)
> I have added the above repo to my normal repos. I see only i586 and i686
> architecture kernels.

I suppose the build hadn't finished yet back then. It's all built now.
Comment 23 Takashi Iwai 2022-08-19 05:40:04 UTC
Yes, now you can find x86-64 binaries, too.  The build took just too long yesterday :-<

I recommend *not* adding the repo to zypper; just download kernel-default.rpm from the URL and install it via zypper install.  You might need to pass the --oldpackage option, too.

Another recommendation is to increase the number of installable kernels by editing /etc/zypp/zypp.conf beforehand.  Add more entries to the "multiversion.kernels = ..." line, e.g.
  multiversion.kernels = latest,latest-1,latest-2,latest-3,running
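
The edit can also be scripted. A minimal sketch in Python, assuming the file contains at most one active multiversion.kernels line; the function name and the sample file contents are made up for illustration, and the sketch only transforms text (writing the result back to /etc/zypp/zypp.conf needs root and is left out on purpose).

```python
def set_multiversion_kernels(conf_text, kernels):
    """Return conf_text with the 'multiversion.kernels' line set to kernels.

    Only an active (uncommented) line is replaced; if there is none,
    a new line is appended at the end.
    """
    new_line = "multiversion.kernels = " + ",".join(kernels)
    lines = conf_text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        key = line.split("=", 1)[0].strip()
        if key == "multiversion.kernels":
            lines[index] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Assumed sample contents, not a copy of a real zypp.conf.
sample = (
    "[main]\n"
    "multiversion = provides:multiversion(kernel)\n"
    "multiversion.kernels = latest,latest-1,running\n"
)
wanted = ["latest", "latest-1", "latest-2", "latest-3", "running"]
print(set_multiversion_kernels(sample, wanted))
```

Matching on the exact key keeps the separate "multiversion = ..." setting and any commented-out lines untouched.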
Comment 24 Takashi Iwai 2022-08-19 05:42:27 UTC
Oh, for the Nvidia driver, you might need to install a few more packages from the repo: kernel-devel and kernel-default-devel, at least, I guess.
Comment 25 Nicholas Yang 2022-08-19 10:08:08 UTC
I installed the test kernel-default and kernel-default-devel. Zypper complains about the missing dependency `kernel-devel`, and the nvidia drivers failed to link against it.

I booted the test kernel without the nvidia drivers and the kernel oops is still there.

BTW, I dumped another copy of dmesg and hwinfo from a boot where 5.19.1-1-default came up successfully without blacklisting ucsi_acpi.

Do you need these log files?
Comment 26 Takashi Iwai 2022-08-19 10:27:08 UTC
(In reply to Nicholas Yang from comment #25)
> I installed the test kernel-default and kernel-default-devel. Zypper
> complains lack of dependency `kernel-devel` and nvidia drivers failed to
> link against it.

You need to install kernel-devel in addition to kernel-default-devel (both are available in the repo).

> I boot the test kernel without nvidia drivers and kernel oops is still there.

Please provide the dmesg output (and the hwinfo output if possible).

> BTW, I dump another copy of dmesg and hwinfo when 5.19.1-1-default boots
> successfully without blacklisting ucsi_acpi.

That's what comment 12 implies, too.  Not sure under which conditions the bug gets triggered...

> Do you need these log files?

Yes, definitely.
Comment 27 Malte von der Lancken Wakenitz 2022-08-19 14:04:18 UTC
(In reply to Takashi Iwai from comment #23)
> Yes, now you can find x86-64 binaries, too.  The build took just too long
> yesterday :-<
> 
> I recommend *not* to add the repo to zypper, but just download
> kernel-default.rpm from the URL, and install it via zypper install.  You
> might need to pass --oldpackage option, too.
> 
> Another recommendation is to increase the number of installable
> kernels by editing /etc/zypp/zypp.conf beforehand.  Add more entries to the
> "multiversion.kernels = ..." line, e.g.
>   multiversion.kernels = latest,latest-1,latest-2,latest-3,running

I successfully installed the debug kernel, removed the ucsi_acpi module from the blacklist, disabled Secure Boot, and tried to boot, without success. It did get to a login screen (so the nvidia driver was working), but on login the Plymouth screen just kept showing its spinning busy image. I switched to a console (Ctrl-Alt-F1) and saw a stream of messages (presumably from the debug patch):

localhost.localdomain kernel: power_supply ucsi-source-psy-USBC000:002: XXX psy->desc->get_property is NULL

I could not log in on the console; the system had become unresponsive. I then hard-reset the machine, added the ucsi_acpi module back to the blacklist, and rebooted successfully.

I have attached as much log information as I could get from the attempt.
Comment 28 Malte von der Lancken Wakenitz 2022-08-19 14:08:01 UTC
Created attachment 860933 [details]
Boot with debug kernel journalctl output
Comment 29 Takashi Iwai 2022-08-19 14:59:21 UTC
Thanks, that helped to understand what's going on there.

I guess the problem is the upstream commit 87d0e2f41b8cc2018499be4e8003fa8c09b6f2fb
    usb: typec: ucsi: add a common function ucsi_unregister_connectors()

This commit looks like a harmless cleanup, but it fails in a subtle way.
Namely, in the error scenario, the driver gets an error at ucsi_register_altmodes() and goes to the error handling to release the resources.  Through this refactoring, the release part was unified into a function ucsi_unregister_connectors().  And there, it has a NULL check of con->wq, and it bails out of the loop if it's NULL.
Meanwhile, ucsi_register_port() itself still calls destroy_workqueue() and clears con->wq in its error path.  As a result, a power supply device was left registered on top of an uninitialized / cleared device.
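
The interaction can be illustrated with a toy model. This is only a sketch in Python, not kernel code: the class, the function names and the two-connector setup are made up, and unregister_tolerant is a hypothetical variant for contrast that does not claim to mirror the upstream code or the revert line by line.

```python
class Connector:
    """Toy stand-in for one UCSI connector (names are illustrative)."""

    def __init__(self, num):
        self.num = num
        self.wq = None               # models con->wq
        self.psy_registered = False  # models the connector's power supply


def register_port(con, altmodes_fail):
    """Model of port registration; returns 0 on success, -1 on error."""
    con.wq = "workqueue-%d" % con.num
    con.psy_registered = True        # power supply is registered first
    if altmodes_fail:
        con.wq = None                # error path destroys and clears the wq
        return -1
    return 0


def unregister_refactored(connectors):
    """Cleanup that stops at the first connector without a workqueue."""
    for con in connectors:
        if con.wq is None:
            break                    # bails out; this connector is skipped
        con.psy_registered = False
        con.wq = None


def unregister_tolerant(connectors):
    """Hypothetical cleanup that never skips the power supply teardown."""
    for con in connectors:
        con.psy_registered = False
        con.wq = None


def leftover_after(cleanup):
    """Register two connectors, fail on the second, then run cleanup."""
    cons = [Connector(1), Connector(2)]
    for con in cons:
        if register_port(con, altmodes_fail=(con.num == 2)) != 0:
            break
    cleanup(cons)
    return [con.num for con in cons if con.psy_registered]


print(leftover_after(unregister_refactored))  # connector 2 keeps its psy
print(leftover_after(unregister_tolerant))    # nothing is left behind
```

In the model, the failing connector has already cleared its workqueue by the time the cleanup loop reaches it, so the loop stops exactly where the remaining power supply still needs to be unregistered.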

Now I'm building yet another kernel with the revert of the commit above in OBS home:tiwai:bsc1202386-2 repo.  The test kernel will appear at
  http://download.opensuse.org/repositories/home:/tiwai:/bsc1202386-2/standard/

Please give it a try later.
Comment 30 Takashi Iwai 2022-08-19 15:59:04 UTC
The new kernel has already been built (fast!).  Can anyone do a quick check?
Comment 31 Malte von der Lancken Wakenitz 2022-08-19 16:14:14 UTC
(In reply to Takashi Iwai from comment #29)
> Thanks, that helped understanding what's going on there.
> 
> I guess the problem is the upstream commit
> 87d0e2f41b8cc2018499be4e8003fa8c09b6f2fb
>     usb: typec: ucsi: add a common function ucsi_unregister_connectors()
> 
> This commit looks like a harmless cleanup, but it fails in a subtle way.
> Namely, in the error scenario, the driver gets an error at
> ucsi_register_altmodes() and goes to the error handling to release the
> resources.  Through this refactoring, the release part was unified into a
> function ucsi_unregister_connectors().  And there, it has a NULL check of
> con->wq, and it bails out of the loop if it's NULL.
> Meanwhile, ucsi_register_port() itself still calls destroy_workqueue() and
> clears con->wq in its error path.  As a result, a power supply device was
> left registered on top of an uninitialized / cleared device.
> 
> Now I'm building yet another kernel with the revert of the commit above in
> OBS home:tiwai:bsc1202386-2 repo.  The test kernel will appear at
>   http://download.opensuse.org/repositories/home:/tiwai:/bsc1202386-2/standard/
> 
> Please give it a try later.

I have successfully booted twice consecutively. This error is still reported: ucsi_acpi USBC000:00: con2: failed to register alt modes, but without the subsequent null pointer dereference.

I assume this fixes the issue. How long before one might expect this fix to be included in an official kernel build? 

I do have a question, though. What is the significance of the error? I also see other UCSI-related errors, but without any noticeable consequence. Are these just errors caused by the general "flakiness" of USB-C docking stations?

I will attach the dmesg output from the last two boots.

And thank you very much for your help.
Comment 32 Malte von der Lancken Wakenitz 2022-08-19 16:15:22 UTC
Created attachment 860948 [details]
Successful boot with USCI_ACPI patch
Comment 33 Malte von der Lancken Wakenitz 2022-08-19 16:16:00 UTC
Created attachment 860949 [details]
Successful second boot with USCI_ACPI patch
Comment 34 Takashi Iwai 2022-08-19 16:24:08 UTC
OK, thanks.  I have now pushed the temporary fix (the revert) to my stable/for-next branch.  Once the kbuild test passes, Jiri can merge it and eventually push it to TW.

Meanwhile I'm going to ping upstream about this bug.
Comment 35 Takashi Iwai 2022-08-25 08:20:47 UTC
Upstream took the revert patch in the end as a fix.
I refreshed the patch accordingly on the stable branch.
Comment 36 Malte von der Lancken Wakenitz 2022-09-08 20:28:22 UTC
I have just upgraded to kernel 5.19.7-1.1 from the tumbleweed repo. I can boot the system. 

I still get these ucsi_acpi-related error messages in the boot log:

Sep 08 22:08:11 localhost kernel: ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
Sep 08 22:08:13 localhost.localdomain kernel: ucsi_acpi USBC000:00: con2: failed to register alt modes
Sep 08 22:08:13 localhost.localdomain kernel: ucsi_acpi USBC000:00: PPM init failed (-110)

But the NULL pointer dereferences no longer occur.
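
For reference, the numbers in parentheses are negative Linux errno values. A small sketch in Python that decodes them from the log lines above; the function name is made up, and the table deliberately lists only the two codes that occur here.

```python
import re

# Linux errno values (include/uapi/asm-generic/errno.h); only the two
# codes that appear in the log lines above are listed.
ERRNO_NAMES = {
    95: "EOPNOTSUPP",   # Operation not supported
    110: "ETIMEDOUT",   # Connection timed out
}


def decode_errors(log_text):
    """Return (message, errno name) pairs for kernel lines ending in '(-N)'."""
    results = []
    for line in log_text.splitlines():
        match = re.search(r"kernel: (.*) \(-(\d+)\)\s*$", line)
        if match:
            code = int(match.group(2))
            name = ERRNO_NAMES.get(code, "errno %d" % code)
            results.append((match.group(1), name))
    return results


log = """\
Sep 08 22:08:11 localhost kernel: ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
Sep 08 22:08:13 localhost.localdomain kernel: ucsi_acpi USBC000:00: con2: failed to register alt modes
Sep 08 22:08:13 localhost.localdomain kernel: ucsi_acpi USBC000:00: PPM init failed (-110)
"""

for message, name in decode_errors(log):
    print(name, "<-", message)
```

So the UCSI_GET_PDOS command is reported as unsupported and the PPM initialization as timed out; the second line carries no numeric code and is skipped.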
Comment 37 Takashi Iwai 2022-09-09 07:54:33 UTC
The error is still expected to appear.  What we fixed is a bug that was triggered by the handling of this error.

But it would still be interesting to know whether this error is new in 5.19.x.  Did it appear on the earlier 5.18.x kernels, too?
Comment 38 Wagner 2022-10-28 11:13:13 UTC
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1994126 looks very similar: the same backtrace, similar hardware (T480 with a Thunderbolt dock), and kernel version 5.19.0-21.
One difference is that I can reproduce the issue on every boot.