Bug 1219406 - [BUG] kernel NULL pointer dereference with Linux 6.7.1-2-default
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
Version: Current
Hardware: Other Other
Priority: P5 - None  Severity: Normal
Target Milestone: ---
Assignee: openSUSE Kernel Bugs
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-31 12:11 UTC by Kostas Peletidis
Modified: 2024-03-06 11:49 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description Kostas Peletidis 2024-01-31 12:11:06 UTC
I just saw this bug on my work laptop. The desktop froze and only the mouse pointer was responsive. I connected to the laptop from another machine and noticed that Xorg was a zombie process. I also saw the following kernel messages:


[ 8550.326847] pcieport 0000:00:08.1: PME: Spurious native interrupt!
[13172.962208] BUG: kernel NULL pointer dereference, address: 0000000000000000
[13172.962228] #PF: supervisor read access in kernel mode
[13172.962235] #PF: error_code(0x0000) - not-present page
[13172.962243] PGD 0 P4D 0 
[13172.962255] Oops: 0000 [#1] PREEMPT SMP NOPTI
[13172.962266] CPU: 11 PID: 2019 Comm: Xorg.bin Not tainted 6.7.1-2-default #1 openSUSE Tumbleweed d50116cfdb1b14a701e904c894d8f1c040bf1146
[13172.962281] Hardware name: LENOVO 20XGS0V508/20XGS0V508, BIOS R1NET47W (1.17) 12/21/2021
[13172.962289] RIP: 0010:drm_mode_rmfb+0xb6/0x1c0
[13172.962308] Code: 00 00 4c 89 ef e8 7a 0e 3e 00 48 8b 83 98 00 00 00 48 2d 98 00 00 00 48 39 c3 0f 84 eb 00 00 00 31 d2 b9 01 00 00 00 4c 39 e0 <48> 8b 80 98 00 00 00 0f 44 d1 48 2d 98 00 00 00 48 39 c3 75 e8 85
[13172.962317] RSP: 0018:ffffa86fc2bbfc80 EFLAGS: 00010202
[13172.962327] RAX: ffffffffffffff68 RBX: ffff941bc60f1800 RCX: 0000000000000001
[13172.962334] RDX: 0000000000000001 RSI: ffff941bc2004920 RDI: ffff941bc60f18a8
[13172.962341] RBP: ffff941e7352b318 R08: ffff941bc2004b18 R09: ffff941c88c80200
[13172.962347] R10: 0000000000000000 R11: 0000000000000000 R12: ffff941e7352b300
[13172.962354] R13: ffff941bc60f18a8 R14: ffffa86fc2bbfd68 R15: 0000000000000004
[13172.962361] FS:  00007faa06805980(0000) GS:ffff941e9ef80000(0000) knlGS:0000000000000000
[13172.962368] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13172.962374] CR2: 0000000000000000 CR3: 00000001045ce000 CR4: 0000000000750ef0
[13172.962379] PKRU: 55555554
[13172.962384] Call Trace:
[13172.962390]  <TASK>
[13172.962403]  ? __die+0x23/0x70
[13172.962423]  ? page_fault_oops+0x14d/0x490
[13172.962434]  ? ttwu_queue_wakelist+0xef/0x110
[13172.962446]  ? srso_alias_return_thunk+0x5/0xfbef5
[13172.962468]  ? exc_page_fault+0x71/0x160
[13172.962480]  ? asm_exc_page_fault+0x26/0x30
[13172.962495]  ? drm_mode_rmfb+0xb6/0x1c0
[13172.962508]  ? __pfx_drm_mode_rmfb_ioctl+0x10/0x10
[13172.962516]  drm_ioctl_kernel+0xce/0x170
[13172.962525]  ? __pfx_drm_mode_page_flip_ioctl+0x10/0x10
[13172.962543]  drm_ioctl+0x256/0x490
[13172.962552]  ? __pfx_drm_mode_rmfb_ioctl+0x10/0x10
[13172.962561]  ? __pfx_drm_mode_page_flip_ioctl+0x10/0x10
[13172.962580]  amdgpu_drm_ioctl+0x4e/0x90 [amdgpu c19de16ba0fd72478b307639f09a9c13c52c8d28]
[13172.963085]  __x64_sys_ioctl+0x97/0xd0
[13172.963098]  do_syscall_64+0x64/0xe0
[13172.963108]  ? srso_alias_return_thunk+0x5/0xfbef5
[13172.963116]  ? syscall_exit_to_user_mode+0x2b/0x40
[13172.963122]  ? srso_alias_return_thunk+0x5/0xfbef5
[13172.963129]  ? do_syscall_64+0x70/0xe0
[13172.963137]  ? switch_fpu_return+0x50/0xe0
[13172.963147]  ? srso_alias_return_thunk+0x5/0xfbef5
[13172.963154]  ? exit_to_user_mode_prepare+0x142/0x1f0
[13172.963165]  ? srso_alias_return_thunk+0x5/0xfbef5
[13172.963172]  ? syscall_exit_to_user_mode+0x2b/0x40
[13172.963178]  ? srso_alias_return_thunk+0x5/0xfbef5
[13172.963185]  ? do_syscall_64+0x70/0xe0
[13172.963192]  ? srso_alias_return_thunk+0x5/0xfbef5
[13172.963199]  ? do_syscall_64+0x70/0xe0
[13172.963206]  ? syscall_exit_to_user_mode+0x2b/0x40
[13172.963212]  ? srso_alias_return_thunk+0x5/0xfbef5
[13172.963219]  ? do_syscall_64+0x70/0xe0
[13172.963227]  ? __irq_exit_rcu+0x3b/0xb0
[13172.963242]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[13172.963254] RIP: 0033:0x7faa067139ef
[13172.963332] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[13172.963338] RSP: 002b:00007fff5e104280 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[13172.963346] RAX: ffffffffffffffda RBX: 00005608e31df550 RCX: 00007faa067139ef
[13172.963351] RDX: 00007fff5e10431c RSI: 00000000c00464af RDI: 000000000000000e
[13172.963355] RBP: 00007fff5e10431c R08: 00000005608e3575 R09: 0000000000000007
[13172.963360] R10: 00005608e35751a0 R11: 0000000000000246 R12: 00000000c00464af
[13172.963364] R13: 000000000000000e R14: 00005608e12f3ff0 R15: 0000000000000040
[13172.963377]  </TASK>
[13172.963381] Modules linked in: tun rfcomm nf_conntrack_netbios_ns nf_conntrack_broadcast ccm af_packet nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink qrtr cmac algif_hash algif_skcipher af_alg bnep msr binfmt_misc snd_acp_legacy_mach snd_acp_mach snd_soc_nau8821 nls_iso8859_1 snd_soc_dmic snd_acp3x_pdm_dma snd_acp3x_rn snd_sof_amd_acp63 nls_cp437 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir vfat snd_sof_amd_acp fat snd_ctl_led snd_sof_pci snd_sof_xtensa_dsp snd_hda_codec_realtek mt7921e snd_sof mt7921_common snd_hda_codec_generic btusb mt792x_lib snd_sof_utils btrtl mt76_connac_lib snd_hda_codec_hdmi btintel uvcvideo snd_soc_core intel_rapl_msr btbcm intel_rapl_common mt76 videobuf2_vmalloc btmtk snd_compress uvc snd_pcm_dmaengine snd_hda_intel videobuf2_memops edac_mce_amd bluetooth videobuf2_v4l2 snd_pci_ps snd_intel_dspcfg snd_intel_sdw_acpi
[13172.963548]  snd_rpl_pci_acp6x mac80211 videodev r8169 snd_acp_pci libarc4 thinkpad_acpi kvm_amd snd_acp_legacy_common snd_hda_codec snd_pci_acp6x videobuf2_common snd_pci_acp5x snd_hda_core realtek ecdh_generic mc ledtrig_audio snd_hwdep kvm mdio_devres cfg80211 snd_rn_pci_acp3x snd_pcm think_lmi platform_profile snd_acp_config irqbypass firmware_attributes_class snd_timer snd_soc_acpi wmi_bmof tiny_power_button efi_pstore libphy rfkill k10temp snd_pci_acp3x i2c_piix4 snd thermal soundcore ac joydev button nvme_fabrics fuse configfs dmi_sysfs ip_tables x_tables usbhid amdgpu crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched nvme drm_suballoc_helper xhci_pci drm_buddy xhci_pci_renesas ucsi_acpi hid_multitouch drm_display_helper nvme_core xhci_hcd aesni_intel cec typec_ucsi video hid_generic nvme_auth crypto_simd cryptd usbcore ccp roles rc_core t10_pi typec sp5100_tco battery
[13172.963741]  i2c_hid_acpi wmi i2c_hid serio_raw btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq br_netfilter bridge stp llc efivarfs
[13172.963781] CR2: 0000000000000000
[13172.963787] ---[ end trace 0000000000000000 ]---
[13172.963792] RIP: 0010:drm_mode_rmfb+0xb6/0x1c0
[13172.963801] Code: 00 00 4c 89 ef e8 7a 0e 3e 00 48 8b 83 98 00 00 00 48 2d 98 00 00 00 48 39 c3 0f 84 eb 00 00 00 31 d2 b9 01 00 00 00 4c 39 e0 <48> 8b 80 98 00 00 00 0f 44 d1 48 2d 98 00 00 00 48 39 c3 75 e8 85
[13172.963807] RSP: 0018:ffffa86fc2bbfc80 EFLAGS: 00010202
[13172.963813] RAX: ffffffffffffff68 RBX: ffff941bc60f1800 RCX: 0000000000000001
[13172.963818] RDX: 0000000000000001 RSI: ffff941bc2004920 RDI: ffff941bc60f18a8
[13172.963822] RBP: ffff941e7352b318 R08: ffff941bc2004b18 R09: ffff941c88c80200
[13172.963827] R10: 0000000000000000 R11: 0000000000000000 R12: ffff941e7352b300
[13172.963831] R13: ffff941bc60f18a8 R14: ffffa86fc2bbfd68 R15: 0000000000000004
[13172.963836] FS:  00007faa06805980(0000) GS:ffff941e9ef80000(0000) knlGS:0000000000000000
[13172.963841] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13172.963846] CR2: 0000000000000000 CR3: 00000001045ce000 CR4: 0000000000750ef0
[13172.963851] PKRU: 55555554
[13172.963856] note: Xorg.bin[2019] exited with irqs disabled
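
A minimal sketch in Python for reading two values out of the log above. The helper names (`effective_address`, `decode_ioctl`) are illustrative and not part of any kernel tooling. It rests on two assumptions: that the marked bytes `<48> 8b 80 98 00 00 00` in the Code line decode to a load from RAX + 0x98, and that the user-space RSI value follows the generic Linux ioctl number layout used on x86-64 (8-bit number, 8-bit type, 14-bit size, 2-bit direction).

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide


def effective_address(base: int, disp: int) -> int:
    """Return base + displacement wrapped to 64 bits, as the CPU computes it."""
    return (base + disp) & MASK64


def decode_ioctl(cmd: int) -> dict:
    """Split an ioctl request number into its fields (generic layout, assumed)."""
    return {
        "nr": cmd & 0xFF,                # command number
        "type": chr((cmd >> 8) & 0xFF),  # subsystem "magic" character
        "size": (cmd >> 16) & 0x3FFF,    # argument size in bytes
        "dir": (cmd >> 30) & 0x3,        # 1 = write, 2 = read, 3 = both
    }


# Values copied from the oops: RAX at the fault, and RSI at syscall entry.
RAX = 0xFFFFFFFFFFFFFF68
DISP = 0x98
fault = effective_address(RAX, DISP)
rmfb = decode_ioctl(0xC00464AF)

print(hex(fault))
print(rmfb)
```

The computed address is 0, which matches CR2. RAX equals -0x98, the value one would get by subtracting 0x98 from a NULL pointer, so the fault looks consistent with a list walk that followed a NULL link; this is an interpretation of the registers, not something confirmed in this report. The ioctl fields come out as type 'd', number 0xAF, size 4 and direction 3 (read and write), which matches DRM_IOCTL_MODE_RMFB and agrees with drm_mode_rmfb in the call trace.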
Comment 1 Kostas Peletidis 2024-02-02 22:33:21 UTC
According to this email:

https://lkml.iu.edu/hypermail/linux/kernel/2401.3/05636.html

a very similar issue has been fixed recently:

"So we had a number of small annoying issues in rc1, including an
amdgpu scheduling bug that could cause a hung desktop (that would
*eventually* recover, but after a long enough timeout that most people
probably ended up rebooting instead. That one seems to have hit a fair
number of people."

Therefore, the fix for this bug may be this commit which has been included in Linux v6.8-rc2:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.8-rc2&id=bc8f6d42b1334f486980d57c8d12f3128d30c2e3
Comment 2 Takashi Iwai 2024-02-03 07:41:19 UTC
(In reply to Kostas Peletidis from comment #1)
> According to this email:
> 
> https://lkml.iu.edu/hypermail/linux/kernel/2401.3/05636.html
> 
> a very similar issue has been fixed recently:
> 
> "So we had a number of small annoying issues in rc1, including an
> amdgpu scheduling bug that could cause a hung desktop (that would
> *eventually* recover, but after a long enough timeout that most people
> probably ended up rebooting instead. That one seems to have hit a fair
> number of people."
> 
> Therefore, the fix for this bug may be this commit which has been included
> in Linux v6.8-rc2:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.8-rc2&id=bc8f6d42b1334f486980d57c8d12f3128d30c2e3

This commit is likely irrelevant.

But there have been lots of fixes in 6.7.x stable.  Please try 6.7.3 (or later) from the OBS Kernel:stable repo:
  http://download.opensuse.org/repositories/Kernel:/stable/standard/
Comment 3 Kostas Peletidis 2024-02-03 23:31:07 UTC
Indeed. I don't have access to my work laptop these days, but I found a TW virtual machine and saw that the commit I had hoped would fix the bug touches a file that doesn't exist in 6.7.2. So, although it fixes a NULL pointer dereference, it doesn't address the bug I saw.

I'll try a more recent kernel when I return from my leave.
Comment 4 Kostas Peletidis 2024-03-06 10:49:56 UTC
I haven't seen this bug with more recent kernels. Shall we close?
Comment 5 Takashi Iwai 2024-03-06 11:49:21 UTC
Yes, let's close, then.