Bug 1216533

Summary: [amdgpu] Screen goes black with 6.5.8-1-default - BUG: kernel NULL pointer dereference in drm_mode_rmfb()
Product: [openSUSE] openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: Other
OS: Other
Status: RESOLVED FIXED
Severity: Normal
Priority: P5 - None
Target Milestone: ---
Reporter: Kostas Peletidis <kostas.peletidis>
Assignee: openSUSE Kernel Bugs <kernel-bugs>
QA Contact: E-mail List <qa-bugs>
CC: bwiedemann, jslaby, patrik.jakobsson, sndirsch, tiwai, tzimmermann

Description Kostas Peletidis 2023-10-24 09:45:51 UTC
I just hit this NULL pointer dereference bug. The laptop screen went black, but the display connected via HDMI stayed on. The mouse pointer could still move, but clicks did not reach the applications. The keyboard was not responding either: Caps Lock would not toggle its LED when pressed, and Ctrl+Alt+F2 would not switch to a virtual terminal.

Later I saw in the log that Xorg exited with "irqs disabled" right after the oops trace. I am not sure whether that is simply a symptom or something that needs to be investigated separately.

Log excerpt
-----------
Oct 24 11:10:38 savra kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Oct 24 11:10:38 savra kernel: #PF: supervisor read access in kernel mode
Oct 24 11:10:38 savra kernel: #PF: error_code(0x0000) - not-present page
Oct 24 11:10:38 savra kernel: PGD 0 P4D 0 
Oct 24 11:10:38 savra kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Oct 24 11:10:38 savra kernel: CPU: 12 PID: 3705 Comm: Xorg.bin Not tainted 6.5.8-1-default #1 openSUSE Tumbleweed f55c1e7dd42a60954146c55c0c3e76b2fe439e90
Oct 24 11:10:38 savra kernel: Hardware name: LENOVO 20XGS0V508/20XGS0V508, BIOS R1NET47W (1.17) 12/21/2021
Oct 24 11:10:38 savra kernel: RIP: 0010:drm_mode_rmfb+0xb6/0x1c0
Oct 24 11:10:38 savra kernel: Code: 00 00 4c 89 ef e8 4a 86 3c 00 48 8b 83 98 00 00 00 48 2d 98 00 00 00 48 39 c3 0f 84 eb 00 00 00 31 d2 b9 01 00 00 00 4c 39 e0 <48> 8b 80 98 00 00 00 0f 44 d1 48 2d 98 00 00 00 48 39 c3 75 e8 85
Oct 24 11:10:38 savra kernel: RSP: 0018:ffffbffe41c3fd18 EFLAGS: 00010202
Oct 24 11:10:38 savra kernel: RAX: ffffffffffffff68 RBX: ffffa06271f34c00 RCX: 0000000000000001
Oct 24 11:10:38 savra kernel: RDX: 0000000000000001 RSI: ffffa06244ed1240 RDI: ffffa06271f34ca8
Oct 24 11:10:38 savra kernel: RBP: ffffa064d95fee18 R08: ffffa06244ed13c0 R09: ffffa0625aa80200
Oct 24 11:10:38 savra kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffa064d95fee00
Oct 24 11:10:38 savra kernel: R13: ffffa06271f34ca8 R14: ffffa06271f34c00 R15: 0000000000000004
Oct 24 11:10:38 savra kernel: FS:  00007f640a40f980(0000) GS:ffffa0651f000000(0000) knlGS:0000000000000000
Oct 24 11:10:38 savra kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 24 11:10:38 savra kernel: CR2: 0000000000000000 CR3: 00000001814b0000 CR4: 0000000000750ee0
Oct 24 11:10:38 savra kernel: PKRU: 55555554
Oct 24 11:10:38 savra kernel: Call Trace:
Oct 24 11:10:38 savra kernel:  <TASK>
Oct 24 11:10:38 savra kernel:  ? __die+0x23/0x70
Oct 24 11:10:38 savra kernel:  ? page_fault_oops+0x14d/0x490
Oct 24 11:10:38 savra kernel:  ? inotify_handle_inode_event+0x9b/0x230
Oct 24 11:10:38 savra kernel:  ? srso_alias_return_thunk+0x5/0x7f
Oct 24 11:10:38 savra kernel:  ? srso_alias_return_thunk+0x5/0x7f
Oct 24 11:10:38 savra kernel:  ? fsnotify_insert_event+0x15c/0x160
Oct 24 11:10:38 savra kernel:  ? exc_page_fault+0x71/0x160
Oct 24 11:10:38 savra kernel:  ? asm_exc_page_fault+0x26/0x30
Oct 24 11:10:38 savra kernel:  ? drm_mode_rmfb+0xb6/0x1c0
Oct 24 11:10:38 savra kernel:  ? drm_mode_rmfb+0x96/0x1c0
Oct 24 11:10:38 savra kernel:  ? srso_alias_return_thunk+0x5/0x7f
Oct 24 11:10:38 savra kernel:  ? __fsnotify_parent+0x11b/0x340
Oct 24 11:10:38 savra kernel:  ? __pfx_drm_mode_rmfb_ioctl+0x10/0x10
Oct 24 11:10:38 savra kernel:  drm_ioctl_kernel+0xc5/0x170
Oct 24 11:10:38 savra kernel:  ? srso_alias_return_thunk+0x5/0x7f
Oct 24 11:10:38 savra kernel:  drm_ioctl+0x256/0x490
Oct 24 11:10:38 savra kernel:  ? __pfx_drm_mode_rmfb_ioctl+0x10/0x10
Oct 24 11:10:38 savra kernel:  amdgpu_drm_ioctl+0x4e/0x90 [amdgpu 0c0569563652e847865e2007f2498dffd65f1870]
Oct 24 11:10:38 savra kernel:  __x64_sys_ioctl+0x97/0xd0
Oct 24 11:10:38 savra kernel:  do_syscall_64+0x60/0x90
Oct 24 11:10:38 savra kernel:  ? srso_alias_return_thunk+0x5/0x7f
Oct 24 11:10:38 savra kernel:  ? do_syscall_64+0x6c/0x90
Oct 24 11:10:38 savra kernel:  ? do_syscall_64+0x6c/0x90
Oct 24 11:10:38 savra kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Oct 24 11:10:38 savra kernel: RIP: 0033:0x7f640a3139cf
Oct 24 11:10:38 savra kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Oct 24 11:10:38 savra kernel: RSP: 002b:00007ffe92d7c170 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Oct 24 11:10:38 savra kernel: RAX: ffffffffffffffda RBX: 0000562e25bb53b0 RCX: 00007f640a3139cf
Oct 24 11:10:38 savra kernel: RDX: 00007ffe92d7c20c RSI: 00000000c00464af RDI: 000000000000000e
Oct 24 11:10:38 savra kernel: RBP: 00007ffe92d7c20c R08: 0000000000000000 R09: 0000000000000001
Oct 24 11:10:38 savra kernel: R10: 0000562e271a7a00 R11: 0000000000000246 R12: 00000000c00464af
Oct 24 11:10:38 savra kernel: R13: 000000000000000e R14: 000000000000006c R15: 0000562e259b8090
Oct 24 11:10:38 savra kernel:  </TASK>
Oct 24 11:10:38 savra kernel: Modules linked in: tun rfcomm nf_conntrack_netbios_ns nf_conntrack_broadcast ccm cmac algif_hash algif_skcipher af_alg af_packet nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink qrtr bnep msr binfmt_misc nls_iso8859_1 nls_cp437 vfat fat snd_soc_dmic snd_acp3x_rn snd_acp3x_pdm_dma snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci mt7921e snd_sof_xtensa_dsp mt7921_common snd_ctl_led mt76_connac_lib btusb snd_sof intel_rapl_msr mt76 snd_hda_codec_realtek btrtl snd_sof_utils intel_rapl_common btbcm snd_hda_codec_generic snd_soc_core uvcvideo btintel edac_mce_amd snd_hda_codec_hdmi btmtk videobuf2_vmalloc snd_compress mac80211 snd_hda_intel uvc kvm_amd snd_intel_dspcfg videobuf2_memops bluetooth snd_pcm_dmaengine videobuf2_v4l2 libarc4 snd_intel_sdw_acpi kvm snd_pci_ps videodev snd_hda_codec snd_rpl_pci_acp6x snd_acp_pci snd_hda_core
Oct 24 11:10:38 savra kernel:  videobuf2_common ecdh_generic mc snd_pci_acp6x irqbypass snd_hwdep think_lmi efi_pstore snd_pci_acp5x r8169 cfg80211 firmware_attributes_class snd_pcm realtek snd_rn_pci_acp3x thinkpad_acpi tiny_power_button wmi_bmof snd_acp_config snd_soc_acpi mdio_devres ledtrig_audio snd_timer platform_profile k10temp snd_pci_acp3x libphy i2c_piix4 rfkill snd thermal soundcore ac button joydev fuse configfs dmi_sysfs ip_tables x_tables usbhid amdgpu i2c_algo_bit drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 drm_suballoc_helper amdxcp iommu_v2 drm_buddy xhci_pci gpu_sched xhci_pci_renesas nvme drm_display_helper xhci_hcd aesni_intel nvme_core ucsi_acpi hid_multitouch cec typec_ucsi video hid_generic crypto_simd usbcore cryptd roles ccp rc_core typec sp5100_tco t10_pi battery wmi i2c_hid_acpi i2c_hid serio_raw btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq sg br_netfilter bridge stp llc dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc
Oct 24 11:10:38 savra kernel:  scsi_dh_alua scsi_mod scsi_common efivarfs
Oct 24 11:10:38 savra kernel: CR2: 0000000000000000
Oct 24 11:10:38 savra kernel: ---[ end trace 0000000000000000 ]---
Oct 24 11:10:38 savra kernel: RIP: 0010:drm_mode_rmfb+0xb6/0x1c0
Oct 24 11:10:38 savra kernel: Code: 00 00 4c 89 ef e8 4a 86 3c 00 48 8b 83 98 00 00 00 48 2d 98 00 00 00 48 39 c3 0f 84 eb 00 00 00 31 d2 b9 01 00 00 00 4c 39 e0 <48> 8b 80 98 00 00 00 0f 44 d1 48 2d 98 00 00 00 48 39 c3 75 e8 85
Oct 24 11:10:38 savra kernel: RSP: 0018:ffffbffe41c3fd18 EFLAGS: 00010202
Oct 24 11:10:38 savra kernel: RAX: ffffffffffffff68 RBX: ffffa06271f34c00 RCX: 0000000000000001
Oct 24 11:10:38 savra kernel: RDX: 0000000000000001 RSI: ffffa06244ed1240 RDI: ffffa06271f34ca8
Oct 24 11:10:38 savra kernel: RBP: ffffa064d95fee18 R08: ffffa06244ed13c0 R09: ffffa0625aa80200
Oct 24 11:10:38 savra kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffa064d95fee00
Oct 24 11:10:38 savra kernel: R13: ffffa06271f34ca8 R14: ffffa06271f34c00 R15: 0000000000000004
Oct 24 11:10:38 savra kernel: FS:  00007f640a40f980(0000) GS:ffffa0651f000000(0000) knlGS:0000000000000000
Oct 24 11:10:38 savra kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 24 11:10:38 savra kernel: CR2: 0000000000000000 CR3: 00000001814b0000 CR4: 0000000000750ee0
Oct 24 11:10:38 savra kernel: PKRU: 55555554
Oct 24 11:10:38 savra kernel: note: Xorg.bin[3705] exited with irqs disabled
Comment 1 Takashi Iwai 2023-10-24 12:43:38 UTC
Looks like the same bug as the upstream report:
  https://gitlab.freedesktop.org/drm/amd/-/issues/2905

Please join the discussion there to help with debugging.
Comment 2 Takashi Iwai 2023-11-15 14:26:45 UTC
Does the bug persist with 6.6? Just to keep this report up to date.
Comment 3 Kostas Peletidis 2023-11-15 16:23:32 UTC
(In reply to Takashi Iwai from comment #2)
> Does the bug persist with 6.6? Just to keep this report up to date.

Since upgrading the kernel I have not seen this bug again (not even with 6.5.9).
Comment 4 Takashi Iwai 2023-11-20 14:48:54 UTC
Good to hear. Let's close it, then.
Feel free to reopen if the issue appears again.