Bug 1219983 - amdgpu : "kernel NULL pointer dereference" kernel 6.7.4-1
Summary: amdgpu : "kernel NULL pointer dereference" kernel 6.7.4-1
Status: REOPENED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel:Drivers
Version: Current
Hardware: 64bit openSUSE Tumbleweed
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: Kernel Bugs
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-02-15 19:27 UTC by Alan Lima
Modified: 2024-06-13 13:27 UTC
CC List: 2 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
tiwai: needinfo? (alanemmanuel5)


Attachments

Description Alan Lima 2024-02-15 19:27:30 UTC
Hi, this is my first bug report so please let me know if anything is missing.

since i've upgraded to linux 6.6 (if my memory's right, could be 6.7 but i'm unsure, sorry about that) i've been getting system 
freezes: the entire system locks up and the only way to reboot is either a hard reset or the alt-sysrq + R E I S U B sequence.


i've managed to get the kdump logs, here's the end of the last dmesg output:


```
[13239.754787] BUG: kernel NULL pointer dereference, address: 0000000000000000
[13239.754793] #PF: supervisor read access in kernel mode
[13239.754796] #PF: error_code(0x0000) - not-present page
[13239.754798] PGD 65476e067 P4D 65476e067 PUD 6546c2067 PMD 5b05d9067 PTE 0
[13239.754804] Oops: 0000 [#1] PREEMPT SMP NOPTI
[13239.754807] CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded Tainted: G           OE      6.7.4-1-default #1 openSUSE Tumbleweed fea28090662c8f2f65d915c36daa0c32bc4f1b65
[13239.754811] Hardware name: Gigabyte Technology Co., Ltd. X570S AORUS PRO AX/X570S AORUS PRO AX, BIOS F6c 09/20/2023
[13239.754813] RIP: 0010:dcn10_set_drr+0xa0/0xf0 [amdgpu]
[13239.755075] Code: 74 e0 48 8b 80 28 01 00 00 48 85 c0 74 08 48 89 e6 e8 64 9c 95 eb 45 85 e4 74 c7 45 85 ed 74 c2 48 8b 03 48 8b b8 f8 00 00 00 <48> 8b 07 48 8b 80 40 01 00 00 48 85 c0 74 a9 48 83 c3 08 ba 02 00
[13239.755077] RSP: 0018:ffffaa35c0490dc8 EFLAGS: 00010002
[13239.755080] RAX: ffff9a8fae9c1458 RBX: ffffaa35c0490e18 RCX: 0000000000000000
[13239.755082] RDX: 0000000080010055 RSI: ffff9a89c27ea340 RDI: 0000000000000000
[13239.755084] RBP: ffffaa35c0490e08 R08: 0000000080000000 R09: ffffaa35c0490c88
[13239.755085] R10: 0000000000000008 R11: 0000000000000000 R12: 000000000000045f
[13239.755087] R13: 0000000000000831 R14: ffffaa35c0490e20 R15: ffff9a89cb0e4de0
[13239.755089] FS:  0000000000000000(0000) GS:ffff9a90c8500000(0000) knlGS:0000000000000000
[13239.755091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13239.755093] CR2: 0000000000000000 CR3: 00000006213ec000 CR4: 0000000000750ef0
[13239.755095] PKRU: 55555554
[13239.755097] Call Trace:
[13239.755101]  <IRQ>
[13239.755103]  ? __die+0x23/0x70
[13239.755108]  ? page_fault_oops+0x14d/0x490
[13239.755112]  ? srso_alias_return_thunk+0x5/0xfbef5
[13239.755116]  ? generic_reg_set_ex+0xa1/0xe0 [amdgpu c5f1e826bdfd5671aa9c9484f369da20911bbd7b]
[13239.755314]  ? exc_page_fault+0x71/0x160
[13239.755317]  ? asm_exc_page_fault+0x26/0x30
[13239.755322]  ? dcn10_set_drr+0xa0/0xf0 [amdgpu c5f1e826bdfd5671aa9c9484f369da20911bbd7b]
[13239.755513]  ? dcn10_set_drr+0x8c/0xf0 [amdgpu c5f1e826bdfd5671aa9c9484f369da20911bbd7b]
[13239.755704]  dc_stream_adjust_vmin_vmax+0xaa/0xd0 [amdgpu c5f1e826bdfd5671aa9c9484f369da20911bbd7b]
[13239.755893]  dm_crtc_high_irq+0x193/0x1a0 [amdgpu c5f1e826bdfd5671aa9c9484f369da20911bbd7b]
[13239.756080]  amdgpu_dm_irq_handler+0x85/0x1d0 [amdgpu c5f1e826bdfd5671aa9c9484f369da20911bbd7b]
[13239.756267]  amdgpu_irq_dispatch+0xbb/0x200 [amdgpu c5f1e826bdfd5671aa9c9484f369da20911bbd7b]
[13239.756425]  amdgpu_ih_process+0x83/0x100 [amdgpu c5f1e826bdfd5671aa9c9484f369da20911bbd7b]
[13239.756577]  amdgpu_irq_handler+0x23/0x60 [amdgpu c5f1e826bdfd5671aa9c9484f369da20911bbd7b]
[13239.756728]  __handle_irq_event_percpu+0x4a/0x1a0
[13239.756733]  handle_irq_event+0x38/0x80
[13239.756735]  handle_edge_irq+0x8b/0x230
[13239.756738]  __common_interrupt+0x3f/0xa0
[13239.756741]  common_interrupt+0x81/0xa0
[13239.756744]  </IRQ>
[13239.756745]  <TASK>
[13239.756747]  asm_common_interrupt+0x26/0x40
[13239.756750] RIP: 0010:cpuidle_enter_state+0xcc/0x440
[13239.756752] Code: 9a 0c 48 ff e8 e5 f1 ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 b3 17 47 ff 45 84 ff 0f 85 56 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
[13239.756754] RSP: 0018:ffffaa35c01d7e90 EFLAGS: 00000246
[13239.756756] RAX: ffff9a90c853a380 RBX: ffff9a90c8c7dc00 RCX: 000000000000001f
[13239.756757] RDX: 000000000000000a RSI: 0000000021af2995 RDI: 0000000000000000
[13239.756758] RBP: 0000000000000001 R08: 0000000000000002 R09: 0000000000000009
[13239.756760] R10: 0000000000000008 R11: ffff9a90c8538d64 R12: ffffffffade36060
[13239.756761] R13: 00000c0a9ee096c7 R14: 0000000000000001 R15: 0000000000000000
[13239.756766]  cpuidle_enter+0x2d/0x40
[13239.756769]  do_idle+0x20d/0x270
[13239.756773]  cpu_startup_entry+0x2a/0x30
[13239.756776]  start_secondary+0x11e/0x140
[13239.756779]  secondary_startup_64_no_verify+0x18f/0x19b
[13239.756784]  </TASK>
[13239.756785] Modules linked in: binfmt_misc rfcomm snd_seq_dummy snd_hrtimer snd_seq af_packet nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink razermouse(OE) qrtr ext4 mbcache jbd2 mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 mac80211 libarc4 cfg80211 igc joydev cmac algif_hash algif_skcipher af_alg bnep btusb btrtl btintel btbcm btmtk bluetooth ecdh_generic snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio nls_iso8859_1 intel_rapl_msr nls_cp437 intel_rapl_common vfat snd_hda_codec_hdmi fat edac_mce_amd snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_usb_audio uvcvideo videobuf2_vmalloc snd_hda_codec uvc snd_usbmidi_lib videobuf2_memops kvm_amd videobuf2_v4l2 snd_hda_core snd_ump snd_rawmidi snd_seq_device snd_hwdep videobuf2_common kvm snd_pcm rfkill snd_timer gigabyte_wmi wmi_bmof snd pcspkr acpi_cpufreq
[13239.756840]  k10temp i2c_piix4 i2c_nvidia_gpu soundcore i2c_ccgx_ucsi thermal tiny_power_button razerkbd(OE) nvme_fabrics fuse efi_pstore configfs dmi_sysfs ip_tables x_tables hid_generic usbhid amdgpu crct10dif_pclmul crc32_pclmul polyval_clmulni ahci polyval_generic gf128mul libahci ghash_clmulni_intel video amdxcp libata sha512_ssse3 i2c_algo_bit drm_ttm_helper sha256_ssse3 ttm sd_mod sha1_ssse3 scsi_dh_emc drm_exec nvme gpu_sched xhci_pci xhci_pci_renesas scsi_dh_rdac drm_suballoc_helper drm_buddy scsi_dh_alua xhci_hcd sg drm_display_helper aesni_intel nvme_core crypto_simd cec cryptd scsi_mod ccp nvme_auth usbcore rc_core sp5100_tco scsi_common t10_pi wmi button vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq v4l2loopback(O) videodev mc msr efivarfs
[13239.756890] CR2: 0000000000000000
```
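A minimal sketch for reading the `Code:` line above, assuming the usual oops convention that the byte wrapped in `<...>` is the first byte of the faulting instruction. The helper names `split_code_line` and `explain_fault` are hypothetical, and the decoder recognises only the single encoding seen here (`48 8b 07`, i.e. `mov (%rdi),%rax`). It is not a general disassembler. Since the register dump shows RDI = 0, this is consistent with the reported read from address 0; which amdgpu structure field held that NULL pointer is not established by this log.

```python
# Bytes copied from the "Code:" line of the oops above; RDI copied from its register dump.
CODE_LINE = ("Code: 74 e0 48 8b 80 28 01 00 00 48 85 c0 74 08 48 89 e6 e8 64 9c 95 eb "
             "45 85 e4 74 c7 45 85 ed 74 c2 48 8b 03 48 8b b8 f8 00 00 00 <48> 8b 07 "
             "48 8b 80 40 01 00 00 48 85 c0 74 a9 48 83 c3 08 ba 02 00")
RDI = 0x0000000000000000


def split_code_line(line):
    """Split an oops 'Code:' line into (bytes before the fault, bytes from the fault on)."""
    tokens = line.split()[1:]  # drop the leading "Code:" label
    # The kernel marks the first byte of the faulting instruction with <...>.
    fault_index = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    raw = bytes(int(tok.strip("<>"), 16) for tok in tokens)
    return raw[:fault_index], raw[fault_index:]


def explain_fault(at_fault, rdi):
    """Describe the faulting instruction; only one encoding is recognised (sketch)."""
    # 48 8b 07 = REX.W + MOV r64, r/m64 with ModRM 0x07 (reg=rax, rm=[rdi])
    if at_fault[:3] == bytes.fromhex("488b07"):
        return f"mov (%rdi),%rax reads 8 bytes at address {rdi:#x}"
    return "unrecognised instruction"


before, at_fault = split_code_line(CODE_LINE)
# The instruction just before the fault: 48 8b b8 f8 00 00 00 = mov 0xf8(%rax),%rdi,
# so the NULL value in RDI was itself loaded from offset 0xf8 of whatever RAX pointed to.
print(before[-7:].hex(" "))
print(explain_fault(at_fault, RDI))
```

For a full disassembly, the kernel source tree ships `scripts/decodecode`, which accepts the oops text on standard input.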
Comment 1 Takashi Iwai 2024-02-16 08:45:06 UTC
If the bug is reproducible, could you try the latest 6.8-rc kernel from OBS Kernel:HEAD repo?
  http://download.opensuse.org/repositories/Kernel:/HEAD/standard/

If the problem persists there, we should report to the upstream.
Comment 2 Alan Lima 2024-02-16 11:17:29 UTC
(In reply to Takashi Iwai from comment #1)
> If the bug is reproducible, could you try the latest 6.8-rc kernel from OBS
> Kernel:HEAD repo?
>   http://download.opensuse.org/repositories/Kernel:/HEAD/standard/
> 
> If the problem persists there, we should report to the upstream.

hi, i've installed the kernel-default package from this repo, however i can no longer install kernel modules as the headers are apparently missing.
reproducing the bug might take a while
Comment 3 Takashi Iwai 2024-02-16 13:49:33 UTC
(In reply to Alan Lima from comment #2)
> (In reply to Takashi Iwai from comment #1)
> > If the bug is reproducible, could you try the latest 6.8-rc kernel from OBS
> > Kernel:HEAD repo?
> >   http://download.opensuse.org/repositories/Kernel:/HEAD/standard/
> > 
> > If the problem persists there, we should report to the upstream.
> 
> hi, i've installed the kernel-default package from this repo, however i can
> no longer install kernel modules as the headers are apparently missing.
> reproducing the bug might take a while

So you're using other out-of-tree modules?  It's better to test without such uncertain factors anyway.
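
For reference, the out-of-tree modules can be read straight off the oops in the description: the header shows `Tainted: G           OE`, and the `Modules linked in:` list carries per-module flags in parentheses. Below is a rough sketch of that lookup. The flag meanings follow the kernel's tainted-kernels documentation; the helper names `flagged_modules` and `decode_taint` are hypothetical, and `MODULES` is an abridged copy of the list, not the full one.

```python
import re

# Meanings per the kernel's tainted-kernels documentation; only the letters
# that appear in this report are listed here.
TAINT_LETTERS = {
    "G": "no proprietary module loaded",
    "O": "externally-built (out-of-tree) module loaded",
    "E": "unsigned module loaded",
}

# Abridged excerpt of the "Modules linked in:" lines from the description.
MODULES = ("nf_tables nfnetlink razermouse(OE) qrtr ext4 thermal tiny_power_button "
           "razerkbd(OE) nvme_fabrics amdgpu raid6_pq v4l2loopback(O) videodev mc")


def flagged_modules(modules_text):
    """Map each module that carries a per-module flag suffix to its flags."""
    return dict(re.findall(r"(\w+)\(([A-Z]+)\)", modules_text))


def decode_taint(field):
    """Translate the letters of a 'Tainted:' field, skipping the padding spaces."""
    return [TAINT_LETTERS.get(ch, "not listed in this sketch")
            for ch in field if not ch.isspace()]


for name, flags in flagged_modules(MODULES).items():
    print(name, flags)
for meaning in decode_taint("G           OE"):
    print(meaning)
```

On this excerpt the flagged modules are `razermouse`, `razerkbd` and `v4l2loopback`; `amdgpu` itself carries no flag.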
Comment 4 Alan Lima 2024-02-18 00:22:48 UTC
well it's been 2 days on linux 6.8 and i am no longer experiencing any freeze/crash, even with the openrazer kernel modules.
it was usually one crash per day on 6.7. now it's completely stable.
i conclude that this is a kernel bug from 6.7, that also correlates with the fact that this issue has appeared out of nowhere after a few updates.
Comment 5 Takashi Iwai 2024-02-26 16:40:55 UTC
Yeah, it looks like a regression in 6.7.x.

There is a 6.7.6 release in the OBS Kernel:stable repo, and you can give it a try, too.  If it is still problematic, we can report to the upstream regression tracker:
  https://docs.kernel.org/admin-guide/reporting-regressions.html
Comment 6 Alan Lima 2024-02-26 18:27:07 UTC
well, never mind my previous comment: the crash happened again a minute ago. this time it's much, MUCH rarer than before; i could spend over a week with no issues at all. the problem came back while i had a game running and a youtube video on another monitor. i had left the game on pause for a break and was watching a video; after i put the video in fullscreen, the system crashed a few minutes later.

[30782.924181] BUG: kernel NULL pointer dereference, address: 0000000000000000
[30782.924186] #PF: supervisor read access in kernel mode
[30782.924188] #PF: error_code(0x0000) - not-present page
[30782.924190] PGD 17e2aa067 P4D 17e2aa067 PUD 1c111f067 PMD 145afc067 PTE 0
[30782.924195] Oops: 0000 [#1] PREEMPT SMP NOPTI
[30782.924197] CPU: 10 PID: 24224 Comm: DyingLightGame_ Kdump: loaded Tainted: G           OE      6.8.0-rc4-2.g6b6d2be-default #1 openSUSE Tumbleweed (unreleased) 462adc54754d2bc7f213189ada349c0000597978
[30782.924201] Hardware name: Gigabyte Technology Co., Ltd. X570S AORUS PRO AX/X570S AORUS PRO AX, BIOS F6c 09/20/2023
[30782.924203] RIP: 0010:dcn10_set_drr+0xa0/0xf0 [amdgpu]
[30782.924451] Code: 74 e0 48 8b 80 28 01 00 00 48 85 c0 74 08 48 89 e6 e8 f4 57 a8 f5 45 85 e4 74 c7 45 85 ed 74 c2 48 8b 03 48 8b b8 f8 00 00 00 <48> 8b 07 48 8b 80 40 01 00 00 48 85 c0 74 a9 48 83 c3 08 ba 02 00
[30782.924453] RSP: 0000:ffff9c87d6dcfd08 EFLAGS: 00010002
[30782.924456] RAX: ffff8c9b333c14e8 RBX: ffff9c87d6dcfd58 RCX: 0000000000000000
[30782.924457] RDX: 0000000080010055 RSI: ffff8c9ad0e01c60 RDI: 0000000000000000
[30782.924459] RBP: ffff9c87d6dcfd48 R08: 0000000080000000 R09: ffff9c87d6dcfbc8
[30782.924460] R10: 0000000000000008 R11: 0000000000000000 R12: 000000000000045f
[30782.924462] R13: 0000000000000831 R14: ffff9c87d6dcfd60 R15: ffff8c9aca1d82a0
[30782.924463] FS:  0000000100eff6c0(0000) GS:ffff8ca1c8500000(0000) knlGS:000000007fee0000
[30782.924465] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[30782.924466] CR2: 0000000000000000 CR3: 00000002716fe000 CR4: 0000000000750ef0
[30782.924468] PKRU: 55555554
[30782.924469] Call Trace:
[30782.924473]  <TASK>
[30782.924476]  ? __die+0x23/0x70
[30782.924480]  ? page_fault_oops+0x14d/0x490
[30782.924484]  ? srso_alias_return_thunk+0x5/0xfbef5
[30782.924487]  ? generic_reg_set_ex+0xa1/0xe0 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.924667]  ? exc_page_fault+0x71/0x160
[30782.924670]  ? asm_exc_page_fault+0x26/0x30
[30782.924676]  ? dcn10_set_drr+0xa0/0xf0 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.924873]  ? dcn10_set_drr+0x8c/0xf0 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.925070]  dc_stream_adjust_vmin_vmax+0xaa/0xd0 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.925248]  dm_crtc_high_irq+0x231/0x2b0 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.925442]  amdgpu_dm_irq_handler+0x8e/0x1d0 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.925646]  amdgpu_irq_dispatch+0xbb/0x200 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.925859]  amdgpu_ih_process+0x83/0x100 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.926015]  amdgpu_irq_handler+0x23/0x60 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.926168]  __handle_irq_event_percpu+0x4a/0x1a0
[30782.926173]  handle_irq_event+0x38/0x80
[30782.926175]  handle_edge_irq+0x8b/0x230
[30782.926179]  __common_interrupt+0x3f/0xa0
[30782.926182]  common_interrupt+0x43/0xa0
[30782.926185]  asm_common_interrupt+0x26/0x40
[30782.926188] RIP: 0033:0x6ffff4d468c7
[30782.926211] Code: 8b 86 90 00 00 00 48 ba ff ff ff ff ff ff 00 00 49 8b d8 49 8b c8 48 23 da 48 c1 e9 30 4c 8d 63 08 66 90 48 8b 3b f6 47 37 08 <0f> 84 aa 00 00 00 66 c1 e9 08 4d 8d be 98 00 00 00 4c 23 c2 84 c9
[30782.926213] RSP: 002b:000000000708f610 EFLAGS: 00000246
[30782.926215] RAX: 0000000051a1bf18 RBX: 0000000051a16e80 RCX: 000000000000006b
[30782.926217] RDX: 0000ffffffffffff RSI: 00006ffff50d7248 RDI: 00000000655e32e0
[30782.926218] RBP: 000000000708f710 R08: 006b0000519cb280 R09: 0000000000000001
[30782.926220] R10: 0000000000000001 R11: 00006ffffdc11b7f R12: 0000000051a16e88
[30782.926221] R13: 0000000000000001 R14: 00006ffff50d7170 R15: 0000ffffffffff00
[30782.926225]  </TASK>
Comment 7 Alan Lima 2024-02-26 18:39:37 UTC
(In reply to Takashi Iwai from comment #5)
> Yeah, it looks like a regression in 6.7.x.
> 
> There is a 6.7.6 release in the OBS Kernel:stable repo, and you can give it
> a try, too.  If it is still problematic, we can report to the upstream
> regression tracker:
>   https://docs.kernel.org/admin-guide/reporting-regressions.html

i'll try it
Comment 8 Takashi Iwai 2024-02-27 07:35:15 UTC
Also, the usual place to report amdgpu issues upstream is the drm/amd issue tracker on gitlab.freedesktop.org:
  https://gitlab.freedesktop.org/drm/amd/-/issues
Comment 9 Alan Lima 2024-02-27 10:04:00 UTC
(In reply to Takashi Iwai from comment #8)
> Also, the usual place to report amdgpu issues upstream is the drm/amd issue
> tracker on gitlab.freedesktop.org:
>   https://gitlab.freedesktop.org/drm/amd/-/issues

looks like this was already reported
https://gitlab.freedesktop.org/drm/amd/-/issues/3158
https://gitlab.freedesktop.org/drm/amd/-/issues/3142
https://gitlab.freedesktop.org/drm/amd/-/issues/3149
Comment 10 Takashi Iwai 2024-05-24 17:07:54 UTC
The upstream tracker entry is still open, and the problem seems to persist on 6.9.x kernels.

I'm building a test kernel with the workaround patch suggested in
  https://gitlab.freedesktop.org/drm/amd/-/issues/3142
It's being built in OBS home:tiwai:bsc1219983.  Once the build finishes, the package will appear at
  http://download.opensuse.org/repositories/home:/tiwai:/bsc1219983/standard/

Please give it a try later.
Comment 11 Alan Lima 2024-05-24 17:55:00 UTC
i have been using the regular kernel from the official repos for a few months and i'm no longer able to reproduce the bug.
Comment 12 Takashi Iwai 2024-05-24 18:23:22 UTC
Ah, then the upstream tracker may be hitting a different issue.  Let's close this entry.
Comment 13 Alan Lima 2024-06-01 20:18:12 UTC
(In reply to Takashi Iwai from comment #12)
> Ah, then the upstream tracker may be hitting a different issue.  Let's
> close this entry.

well i've installed the latest update and the bug reappeared, so i suppose this issue should be re-opened?
Comment 14 Alan Lima 2024-06-08 14:32:03 UTC
[35555.910532] [      C8] BUG: kernel NULL pointer dereference, address: 0000000000000000
[35555.910539] [      C8] #PF: supervisor read access in kernel mode
[35555.910541] [      C8] #PF: error_code(0x0000) - not-present page
[35555.910543] [      C8] PGD 11fb64067 P4D 11fb64067 PUD 1041d0067 PMD 0 
[35555.910547] [      C8] Oops: 0000 [#1] PREEMPT SMP NOPTI
[35555.910550] [      C8] CPU: 8 PID: 4058 Comm: UnityGfxDeviceW Kdump: loaded Tainted: G           OE      6.9.1-1-default #1 openSUSE Tumbleweed c5471a56f12c40709b95530f47f6c0b39e75f136
[35555.910554] [      C8] Hardware name: Gigabyte Technology Co., Ltd. X570S AORUS PRO AX/X570S AORUS PRO AX, BIOS F6c 09/20/2023
[35555.910556] [      C8] RIP: 0010:dcn10_set_drr+0xa0/0xf0 [amdgpu]
[35555.910798] [      C8] Code: 74 e0 48 8b 80 28 01 00 00 48 85 c0 74 08 48 89 e6 e8 54 64 10 c9 45 85 e4 74 c7 45 85 ed 74 c2 48 8b 03 48 8b b8 f8 00 00 00 <48> 8b 07 48 8b 80 40 01 00 00 48 85 c0 74 a9 48 83 c3 08 ba 02 00
[35555.910800] [      C8] RSP: 0000:ffffba29d0047ce0 EFLAGS: 00210002
[35555.910802] [      C8] RAX: ffffa0ee692002d8 RBX: ffffba29d0047d30 RCX: 0000000000000000
[35555.910804] [      C8] RDX: 0000000080010015 RSI: ffffa0ecf1b64e00 RDI: 0000000000000000
[35555.910806] [      C8] RBP: ffffba29d0047d20 R08: 0000000080000000 R09: ffffba29d0047ba0
[35555.910807] [      C8] R10: 0000000000000008 R11: 0000000000000000 R12: 000000000000045f
[35555.910808] [      C8] R13: 0000000000000831 R14: ffffba29d0047d38 R15: ffffa0f036239480
[35555.910810] [      C8] FS:  00007f73f1c006c0(0000) GS:ffffa0f3fee00000(0000) knlGS:0000000000000000
[35555.910812] [      C8] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[35555.910814] [      C8] CR2: 0000000000000000 CR3: 0000000105540000 CR4: 0000000000750ef0
[35555.910815] [      C8] PKRU: 55555554
[35555.910817] [      C8] Call Trace:
[35555.910820] [      C8]  <TASK>
[35555.910823] [      C8]  ? __die_body.cold+0x14/0x24
[35555.910828] [      C8]  ? page_fault_oops+0x134/0x2a0
[35555.910834] [      C8]  ? exc_page_fault+0x73/0x170
[35555.910837] [      C8]  ? asm_exc_page_fault+0x26/0x30
[35555.910843] [      C8]  ? dcn10_set_drr+0xa0/0xf0 [amdgpu 862ae11de1ab090ca7bb91314cbbd73412a175ad]
[35555.911066] [      C8]  dc_stream_adjust_vmin_vmax+0xd3/0x110 [amdgpu 862ae11de1ab090ca7bb91314cbbd73412a175ad]
[35555.911261] [      C8]  dm_crtc_high_irq+0x231/0x2b0 [amdgpu 862ae11de1ab090ca7bb91314cbbd73412a175ad]
[35555.911495] [      C8]  amdgpu_dm_irq_handler+0x85/0x1d0 [amdgpu 862ae11de1ab090ca7bb91314cbbd73412a175ad]
[35555.911711] [      C8]  amdgpu_irq_dispatch+0xbb/0x200 [amdgpu 862ae11de1ab090ca7bb91314cbbd73412a175ad]
[35555.911892] [      C8]  amdgpu_ih_process+0x83/0x100 [amdgpu 862ae11de1ab090ca7bb91314cbbd73412a175ad]
[35555.912075] [      C8]  amdgpu_irq_handler+0x23/0x60 [amdgpu 862ae11de1ab090ca7bb91314cbbd73412a175ad]
[35555.912248] [      C8]  __handle_irq_event_percpu+0x4a/0x190
[35555.912253] [      C8]  handle_irq_event+0x38/0x80
[35555.912255] [      C8]  handle_edge_irq+0x8b/0x230
[35555.912258] [      C8]  __common_interrupt+0x3f/0x90
[35555.912262] [      C8]  common_interrupt+0x42/0xa0
[35555.912265] [      C8]  asm_common_interrupt+0x26/0x40
[35555.912268] [      C8] RIP: 0033:0x7f7471b6c91a
[35555.912293] [      C8] Code: c5 fe 6f 0e c5 fe 6f 56 20 c5 fe 6f 5e 40 c5 fe 6f 66 60 48 83 ee 80 c5 fd 7f 0f c5 fd 7f 57 20 c5 fd 7f 5f 40 c5 fd 7f 67 60 <48> 83 ef 80 48 39 fa 77 cd c5 fe 7f 6a 60 c5 fe 7f 72 40 c5 fe 7f
[35555.912295] [      C8] RSP: 002b:00007f73f1bff7e8 EFLAGS: 00200203
[35555.912297] [      C8] RAX: 00007f735b814edc RBX: 0000000000000002 RCX: 00007f735b814edc
[35555.912298] [      C8] RDX: 00007f735b81505c RSI: 00007f7380652404 RDI: 00007f735b814f60
[35555.912300] [      C8] RBP: 00007f735b814edc R08: 0000000000000000 R09: 00007f72eeb46401
[35555.912301] [      C8] R10: 00007f73e1cdb6e0 R11: 0000000000000002 R12: 00007f7380652300
[35555.912303] [      C8] R13: 00007f73f1bffac0 R14: 0000000000000004 R15: 0000000000000200
[35555.912308] [      C8]  </TASK>
[35555.912310] [      C8] Modules linked in: binfmt_misc rfcomm snd_seq_dummy snd_hrtimer snd_seq af_packet joydev nf_nat_tftp nf_conntrack_tftp nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables qrtr cmac algif_hash algif_skcipher af_alg bnep mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 btusb btrtl mac80211 btintel btbcm btmtk libarc4 bluetooth cfg80211 ecdh_generic snd_hda_codec_realtek amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi edac_mce_amd snd_hda_intel ext4 nls_iso8859_1 snd_usb_audio snd_intel_dspcfg snd_intel_sdw_acpi nls_cp437 kvm_amd mbcache snd_usbmidi_lib vfat snd_ump fat jbd2 ledtrig_netdev snd_rawmidi uvcvideo snd_hda_codec kvm snd_hda_core videobuf2_vmalloc snd_seq_device snd_hwdep uvc videobuf2_memops gigabyte_wmi snd_pcm pcspkr wmi_bmof videobuf2_v4l2 acpi_cpufreq
[35555.912373] [      C8]  snd_timer videobuf2_common k10temp i2c_piix4 snd i2c_nvidia_gpu i2c_ccgx_ucsi soundcore rfkill igc razermouse(OE) razerkbd(OE) thermal tiny_power_button nvme_fabrics fuse loop dm_mod efi_pstore configfs nfnetlink dmi_sysfs ip_tables x_tables hid_generic usbhid amdgpu crct10dif_pclmul crc32_pclmul ahci polyval_clmulni libahci video polyval_generic amdxcp gf128mul i2c_algo_bit drm_ttm_helper libata ghash_clmulni_intel ttm drm_exec sha512_ssse3 gpu_sched sha256_ssse3 sd_mod sha1_ssse3 drm_suballoc_helper drm_buddy scsi_dh_emc nvme xhci_pci scsi_dh_rdac drm_display_helper xhci_pci_renesas scsi_dh_alua aesni_intel cec sg xhci_hcd nvme_core crypto_simd rc_core cryptd ccp sp5100_tco scsi_mod usbcore nvme_auth scsi_common t10_pi wmi button vfio_pci vfio_pci_core vfio_iommu_type1 vfio btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq v4l2loopback(O) videodev mc msr i2c_dev efivarfs
[35555.912436] [      C8] CR2: 0000000000000000


the problem re-appeared
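
To make the comparison across the three dumps in this report explicit, here is a small sketch that pulls the kernel version and the faulting `RIP` out of each oops. The strings in `OOPSES` are trimmed copies of the `CPU:` and `RIP:` lines from the description, comment 6 and comment 14 (timestamps and build hashes dropped); `summarize` is a hypothetical helper name.

```python
import re

# Trimmed copies of the CPU: and RIP: lines from the three oopses in this report.
OOPSES = [
    "CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded Tainted: G           OE      "
    "6.7.4-1-default #1 openSUSE Tumbleweed\n"
    "RIP: 0010:dcn10_set_drr+0xa0/0xf0 [amdgpu]",
    "CPU: 10 PID: 24224 Comm: DyingLightGame_ Kdump: loaded Tainted: G           OE      "
    "6.8.0-rc4-2.g6b6d2be-default #1 openSUSE Tumbleweed (unreleased)\n"
    "RIP: 0010:dcn10_set_drr+0xa0/0xf0 [amdgpu]",
    "CPU: 8 PID: 4058 Comm: UnityGfxDeviceW Kdump: loaded Tainted: G           OE      "
    "6.9.1-1-default #1 openSUSE Tumbleweed\n"
    "RIP: 0010:dcn10_set_drr+0xa0/0xf0 [amdgpu]",
]


def summarize(oops_text):
    """Return (kernel version, faulting symbol+offset) for one oops excerpt."""
    version = re.search(r"Tainted:.*?(\d+\.\d+\.\d+\S*) #", oops_text).group(1)
    # Code segment 0010 is the kernel-mode RIP; user-mode RIP lines use 0033.
    rip = re.search(r"RIP: 0010:(\S+)", oops_text).group(1)
    return version, rip


summaries = [summarize(text) for text in OOPSES]
for version, rip in summaries:
    print(f"{version:32} {rip}")
same_site = len({rip for _, rip in summaries}) == 1
print("same fault site in every dump:", same_site)
```

All three dumps report the same symbol and offset, `dcn10_set_drr+0xa0/0xf0`, on 6.7.4, 6.8.0-rc4 and 6.9.1.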
Comment 15 Takashi Iwai 2024-06-13 13:27:31 UTC
Can you reproduce with the latest kernel in OBS Kernel:stable repo?