Bugzilla – Full Text Bug Listing

| Summary: | amdgpu : "kernel NULL pointer dereference" kernel 6.7.4-1 | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Tumbleweed | Reporter: | Alan Lima <alanemmanuel5> |
| Component: | Kernel:Drivers | Assignee: | Kernel Bugs <kernel-bugs> |
| Status: | REOPENED --- | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | alanemmanuel5, tiwai |
| Version: | Current | Flags: | tiwai: needinfo?(alanemmanuel5) |
| Target Milestone: | --- | ||
| Hardware: | 64bit | ||
| OS: | openSUSE Tumbleweed | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
Description
Alan Lima
2024-02-15 19:27:30 UTC
If the bug is reproducible, could you try the latest 6.8-rc kernel from the OBS Kernel:HEAD repo?
http://download.opensuse.org/repositories/Kernel:/HEAD/standard/

If the problem persists there, we should report it upstream.

(In reply to Takashi Iwai from comment #1)
> If the bug is reproducible, could you try the latest 6.8-rc kernel from the
> OBS Kernel:HEAD repo?
> http://download.opensuse.org/repositories/Kernel:/HEAD/standard/
>
> If the problem persists there, we should report it upstream.

Hi, I've installed the kernel-default package from this repo; however, I can no longer install kernel modules, as the headers are apparently missing. Reproducing the bug might take a while.

(In reply to Alan Lima from comment #2)
> (In reply to Takashi Iwai from comment #1)
> > If the bug is reproducible, could you try the latest 6.8-rc kernel from the
> > OBS Kernel:HEAD repo?
> > http://download.opensuse.org/repositories/Kernel:/HEAD/standard/
> >
> > If the problem persists there, we should report it upstream.
>
> Hi, I've installed the kernel-default package from this repo; however, I can
> no longer install kernel modules, as the headers are apparently missing.
> Reproducing the bug might take a while.

So you're using other out-of-tree modules? In any case, it's better to test without such uncertain factors.

Well, it's been 2 days on Linux 6.8 and I am no longer experiencing any freezes or crashes, even with the openrazer kernel modules. On 6.7 it was usually one crash per day; now it's completely stable. I conclude that this is a kernel bug from 6.7, which also correlates with the fact that the issue appeared out of nowhere after a few updates.

Yeah, it looks like a regression in 6.7.x.

There is a 6.7.6 release in the OBS Kernel:stable repo, and you can give it a try, too.
If this is still problematic, we can report it to the upstream regression tracker:
https://docs.kernel.org/admin-guide/reporting-regressions.html

Well, never mind my previous comment: the crash happened again a minute ago. This time it's much, MUCH rarer than before (I could go over a week with no issues at all), but the problem came back while I had a game running and a YouTube video on another monitor. I left the game paused for a break and watched a video; after putting it in fullscreen, the system crashed after a few minutes of playtime.

[30782.924181] BUG: kernel NULL pointer dereference, address: 0000000000000000
[30782.924186] #PF: supervisor read access in kernel mode
[30782.924188] #PF: error_code(0x0000) - not-present page
[30782.924190] PGD 17e2aa067 P4D 17e2aa067 PUD 1c111f067 PMD 145afc067 PTE 0
[30782.924195] Oops: 0000 [#1] PREEMPT SMP NOPTI
[30782.924197] CPU: 10 PID: 24224 Comm: DyingLightGame_ Kdump: loaded Tainted: G OE 6.8.0-rc4-2.g6b6d2be-default #1 openSUSE Tumbleweed (unreleased) 462adc54754d2bc7f213189ada349c0000597978
[30782.924201] Hardware name: Gigabyte Technology Co., Ltd. X570S AORUS PRO AX/X570S AORUS PRO AX, BIOS F6c 09/20/2023
[30782.924203] RIP: 0010:dcn10_set_drr+0xa0/0xf0 [amdgpu]
[30782.924451] Code: 74 e0 48 8b 80 28 01 00 00 48 85 c0 74 08 48 89 e6 e8 f4 57 a8 f5 45 85 e4 74 c7 45 85 ed 74 c2 48 8b 03 48 8b b8 f8 00 00 00 <48> 8b 07 48 8b 80 40 01 00 00 48 85 c0 74 a9 48 83 c3 08 ba 02 00
[30782.924453] RSP: 0000:ffff9c87d6dcfd08 EFLAGS: 00010002
[30782.924456] RAX: ffff8c9b333c14e8 RBX: ffff9c87d6dcfd58 RCX: 0000000000000000
[30782.924457] RDX: 0000000080010055 RSI: ffff8c9ad0e01c60 RDI: 0000000000000000
[30782.924459] RBP: ffff9c87d6dcfd48 R08: 0000000080000000 R09: ffff9c87d6dcfbc8
[30782.924460] R10: 0000000000000008 R11: 0000000000000000 R12: 000000000000045f
[30782.924462] R13: 0000000000000831 R14: ffff9c87d6dcfd60 R15: ffff8c9aca1d82a0
[30782.924463] FS: 0000000100eff6c0(0000) GS:ffff8ca1c8500000(0000) knlGS:000000007fee0000
[30782.924465] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[30782.924466] CR2: 0000000000000000 CR3: 00000002716fe000 CR4: 0000000000750ef0
[30782.924468] PKRU: 55555554
[30782.924469] Call Trace:
[30782.924473] <TASK>
[30782.924476] ? __die+0x23/0x70
[30782.924480] ? page_fault_oops+0x14d/0x490
[30782.924484] ? srso_alias_return_thunk+0x5/0xfbef5
[30782.924487] ? generic_reg_set_ex+0xa1/0xe0 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.924667] ? exc_page_fault+0x71/0x160
[30782.924670] ? asm_exc_page_fault+0x26/0x30
[30782.924676] ? dcn10_set_drr+0xa0/0xf0 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.924873] ? dcn10_set_drr+0x8c/0xf0 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.925070] dc_stream_adjust_vmin_vmax+0xaa/0xd0 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.925248] dm_crtc_high_irq+0x231/0x2b0 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.925442] amdgpu_dm_irq_handler+0x8e/0x1d0 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.925646] amdgpu_irq_dispatch+0xbb/0x200 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.925859] amdgpu_ih_process+0x83/0x100 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.926015] amdgpu_irq_handler+0x23/0x60 [amdgpu 47c3b97be6ad2f93582f5f4d891e2749a9f5edfb]
[30782.926168] __handle_irq_event_percpu+0x4a/0x1a0
[30782.926173] handle_irq_event+0x38/0x80
[30782.926175] handle_edge_irq+0x8b/0x230
[30782.926179] __common_interrupt+0x3f/0xa0
[30782.926182] common_interrupt+0x43/0xa0
[30782.926185] asm_common_interrupt+0x26/0x40
[30782.926188] RIP: 0033:0x6ffff4d468c7
[30782.926211] Code: 8b 86 90 00 00 00 48 ba ff ff ff ff ff ff 00 00 49 8b d8 49 8b c8 48 23 da 48 c1 e9 30 4c 8d 63 08 66 90 48 8b 3b f6 47 37 08 <0f> 84 aa 00 00 00 66 c1 e9 08 4d 8d be 98 00 00 00 4c 23 c2 84 c9
[30782.926213] RSP: 002b:000000000708f610 EFLAGS: 00000246
[30782.926215] RAX: 0000000051a1bf18 RBX: 0000000051a16e80 RCX: 000000000000006b
[30782.926217] RDX: 0000ffffffffffff RSI: 00006ffff50d7248 RDI: 00000000655e32e0
[30782.926218] RBP: 000000000708f710 R08: 006b0000519cb280 R09: 0000000000000001
[30782.926220] R10: 0000000000000001 R11: 00006ffffdc11b7f R12: 0000000051a16e88
[30782.926221] R13: 0000000000000001 R14: 00006ffff50d7170 R15: 0000ffffffffff00
[30782.926225] </TASK>

(In reply to Takashi Iwai from comment #5)
> Yeah, it looks like a regression in 6.7.x.
>
> There is a 6.7.6 release in the OBS Kernel:stable repo, and you can give it a
> try, too.
>
> If this is still problematic, we can report it to the upstream regression
> tracker:
> https://docs.kernel.org/admin-guide/reporting-regressions.html

I'll try it.

Also, the usual place to report amdgpu issues upstream is the drm/amd issue tracker on gitlab.freedesktop.org:
https://gitlab.freedesktop.org/drm/amd/-/issues

(In reply to Takashi Iwai from comment #8)
> Also, the usual place to report amdgpu issues upstream is the drm/amd issue
> tracker on gitlab.freedesktop.org:
> https://gitlab.freedesktop.org/drm/amd/-/issues

Looks like this was already reported:
https://gitlab.freedesktop.org/drm/amd/-/issues/3158
https://gitlab.freedesktop.org/drm/amd/-/issues/3142
https://gitlab.freedesktop.org/drm/amd/-/issues/3149

The upstream tracker entry is still open, and the issue seems to persist on 6.9.x kernels. I'm building a test kernel with the workaround patch suggested in https://gitlab.freedesktop.org/drm/amd/-/issues/3142
It's being built in OBS home:tiwai:bsc1219983. Once the build finishes, the package will appear at
http://download.opensuse.org/repositories/home:/tiwai:/bsc1219983/standard/
Please give it a try later.

I have been using the regular kernel from the official repos for a few months, and I'm no longer able to reproduce the bug.

Ah, then maybe it's a different issue from the one the upstream tracker is hitting. Let's close this entry.

(In reply to Takashi Iwai from comment #12)
> Ah, then maybe it's a different issue from the one the upstream tracker is
> hitting. Let's close this entry.

Well, I've installed the latest update and the bug reappeared, so I suppose this issue should be reopened?
[35555.910532] [ C8] BUG: kernel NULL pointer dereference, address: 0000000000000000
[35555.910539] [ C8] #PF: supervisor read access in kernel mode
[35555.910541] [ C8] #PF: error_code(0x0000) - not-present page
[35555.910543] [ C8] PGD 11fb64067 P4D 11fb64067 PUD 1041d0067 PMD 0
[35555.910547] [ C8] Oops: 0000 [#1] PREEMPT SMP NOPTI
[35555.910550] [ C8] CPU: 8 PID: 4058 Comm: UnityGfxDeviceW Kdump: loaded Tainted: G OE 6.9.1-1-default #1 openSUSE Tumbleweed c5471a56f12c40709b95530f47f6c0b39e75f136
[35555.910554] [ C8] Hardware name: Gigabyte Technology Co., Ltd. X570S AORUS PRO AX/X570S AORUS PRO AX, BIOS F6c 09/20/2023
[35555.910556] [ C8] RIP: 0010:dcn10_set_drr+0xa0/0xf0 [amdgpu]
[35555.910798] [ C8] Code: 74 e0 48 8b 80 28 01 00 00 48 85 c0 74 08 48 89 e6 e8 54 64 10 c9 45 85 e4 74 c7 45 85 ed 74 c2 48 8b 03 48 8b b8 f8 00 00 00 <48> 8b 07 48 8b 80 40 01 00 00 48 85 c0 74 a9 48 83 c3 08 ba 02 00
[35555.910800] [ C8] RSP: 0000:ffffba29d0047ce0 EFLAGS: 00210002
[35555.910802] [ C8] RAX: ffffa0ee692002d8 RBX: ffffba29d0047d30 RCX: 0000000000000000
[35555.910804] [ C8] RDX: 0000000080010015 RSI: ffffa0ecf1b64e00 RDI: 0000000000000000
[35555.910806] [ C8] RBP: ffffba29d0047d20 R08: 0000000080000000 R09: ffffba29d0047ba0
[35555.910807] [ C8] R10: 0000000000000008 R11: 0000000000000000 R12: 000000000000045f
[35555.910808] [ C8] R13: 0000000000000831 R14: ffffba29d0047d38 R15: ffffa0f036239480
[35555.910810] [ C8] FS: 00007f73f1c006c0(0000) GS:ffffa0f3fee00000(0000) knlGS:0000000000000000
[35555.910812] [ C8] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[35555.910814] [ C8] CR2: 0000000000000000 CR3: 0000000105540000 CR4: 0000000000750ef0
[35555.910815] [ C8] PKRU: 55555554
[35555.910817] [ C8] Call Trace:
[35555.910820] [ C8] <TASK>
[35555.910823] [ C8] ? __die_body.cold+0x14/0x24
[35555.910828] [ C8] ? page_fault_oops+0x134/0x2a0
[35555.910834] [ C8] ? exc_page_fault+0x73/0x170
[35555.910837] [ C8] ? asm_exc_page_fault+0x26/0x30
[35555.910843] [ C8] ? dcn10_set_drr+0xa0/0xf0 [amdgpu 862ae11de1ab090ca7bb91314cbbd73412a175ad]
[35555.911066] [ C8] dc_stream_adjust_vmin_vmax+0xd3/0x110 [amdgpu 862ae11de1ab090ca7bb91314cbbd73412a175ad]
[35555.911261] [ C8] dm_crtc_high_irq+0x231/0x2b0 [amdgpu 862ae11de1ab090ca7bb91314cbbd73412a175ad]
[35555.911495] [ C8] amdgpu_dm_irq_handler+0x85/0x1d0 [amdgpu 862ae11de1ab090ca7bb91314cbbd73412a175ad]
[35555.911711] [ C8] amdgpu_irq_dispatch+0xbb/0x200 [amdgpu 862ae11de1ab090ca7bb91314cbbd73412a175ad]
[35555.911892] [ C8] amdgpu_ih_process+0x83/0x100 [amdgpu 862ae11de1ab090ca7bb91314cbbd73412a175ad]
[35555.912075] [ C8] amdgpu_irq_handler+0x23/0x60 [amdgpu 862ae11de1ab090ca7bb91314cbbd73412a175ad]
[35555.912248] [ C8] __handle_irq_event_percpu+0x4a/0x190
[35555.912253] [ C8] handle_irq_event+0x38/0x80
[35555.912255] [ C8] handle_edge_irq+0x8b/0x230
[35555.912258] [ C8] __common_interrupt+0x3f/0x90
[35555.912262] [ C8] common_interrupt+0x42/0xa0
[35555.912265] [ C8] asm_common_interrupt+0x26/0x40
[35555.912268] [ C8] RIP: 0033:0x7f7471b6c91a
[35555.912293] [ C8] Code: c5 fe 6f 0e c5 fe 6f 56 20 c5 fe 6f 5e 40 c5 fe 6f 66 60 48 83 ee 80 c5 fd 7f 0f c5 fd 7f 57 20 c5 fd 7f 5f 40 c5 fd 7f 67 60 <48> 83 ef 80 48 39 fa 77 cd c5 fe 7f 6a 60 c5 fe 7f 72 40 c5 fe 7f
[35555.912295] [ C8] RSP: 002b:00007f73f1bff7e8 EFLAGS: 00200203
[35555.912297] [ C8] RAX: 00007f735b814edc RBX: 0000000000000002 RCX: 00007f735b814edc
[35555.912298] [ C8] RDX: 00007f735b81505c RSI: 00007f7380652404 RDI: 00007f735b814f60
[35555.912300] [ C8] RBP: 00007f735b814edc R08: 0000000000000000 R09: 00007f72eeb46401
[35555.912301] [ C8] R10: 00007f73e1cdb6e0 R11: 0000000000000002 R12: 00007f7380652300
[35555.912303] [ C8] R13: 00007f73f1bffac0 R14: 0000000000000004 R15: 0000000000000200
[35555.912308] [ C8] </TASK>
[35555.912310] [ C8] Modules linked in: binfmt_misc rfcomm snd_seq_dummy snd_hrtimer snd_seq af_packet joydev nf_nat_tftp nf_conntrack_tftp nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables qrtr cmac algif_hash algif_skcipher af_alg bnep mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 btusb btrtl mac80211 btintel btbcm btmtk libarc4 bluetooth cfg80211 ecdh_generic snd_hda_codec_realtek amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi edac_mce_amd snd_hda_intel ext4 nls_iso8859_1 snd_usb_audio snd_intel_dspcfg snd_intel_sdw_acpi nls_cp437 kvm_amd mbcache snd_usbmidi_lib vfat snd_ump fat jbd2 ledtrig_netdev snd_rawmidi uvcvideo snd_hda_codec kvm snd_hda_core videobuf2_vmalloc snd_seq_device snd_hwdep uvc videobuf2_memops gigabyte_wmi snd_pcm pcspkr wmi_bmof videobuf2_v4l2 acpi_cpufreq
[35555.912373] [ C8] snd_timer videobuf2_common k10temp i2c_piix4 snd i2c_nvidia_gpu i2c_ccgx_ucsi soundcore rfkill igc razermouse(OE) razerkbd(OE) thermal tiny_power_button nvme_fabrics fuse loop dm_mod efi_pstore configfs nfnetlink dmi_sysfs ip_tables x_tables hid_generic usbhid amdgpu crct10dif_pclmul crc32_pclmul ahci polyval_clmulni libahci video polyval_generic amdxcp gf128mul i2c_algo_bit drm_ttm_helper libata ghash_clmulni_intel ttm drm_exec sha512_ssse3 gpu_sched sha256_ssse3 sd_mod sha1_ssse3 drm_suballoc_helper drm_buddy scsi_dh_emc nvme xhci_pci scsi_dh_rdac drm_display_helper xhci_pci_renesas scsi_dh_alua aesni_intel cec sg xhci_hcd nvme_core crypto_simd rc_core cryptd ccp sp5100_tco scsi_mod usbcore nvme_auth scsi_common t10_pi wmi button vfio_pci vfio_pci_core vfio_iommu_type1 vfio btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq v4l2loopback(O) videodev mc msr i2c_dev efivarfs
[35555.912436] [ C8] CR2: 0000000000000000

The problem reappeared.

Can you reproduce it with the latest kernel in the OBS Kernel:stable repo?