Bugzilla – Bug 1227179
Nouveau crashing kernel with NULL pointer dereference
Last modified: 2024-07-15 13:19:05 UTC
Created attachment 875765: debug from when the lockup happens

During a live presentation (ouch), when playing MP4 videos, the machine hard freezes. A hard reboot was required (nothing was responsive).

NAME="openSUSE Tumbleweed"
# VERSION="20240625"

I'll attach logs. The logs are truncated to the point of the last freeze (it happens twice in the logs).
I tried playing the same MP4s outside of the presentation (using GNOME Videos), but I don't see the same problem. It only freezes in LibreOffice Present.
There's a kernel backtrace at Jun 28 15:34:44, and it appears to have been caused by nouveau. This looks like a more promising lead.
Jun 28 15:34:44 localhost.localdomain kernel: Call Trace:
Jun 28 15:34:44 localhost.localdomain kernel: <TASK>
Jun 28 15:34:44 localhost.localdomain kernel: ? __die_body.cold+0x14/0x24
Jun 28 15:34:44 localhost.localdomain kernel: ? page_fault_oops+0x134/0x2a0
Jun 28 15:34:44 localhost.localdomain kernel: ? exc_page_fault+0x73/0x170
Jun 28 15:34:44 localhost.localdomain kernel: ? asm_exc_page_fault+0x26/0x30
Jun 28 15:34:44 localhost.localdomain kernel: ? gp100_vmm_pgt_sgl+0x4a/0x160 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: ? gp100_vmm_pgt_sgl+0xd8/0x160 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: nvkm_vmm_iter.isra.0+0x2f4/0x890 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: ? __pfx_gp100_vmm_pgt_sgl+0x10/0x10 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: ? __pfx_gp100_vmm_pgt_sgl+0x10/0x10 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: nvkm_vmm_ptes_get_map+0xb1/0xf0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: ? __pfx_gp100_vmm_pgt_sgl+0x10/0x10 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: nvkm_vmm_map_locked+0x202/0x360 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: nvkm_vmm_map+0x89/0xe0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: nvkm_mem_map_sgl+0x5a/0x80 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: nvkm_uvmm_mthd+0xc25/0xe00 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: ? nvkm_uvmm_mthd+0x1f9/0xe00 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: ? nvkm_ioctl+0xd9/0x180 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: nvkm_ioctl+0xd9/0x180 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: nvif_object_mthd+0xa8/0x1f0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: ? nvif_mmu_ctor+0x3d0/0x420 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: ? nvif_object_mthd+0xbb/0x1f0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: nvif_vmm_map+0x11d/0x130 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: ? nouveau_mem_host+0x108/0x1a0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: nouveau_mem_map+0x94/0xe0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: nouveau_bo_move+0x654/0x930 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: ? kvmalloc_node+0x43/0xd0
Jun 28 15:34:44 localhost.localdomain kernel: ? drm_prime_sg_to_dma_addr_array+0x5c/0xa0
Jun 28 15:34:44 localhost.localdomain kernel: ttm_bo_handle_move_mem+0xb8/0x170 [ttm b5d04b8db497992450811abea646aff0c69751ea]
Jun 28 15:34:44 localhost.localdomain kernel: ttm_mem_evict_first+0x2aa/0x450 [ttm b5d04b8db497992450811abea646aff0c69751ea]
Jun 28 15:34:44 localhost.localdomain kernel: ? nouveau_vram_manager_new+0xab/0xc0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: ttm_bo_mem_space+0x1e5/0x230 [ttm b5d04b8db497992450811abea646aff0c69751ea]
Jun 28 15:34:44 localhost.localdomain kernel: ttm_bo_validate+0x6e/0x160 [ttm b5d04b8db497992450811abea646aff0c69751ea]
Jun 28 15:34:44 localhost.localdomain kernel: ? nv50_head_atomic_check+0x3b2/0xbe0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: nouveau_bo_pin+0xbd/0x2c0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: nv50_wndw_prepare_fb+0x63/0x2d0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: ? try_wait_for_completion+0x4f/0x60
Jun 28 15:34:44 localhost.localdomain kernel: drm_atomic_helper_prepare_planes+0x74/0x210
Jun 28 15:34:44 localhost.localdomain kernel: nv50_disp_atomic_commit+0x8f/0x1b0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: drm_atomic_helper_page_flip+0x63/0xd0
Jun 28 15:34:44 localhost.localdomain kernel: drm_mode_page_flip_ioctl+0x5a4/0x680
Jun 28 15:34:44 localhost.localdomain kernel: ? __pfx_drm_mode_page_flip_ioctl+0x10/0x10
Jun 28 15:34:44 localhost.localdomain kernel: drm_ioctl_kernel+0xaa/0x100
Jun 28 15:34:44 localhost.localdomain kernel: drm_ioctl+0x25d/0x4c0
Jun 28 15:34:44 localhost.localdomain kernel: ? __pfx_drm_mode_page_flip_ioctl+0x10/0x10
Jun 28 15:34:44 localhost.localdomain kernel: ? eventfd_read+0xe2/0x210
Jun 28 15:34:44 localhost.localdomain kernel: nouveau_drm_ioctl+0x5a/0xb0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel: __x64_sys_ioctl+0x94/0xd0
Jun 28 15:34:44 localhost.localdomain kernel: do_syscall_64+0x82/0x170
Jun 28 15:34:44 localhost.localdomain kernel: ? syscall_exit_to_user_mode+0x75/0x230
Jun 28 15:34:44 localhost.localdomain kernel: ? do_syscall_64+0x8f/0x170
Jun 28 15:34:44 localhost.localdomain kernel: ? syscall_exit_to_user_mode+0x75/0x230
Jun 28 15:34:44 localhost.localdomain kernel: ? syscall_exit_to_user_mode+0x75/0x230
Jun 28 15:34:44 localhost.localdomain kernel: ? do_syscall_64+0x8f/0x170
Jun 28 15:34:44 localhost.localdomain kernel: ? __rseq_handle_notify_resume+0xa8/0x4d0
Jun 28 15:34:44 localhost.localdomain kernel: ? do_syscall_64+0x8f/0x170
Jun 28 15:34:44 localhost.localdomain kernel: ? syscall_exit_to_user_mode+0x75/0x230
Jun 28 15:34:44 localhost.localdomain kernel: ? switch_fpu_return+0x4f/0xd0
Jun 28 15:34:44 localhost.localdomain kernel: ? syscall_exit_to_user_mode+0x75/0x230
Jun 28 15:34:44 localhost.localdomain kernel: ? do_syscall_64+0x8f/0x170
Jun 28 15:34:44 localhost.localdomain kernel: ? syscall_exit_to_user_mode+0x75/0x230
Jun 28 15:34:44 localhost.localdomain kernel: ? do_syscall_64+0x8f/0x170
Jun 28 15:34:44 localhost.localdomain kernel: ? do_syscall_64+0x8f/0x170
Jun 28 15:34:44 localhost.localdomain kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jun 28 15:34:44 localhost.localdomain kernel: RIP: 0033:0x7f22c650f3df
Jun 28 15:34:44 localhost.localdomain kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Jun 28 15:34:44 localhost.localdomain kernel: RSP: 002b:00007f22b93fec50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jun 28 15:34:44 localhost.localdomain kernel: RAX: ffffffffffffffda RBX: 00007f229c005a10 RCX: 00007f22c650f3df
Jun 28 15:34:44 localhost.localdomain kernel: RDX: 00007f22b93fece0 RSI: 00000000c01864b0 RDI: 00000000000000c1
Jun 28 15:34:44 localhost.localdomain kernel: RBP: 00007f22b93fece0 R08: 00007f229c005b40 R09: 0000000000000077
Jun 28 15:34:44 localhost.localdomain kernel: R10: 00005613aa72bbd0 R11: 0000000000000246 R12: 00000000c01864b0
Jun 28 15:34:44 localhost.localdomain kernel: R13: 00000000000000c1 R14: 00007f229c005b40 R15: 00007f229c1da8d0
Jun 28 15:34:44 localhost.localdomain kernel: </TASK>
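A note for anyone reading the trace above: entries prefixed with `?` are addresses the kernel found on the stack but could not confirm as part of the actual call chain, so the unprefixed entries are the ones worth following. The sketch below is only an illustration of how one might filter a saved journal excerpt down to those entries; the function name `reliable_frames` and the regular expression are made up for this example and are not part of any kernel or systemd tooling.

```python
import re

# Matches one call-trace entry as it appears in the journal, for example:
#   "... kernel: ? gp100_vmm_pgt_sgl+0x4a/0x160 [nouveau c57b...]"
# The pattern is an assumption based on the log format in this report.
FRAME_RE = re.compile(
    r"kernel:\s+(?P<guess>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)[^\]]*\])?"
)


def reliable_frames(log_text):
    """Return (symbol, module) pairs for trace entries not marked with '?'.

    module is None when the entry carries no [module ...] tag.
    """
    frames = []
    for match in FRAME_RE.finditer(log_text):
        if match.group("guess"):
            continue  # skip entries the unwinder could not verify
        frames.append((match.group("symbol"), match.group("module")))
    return frames


if __name__ == "__main__":
    sample = (
        "Jun 28 15:34:44 localhost.localdomain kernel: ? gp100_vmm_pgt_sgl+0xd8/0x160 [nouveau c57b]\n"
        "Jun 28 15:34:44 localhost.localdomain kernel: nvkm_vmm_iter.isra.0+0x2f4/0x890 [nouveau c57b]\n"
        "Jun 28 15:34:44 localhost.localdomain kernel: drm_atomic_helper_page_flip+0x63/0xd0\n"
    )
    for symbol, module in reliable_frames(sample):
        print(symbol, module)
```

Applied to the full trace, this kind of filter leaves the chain from `drm_mode_page_flip_ioctl` through `nouveau_bo_pin` and the TTM eviction path down to `nvkm_vmm_iter`, which is consistent with the fault being in nouveau's page-table mapping code.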
I'm re-assigning to kernel drivers, since that seems more appropriate. Feel free to bounce elsewhere if that's incorrect.
This is also happening consistently any time I connect my laptop to my docking station (with dual monitors). I see this right before the nouveau crash:
Jul 02 15:36:45 localhost.localdomain kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Jul 02 15:36:45 localhost.localdomain kernel: #PF: supervisor read access in kernel mode
Jul 02 15:36:45 localhost.localdomain kernel: #PF: error_code(0x0000) - not-present page
This also appears to be affecting openSUSE Aeon.
Right now the only workaround is to blacklist nouveau.
The bug is not present in TW 20240606.
OK, let's close now. Feel free to reopen if you encounter again. Thanks.
You're confused. I'm saying I reverted to 20240606 (from a month ago), and the problem isn't present in last month's build of TW. It *is* present in the current TW.
Ah OK. Then please check the latest 6.10-rc kernel. If the problem persists, you'd need to report the issue to the upstream devs, e.g. gitlab.freedesktop.org DRM/Nouveau issues.
FWIW, the latest 6.10-rc kernel is found in OBS Kernel:HEAD repo, http://download.opensuse.org/repositories/Kernel:/HEAD/standard/
(In reply to Takashi Iwai from comment #12)
> FWIW, the latest 6.10-rc kernel is found in OBS Kernel:HEAD repo,
> http://download.opensuse.org/repositories/Kernel:/HEAD/standard/

The latest kernel fixes the Nouveau issue, but breaks networking completely (2 different wifi controllers and even ethernet were dead).
OK, then you'd need to open another bug entry sooner or later :) This bug will be closed once TW switches to the 6.10 kernel, in the next week or so.
As TW is moving to 6.10 now, this entry is closed as fixed. Feel free to reopen if you encounter the same problem again with a 6.10.x kernel.