Bug 1227179

Summary: Nouveau crashing kernel with NULL pointer dereference
Product: [openSUSE] openSUSE Tumbleweed
Reporter: David Mulder <david.mulder>
Component: Kernel:Drivers
Assignee: Kernel Bugs <kernel-bugs>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Major
Priority: P5 - None
CC: david.mulder, tiwai
Version: Current
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: debug from when the lockup happens

Description David Mulder 2024-06-28 14:11:22 UTC
Created attachment 875765
debug from when the lockup happens

During a live presentation (ouch), the machine hard-froze while playing MP4 videos. A hard reboot was required because nothing was responsive.

NAME="openSUSE Tumbleweed"
# VERSION="20240625"

I'll attach logs. They are truncated at the point of the last freeze (the freeze occurs twice in the logs).
Comment 1 David Mulder 2024-06-28 14:44:09 UTC
I tried playing the same MP4s outside of the presentation (using GNOME Videos), but I don't see the same problem. It only freezes when presenting in LibreOffice Impress.
Comment 2 David Mulder 2024-06-28 14:52:27 UTC
There's a kernel backtrace at Jun 28 15:34:44 that appears to have been caused by nouveau. This looks like a more promising lead.
Comment 3 David Mulder 2024-06-28 14:52:56 UTC
Jun 28 15:34:44 localhost.localdomain kernel: Call Trace:
Jun 28 15:34:44 localhost.localdomain kernel:  <TASK>
Jun 28 15:34:44 localhost.localdomain kernel:  ? __die_body.cold+0x14/0x24
Jun 28 15:34:44 localhost.localdomain kernel:  ? page_fault_oops+0x134/0x2a0
Jun 28 15:34:44 localhost.localdomain kernel:  ? exc_page_fault+0x73/0x170
Jun 28 15:34:44 localhost.localdomain kernel:  ? asm_exc_page_fault+0x26/0x30
Jun 28 15:34:44 localhost.localdomain kernel:  ? gp100_vmm_pgt_sgl+0x4a/0x160 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  ? gp100_vmm_pgt_sgl+0xd8/0x160 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  nvkm_vmm_iter.isra.0+0x2f4/0x890 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  ? __pfx_gp100_vmm_pgt_sgl+0x10/0x10 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  ? __pfx_gp100_vmm_pgt_sgl+0x10/0x10 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  nvkm_vmm_ptes_get_map+0xb1/0xf0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  ? __pfx_gp100_vmm_pgt_sgl+0x10/0x10 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  nvkm_vmm_map_locked+0x202/0x360 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  nvkm_vmm_map+0x89/0xe0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  nvkm_mem_map_sgl+0x5a/0x80 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  nvkm_uvmm_mthd+0xc25/0xe00 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  ? nvkm_uvmm_mthd+0x1f9/0xe00 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  ? nvkm_ioctl+0xd9/0x180 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  nvkm_ioctl+0xd9/0x180 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  nvif_object_mthd+0xa8/0x1f0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  ? nvif_mmu_ctor+0x3d0/0x420 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  ? nvif_object_mthd+0xbb/0x1f0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  nvif_vmm_map+0x11d/0x130 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  ? nouveau_mem_host+0x108/0x1a0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  nouveau_mem_map+0x94/0xe0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  nouveau_bo_move+0x654/0x930 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  ? kvmalloc_node+0x43/0xd0
Jun 28 15:34:44 localhost.localdomain kernel:  ? drm_prime_sg_to_dma_addr_array+0x5c/0xa0
Jun 28 15:34:44 localhost.localdomain kernel:  ttm_bo_handle_move_mem+0xb8/0x170 [ttm b5d04b8db497992450811abea646aff0c69751ea]
Jun 28 15:34:44 localhost.localdomain kernel:  ttm_mem_evict_first+0x2aa/0x450 [ttm b5d04b8db497992450811abea646aff0c69751ea]
Jun 28 15:34:44 localhost.localdomain kernel:  ? nouveau_vram_manager_new+0xab/0xc0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  ttm_bo_mem_space+0x1e5/0x230 [ttm b5d04b8db497992450811abea646aff0c69751ea]
Jun 28 15:34:44 localhost.localdomain kernel:  ttm_bo_validate+0x6e/0x160 [ttm b5d04b8db497992450811abea646aff0c69751ea]
Jun 28 15:34:44 localhost.localdomain kernel:  ? nv50_head_atomic_check+0x3b2/0xbe0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  nouveau_bo_pin+0xbd/0x2c0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  nv50_wndw_prepare_fb+0x63/0x2d0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  ? try_wait_for_completion+0x4f/0x60
Jun 28 15:34:44 localhost.localdomain kernel:  drm_atomic_helper_prepare_planes+0x74/0x210
Jun 28 15:34:44 localhost.localdomain kernel:  nv50_disp_atomic_commit+0x8f/0x1b0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  drm_atomic_helper_page_flip+0x63/0xd0
Jun 28 15:34:44 localhost.localdomain kernel:  drm_mode_page_flip_ioctl+0x5a4/0x680
Jun 28 15:34:44 localhost.localdomain kernel:  ? __pfx_drm_mode_page_flip_ioctl+0x10/0x10
Jun 28 15:34:44 localhost.localdomain kernel:  drm_ioctl_kernel+0xaa/0x100
Jun 28 15:34:44 localhost.localdomain kernel:  drm_ioctl+0x25d/0x4c0
Jun 28 15:34:44 localhost.localdomain kernel:  ? __pfx_drm_mode_page_flip_ioctl+0x10/0x10
Jun 28 15:34:44 localhost.localdomain kernel:  ? eventfd_read+0xe2/0x210
Jun 28 15:34:44 localhost.localdomain kernel:  nouveau_drm_ioctl+0x5a/0xb0 [nouveau c57bccd0d91f54927bbdce666ce34d4dee46d7fd]
Jun 28 15:34:44 localhost.localdomain kernel:  __x64_sys_ioctl+0x94/0xd0
Jun 28 15:34:44 localhost.localdomain kernel:  do_syscall_64+0x82/0x170
Jun 28 15:34:44 localhost.localdomain kernel:  ? syscall_exit_to_user_mode+0x75/0x230
Jun 28 15:34:44 localhost.localdomain kernel:  ? do_syscall_64+0x8f/0x170
Jun 28 15:34:44 localhost.localdomain kernel:  ? syscall_exit_to_user_mode+0x75/0x230
Jun 28 15:34:44 localhost.localdomain kernel:  ? syscall_exit_to_user_mode+0x75/0x230
Jun 28 15:34:44 localhost.localdomain kernel:  ? do_syscall_64+0x8f/0x170
Jun 28 15:34:44 localhost.localdomain kernel:  ? __rseq_handle_notify_resume+0xa8/0x4d0
Jun 28 15:34:44 localhost.localdomain kernel:  ? do_syscall_64+0x8f/0x170
Jun 28 15:34:44 localhost.localdomain kernel:  ? syscall_exit_to_user_mode+0x75/0x230
Jun 28 15:34:44 localhost.localdomain kernel:  ? switch_fpu_return+0x4f/0xd0
Jun 28 15:34:44 localhost.localdomain kernel:  ? syscall_exit_to_user_mode+0x75/0x230
Jun 28 15:34:44 localhost.localdomain kernel:  ? do_syscall_64+0x8f/0x170
Jun 28 15:34:44 localhost.localdomain kernel:  ? syscall_exit_to_user_mode+0x75/0x230
Jun 28 15:34:44 localhost.localdomain kernel:  ? do_syscall_64+0x8f/0x170
Jun 28 15:34:44 localhost.localdomain kernel:  ? do_syscall_64+0x8f/0x170
Jun 28 15:34:44 localhost.localdomain kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jun 28 15:34:44 localhost.localdomain kernel: RIP: 0033:0x7f22c650f3df
Jun 28 15:34:44 localhost.localdomain kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Jun 28 15:34:44 localhost.localdomain kernel: RSP: 002b:00007f22b93fec50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jun 28 15:34:44 localhost.localdomain kernel: RAX: ffffffffffffffda RBX: 00007f229c005a10 RCX: 00007f22c650f3df
Jun 28 15:34:44 localhost.localdomain kernel: RDX: 00007f22b93fece0 RSI: 00000000c01864b0 RDI: 00000000000000c1
Jun 28 15:34:44 localhost.localdomain kernel: RBP: 00007f22b93fece0 R08: 00007f229c005b40 R09: 0000000000000077
Jun 28 15:34:44 localhost.localdomain kernel: R10: 00005613aa72bbd0 R11: 0000000000000246 R12: 00000000c01864b0
Jun 28 15:34:44 localhost.localdomain kernel: R13: 00000000000000c1 R14: 00007f229c005b40 R15: 00007f229c1da8d0
Jun 28 15:34:44 localhost.localdomain kernel:  </TASK>
Comment 4 David Mulder 2024-06-28 14:53:48 UTC
I'm reassigning this to Kernel:Drivers, since that seems more appropriate. Feel free to bounce it elsewhere if that's incorrect.
Comment 5 David Mulder 2024-07-02 14:19:41 UTC
This also happens consistently whenever I connect my laptop to my docking station (with dual monitors).

I see this right before the nouveau crash:

Jul 02 15:36:45 localhost.localdomain kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Jul 02 15:36:45 localhost.localdomain kernel: #PF: supervisor read access in kernel mode
Jul 02 15:36:45 localhost.localdomain kernel: #PF: error_code(0x0000) - not-present page
Comment 6 David Mulder 2024-07-02 20:59:15 UTC
This also appears to be affecting openSUSE Aeon.
Comment 7 David Mulder 2024-07-02 21:00:09 UTC
Right now the only workaround is to blacklist nouveau.
Comment 8 David Mulder 2024-07-03 16:16:26 UTC
The bug is not present in TW 20240606.
Comment 9 Takashi Iwai 2024-07-08 11:14:02 UTC
OK, let's close this now. Feel free to reopen if you encounter it again. Thanks.
Comment 10 David Mulder 2024-07-08 14:04:51 UTC
You're confused. I'm saying I reverted to 20240606 (from a month ago), and the problem isn't present in last month's build of TW. It *is* present in the current TW.
Comment 11 Takashi Iwai 2024-07-08 14:32:59 UTC
Ah, OK. Then please check the latest 6.10-rc kernel. If the problem persists, you'd need to report the issue to the upstream developers, e.g. via the DRM/Nouveau issue tracker on gitlab.freedesktop.org.
Comment 12 Takashi Iwai 2024-07-08 14:33:24 UTC
FWIW, the latest 6.10-rc kernel is found in OBS Kernel:HEAD repo,
  http://download.opensuse.org/repositories/Kernel:/HEAD/standard/
Comment 13 David Mulder 2024-07-09 14:56:20 UTC
(In reply to Takashi Iwai from comment #12)
> FWIW, the latest 6.10-rc kernel is found in OBS Kernel:HEAD repo,
>   http://download.opensuse.org/repositories/Kernel:/HEAD/standard/

The latest kernel fixes the nouveau issue, but breaks networking completely (two different Wi-Fi controllers and even Ethernet were dead).
Comment 14 Takashi Iwai 2024-07-09 15:20:55 UTC
OK, then you'd need to open another bug entry sooner or later :)
This bug will be closed once TW switches to the 6.10 kernel in the next week or so.
Comment 15 Takashi Iwai 2024-07-15 13:19:05 UTC
As TW is moving to 6.10 now, this entry is closed as fixed.
Feel free to reopen if you encounter the same problem again with a 6.10.x kernel.