Bug 1226220

Summary: X session ends abruptly
Product: [openSUSE] openSUSE Tumbleweed Reporter: Raúl Osuna <rosuna>
Component: X11 3rd Party Driver Assignee: Stefan Dirsch <sndirsch>
Status: RESOLVED WORKSFORME QA Contact: Stefan Dirsch <sndirsch>
Severity: Normal    
Priority: P3 - Medium CC: rosuna
Version: Current   
Target Milestone: ---   
Hardware: x86-64   
OS: openSUSE Tumbleweed   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description Raúl Osuna 2024-06-12 13:22:12 UTC
My X session has crashed abruptly three days in a row (Monday, Tuesday, Wednesday). It has happened once a day, at some point during the 8 to 10 hours I'm working.

I have the latest Nvidia drivers provided by the Tumbleweed repository. I use X11, not Wayland, in case that helps. The Tumbleweed installation is fairly up to date.

I'm attaching a supportconfig, since I know the rest of this report doesn't give much information.

There seems to be a problem with the latest drivers; one source: https://www.gamingonlinux.com/2024/06/you-may-want-to-avoid-nvidia-driver-550-if-youre-on-a-laptop/

Lenovo ThinkPad P15 Gen 2i

Supportconfig within Engineering internal network, shared in my Export:
https://w3.suse.de/~rosuna/supportconfig/

Let me know if someone else needs to access it in a more public place.
Happy to report it to Nvidia if you think that's appropriate (and if you tell me how).
Comment 1 Raúl Osuna 2024-06-12 13:22:50 UTC
If you're looking for timestamps in the logs, the last crash was not long before the supportconfig was taken.
Comment 2 Stefan Dirsch 2024-06-12 13:32:39 UTC
Could also be related to the latest 6.9 kernel.
Comment 3 Stefan Dirsch 2024-06-13 10:42:02 UTC
grep nvidia messages.txt |grep -i error|cut -d " " -f 3-30
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 1
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 2
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 1
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 2
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 1
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 2
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_revoke_sub_ownership [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to revoke sub-ownership from NVKMS
kernel: [drm:nv_drm_master_drop [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] nv_drm_atomic_helper_disable_all failed with error code -22 !
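The repeated lines above are easier to read as a tally per message. What follows is only a minimal sketch: the helper name `tally_nvidia_errors` is hypothetical, the regular expression is an assumption based on the line format shown in this comment, and the two sample lines are copied from the output above.

```python
import re
from collections import Counter

# Captures the message text that follows "[nvidia-drm] [GPU ID 0x...]".
# Assumption: every relevant line has the format shown in the grep output.
NVIDIA_DRM_ERROR = re.compile(
    r"\*ERROR\* \[nvidia-drm\] \[GPU ID 0x[0-9a-fA-F]+\] (.*)"
)


def tally_nvidia_errors(lines):
    """Count distinct nvidia-drm error messages in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = NVIDIA_DRM_ERROR.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


# Two sample lines copied from the output above.
sample = [
    "kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] "
    "[GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22",
    "kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] "
    "[GPU ID 0x00000100] Flip event timeout on head 0",
]

# Print the most frequent messages first.
for message, count in tally_nvidia_errors(sample).most_common():
    print(f"{count:5d}  {message}")
```

To run it against the real log, pass an open file instead of `sample`, for example `tally_nvidia_errors(open("messages.txt", errors="replace"))`, assuming messages.txt from the supportconfig is in the current directory.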
Comment 4 Stefan Dirsch 2024-06-13 10:49:49 UTC
If this isn't a regression of the driver, I suggest trying an older kernel (< 6.9).

https://download.opensuse.org/history/
https://download.opensuse.org/history/20240523/tumbleweed/repo/oss/x86_64/

We need to figure out if it's a driver or kernel regression. Driver 550.78 is still available.
Comment 5 Raúl Osuna 2024-06-13 12:19:40 UTC
(In reply to Stefan Dirsch from comment #4)
> If this isn't a regression of the driver, I suggest trying an older
> kernel (< 6.9).
> 
> https://download.opensuse.org/history/
> https://download.opensuse.org/history/20240523/tumbleweed/repo/oss/x86_64/
> 
> We need to figure out if it's a driver or kernel regression. Driver 550.78
> is still available.

It's my workstation, so it's not easy for me to go back and forth testing, especially since I have no clue how to trigger the crash. (BTW, in case it was not clear what I meant by "crash": the X session dies and after a couple of seconds I'm back at the initial login screen of the window manager.)
Today it has not crashed so far, and I'm still on the same kernel and driver version:

raul@mordor:~$ uname -r
6.9.3-1-default
raul@mordor:~$ rpm -qa|grep -i nvidia-drivers
nvidia-drivers-G06-550.90.07-23.1.x86_64

If I go to an old driver, or to an old kernel, how long do I need to stay there to consider it "not crashing"?
Comment 6 Raúl Osuna 2024-06-13 12:20:05 UTC
Removing needinfo till I really know what/how/when to test.
Comment 7 Stefan Dirsch 2024-06-13 12:28:24 UTC
Thanks. Understood. You would probably need to run the old kernel/driver for a few days without crashes to confirm that it's a regression.
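To put a rough number on "a few days", here is a minimal sketch. It assumes crashes arrive independently at a constant rate of about one per working day, as reported in the description. That is a Poisson model chosen purely for illustration, not something the logs establish, and the helper names are hypothetical. Intermittent crashes rarely follow such a tidy model, so treat the result as a lower bound.

```python
import math


def prob_no_crash(rate_per_day, days):
    """Probability of seeing zero crashes in `days` under a Poisson model."""
    return math.exp(-rate_per_day * days)


def days_needed(rate_per_day, max_chance):
    """Smallest whole number of crash-free days for which the chance of
    seeing no crash purely by luck is at most `max_chance`."""
    return math.ceil(-math.log(max_chance) / rate_per_day)


# Assumed rate: one crash per working day, as in the original report.
for days in (1, 3, 5):
    print(days, round(prob_no_crash(1.0, days), 3))

# Crash-free days needed before luck alone explains it less than 5% of the time.
print(days_needed(1.0, 0.05))
```

Under these assumptions, three crash-free days on the old kernel or driver would already be unlikely to happen by chance (below 5%), and five days pushes that below 1%.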
Comment 8 Raúl Osuna 2024-06-18 15:14:39 UTC
The system has not crashed since I opened the bug. It did fail to shut down properly once, though; not sure whether that's related. Anyway, there's an update from 6.9.3-1 to 6.9.4-1, which I'm applying right now. Will report back if anything changes (otherwise, feel free to close the bug after a reasonable time with "worksforme" or something similar).
Comment 9 Stefan Dirsch 2024-06-18 15:52:00 UTC
Thanks for the update!
Comment 10 Stefan Dirsch 2024-07-08 11:16:16 UTC
Ok. Let's assume for now that things have improved. Closing.