Bugzilla – Bug 1216871
Thinkpad P16 with Open GPU kernel modules does not resume after sleep/hibernate (BUG: kernel NULL pointer dereference, address: 0000000000000008)
Last modified: 2024-04-04 16:13:39 UTC
This is similar to #1211950, but instead of nouveau I use the following packages, as suggested in [1]:

nvidia-open-driver-G06-signed-kmp-default
kernel-firmware-nvidia-gsp-G06
nvidia-open-driver-G06-signed-kmp

I tried both of the following [2]. The system went to sleep, but it did not resume:

sudo systemctl suspend
sudo systemctl hibernate

(I found [2] while searching for the errors reported in dmesg after trying to suspend with echo mem > /sys/power/state.)

Suspend is OK:

[ 0.000000] Linux version 6.5.9-1-default (geeko@buildhost) (gcc (SUSE Linux) 13.2.1 20230912 [revision b96e66fd4ef3e36983969fb8cdd1956f551a074b], GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.40.0.20230412-5) #1 SMP PREEMPT_DYNAMIC Wed Oct 25 10:31:37 UTC 2023 (29edc7c)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.5.9-1-default root=/dev/mapper/system-root splash=silent resume=/dev/system/swap mitigations=auto quiet security=apparmor nosimplefb=1
...
[ 247.802155] NVRM nvAssertFailedNoLog: Assertion failed: 0 @ mem_list.c:293
[ 247.802161] NVRM nvAssertOkFailedNoLog: Assertion failed: Call not supported [NV_ERR_NOT_SUPPORTED] (0x00000056) returned from pRmApi->Alloc(pRmApi, pMemoryManager->hClient, pMemoryManager->hSubdevice, pHandle, hClass, &listAllocParams, sizeof(listAllocParams)) @ mem_desc.c:4790
[ 247.802163] NVRM serverFreeResourceTree: hObject 0xcaf00003 not found for client 0xc1e00003
[ 247.802164] NVRM nvAssertOkFailedNoLog: Assertion failed: Call not supported [NV_ERR_NOT_SUPPORTED] (0x00000056) returned from memdescSendMemDescToGSP(pGpu, pFbsr->pSysMemDesc, &hSysMem) @ fbsr_gm107.c:113
[ 247.802165] NVRM nvAssertOkFailedNoLog: Assertion failed: Call not supported [NV_ERR_NOT_SUPPORTED] (0x00000056) returned from _fbsrInitGsp(pGpu, pFbsr) @ fbsr_gm107.c:548
[ 248.155408] PM: suspend entry (s2idle)
[ 248.167460] Filesystems sync: 0.012 seconds
[ 248.386725] Freezing user space processes
[ 248.387547] Freezing user space processes completed (elapsed 0.000 seconds)
[ 248.387549] OOM killer disabled.
[ 248.387550] Freezing remaining freezable tasks
[ 248.388805] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 248.388808] printk: Suspending console(s) (use no_console_suspend to debug)
[ 248.801471] ACPI: EC: interrupt blocked
[ 262.602984] ACPI: EC: interrupt unblocked
[ 263.174146] iwlwifi 0000:00:14.3: WRT: Invalid buffer destination
[ 263.252329] nvme nvme0: 24/0/0 default/read/poll queues
...
[ 263.524598] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[ 263.719374] OOM killer enabled.
[ 263.719375] Restarting tasks ...
[ 263.719441] usb 1-3: USB disconnect, device number 3
[ 263.719734] done.
[ 263.719738] random: crng reseeded on system resumption
[ 263.788084] PM: suspend exit
...
[ 264.672684] NVRM: GPU at PCI:0000:01:00: GPU-80c21799-19c6-1198-2255-31aa55463b1e
[ 264.672688] NVRM: Xid (PCI:0000:01:00): 45, pid=1781, name=modprobe, Ch 00000000
[ 264.673536] NVRM: Xid (PCI:0000:01:00): 45, pid=1781, name=modprobe, Ch 00000001
[ 264.674314] NVRM: Xid (PCI:0000:01:00): 45, pid=2788, name=Xorg.bin, Ch 00000002
[ 264.675135] NVRM: Xid (PCI:0000:01:00): 45, pid=2788, name=Xorg.bin, Ch 00000003
[ 264.676290] NVRM kbusVerifyBar2_GM107: MMUTest BAR0 window offset 0x70d000 returned garbage 0x0
[ 264.676296] NVRM nvAssertOkFailedNoLog: Assertion failed: Generic memory error [NV_ERR_MEMORY_ERROR] (0x00000072) returned from kbusVerifyBar2_HAL(pGpu, pKernelBus, NULL, NULL, 0, 0) @ kern_bus_gm107.c:457
[ 264.676299] NVRM nvAssertOkFailedNoLog: Assertion failed: Generic memory error [NV_ERR_MEMORY_ERROR] (0x00000072) returned from gpuStateLoad(pGpu, IS_GPU_GC6_STATE_EXITING(pGpu) ? GPU_STATE_FLAGS_PRESERVING | GPU_STATE_FLAGS_PM_TRANSITION | GPU_STATE_FLAGS_GC6_TRANSITION : GPU_STATE_FLAGS_PRESERVING | GPU_STATE_FLAGS_PM_TRANSITION) @ gpu_suspend.c:247
[ 264.678312] NVRM: Xid (PCI:0000:01:00): 45, pid=1781, name=modprobe, Ch 00000000
[ 264.679493] NVRM: Xid (PCI:0000:01:00): 45, pid=1781, name=modprobe, Ch 00000001
[ 264.680580] NVRM: Xid (PCI:0000:01:00): 45, pid=2788, name=Xorg.bin, Ch 00000002
[ 264.681653] NVRM: Xid (PCI:0000:01:00): 45, pid=2788, name=Xorg.bin, Ch 00000003
[ 264.690483] NVRM nvCheckOkFailedNoLog: Check failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from pRmApi->Control(pRmApi, nv->rmapi.hClient, nv->rmapi.hSubDevice, NV2080_CTRL_CMD_INTERNAL_DISPLAY_UNIX_CONSOLE, &unixConsoleParams, sizeof(unixConsoleParams)) @ unix_console.c:105
[ 264.690775] NVRM rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d00001; hParent=0x00010001; hObject=0x00010011; hClass=0x0000c670; paramsSize=0x00000000; paramsStatus=0x00000062; status=0x00000062
[ 264.690781] nvidia-modeset: ERROR: GPU:0: Failed to initialize display engine: 0x62 (Reset required [NV_ERR_RESET_REQUIRED])
[ 264.690799] NVRM serverFreeResourceTree: hObject 0x10011 not found for client 0xc1d00001
[ 264.691195] NVRM nvCheckOkFailedNoLog: Check failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from pRmApi->Control(pRmApi, nv->rmapi.hClient, nv->rmapi.hSubDevice, NV2080_CTRL_CMD_INTERNAL_DISPLAY_UNIX_CONSOLE, &unixConsoleParams, sizeof(unixConsoleParams)) @ unix_console.c:105
[ 264.691372] NVRM rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d00001; hParent=0x00010001; hObject=0x00010011; hClass=0x0000c670; paramsSize=0x00000000; paramsStatus=0x00000062; status=0x00000062
[ 264.691375] nvidia-modeset: ERROR: GPU:0: Failed to initialize display engine: 0x62 (Reset required [NV_ERR_RESET_REQUIRED])
[ 264.691386] NVRM serverFreeResourceTree: hObject 0x10011 not found for client 0xc1d00001
[ 264.691889] NVRM unixCallVideoBIOS: int10h(4f02, 0000) vesa call failed! (4f02, 0000)
[ 264.692482] NVRM nvCheckOkFailedNoLog: Check failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from pRmApi->Control(pRmApi, nv->rmapi.hClient, nv->rmapi.hSubDevice, NV2080_CTRL_CMD_INTERNAL_DISPLAY_POST_RESTORE, &restoreParams, sizeof(restoreParams)) @ unix_console.c:197
[ 264.721179] NVRM rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d0000b; hParent=0x01000001; hObject=0x01000012; hClass=0x0000c56f; paramsSize=0x00000168; paramsStatus=0x00000062; status=0x00000062
[ 264.721182] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_channel.c:2588
[ 264.721185] NVRM nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from _kchannelSendChannelAllocRpc(pKernelChannel, pChannelGpfifoParams, pKernelChannelGroup, bFullSriov) @ kernel_channel.c:863

But the problem is with resume:

...
[ 268.636544] wlp0s20f3: authenticated
[ 268.654389] wlp0s20f3: associated
[ 276.745060] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 276.745071] #PF: supervisor read access in kernel mode
[ 276.745074] #PF: error_code(0x0000) - not-present page
[ 276.745077] PGD 0 P4D 0
[ 276.745081] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 276.745086] CPU: 6 PID: 2788 Comm: Xorg.bin Tainted: G OE 6.5.9-1-default #1 openSUSE Tumbleweed eb5faaeb0a34bed614de16eec67e50ac769ec453
[ 276.745092] Hardware name: LENOVO 21D7S22N08/21D7S22N08, BIOS N3FET36W (1.21 ) 05/31/2023
[ 276.745095] RIP: 0010:EvoIsChannelMethodPendingC3+0x22/0xc0 [nvidia_modeset]
[ 276.745174] Code: 00 00 00 00 00 00 00 00 f3 0f 1e fa 41 55 89 d0 49 89 cd 41 b8 14 00 00 00 41 54 49 89 c4 55 48 89 fd 53 48 89 f3 48 83 ec 28 <8b> 56 08 48 c7 44 24 14 00 00 00 00 48 8d 4c 24 08 48 c1 e2 20 48
[ 276.745177] RSP: 0018:ffffa72880d57b30 EFLAGS: 00010286
[ 276.745182] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa72880d57b8f
[ 276.745184] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff933cb01c2008
[ 276.745185] RBP: ffff933cb01c2008 R08: 0000000000000014 R09: ffffa72882ccd008
[ 276.745187] R10: ffffa72881921008 R11: 000000000003f0e0 R12: 0000000000000000
[ 276.745189] R13: ffffa72880d57b8f R14: ffffa72880d57d30 R15: 0000000000000001
[ 276.745191] FS: 00007f743ce03980(0000) GS:ffff93434f700000(0000) knlGS:0000000000000000
[ 276.745193] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 276.745195] CR2: 0000000000000008 CR3: 0000000107780000 CR4: 0000000000f50ee0
[ 276.745197] PKRU: 55555554
[ 276.745199] Call Trace:
[ 276.745202] <TASK>
[ 276.745205] ? __die+0x23/0x70
[ 276.745212] ? page_fault_oops+0x14d/0x490
[ 276.745217] ? EvoIsModePossibleC3+0xe1/0x5b0 [nvidia_modeset 2547d6f3000deb268fd07ecb612b5ef73687c832]
[ 276.745269] ? exc_page_fault+0x71/0x160
[ 276.745275] ? asm_exc_page_fault+0x26/0x30
[ 276.745279] ? EvoIsChannelMethodPendingC3+0x22/0xc0 [nvidia_modeset 2547d6f3000deb268fd07ecb612b5ef73687c832]
[ 276.745326] nvRMIdleBaseChannel+0x6b/0xf0 [nvidia_modeset 2547d6f3000deb268fd07ecb612b5ef73687c832]
[ 276.745385] nvSetDispModeEvo+0x12c9/0x42f0 [nvidia_modeset 2547d6f3000deb268fd07ecb612b5ef73687c832]
[ 276.745442] ? Flip+0xf0/0xf0 [nvidia_modeset 2547d6f3000deb268fd07ecb612b5ef73687c832]
[ 276.745501] nvKmsIoctl+0xdc/0x220 [nvidia_modeset 2547d6f3000deb268fd07ecb612b5ef73687c832]
[ 276.745557] nvkms_ioctl+0x109/0x170 [nvidia_modeset 2547d6f3000deb268fd07ecb612b5ef73687c832]
[ 276.745587] nvidia_frontend_unlocked_ioctl+0x3c/0x60 [nvidia ce71fbe41fb2be9720a1b7ffb01074e41d182b8e]
[ 276.745757] __x64_sys_ioctl+0x94/0xd0
[ 276.745763] do_syscall_64+0x5d/0x90
[ 276.745768] ? do_user_addr_fault+0x179/0x640
[ 276.745772] ? exit_to_user_mode_prepare+0x133/0x1f0
[ 276.745778] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 276.745781] RIP: 0033:0x7f743cd139cf
[ 276.745839] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 276.745842] RSP: 002b:00007ffc7c213670 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 276.745845] RAX: ffffffffffffffda RBX: 00000000c0106d00 RCX: 00007f743cd139cf
[ 276.745846] RDX: 00007ffc7c2136d0 RSI: 00000000c0106d00 RDI: 0000000000000014
[ 276.745848] RBP: 00007ffc7c2136d0 R08: 0000000000000000 R09: 0000555aa6b25490
[ 276.745849] R10: 00007ffc7c22aa40 R11: 0000000000000246 R12: 0000000000000014
[ 276.745851] R13: 00007f743c41cbc8 R14: 00007ffc7c215fd8 R15: 0000000000000003
[ 276.745854] </TASK>
[ 276.745855] Modules linked in: ccm cmac algif_hash algif_skcipher af_alg af_packet joydev nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct bnep nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security btusb btrtl btbcm btintel btmtk bluetooth nfnetlink uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 ebtable_filter ebtables videodev ip6table_filter videobuf2_common ip6_tables ecdh_generic iptable_filter bpfilter qrtr nvidia_drm(OE) nvidia_modeset(OE) nvidia_uvm(OE) binfmt_misc snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi snd_seq_device mc snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic xfs snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink
[ 276.745909] soundwire_cadence snd_sof_intel_hda nls_iso8859_1 snd_sof_pci nls_cp437 snd_sof_xtensa_dsp vfat snd_sof fat snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation soundwire_bus iwlmvm snd_soc_core intel_uncore_frequency intel_uncore_frequency_common snd_compress intel_tcc_cooling snd_pcm_dmaengine mac80211 libarc4 x86_pkg_temp_thermal intel_powerclamp snd_hda_intel coretemp snd_intel_dspcfg snd_intel_sdw_acpi kvm_intel snd_hda_codec nvidia(OE) iwlwifi spi_nor snd_hda_core iTCO_wdt think_lmi mei_wdt pmt_telemetry processor_thermal_device_pci intel_pmc_bxt kvm mei_hdcp mei_pxp mtd iTCO_vendor_support processor_thermal_device snd_hwdep intel_rapl_msr pmt_class igc irqbypass pcspkr thunderbolt cfg80211 firmware_attributes_class wmi_bmof thinkpad_acpi processor_thermal_rfim i2c_i801 snd_pcm mei_me processor_thermal_mbox spi_intel_pci ledtrig_audio processor_thermal_rapl spi_intel i2c_smbus platform_profile snd_timer mei intel_rapl_common rfkill
[ 276.745959] thermal intel_vsec fan snd int3403_thermal soundcore ac int340x_thermal_zone intel_hid int3400_thermal intel_pmc_core acpi_pad sparse_keymap acpi_thermal_rel acpi_tad tiny_power_button fuse efi_pstore configfs dmi_sysfs ip_tables x_tables dm_crypt essiv authenc trusted asn1_encoder tee hid_logitech_hidpp hid_logitech_dj hid_generic usbhid crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel xhci_pci rtsx_pci_sdmmc sha512_ssse3 xhci_pci_renesas mmc_core xhci_hcd aesni_intel ucsi_acpi nvme typec_ucsi crypto_simd cryptd usbcore nvme_core roles rtsx_pci typec button video battery wmi pinctrl_alderlake serio_raw br_netfilter btrfs bridge stp llc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sd_mod t10_pi sg scsi_mod blake2b_generic libcrc32c scsi_common crc32c_intel xor msr raid6_pq dm_mirror dm_region_hash dm_log dm_mod bbswitch(O) efivarfs
[ 276.746018] CR2: 0000000000000008
[ 276.746021] ---[ end trace 0000000000000000 ]---
[ 276.746023] RIP: 0010:EvoIsChannelMethodPendingC3+0x22/0xc0 [nvidia_modeset]
[ 276.746071] Code: 00 00 00 00 00 00 00 00 f3 0f 1e fa 41 55 89 d0 49 89 cd 41 b8 14 00 00 00 41 54 49 89 c4 55 48 89 fd 53 48 89 f3 48 83 ec 28 <8b> 56 08 48 c7 44 24 14 00 00 00 00 48 8d 4c 24 08 48 c1 e2 20 48
[ 276.746073] RSP: 0018:ffffa72880d57b30 EFLAGS: 00010286
[ 276.746076] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa72880d57b8f
[ 276.746077] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff933cb01c2008
[ 276.746079] RBP: ffff933cb01c2008 R08: 0000000000000014 R09: ffffa72882ccd008
[ 276.746080] R10: ffffa72881921008 R11: 000000000003f0e0 R12: 0000000000000000
[ 276.746082] R13: ffffa72880d57b8f R14: ffffa72880d57d30 R15: 0000000000000001
[ 276.746083] FS: 00007f743ce03980(0000) GS:ffff93434f700000(0000) knlGS:0000000000000000
[ 276.746085] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 276.746087] CR2: 0000000000000008 CR3: 0000000107780000 CR4: 0000000000f50ee0
[ 276.746089] PKRU: 55555554
[ 276.746090] note: Xorg.bin[2788] exited with irqs disabled

$ rpm -qa |grep -i -e nvidia
kernel-firmware-nvidia-gsp-G06-525.116.04-2.1.x86_64
kernel-firmware-nvidia-gspx-G06-535.113.01-1.1.x86_64
nvidia-compute-G06-32bit-535.129.03-15.1.x86_64
libva-nvidia-driver-0.0.9-1.10.x86_64
libnvidia-egl-wayland1-1.1.12-1.2.x86_64
nvidia-compute-G06-535.129.03-15.1.x86_64
kernel-firmware-nvidia-gsp-G06-535.54.03-1.1.x86_64
nvidia-gl-G06-535.129.03-15.1.x86_64
nvidia-gl-G06-32bit-535.129.03-15.1.x86_64
nvidia-open-driver-G06-signed-kmp-default-535.129.03_k6.5.9_1-55.2.x86_64
nvidia-video-G06-32bit-535.129.03-15.1.x86_64
nvidia-open-driver-G06-signed-kmp-default-535.113.01_k6.5.6_1-51.3.x86_64
kernel-firmware-nvidia-gspx-G06-535.129.03-11.1.x86_64
nvidia-video-G06-535.129.03-15.1.x86_64
kernel-firmware-nvidia-20231019-1.1.noarch

$ lspci |grep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation GA107GLM [RTX A1000 Laptop GPU] (rev a1)

$ lsgpu
card0 10de:25b9 drm:/dev/dri/card0
└─renderD128 drm:/dev/dri/renderD128

[1] https://sndirsch.github.io/nvidia/2022/06/07/nvidia-opengpu.html
[2] https://download.nvidia.com/XFree86/Linux-x86_64/535.129.03/README/powermanagement.html
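When reading a long dmesg capture like the one in this report, it can help to reduce it to the lines that mark the actual failures. The following is a minimal sketch and not part of the original report: the function name filter_gpu_failures is hypothetical, and the pattern list is an assumption based only on the messages quoted in this bug (NVRM Xid events, nvidia-modeset errors, and kernel BUG/Oops/WARNING headers).

```shell
# Hypothetical helper (not from the report): read a dmesg log on stdin and
# print only the lines that match the failure markers seen in this bug.
# The pattern list is an assumption; extend it as needed.
filter_gpu_failures() {
    grep -E 'NVRM: Xid|nvidia-modeset: ERROR|BUG: |Oops: |WARNING: '
}
```

A possible invocation would be "sudo dmesg | filter_gpu_failures", or "filter_gpu_failures < saved-dmesg.txt" for a saved log.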
The crash apparently comes from the Nvidia open GPU driver. Stefan, please inform Nvidia.
Did this work before, i.e. it is a regression?
Also this needs to be retested with 545 driver - once packages are available. I've planned this for week after hackweek. It would also be good to test with proprietary kernel driver, i.e. replacing nvidia-open-driver-G06-signed-kmp-default packages by nvidia-driver-G06-kmp-default package.
(In reply to Stefan Dirsch from comment #2)
> Did this work before, i.e. it is a regression?

No. Actually, suspend/hibernate never worked for me with either nvidia or nouveau (#1211950).
(In reply to Stefan Dirsch from comment #3)
> Also this needs to be retested with 545 driver - once packages are
> available. I've planned this for week after hackweek.

Great, thanks!

> It would also be good to test with proprietary kernel driver, i.e. replacing
> nvidia-open-driver-G06-signed-kmp-default packages by
> nvidia-driver-G06-kmp-default package.

I'll do (after hackweek).
After upgrading to the 545 driver, both 'systemctl suspend' and 'systemctl hibernate' still don't work.

'systemctl suspend' does not put the laptop to sleep (the power LED stays on) and I had to power off the laptop. dmesg (see dmesg.2023-11-23.systemctl-suspend.txt) shows many lines with:

[42875.640166] NVRM kbusVerifyBar2_GM107: MMUTest BAR0 window offset 0x70f000 returned garbage 0x0

'systemctl hibernate' correctly puts the laptop to sleep, but there is no output on the display after resume and I see a kernel oops in dmesg (I could still ssh to the system):

[ 761.013720] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xffffffff
[ 761.013728] NVRM kgspExecuteBooterLoad_TU102: failed to execute Booter Load: 0xffff
[ 761.013735] NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteBooterLoad_HAL(pGpu, pKernelGsp, memdescGetPhysAddr(pKernelGsp->pSRMetaDescriptor, AT_GPU,0)) @ kernel_gsp_tu102.c:1152
[ 761.016748] NVRM nvCheckOkFailedNoLog: Check failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspRestorePowerMgmtState_HAL(pGpu, pKernelGsp) @ gpu_suspend.c:197
[ 761.017864] ------------[ cut here ]------------
[ 761.017866] WARNING: CPU: 0 PID: 5064 at /home/abuild/rpmbuild/BUILD/open-gpu-kernel-modules-545.29.02/obj/default/kernel-open/nvidia/nv.c:4005 nv_restore_user_channels+0x4e/0x1e0 [nvidia]
[ 761.017971] Modules linked in: ccm cmac algif_hash algif_skcipher af_alg snd_usb_audio ch341 usbserial snd_usbmidi_lib snd_ump snd_rawmidi snd_seq_device af_packet nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter qrtr bnep nvidia_drm(O) nvidia_modeset(O) btusb btrtl btintel btbcm btmtk bluetooth uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc ecdh_generic joydev binfmt_misc nvidia_uvm(O) xfs snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel
[ 761.017999] snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof nls_iso8859_1 snd_sof_utils snd_soc_hdac_hda nls_cp437 snd_hda_ext_core vfat snd_soc_acpi_intel_match fat snd_soc_acpi soundwire_generic_allocation soundwire_bus iwlmvm snd_soc_core snd_compress intel_uncore_frequency intel_uncore_frequency_common snd_pcm_dmaengine intel_tcc_cooling mac80211 libarc4 x86_pkg_temp_thermal intel_powerclamp snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi coretemp snd_hda_codec snd_hda_core spi_nor iTCO_wdt kvm_intel iwlwifi snd_hwdep pmt_telemetry intel_pmc_bxt mtd mei_hdcp mei_wdt mei_pxp iTCO_vendor_support intel_rapl_msr pmt_class nvidia(O) kvm snd_pcm thinkpad_acpi processor_thermal_device_pci think_lmi processor_thermal_device thunderbolt igc pcspkr irqbypass firmware_attributes_class wmi_bmof cfg80211 ledtrig_audio mei_me i2c_i801 processor_thermal_rfim spi_intel_pci snd_timer platform_profile processor_thermal_mbox spi_intel processor_thermal_rapl i2c_smbus mei thermal
[ 761.018025] snd intel_rapl_common intel_vsec rfkill fan soundcore int3403_thermal ac int340x_thermal_zone intel_hid int3400_thermal intel_pmc_core acpi_tad sparse_keymap acpi_thermal_rel acpi_pad tiny_power_button fuse configfs efi_pstore dmi_sysfs ip_tables x_tables dm_crypt essiv authenc trusted asn1_encoder tee hid_generic usbhid crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel xhci_pci rtsx_pci_sdmmc sha512_ssse3 xhci_pci_renesas xhci_hcd mmc_core aesni_intel ucsi_acpi nvme typec_ucsi video crypto_simd cryptd nvme_core usbcore roles rtsx_pci typec button battery wmi pinctrl_alderlake serio_raw br_netfilter btrfs bridge stp llc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sd_mod t10_pi sg scsi_mod blake2b_generic libcrc32c scsi_common crc32c_intel xor msr raid6_pq dm_mirror dm_region_hash dm_log dm_mod bbswitch(O) efivarfs
[ 761.018055] CPU: 0 PID: 5064 Comm: nvidia-sleep.sh Tainted: G O 6.6.1-1-default #1 openSUSE Tumbleweed 0c6504f7d2c054731662677f280b3e0e68eca996
[ 761.018058] Hardware name: LENOVO 21D7S22N08/21D7S22N08, BIOS N3FET36W (1.21 ) 05/31/2023
[ 761.018058] RIP: 0010:nv_restore_user_channels+0x4e/0x1e0 [nvidia]
[ 761.018122] Code: 24 38 06 00 00 4c 89 ef e8 bf ab 56 ce f6 43 10 01 74 73 48 89 de 31 ff e8 ef d7 0f 00 41 89 c6 85 c0 0f 84 4b 01 00 00 31 ed <0f> 0b 49 81 c4 60 07 00 00 4c 89 e7 e8 91 ab 56 ce be 01 00 00 00
[ 761.018123] RSP: 0018:ffffc90001c7bd38 EFLAGS: 00010246
[ 761.018125] RAX: 000000000000000f RBX: ffff888109660000 RCX: 0000000000000000
[ 761.018126] RDX: ffffc90001c7bcb8 RSI: 0000000000000282 RDI: ffffc90001c7bc78
[ 761.018126] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000003f0e0
[ 761.018127] R10: ffffffffc17663b0 R11: ffffffffc17663f0 R12: ffff888109660000
[ 761.018127] R13: ffff888109660638 R14: 000000000000000f R15: 0000000000000000
[ 761.018128] FS: 00007f56ef2b9580(0000) GS:ffff88884f400000(0000) knlGS:0000000000000000
[ 761.018129] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 761.018130] CR2: 000055af2d3cfc18 CR3: 00000001469fc000 CR4: 0000000000f50ef0
[ 761.018130] PKRU: 55555554
[ 761.018131] Call Trace:
[ 761.018133] <TASK>
[ 761.018134] ? nv_restore_user_channels+0x4e/0x1e0 [nvidia c5d1169f64f374e73bddb2a1970ccca0c527acfc]
[ 761.018190] ? __warn+0x81/0x130
[ 761.018193] ? nv_restore_user_channels+0x4e/0x1e0 [nvidia c5d1169f64f374e73bddb2a1970ccca0c527acfc]
[ 761.018250] ? report_bug+0x171/0x1a0
[ 761.018253] ? handle_bug+0x3c/0x80
[ 761.018255] ? exc_invalid_op+0x17/0x70
[ 761.018257] ? asm_exc_invalid_op+0x1a/0x20
[ 761.018260] ? nv_restore_user_channels+0x4e/0x1e0 [nvidia c5d1169f64f374e73bddb2a1970ccca0c527acfc]
[ 761.018318] ? nv_restore_user_channels+0x41/0x1e0 [nvidia c5d1169f64f374e73bddb2a1970ccca0c527acfc]
[ 761.018374] nv_set_system_power_state+0xe9/0x470 [nvidia c5d1169f64f374e73bddb2a1970ccca0c527acfc]
[ 761.018432] nv_procfs_write_suspend+0xd7/0x150 [nvidia c5d1169f64f374e73bddb2a1970ccca0c527acfc]
[ 761.018497] proc_reg_write+0x5a/0xa0
[ 761.018500] vfs_write+0xeb/0x3e0
[ 761.018503] ksys_write+0x67/0xe0
[ 761.018505] do_syscall_64+0x5d/0x90
[ 761.018507] ? syscall_exit_to_user_mode+0x2b/0x40
[ 761.018508] ? do_syscall_64+0x6c/0x90
[ 761.018509] ? do_syscall_64+0x6c/0x90
[ 761.018510] ? exc_page_fault+0x71/0x160
[ 761.018511] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 761.018514] RIP: 0033:0x7f56ef10afb4
[ 761.018545] Code: 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 90 90 80 3d 75 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
[ 761.018546] RSP: 002b:00007ffc56ec1b08 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 761.018548] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f56ef10afb4
[ 761.018548] RDX: 0000000000000007 RSI: 000055cf1a7e0630 RDI: 0000000000000001
[ 761.018549] RBP: 000055cf1a7e0630 R08: 0000000000000410 R09: 0000000000000001
[ 761.018549] R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000007
[ 761.018550] R13: 00007f56ef1ec5c0 R14: 00007f56ef1e9f20 R15: 0000000000000000
[ 761.018551] </TASK>
[ 761.018552] ---[ end trace 0000000000000000 ]---

$ uname -a
Linux p16 6.6.1-1-default #1 SMP PREEMPT_DYNAMIC Thu Nov 9 05:27:56 UTC 2023 (1fcc265) x86_64 x86_64 x86_64 GNU/Linux

$ rpm -qa |grep -i nvidia | sort
kernel-firmware-nvidia-gspx-G06-545.29.02-1.1.x86_64
kernel-firmware-nvidia-20231107-1.1.noarch
libnvidia-egl-wayland1-1.1.12-1.2.x86_64
libva-nvidia-driver-0.0.10-1.1.x86_64
nvidia-compute-G06-32bit-545.29.02-18.1.x86_64
nvidia-compute-G06-545.29.02-18.1.x86_64
nvidia-gl-G06-32bit-545.29.02-18.1.x86_64
nvidia-gl-G06-545.29.02-18.1.x86_64
nvidia-open-driver-G06-signed-kmp-default-545.29.02_k6.6.1_1-1.1.x86_64
nvidia-video-G06-32bit-545.29.02-18.1.x86_64
nvidia-video-G06-545.29.02-18.1.x86_64

$ for i in /sys/power/state /sys/power/mem_sleep /sys/power/disk /sys/power/image_size /sys/power/resume; do echo "== $i =="; cat $i; echo; done
== /sys/power/state ==
freeze mem disk

== /sys/power/mem_sleep ==
[s2idle]

== /sys/power/disk ==
[platform] shutdown reboot suspend test_resume

== /sys/power/image_size ==
13347745792

== /sys/power/resume ==
254:1

NOTE: I also have journalctl logs. IMHO they are not needed, but let me know if you want them.
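In /sys/power/mem_sleep and /sys/power/disk the kernel encloses the currently selected mode in square brackets, which is why the output in this comment shows "[s2idle]" and "[platform]". The following is a minimal sketch, not something used in this report, of how that selection could be extracted in a script; the function name active_choice is hypothetical.

```shell
# Hypothetical helper: print the bracketed (currently selected) entry of a
# sysfs choice list passed as the first argument, for example
# "[platform] shutdown reboot" gives "platform".
# Prints nothing if the string contains no bracketed entry.
active_choice() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}
```

A possible use would be: active_choice "$(cat /sys/power/mem_sleep)".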
Created attachment 870919: dmesg during systemctl suspend (nvidia-open-driver-G06-signed-kmp-default - open driver)
Created attachment 870920: dmesg during systemctl hibernate (nvidia-open-driver-G06-signed-kmp-default - open driver)
Thanks for checking!

(In reply to Petr Vorel from comment #5)
> > It would also be good to test with proprietary kernel driver, i.e. replacing
> > nvidia-open-driver-G06-signed-kmp-default packages by
> > nvidia-driver-G06-kmp-default package.
>
> I'll do (after hackweek).

That would still be useful.
I also tested the proprietary driver (nvidia-driver-G06-kmp-default). 'systemctl suspend' also fails to boot. 'systemctl hibernate' behaves differently: on my system, where I simultaneously run a window manager on Xorg and sway on Wayland, the WM was usable, but although I saw the display in sway, I could not do anything there (the mouse cursor was missing, the keyboard did not type anything, and no window reacted). I am also posting logs.

$ rpm -qa |grep -i nvidia | sort
kernel-firmware-nvidia-gspx-G06-545.29.02-1.1.x86_64
kernel-firmware-nvidia-20231107-1.1.noarch
libnvidia-egl-wayland1-1.1.12-1.2.x86_64
libva-nvidia-driver-0.0.10-1.1.x86_64
nvidia-compute-G06-32bit-545.29.02-18.1.x86_64
nvidia-compute-G06-545.29.02-18.1.x86_64
nvidia-driver-G06-kmp-default-545.29.02_k6.6.1_1-18.2.x86_64
nvidia-gl-G06-32bit-545.29.02-18.1.x86_64
nvidia-gl-G06-545.29.02-18.1.x86_64
nvidia-video-G06-32bit-545.29.02-18.1.x86_64
nvidia-video-G06-545.29.02-18.1.x86_64
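The two test rounds in this bug differ only in which kernel module package is installed (nvidia-open-driver-G06-signed-kmp-default versus nvidia-driver-G06-kmp-default). The following is a minimal sketch, not taken from the report, of how the installed flavor could be read from an rpm listing before attaching logs; the function name nvidia_kmp_flavor is hypothetical and the matching relies only on the package name prefixes quoted in this thread.

```shell
# Hypothetical helper: read `rpm -qa` output on stdin and report which G06
# kernel driver flavor is present: "open", "proprietary", "both" or "none".
# Matching is by package-name prefix only (an assumption based on the names
# quoted in this bug).
nvidia_kmp_flavor() {
    pkgs=$(cat)
    open=0
    prop=0
    printf '%s\n' "$pkgs" | grep -q '^nvidia-open-driver-G06-signed-kmp-' && open=1
    printf '%s\n' "$pkgs" | grep -q '^nvidia-driver-G06-kmp-' && prop=1
    case "$open$prop" in
        11) echo both ;;
        10) echo open ;;
        01) echo proprietary ;;
        *)  echo none ;;
    esac
}
```

A possible invocation would be "rpm -qa | nvidia_kmp_flavor".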
Created attachment 870937: dmesg during systemctl hibernate (nvidia-driver-G06-kmp-default - proprietary driver)
Created attachment 870938: dmesg during systemctl suspend (nvidia-driver-G06-kmp-default - proprietary driver)
So hibernate works with proprietary kernel driver?
(In reply to Stefan Dirsch from comment #13)
> So hibernate works with proprietary kernel driver?

It works on xorg, does *not* work on wayland.

(In reply to Petr Vorel from comment #14)
> (In reply to Stefan Dirsch from comment #13)
> > So hibernate works with proprietary kernel driver?
>
> It works on xorg, does *not* work on wayland.

Hmm. But you said you would run Xorg and Wayland sessions at the same time?

(In reply to Stefan Dirsch from comment #15)
> (In reply to Petr Vorel from comment #14)
> > (In reply to Stefan Dirsch from comment #13)
> > > So hibernate works with proprietary kernel driver?
> >
> > It works on xorg, does *not* work on wayland.
>
> Hmm. But you said you would run Xorg and Wayland sessions at the same time?

Yes, I run Xorg (with the Fluxbox legacy WM) on start (tty2, the default) and then I log in to tty1 and run sway. Is that a problem?
Well, you said that hibernate works on Xorg, but not with Wayland. But you're running both at the same time. So how can you say that hibernate works with Xorg?
Whatever. 545.29.06 is available meanwhile.
Meanwhile 550.54.14 is available.
Meanwhile 550.67 is available.
Meanwhile Petr found a solution for himself by using nouveau driver. And I understand it's hassle testing new driver versions again. So let's close this one. @mvetter Not sure why you've added yourself to the ticket. In case you suffer from the same problem and find a driver version which works for you, please feel free to update the ticket.
(In reply to Stefan Dirsch from comment #21)
> Meanwhile Petr found a solution for himself by using nouveau driver. And I
> understand it's hassle testing new driver versions again. So let's close
> this one.
>
> @mvetter Not sure why you've added yourself to the ticket. In case you
> suffer from the same problem and find a driver version which works for you,
> please feel free to update the ticket.

Thanks Stefan. I was using the same as Petr but recently reinstalled my machine and am also using nouveau now. It works better now than when I tried it the first time.