Bug 1215981

Summary: Black Screen during boot on both internal and external screen in kernel 6.5.4-1 on Thinkpad P16 (Discrete Graphics mode)
Product: [openSUSE] openSUSE Tumbleweed Reporter: Petr Vorel <petr.vorel>
Component: X11 3rd Party Driver Assignee: Stefan Dirsch <sndirsch>
Status: RESOLVED WONTFIX QA Contact: Stefan Dirsch <sndirsch>
Severity: Normal    
Priority: P3 - Medium CC: patrik.jakobsson, petr.vorel, sndirsch, tiwai, tzimmermann
Version: Current   
Target Milestone: ---   
Hardware: Other   
OS: Other   
See Also: https://bugzilla.suse.com/show_bug.cgi?id=1213693
https://bugzilla.suse.com/show_bug.cgi?id=1211950
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: dmesg of the affected system
hwinfo of the affected system
dmesg of the affected system (cmdline cleanup)
hwinfo of the affected system (cmdline cleanup)
dmesg on Hybrid Graphics mode (where GUI works, just for a reference)
hwinfo on Hybrid Graphics mode (where GUI works, just for a reference)

Description Petr Vorel 2023-10-05 15:16:50 UTC
I have a problem similar to #1213693, but on the newer kernel 6.5.4-1, which should contain the fix.

#1213693 was caused by commit ca62297b2085 ("drm/edid: Fix csync detailed mode parsing") in v6.4-rc1,
which was fixed by reverting it in 50b6f2c82977 ("Revert "drm/edid: Fix csync detailed mode parsing"") in v6.5-rc7.

In my case I'm not able to see anything after the kernel is loaded. I have Tumbleweed kernels 6.5.4-1 and 6.5.2-1.

The problem is on a Thinkpad P16 with 2 GPUs:
00:02.0 VGA compatible controller: Intel Corporation Alder Lake-HX GT1 [UHD Graphics 770] (rev 0c)
01:00.0 VGA compatible controller: NVIDIA Corporation GA107GLM [RTX A1000 Laptop GPU] (rev a1)

The problem is in "Discrete Graphics" (Nvidia only) mode.
"Hybrid Graphics" (Intel + Nvidia) works, but for external screens I need to use
"Discrete Graphics", as it's the only way to get external screens working
(because the external output is wired only to the Nvidia card):
i.e. on Discrete Graphics there is only the Intel card being used
$ drm_info |grep -i node: -A1
Node: /dev/dri/card0
Driver: i915 (Intel Graphics) version 1.6.0 (20201103)


I tested with the internal screen only and with the internal screen + 2 external screens.

I tried disabling plymouth with the rd.plymouth=0 plymouth.enable=0 plymouth=0
cmdline args, also tried fbcon=map:1, and also booting to runlevel 1 and 3 instead of the
default. None helped.

$ rpm -qa |grep -i -e nouveau -e intel -e ^kernel
kernel-firmware-nvidia-gsp-G06-525.116.04-2.1.x86_64
kernel-firmware-nvidia-gspx-G06-535.113.01-1.1.x86_64
kernel-firmware-serial-20230829-1.1.noarch
libdrm_nouveau2-2.4.116-2.1.x86_64
intel-vaapi-driver-2.4.1-5.11.x86_64
kernel-firmware-mwifiex-20230829-1.1.noarch
xf86-video-intel-2.99.917.916_g31486f40-3.6.x86_64
kernel-firmware-platform-20230829-1.1.noarch
kernel-firmware-intel-20230829-1.1.noarch
kernel-firmware-iwlwifi-20230829-1.1.noarch
kernel-firmware-all-20230829-1.1.noarch
intel-media-driver-23.3.3-1.1.x86_64
ucode-intel-20230808-1.1.x86_64
kernel-firmware-nvidia-gsp-G06-535.54.03-1.1.x86_64
kernel-firmware-amdgpu-20230829-1.1.noarch
kernel-firmware-usb-network-20230829-1.1.noarch
kernel-firmware-i915-20230829-1.1.noarch
kernel-macros-6.5.4-1.1.noarch
kernel-firmware-qcom-20230829-1.1.noarch
libvulkan_intel-23.2.0-1699.360.pm.1.x86_64
intel-gpu-tools-1.27.1-2.3.x86_64
kernel-firmware-sound-20230829-1.1.noarch
kernel-firmware-ath10k-20230829-1.1.noarch
libvdpau_nouveau-23.2.0-1699.360.pm.1.x86_64
kernel-firmware-bnx2-20230829-1.1.noarch
Mesa-dri-nouveau-23.2.0-1699.360.pm.1.x86_64
kernel-firmware-dpaa2-20230829-1.1.noarch
kernel-firmware-atheros-20230829-1.1.noarch
kernel-firmware-radeon-20230829-1.1.noarch
kernel-firmware-ueagle-20230829-1.1.noarch
kernel-firmware-brcm-20230829-1.1.noarch
kernel-firmware-chelsio-20230829-1.1.noarch
kernel-firmware-nvidia-20230829-1.1.noarch
kernel-firmware-ti-20230829-1.1.noarch
kernel-firmware-media-20230829-1.1.noarch
kernel-firmware-realtek-20230829-1.1.noarch
kernel-firmware-mellanox-20230829-1.1.noarch
libdrm_intel1-2.4.116-2.1.x86_64
kernel-firmware-network-20230829-1.1.noarch
kernel-firmware-ath11k-20230829-1.1.noarch
kernel-firmware-mediatek-20230829-1.1.noarch
kernel-firmware-bluetooth-20230829-1.1.noarch
kernel-firmware-prestera-20230829-1.1.noarch
kernel-firmware-liquidio-20230829-1.1.noarch
kernel-firmware-marvell-20230829-1.1.noarch
kernel-default-6.5.2-1.1.x86_64
kernel-firmware-nfp-20230829-1.1.noarch
kernel-default-devel-6.5.2-1.1.x86_64
kernel-devel-6.5.4-1.1.noarch
kernel-firmware-qlogic-20230829-1.1.noarch
kernel-default-devel-6.5.4-1.1.x86_64
kernel-default-6.5.4-1.1.x86_64
kernel-devel-6.5.2-1.1.noarch

$ lsmod |grep -i -e i915 -e nvidia -e nouveau
nvidia_drm             94208  0
nvidia_modeset       1794048  1 nvidia_drm
nvidia_uvm           3608576  0
i915                 4087808  5
drm_buddy              20480  1 i915
i2c_algo_bit           20480  1 i915
drm_display_helper    237568  1 i915
ttm                   102400  1 i915
cec                    90112  2 drm_display_helper,i915
nvidia               8843264  2 nvidia_uvm,nvidia_modeset
video                  77824  3 thinkpad_acpi,i915,nvidia_modeset

$ modinfo nvidia |grep -i version
version:        535.113.01
srcversion:     81566B70A70B0B19F40FD1A
vermagic:       6.5.4-1-default SMP preempt mod_unload modversions

$ cat /proc/cmdline # but I tested with others, see above
BOOT_IMAGE=/boot/vmlinuz-6.5.4-1-default root=/dev/mapper/system-root splash=silent resume=/dev/system/swap mitigations=auto quiet security=apparmor modprobe.blacklist=i915 nosimplefb=1

I use these non-factory repos:
https://download.opensuse.org/repositories/X11:/Drivers:/Video:/Redesign/openSUSE_Tumbleweed/
https://download.opensuse.org/repositories/X11:/XOrg/openSUSE_Tumbleweed/
https://download.nvidia.com/opensuse/tumbleweed
Comment 1 Patrik Jakobsson 2023-10-06 11:25:04 UTC
Can you access the system remotely? If so, please provide dmesg and hwinfo output.
Comment 2 Petr Vorel 2023-10-06 12:51:49 UTC
(In reply to Patrik Jakobsson from comment #1)
> Can you access the system remotely? If so, please provide dmesg and hwinfo
> output.

Unfortunately the system does not reply to ping. I'm able to get to a working system if I switch in BIOS to "Hybrid Graphics". I'm not sure whether the system crashes, or whether the network requires nm-applet to start. I'll try to set up networking over a LAN cable and set up SSH so that I can get some logs.
Comment 3 Petr Vorel 2023-10-06 21:20:18 UTC
Created attachment 869967 [details]
dmesg of the affected system
Comment 4 Petr Vorel 2023-10-06 21:21:07 UTC
Created attachment 869968 [details]
hwinfo of the affected system
Comment 5 Petr Vorel 2023-10-06 22:13:26 UTC
Created attachment 869969 [details]
dmesg of the affected system (cmdline cleanup)

I removed modprobe.blacklist=i915 nosimplefb=1 from the cmdline. As expected, it did not solve the problem; the point was just to use the default cmdline.

There are some errors, not sure whether they are relevant:

[    1.464073] BERT: [Hardware Error]: Skipped 1 error records
...
[    2.052280] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:01:00.0
[    2.052299] pci 0000:01:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[    2.052345] pci 0000:01:00.0:   device [10de:25b9] error status/mask=00100000/00000000
...
[    9.027482] sof-audio-pci-intel-tgl 0000:00:1f.3: init of i915 and HDMI codec failed
...
[   12.628660] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[   12.629139] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device

Nvidia card is visible:
$ lspci |grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GA107GLM [RTX A1000 Laptop GPU] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 2291 (rev a1)
Comment 6 Petr Vorel 2023-10-06 22:18:46 UTC
Created attachment 869970 [details]
hwinfo of the affected system (cmdline cleanup)

The main difference is that modprobe.blacklist=i915 nosimplefb=1 (previous log file) forced efi-framebuffer instead of the default simple-framebuffer and had "Generic Monitor".

But output is the same - none.
Comment 7 Petr Vorel 2023-10-06 22:20:07 UTC
Created attachment 869971 [details]
dmesg on Hybrid Graphics mode (where GUI works, just for a reference)
Comment 8 Petr Vorel 2023-10-06 22:21:25 UTC
Created attachment 869972 [details]
hwinfo on Hybrid Graphics mode (where GUI works, just for a reference)
Comment 9 Stefan Dirsch 2023-10-07 08:21:04 UTC
[   12.368440] NVRM: Open nvidia.ko is only ready for use on Data Center GPUs.
[   12.368442] NVRM: To force use of Open nvidia.ko on other GPUs, see the
[   12.368442] NVRM: 'OpenRmEnableUnsupportedGpus' kernel module parameter described
[   12.368443] NVRM: in the README.

So have you set this in modprobe.d/50-nvidia-default.conf ?
Comment 10 Petr Vorel 2023-10-09 09:01:49 UTC
(In reply to Stefan Dirsch from comment #9)
> [   12.368440] NVRM: Open nvidia.ko is only ready for use on Data Center
> GPUs.
> [   12.368442] NVRM: To force use of Open nvidia.ko on other GPUs, see the
> [   12.368442] NVRM: 'OpenRmEnableUnsupportedGpus' kernel module parameter
> described
> [   12.368443] NVRM: in the README.
> 
> So have you set this in modprobe.d/50-nvidia-default.conf ?

Yes, I remember setting OpenRmEnableUnsupportedGpus=1 in /usr/lib/modprobe.d/50-nvidia-default.conf before (it was in the SUSE internal docs for the laptop), but now I see it's not set. I suspect it was overwritten by an rpm update. So I enabled it again.

And setting it is really required:
* Neither Discrete Graphics nor Hybrid Graphics mode is able to use external screens when OpenRmEnableUnsupportedGpus=1 is not set.
* Discrete Graphics mode now starts normally, I can use X11 based window managers and also Wayland based compositors (tested on sway, which is picky about nvidia proprietary drivers).

I guess we can close this bug.

Maybe we should consider documenting the use of OpenRmEnableUnsupportedGpus=1 somewhere in the openSUSE wiki as well. Or ask Nvidia, which IMHO maintains /usr/lib/modprobe.d/50-nvidia-default.conf, to somehow document which GPUs need this option.
Comment 11 Petr Vorel 2023-10-09 09:07:11 UTC
(In reply to Petr Vorel from comment #10)
> I guess we can close this bug.

Actually losing the whole output without OpenRmEnableUnsupportedGpus=1 is a new *feature*; maybe the Nvidia driver is broken on the 6.5 kernel (it should be usable, although only on the internal screen).

> 
> Maybe we should consider to document using OpenRmEnableUnsupportedGpus=1
> also somewhere in openSUSE wiki. Or ask Nvidia, which IMHO maintains
> /usr/lib/modprobe.d/50-nvidia-default.conf, to somehow document which GPU
> need this option.

To correct myself: 
nvidia-open-driver-G06-signed-kmp-default-535.113.01_k6.5.4_1-43.4.x86_64 which contains /usr/lib/modprobe.d/50-nvidia-default.conf is from obs://build.opensuse.org/X11:Drivers:Video. Shouldn't the config file be in /etc? Or am I supposed to put it into /etc?
Comment 12 Stefan Dirsch 2023-10-09 09:35:09 UTC
Hmm. In theory during an update a file marked as %config in RPM and edited by yourself before should not be overwritten.

  https://www.cl.cam.ac.uk/~jw35/docs/rpm_config.html

I don't think it has changed in the package itself. But you could check if there is a  .rpmsave with a timestamp of the update.
/usr/lib/modprobe.d is the new location for packaged config files. But you can overwrite things permanently on your system in /etc/modprobe.d using the same filename (IIRC).
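
A minimal sketch of the override described above, assuming (the comment itself says this only from memory) that a file in /etc/modprobe.d shadows a packaged file of the same name in /usr/lib/modprobe.d. The helper names make_override and ensure_line are hypothetical and not part of any package; the option line is the one discussed in this bug.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# make_override SRC DST_DIR
# Copy a packaged modprobe config into the admin-owned directory, keeping
# the file name so that the copy can shadow the packaged file.
make_override() {
    mkdir -p "$2" && cp -- "$1" "$2/$(basename "$1")"
}

# ensure_line FILE LINE
# Append LINE to FILE unless an identical line is already present.
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example (as root):
#   make_override /usr/lib/modprobe.d/50-nvidia-default.conf /etc/modprobe.d
#   ensure_line /etc/modprobe.d/50-nvidia-default.conf \
#       'options nvidia NVreg_OpenRmEnableUnsupportedGpus=1'
```

Because the copy lives outside the package's file list, a package update should leave it alone. If the driver is loaded from the initrd, the initrd probably has to be regenerated afterwards as well.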

Usage of the opengpu driver is documented:

--> https://en.opensuse.org/SDB:NVIDIA_drivers

Open GPU kernel modules versus Proprietary drivers
The following article is about installing NVIDIA's Proprietary drivers. For more information about the Open GPU kernel modules, that NVIDIA released in May 2022, read this [openSUSE Blog article][https://sndirsch.github.io/nvidia/2022/06/07/nvidia-opengpu.html].
[...]

I doubt nvidia opengpu driver ever worked without that option. It does only on computing cards without graphical output.
Comment 13 Stefan Dirsch 2023-10-09 09:37:12 UTC
> I don't think it has changed in the package itself. But you could check if there is a  .rpmsave with a timestamp of the update.

Therefore keeping NEEDINFO open ...
Comment 14 Petr Vorel 2023-10-10 17:02:35 UTC
(In reply to Stefan Dirsch from comment #12)
> Hmm. In theory during an update a file marked as %config in RPM and edited
> by yourself before should not be overwritten.
> 
>   https://www.cl.cam.ac.uk/~jw35/docs/rpm_config.html
> 
> I don't think it has changed in the package itself. But you could check if
> there is a  .rpmsave with a timestamp of the update.

Yes, there is a 50-nvidia-default.conf.rpmsave dated 29th September, which is *without* the "options nvidia NVreg_OpenRmEnableUnsupportedGpus=1" line (not even commented out). That also raised my suspicion that the file was overwritten. Also, in the file before I edited it, this line was commented out (it was like that after the installation, before I modified it to get the GPU working).
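
One way to look for such leftovers, as a sketch: RPM can leave .rpmsave, .rpmnew or .rpmorig files next to a config file it replaced or kept. The helper name list_rpm_leftovers is hypothetical.

```shell
#!/bin/sh
# list_rpm_leftovers DIR
# Print the .rpmsave/.rpmnew/.rpmorig files directly inside DIR, sorted.
list_rpm_leftovers() {
    find "$1" -maxdepth 1 -type f \
        \( -name '*.rpmsave' -o -name '*.rpmnew' -o -name '*.rpmorig' \) | sort
}

# Example:
#   list_rpm_leftovers /usr/lib/modprobe.d
#   diff -u /usr/lib/modprobe.d/50-nvidia-default.conf.rpmsave \
#           /usr/lib/modprobe.d/50-nvidia-default.conf
```

The diff then shows which lines differ between the saved copy and the file currently installed.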

> /usr/lib/modprobe.d is the new location for packaged config files. But you
> can overwrite things permanently on your system in /etc/modprobe.d using the
> same filename (IIRC).

It's OK if I'm supposed to make this copy (I'll do so). I just wanted to point out the whole issue in case there is a problem/bug in the package itself.

> 
> Usage of the opengpu driver is documented:
> 
> --> https://en.opensuse.org/SDB:NVIDIA_drivers
> 
> Open GPU kernel modules versus Proprietary drivers
> The following article is about installing NVIDIA's Proprietary drivers. For
> more information about the Open GPU kernel modules, that NVIDIA released in
> May 2022, read this [openSUSE Blog
> article][https://sndirsch.github.io/nvidia/2022/06/07/nvidia-opengpu.html].
> [...]

Yes, I've noticed both of them before. The blog documents using this variable and I found it via the official docs. But neither of them suggests copying the content of /usr/lib/modprobe.d to /etc/modprobe.d (probably a general approach which I should have known, but in this case not knowing it leads to a broken system).

The blog also mentions pci_ids-unsupported [1] in our packaging. I wonder if there could be automation which would check this list when the package is configured and enable or disable the variable accordingly.

> I doubt nvidia opengpu driver ever worked without that option. It does only
> on computing cards without graphical output.

Interesting. This could be mentioned in the blog post.

[1] https://build.opensuse.org/package/view_file/X11:Drivers:Video:Redesign/nvidia-open-driver-G06-signed/pci_ids-unsupported
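A rough sketch of the kind of automatic check suggested above. It is not how the package works. The format of pci_ids-unsupported is an assumption here (one hex device ID per entry, optionally prefixed with 0x), and the function names nvidia_device_ids and id_in_list are hypothetical. The input is the numeric listing printed by lspci -n.

```shell
#!/bin/sh
# nvidia_device_ids
# Read `lspci -n` output on stdin and print the device IDs (lowercase hex)
# of display-class (03xx) functions with vendor ID 10de (NVIDIA).
nvidia_device_ids() {
    awk 'tolower($3) ~ /^10de:/ && $2 ~ /^03/ { print tolower(substr($3, 6)) }'
}

# id_in_list ID LIST_FILE
# Succeed if ID occurs as a whole word in LIST_FILE, ignoring case and an
# optional 0x prefix. ASSUMPTION: the real list format may differ.
id_in_list() {
    grep -qiwE -- "(0x)?$1" "$2"
}

# Example:
#   for id in $(lspci -n | nvidia_device_ids); do
#       id_in_list "$id" pci_ids-unsupported && echo "$id: option would be needed"
#   done
```

On the laptop from this report the first function would be expected to print the device ID of 01:00.0, which dmesg shows as [10de:25b9].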
Comment 15 Stefan Dirsch 2023-10-10 18:06:30 UTC
Hmm. So that would mean the packaged file has changed (not sure why though; I'm not aware of any changes I did) and the .rpmsave is the edited one. So apparently you would have removed the line before yourself!?! But you needed to have set it. Hmm ...

I'm not happy with the situation with this option. I had the idea to make a subpackage just out of this option, i.e. just one file. Install or uninstall this package to enable the driver or not.

I'm afraid I can't enable this option by default as long as nVidia call it alpha quality for cards with display engine.

In my blog post I mention which GPUs are supported by default and which need this option. Pretty obvious I believe.
Comment 16 Stefan Dirsch 2023-10-10 18:16:16 UTC
I just checked that 50-nvidia-default.conf of 535.104.05 and 535.113.01 is identical. So this does not explain why such a .rpmsave file has been created.
Comment 17 Petr Vorel 2023-10-10 20:06:37 UTC
(In reply to Stefan Dirsch from comment #16)
> I just checked that 50-nvidia-default.conf of 535.104.05 and 535.113.01 is
> identical. So this does not explain, which such a .rpmsave file has been
> created.

Thanks for all the info. I remember only adding this option. Maybe I really did remove it, but that would have been some time ago, not recently. Let's assume it was my fault; I'll watch the next update of the driver.

Also, although I thought I had booted at least once with the Nvidia driver without NVreg_OpenRmEnableUnsupportedGpus=1, I'm not sure. Now I think it's unlikely that there is a regression in the driver or kernel.

Maybe we should close this bug for now; it can be reopened if the problem comes back.
Comment 18 Stefan Dirsch 2023-10-10 20:18:26 UTC
Yes, I would definitely appreciate if you could watch what happens with the next update! And of course this ticket can be reopened if you run into the same situation again with the next update!
Comment 19 Petr Vorel 2023-11-14 16:15:33 UTC
After zypper dup the problem is back. The notebook is running, but there is no output.

I tried to run without the docking station and any external screen. dmesg output is visible, the last message is:
nvidia 0000:01:00:0: [drm] fb0: nvidia-drmdrmfb frame buffer device

and it asks for root to fix the problem.

And indeed /usr/lib/modprobe.d/50-nvidia-default.conf is different (- for original + for new):
-options nvidia-drm modeset=1
-options nvidia NVreg_OpenRMEnableSupporteGpus=1
+options nvidia-drm modeset=1 fbdev=1

1) Do I need to copy my config somewhere in /etc so that it is not overwritten?

2) Auto detection for NVreg_OpenRMEnableSupporteGpus would really help.
Comment 20 Stefan Dirsch 2023-11-15 11:26:02 UTC
Hmm ...

NVreg_OpenRMEnableSupporteGpus option is no longer needed. The support for Workstation cards is now considered beta and officially supported.

fbdev option is new and eventually enables a Linux console with the nvidia driver (and no longer breaks simpledrm on newer 6.x.y kernels).

Do things work again when you remove the fbdev option? I think you need to regenerate the initrd by running 'dracut' to make the changes effective.
Comment 21 Petr Vorel 2023-11-15 21:37:54 UTC
(In reply to Stefan Dirsch from comment #20)
> Hmm ...
> 
> NVreg_OpenRMEnableSupporteGpus option is no longer needed. The support for
> Workstation cards is now considered beta and officially supported.

Does this apply to Open GPU kernel modules or to NVIDIA's Proprietary drivers? Your comment #12 suggests it's needed for Open GPU kernel modules which I'm trying to use. Although I need to double check if I installed only Open GPU kernel modules (the open ones) and not NVIDIA's Proprietary drivers.

> 
> fbdev option is new and eventually enables a Linux console with the nvidia
> driver (and no longer breaks simpledrm on newer 6.x.y kernels).
> 
> Do things work again when you remove the fbdev option? 

OK, I'll test "options nvidia-drm modeset=1" (with removed "fbdev=1" from that line and removed "options nvidia NVreg_OpenRMEnableSupporteGpus=1"). But I remember last time "options nvidia-drm modeset=1" only didn't work (NVreg_OpenRMEnableSupporteGpus=1 was required on kernel 6.5 and kernel-firmware-nvidia-gspx-G06-535.113.01).

> I think you need to regenerate the initrd by running 'dracut' to make the changes effective.

OK, I'll try tomorrow something like:
dracut --kver $(uname -r) -f
Comment 22 Stefan Dirsch 2023-11-15 21:50:44 UTC
(In reply to Petr Vorel from comment #21)
> (In reply to Stefan Dirsch from comment #20)
> > Hmm ...
> > 
> > NVreg_OpenRMEnableSupporteGpus option is no longer needed. The support for
> > Workstation cards is now considered beta and officially supported.
> 
> Does this apply to Open GPU kernel modules or to NVIDIA's Proprietary
> drivers? Your comment #12 suggests it's needed for Open GPU kernel modules
> which I'm trying to use. Although I need to double check if I installed only
> Open GPU kernel modules (the open ones) and not NVIDIA's Proprietary drivers.

This applies to Open GPU kernel modules. Setting this option is no longer needed for Desktop GPUs since version 545.29.02.

> > fbdev option is new and eventually enables a Linux console with the nvidia
> > driver (and no longer breaks simpledrm on newer 6.x.y kernels).
> > 
> > Do things work again when you remove the fbdev option? 
> 
> OK, I'll test "options nvidia-drm modeset=1" (with removed "fbdev=1" from
> that line and removed "options nvidia NVreg_OpenRMEnableSupporteGpus=1").
> But I remember last time "options nvidia-drm modeset=1" only didn't work
> (NVreg_OpenRMEnableSupporteGpus=1 was required on kernel 6.5 and
> kernel-firmware-nvidia-gspx-G06-535.113.01).

See above.

> > I think you need to regenerate the initrd by running 'dracut' to make the changes effective.
> 
> OK, I'll try tomorrow something like:
> dracut --kver $(uname -r) -f

yes. I think this should do the job.
Comment 23 Petr Vorel 2023-11-16 14:28:18 UTC
TL;DR: Probably a problem in my setup, we can probably close this. The rest is a description in case you find something which I'm obviously doing wrong or something which can be improved.

I wonder how it can happen that 2 driver versions coexist? (kernel-firmware-nvidia-gsp-G06-525.116 vs. kernel-firmware-nvidia-gspx-G06-535 and nvidia-open-driver-G06-signed-kmp-default-535 and nvidia-open-driver-G06-signed-kmp-default-545):

$ rpm -qa |grep -i nvidia | sort
kernel-firmware-nvidia-20231107-1.1.noarch
kernel-firmware-nvidia-gsp-G06-525.116.04-2.1.x86_64
kernel-firmware-nvidia-gsp-G06-535.54.03-1.1.x86_64
kernel-firmware-nvidia-gspx-G06-535.113.01-1.1.x86_64
kernel-firmware-nvidia-gspx-G06-535.129.03-1.1.x86_64
kernel-firmware-nvidia-gspx-G06-535.129.03-11.1.x86_64
kernel-firmware-nvidia-gspx-G06-535.129.03-12.1.x86_64
kernel-firmware-nvidia-gspx-G06-545.29.02-13.1.x86_64
libnvidia-egl-wayland1-1.1.12-1.2.x86_64
libva-nvidia-driver-0.0.10-1.1.x86_64
nvidia-compute-G06-32bit-535.129.03-15.1.x86_64
nvidia-compute-G06-535.129.03-15.1.x86_64
nvidia-gl-G06-32bit-535.129.03-15.1.x86_64
nvidia-gl-G06-535.129.03-15.1.x86_64
nvidia-open-driver-G06-signed-kmp-default-535.129.03_k6.6.1_1-1.2.x86_64
nvidia-open-driver-G06-signed-kmp-default-545.29.02_k6.5.9_1-57.1.x86_64
nvidia-video-G06-32bit-535.129.03-15.1.x86_64
nvidia-video-G06-535.129.03-15.1.x86_64


$ rpm -qi kernel-firmware-nvidia-gspx-G06-545.29.02-13.1.x86_64
Name        : kernel-firmware-nvidia-gspx-G06
Version     : 545.29.02
Release     : 13.1
Architecture: x86_64
Install Date: Út 14. listopadu 2023, 09:27:44
Group       : System/Kernel
Size        : 64294720
License     : GPL-2.0-only AND SUSE-Firmware AND GPL-2.0-or-later AND MIT
Signature   : RSA/SHA256, Po 13. listopadu 2023, 16:53:44, Key ID 590401a1e38fb563
Source RPM  : kernel-firmware-nvidia-gspx-G06-545.29.02-13.1.nosrc.rpm
Build Date  : Po 13. listopadu 2023, 16:53:25
Build Host  : i04-ch2a
Vendor      : obs://build.opensuse.org/X11:Drivers:Video
URL         : https://www.nvidia.com/en-us/drivers/unix/
Summary     : Kernel firmware file for open NVIDIA kernel module driver G06
Description :
This package contains the versioned kernel firmware file "gsp.bin" for
the OpenSource NVIDIA kernel module driver G06.
Distribution: X11:Drivers:Video:Redesign / openSUSE_Tumbleweed

$ rpm -qi kernel-firmware-nvidia-gspx-G06-535.129.03-1.1.x86_64
Name        : kernel-firmware-nvidia-gspx-G06
Version     : 535.129.03
Release     : 1.1
Architecture: x86_64
Install Date: Pá 10. listopadu 2023, 07:23:53
Group       : System/Kernel
Size        : 61824832
License     : GPL-2.0-only AND SUSE-Firmware AND GPL-2.0-or-later AND MIT
Signature   : RSA/SHA512, Čt 2. listopadu 2023, 20:48:50, Key ID 35a2f86e29b700a4
Source RPM  : kernel-firmware-nvidia-gspx-G06-535.129.03-1.1.nosrc.rpm
Build Date  : Čt 2. listopadu 2023, 20:48:26
Build Host  : i04-ch1b
Packager    : https://bugs.opensuse.org
Vendor      : openSUSE
URL         : https://www.nvidia.com/en-us/drivers/unix/
Summary     : Kernel firmware file for open NVIDIA kernel module driver G06
Description :
This package contains the versioned kernel firmware file "gsp.bin" for
the OpenSource NVIDIA kernel module driver G06.
Distribution: openSUSE Tumbleweed

I suppose this is due to multiversion = provides:multiversion(kernel), right?

Because I see that both nvidia-open-driver devel [1] and factory [2] have the same newer version, and the same applies to kernel-firmware-nvidia-gspx-G06 [3] [4], I removed obs://build.opensuse.org/X11:Drivers:Video, removed the packages and installed only the latest version.

After this, the default value ("options nvidia-drm modeset=1 fbdev=1" and *not* set NVreg_OpenRMEnableSupporteGpus=1) was working for xorg. After installation it still was not working even though I ran dracut; I needed to ssh to the system, rerun dracut and reboot to get it working. Let's assume I did something wrong, and that's why I needed to rerun dracut via ssh. But sway did not work.

Removing "fbdev=1" made no difference (working xorg, broken sway).

Adding NVreg_OpenRMEnableSupporteGpus=1 is the option which breaks booting.

For sway, nvidia-video-G06 (otherwise sway startup freezes) and nvidia-gl-G06 (otherwise sway startup fails) from the proprietary NVIDIA repository are also needed.

i.e. both kernel open driver
nvidia-open-driver-G06-signed-kmp-default-545.29.02_k6.6.1_1-1.1.x86_64 and GPU 
and proprietary NVIDIA OpenGL libraries are needed for sway (while this might be obvious from the blog post [5] it was new for me, because sway claims "don't use nvidia proprietary").

[1] https://build.opensuse.org/package/view_file/X11:Drivers:Video:Redesign/nvidia-open-driver-G06-signed/nvidia-open-driver-G06-signed.changes?expand=1
[2] https://build.opensuse.org/package/view_file/openSUSE:Factory/nvidia-open-driver-G06-signed/nvidia-open-driver-G06-signed.changes?expand=1
[3] https://build.opensuse.org/package/view_file/X11:Drivers:Video:Redesign/kernel-firmware-nvidia-gspx-G06/kernel-firmware-nvidia-gspx-G06.changes?expand=1
[4] https://build.opensuse.org/package/view_file/openSUSE:Factory/kernel-firmware-nvidia-gspx-G06/kernel-firmware-nvidia-gspx-G06.changes?expand=1
[5] https://sndirsch.github.io/nvidia/2022/06/07/nvidia-opengpu.html
Comment 24 Stefan Dirsch 2023-11-17 10:45:26 UTC
(In reply to Petr Vorel from comment #23)
> TL;DR: Probably problem in my setup, we can probably close this. The rest is
> a description if you find something which I do obviously wrong or if there
> is something what can be improved.

Thanks for the detailed report. Very much appreciated!

> I wonder how can happen that 2 driver versions can coexist together?
> (kernel-firmware-nvidia-gsp-G06-525.116 vs.
> kernel-firmware-nvidia-gspx-G06-535 and
> nvidia-open-driver-G06-signed-kmp-default-535 and
> nvidia-open-driver-G06-signed-kmp-default-545):
> 
> $ rpm -qa |grep -i nvidia | sort
> kernel-firmware-nvidia-20231107-1.1.noarch
> kernel-firmware-nvidia-gsp-G06-525.116.04-2.1.x86_64
> kernel-firmware-nvidia-gsp-G06-535.54.03-1.1.x86_64
> kernel-firmware-nvidia-gspx-G06-535.113.01-1.1.x86_64
> kernel-firmware-nvidia-gspx-G06-535.129.03-1.1.x86_64
> kernel-firmware-nvidia-gspx-G06-535.129.03-11.1.x86_64
> kernel-firmware-nvidia-gspx-G06-535.129.03-12.1.x86_64
> kernel-firmware-nvidia-gspx-G06-545.29.02-13.1.x86_64
> libnvidia-egl-wayland1-1.1.12-1.2.x86_64
> libva-nvidia-driver-0.0.10-1.1.x86_64
> nvidia-compute-G06-32bit-535.129.03-15.1.x86_64
> nvidia-compute-G06-535.129.03-15.1.x86_64
> nvidia-gl-G06-32bit-535.129.03-15.1.x86_64
> nvidia-gl-G06-535.129.03-15.1.x86_64
> nvidia-open-driver-G06-signed-kmp-default-535.129.03_k6.6.1_1-1.2.x86_64
> nvidia-open-driver-G06-signed-kmp-default-545.29.02_k6.5.9_1-57.1.x86_64
> nvidia-video-G06-32bit-535.129.03-15.1.x86_64
> nvidia-video-G06-535.129.03-15.1.x86_64
> 
> 
> $ rpm -qi kernel-firmware-nvidia-gspx-G06-545.29.02-13.1.x86_64
> Name        : kernel-firmware-nvidia-gspx-G06
> Version     : 545.29.02
> Release     : 13.1
> Architecture: x86_64
> Install Date: Út 14. listopadu 2023, 09:27:44
> Group       : System/Kernel
> Size        : 64294720
> License     : GPL-2.0-only AND SUSE-Firmware AND GPL-2.0-or-later AND MIT
> Signature   : RSA/SHA256, Po 13. listopadu 2023, 16:53:44, Key ID
> 590401a1e38fb563
> Source RPM  : kernel-firmware-nvidia-gspx-G06-545.29.02-13.1.nosrc.rpm
> Build Date  : Po 13. listopadu 2023, 16:53:25
> Build Host  : i04-ch2a
> Vendor      : obs://build.opensuse.org/X11:Drivers:Video
> URL         : https://www.nvidia.com/en-us/drivers/unix/
> Summary     : Kernel firmware file for open NVIDIA kernel module driver G06
> Description :
> This package contains the versioned kernel firmware file "gsp.bin" for
> the OpenSource NVIDIA kernel module driver G06.
> Distribution: X11:Drivers:Video:Redesign / openSUSE_Tumbleweed
> 
> $ rpm -qi kernel-firmware-nvidia-gspx-G06-535.129.03-1.1.x86_64
> Name        : kernel-firmware-nvidia-gspx-G06
> Version     : 535.129.03
> Release     : 1.1
> Architecture: x86_64
> Install Date: Pá 10. listopadu 2023, 07:23:53
> Group       : System/Kernel
> Size        : 61824832
> License     : GPL-2.0-only AND SUSE-Firmware AND GPL-2.0-or-later AND MIT
> Signature   : RSA/SHA512, Čt 2. listopadu 2023, 20:48:50, Key ID
> 35a2f86e29b700a4
> Source RPM  : kernel-firmware-nvidia-gspx-G06-535.129.03-1.1.nosrc.rpm
> Build Date  : Čt 2. listopadu 2023, 20:48:26
> Build Host  : i04-ch1b
> Packager    : https://bugs.opensuse.org
> Vendor      : openSUSE
> URL         : https://www.nvidia.com/en-us/drivers/unix/
> Summary     : Kernel firmware file for open NVIDIA kernel module driver G06
> Description :
> This package contains the versioned kernel firmware file "gsp.bin" for
> the OpenSource NVIDIA kernel module driver G06.
> Distribution: openSUSE Tumbleweed
> 
> I suppose this is due multiversion = provides:multiversion(kernel), right?

Yes, this is exactly the reason.
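
To see at a glance which packages are installed in more than one version because of the multiversion setting, something like the following sketch can help. The function name multi_installed is hypothetical; it only groups "NAME VERSION-RELEASE" lines such as rpm can print with a query format.

```shell
#!/bin/sh
# multi_installed
# Read "NAME VERSION-RELEASE" lines on stdin and print every package name
# that occurs more than once, followed by its installed versions.
multi_installed() {
    sort | awk '
        { count[$1]++; vers[$1] = vers[$1] " " $2 }
        END { for (n in count) if (count[n] > 1) print n ":" vers[n] }
    ' | sort
}

# Example:
#   rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n' | grep -i nvidia | multi_installed
```

Old versions that are no longer wanted can then be removed by their full name-version-release.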

> Because I see that both nvidia-open-driver devel [1] and factory [2] have
> the same newer version, the same applies to kernel-firmware-nvidia-gspx-G06
> [3] [4] I removed obs://build.opensuse.org/X11:Drivers:Video and removed
> packages and install only the latest version.

Yes, you no longer need the devel projects, since the driver+firmware is now included in our products. So better remove these.

> After this, the default value ("options nvidia-drm modeset=1 fbdev=1" and
> *not* set NVreg_OpenRMEnableSupporteGpus=1) was working for xorg. 

Thanks for confirmation.

> After
> installation the still was not working even I run dracut, I needed to ssh to
> the system, rerun dracut and reboot to get it working. Let's assume I did
> something wrong, that's why I needed to rerun dracut via ssh. But sway did
> not work.

Yeah. You need to reboot now after changing the kernel module config. You can no longer easily unload the driver when the option "fbdev=1" is set, which eventually added a Linux console with this driver.

> Removing "fbdev=1" made no difference (working xorg, broken sway).
> 
> Adding NVreg_OpenRMEnableSupporteGpus=1 is the option which breaks booting.

Interesting that having this option still set breaks things. I think it should be removed from the driver.
 
> For sway are also needed nvidia-video-G06 (otherwise sway startup freezes)
> and nvidia-gl-G06 (sway startup fails) from the proprietary NVIDIA
> repository.
> 
> i.e. both kernel open driver
> nvidia-open-driver-G06-signed-kmp-default-545.29.02_k6.6.1_1-1.1.x86_64 and
> GPU 
> and proprietary NVIDIA OpenGL libraries are needed for sway (while this
> might be obvious from the blog post [5] it was new for me, because sway
> claims "don't use nvidia proprietary").

Ok. Good to know this. Maybe sway just doesn't work with Mesa's software fallback driver, no matter which KMS driver is in use.
Comment 25 Stefan Dirsch 2023-11-17 10:48:38 UTC
So I'm closing this for now. Of course you can report what happens with the next update. ;-)
Comment 26 Stefan Dirsch 2023-11-22 13:19:42 UTC
(In reply to Petr Vorel from comment #23)
> Adding NVreg_OpenRMEnableSupporteGpus=1 is the option which breaks booting.

I cannot reproduce that issue. Driver 545.29.02 simply ignores this setting.

[    4.993601] nvidia: unknown parameter 'NVreg_OpenRMEnableSupporteGpus' ignored
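The parameter name appears in this bug in two spellings (NVreg_OpenRmEnableUnsupportedGpus in the earlier comments, NVreg_OpenRMEnableSupporteGpus in the later ones), and a module ignores names it does not declare. A small sketch for checking a spelling against the parameters a module build actually declares, based on the "name:description (type)" lines printed by modinfo -p. The function name param_known is hypothetical.

```shell
#!/bin/sh
# param_known NAME
# Read `modinfo -p MODULE` output on stdin and succeed only if a parameter
# named exactly NAME is declared.
param_known() {
    grep -q -- "^$1:"
}

# Example:
#   if modinfo -p nvidia | param_known NVreg_OpenRmEnableUnsupportedGpus; then
#       echo "declared by this module build"
#   else
#       echo "unknown to this module build"
#   fi
```

Whether a given driver version still declares the parameter is not something this sketch assumes; it only reports what the installed module says.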
Comment 31 Maintenance Automation 2023-12-05 12:30:02 UTC
SUSE-RU-2023:4642-1: An update that has two fixes can now be installed.

Category: recommended (moderate)
Bug References: 1215981, 1217370
Sources used:
openSUSE Leap 15.5 (src): nvidia-open-driver-G06-signed-545.29.02-150500.3.18.1
SUSE Linux Enterprise Micro 5.5 (src): nvidia-open-driver-G06-signed-545.29.02-150500.3.18.1
Basesystem Module 15-SP5 (src): nvidia-open-driver-G06-signed-545.29.02-150500.3.18.1
Public Cloud Module 15-SP5 (src): nvidia-open-driver-G06-signed-545.29.02-150500.3.18.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 32 Maintenance Automation 2023-12-05 12:36:08 UTC
SUSE-RU-2023:4641-1: An update that has two fixes can now be installed.

Category: recommended (moderate)
Bug References: 1215981, 1217370
Sources used:
openSUSE Leap 15.4 (src): nvidia-open-driver-G06-signed-545.29.02-150400.9.32.1
SUSE Linux Enterprise Micro for Rancher 5.3 (src): nvidia-open-driver-G06-signed-545.29.02-150400.9.32.1
SUSE Linux Enterprise Micro 5.3 (src): nvidia-open-driver-G06-signed-545.29.02-150400.9.32.1
SUSE Linux Enterprise Micro for Rancher 5.4 (src): nvidia-open-driver-G06-signed-545.29.02-150400.9.32.1
SUSE Linux Enterprise Micro 5.4 (src): nvidia-open-driver-G06-signed-545.29.02-150400.9.32.1
Basesystem Module 15-SP4 (src): nvidia-open-driver-G06-signed-545.29.02-150400.9.32.1
Public Cloud Module 15-SP4 (src): nvidia-open-driver-G06-signed-545.29.02-150400.9.32.1

Comment 35 Petr Vorel 2023-12-15 08:53:37 UTC
I still get a black screen very often (roughly 50% of boots or resumes from suspend). I guess what I reported as a configuration issue in /usr/lib/modprobe.d/50-nvidia-default.conf (there probably was at least one problem with it) or as a broken "systemctl suspend" is actually something else. It happens even when I don't make any update or configuration change. OTOH I did some updates, so it has also happened on different kernels and nvidia driver versions.

When the screen is black, the log is full of these repeating messages:

[   23.262590] snd_hda_intel 0000:01:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[   23.262597] snd_hda_intel 0000:01:00.1:   device [10de:2291] error status/mask=00100000/00000000
[   23.262602] snd_hda_intel 0000:01:00.1:    [20] UnsupReq               (First)
[   23.262606] snd_hda_intel 0000:01:00.1: AER:   TLP Header: 60000008 000000ff 00000040 00840000
[   23.262613] pci 0000:01:00.0: AER: can't recover (no error_detected callback)
[   23.262615] snd_hda_intel 0000:01:00.1: AER: can't recover (no error_detected callback)
[   23.262646] pcieport 0000:00:01.0: AER: device recovery failed
[   23.349965] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:01:00.1

I already reported this in comment #5, but in dmesg #7 it appeared only once. Later it became permanent (i.e. the dmesg ring buffer contains only these messages). Is that a hardware error?

Documenting the current state of the config files (IMHO they are correct).

$ rpm -qa |grep -i -e kernel-default -e nvidia | sort 
kernel-default-devel-6.6.2-1.1.x86_64
kernel-default-devel-6.6.3-1.1.x86_64
kernel-default-6.6.2-1.1.x86_64
kernel-default-6.6.3-1.1.x86_64
kernel-firmware-nvidia-gspx-G06-545.29.06-1.1.x86_64
kernel-firmware-nvidia-20231128-1.1.noarch
libnvidia-egl-wayland1-1.1.13-1.1.x86_64
libva-nvidia-driver-0.0.11-1.1.x86_64
nvidia-compute-G06-32bit-545.29.06-18.1.x86_64
nvidia-compute-G06-545.29.06-18.1.x86_64
nvidia-driver-G06-kmp-default-545.29.06_k6.6.2_1-18.1.x86_64
nvidia-gl-G06-32bit-545.29.06-18.1.x86_64
nvidia-gl-G06-545.29.06-18.1.x86_64
nvidia-video-G06-32bit-545.29.06-18.1.x86_64
nvidia-video-G06-545.29.06-18.1.x86_64

$ uname -a
Linux p16 6.6.3-1-default #1 SMP PREEMPT_DYNAMIC Wed Nov 29 05:06:07 UTC 2023 (d766c57) x86_64 x86_64 x86_64 GNU/Linux

$ cat /usr/lib/modprobe.d/50-nvidia-default.conf |grep -v ^#
options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=485 NVreg_DeviceFileMode=0660 NVreg_PreserveVideoMemoryAllocations=1
options nvidia-drm modeset=1 fbdev=1
install nvidia PATH=$PATH:/bin:/usr/bin; if /sbin/modprobe --ignore-install nvidia; then   if /sbin/modprobe nvidia_uvm; then     if [ ! -c /dev/nvidia-uvm ]; then       mknod -m 660 /dev/nvidia-uvm c $(cat /proc/devices | while read major device; do if [ "$device" = "nvidia-uvm" ]; then echo $major; break; fi ; done) 0;        chown :video /dev/nvidia-uvm;     fi;     if [ ! -c /dev/nvidia-uvm-tools ]; then       mknod -m 660 /dev/nvidia-uvm-tools c $(cat /proc/devices | while read major device; do if [ "$device" = "nvidia-uvm" ]; then echo $major; break; fi ; done) 1;       chown :video /dev/nvidia-uvm-tools;     fi;   fi;   if [ ! -c /dev/nvidiactl ]; then     mknod -m 660 /dev/nvidiactl c 195 255;     chown :video /dev/nvidiactl;   fi;   devid=-1;   for dev in $(ls -d /sys/bus/pci/devices/*); do      vendorid=$(cat $dev/vendor);     if [ "$vendorid" = "0x10de" ]; then       class=$(cat $dev/class);       classid=${class%%00};       if [ "$classid" = "0x0300" -o "$classid" = "0x0302" ]; then          devid=$((devid+1));         if [ ! -c /dev/nvidia${devid} ]; then            mknod -m 660 /dev/nvidia${devid} c 195 ${devid};            chown :video /dev/nvidia${devid};         fi;       fi;     fi;   done;   /sbin/modprobe nvidia_drm;   if [ ! -c /dev/nvidia-modeset ]; then     mknod -m 660 /dev/nvidia-modeset c 195 254;     chown :video /dev/nvidia-modeset;   fi; fi

$ cat /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G06.conf 
L /run/udev/static_node-tags/uaccess/nvidiactl - - - - /dev/nvidiactl
L /run/udev/static_node-tags/uaccess/nvidia-uvm - - - - /dev/nvidia-uvm
L /run/udev/static_node-tags/uaccess/nvidia-uvm-tools - - - - /dev/nvidia-uvm-tools
L /run/udev/static_node-tags/uaccess/nvidia-modeset - - - - /dev/nvidia-modeset
L /run/udev/static_node-tags/uaccess/nvidia0 - - - - /dev/nvidia0

$ cat /usr/lib/modprobe.d/nvidia-default.conf 
blacklist nouveau

$ cat /usr/lib/dracut/dracut.conf.d/60-nvidia-default.conf 
add_drivers+=" nvidia nvidia-drm nvidia-modeset nvidia-uvm "

$ cat /usr/src/kernel-modules/nvidia-545.29.06-default/dkms.conf |grep -v ^#
PACKAGE_NAME="nvidia"
PACKAGE_VERSION="__VERSION_STRING"
AUTOINSTALL="yes"

MAKE[0]="'make' -j__JOBS NV_EXCLUDE_BUILD_MODULES='__EXCLUDE_MODULES' KERNEL_UNAME=${kernelver} modules"

__DKMS_MODULES
Comment 36 Stefan Dirsch 2024-01-06 18:52:02 UTC
Hmm, snd_hda_intel sounds like the driver for the internal Intel sound chip.

> [   23.262646] pcieport 0000:00:01.0: AER: device recovery failed
> [   23.349965] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:01:00.1

No idea. Google tells me

https://www.videogames.ai/dmesg-aer-error#:~:text=You%20can%20fix%20the%20problem,and%20disabling%20memory%20mapping%20support.&text=Just%20need%20to%20reboot%20and%20the%20error%20should%20disapear.

Maybe it's worth a try.
Comment 37 Takashi Iwai 2024-01-08 12:35:54 UTC
AER report is usually harmless, but if it happens even with a newer kernel, it's a regression and should be addressed.
(And yes, it's worth to test the boot options to see whether it suppresses or not.)
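
To compare boots with and without a boot option, it helps to count the AER reports per device. The following is a rough sketch, assuming the "PCIe Bus Error" lines look like the ones in comment #35; the function name aer_summary is hypothetical.

```shell
#!/bin/sh
# Count "PCIe Bus Error" reports per PCI device in kernel log text on stdin.
# Output: "<count> <pci address>", most frequent device first.
# Assumption: the PCI address is the token right before "PCIe", as in the
# log excerpt quoted in this bug.
aer_summary() {
    awk '/PCIe Bus Error:/ {
        for (i = 2; i <= NF; i++)
            if ($i == "PCIe") {
                dev = $(i - 1)
                sub(/:$/, "", dev)   # strip the trailing colon
                count[dev]++
                break
            }
    }
    END { for (d in count) print count[d], d }' | sort -rn
}

# Usage (reading dmesg may need root, depending on the system):
#   dmesg | aer_summary
```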
Comment 38 Maintenance Automation 2024-01-18 12:30:05 UTC
SUSE-RU-2024:0143-1: An update that has one fix can now be installed.

Category: recommended (moderate)
Bug References: 1215981
Sources used:
openSUSE Leap 15.5 (src): nvidia-open-driver-G06-signed-545.29.06-150500.3.21.5
SUSE Linux Enterprise Micro 5.5 (src): nvidia-open-driver-G06-signed-545.29.06-150500.3.21.5
Basesystem Module 15-SP5 (src): nvidia-open-driver-G06-signed-545.29.06-150500.3.21.5
Public Cloud Module 15-SP5 (src): nvidia-open-driver-G06-signed-545.29.06-150500.3.21.5

Comment 39 Maintenance Automation 2024-01-22 08:30:01 UTC
SUSE-RU-2024:0169-1: An update that has one fix can now be installed.

Category: recommended (moderate)
Bug References: 1215981
Sources used:
SUSE Manager Retail Branch Server 4.3 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Manager Server 4.3 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
openSUSE Leap 15.4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise Micro for Rancher 5.3 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise Micro 5.3 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise Micro for Rancher 5.4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise Micro 5.4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
Public Cloud Module 15-SP4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise High Performance Computing ESPOS 15 SP4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise High Performance Computing LTSS 15 SP4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise Desktop 15 SP4 LTSS 15-SP4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise Server 15 SP4 LTSS 15-SP4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise Server for SAP Applications 15 SP4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Manager Proxy 4.3 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2

Comment 44 Stefan Dirsch 2024-02-09 04:49:18 UTC
(In reply to Takashi Iwai from comment #37)
> AER report is usually harmless, but if it happens even with a newer kernel,
> it's a regression and should be addressed.
> (And yes, it's worth to test the boot options to see whether it suppresses
> or not.)

So have you tried this meanwhile? Instructions in the link you posted in comment #36.
Comment 46 Stefan Dirsch 2024-03-01 00:34:29 UTC
(In reply to Stefan Dirsch from comment #44)
> (In reply to Takashi Iwai from comment #37)
> > AER report is usually harmless, but if it happens even with a newer kernel,
> > it's a regression and should be addressed.
> > (And yes, it's worth to test the boot options to see whether it suppresses
> > or not.)
> 
> So have you tried this meanwhile? Instructions in the link you posted in
> comment #36.

Any news on this one?
Comment 47 Stefan Dirsch 2024-03-28 03:18:17 UTC
@Petr ping ...
Comment 48 Petr Vorel 2024-03-28 06:28:37 UTC
I'm sorry, in the meantime I switched to nouveau, but I'll reinstall the nvidia driver and check it.
Comment 49 Petr Vorel 2024-04-04 10:08:12 UTC
(In reply to Stefan Dirsch from comment #46)
> (In reply to Stefan Dirsch from comment #44)
> > (In reply to Takashi Iwai from comment #37)
> > > AER report is usually harmless, but if it happens even with a newer kernel,
> > > it's a regression and should be addressed.
> > > (And yes, it's worth to test the boot options to see whether it suppresses
> > > or not.)
> > 
> > So have you tried this meanwhile? Instructions in the link you posted in
> > comment #36.
> 
> Any news on this one?

Yes, pci=nommconf kernel command parameter suppresses AER error message in dmesg.
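
To confirm that a given boot really ran with the option, the kernel command line can be checked. This is a small sketch; the helper name has_boot_option is made up for illustration, and the optional second argument exists only so the check can be tried on arbitrary text.

```shell
#!/bin/sh
# Succeed if the kernel command line contains exactly the given option.
#   $1 = option to look for, e.g. pci=nommconf
#   $2 = command line text (optional; defaults to /proc/cmdline)
has_boot_option() {
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   has_boot_option pci=nommconf && echo "booted with pci=nommconf"
```

On openSUSE the option is usually made persistent by adding it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and regenerating the GRUB configuration. Note that pci=nommconf disables MMCONFIG access to PCI configuration space, so it hides the AER messages rather than explaining them.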
Comment 50 Petr Vorel 2024-04-04 10:12:36 UTC
Just for the record, nouveau kernel driver does not have the problem (going to retest nvidia kernel drivers).
Comment 51 Stefan Dirsch 2024-04-04 10:45:21 UTC
(In reply to Petr Vorel from comment #49)
> (In reply to Stefan Dirsch from comment #46)
> > (In reply to Stefan Dirsch from comment #44)
> > > (In reply to Takashi Iwai from comment #37)
> > > > AER report is usually harmless, but if it happens even with a newer kernel,
> > > > it's a regression and should be addressed.
> > > > (And yes, it's worth to test the boot options to see whether it suppresses
> > > > or not.)
> > > 
> > > So have you tried this meanwhile? Instructions in the link you posted in
> > > comment #36.
> > 
> > Any news on this one?
> 
> Yes, pci=nommconf kernel command parameter suppresses AER error message in
> dmesg.

Thanks for verifying that!
Comment 52 Stefan Dirsch 2024-04-04 10:48:03 UTC
(In reply to Petr Vorel from comment #50)
> Just for the record, nouveau kernel driver does not have the problem (going
> to retest nvidia kernel drivers).

I think with that we should close this bug. I understand that it's a hassle to test a driver again and again when you have already found another solution. And since nobody else seems to be affected ...