Bug 1228145 - nvidia driver won't load with debug kernel (6.9.9-1-debug)
Summary: nvidia driver won't load with debug kernel (6.9.9-1-debug)
Status: RESOLVED WONTFIX
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: X11 3rd Party Driver
Version: Current
Hardware: x86-64 Linux
Priority: P4 - Low
Severity: Normal
Target Milestone: ---
Assignee: Stefan Dirsch
QA Contact: Stefan Dirsch
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-07-19 13:22 UTC by Eric Benton
Modified: 2024-07-20 01:58 UTC
CC List: 1 user

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
nvidia-installer.log (11.49 MB, text/plain)
2024-07-20 01:58 UTC, Eric Benton

Description Eric Benton 2024-07-19 13:22:15 UTC
kernel: 6.9.9-1-debug
I had reason to use the debug kernel to try to track down a recurring hang. When the debug kernel is booted, the nvidia drivers do not load.
I tried to load nvidia.ko with modprobe and got this error:
modprobe: ERROR: could not insert 'nvidia': Unknown symbol in module, or unknown parameter (see dmesg)
Nothing shows up in dmesg.
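Packaged kernel modules on openSUSE are built per kernel flavor, so the first thing worth confirming is which flavor is actually running. A minimal sketch, assuming the openSUSE convention that the flavor is the last dash-separated component of the release string (as in `6.9.9-1-debug`); `kernel_flavor` is a hypothetical helper name, not an existing tool:

```sh
#!/bin/sh
# Hypothetical helper: print the flavor suffix of a kernel release string.
# Assumes openSUSE-style releases such as 6.9.9-1-default or 6.9.9-1-debug.
kernel_flavor() {
    release=${1:-$(uname -r)}        # default to the running kernel
    printf '%s\n' "${release##*-}"   # strip everything up to the last '-'
}

# Usage:
#   kernel_flavor                  # flavor of the running kernel
#   kernel_flavor 6.9.9-1-debug    # prints: debug
```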
Comment 1 Stefan Dirsch 2024-07-19 14:42:45 UTC
Well. We don't provide packages for the -debug kernel flavor. I think we never did.

nvidia-driver-G06.spec
[...]
%define x_flavors kdump um debug xen xenpae
[...]
%kernel_module_package %kmp_template %_builddir/nvidia-kmp-template -p %_sourcedir/preamble -f %_sourcedir/%kmp_filelist -x %x_flavors

nvidia-open-driver-G06.spec
[...]
%define kernel_flavors default
%ifnarch aarch64
%if !0%{?is_opensuse}
%define kernel_flavors azure default
%endif
%else
%define kernel_flavors 64kb default
%endif
[...]
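Put together, the two excerpts mean that `debug` is excluded from the G06 package via `-x %x_flavors`, and is not in the explicit `kernel_flavors` list of the open driver either. A minimal sketch of that selection logic, with the flavor lists copied from the excerpts (openSUSE on x86_64 for the open driver). `in_list` and `builds_kmp_for` are hypothetical helpers; the real decision is made by the RPM macros at build time, and for G06 "yes" only means "not excluded", since the flavors actually built depend on the build environment:

```sh
#!/bin/sh
# Flavor lists copied from the spec excerpts in comment 1.
G06_EXCLUDED="kdump um debug xen xenpae"   # nvidia-driver-G06: -x %x_flavors
OPEN_G06_FLAVORS="default"                 # nvidia-open-driver-G06, openSUSE x86_64

# Hypothetical helper: succeed if word $1 appears in space-separated list $2.
in_list() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: report whether each package would ship a KMP for flavor $1.
# For G06, "yes" only means the flavor is not in the exclusion list.
builds_kmp_for() {
    if in_list "$1" "$G06_EXCLUDED"; then g06=no; else g06=yes; fi
    if in_list "$1" "$OPEN_G06_FLAVORS"; then open=yes; else open=no; fi
    printf 'G06=%s open-G06=%s\n' "$g06" "$open"
}

# Usage:
#   builds_kmp_for debug     # prints: G06=no open-G06=no
#   builds_kmp_for default   # prints: G06=yes open-G06=yes
```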
Comment 2 Stefan Dirsch 2024-07-19 17:35:34 UTC
Of course, you can build it yourself by running NVIDIA's installer.

  https://www.nvidia.com/en-us/drivers/unix/

I think it also has an option to only build and install the kernel modules.

$ sh ./NVIDIA-Linux-x86_64-550.100.run -A

[...]
  -K, --kernel-modules-only
      Install the kernel modules only, and do not uninstall the existing driver.  This is intended to be used to install kernel modules for additional kernels (in cases where you might boot between several different kernels).  To use this option, you must already have a driver installed, and the version of the installed driver must match the version of these kernel modules.
[...]

I don't plan to add packages for this flavor. I think you're the first to ask for it, and we have been building nvidia KMPs for almost two decades now.
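The `-K` help text requires an already installed driver whose version matches the runfile. A minimal sketch for checking that before running the installer, assuming the version is embedded in the runfile name as in `NVIDIA-Linux-x86_64-550.100.run`, and that the installed version is read on a kernel that has the module, e.g. with `modinfo -k 6.9.9-1-default -F version nvidia`. Both helper names are hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: extract the driver version from an NVIDIA runfile name,
# assuming the NVIDIA-Linux-<arch>-<version>.run naming seen in this report.
runfile_version() {
    name=$(basename "$1" .run)    # e.g. NVIDIA-Linux-x86_64-550.100
    printf '%s\n' "${name##*-}"   # e.g. 550.100
}

# Hypothetical helper: succeed only if the installed version matches the runfile.
# $1 = runfile path, $2 = installed version, for example the output of:
#      modinfo -k 6.9.9-1-default -F version nvidia
versions_match() {
    [ "$(runfile_version "$1")" = "$2" ]
}

# Usage:
#   versions_match ./NVIDIA-Linux-x86_64-550.100.run 550.100 &&
#       sh ./NVIDIA-Linux-x86_64-550.100.run -K
```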
Comment 3 Eric Benton 2024-07-19 22:27:50 UTC
What are KMPs? (Pardon my ignorance.)
Comment 4 Eric Benton 2024-07-19 22:49:34 UTC
I had to uninstall nvidia G06 to give this a chance of compiling, but even after that I get this error:
     MODPOST /tmp/selfgz3544335/NVIDIA-Linux-x86_64-550.100/kernel/Module.symvers
   ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'mutex_destroy'
   make[4]: *** [/usr/src/linux-6.9.9-1/scripts/Makefile.modpost:151: /tmp/selfgz3544335/NVIDIA-Linux-x86_64-550.100/kernel/Module.symvers] Error 1
   make[4]: Target '__modpost' not remade because of errors.
   make[3]: *** [/usr/src/linux-6.9.9-1/Makefile:1887: modpost] Error 2
   make[3]: Target 'modules' not remade because of errors.
   make[2]: *** [/usr/src/linux-6.9.9-1/Makefile:240: __sub-make] Error 2
   make[2]: Target 'modules' not remade because of errors.
   make[2]: Leaving directory '/usr/src/linux-6.9.9-1-obj/x86_64/debug'
   make[1]: *** [Makefile:240: __sub-make] Error 2
   make[1]: Target 'modules' not remade because of errors.
   make[1]: Leaving directory '/usr/src/linux-6.9.9-1'
   make: *** [Makefile:85: modules] Error 2
-> Error.
ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nvidia-installer.log for details.
-> The command `cd kernel; /bin/make -k -j12  NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/6.9.9-1-debug/source" SYSOUT="/lib/modules/6.9.9-1-debug/build" ` failed with the following output:


make[1]: Entering directory '/usr/src/linux-6.9.9-1'
make[2]: Entering directory '/usr/src/linux-6.9.9-1-obj/x86_64/debug'
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: gcc (SUSE Linux) 13.3.0
  You are using:           cc (SUSE Linux) 13.3.0
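The failing command above passes `SYSSRC` and `SYSOUT` pointing into `/lib/modules/<release>/`. A minimal sketch for printing and sanity-checking those paths for a given release before rerunning the build; it assumes the same layout the installer used here (the directories come from the kernel development packages), and both helper names are hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: print the SYSSRC/SYSOUT values the installer passed
# to make in the log above, for a given kernel release.
kernel_build_paths() {
    release=${1:-$(uname -r)}
    printf 'SYSSRC=/lib/modules/%s/source\n' "$release"
    printf 'SYSOUT=/lib/modules/%s/build\n' "$release"
}

# Hypothetical helper: fail if either directory is missing for the release.
check_build_paths() {
    release=${1:-$(uname -r)}
    for d in "/lib/modules/$release/source" "/lib/modules/$release/build"; do
        [ -d "$d" ] || { echo "missing: $d" >&2; return 1; }
    done
    return 0
}

# Usage:
#   kernel_build_paths 6.9.9-1-debug
#   check_build_paths 6.9.9-1-debug && echo "build tree present"
```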
Comment 5 Stefan Dirsch 2024-07-20 00:57:25 UTC
Hmm. I'm not sure whether one can build the modules against our -debug kernel. The last time I tried that was likely more than a decade ago.
Comment 6 Stefan Dirsch 2024-07-20 00:57:53 UTC
(In reply to Eric Benton from comment #3)
> What are KMP's? (pardon my ignorance)

Kernel Module Package
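On openSUSE a KMP's package name ends in `-kmp-<flavor>` (for example `nvidia-driver-G06-kmp-default`), so the installed KMPs show directly which kernel flavors are covered. A minimal sketch, assuming that naming convention; `kmp_flavor` is a hypothetical helper, and the package names would come from `rpm -qa --qf '%{NAME}\n' '*-kmp-*'`:

```sh
#!/bin/sh
# Hypothetical helper: print the flavor of a KMP package *name*
# (not the full name-version-release), assuming the <name>-kmp-<flavor> convention.
kmp_flavor() {
    case "$1" in
        *-kmp-*) printf '%s\n' "${1##*-kmp-}" ;;
        *) return 1 ;;
    esac
}

# Usage: list the flavors covered by installed KMPs
#   rpm -qa --qf '%{NAME}\n' '*-kmp-*' | while read -r n; do kmp_flavor "$n"; done
```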
Comment 7 Eric Benton 2024-07-20 01:56:22 UTC
I looked through nvidia-installer.log (attached). Apart from a ton of warnings (which NVIDIA should never have allowed out the door), the only error that shows up is:
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'mutex_destroy'

That's not to say this isn't the entrance to a rabbit hole, where you fix that one and another appears, over and over.
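A likely reason this error appears only with the -debug kernel, assuming that flavor enables `CONFIG_DEBUG_MUTEXES`: in mainline Linux, `mutex_destroy()` is an empty inline function unless that option is set, in which case it is a real function exported with `EXPORT_SYMBOL_GPL`, which a GPL-incompatible module such as `nvidia.ko` is not allowed to link against. A minimal sketch for checking both points in a build tree; it assumes the tab-separated `Module.symvers` layout (CRC, symbol, module, export type, namespace) and a `.config` in the build directory. Both helpers are hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: print the export type (e.g. EXPORT_SYMBOL_GPL) of symbol $2
# in Module.symvers file $1; fail if the symbol is not listed.
# Assumed tab-separated format: CRC  symbol  module  export-type  namespace
symbol_export_type() {
    awk -F '\t' -v sym="$2" '$2 == sym { print $4; found = 1 } END { exit !found }' "$1"
}

# Hypothetical helper: succeed if kernel config file $1 has CONFIG_DEBUG_MUTEXES=y.
has_debug_mutexes() {
    grep -q '^CONFIG_DEBUG_MUTEXES=y$' "$1"
}

# Usage against the build tree the installer used:
#   build=/lib/modules/6.9.9-1-debug/build
#   symbol_export_type "$build/Module.symvers" mutex_destroy
#   has_debug_mutexes "$build/.config" && echo "debug mutexes enabled"
```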
Comment 8 Eric Benton 2024-07-20 01:58:11 UTC
Created attachment 876169
nvidia-installer.log