Bugzilla – Bug 1214908
Nvidia kernel module compiled from nvidia-gfxG05-kmp-default gets installed in wrong folder
Last modified: 2023-09-03 15:32:56 UTC
I have openSUSE Tumbleweed with kernel 6.4.3-1-default installed. I have this repo activated:
https://developer.download.nvidia.com/compute/cuda/repos/opensuse15/

And installed this rpm:
https://developer.download.nvidia.com/compute/cuda/repos/opensuse15/x86_64/nvidia-gfxG05-kmp-default-535.104.05_k4.12.14_lp150.12.82-0.x86_64.rpm

After reboot, I ended up without the NVIDIA driver running.

Investigating, I found that the kernel module was compiled but installed (hardcoded) to /lib/modules/4.12.14-lp150.12.82-default/updates/, which is the wrong location.

Running rpm -qi --scripts nvidia-gfxG05-kmp-default shows the following postinstall scriptlet:

arch=x86_64
flavor=default
kver=$(make -sC /usr/src/linux-obj/$arch/$flavor kernelrelease)
export NV_EXCLUDE_KERNEL_MODULES=nvidia-peermem
RES=0
make -C /usr/src/linux-obj/$arch/$flavor \
     modules \
     M=/usr/src/kernel-modules/nvidia-535.104.05-$flavor \
     SYSSRC=/lib/modules/$kver/source \
     SYSOUT=/usr/src/linux-obj/$arch/$flavor || RES=1
pushd /usr/src/kernel-modules/nvidia-535.104.05-$flavor
make -f Makefile \
     nv-linux.o \
     SYSSRC=/lib/modules/$kver/source \
     SYSOUT=/usr/src/linux-obj/$arch/$flavor || RES=1
popd
# remove still existing old kernel modules (boo#1174204)
rm /lib/modules/$kver/updates/nvidia*.ko
install -m 755 -d /lib/modules/4.12.14-lp150.12.82-$flavor/updates
install -m 644 /usr/src/kernel-modules/nvidia-535.104.05-$flavor/nvidia*.ko \
        /lib/modules/4.12.14-lp150.12.82-$flavor/updates
depmod 4.12.14-lp150.12.82-$flavor
/usr/sbin/update-alternatives --install /usr/lib/nvidia/alternate-install-present alternate-install-present /usr/lib/nvidia/alternate-install-present-$flavor 11
# Create symlinks for udev so these devices will get user ACLs by logind later (bnc#1000625)
mkdir -p /run/udev/static_node-tags/uaccess
mkdir -p /usr/lib/tmpfiles.d
ln -snf /dev/nvidiactl /run/udev/static_node-tags/uaccess/nvidiactl
ln -snf /dev/nvidia-uvm /run/udev/static_node-tags/uaccess/nvidia-uvm
ln -snf /dev/nvidia-uvm-tools /run/udev/static_node-tags/uaccess/nvidia-uvm-tools
ln -snf /dev/nvidia-modeset /run/udev/static_node-tags/uaccess/nvidia-modeset
cat > /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf << EOF
L /run/udev/static_node-tags/uaccess/nvidiactl - - - - /dev/nvidiactl
L /run/udev/static_node-tags/uaccess/nvidia-uvm - - - - /dev/nvidia-uvm
L /run/udev/static_node-tags/uaccess/nvidia-uvm-tools - - - - /dev/nvidia-uvm-tools
L /run/udev/static_node-tags/uaccess/nvidia-modeset - - - - /dev/nvidia-modeset
EOF
devid=-1
for dev in $(ls -d /sys/bus/pci/devices/*); do
  vendorid=$(cat $dev/vendor)
  if [ "$vendorid" == "0x10de" ]; then
    class=$(cat $dev/class)
    classid=${class%00}
    if [ "$classid" == "0x0300" -o "$classid" == "0x0302" ]; then
      devid=$((devid+1))
      ln -snf /dev/nvidia${devid} /run/udev/static_node-tags/uaccess/nvidia${devid}
      echo "L /run/udev/static_node-tags/uaccess/nvidia${devid} - - - - /dev/nvidia${devid}" >> /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf
    fi
  fi
done
echo
echo "Modprobe blacklist files have been created at /etc/modprobe.d to \
prevent Nouveau from loading. This can be reverted by deleting \
/etc/modprobe.d/nvidia-*.conf."
echo
echo "*** Reboot your computer and verify that the NVIDIA graphics driver \
can be loaded. ***"
echo
# Let all initrds get generated by regenerate-initrd-posttrans
mkdir -p /run/regenerate-initrd
touch /run/regenerate-initrd/all
# Recreate initrd without KMS if required (sle11)
# Only touch config, if the use of KMS is enabled in initrd;
if grep -q NO_KMS_IN_INITRD=\"no\" /etc/sysconfig/kernel; then
  sed -i 's/NO_KMS_IN_INITRD.*/NO_KMS_IN_INITRD="yes"/g' /etc/sysconfig/kernel
fi
# groups are now dynamic
if [ -f /etc/modprobe.d/50-nvidia-default.conf ]; then
  VIDEOGID=`getent group video | cut -d: -f3`
  sed -i "s/33/$VIDEOGID/" /etc/modprobe.d/50-nvidia-default.conf
fi
#needed to move this to specfile after running weak-modules2 (boo#1145316)
#exit $RES
nvr=nvidia-gfxG05-kmp-default-535.104.05_k4.12.14_lp150.12.82-0
wm2=/usr/lib/module-init-tools/weak-modules2
if [ -x $wm2 ]; then
  INITRD_IN_POSTTRANS=1 /bin/bash -${-/e/} $wm2 --add-kmp $nvr
fi
exit $RES

When I replaced the hardcoded "4.12.14-lp150.12.82" with my kernel version and ran the script again, I had the new driver running after reboot!

The install location should not be hardcoded.
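For illustration only, the reporter's workaround can be sketched as deriving the destination directory from a kernel release string instead of spelling it out. This is not the packaged scriptlet and not a fix endorsed in this report: module_updates_dir is a hypothetical helper, and the use of uname -r here is an assumption (the scriptlet itself already computes $kver).

```shell
#!/bin/sh
# Sketch only. module_updates_dir is a hypothetical helper, not part of
# the nvidia-gfxG05-kmp-default scriptlet.
module_updates_dir() {
    # $1: kernel release string, e.g. the value the scriptlet stores in $kver
    printf '/lib/modules/%s/updates\n' "$1"
}

# Assumption: use the running kernel for the demonstration.
kver=$(uname -r)
dest=$(module_updates_dir "$kver")
echo "modules would be installed to: $dest"

# The install and depmod steps of the scriptlet would then read:
#   install -m 755 -d "$dest"
#   install -m 644 /usr/src/kernel-modules/nvidia-535.104.05-$flavor/nvidia*.ko "$dest"
#   depmod "$kver"
```

Nothing is written to disk by this sketch; the install and depmod lines are left as comments on purpose.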
The kernel module package in this repository only supports SLE 15/Leap 15, which provide kABI-compatible kernels with the weak-updates mechanism (it creates symlinks to compatible kernel modules). It does *not* support Tumbleweed. For Tumbleweed, please use the *G06* driver packages from https://download.nvidia.com/opensuse/tumbleweed/ instead. These can also be used together with the CUDA packages from the cuda repository. For more details, see also https://en.opensuse.org/SDB:NVIDIA_drivers
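A rough way to see the mismatch is to read the target kernel out of the package name. The sketch below assumes the naming pattern visible in this report, where the part between "_k" and the final "-<release>" is the kernel version with "-" replaced by "_". kmp_kernel_version and kmp_matches_kernel are hypothetical helpers. An exact match is only a meaningful test on Tumbleweed; on SLE 15/Leap 15 the weak-updates mechanism also covers other kABI-compatible kernels.

```shell
#!/bin/sh
# Sketch only; both helpers are hypothetical and rely on the package
# naming pattern seen in this bug report.
kmp_kernel_version() {
    # $1: package name-version-release without the arch suffix
    _v=${1#*_k}    # drop everything up to and including the first "_k"
    _v=${_v%-*}    # drop the trailing "-<release>"
    printf '%s\n' "$_v" | tr '_' '-'
}

kmp_matches_kernel() {
    # $1: package name-version-release, $2: kernel release, $3: flavor
    [ "$(kmp_kernel_version "$1")-$3" = "$2" ]
}

# The two packages mentioned in this report, checked against the
# reporter's kernel 6.4.3-1-default:
for nvr in \
    nvidia-gfxG05-kmp-default-535.104.05_k4.12.14_lp150.12.82-0 \
    nvidia-gfxG05-kmp-default-470.199.02_k6.4.3_1-54.8
do
    if kmp_matches_kernel "$nvr" 6.4.3-1-default default; then
        echo "$nvr: built for this kernel"
    else
        echo "$nvr: built for $(kmp_kernel_version "$nvr")-default"
    fi
done
```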
Last time I tried the G06 driver I had a very bad experience (not usable for productive work, NVIDIA GTX 1660 Ti device). So that is not a solution for now.

Why do you not offer a G05-kmp module for Tumbleweed?
(In reply to H. Hansen from comment #2)
> Why you do not offer a G05-kmp module for tumbleweed?

There it is:
https://download.nvidia.com/opensuse/tumbleweed/x86_64/nvidia-gfxG05-kmp-default-470.199.02_k6.4.3_1-54.8.x86_64.rpm
(In reply to hui from comment #3)
> (In reply to H. Hansen from comment #2)
> > Why you do not offer a G05-kmp module for tumbleweed?
> There it is
> https://download.nvidia.com/opensuse/tumbleweed/x86_64/nvidia-gfxG05-kmp-default-470.199.02_k6.4.3_1-54.8.x86_64.rpm

Sorry, I did not mention that I had that one before, but I need to upgrade; see the CUDA requirements here:
https://docs.nvidia.com/deploy/cuda-compatibility/index.html
(In reply to H. Hansen from comment #2)
> Last time I tried g06 driver I had a very bad experience (not productive
> usable, nvidia GTX 1660 TI device). So that is not a solution for now.
>
> Why you do not offer a G05-kmp module for tumbleweed?

Beware that G05 in the CUDA repo is G06 in the gfx driver repo, i.e. the 535.xx.yy driver. NVIDIA never split off a G06 driver when removing Kepler from support; SUSE did. Not sure which G06 driver version you've tried.