Bug 1214908 - Nvidia kernel module compiled from nvidia-gfxG05-kmp-default gets installed in wrong folder
Status: RESOLVED INVALID
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: X11 3rd Party Driver
Version: Current
Hardware: x86-64 openSUSE Tumbleweed
Priority: P3 - Medium, Severity: Normal
Target Milestone: ---
Assignee: Stefan Dirsch
QA Contact: Stefan Dirsch
Reported: 2023-09-03 09:43 UTC by H. Hansen
Modified: 2023-09-03 15:32 UTC (History)


Description H. Hansen 2023-09-03 09:43:00 UTC
I have openSUSE Tumbleweed with kernel 6.4.3-1-default installed.

I have this repo activated:
https://developer.download.nvidia.com/compute/cuda/repos/opensuse15/

And installed this rpm:
https://developer.download.nvidia.com/compute/cuda/repos/opensuse15/x86_64/nvidia-gfxG05-kmp-default-535.104.05_k4.12.14_lp150.12.82-0.x86_64.rpm

After a reboot, the NVIDIA driver was not running.

Investigating, I found that the kernel module had been compiled but was installed to the hardcoded path
/lib/modules/4.12.14-lp150.12.82-default/updates/
which is the wrong location for my kernel.
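
One way to confirm this kind of mismatch is to compare the running kernel release with the module trees that actually contain an NVIDIA module. The following is a minimal sketch, not part of the package: `has_nvidia_module` is a hypothetical helper, and its optional second argument (an alternate modules root) is an assumption added only so the check can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: report whether a kernel's module tree contains an NVIDIA module.
# has_nvidia_module is a hypothetical helper, not shipped by any package.
has_nvidia_module() {
    # $1 = kernel release (e.g. output of "uname -r")
    # $2 = modules root, defaults to /lib/modules (assumption for testing)
    root=${2:-/lib/modules}
    # Succeeds only if find prints at least one matching file name.
    [ -n "$(find "$root/$1" -name 'nvidia*.ko*' 2>/dev/null | head -n 1)" ]
}

running=$(uname -r)
if has_nvidia_module "$running"; then
    echo "nvidia module found for running kernel $running"
else
    echo "no nvidia module under /lib/modules/$running"
fi
```

On the system described here, the check would be expected to fail for the running kernel while a module exists under the 4.12.14-lp150.12.82-default tree.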

Running
rpm -qi --scripts  nvidia-gfxG05-kmp-default
shows the following postinstall scriptlet:


arch=x86_64
flavor=default
kver=$(make -sC /usr/src/linux-obj/$arch/$flavor kernelrelease)
export NV_EXCLUDE_KERNEL_MODULES=nvidia-peermem
RES=0
make -C /usr/src/linux-obj/$arch/$flavor \
     modules \
     M=/usr/src/kernel-modules/nvidia-535.104.05-$flavor \
     SYSSRC=/lib/modules/$kver/source \
     SYSOUT=/usr/src/linux-obj/$arch/$flavor || RES=1
pushd /usr/src/kernel-modules/nvidia-535.104.05-$flavor 
make -f Makefile \
     nv-linux.o \
     SYSSRC=/lib/modules/$kver/source \
     SYSOUT=/usr/src/linux-obj/$arch/$flavor || RES=1
popd
# remove still existing old kernel modules (boo#1174204)
rm /lib/modules/$kver/updates/nvidia*.ko
install -m 755 -d /lib/modules/4.12.14-lp150.12.82-$flavor/updates
install -m 644 /usr/src/kernel-modules/nvidia-535.104.05-$flavor/nvidia*.ko \
        /lib/modules/4.12.14-lp150.12.82-$flavor/updates
depmod 4.12.14-lp150.12.82-$flavor


/usr/sbin/update-alternatives --install /usr/lib/nvidia/alternate-install-present alternate-install-present /usr/lib/nvidia/alternate-install-present-$flavor 11

# Create symlinks for udev so these devices will get user ACLs by logind later (bnc#1000625)
mkdir -p /run/udev/static_node-tags/uaccess
mkdir -p /usr/lib/tmpfiles.d
ln -snf /dev/nvidiactl /run/udev/static_node-tags/uaccess/nvidiactl 
ln -snf /dev/nvidia-uvm /run/udev/static_node-tags/uaccess/nvidia-uvm
ln -snf /dev/nvidia-uvm-tools /run/udev/static_node-tags/uaccess/nvidia-uvm-tools
ln -snf /dev/nvidia-modeset /run/udev/static_node-tags/uaccess/nvidia-modeset
cat >  /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf << EOF
L /run/udev/static_node-tags/uaccess/nvidiactl - - - - /dev/nvidiactl
L /run/udev/static_node-tags/uaccess/nvidia-uvm - - - - /dev/nvidia-uvm
L /run/udev/static_node-tags/uaccess/nvidia-uvm-tools - - - - /dev/nvidia-uvm-tools
L /run/udev/static_node-tags/uaccess/nvidia-modeset - - - - /dev/nvidia-modeset
EOF
devid=-1
for dev in $(ls -d /sys/bus/pci/devices/*); do 
  vendorid=$(cat $dev/vendor)
  if [ "$vendorid" == "0x10de" ]; then 
    class=$(cat $dev/class)
    classid=${class%00}
    if [ "$classid" == "0x0300" -o "$classid" == "0x0302" ]; then 
      devid=$((devid+1))
      ln -snf /dev/nvidia${devid} /run/udev/static_node-tags/uaccess/nvidia${devid}
      echo "L /run/udev/static_node-tags/uaccess/nvidia${devid} - - - - /dev/nvidia${devid}" >> /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf
    fi
  fi
done

echo
echo "Modprobe blacklist files have been created at /etc/modprobe.d to \
prevent Nouveau from loading. This can be reverted by deleting \
/etc/modprobe.d/nvidia-*.conf."
echo
echo "*** Reboot your computer and verify that the NVIDIA graphics driver \
can be loaded. ***"
echo

# Let all initrds get generated by regenerate-initrd-posttrans
mkdir -p /run/regenerate-initrd
touch /run/regenerate-initrd/all

# Recreate initrd without KMS if required (sle11)
# Only touch config, if the use of KMS is enabled in initrd;
if grep -q NO_KMS_IN_INITRD=\"no\" /etc/sysconfig/kernel; then
  sed -i 's/NO_KMS_IN_INITRD.*/NO_KMS_IN_INITRD="yes"/g' /etc/sysconfig/kernel
fi

# groups are now dynamic
if [ -f /etc/modprobe.d/50-nvidia-default.conf ]; then
  VIDEOGID=`getent group video | cut -d: -f3`
  sed -i "s/33/$VIDEOGID/" /etc/modprobe.d/50-nvidia-default.conf
fi

#needed to move this to specfile after running weak-modules2 (boo#1145316)
#exit $RES
nvr=nvidia-gfxG05-kmp-default-535.104.05_k4.12.14_lp150.12.82-0
wm2=/usr/lib/module-init-tools/weak-modules2
if [ -x $wm2 ]; then
     INITRD_IN_POSTTRANS=1 /bin/bash -${-/e/} $wm2 --add-kmp $nvr
fi
exit $RES


When I replaced the hardcoded "4.12.14-lp150.12.82" with my kernel version and ran the script again, the new driver was running after a reboot. The install location should not be hardcoded.
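
The workaround amounts to building the install path from the detected kernel release (the scriptlet already computes it as `kver`) instead of a fixed string. A minimal sketch of that idea follows; `updates_dir` is a hypothetical helper, the fallback to `uname -r` is an assumption, and the sketch only prints what would happen rather than installing anything. Note that Comment 1 explains the fixed path is intentional for SLE 15/Leap 15, so this reflects the reporter's local change and not a supported configuration.

```shell
#!/bin/sh
# Sketch: derive the module install directory from the kernel release
# instead of hardcoding it. updates_dir is a hypothetical helper name.
updates_dir() {
    # $1 = kernel release, e.g. 6.4.3-1-default
    printf '/lib/modules/%s/updates\n' "$1"
}

# Assumption: fall back to the running kernel when kver is not already set
# (the real scriptlet sets kver via "make ... kernelrelease").
kver=${kver:-$(uname -r)}

# Dry run only: print the targets instead of calling install/depmod.
echo "modules would be installed to $(updates_dir "$kver")"
echo "depmod would be run for $kver"
```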
Comment 1 Stefan Dirsch 2023-09-03 12:13:28 UTC
The kernel module package in this repository only supports SLE 15/Leap 15, which provide kABI-compatible kernels and the weak-updates mechanism (it creates symlinks to compatible kernel modules). It does *not* support Tumbleweed. For Tumbleweed, please use the *G06* driver packages from

  https://download.nvidia.com/opensuse/tumbleweed/

instead. These can also be used together with the CUDA packages from the CUDA repository. For more details, see

  https://en.opensuse.org/SDB:NVIDIA_drivers
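
The kernel a KMP was built against can be read from its version string, which makes this kind of mismatch easy to spot before installing. The sketch below assumes the naming pattern visible in this report, `<driver>_k<kernel>` with `_` standing in for `-` in the kernel part; `kmp_kernel_version` is a hypothetical helper, not an existing tool.

```shell
#!/bin/sh
# Sketch: extract the target kernel version from a KMP version string.
# kmp_kernel_version is a hypothetical helper; the "<driver>_k<kernel>"
# pattern is inferred from the package names quoted in this report.
kmp_kernel_version() {
    # $1 = version field, e.g. 535.104.05_k4.12.14_lp150.12.82
    v=${1#*_k}                        # drop the driver version and "_k"
    printf '%s\n' "$v" | tr '_' '-'   # "_" in the package maps to "-" in the kernel release
}

built=$(kmp_kernel_version "535.104.05_k4.12.14_lp150.12.82")
case "$(uname -r)" in
    "$built"-*) echo "KMP was built for the running kernel" ;;
    *)          echo "KMP was built for $built, running kernel is $(uname -r)" ;;
esac
```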
Comment 2 H. Hansen 2023-09-03 12:53:45 UTC
The last time I tried the G06 driver, I had a very bad experience (not usable for productive work on an NVIDIA GTX 1660 Ti). So that is not a solution for now.

Why do you not offer a G05-kmp module for Tumbleweed?
Comment 3 hui 2023-09-03 13:01:21 UTC
(In reply to H. Hansen from comment #2)
> Why do you not offer a G05-kmp module for Tumbleweed?
Here it is:
https://download.nvidia.com/opensuse/tumbleweed/x86_64/nvidia-gfxG05-kmp-default-470.199.02_k6.4.3_1-54.8.x86_64.rpm
Comment 4 H. Hansen 2023-09-03 13:26:18 UTC
(In reply to hui from comment #3)
> (In reply to H. Hansen from comment #2)
> > Why do you not offer a G05-kmp module for Tumbleweed?
> Here it is:
> https://download.nvidia.com/opensuse/tumbleweed/x86_64/nvidia-gfxG05-kmp-default-470.199.02_k6.4.3_1-54.8.x86_64.rpm

Sorry, I did not mention that I had that one installed before, but I need to upgrade; see the CUDA requirements here:

https://docs.nvidia.com/deploy/cuda-compatibility/index.html
Comment 5 Stefan Dirsch 2023-09-03 15:32:56 UTC
(In reply to H. Hansen from comment #2)
> The last time I tried the G06 driver, I had a very bad experience (not
> usable for productive work on an NVIDIA GTX 1660 Ti). So that is not a
> solution for now.
> 
> Why do you not offer a G05-kmp module for Tumbleweed?

Beware that G05 in the CUDA repo is G06 in the gfx driver repo, i.e. the 535.xx.yy driver. NVIDIA never split off a G06 driver when it removed Kepler from support; SUSE did.
Not sure which G06 driver version you tried.