Bugzilla – Bug 1213765
nvidia driver removal does not run depmod making subsequent dracut calls fail
Last modified: 2023-08-02 12:43:32 UTC
2023-07-28 18:55:37|command|root@uefi|'zypper' 'rm' '--clean-deps' 'nvidia-driver-G06-kmp-default'|
2023-07-28 18:55:43|remove |nvidia-compute-G06-32bit|535.86.05-10.1|x86_64||
2023-07-28 18:55:45|remove |nvidia-gl-G06-32bit|535.86.05-10.1|x86_64||
2023-07-28 18:55:47|remove |nvidia-video-G06-32bit|535.86.05-10.1|x86_64||
2023-07-28 18:56:07|remove |nvidia-gl-G06|535.86.05-10.1|x86_64||
2023-07-28 18:56:09|remove |nvidia-video-G06|535.86.05-10.1|x86_64||
2023-07-28 18:56:13|remove |libnvidia-egl-wayland1|1.1.12-1.1|x86_64||
2023-07-28 18:56:18|remove |nvidia-compute-G06|535.86.05-10.1|x86_64||
# 2023-07-28 18:56:33 nvidia-driver-G06-kmp-default-535.86.05_k6.4.3_1-10.1.x86_64 removed ok
# warning: /usr/lib/modprobe.d/50-nvidia-default.conf saved as /usr/lib/modprobe.d/50-nvidia-default.conf.rpmsave
# warning: /usr/lib/dracut/dracut.conf.d/60-nvidia-default.conf saved as /usr/lib/dracut/dracut.conf.d/60-nvidia-default.conf.rpmsave
# update-alternatives: warning: alternative /usr/lib/nvidia/alternate-install-present-default (part of link group alternate-install-present) doesn't exist; removing from list of alternatives
2023-07-28 18:56:33|remove |nvidia-driver-G06-kmp-default|535.86.05_k6.4.3_1-10.1|x86_64|root@uefi|

So far so good, but subsequent dracut fails with

dracut[I]: *** Including module: kernel-modules-extra ***
realpath: updates/nvidia-modeset.ko: No such file or directory
realpath: updates/nvidia-peermem.ko: No such file or directory
realpath: updates/nvidia-drm.ko: No such file or directory
realpath: updates/nvidia-uvm.ko: No such file or directory
realpath: updates/nvidia.ko: No such file or directory
dracut[F]: installkernel failed in module kernel-modules-extra

Because

user@uefi:~> ll /usr/lib/modules/6.4.4-1-default/updates/
total 0
user@uefi:~> grep nvidia /usr/lib/modules/6.4.4-1-default/modules.dep
kernel/drivers/net/ethernet/nvidia/forcedeth.ko.zst:
kernel/drivers/usb/typec/altmodes/typec_nvidia.ko.zst: kernel/drivers/usb/typec/altmodes/typec_displayport.ko.zst kernel/drivers/usb/typec/typec.ko.zst
kernel/drivers/i2c/busses/i2c-nvidia-gpu.ko.zst: kernel/drivers/i2c/busses/i2c-ccgx-ucsi.ko.zst
kernel/drivers/platform/x86/nvidia-wmi-ec-backlight.ko.zst: kernel/drivers/acpi/video.ko.zst kernel/drivers/platform/x86/wmi.ko.zst
updates/nvidia-modeset.ko: updates/nvidia.ko kernel/drivers/acpi/video.ko.zst kernel/drivers/platform/x86/wmi.ko.zst
updates/nvidia-peermem.ko:
updates/nvidia-drm.ko: updates/nvidia-modeset.ko updates/nvidia.ko kernel/drivers/acpi/video.ko.zst kernel/drivers/platform/x86/wmi.ko.zst
updates/nvidia-uvm.ko: updates/nvidia.ko
updates/nvidia.ko:
user@uefi:~> ll /usr/lib/modules/6.4.4-1-default/modules.dep
-rw-r--r-- 1 root root 655367 Jul 25 21:47 /usr/lib/modules/6.4.4-1-default/modules.dep
user@uefi:~>

So modules.dep was not updated when the drivers were removed.
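
The mismatch shown above (entries in modules.dep with no file behind them) can be checked mechanically. The following is only a minimal sketch, not part of any package: `stale_module_entries` is a hypothetical helper name, and it assumes modules.dep uses paths relative to the module directory, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print every modules.dep key (the path before ':')
# whose file no longer exists below the given module directory.
# Assumes paths in modules.dep are relative to that directory.
stale_module_entries() {
    moddir=$1
    while IFS= read -r line; do
        mod=${line%%:*}                  # strip ':' and the dependency list
        [ -n "$mod" ] || continue        # skip empty lines
        [ -e "$moddir/$mod" ] || printf '%s\n' "$mod"
    done < "$moddir/modules.dep"
}

# Example call (commented out so sourcing this file has no side effects):
# stale_module_entries "/usr/lib/modules/$(uname -r)"
```

Any output here means modules.dep is out of date for that tree, and running depmod for that kernel version should bring it back in sync.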
Hmm. This is the first time I hear of this being an issue. Indeed, it seems depmod is not being executed in %postun/%preun. May I ask how exactly you ran dracut?
(In reply to Stefan Dirsch from comment #1)
> May I ask how you ran dracut exactly?

dracut -f --regenerate-all
Related - on installation of the NVIDIA KMP a new initrd is generated which includes the NVIDIA modules. On removal of the KMP the initrd is not regenerated, so we are left with an initrd that still includes the NVIDIA drivers. This can be quite surprising for users and it actually did cause issues. So the package should probably simply call

/usr/lib/module-init-tools/kernel-scriptlets/kmp-postun

which will also take care of depmod. I bet there are some magic RPM macros to take care of it.
(In reply to Andrei Borzenkov from comment #3)
> Related - on installation of the NVIDIA KMP a new initrd is generated which
> includes the NVIDIA modules. On removal of the KMP the initrd is not
> regenerated, so we are left with an initrd that still includes the NVIDIA
> drivers. This can be quite surprising for users and it actually did cause
> issues. So the package should probably simply call
>
> /usr/lib/module-init-tools/kernel-scriptlets/kmp-postun
>
> which will also take care of depmod. I bet there are some magic RPM macros
> to take care of it.

Actually this is being done.

postuninstall scriptlet (using /bin/sh):
[...]
run_if_exists /usr/lib/module-init-tools/kernel-scriptlets/kmp-postun --name "nvidia-driver-G06-kmp-default" \
  --version "535.86.05_k6.4.3_1" --release "10.1" --kernelrelease "6.4.3-1" \
  --flavor "default" --usrmerged "01" "$@"
(In reply to Stefan Dirsch from comment #4)
> --version "535.86.05_k6.4.3_1" --release "10.1" --kernelrelease "6.4.3-1"

The kernel release is hardcoded in the RPM, and any other release is skipped. There is some code in %triggerpostun, but I honestly fail to understand what it does. Oh, and for G06 it is using the wrong package name - it says

triggerpostun scriptlet (using /bin/bash) -- nvidia-gfxG06-kmp-default

while the package is named nvidia-driver-G06-kmp-default. Maybe this trigger exists explicitly for migrating from the old package naming convention; I do not know.
Thinking more about it. On Tumbleweed the NVIDIA driver is compiled and installed into the kernel tree every time the *kernel* is updated. What should happen when the KMP is removed? Should all compiled binaries installed for every kernel version be removed? Should only the binary from the latest release be removed? I would rather expect all compiled binaries to be removed. This is what e.g. the Ubuntu nvidia-dkms package does.
(In reply to Andrei Borzenkov from comment #6)
> (In reply to Stefan Dirsch from comment #4)
> > --version "535.86.05_k6.4.3_1" --release "10.1" --kernelrelease "6.4.3-1"
>
> Kernel release is hardcoded in RPM and it is skipped for any other release.
> There is some code in %triggerpostun, but I honestly fail to understand what
> it does. Oh, and for G06 it is using the wrong package name - it says
>
> triggerpostun scriptlet (using /bin/bash) -- nvidia-gfxG06-kmp-default
>
> while package is named nvidia-driver-G06-kmp-default. May be this trigger is
> explicitly for migrating from old package naming convention, do not know.

This was needed only for the migration from the old to the new package name.
(In reply to Andrei Borzenkov from comment #7)
> Thinking more about it. On Tumbleweed NVIDIA driver is compiled and
> installed into kernel tree every time *kernel* is updated. What should
> happen when KMP is removed? Should all compiled binaries installed for every
> kernel version be removed? Should only the binary from the latest release be
> removed? I rather expect all compiled binaries should be removed. This is
> what e.g. Ubuntu nvidia-dkms package does.

I tried to remove nvidia modules for no longer existing kernel trees, but this fails and I needed to revert it.

# rpm --triggers -qp nvidia-driver-G06-kmp-default-535.86.05_k6.4.3_1-10.1.x86_64.rpm
[...]
triggerpostun scriptlet (using /bin/sh) -- kernel-default
#
# Unfortunately doesn't work since kernel updates are not considered "atomar"
# when using YaST/zypper (only safe when using rpm) [boo#1182666]
#
#for dir in $(find /lib/modules -mindepth 1 -maxdepth 1 -type d); do
#  if [ ! -d $dir/kernel ]; then
#    test -d $dir/updates && rm -f $dir/updates/nvidia*.ko
#  fi
#done
Ok. Indeed, I can easily reproduce this issue with current Tumbleweed.

(In reply to Stefan Dirsch from comment #9)
> (In reply to Andrei Borzenkov from comment #7)
> > Thinking more about it. On Tumbleweed NVIDIA driver is compiled and
> > installed into kernel tree every time *kernel* is updated. What should
> > happen when KMP is removed? Should all compiled binaries installed for every
> > kernel version be removed? Should only the binary from the latest release be
> > removed? I rather expect all compiled binaries should be removed. This is
> > what e.g. Ubuntu nvidia-dkms package does.
>
> I tried to remove nvidia modules for no longer existing kernel trees, but
> this fails and I needed to revert it.
>
> # rpm --triggers -qp nvidia-driver-G06-kmp-default-535.86.05_k6.4.3_1-10.1.x86_64.rpm
> [...]
> triggerpostun scriptlet (using /bin/sh) -- kernel-default
> #
> # Unfortunately doesn't work since kernel updates are not considered "atomar"
> # when using YaST/zypper (only safe when using rpm) [boo#1182666]
> #
> #for dir in $(find /lib/modules -mindepth 1 -maxdepth 1 -type d); do
> #  if [ ! -d $dir/kernel ]; then
> #    test -d $dir/updates && rm -f $dir/updates/nvidia*.ko
> #  fi
> #done

Actually I'm doing this when removing the package completely (%postun). For that case I have now added a run of depmod for the affected kernel module trees.
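
For illustration, the idea described above (remove the nvidia modules on full package removal, then run depmod for each affected tree) could look roughly like the sketch below. This is an assumption about the general shape, not the scriptlet actually shipped in the package: `purge_nvidia_modules`, `MODROOT` and `DEPMOD` are hypothetical names, the latter two introduced only so the function can be dry-run against a scratch directory.

```shell
#!/bin/sh
# Hypothetical sketch; NOT the package's real %postun scriptlet.
# MODROOT and DEPMOD are made-up overrides for dry runs only: depmod itself
# resolves the kernel version against the system module directory.
MODROOT=${MODROOT:-/usr/lib/modules}
DEPMOD=${DEPMOD:-depmod}

purge_nvidia_modules() {
    for dir in "$MODROOT"/*; do
        [ -d "$dir/updates" ] || continue
        removed=0
        for ko in "$dir"/updates/nvidia*.ko; do
            [ -e "$ko" ] || continue         # glob matched nothing
            rm -f "$ko" && removed=1
        done
        # Regenerate modules.dep etc. only for trees that actually changed.
        if [ "$removed" -eq 1 ]; then
            $DEPMOD -a "$(basename "$dir")"
        fi
    done
}
```

In an RPM %postun scriptlet the first argument is 0 when the package is erased completely, so such a function would typically be guarded by a check like `[ "$1" -eq 0 ]`. Note that this sketch does not repair a tree whose modules are already gone but whose modules.dep is stale, which is the state reported in this bug; that tree still needs a plain depmod run for its kernel version.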
This will get fixed with the following RPM changelog entry

-------------------------------------------------------------------
Wed Aug 2 12:23:27 UTC 2023 - Stefan Dirsch <sndirsch@suse.com>

- %postun: regenerate modules.dep, etc. to avoid dracut failures
  later (boo#1213765)

Closing ...