Bugzilla – Bug 1174204
NVIDIA driver after update 440.100 --> 450.57 fails due to remaining old kernel modules
Last modified: 2020-07-21 17:30:57 UTC
After applying the NVIDIA driver update to 450.57, I end up with an unusable NVIDIA driver and X.org falling back to 1024p with software rendering. The journal shows the following:
    NVRM: API mismatch: the client has the version 450.57, but
    NVRM: this kernel module has the version 440.100.  Please
    NVRM: make sure that this kernel module and all NVIDIA driver
    NVRM: components have the same version.
However, since then I have completely removed the driver, rebooted with Nouveau (which I am also using to write this), and re-installed the driver, so it is completely unclear to me where the old kernel module should come from. Zypper also claims that all my packages are at the same version:
    S  | Name                       | Type       | Version                             | Arch   | Repository
    ---+----------------------------+------------+-------------------------------------+--------+------------------------
       | nvidia-computeG04          | package    | 390.138-lp152.14.1                  | x86_64 | nVidia Graphics Drivers
    i+ | nvidia-computeG05          | package    | 450.57-lp152.28.1                   | x86_64 | nVidia Graphics Drivers
       | nvidia-firmware-installer  | package    | 1.1-lp152.1.1                       | noarch | hardware
       | nvidia-firmware-installer  | srcpackage | 1.1-lp152.1.1                       | noarch | hardware
       | nvidia-gfxG04-kmp-default  | package    | 390.138_k5.3.18_lp152.19-lp152.14.1 | x86_64 | nVidia Graphics Drivers
       | nvidia-gfxG04-kmp-preempt  | package    | 390.138_k5.3.18_lp152.19-lp152.14.1 | x86_64 | nVidia Graphics Drivers
    i+ | nvidia-gfxG05-kmp-default  | package    | 450.57_k5.3.18_lp152.19-lp152.28.1  | x86_64 | nVidia Graphics Drivers
       | nvidia-gfxG05-kmp-preempt  | package    | 450.57_k5.3.18_lp152.19-lp152.28.1  | x86_64 | nVidia Graphics Drivers
       | nvidia-glG04               | package    | 390.138-lp152.14.1                  | x86_64 | nVidia Graphics Drivers
    i+ | nvidia-glG05               | package    | 450.57-lp152.28.1                   | x86_64 | nVidia Graphics Drivers
       | nvidia-texture-tools       | package    | 2.0.8-lp152.3.9                     | x86_64 | Main Repository (OSS)
       | pcp-pmda-nvidia-gpu        | package    | 4.3.1-lp152.4.3                     | x86_64 | Main Repository (OSS)
       | skelcd-EULA-NVIDIA-compute | package    | 2020.05.04-lp152.1.1                | x86_64 | Main Repository (OSS)
       | x11-video-nvidiaG04        | package    | 390.138-lp152.14.1                  | x86_64 | nVidia Graphics Drivers
    i+ | x11-video-nvidiaG05        | package    | 450.57-lp152.28.1                   | x86_64 | nVidia Graphics Drivers
I have tried explicitly running mkinitrd, but this did not change the situation.
One additional thing I ran into: when removing the driver to switch to Nouveau, the system still behaves as before the uninstallation, and lsmod still shows the NVIDIA driver loaded after reboot:
    nvidia_drm             53248  0
    nvidia_modeset       1118208  1 nvidia_drm
    nvidia              20721664  1 nvidia_modeset
    ipmi_msghandler        69632  1 nvidia
    drm_kms_helper        229376  2 nvidia_drm,nouveau
    drm                   544768  5 drm_kms_helper,nvidia_drm,ttm,nouveau
Only explicitly invoking mkinitrd actually stops the NVIDIA driver from being loaded on boot and gives me a working Nouveau driver.
It seems the kernel module build of 450 failed, or the 440 module is being preferred for some reason. I suggest uninstalling the nvidia-gfxG05-kmp-default package, removing all remaining nvidia modules below /lib/modules:
    cd /lib/modules
    find . -name 'nvidia*.ko' -print | xargs rm
and then reinstalling the nvidia-gfxG05-kmp-default package. Then check this:
    find /lib/modules -name 'nvidia*.ko'
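The same cleanup can be sketched as two small shell functions. This is only a sketch: the function names are hypothetical, the modules root is a parameter (defaulting to /lib/modules) so the commands can be tried on a scratch copy first, the glob is quoted so the shell never expands it, and nothing is deleted unless --yes is passed.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the cleanup from the comment above.

# Print every nvidia*.ko below the modules root (regular files and
# symlinks, but not directories), sorted for stable output.
list_nvidia_modules() {
    lnm_root="${1:-/lib/modules}"
    find "$lnm_root" -name 'nvidia*.ko' ! -type d -print | sort
}

# Dry run by default; pass --yes as the second argument to really delete.
# Intended to run after `rpm -e nvidia-gfxG05-kmp-default` and before
# reinstalling the package.
remove_nvidia_modules() {
    rnm_root="${1:-/lib/modules}"
    if [ "$2" = "--yes" ]; then
        find "$rnm_root" -name 'nvidia*.ko' ! -type d -delete
    else
        echo "Would remove:"
        list_nvidia_modules "$rnm_root"
    fi
}
```

For example, `remove_nvidia_modules /lib/modules` only lists the files; `remove_nvidia_modules /lib/modules --yes` (as root) deletes them, after which `list_nvidia_modules` should print nothing until the package is reinstalled.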
And if it still doesn't work, please also attach the output of nvidia-bug-report.sh.
Dear Stefan, I too have been hit by the same issue as above, and your solution worked for me. Thanks!
Created attachment 839785: Result of nvidia-bug-report.sh
Sadly this didn't fix the issue for me. One interesting thing I noted: before removing the modules, I had a /lib/modules/5.3.18-lp152.20.7-default/updates/nvidia.ko, along with many modules for Leap 15.1 and 15.2 kernels. After removing all modules and running the driver installation again, I have /lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko. So I did actually have a module for a kernel with a higher version number lying around.
I have the same issue in 15.2. I get no graphics at all. The NVidia drivers got updated, and now I cannot activate them with
    # prime-select nvidia
It says it cannot query the GPU. I uninstalled and reinstalled the packages, and prime-select still fails. Help please.
Can we delete all the 4.4 and 4.12 files in /lib/modules?
(In reply to James Rome from comment #6)
> Can we delete all the 4.4 and 4.12 files in /lib/modules?
And I do not have an nvidia file in /lib/modules:
    drwxr-xr-x 1 root root  14 Aug 18  2018 4.12.14-lp150.12.10-default
    drwxr-xr-x 1 root root  14 Oct  8  2018 4.12.14-lp150.12.13-default
    drwxr-xr-x 1 root root  14 Oct 16  2018 4.12.14-lp150.12.16-default
    drwxr-xr-x 1 root root  14 Nov  7  2018 4.12.14-lp150.12.19-default
    drwxr-xr-x 1 root root  14 Dec 15  2018 4.12.14-lp150.12.22-default
    drwxr-xr-x 1 root root  14 Jan 17  2019 4.12.14-lp150.12.25-default
    drwxr-xr-x 1 root root  14 Feb 19  2019 4.12.14-lp150.12.28-default
    drwxr-xr-x 1 root root  24 Aug  7  2018 4.12.14-lp150.12.4-default
    drwxr-xr-x 1 root root  14 Apr 12  2019 4.12.14-lp150.12.45-default
    drwxr-xr-x 1 root root  14 May 16  2019 4.12.14-lp150.12.48-default
    drwxr-xr-x 1 root root  14 May 27  2019 4.12.14-lp150.12.58-default
    drwxr-xr-x 1 root root  14 Jun 17  2019 4.12.14-lp150.12.61-default
    drwxr-xr-x 1 root root  14 Aug 18  2018 4.12.14-lp150.12.7-default
    drwxr-xr-x 1 root root  14 Sep 22  2019 4.12.14-lp151.28.10-default
    drwxr-xr-x 1 root root  14 Oct 10  2019 4.12.14-lp151.28.13-default
    drwxr-xr-x 1 root root  14 Oct 30  2019 4.12.14-lp151.28.16-default
    drwxr-xr-x 1 root root  14 Nov 13  2019 4.12.14-lp151.28.20-default
    drwxr-xr-x 1 root root  14 Dec  9  2019 4.12.14-lp151.28.25-default
    drwxr-xr-x 1 root root  14 Mar  8 10:06 4.12.14-lp151.28.32-default
    drwxr-xr-x 1 root root  14 Mar 25 18:30 4.12.14-lp151.28.36-default
    drwxr-xr-x 1 root root  14 Jul 16  2019 4.12.14-lp151.28.4-default
    drwxr-xr-x 1 root root  14 Apr 20 11:14 4.12.14-lp151.28.40-default
    drwxr-xr-x 1 root root  14 Jun 11 15:02 4.12.14-lp151.28.44-default
    drwxr-xr-x 1 root root  14 Jul  3 10:36 4.12.14-lp151.28.48-default
    drwxr-xr-x 1 root root  14 Jul  3 12:44 4.12.14-lp151.28.52-default
    drwxr-xr-x 1 root root  14 Aug 11  2019 4.12.14-lp151.28.7-default
    drwxr-xr-x 1 root root 278 Jul 30  2017 4.4.27-2-default
    drwxr-xr-x 1 root root 278 May 26  2018 4.4.76-1-default
    drwxr-xr-x 1 root root 292 Jul 16 12:53 5.3.18-lp152.19-default
    drwxr-xr-x 1 root root 292 Jul 16 12:53 5.3.18-lp152.19-preempt
    drwxr-xr-x 1 root root 462 Jul 16 12:53 5.3.18-lp152.20.7-default
    drwxr-xr-x 1 root root 292 Jul 15 18:23 5.3.18-lp152.20.7-preempt
    drwxr-xr-x 1 root root 484 Jul 16 12:53 5.3.18-lp152.26-default
    drwxr-xr-x 1 root root 314 Jul 15 18:19 5.3.18-lp152.26-preempt
I wish this was editable. There are NVidia modules in /lib/modules/5.3.18-lp152.19-preempt/updates. But surely /lib/modules/5.3.18-lp152.26-preempt/updates, being newer, should have them too, yet nothing is there.
(In reply to Matthias Bach from comment #4)
> Sadly this didn't fix the issue for me.
I just realised I failed. I only ran `find /lib/modules -name nvidia.ko -delete`. Will retry with `find /lib/modules -name 'nvidia*.ko' -delete`.
(In reply to Matthias Bach from comment #9)
> (In reply to Matthias Bach from comment #4)
> > Sadly this didn't fix the issue for me.
>
> I just realised I failed. I only ran `find /lib/modules -name nvidia.ko
> -delete`. Will retry with `find /lib/modules -name 'nvidia*.ko' -delete`.
So doing this properly does fix the issue. Thanks! Still weird that I had /lib/modules/5.3.18-lp152.20.7-default/updates/nvidia*.ko, though, when the current package builds /lib/modules/5.3.18-lp152.19-default/updates/nvidia*.ko, which now gets linked from /lib/modules/5.3.18-lp152.20.7-default/weak-updates/updates/nvidia*.ko.
(In reply to Matthias Bach from comment #10)
> So doing this properly does fix the issue. Thanks!
Good!
> Still weird that I had
> /lib/modules/5.3.18-lp152.20.7-default/updates/nvidia*.ko though
So I assume these were still the 440.100 ones, which weren't removed during uninstallation of the old package for some reason.
> when the current package builds
> /lib/modules/5.3.18-lp152.19-default/updates/nvidia*.ko
That's correct.
> which now gets linked from
> /lib/modules/5.3.18-lp152.20.7-default/weak-updates/updates/nvidia*.ko.
That's how it is supposed to work: symlinks are created for all kernels sharing the same kABI. That is our weak-updates concept.
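Given the layout described in the last comment (real modules only in the updates/ directory of the kernel the KMP is built for, symlinks in weak-updates/ for every other kernel), leftovers like the 5.3.18-lp152.20.7 ones could be spotted with a check along these lines. It is a sketch: the function name is hypothetical, and it assumes the fixed kernel is passed without its flavor suffix (e.g. 5.3.18-lp152.19), so both -default and -preempt directories count as expected.

```shell
#!/bin/sh
# Hypothetical check for the weak-updates layout: real (non-symlink)
# nvidia*.ko files should exist only below the fixed kernel the KMP is
# built for; any other hit is a leftover.
#   $1  modules root, e.g. /lib/modules
#   $2  fixed kernel without flavor, e.g. 5.3.18-lp152.19
find_stray_nvidia_modules() {
    fsm_root="${1%/}"
    fsm_fixed="$2"
    # -type f skips the weak-updates symlinks, which are expected.
    find "$fsm_root" -name 'nvidia*.ko' -type f -print |
    while IFS= read -r path; do
        kver="${path#"$fsm_root"/}"     # drop the root prefix ...
        kver="${kver%%/*}"              # ... keep the kernel directory name
        case "$kver" in
            "$fsm_fixed"-*) ;;          # fixed kernel, any flavor: expected
            *) printf '%s\n' "$path" ;; # anything else is a stray module
        esac
    done | sort
}
```

On a healthy system `find_stray_nvidia_modules /lib/modules 5.3.18-lp152.19` should print nothing; any path it does print is a candidate for the cleanup from comment #1.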
@James Rome Please follow the instructions in comment #1. They make sure nothing is left below /lib/modules.
Yes, using find /lib/modules -name nvidia*.ko -delete and removing and reinstalling the drivers fixed it.
OK. So at least we have a workaround. But now I'm afraid this happens to everyone with the 440.100 --> 450.57 update. :-(
Now I know what happens. Up to 440.100, kernel modules were mistakenly rebuilt and installed for the kernel they had been locally built against, which is currently 5.3.18-lp152.20.7. With 450.57 I switched this back to our weak-modules concept, i.e. kernel modules are installed for a fixed kernel version (here: 5.3.18-lp152.19, even if it doesn't exist on the system), and then weak-modules symlinks are created for all other installed kernels. Example:
    kernel                   440.100 packages    450.57 packages
    -----------------------  ------------------  --------------------------------------------------
    .19 (fixed GA kernel)    no kernel modules   450.57 modules
    .20 (build kernel)       440.100 modules     440.100 modules (no weak symlinks created) ***
    .85 (another kernel)     no kernel modules   weak symlinks to .19 fixed kernel (450.57 modules)
    *** because modules with the same name already exist
As a fix, I could remove the old modules before installing the new ones.
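The table can be reproduced with a deliberately simplified model of the weak-modules step. This is not openSUSE's actual weak-modules script: it assumes every kernel directory below the root shares the fixed kernel's kABI (the real tool checks symbol versions and keeps flavors apart), and the function name is hypothetical. Run it only against a scratch directory.

```shell
#!/bin/sh
# Simplified model of the weak-modules step (NOT the real openSUSE script).
# Assumption: all kernels below <root> share the fixed kernel's kABI, so no
# symbol-version or flavor compatibility check is performed.
#
# For every module in <root>/<fixed>/updates/, create a symlink in
# <root>/<kernel>/weak-updates/updates/ for each other kernel, unless that
# kernel already has a real module file with the same name.
make_weak_links() {
    mwl_root="${1%/}"
    mwl_fixed="$2"
    for kdir in "$mwl_root"/*/; do
        kdir="${kdir%/}"
        [ -d "$kdir" ] || continue
        kver=$(basename "$kdir")
        [ "$kver" = "$mwl_fixed" ] && continue
        for mod in "$mwl_root/$mwl_fixed"/updates/*.ko; do
            [ -e "$mod" ] || continue          # glob matched nothing
            name=$(basename "$mod")
            # A same-named real module outside weak-updates/ blocks the link,
            # which is what the 440.100 leftovers did on the .20 kernel.
            if find "$kdir" -name "$name" -type f ! -path '*/weak-updates/*' \
                    | grep -q .; then
                echo "skip $kver: $name already exists"
                continue
            fi
            mkdir -p "$kdir/weak-updates/updates"
            ln -sf "$mod" "$kdir/weak-updates/updates/$name"
        done
    done
}
```

With a leftover updates/nvidia.ko under the .20 kernel, the model skips that kernel exactly as in the *** row; deleting the leftover first, which is the fix proposed above, lets the symlink be created on the next run.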
Fixed, and pushed the packages towards NVIDIA. Consider this a reliable workaround as long as the update is not available yet:
    rpm -e nvidia-gfxG05-kmp-default --nodeps
    find /lib/modules -name 'nvidia*.ko' -delete
    zypper in nvidia-gfxG05-kmp-default
The fixed packages contain the following RPM changelog entry:
    Thu Jul 16 19:36:52 UTC 2020 - Stefan Dirsch <sndirsch@suse.com>
    - remove still existing old kernel modules during installation of
      new modules, since otherwise weak-modules doesn't work (boo#1174204)