Bugzilla – Bug 1224773
NVIDIA KMP: initrd not regenerated on transactional systems using systemd-boot like Aeon >= RC2
Last modified: 2024-07-12 13:44:54 UTC
Installing NVIDIA drivers in Aeon RC2 (https://en.opensuse.org/SDB:NVIDIA_drivers) fails because the MOK cannot be enrolled after boot. The error during installation using pkg is:

SKIP: /var/lib/nvidia-pubkeys/MOK-nvidia-driver-G06-550.78-22.1-default.der is not in MokList

Without the nvidia-pubkeys folder it is impossible to enroll the MOK or try to enable it after reboot. The key doesn't exist :(
Warning: The following files were changed in the snapshot, but are shadowed by other mounts and will not be visible to the system:
/.snapshots/3/snapshot/var/lib/nvidia-pubkeys/MOK-nvidia-driver-G06-550.78-22.1-default.der

(This appeared after the second attempt, having used the home backup option.)
Not an Aeon-specific bug, nor a package in any Aeon-supported/maintained repo.

Looks to me like the packaging of the NVIDIA package is wrong.

/var should be avoided when packaging for any transactional distribution (i.e. Aeon, MicroOS, SLE Micro, etc.) because the whole premise is that such systems don't change during runtime, whereas the whole premise of /var is for data that needs to change at runtime.

This is why transactional-update doesn't mount /var during the update, hence the observation in Comment #1.

/usr/share/$FOO seems like a more natural and correct location for NVIDIA's public keys, as I assume they're not meant to change at runtime.

Assigning to the appropriate category and a community packager who may be willing to help.
(In reply to Richard Brown from comment #2)
> /usr/share/$FOO seems like a more natural and correct location for NVIDIA's
> public keys, as I assume they're not meant to change at runtime.

Well, the public key gets generated in %post of installation. I think this qualifies as changing at runtime?!? So I don't think /usr/share would be the right location? What would be the correct location for installing MOK public keys then?
(In reply to Stefan Dirsch from comment #3)
> (In reply to Richard Brown from comment #2)
> > /usr/share/$FOO seems like a more natural and correct location for NVIDIA's
> > public keys, as I assume they're not meant to change at runtime.
>
> Well, the public key gets generated in %post of installation. I think this
> qualifies as changing at runtime?!? So I don't think /usr/share would be the
> right location? What would be the correct location for installing MOK public
> keys then?

/var is not mounted during the package installation on any transactional system.

I see /var and things "changing at runtime" as things like databases and VMs, not something that is static for the life of the package installation (which I assume a public key is).

I really think somewhere like /usr/share/nvidia-pubkeys would be a more appropriate place, the same way /usr/share/containers-keys is the correct place for our container public keys.
OK. Changed that now for the G04, G05 and G06 packages. It is expected to be available as an update in early June.
The 555 driver was released a few hours ago, so hopefully, the fix was in before.
(In reply to Benjamin Sabatini from comment #6)
> The 555 driver was released a few hours ago, so hopefully, the fix was in
> before.

Not really. This was a SUSE packaging issue. It will be included in the next driver package update for driver > 550.78.
Created attachment 875233 [details] Aeon RC2 systemd-boot log
Hi there,

I just updated an old MicroOS Desktop system to Aeon RC2 using the Aeon Installer. Unfortunately I need to report that the NVIDIA drivers still can't be installed on Aeon RC2.

> Mai 31 07:32:47 localhost systemd-udevd[1106]: modprobe: ERROR: could not insert 'nvidia': Key was rejected by service

It seems it still can't import the MOKs for the NVIDIA driver. Looking up https://en.opensuse.org/SDB:NVIDIA_drivers#Secureboot revealed:

1) The wiki is outdated, as the MOKs have been moved to "/usr/share/nvidia-pubkeys" according to this ticket, but
2) The directory "/usr/share/nvidia-pubkeys" does not exist on my system at all. Neither does the old "/var/lib/nvidia-pubkeys".

I uploaded the full boot log file of my current system.
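For anyone who wants to repeat the directory check from this comment, here is a minimal sketch. The function name find_nvidia_pubkeys is made up for illustration; the two directories are the ones named in this ticket, and the assumption is that the keys are plain *.der files directly inside them.

```shell
# List NVIDIA MOK public keys (*.der) found in the given directories.
# Prints one path per line; returns 1 if no key was found.
# (find_nvidia_pubkeys is a hypothetical helper, not part of any package.)
find_nvidia_pubkeys() (
    found=1
    for dir in "$@"; do
        # Skip directories that do not exist on this system.
        [ -d "$dir" ] || continue
        for key in "$dir"/*.der; do
            # An unmatched glob stays literal, so test for a real file.
            [ -f "$key" ] || continue
            printf '%s\n' "$key"
            found=0
        done
    done
    exit "$found"
)

# Usage on a real system (paths taken from this bug report):
# find_nvidia_pubkeys /usr/share/nvidia-pubkeys /var/lib/nvidia-pubkeys \
#     || echo "no NVIDIA MOK public key found"
```

If this prints nothing, there is no key file that could be enrolled, which matches the situation described above.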
Oh sorry, I missed the last comment saying that the changes will be available in early June. My bad.
I think there is another issue with the driver. After disabling secure boot on my system, the nvidia driver still does not get loaded, apparently because of nouveau:

> Mai 31 07:55:03 localhost kernel: NVRM: GPU 0000:01:00.0 is already bound to nouveau.

After manually running:

> sudo transactional-update initrd
> sudo reboot

it still does not seem to pick up the file "/usr/lib/modprobe.d/nvidia-default.conf", which states:

> blacklist nouveau

I'll attach a second boot log.
Created attachment 875234 [details] Aeon RC2 boot log without secure boot enabled
Created attachment 875235 [details] transactional update initrd log file
Indeed. nouveau is being loaded. So either /usr/lib/modprobe.d/nvidia-default.conf is not being added to the initrd or it's being ignored in the initrd. The first you can easily verify with

lsinitrd | grep nvidia

Try this please!
Hm, I get the following output:

> sudo lsinitrd | grep nvidia
> No <initramfs file> specified and the default image '/boot/efi/7b5e214ac1c6413b9145cd262341e2fe/6.9.1-1-default/initrd' cannot be accessed!

Am I using the command wrong?
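A note on the error above: lsinitrd also accepts the image to inspect as an argument, so the default lookup can be bypassed. A minimal sketch of the check follows, assuming the listing format shown elsewhere in this ticket (path in the last column). The helper name listing_has_path is made up, and the image path in the usage comment is a placeholder, not a real location.

```shell
# Read an lsinitrd-style listing on stdin and report whether it contains
# the given path (e.g. usr/lib/modprobe.d/nvidia-default.conf).
# (listing_has_path is a hypothetical helper for illustration.)
listing_has_path() {
    # The path is the last whitespace-separated field of each listing line.
    awk -v want="$1" '$NF == want { found = 1 } END { exit found ? 0 : 1 }'
}

# Usage on a real system; replace the placeholder with the initrd that the
# booted loader entry actually references:
# sudo lsinitrd /boot/efi/<path-to-initrd> \
#     | listing_has_path usr/lib/modprobe.d/nvidia-default.conf \
#     && echo "snippet is in the initrd"
```

Matching on the exact path avoids false positives from other files that merely contain "nvidia" in their name.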
Ah, never mind. I rolled back to snapshot #2 from before installing the driver, then ran

transactional-update pkg in nvidia-drivers-G06

rebooted, and ran the command. It seems the driver package does not generate an initramfs on its own. After manually running

transactional-update initrd
reboot

I now get the following output (see attached log file).
Created attachment 875249 [details] Output of lsinitrd | grep nvidia
Any KMP runs dracut after installation. But who knows, possibly KMPs simply don't work on such systems.

(In reply to Imo Hester from comment #17)
> Created attachment 875249 [details]
> Output of lsinitrd | grep nvidia

-rw-r--r-- 1 root root 18 Apr 29 10:36 usr/lib/modprobe.d/nvidia-default.conf

Looks like the modprobe.d snippet is in the initrd. No idea why it is being ignored. The nouveau driver should not be loaded with such an initrd.
At least with SLE Micro 6.0, KMPs create initrds as usual. I've seen this on an aarch64 system.

@Imo Just to make sure: with this initrd freshly created by

transactional-update initrd

is the nouveau driver still being loaded?
(In reply to Stefan Dirsch from comment #19)
> At least with SLE Micro 6.0 KMPs create initrds as usual. I've seen this on
> an aarch64 system.
>
> @Imo Just to make sure. With this freshly by
>
> transactional-update initrd
>
> created initrd nouveau driver is still being loaded?

Yes, nouveau is still being loaded.

> Mai 31 18:21:30 localhost kernel: nouveau 0000:01:00.0: NVIDIA GA102 (b72000a1)
> ...
> Mai 31 18:21:37 localhost kernel: NVRM: GPU 0000:01:00.0 is already bound to nouveau.

While the initrd still lists:

> -rw-r--r-- 1 root root 18 Apr 29 10:36 usr/lib/modprobe.d/nvidia-default.conf

Maybe a shot in the dark: might this have something to do with systemd-boot, which is what Aeon RC2 uses now that it has moved away from GRUB2?
Honestly I don't know anything about Aeon. The best I could find about it was this page:

https://en.opensuse.org/Portal:Aeon
[...]
openSUSE Aeon is a Desktop operating system you don't have to worry about.
[...]

Hmm. I would have expected/hoped it would behave like SLE Micro.
Re-adding Richard. He might be able to help here. I think he has much more background about such systems and Aeon in particular.

Richard, we need to figure out why blacklisting the nouveau driver doesn't work on Aeon. Do you have a clue? We have a modprobe.d snippet

/usr/lib/modprobe.d/nvidia-default.conf

with content

blacklist nouveau

This is now also in the initrd. But it doesn't help. The nouveau driver is still loaded during boot.
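To rule out a problem with the snippet itself, its content can be checked mechanically. This is only a sketch: is_blacklisted is a made-up helper that looks for a "blacklist <module>" directive in the files it is given. It says nothing about whether the initrd that is actually booted contains that file.

```shell
# Return 0 if any of the given modprobe.d files blacklists the module.
# Comment lines and unrelated directives are ignored.
# (is_blacklisted is a hypothetical helper for illustration.)
is_blacklisted() (
    module=$1
    shift
    # Without any file to inspect there is nothing to match.
    [ "$#" -gt 0 ] || exit 1
    awk -v mod="$module" '
        $1 == "blacklist" && $2 == mod { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$@"
)

# Usage on a real system (path taken from this bug report):
# is_blacklisted nouveau /usr/lib/modprobe.d/nvidia-default.conf \
#     && echo "nouveau is blacklisted in the installed snippet"
```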
I don’t have a clue, the transactional-update maintainer might
Hey everyone,

after a bit of digging and tons of rebooting I found the issue(s):

1) The initrd file is being generated in the wrong location. Running "sudo transactional-update initrd" writes the initrd file to:
/boot/initrd-<current-kernel-version>

2) The systemd-boot entry of the actual snapshot which is to be booted is not being updated, as it uses the initrd file located at:
/boot/efi/opensuse-aeon/6.9.1-1-default/

> vortexacherontic@linux:/boot/efi/opensuse-aeon/6.9.1-1-default> ll
> total 277568
> -rwxr-xr-x. 1 root root 178947935 Mai 25 10:22 initrd-4c6dba516f5f3d230a1f5f1e144106e46c960602
> -rwxr-xr-x. 1 root root 90530598 Jun 2 07:58 initrd-6.9.1-1-default <<<<< Manually moved this here
> -rwxr-xr-x. 1 root root 14641520 Mai 17 13:59 linux-6b13316fa0178df1fd6898c8f96f96ab9ecbbbe1

As you can see, the initrd file generated this morning is also there, because I manually moved it there and edited the systemd-boot entry of the current snapshot:

Original file:

# Boot Loader Specification type#1 entry
title openSUSE Aeon 20240524
version 36@6.9.1-1-default
sort-key opensuse-aeon
options quiet loglevel=2 systemd.show_status=no console=ttyS0,115200 console=tty0 vt.global_cursor_default=0 ignition.platform.id=metal security=selinux selinux=1 root=UUID=42fc02eb-e038-424f-b5a5-aed45173338d rootflags=subvol=@/.snapshots/36/snapshot systemd.machine_id=7b5e214ac1c6413b9145cd262341e2fe
linux /opensuse-aeon/6.9.1-1-default/linux-6b13316fa0178df1fd6898c8f96f96ab9ecbbbe1
initrd /opensuse-aeon/6.9.1-1-default/initrd-4c6dba516f5f3d230a1f5f1e144106e46c960602

Modified file:

# Boot Loader Specification type#1 entry
title openSUSE Aeon 20240524
version 36@6.9.1-1-default
sort-key opensuse-aeon
options quiet loglevel=2 systemd.show_status=no console=ttyS0,115200 console=tty0 vt.global_cursor_default=0 ignition.platform.id=metal security=selinux selinux=1 root=UUID=42fc02eb-e038-424f-b5a5-aed45173338d rootflags=subvol=@/.snapshots/36/snapshot systemd.machine_id=7b5e214ac1c6413b9145cd262341e2fe
linux /opensuse-aeon/6.9.1-1-default/linux-6b13316fa0178df1fd6898c8f96f96ab9ecbbbe1
initrd /opensuse-aeon/6.9.1-1-default/initrd-6.9.1-1-default

The entry still points to the very first initrd as generated upon installing the system. Also I do expect the next update to reset the entry. But as of writing these lines I have the nvidia driver loaded and nouveau is nowhere to be found.

It seems to be a transactional-update issue, I believe, and they have to update/patch this?
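The manual check described above (which initrd does a loader entry reference, and does that file exist on the ESP) can be sketched as follows. bls_field and entry_initrd_exists are made-up helper names; the sketch assumes a single "initrd" line per entry, as in the entries quoted here, and that entry paths are relative to the ESP mount point.

```shell
# Print the value of a key (e.g. "linux" or "initrd") from a Boot Loader
# Specification type #1 entry file.
# (bls_field and entry_initrd_exists are hypothetical helpers.)
bls_field() {
    # Strip leading blanks, the key itself and the blanks after it.
    awk -v key="$1" '$1 == key { sub(/^[ \t]*[^ \t]+[ \t]+/, ""); print }' "$2"
}

# Return 0 if the initrd referenced by the entry exists below the ESP root.
# Assumes the entry has exactly one initrd line.
entry_initrd_exists() (
    esp=$1
    entry=$2
    initrd=$(bls_field initrd "$entry")
    [ -n "$initrd" ] && [ -f "$esp$initrd" ]
)

# Usage on a real system (paths as they appear in this bug report):
# bls_field initrd /boot/efi/loader/entries/<entry>.conf
# entry_initrd_exists /boot/efi /boot/efi/loader/entries/<entry>.conf
```

An initrd written to /boot/initrd-<current-kernel-version> would never be found this way, which is exactly the mismatch reported in this comment.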
Thanks for figuring this out, Imo! So it seems

transactional-update initrd

doesn't work in a consistent way, at least not together with systemd-boot. Ignaz will look into this.

I'm wondering if we're seeing two separate issues here. Could you check once more, by re-installing the KMP, whether transactional-update initrd is being run during that installation (it should!)? I need to know whether the initrd is just being installed to the wrong location or the systemd-boot configuration is not adjusted correctly, or whether no initrd is being re-created at all during KMP installation.
(In reply to Stefan Dirsch from comment #25)
> Thanks for figuring this out, Imo! So it seems
>
> transactional-update initrd
>
> doesn't work in a consistent way, at least not together with systemd-boot.
> Ignaz will look into this.
>
> I'm wondering if we're seing two separate issues here. Could you check once
> more by re-installing the KMP if transactional-update initrd is being
> running during that installation (it should!)? I need to know if the initrd
> is just being installed to the wrong location or systemd-boot configuration
> not adjusted correctly respectively. Or if no initrd is being re-created at
> all during KMP installation.

Hey there. I am not entirely sure how to test this, so here is what I did:

1) sudo transactional-update pkg rm nvidia-driver-G06-kmp-default
   (This also removed *-compute-G06, *-video-G06, *-util-G06, *-gl-G06)
2) sudo reboot
3) sudo transactional-update pkg in nvidia-driver-G06-kmp-default
4) sudo reboot
5) cd /boot
6) ll
> total 88476
> drwxr-xr-x. 5 root root 65536 Jan 1 1970 efi
> -rw-------. 1 root root 90530598 Jun 2 07:53 initrd-6.9.1-1-default

As you can see, initrd-6.9.1-1-default is still the one from yesterday, which I manually generated using transactional-update initrd.

Also:

7) cd /boot/efi/opensuse-aeon/6.9.1-1-default
8) ll
> total 277568
> -rwxr-xr-x. 1 root root 178947935 Mai 25 10:22 initrd-4c6dba516f5f3d230a1f5f1e144106e46c960602
> -rwxr-xr-x. 1 root root 90530598 Jun 2 07:58 initrd-6.9.1-1-default
> -rwxr-xr-x. 1 root root 14641520 Mai 17 13:59 linux-6b13316fa0178df1fd6898c8f96f96ab9ecbbbe1

does not reveal any new initramfs files.

I suppose transactional-update initrd is not being executed, or the initramfs is stored in an entirely different location I have not yet found.

Does this help?
Another thing I noticed is that systemd-boot took over the initramfs modification I made to the boot loader entry a few snapshots ago. Therefore I expect the next kernel update to break the system, or at least the boot process, and (hopefully) Aeon will roll back to the previous snapshot.

I attached an archive with the entries for snapshots 34 - 39. As you can see, 34 and 35 are still listing initrd-4c6dba516f5f3d230a1f5f1e144106e46c960602 as their initramfs, while every entry after 35 (thus 36, 37, 38, 39) lists initrd-6.9.1-1-default, which is the one I manually moved there when I initially modified entry 36 to use it. This change seems to have been carried over to every following entry.

Therefore I expect a new kernel or driver update to break the system, unless I regenerate an initramfs manually and fix the boot entry before rebooting.
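A comparison like the one described above (which snapshot's entry references which initrd) can be produced with a small loop. This is a sketch only: list_entry_initrds is a made-up helper, and it assumes that each entry carries the snapshot number in rootflags=subvol=@/.snapshots/N/snapshot on its options line, as the entries quoted in this ticket do.

```shell
# For every loader entry (*.conf) in a directory, print
# "<snapshot number> <initrd path>", sorted by snapshot number.
# (list_entry_initrds is a hypothetical helper for illustration.)
list_entry_initrds() (
    for entry in "$1"/*.conf; do
        [ -f "$entry" ] || continue
        awk '
            $1 == "options" {
                # Find the rootflags argument and pull out the snapshot number.
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^rootflags=subvol=@\/\.snapshots\/[0-9]+\/snapshot$/) {
                        n = split($i, parts, "/")
                        snap = parts[n - 1]
                    }
            }
            $1 == "initrd" { initrd = $2 }
            END {
                if (initrd == "") exit
                if (snap == "") snap = "?"
                print snap, initrd
            }
        ' "$entry"
    done | sort -n
)

# Usage on a real system (directory as it appears in this bug report):
# list_entry_initrds /boot/efi/loader/entries
```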
Created attachment 875264 [details] SystemD Boot entries 34 till 39
(In reply to Imo Hester from comment #26)
> (In reply to Stefan Dirsch from comment #25)
> > Thanks for figuring this out, Imo! So it seems
> >
> > transactional-update initrd
> >
> > doesn't work in a consistent way, at least not together with systemd-boot.
> > Ignaz will look into this.
> >
> > I'm wondering if we're seing two separate issues here. Could you check once
> > more by re-installing the KMP if transactional-update initrd is being
> > running during that installation (it should!)? I need to know if the initrd
> > is just being installed to the wrong location or systemd-boot configuration
> > not adjusted correctly respectively. Or if no initrd is being re-created at
> > all during KMP installation.
>
> Hey there. I am not entirely sure how to test this so here is what I did:
>
> 1) sudo transactional-update pkg rm nvidia-driver-G06-kmp-default
> (This removed also *-compute-G06, *-video-G06, *-util-G06, *-gl-G06
> 2) sudo reboot
> 3) sudo transactional-update pkg in nvidia-driver-G06-kmp-default
> 4) sudo reboot
> 5) cd /boot
> 6) ll
> > total 88476
> > drwxr-xr-x. 5 root root 65536 Jan 1 1970 efi
> > -rw-------. 1 root root 90530598 Jun 2 07:53 initrd-6.9.1-1-default
>
> As you can see the initrd-6.9.1-1-default is still the one from yesterday
> which I manually generated using transactional-update initrd
>
> Also
>
> 7) cd /boot/efi/opensuse-aeon/6.9.1-1-default
> 8) ll
> > total 277568
> > -rwxr-xr-x. 1 root root 178947935 Mai 25 10:22 initrd-4c6dba516f5f3d230a1f5f1e144106e46c960602
> > -rwxr-xr-x. 1 root root 90530598 Jun 2 07:58 initrd-6.9.1-1-default
> > -rwxr-xr-x. 1 root root 14641520 Mai 17 13:59 linux-6b13316fa0178df1fd6898c8f96f96ab9ecbbbe1
>
> does not reveal any new initramfs files.
>
> I suppose transactional-update initrd is not being executed or the initramfs
> is stored to an entirely different location I did not yet found.
>
> Does this help?

Yes, definitely! Thanks, this shows that we have two separate issues here.

1. transactional-update initrd is not being executed when installing a KMP
2. transactional-update initrd doesn't work together with systemd-boot
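To tell the two issues apart on a test system, one can check whether any initrd was written after a known point in time, instead of comparing "ll" output by eye. A minimal sketch, assuming a reference file is created with touch before installing the KMP; initrds_newer_than is a made-up helper name.

```shell
# Print initrd images directly inside a directory that are newer than a
# reference file. Returns 1 if none is newer, i.e. nothing was regenerated.
# (initrds_newer_than is a hypothetical helper for illustration.)
initrds_newer_than() (
    dir=$1
    ref=$2
    out=$(find "$dir" -maxdepth 1 -type f -name 'initrd-*' -newer "$ref")
    [ -n "$out" ] && printf '%s\n' "$out"
)

# Usage on a real system (directories taken from this bug report):
# touch ~/before-kmp-install          # before installing the KMP
# ... install the KMP, reboot ...
# initrds_newer_than /boot ~/before-kmp-install
# initrds_newer_than /boot/efi/opensuse-aeon/6.9.1-1-default ~/before-kmp-install
```

No output for either directory would point to issue 1 (no initrd created at all); output only for /boot would point to issue 2 (created, but not where systemd-boot looks).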
I must say the first one surprises me, as with Aeon RC1 I had no issues with the drivers and did not have to manually trigger transactional-update initrd after the installation. At least not that I remember.
(In reply to Stefan Dirsch from comment #29)
> 2. transactional-update initrd doesn't work together with systemd-boot

Correct, currently systemd-boot is not directly integrated into transactional-update itself yet (i.e. the commands "initrd", "bootloader" and "grub.cfg" don't do the right thing on a systemd-boot system). systemd-boot support is generically implemented as a snapper plugin using "sdbootutil" instead, because of course you can also use systemd-boot on non-transactional systems. The sdbootutil script also includes detection of whether it is necessary to regenerate the initrd.

@Alberto: At first glance I couldn't find any RPM macro integration so that an RPM package calling %regenerate_initrd_post would trigger the initrd rebuild on a systemd-boot system. Am I missing something, or do we still have to implement this?

> 1. transactional-update initrd is not being executed when installing a KMP

See answer above - this is something which needs to be implemented on an RPM macro level...
(In reply to Ignaz Forster from comment #31)
> @Alberto: On a first glance I couldn't find any RPM macro integration so
> that a RPM package calling %regenerate_initrd_post would trigger the initrd
> rebuild on a systemd-boot system. Am I missing something, or do we still
> have to implement this?

The regenerate-initrd-posttransaction script has been updated here:

https://github.com/openSUSE/suse-module-tools/pull/103

It is still under review, and requires a new plugin in sdbootutil, which is here:

https://github.com/openSUSE/sdbootutil/pull/92

This plugin is a tukit (transactional-update) plugin, using a new mechanism implemented here:

https://github.com/openSUSE/transactional-update/pull/122

So everything has been implemented, and all of them are awaiting review (except sdbootutil#92, which I reviewed myself :P)

The new mechanism should work like this:

1) tukit can execute plugins in a -pre / -post fashion similar to snapper, for each verb like "call", "callext", "open", "abort". The exact list is documented in transactional-update#122.

2) sdbootutil provides a plugin for callext-post that will inspect $bind_dir/run/regenerate-initrd. If it is present inside the transaction, it will be copied outside to the host /run.

3) regenerate-initrd-posttrans is now aware of transactional systems with a separate /boot partition (that is the case for FDE when using systemd-boot or grub2-bls). If this is detected and the scriptlet is called _inside_ the transaction, it will do nothing, as this would require changing something _outside_ the transaction (the risk of that is that you can see initrds that are not attached to any transaction, for example).

4) sdbootutil provides the old snapper plugin that is executed once the transaction is closed and available, and will call regenerate-initrd-posttrans if the host /run contains the regenerate-initrd/ directory, but now from outside the transaction.
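To make the hand-over in steps 2 and 4 easier to follow, here is a rough sketch of the idea. This is NOT the code from the linked pull requests: both function names are made up, the directories are passed in as parameters so the logic can be tried in a scratch directory, and the rebuild command is whatever the caller supplies.

```shell
# Hypothetical sketch of step 2: hand a regeneration request from inside
# the transaction over to the host. Not the actual sdbootutil plugin code.
# $1: root of the transaction's bind mounts ("$bind_dir" in the text above)
# $2: the host's /run directory
forward_initrd_request() (
    bind_dir=$1
    host_run=$2
    request="$bind_dir/run/regenerate-initrd"
    # Nothing was requested inside the transaction: nothing to forward.
    [ -d "$request" ] || exit 0
    mkdir -p "$host_run/regenerate-initrd"
    cp -a "$request/." "$host_run/regenerate-initrd/"
)

# Hypothetical sketch of step 4: once the transaction is closed, act on the
# forwarded request from outside the transaction.
# $1: the host's /run directory; remaining arguments: the rebuild command
run_pending_initrd_request() (
    host_run=$1
    shift
    # Only rebuild when a request was forwarded.
    [ -d "$host_run/regenerate-initrd" ] || exit 0
    "$@"
)

# Illustration only; per the description above the real command would be
# regenerate-initrd-posttrans, called by the snapper plugin:
# forward_initrd_request "$bind_dir" /run
# run_pending_initrd_request /run regenerate-initrd-posttrans
```

The point of the split is that the request is recorded inside the transaction but acted on outside it, so no initrd is written that does not belong to a finished transaction.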
@Ignaz @Alberto Thanks a lot for digging deeply into this! There's nothing I can do other than wait until things get integrated into Aeon, right?

Now I'm wondering why generating the initrd works on SLE Micro 6.0. It's also a transactional system. It doesn't use systemd-boot yet, though.
(In reply to Stefan Dirsch from comment #33)
> Now I'm wondering why generating initrd works on SLE Micro 6.0. It's also a
> transactional system. It doesn't use systemd-boot yet though.

The difference is that when using a bootloader that follows BLS on an EFI system (sd-boot, grub2-bls), /boot should be in the ESP, in a FAT32 partition, and that is where the kernel and initrd are stored.

But in MicroOS the kernel and initrd from /boot are in the same btrfs filesystem, in a different subvolume, so it can be part of the transaction.
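Which of the two layouts a given system has can be read off the filesystem type of the directory that holds kernel and initrd. A minimal sketch follows; boot_storage_kind is a made-up helper, and the mapping only covers the two cases described above.

```shell
# Classify where kernel and initrd live, based on the filesystem type of
# the directory that holds them.
# (boot_storage_kind is a hypothetical helper for illustration.)
boot_storage_kind() {
    case "$1" in
        vfat|msdos) echo "esp: outside the btrfs snapshot" ;;
        btrfs)      echo "btrfs: can be part of the transaction" ;;
        *)          echo "unknown" ;;
    esac
}

# Usage on a real system (findmnt is part of util-linux):
# boot_storage_kind "$(findmnt -n -o FSTYPE --target /boot/efi)"
# boot_storage_kind "$(findmnt -n -o FSTYPE --target /boot)"
```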
The remaining issue with this ticket is, that I have no idea when these missing features are checked in, let alone get fully integrated in Aeon. So I can't say when this can be retested. :-(
Just want to add some details:

Contrary to what I anticipated, I did not need to re-apply the driver fix when I updated from kernel 6.9.1 to 6.9.3. Somehow the kernel update itself or systemd-boot built a new initramfs for 6.9.3, AND the modprobe rules were included, and it was put into the right location:

> cd /boot/efi/opensuse-aeon/6.9.3-1-default/
> ll
> -rwxr-xr-x. 1 root root 95383117 5. Jun 19:49 initrd-56b6a5223453a59edeb15854dc0ed58154332456
> -rwxr-xr-x. 1 root root 14625136 30. Mai 10:00 linux-58d6d3c7cc8c858e239a8137fe4faa5b2fea52bf

> vim /boot/efi/loader/entries/opensuse-aeon-6.9.3-1-default-50.conf
> # Boot Loader Specification type#1 entry
> title openSUSE Aeon 20240531
> version 50@6.9.3-1-default
> sort-key opensuse-aeon
> options quiet loglevel=2 systemd.show_status=no console=ttyS0,115200 console=tty0 vt.global_cursor_default=0 ignition.platform.id=metal security=selinux selinux=1 root=UUID=42fc02eb-e038-424f-b5a5-aed45173338d rootflags=subvol=@/.snapshots/50/snapshot systemd.machine_id=7b5e214ac1c6413b9145cd262341e2fe
> linux /opensuse-aeon/6.9.3-1-default/linux-58d6d3c7cc8c858e239a8137fe4faa5b2fea52bf
> initrd /opensuse-aeon/6.9.3-1-default/initrd-56b6a5223453a59edeb15854dc0ed58154332456

So it seems that at least updating just the kernel does not break things after the initial fix. A new initramfs is somehow generated correctly and picks up the additional modprobe confs.

I'll keep this thread updated as soon as driver 550.90.07 lands in the repos and gets updated. That will probably require the same fix as initially, as it might be a situation similar to a new system with a newly installed driver package.
(In reply to Imo Hester from comment #36)
> So it seems at least updating just the kernel seem not to break things after
> the initial fix.
> A new initramfs is somehow generated correctly and picks up the additional
> modprode confs.

Yes. This is expected. As of today the sdbootutil scriptlets (which replace the suse-module-tools scriptlets) detect that a new kernel has been installed and trigger the generation of a new initrd for it.

The sdbootutil scriptlets are in the process of being merged with the suse-module-tools ones (see the PR from the last comment). So ideally the new scriptlets will recognize both the installation of a new kernel and the request for the regeneration of an initrd independently, and will do the right thing.
@Alberto Could you give us an update on the status of development of the needed components? Thanks!
(In reply to Stefan Dirsch from comment #38)
> @Alberto Could you give us an update on the status of development of the
> needed components? Thanks!

As commented: the development was done almost one month ago, and both PRs (the one for tukit and the one for suse-module-tools) are still pending review. So I am still waiting for this review to happen.

Maybe @Ignaz Forster or @Martin Wilck can provide a better ETA?
Thanks @Alberto. I kind of feel like the wrong person the ticket is assigned to, because I have no idea when things are supposed to be fixed in this product and could be retested. :-(
(In reply to Stefan Dirsch from comment #40)
> I have no idea when things are supposed to be fixed in this
> product and could be retested. :-(

Ah right. I will update this bug when something moves and a new test can be done.
Thanks. That would be very useful!
@Alberto Any improvements you can share here?
(In reply to Stefan Dirsch from comment #43)
> @Alberto Any improvements you can share here?

Yes. The PRs are moving into Factory, but some openQA tests require some changes (not related to these changes).
Thanks for the update @Alberto!
*** Bug 1227702 has been marked as a duplicate of this bug. ***