Bugzilla – Bug 1216880
filesystems/zfs: ZFS 2.2.0 kernel module package for Leap 15.4 built using kernel packages not available in update repo
Last modified: 2023-11-17 15:53:24 UTC
It seems that the ZFS kernel module package for Leap 15.4 has been built against newer kernel packages than those available in the 15.4 update repo. It was built using the 5.14.21-150400.24.92.1 sources, whereas the latest version in the update repo is 5.14.21-150400.24.88.1. This causes the module load to fail with an "Unknown symbol" error:
    baroque:~ # uname -r
    5.14.21-150400.24.88-default
    baroque:~ # insmod /lib/modules/5.14.21-150400.24.92-default/extra/zfs.ko
    insmod: ERROR: could not insert module /lib/modules/5.14.21-150400.24.92-default/extra/zfs.ko: Unknown symbol in module
[ 8s] [137/188] cumulate kernel-devel-5.14.21-150400.24.92.1
That should not generally be a problem; the provided symbols should not vary. Still, the report does not show which symbol is missing. It is also not clear why the kernel in the build repository would not be available as an update.
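The kernel log normally names the missing symbol. When a load fails this way, the module loader logs one line per unresolved symbol, of the form 'zfs: Unknown symbol <name> (err <n>)'. The Python sketch below assumes exactly that message format and pulls those entries out of captured dmesg text. The helper name 'unknown_symbols' is hypothetical and not part of any SUSE tool.

```python
import re

# Assumed kernel log format for unresolved symbols at module load time:
#   "<module>: Unknown symbol <symbol> (err <errno>)"
# Any prefix such as a "[   12.345678] " timestamp is simply skipped.
UNKNOWN_SYMBOL_RE = re.compile(r"(\S+): Unknown symbol (\S+) \(err (-?\d+)\)")


def unknown_symbols(log_text: str) -> list[tuple[str, str, int]]:
    """Return (module, symbol, errno) for every 'Unknown symbol' line found."""
    return [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in UNKNOWN_SYMBOL_RE.finditer(log_text)
    ]
```

On a live system, pass it the output of 'dmesg'. Reading the kernel ring buffer may require root.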
Alright, I looked a bit more into my problem and realized that zfs.ko depends on spl.ko. If I load spl.ko first and then zfs.ko, that part of the problem goes away. The next problem is that the ZFS module links in the weak-updates directory of kernel 5.14.21-150400.24.88-default still point to the older zfs-2.1.13 modules:
    baroque:/lib/modules/5.14.21-150400.24.88-default # tree weak-updates/
    weak-updates/
    └── extra
        ├── avl
        │   └── zavl.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/avl/zavl.ko
        ├── icp
        │   └── icp.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/icp/icp.ko
        ├── lua
        │   └── zlua.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/lua/zlua.ko
        ├── nvpair
        │   └── znvpair.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/nvpair/znvpair.ko
        ├── spl
        │   └── spl.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/spl/spl.ko
        ├── unicode
        │   └── zunicode.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/unicode/zunicode.ko
        ├── zcommon
        │   └── zcommon.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/zcommon/zcommon.ko
        ├── zfs
        │   └── zfs.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/zfs/zfs.ko
        └── zstd
            └── zzstd.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/zstd/zzstd.ko
What mechanism takes care of updating the links so that they point to newer modules? Does it happen only during a kernel package update?
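You can check which kernel's module tree the weak-updates links resolve into by reading the links directly. The Python sketch below assumes link targets of the form /lib/modules/<release>/..., as in the tree output above. The function name is hypothetical.

```python
import os
from collections import defaultdict


def weak_update_targets(weak_dir: str) -> dict[str, list[str]]:
    """Group weak-updates symlinks by the kernel release their target lives in.

    Assumes absolute targets like /lib/modules/<release>/extra/...; any other
    target is grouped under "<other>". Only symlinks are considered.
    """
    by_release: dict[str, list[str]] = defaultdict(list)
    for root, _dirs, files in os.walk(weak_dir):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                continue
            parts = os.readlink(path).split("/")
            # An absolute target splits into ["", "lib", "modules", "<release>", ...]
            if len(parts) > 3 and parts[1:3] == ["lib", "modules"]:
                release = parts[3]
            else:
                release = "<other>"
            by_release[release].append(os.path.relpath(path, weak_dir))
    return {rel: sorted(links) for rel, links in by_release.items()}
```

Run against /lib/modules/5.14.21-150400.24.88-default/weak-updates, it would group all nine links above under 5.14.21-150400.24.84-default, the kernel that the older KMP was built for.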
Looking at the zfs-2.2.0 kernel module info, all the other modules except spl seem to be aliases for zfs.ko:
    baroque:/lib/modules/5.14.21-150400.24.88-default # modinfo /lib/modules/5.14.21-150400.24.92-default/extra/zfs.ko
    filename: /lib/modules/5.14.21-150400.24.92-default/extra/zfs.ko
    version: 2.2.0-1
    license: CDDL
    license: Dual BSD/GPL
    license: Dual MIT/GPL
    author: OpenZFS
    description: ZFS
    alias: zzstd
    alias: zcommon
    alias: zunicode
    alias: znvpair
    alias: zlua
    alias: icp
    alias: zavl
    alias: devname:zfs
    alias: char-major-10-249
    suserelease: SLE15-SP4
    srcversion: 92158472E32FE6AEEEC7201
    depends: spl
    retpoline: Y
    name: zfs
    vermagic: 5.14.21-150400.24.92-default SMP preempt mod_unload modversions
Could that be the reason why the links have not been updated?
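The modinfo output above also explains the earlier insmod failure. insmod loads exactly one file and does not resolve dependencies, so spl.ko, the only entry under 'depends', has to be loaded first. modprobe normally handles this ordering using the dependency data generated by depmod. The Python sketch below illustrates the idea: it parses modinfo-style 'key: value' text and derives a dependency-first load order. The function names are hypothetical, and the parser only handles the simple format shown above.

```python
def parse_modinfo(text: str) -> dict[str, list[str]]:
    """Parse modinfo output ("key: value" lines) into key -> list of values.

    Keys such as alias and license repeat, so values accumulate in order.
    Only the first colon splits key from value, so "alias: devname:zfs"
    keeps "devname:zfs" intact.
    """
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def module_depends(fields: dict[str, list[str]]) -> list[str]:
    """Return the module names from the comma-separated 'depends' field."""
    return [
        dep.strip()
        for value in fields.get("depends", [])
        for dep in value.split(",")
        if dep.strip()
    ]


def load_order(module: str, deps: dict[str, list[str]]) -> list[str]:
    """Depth-first order in which modules must be loaded, dependencies first."""
    order: list[str] = []
    in_progress: set[str] = set()

    def visit(name: str) -> None:
        if name in order:
            return
        if name in in_progress:
            raise ValueError(f"dependency cycle involving {name}")
        in_progress.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        in_progress.discard(name)
        order.append(name)

    visit(module)
    return order
```

For the zfs-2.2.0 module above, this gives the order spl, then zfs. That is the same order the reporter had to follow by hand with insmod.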
So why don't you use modprobe?
Because, as the modprobe man page says: "modprobe looks in the module directory /lib/modules/`uname -r` for all the modules and other files, except for the optional configuration files in the /etc/modprobe.d directory (see modprobe.d(5))." So 'modprobe zfs' loads the zfs module from the running kernel's module directory, where the weak-updates links still point to the zfs-2.1.13 modules. Giving modprobe the full path to the zfs-2.2.0 module won't work either:
    baroque:~ # modprobe -v /lib/modules/5.14.21-150400.24.92-default/extra/zfs.ko
    modprobe: FATAL: Module /lib/modules/5.14.21-150400.24.92-default/extra/zfs.ko not found in directory /lib/modules/5.14.21-150400.24.88-default
So the question is: how do those weak-updates links get updated when newer KMPs are released? Until now I haven't had to bother with such details, since it has all happened automagically. I guess I could manually remove the zfs-2.1.13 weak-updates links, create new ones that point to the zfs-2.2.0 modules, and run 'depmod -a' or something. And rebuild the initrd image, of course. Is that enough, or is something else also required?
That aside, the real problem remains that, for some reason, the zfs-2.2.0 modules were built against newer kernel packages that have not been released to the 15.4 updates repo. If those packages had been available, I bet I would not have noticed anything out of the ordinary when updating my system, and everything would have just worked. To be fair, even with the zfs-2.1.13 drivers and the zfs-2.2.0 tools it kind of does. However, 'zpool status', for example, displays a 'non-allocating' warning(?), which makes me kind of nervous:
    baroque:~ # zpool status
      pool: datapool
     state: ONLINE
      scan: scrub canceled on Fri Nov 3 22:30:40 2023
    config:
            NAME            STATE     READ WRITE CKSUM
            datapool        ONLINE       0     0     0
              mirror-0      ONLINE       0     0     0
                d1-part2    ONLINE       0     0     0  (non-allocating)
                d2-part2    ONLINE       0     0     0  (non-allocating)
              mirror-1      ONLINE       0     0     0
                d3-part2    ONLINE       0     0     0  (non-allocating)
                d4-part2    ONLINE       0     0     0  (non-allocating)
    errors: No known data errors
With the zfs-2.2.0 drivers loaded, that warning goes away:
    baroque:~ # zpool status
      pool: datapool
     state: ONLINE
    status: Some supported and requested features are not enabled on the pool.
            The pool can still be used, but some features are unavailable.
    action: Enable all features using 'zpool upgrade'. Once this is done,
            the pool may no longer be accessible by software that does not support
            the features. See zpool-features(7) for details.
      scan: scrub canceled on Fri Nov 3 22:30:40 2023
    config:
            NAME            STATE     READ WRITE CKSUM
            datapool        ONLINE       0     0     0
              mirror-0      ONLINE       0     0     0
                d1-part2    ONLINE       0     0     0
                d2-part2    ONLINE       0     0     0
              mirror-1      ONLINE       0     0     0
                d3-part2    ONLINE       0     0     0
                d4-part2    ONLINE       0     0     0
    errors: No known data errors
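The manual fix proposed above is to swap the links, then run 'depmod -a' and rebuild the initrd. The link changes can be planned without touching the disk. The Python sketch below is hypothetical and only computes which weak-updates links would be removed and which would be created so that they point at a given KMP's modules. It assumes module paths relative to /lib/modules/<release>/. It is not a substitute for letting weak-modules2 manage the links.

```python
def plan_relink(
    existing: dict[str, str], new_modules: list[str], build_release: str
) -> tuple[list[str], dict[str, str]]:
    """Dry-run plan for re-pointing weak-updates links at a new KMP.

    existing:      link path relative to weak-updates -> its current target
    new_modules:   module paths relative to /lib/modules/<build_release>/
    build_release: kernel release the new KMP was built for
    Returns (links_to_remove, links_to_create); nothing is changed on disk.
    """
    wanted = {rel: f"/lib/modules/{build_release}/{rel}" for rel in new_modules}
    # Remove links that are not wanted at all, or that point somewhere else.
    to_remove = sorted(
        link for link, target in existing.items() if wanted.get(link) != target
    )
    # Create wanted links that are missing or currently point elsewhere.
    to_create = {
        link: target for link, target in wanted.items() if existing.get(link) != target
    }
    return to_remove, to_create
```

Consider the inputs from this thread. The old layout (extra/zfs/zfs.ko, extra/spl/spl.ko, ...) is the existing set. The new modules are ['extra/spl.ko', 'extra/zfs.ko'], built for 5.14.21-150400.24.92-default. With these inputs, every old link is scheduled for removal and two new top-level links are created.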
FTR, at least a blank install (with no prior zfs-2.1.13 in @System) happens to work:
    localhost:/ # zypper in zfs-kmp-default
    Loading repository data...
    Reading installed packages...
    [TechPreview] $ZYPP_SINGLE_RPMTRANS=1 : New rpm install backend is enabled
    If you find any bugs or issues please let us know:
    https://bugzilla.opensuse.org/
    Component: libzypp (or zypper)
    And please attach the /var/log/zypper.log to the bug report.
    Resolving package dependencies...
    The following 2 NEW packages are going to be installed:
      zfs-kmp-default zfs-ueficert
    2 new packages to install.
    Overall download size: 18.2 MiB. Already cached: 0 B. After the operation, additional 133.1 MiB will be used.
    Continue? [y/n/v/...? shows all options] (y): d
    The following 2 NEW packages are going to be installed:
      zfs-kmp-default 2.2.0_k5.14.21_150400.24.92-lp154.2.9 x86_64 fs obs://build.opensuse.org/filesystems
      zfs-ueficert 2.2.0-lp154.2.9 x86_64 fs obs://build.opensuse.org/filesystems
    2 new packages to install.
    Overall download size: 18.2 MiB. Already cached: 0 B. After the operation, additional 133.1 MiB will be used.
    Continue? [y/n/v/...? shows all options] (y):
    Retrieving: zfs-ueficert-2.2.0-lp154.2.9.x86_64 (fs) (1/2), 22.7 KiB
    Retrieving: zfs-ueficert-2.2.0-lp154.2.9.x86_64.rpm ......................[done]
    Retrieving: zfs-kmp-default-2.2.0_k5.14.21_150400.24.92-lp154.2.9.x86_64 (fs) (2/2), 18.2 MiB
    Retrieving: zfs-kmp-default-2.2.0_k5.14.21_150400.24.92-lp154[done (18.3 MiB/s)]
    Preparing ................................................................[done]
    (0/2) Executing prein script for: zfs-ueficert-2.2.0-lp154.2.9.x86_64 ....[done]
    (1/2) Installing: zfs-ueficert-2.2.0-lp154.2.9.x86_64 ....................[done]
    (1/2) Executing postin script for: zfs-ueficert-2.2.0-lp154.2.9.x86_64 ...[done]
    (1/2) Executing prein script for: zfs-kmp-default-2.2.0_k5.14.21_150400.24[done]
    (2/2) Installing: zfs-kmp-default-2.2.0_k5.14.21_150400.24.92-lp154.2.9.x8[done]
    (2/2) Executing postin script for: zfs-kmp-default-2.2.0_k5.14.21_150400.2[done]
    (2/2) Executing posttrans script for: zfs-ueficert-2.2.0-lp154.2.9.x86_64 [done]
    (2/2) Executing posttrans script for: zfs-kmp-default-2.2.0_k5.14.21_15040[done]
    localhost:/etc/zypp/repos.d # modprobe zfs
    modprobe: ERROR: module 'spl' is unsupported
    modprobe: ERROR: Use --allow-unsupported or set allow_unsupported_modules 1 in
    modprobe: ERROR: /etc/modprobe.d/10-unsupported-modules.conf
    modprobe: ERROR: could not insert 'zfs': Operation not permitted
    localhost:/ # modprobe --allow-unsupported zfs
    localhost:/ # uname -r
    5.14.21-150400.24.88-default
suse-module-tools (weak-modules2) updates these links. If you can reliably reproduce the problem (old symlinks), it's a bug in wm2. However, it may also be that wm2 did not run because of a problem during installation, in which case reinstalling the KMP would fix it. Finally, removing the old module will definitely update the symlinks to point to the new module.
I have tried reinstalling zfs-kmp-default-2.2.0 several times, but that did not correct the weak-updates links in the module directory of kernel 5.14.21-150400.24.88-default. This was the situation at the start with regard to installed ZFS KMPs:
    baroque:~ # zypper se -s zfs-kmp
    Loading repository data...
    Reading installed packages...
    S  | Name            | Type    | Version                                | Arch   | Repository
    ---+-----------------+---------+----------------------------------------+--------+------------------
    i+ | zfs-kmp-default | package | 2.1.13_k5.14.21_150400.24.84-lp154.1.1 | x86_64 | (System Packages)
    i+ | zfs-kmp-default | package | 2.1.12_k5.14.21_150400.24.81-lp154.4.5 | x86_64 | (System Packages)
    i+ | zfs-kmp-default | package | 2.1.12_k5.14.21_150400.24.63-lp154.1.2 | x86_64 | (System Packages)
    i+ | zfs-kmp-default | package | 2.1.9_k5.14.21_150400.24.41-lp154.1.4  | x86_64 | (System Packages)
    i+ | zfs-kmp-default | package | 2.2.0_k5.14.21_150400.24.92-lp154.2.9  | x86_64 | filesystems
    baroque:~ # tree /lib/modules/`uname -r`/weak-updates
    /lib/modules/5.14.21-150400.24.88-default/weak-updates
    └── extra
        ├── avl
        │   └── zavl.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/avl/zavl.ko
        ├── icp
        │   └── icp.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/icp/icp.ko
        ├── lua
        │   └── zlua.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/lua/zlua.ko
        ├── nvpair
        │   └── znvpair.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/nvpair/znvpair.ko
        ├── spl
        │   └── spl.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/spl/spl.ko
        ├── unicode
        │   └── zunicode.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/unicode/zunicode.ko
        ├── zcommon
        │   └── zcommon.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/zcommon/zcommon.ko
        ├── zfs
        │   └── zfs.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/zfs/zfs.ko
        └── zstd
            └── zzstd.ko -> /lib/modules/5.14.21-150400.24.84-default/extra/zstd/zzstd.ko
So I then decided to remove both the zfs-kmp-default-2.2.0 and zfs-ueficert packages. This also removed zfs-kmp-default-2.1.12 and zfs-kmp-default-2.1.13 as dependencies, but for some reason _not_ zfs-kmp-default-2.1.9:
    baroque:~ # zypper rm zfs-kmp-default-2.2.0_k5.14.21_150400.24.92-lp154.2.9.x86_64 zfs-ueficert
    Reading installed packages...
    Resolving package dependencies...
    The following 4 packages are going to be REMOVED:
      zfs-kmp-default-2.1.12_k5.14.21_150400.24.81-lp154.4.5 zfs-kmp-default-2.1.13_k5.14.21_150400.24.84-lp154.1.1 zfs-kmp-default-2.2.0_k5.14.21_150400.24.92-lp154.2.9 zfs-ueficert
    4 packages to remove.
    After the operation, 396.0 MiB will be freed.
    Continue? [y/n/v/...? shows all options] (y): y
    (1/4) Removing zfs-kmp-default-2.2.0_k5.14.21_150400.24.92-lp154.2.9.x86_64 ......................................[done]
    (2/4) Removing zfs-kmp-default-2.1.13_k5.14.21_150400.24.84-lp154.1.1.x86_64 .....................................[done]
    dracut stuff...
    (3/4) Removing zfs-kmp-default-2.1.12_k5.14.21_150400.24.81-lp154.4.5.x86_64 .....................................[done]
    SKIP: /etc/uefi/certs/7201315D.crt.delete is not in MokList
    (4/4) Removing zfs-ueficert-2.2.0-lp154.2.9.x86_64 ...............................................................[done]
Then I installed zfs-kmp-default and zfs-ueficert again:
    baroque:~ # zypper in --details zfs-kmp-default zfs-ueficert
    Loading repository data...
    Reading installed packages...
    Resolving package dependencies...
    The following recommended package was automatically selected:
      zfs-ueficert 2.2.0-lp154.2.9 x86_64 filesystems obs://build.opensuse.org/filesystems
    The following 2 NEW packages are going to be installed:
      zfs-kmp-default 2.2.0_k5.14.21_150400.24.92-lp154.2.9 x86_64 filesystems obs://build.opensuse.org/filesystems
      zfs-ueficert 2.2.0-lp154.2.9 x86_64 filesystems obs://build.opensuse.org/filesystems
    2 new packages to install.
    Overall download size: 18.2 MiB. Already cached: 0 B. After the operation, additional 133.1 MiB will be used.
    Continue? [y/n/v/...? shows all options] (y): y
    Retrieving: zfs-ueficert-2.2.0-lp154.2.9.x86_64 (filesystems) (1/2), 22.7 KiB
    Retrieving: zfs-ueficert-2.2.0-lp154.2.9.x86_64.rpm ..............................................................[done]
    Retrieving: zfs-kmp-default-2.2.0_k5.14.21_150400.24.92-lp154.2.9.x86_64 (filesystems) (2/2), 18.2 MiB
    Retrieving: zfs-kmp-default-2.2.0_k5.14.21_150400.24.92-lp154.2.9.x86_64.rpm ........................[done (69.0 MiB/s)]
    Checking for file conflicts: .....................................................................................[done]
    (1/2) Installing: zfs-ueficert-2.2.0-lp154.2.9.x86_64 ............................................................[done]
    (2/2) Installing: zfs-kmp-default-2.2.0_k5.14.21_150400.24.92-lp154.2.9.x86_64 ...................................[done]
    Executing %posttrans scripts .....................................................................................[done]
After that, the running kernel's weak-updates directory looked like this:
    baroque:~ # tree /lib/modules/`uname -r`/weak-updates
    /lib/modules/5.14.21-150400.24.88-default/weak-updates
    └── extra
        ├── avl
        │   └── zavl.ko -> /lib/modules/5.14.21-150400.24.41-default/extra/avl/zavl.ko
        ├── icp
        │   └── icp.ko -> /lib/modules/5.14.21-150400.24.41-default/extra/icp/icp.ko
        ├── lua
        │   └── zlua.ko -> /lib/modules/5.14.21-150400.24.41-default/extra/lua/zlua.ko
        ├── nvpair
        │   └── znvpair.ko -> /lib/modules/5.14.21-150400.24.41-default/extra/nvpair/znvpair.ko
        ├── spl
        │   └── spl.ko -> /lib/modules/5.14.21-150400.24.41-default/extra/spl/spl.ko
        ├── unicode
        │   └── zunicode.ko -> /lib/modules/5.14.21-150400.24.41-default/extra/unicode/zunicode.ko
        ├── zcommon
        │   └── zcommon.ko -> /lib/modules/5.14.21-150400.24.41-default/extra/zcommon/zcommon.ko
        ├── zfs
        │   └── zfs.ko -> /lib/modules/5.14.21-150400.24.41-default/extra/zfs/zfs.ko
        └── zstd
            └── zzstd.ko -> /lib/modules/5.14.21-150400.24.41-default/extra/zstd/zzstd.ko
So the links still pointed to the wrong ZFS modules (zfs-kmp-default-2.1.9 instead of zfs-kmp-default-2.2.0):
    baroque:~ # zypper se -s zfs-kmp
    Loading repository data...
    Reading installed packages...
    S  | Name            | Type    | Version                               | Arch   | Repository
    ---+-----------------+---------+---------------------------------------+--------+------------------
    i+ | zfs-kmp-default | package | 2.1.9_k5.14.21_150400.24.41-lp154.1.4 | x86_64 | (System Packages)
    i+ | zfs-kmp-default | package | 2.2.0_k5.14.21_150400.24.92-lp154.2.9 | x86_64 | filesystems
After I removed the zfs-kmp-default-2.1.9 package, the weak-updates links looked correct. However, the old ZFS module subdirectories were left behind:
    baroque:~ # zypper rm zfs-kmp-default-2.1.9_k5.14.21_150400.24.41-lp154.1.4
    Reading installed packages...
    Resolving package dependencies...
    The following package is going to be REMOVED:
      zfs-kmp-default-2.1.9_k5.14.21_150400.24.41-lp154.1.4
    1 package to remove.
    After the operation, 131.1 MiB will be freed.
    Continue? [y/n/v/...? shows all options] (y): y
    dracut stuff...
    (1/1) Removing zfs-kmp-default-2.1.9_k5.14.21_150400.24.41-lp154.1.4.x86_64 ......................................[done]
    baroque:~ # tree /lib/modules/`uname -r`/weak-updates
    /lib/modules/5.14.21-150400.24.88-default/weak-updates
    └── extra
        ├── avl
        ├── icp
        ├── lua
        ├── nvpair
        ├── spl
        ├── spl.ko -> /lib/modules/5.14.21-150400.24.92-default/extra/spl.ko
        ├── unicode
        ├── zcommon
        ├── zfs
        ├── zfs.ko -> /lib/modules/5.14.21-150400.24.92-default/extra/zfs.ko
        └── zstd
I have since removed those empty directories. Anyway, after all that, and a few reboots in between, everything now seems to be as it should, at least as far as the zfs-2.2.0 kernel modules are concerned.
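You can list the empty subdirectories left behind under weak-updates before deleting them. The Python sketch below is a minimal example, and the function name is hypothetical. It walks the tree bottom-up and reports directories that contain no files or links. It deletes nothing.

```python
import os


def empty_dirs(top: str) -> list[str]:
    """List directories below top that hold nothing but other empty directories.

    Symlinks (even dangling ones) count as content, so a directory holding a
    module link is never reported. The walk is bottom-up, so children appear
    before their parents. Nothing is removed.
    """
    result: list[str] = []
    empty: set[str] = set()
    for root, dirs, files in os.walk(top, topdown=False):
        if root == top:
            continue  # never report the starting directory itself
        if not files and all(os.path.join(root, d) in empty for d in dirs):
            empty.add(root)
            result.append(root)
    return result
```

Because the walk is bottom-up, the returned list is also a safe order for os.rmdir() if you decide to remove the directories.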
Looks like WM2 does not install the latest module when multiple module versions are available.
To me this looks like a problem with the ZFS KMP.
> i+ | zfs-kmp-default | package | 2.1.9_k5.14.21_150400.24.41-lp154.1.4 | x86_64 | (System Packages)
> i+ | zfs-kmp-default | package | 2.2.0_k5.14.21_150400.24.92-lp154.2.9 | x86_64 | filesystems
You can see that multiple versions of the KMP are installed. This is wrong. multiversion(kernel) should only be used if *the same module version* is installed alongside *multiple kernel versions*. But that's unnecessary on SLE/Leap, because KABI stability ensures that a KMP remains compatible with maintenance update (MU) kernels during the lifetime of a service pack. IOW, the ZFS KMP should not have a "Provides: multiversion(kernel)" directive.
Unfortunately, the scripts we deliver for KMP builds unconditionally add this directive to KMP packages. The only way to avoid that is to use the "-t" argument to the "kernel_module_package" macro and substitute a different script for our default /usr/lib/rpm/kernel-module-subpackage.
We have discussed this repeatedly in the past; see bug 1109137, comment 62 ff. You (Michal) insisted that this is necessary to cover the unlikely case of a "KABI accident", whereas I have always argued that using multiversion(kernel) makes no sense, and that the "KABI accident" case can be handled by other means.
The authors of the ZFS module make another mistake: they compile different versions of their KMP against different SUSE kernels, which allows these KMPs to be installed alongside each other. If they compiled every KMP against the same (GA) kernel, there would be file conflicts when the user tried to update, and the user might understand that the previous KMP must be removed before adding the new one.
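The symptom described above, one package name installed in several versions, is easy to check for. The Python sketch below assumes input lines in name-version-release form, such as those printed by rpm -qa --qf '%{NAME}-%{VERSION}-%{RELEASE}\n'. The helper name is hypothetical.

```python
from collections import defaultdict


def multiply_installed(nvrs: list[str]) -> dict[str, list[str]]:
    """Return {name: [version-release, ...]} for names installed more than once.

    rpm versions and releases cannot contain '-', so splitting off the last
    two dash-separated fields yields name, version and release.
    """
    versions: dict[str, list[str]] = defaultdict(list)
    for nvr in nvrs:
        name, version, release = nvr.rsplit("-", 2)
        versions[name].append(f"{version}-{release}")
    return {name: vrs for name, vrs in versions.items() if len(vrs) > 1}
```

Kernel packages themselves are legitimately multiversion, so kernel-default will usually show up as well. Only the KMP entries are of interest here.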
> they compile different versions of their KMP for different SUSE kernels
I do not see how this is a zfs-kmp issue. If anything, that is because the OBS /filesystems project chose to use SP4:Update rather than SP4/GA(?).
(In reply to Jan Engelhardt from comment #11)
> I do not see how this is a zfs-kmp issue. If anything, that is because the
> OBS /filesystems project chose to use SP4:Update rather than SP4/GA(?).
Exactly. And given that the ZFS KMP seems to get new releases much faster than the SLE/Leap KABI changes, I strongly recommend that they remove the "Provides: multiversion(kernel)". Technically, this works as follows: make a copy of /usr/lib/rpm/kernel-module-subpackage inside the OBS project, remove the line that adds "Provides: multiversion(kernel)", add this file as an additional SOURCE, and use it with "%kernel_module_package -t %{SOURCE1} ..." (roughly). IMO the only drawback is that you may miss updates to this script from SUSE's side. While I was building KMPs on a larger scale, I double-checked changes in the script with every new SP.
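The first step of that recipe is to copy the subpackage script and drop the multiversion line. It could be scripted roughly as follows. This sketch assumes the directive appears on its own line beginning with 'Provides:'. Check the actual layout of /usr/lib/rpm/kernel-module-subpackage by hand. The resulting file is then added as SOURCE1 and passed via '%kernel_module_package -t %{SOURCE1} ...', as described above.

```python
def strip_multiversion(script_text: str) -> str:
    """Return script_text without lines declaring Provides: multiversion(kernel).

    Assumption: the directive sits on a line of its own that starts (after
    optional indentation) with "Provides:". Such a line is dropped entirely,
    so any other provides sharing that line would be lost too; check the
    result by hand. All other lines are kept unchanged.
    """
    kept = []
    for line in script_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("Provides:") and "multiversion(kernel)" in stripped:
            continue
        kept.append(line)
    return "".join(kept)
```

Doing this as a small script rather than a one-off manual edit makes it easy to re-apply when the upstream script changes with a new SP.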
(In reply to Martin Wilck from comment #12)
> While I was building KMPs on a larger scale, I double checked
> changes in the script with every new SP.
... and rarely found any.
(In reply to Martin Wilck from comment #10)
> We have discussed this repeatedly in the past. See bug 1109137, comment 62
> ff.
> You (Michal) insisted that this is necessary to cover the unlikely case of a
> "KABI accident", whereas I have always argued that using
There is also the case of a test kernel with a different kABI. It may come from another release, or it may be too heavily modified to preserve the kABI, either patched or built with debug options enabled.
> multiversion(kernel) makes no sense, and that the "KABI accident" case can
> be handled with other means.
What other means, specifically?
And multiversion is either enabled or it is not. Enabling it after the fact, when it's needed, does not work, because it will not be enabled retroactively on existing KMPs. So to be able to install multiple copies of a KMP when that is useful, you need that ability all along, even when there is no reason to use it at the moment.
I didn't expect you to agree with me ;-)
Anyway, this isn't a wm2 bug. wm2's purpose is to maintain compatibility symlinks between KABI-compatible kernels. It's not its job to reason about KMP versions and select the "best" or "newest" one. I guess it could be implemented, but I don't think anyone will spend the effort any time soon. Moreover, it would re-implement package management functionality that belongs in rpm and/or libzypp. The vercmp logic of rpm is known to be complex, and reimplementing it correctly would be difficult and error-prone.
multiversion(kernel) basically disables rpm's "update" logic, which is what we're observing here. If a KMP is updated frequently, the situation described in this bug will necessarily occur sooner or later. Either the user needs to manually remove the old KMP and install the new one, or the KMP authors need to disable multiversion(kernel) as explained in comment 12. That's what I did in the past, and Fujitsu hasn't had any major issues with it, although we had at least one "KABI incident", AFAIR.
(In reply to Martin Wilck from comment #15)
> Anyway, this isn't a wm2 bug. wm2's purpose it to maintain compatibility
> symlinks between KABI-compatible kernels. It's not its job to reason about
> KMP versions and select the "best" or "newest" one. I guess it could be
> implemented, but I don't think anyone will spend the effort any time soon.
> Moreover, it would re-implement package management functionality which
> belongs into rpm and/or libzypp.
No, its job is to manage modules, and to that end it should manage even multiple versions of the same module. As already said, it's unrealistic to always have One And Only True Module. That means multiple versions of a module can exist, and wm2 should handle that.
> The vercmp logic of rpm is known to be complex, and reimplementing it
> correctly would be difficult and error-prone.
It so happens that openSUSE uses rpm for package management, which means that the rpm tool is always installed, and wm2 can use its vercmp without reimplementing it.
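To illustrate what "using rpm's vercmp" would have to decide, here is a deliberately simplified Python re-implementation of rpm's segment-wise version comparison, used to pick the newest of several KMP versions. It ignores epochs, releases, and the '~' and '^' operators. It is not a substitute for rpm's own comparison, which is the error-proneness point made above. The function names are hypothetical.

```python
import re
from functools import cmp_to_key

# Runs of ASCII digits or letters; everything else (".", "_", "-") only separates.
_SEGMENT = re.compile(r"[0-9]+|[A-Za-z]+")


def vercmp_simplified(a: str, b: str) -> int:
    """Return 1 if a is newer, -1 if b is newer, 0 if equal (simplified rules)."""
    seg_a, seg_b = _SEGMENT.findall(a), _SEGMENT.findall(b)
    for x, y in zip(seg_a, seg_b):
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num != y_num:
            # As in rpm, a numeric segment counts as newer than an alphabetic one.
            return 1 if x_num else -1
        if x_num:
            xi, yi = int(x), int(y)  # numeric compare, so 13 > 9 and 01 == 1
            if xi != yi:
                return 1 if xi > yi else -1
        elif x != y:
            return 1 if x > y else -1  # plain character comparison for letters
    # All shared segments are equal: whichever has segments left over is newer.
    return (len(seg_a) > len(seg_b)) - (len(seg_a) < len(seg_b))


def newest(versions: list[str]) -> str:
    """Pick the newest version string according to vercmp_simplified."""
    return max(versions, key=cmp_to_key(vercmp_simplified))
```

Applied to the versions in this bug, it ranks 2.2.0_k5.14.21_150400.24.92 above 2.1.13 and 2.1.9. Plain string comparison would wrongly put "2.1.9" above "2.1.13".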
(In reply to Michal Suchanek from comment #16)
> No, it's job is to manage modules, and to that end it should manage even
> multiple versions of the same module.
That's your personal opinion. Neither of us was involved in the original conception of the tool. But I am quite certain that its purpose was limited to managing weak-updates symlinks, which it does. Note the tool's name: "weak-modules2", not "universal-module-manager".
I'm not saying it can't be done, but implementing correct version handling is an enhancement request that should be handled through Jira, taking into account the available manpower. My personal resources for wm2 are limited. This feature won't be implemented for SLE15-SP6, and I doubt it will be implemented for later SPs, as SLE15 is slowly approaching feature freeze. And, as I've said before, I would consider these resources wasted because there's a simple workaround: not using multiversion(kernel).
Suppose a "KABI accident" really happens, you build KMPs without multiversion(kernel), *and* your KMP is affected by the KABI change. Then the KMP maintainer just needs to make sure the KMP is rebuilt in time for the kernel maintenance update. The KABI notification mechanism that we offer in the solid driver program helps with that. If you do this, the KMP and the kernel will be updated at the same time, and most users won't notice any issue. The updated KMP can have an rpm-level dependency on the updated kernel, which prevents user mistakes at installation time.
A serious problem only arises if the following 3 issues occur at the same time:
1. A KABI accident that affects the installed KMP (KABI changes usually won't affect all KMPs)
2. The new kernel doesn't boot or causes some other severe regression
3. The system must rely on multiversion(kernel) for fallback because it has no generic rollback functionality (not using btrfs / snapshots)
These 3 points are unlikely by themselves.
Their combination is so unlikely that I consider it justified to call it a corner case. I have never encountered a case like this.
I can only repeat: I maintained about a dozen KMPs during my time at Fujitsu, for SLE10, SLE11, and SLE12. A few "KABI accidents" occurred over the years, but we were always able to handle them gracefully for our customers without using "multiversion(kernel)".
> > The vercmp logic of rpm is known to be complex, and reimplementing it
> > correctly would be difficult and error-prone.
>
> It so happens that openSUSE uses rpm for package management which means that
> the rpm tools is always installed, and wm2 can use its vercmp without
> reimplementing it.
Right, I thought about that, too. Patches welcome, but see above.
There is another possible solution to this problem, which I've mentioned before and which I think is cleaner than adding versioning logic to wm2. The KMP concept could be changed so that the kernel version a KMP is compiled for becomes part of the package _name_ rather than its version. Then we'd have
    zfs-kmp-k5.14.21-150400.24.84-default-2.1.13-lp154.1.1
which would mean "zfs kmp compiled for 5.14.21-150400.24.84 and KABI-compatible kernels, version 2.1.13, release lp154.1.1". The same module built for a non-KABI-compatible kernel could then be installed alongside this one, while KMP version updates would work as usual.
That would also require a Jira, and obviously can't be done in the SLE15 code stream. I'd very much like to do it for ALP, but as I said, resources are limited.
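A small, purely hypothetical Python sketch makes the proposed naming scheme concrete. The kernel build target goes into the package name. Two ZFS versions built for the same kernel then share a name, so rpm would treat the newer one as an ordinary update as long as no multiversion provides is involved. A build for a KABI-incompatible kernel gets a different name and can be installed alongside.

```python
def kmp_name(module: str, kernel_release: str, flavor: str) -> str:
    """Hypothetical KMP package name that embeds the kernel it was built for."""
    return f"{module}-kmp-k{kernel_release}-{flavor}"


def kmp_nvr(module: str, kernel_release: str, flavor: str,
            version: str, release: str) -> str:
    """Full name-version-release string under the hypothetical scheme."""
    return f"{kmp_name(module, kernel_release, flavor)}-{version}-{release}"


def same_package(nvr_a: str, nvr_b: str) -> bool:
    """True if both NVRs share a package name.

    Without multiversion provides, rpm would then treat one as an update of
    the other rather than installing both. Versions and releases cannot
    contain '-', so the last two dash-separated fields are split off.
    """
    return nvr_a.rsplit("-", 2)[0] == nvr_b.rsplit("-", 2)[0]
```

The sketch only shows the mechanics. It leaves open which kernel release should go into the name, for example a common KABI baseline.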
(In reply to Martin Wilck from comment #17)
> (In reply to Michal Suchanek from comment #16)
>
> > No, it's job is to manage modules, and to that end it should manage even
> > multiple versions of the same module.
>
> That's your personal opinion. Neither of us was involved in the original
> conception of the tool. But I am quite certain that it's purpose was limited
> to managing weak-updates symlinks, which it does. Note the tool's name:
> "weak-modules2", not "universal-module-manager".
It does not, as evidenced by this bug. multiversion is also a KMP feature that has existed for a very long time, and wm2 does not handle it correctly.
> I'm not saying it can't be done, but implementing correct version handling
> is an enhancement request which should be handled through Jira, taking
> account available manpower resources. My personal resources for wm2 are
> limited.
> This feature won't be implemented for SLE15-SP6, and I doubt it will be
> implemented for later SPs, as SLE15 is slowly approaching feature freeze.
> And, as I've said before, I would consider these resources wasted because
> there's a simple workaround: not using multiversion(kernel).
That's not really going to work. We also have KMPs like LTTNG where breakage is normal.
> If a "KABI accident" really happens, and you build KMPs without
> multiversion(kernel), *and* your KMP is affected by the KABI change, the KMP
> maintainer justs need to make sure the KMP is rebuilt in time for the kernel
> maintenance update. The KABI notification mechanism that we offer in solid
> driver program helps with that. If you do this, the KMP and the kernel will
> be updated at the same time, and most users won't notice any issue. The
> updated KMP can have an rpm-level dependency on the updated kernel, which
> avoids a user mistake at installation time.
It does, but the updated kernel does not have an rpm-level dependency on the updated KMP.
> A serious problem only arises if the following 3 issues occur at the same
> time:
>
> 1. KABI accident which affects the installed KMP (KABI changes usually won't
> affect all KMPs)
> 2. The new kernel doesn't boot or causes some other severe regression
> 3. The system must rely on multiversion(kernel) for fallback because it has
> no generic rollback functionality (not using btrfs / snapshots)
>
> These 3 points are unlikely by themselves. Their combination is so unlikely
> that I consider it justified to call it a corner case. I have never
> encountered a case like this.
>
> I can only repeat, I maintained about a dozen KMPs during my time at
> Fujitsu, for SLE10, SLE11, and SLE12, and while a few "KABI accidents"
> occured over the years, we have always been able to handle them gracefully
> for our customers, while not using "multiversion(kernel)".
In what way, specifically?
> > > The vercmp logic of rpm is known to be complex, and reimplementing it
> > > correctly would be difficult and error-prone.
> >
> > It so happens that openSUSE uses rpm for package management which means that
> > the rpm tools is always installed, and wm2 can use its vercmp without
> > reimplementing it.
>
> Right, I thought about that, too. Patches welcome, but see above.
>
> The other possible solution to this problem, which I've mentioned before,
> and which I think is cleaner than attempts to add versioning logic to wm2,
> is to change the KMP concept such that the kernel version for which a KMP is
> compiled becomes part of the package _name_ rather than its version. Then
> we'd have
>
> zfs-kmp-k5.14.21-150400.24.84-default-2.1.13-lp154.1.1
>
> which would mean "zfs kmp compiled for 5.14.21-150400.24.84 and
> KABI-compatible kernels, version 2.1.13, release lp154.1.1". The same module
> for a non-kabi compatible kernel could then be installed along side this
> one, while KMP version updates would work as usual.
>
> That would also require a Jira, and obviously can't be done in the SLE15
> code stream. I'd very much want to do for ALP, but as I said, resources are
> limited.
It's planned to drop multiversion entirely in ALP; see PED-133. Your input on how to handle KMPs without multiversion is welcome.
(In reply to Michal Suchanek from comment #18)
> (In reply to Martin Wilck from comment #17)
> > That's your personal opinion. Neither of us was involved in the original
> > conception of the tool. But I am quite certain that it's purpose was limited
> > to managing weak-updates symlinks, which it does. Note the tool's name:
> > "weak-modules2", not "universal-module-manager".
>
> It does not as evidenced by this bug.
Yes, it does. Handling weak symlinks means figuring out whether a given module can be used with a kernel it wasn't compiled for, and creating weak-updates symlinks for it. Nothing more.
> > And, as I've said before, I would consider these resources wasted because
> > there's a simple workaround: not using multiversion(kernel).
>
> That's not really going to work. We also have packages KMPs like LTTNG where
> breakage is normal.
Bad example. The entire "KABI accident" discussion revolves around systems that might become unbootable because of the combination of 3 unlikely events at the same time (comment 17). LTTNG doesn't qualify. It's not necessary for bringing up the system; it's a developer tool.
> > be updated at the same time, and most users won't notice any issue. The
> > updated KMP can have an rpm-level dependency on the updated kernel, which
> > avoids a user mistake at installation time.
>
> It does but the updated kernel does not have a rpm level dependency on the
> updated KMP.
My point was this: if a user runs "zypper update", the updated KMP will be seen and the compatible kernel will be pulled in. The only prerequisite is that the KMP package was distributed at the same time as the kernel update.
> It's planned to drop multiversion entirely in ALP - see PED-133. Your input
> on how to handle KMPs without multiversion is welcome.
Right. That's on my todo list.