Bug 1218068 - dracut: initqueue in /usr?
Summary: dracut: initqueue in /usr?
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Basesystem
Version: Current
Hardware: Other Other
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: dracut maintainers
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-14 15:57 UTC by Ludwig Nussel
Modified: 2024-04-29 07:17 UTC (History)
CC List: 2 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description Ludwig Nussel 2023-12-14 15:57:02 UTC
systemd now mounts /usr read-only in the initrd: https://github.com/systemd/systemd/pull/30255

It looks like dracut (accidentally?) writes there and therefore now fails with errors like:
Dec 14 15:38:23 localhost dracut-cmdline[257]: //lib/dracut/hooks/cmdline/00-parse-root.sh: line 28: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f8b3c35f1-610b-48d8-ae73-aecdaa924731.sh: Read-only file system
Dec 14 15:38:23 localhost dracut-cmdline[208]: //lib/dracut/hooks/cmdline/00-parse-root.sh: line 38: /lib/dracut/hooks/emergency/80-\x2fdev\x2fdisk\x2fby-uuid\x2f8b3c35f1-610b-48d8-ae73-aecdaa924731.sh: Read-only file system
Dec 14 15:38:23 localhost dracut-cmdline[286]: mv: inter-device move failed: '/tmp/284-daemon-reload.sh' to '/lib/dracut/hooks/initqueue/daemon-reload.sh'; unable to remove target: Read-only file system
Dec 14 15:38:23 localhost dracut-cmdline[284]: /sbin/initqueue: line 71: /lib/dracut/hooks/initqueue/work: Read-only file system
Dec 14 15:38:23 localhost dracut-cmdline[208]: //lib/dracut/hooks/cmdline/30-parse-crypt.sh: line 169: /lib/dracut/hooks/initqueue/finished/90-crypt.sh: Read-only file system
Dec 14 15:38:23 localhost dracut-cmdline[208]: //lib/dracut/hooks/cmdline/30-parse-crypt.sh: line 126: /lib/dracut/hooks/emergency/90-crypt.sh: Read-only file system
Dec 14 15:38:23 localhost systemd[1]: Finished dracut cmdline hook.
Dec 14 15:38:23 localhost systemd[1]: Starting dracut pre-udev hook...
Dec 14 15:38:24 localhost systemd[1]: Finished dracut pre-udev hook.
Dec 14 15:38:24 localhost systemd[1]: dracut pre-trigger hook was skipped because no trigger condition checks were met.
Dec 14 15:38:24 localhost systemd[1]: Starting dracut initqueue hook...
Dec 14 15:38:24 localhost dracut-initqueue[374]: rm: cannot remove '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-path\x2fpci-0000:00:03.0-part.sh': Read-only file system
Dec 14 15:38:24 localhost dracut-initqueue[376]: rm: cannot remove '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f8826-B4B3.sh': Read-only file system
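To confirm on a running system that writes like the ones above fail because of a read-only mount, the mount options of /usr can be inspected. The following is only a sketch: `has_ro_option` is a hypothetical helper name, not part of dracut, and it merely parses a comma-separated option string such as the one `findmnt` prints.

```shell
#!/bin/sh
# Hypothetical helper (not part of dracut): succeed if a comma-separated
# mount option list contains the plain "ro" flag.
has_ro_option() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use on a live system (findmnt is part of util-linux; -T looks up
# the filesystem that contains the given path):
#   has_ro_option "$(findmnt -no OPTIONS -T /usr)" && echo "/usr is read-only"
```

Wrapping the string in commas avoids false positives on options that merely contain "ro", such as `errors=remount-ro`.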
Comment 1 Ludwig Nussel 2023-12-14 16:03:50 UTC
Looks like this is an upstream issue already.
Comment 2 Ludwig Nussel 2023-12-14 16:07:51 UTC
filed https://github.com/dracutdevs/dracut/issues/2588
Comment 3 Thomas Blume 2023-12-14 17:10:11 UTC
The issue should only happen if /usr is on a separate device/partition.

Maybe we could just replace the symlink /lib -> /usr/lib with a hardlink, and have /lib as a separate directory when that fails?
Comment 4 Antonio Feijoo 2023-12-15 09:52:13 UTC
(In reply to Ludwig Nussel from comment #0)
> systemd now mounts /usr initrd read-only:
> https://github.com/systemd/systemd/pull/30255

Ludwig, thanks for noticing this change. Good way to change things up regardless of the side effects... :(

"This is particularly nice on USIs (i.e. Unified System Images, that pack a whole OS into a UKI without transitioning out of it), such as diskomator."

"I think this is useful. For mkosi-initrd-style initrds, I think it should work out of the box, i.e. I don't think anything would modify things under /usr. I'm not sure what will happen with dracut-built initrds."

(In reply to Thomas Blume from comment #3)
> The issue should only happen if /usr is on a separate device/partition.
> 
> Maybe we could just replace the symlink /lib -> /usr/lib by a hardlink and
> have /lib as separate directory when that fails?

Not sure if splitting /lib is a good idea. Maybe we have to move the hookdir to another location that systemd will not want to "protect" in the future. But I'm not sure which location would be better. /etc is present when the initramfs is built, but it should be for conf, not for scripts. /run is mounted as tmpfs during early boot, /tmp can also be defined as tmpfs... Maybe /boot? But this dir is also used in FIPS mode...
Comment 5 Thomas Blume 2023-12-15 10:29:32 UTC
(In reply to Antonio Feijoo from comment #4)
> Not sure if splitting /lib is a good idea. Maybe we have to move the hookdir
> to another location that systemd will not want to "protect" in the future.
> But I'm not sure about which location would be better. /etc is present when
> the initramfs is built, but it should be for conf, not for scripts. /run is
> mounted as tmpfs during early boot, /tmp can also be defined as tmpfs...
> maybe /boot? but this dir is also used in fips...

Ok, then, what about /var?
At least this one is quite unlikely to become ro.
Comment 6 Antonio Feijoo 2023-12-15 11:14:09 UTC
(In reply to Thomas Blume from comment #5)
> (In reply to Antonio Feijoo from comment #4)
> > Not sure if splitting /lib is a good idea. Maybe we have to move the hookdir
> > to another location that systemd will not want to "protect" in the future.
> > But I'm not sure about which location would be better. /etc is present when
> > the initramfs is built, but it should be for conf, not for scripts. /run is
> > mounted as tmpfs during early boot, /tmp can also be defined as tmpfs...
> > maybe /boot? but this dir is also used in fips...
> 
> Ok, then, what about /var?
> At least this one is quite unlikely to become ro.

The FHS says /var is for variable data files, so bash scripts would not fall into this category. Perhaps Ludwig can suggest a good place for this. Maybe some kind of out-of-hierarchy directory like /.dracut, I don't know...
Comment 7 Ludwig Nussel 2023-12-18 09:29:19 UTC
What are the use cases for the hookdir? The part that breaks is using it for generated scripts IIUC. For that purpose /run would be more appropriate. If the hookdir also contains static data that is actually part of the cpio then code logic needs to be extended to read from multiple directories. I.e. collect all scripts to be run from /run, /etc and /usr but only write stuff to /run.
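The "read from multiple directories" idea could look roughly like the sketch below. It is an illustration under assumptions, not dracut code: `list_hooks` is a hypothetical name, and the rule that the first directory wins for duplicate file names is an assumption borrowed from how drop-in directories commonly behave.

```shell
#!/bin/sh
# Hypothetical sketch (not dracut's implementation): list hook scripts for
# one hook point from several base directories. If the same file name
# exists in more than one directory, the directory listed first wins.
# Usage: list_hooks HOOK BASEDIR...
list_hooks() {
    _hook=$1
    shift
    for _dir in "$@"; do
        for _f in "$_dir/$_hook"/*.sh; do
            [ -e "$_f" ] || continue   # glob did not match anything
            printf '%s\t%s\n' "${_f##*/}" "$_f"
        done
    done |
        awk -F '\t' '!seen[$1]++ { print $1 "\t" $2 }' |  # keep first per name
        LC_ALL=C sort |                                   # order by file name
        cut -f2
}
```

A caller would then pass the writable location first, e.g. `list_hooks initqueue/finished /run/dracut/hooks /etc/dracut/hooks /usr/lib/dracut/hooks` (the /run and /etc paths are made up for this example), and source the printed files in order.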
Comment 8 Antonio Feijoo 2023-12-18 09:45:11 UTC
(In reply to Ludwig Nussel from comment #7)
> What are the use cases for the hookdir? The part that breaks is using it for
> generated scripts IIUC. For that purpose /run would be more appropriate. If
> the hookdir also contains static data that is actually part of the cpio then
> code logic needs to be extended to read from multiple directories. I.e.
> collect all scripts to be run from /run, /etc and /usr but only write stuff
> to /run.

The hookdir only contains scripts added by dracut modules during the different phases of the boot process. The problem is that this directory must be persistent: it is populated and packed into the initramfs image when dracut generates it. If we added the scripts to the /run directory of the initramfs image, /run would be mounted as tmpfs at boot, masking the content of the existing directory.
Comment 9 Antonio Feijoo 2023-12-18 12:12:53 UTC
Patched package just to test that changing only the hookdir would be enough (using /.dracut/hooks for now, the final directory can be changed):

https://download.opensuse.org/repositories/home:/afeijoo:/branches:/openSUSE:/Factory:/bsc1218068/standard/

Ludwig, could you give it a try using the systemd version from git main?
Comment 10 Ludwig Nussel 2023-12-18 12:42:18 UTC
(In reply to Antonio Feijoo from comment #8)
> (In reply to Ludwig Nussel from comment #7)
> > What are the use cases for the hookdir? The part that breaks is using it for
> > generated scripts IIUC. For that purpose /run would be more appropriate. If
> > the hookdir also contains static data that is actually part of the cpio then
> > code logic needs to be extended to read from multiple directories. I.e.
> > collect all scripts to be run from /run, /etc and /usr but only write stuff
> > to /run.
> 
> The hookdir only contains scripts added by dracut modules during the
> different phases of the boot process. The problem is that this directory
> must be persistent, it's populated and packed into the initramfs image when
> dracut generates it. If we add the scripts in the /run directory of the
> initramfs image, at boot /run will be mounted as tmpfs, masking the content
> of the existing directory.

IOW the hook dir contains both static data that is generated when creating the cpio, as well as stuff that is written there only at runtime when the initrd is booted :-) So the correct solution would be to keep the static stuff in /usr and write only to /run.
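For the "write only to /run" half, a runtime-generated hook could be written as in the sketch below. Everything named here is an assumption for illustration: `write_runtime_hook` and `RUNTIME_HOOKDIR` are hypothetical, the default path is made up, and `mktemp` is assumed to be available in the initrd. The temporary file is created next to the target so the final `mv` is a rename within one filesystem, unlike the move from /tmp that fails in the log in comment #0.

```shell
#!/bin/sh
# Hypothetical runtime location; the thread only says "write to /run".
: "${RUNTIME_HOOKDIR:=/run/dracut/hooks}"

# Write a generated hook script, reading its body from stdin.
# Usage: write_runtime_hook HOOK NAME < script-body
write_runtime_hook() {
    _dest_dir="$RUNTIME_HOOKDIR/$1"
    mkdir -p "$_dest_dir" || return 1
    # Temp file lives in the destination directory, so mv is a plain rename.
    _tmp=$(mktemp "$_dest_dir/.tmp.XXXXXX") || return 1
    if cat > "$_tmp" && mv -f "$_tmp" "$_dest_dir/$2"; then
        return 0
    fi
    rm -f "$_tmp"
    return 1
}
```

Because the temporary name does not end in `.sh`, a reader that globs for `*.sh` would not pick up a half-written file.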
Comment 11 Ludwig Nussel 2023-12-18 12:42:45 UTC
(In reply to Antonio Feijoo from comment #9)
> Patched package just to test that changing only the hookdir would be enough
> (using /.dracut/hooks for now, the final directory can be changed):
> 
> https://download.opensuse.org/repositories/home:/afeijoo:/branches:/openSUSE:
> /Factory:/bsc1218068/standard/
> 
> Ludwig, could you give it a try using the systemd version from git main?

sure
Comment 12 Ludwig Nussel 2023-12-18 12:54:29 UTC
Anyway, instead of /.dracut I'd rather use /var. Still not perfect, but better than the current state.
Comment 13 Ludwig Nussel 2023-12-18 13:30:16 UTC
The updated dracut does not print error messages anymore. It still doesn't boot though: it waits for a device to appear. I have to investigate further.

Builds of systemd git main are at https://build.opensuse.org/package/show/home:lnussel:systemd/systemd btw
Comment 14 Antonio Feijoo 2023-12-18 13:31:17 UTC
(In reply to Ludwig Nussel from comment #10)
> IOW the hook dir contains both static data that is generated when creating
> the cpio, as well as stuff that is written there only at runtime when the
> initrd is booted :-) So the correct solution would be to keep the static
> stuff in /usr and write only to /run.

Hmm... that would mean the same hierarchy duplicated in 2 different dirs... and furthermore this would only solve the problem of new additions, but not the removals. E.g. https://github.com/dracutdevs/dracut/commit/07af8d58745a121052cab49c70a476f02996da1e, excerpt from your log in comment #0:

Dec 14 15:38:24 localhost dracut-initqueue[374]: rm: cannot remove '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-path\x2fpci-0000:00:03.0-part.sh': Read-only file system
Dec 14 15:38:24 localhost dracut-initqueue[376]: rm: cannot remove '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f8826-B4B3.sh': Read-only file system

(In reply to Ludwig Nussel from comment #12)
> anyway instead of /.dracut I'd rather use /var. Still not perfect but better
> than the current state.

Ok, I buy this :)
Comment 15 Antonio Feijoo 2023-12-18 13:31:50 UTC
(In reply to Ludwig Nussel from comment #13)
> the updated dracut does not print error messages anymore. Still doesn't boot
> waiting for a device to appear though. Have to investigate further.
> 
> Builds of systemd git main are at
> https://build.opensuse.org/package/show/home:lnussel:systemd/systemd btw

Ok, thanks, I only tested the build against v254.
Comment 16 Antonio Feijoo 2023-12-18 13:38:54 UTC
(In reply to Antonio Feijoo from comment #15)
> (In reply to Ludwig Nussel from comment #13)
> > the updated dracut does not print error messages anymore. Still doesn't boot
> > waiting for a device to appear though. Have to investigate further.
> > 
> > Builds of systemd git main are at
> > https://build.opensuse.org/package/show/home:lnussel:systemd/systemd btw
> 
> Ok, thanks, I only tested the build against v254.

It works for me with your OBS package, maybe my system setup is simpler or you are facing another issue introduced by systemd-v255.
Comment 17 Ludwig Nussel 2023-12-18 15:20:39 UTC
Probably. It's an encrypted disk. This reaches the limit of my knowledge of how to debug it. It waits indefinitely for dev-disk-by\x2dpath-pci\x2d0000:00:03.0\x2dpart.device. Manually stopping it with rd.systemd.debug-shell=1 continues boot.

JOB UNIT                                                   TYPE  STATE
61  dracut-pre-pivot.service                               start waiting
60  transactional-update-etc-cleaner.service               start waiting
122 initrd-cleanup.service                                 start waiting
48  dev-disk-by\x2dpath-pci\x2d0000:00:03.0\x2dpart.device start running
1   initrd.target                                          start waiting

5 jobs listed.

○ dev-disk-by\x2dpath-pci\x2d0000:00:03.0\x2dpart.device - /dev/disk/by-path/pci-0000:00:03.0-part
     Loaded: loaded
    Drop-In: /etc/systemd/system/dev-disk-by\x2dpath-pci\x2d0000:00:03.0\x2dpart.device.d
             └─timeout.conf
     Active: inactive (dead)


# lsinitrd /boot/efi/opensuse-microos/6.6.6-1-kvmsmall/initrd-6118ebab7f11e11136223449b93c5ec677c6af97 |grep 03\.0
-rw-r--r--   1 root     root          116 Dec 18 13:16 .dracut/hooks/emergency/80-\x2fdev\x2fdisk\x2fby-path\x2fpci-0000:00:03.0-part.sh
-rw-r--r--   1 root     root           49 Dec 18 13:16 .dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-path\x2fpci-0000:00:03.0-part.sh
drwxr-xr-x   2 root     root            0 Dec 18 13:16 etc/systemd/system/dev-disk-by\x2dpath-pci\x2d0000:00:03.0\x2dpart.device.d
-rw-r--r--   2 root     root            0 Dec 18 13:16 etc/systemd/system/dev-disk-by\x2dpath-pci\x2d0000:00:03.0\x2dpart.device.d/timeout.conf
lrwxrwxrwx   1 root     root           57 Dec 18 13:16 etc/systemd/system/initrd.target.wants/dev-disk-by\x2dpath-pci\x2d0000:00:03.0\x2dpart.device -> ../dev-disk-by\x2dpath-pci\x2d0000:00:03.0\x2dpart.device
Comment 18 Antonio Feijoo 2023-12-18 15:39:52 UTC
(In reply to Ludwig Nussel from comment #17)
> probably. It's an encrypted disk.

And with transactional updates. I'll try a similar setup and give you some feedback.

Is there any reason why you identify devices by-path instead of by-uuid? What is mounted on the device that blocks the boot, the root fs?

You can stop before initqueue using `rd.break=initqueue` and try to check what's the state of the system at that point.
Comment 19 Ludwig Nussel 2023-12-18 16:05:22 UTC
It actually also happens without an encrypted disk. Take
http://download.opensuse.org/tumbleweed/appliances/openSUSE-MicroOS.x86_64-kvm-and-xen-sdboot.qcow2, install the latest systemd and your dracut, reboot, then call `sdbootutil --no-reuse-initrd add-all-kernels`. Reboot again. Voilà, the problem appears.
Maybe dracut gets confused because /dev/disk/by-path/pci-0000:00:03.0-part/ is actually a directory.
Comment 20 Ludwig Nussel 2023-12-18 16:08:44 UTC
Here we go, I think: https://github.com/systemd/systemd/pull/29219
Comment 21 Antonio Feijoo 2023-12-18 16:13:21 UTC
(In reply to Ludwig Nussel from comment #19)
> it actually also happens without encrypted disk. Using
> http://download.opensuse.org/tumbleweed/appliances/openSUSE-MicroOS.x86_64-
> kvm-and-xen-sdboot.qcow2 and then installing the latest systemd and your
> dracut, reboot, then call `sdbootutil --no-reuse-initrd add-all-kernels`.
> Reboot again. voila, problem appears.
> Maybe dracut gets confused because /dev/disk/by-path/pci-0000:00:03.0-part/
> is actually a directory

Ok, I tried with a new Tumbleweed vm, transactional server and rootfs encrypted, but no issues.

Then I have to add systemd-boot to the set :) I bet this is another issue and it can be reproduced if you downgrade dracut and set `ProtectSystem=no` in /etc/systemd/system.conf.
Comment 22 Ludwig Nussel 2023-12-18 16:17:29 UTC
Probably related to code that calls get_maj_min() on it. There's no error condition for calling stat on a non-device:

$ stat -L -c '%t:%T' /dev/disk/by-path/pci-0000\:00\:03.0-part
0:0

The calling code seems to expect an empty reply from get_maj_min() on error, though.
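One way to get the empty reply the callers expect is to check the file type before calling stat. The sketch below only illustrates that idea; `maj_min_or_empty` is a hypothetical name, it is not dracut's get_maj_min() and not necessarily how the problem was fixed upstream. It prints the same hexadecimal `%t:%T` format as the stat call above.

```shell
#!/bin/sh
# Hypothetical variant (not dracut's get_maj_min): print "major:minor" in
# hex for a block device; print nothing and return non-zero otherwise.
maj_min_or_empty() {
    # -b follows symlinks, so /dev/disk/by-* links to block devices pass,
    # while a directory such as .../pci-0000:00:03.0-part does not.
    [ -b "$1" ] || return 1
    stat -L -c '%t:%T' "$1"
}
```

With this guard, a directory produces an empty string instead of `0:0`, so a caller that tests for an empty result can tell the difference.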
Comment 23 Ludwig Nussel 2023-12-18 16:30:22 UTC
Reproduced without systemd-boot using http://download.opensuse.org/tumbleweed/appliances/openSUSE-MicroOS.x86_64-16.0.0-kvm-and-xen-Snapshot20231215.qcow2

transactional-update --no-selfupdate shell
zypper ar https://download.opensuse.org/repositories/devel:/microos:/systemd-boot/openSUSE_Tumbleweed/devel:microos:systemd-boot.repo
zypper up -r 1 --allow-vendor-change systemd dracut
exit
reboot
transactional-update initrd
reboot

The trick is that you need to actually boot the new systemd/udev and then regenerate the initrd. Only then does the new by-path directory exist.
Comment 24 Antonio Feijoo 2023-12-19 08:15:39 UTC
(In reply to Ludwig Nussel from comment #23)
> reproduced without systemd-boot using
> http://download.opensuse.org/tumbleweed/appliances/openSUSE-MicroOS.x86_64-
> 16.0.0-kvm-and-xen-Snapshot20231215.qcow2
> 
> transactional-update --no-selfupdate shell
> zypper ar
> https://download.opensuse.org/repositories/devel:/microos:/systemd-boot/
> openSUSE_Tumbleweed/devel:microos:systemd-boot.repo
> zypper up -r 1 --allow-vendor-change systemd dracut
> exit
> reboot
> transactional-update initrd
> reboot
> 
> The trick is that you need to actually boot the new systemd/udev and then
> regenerate the initrd. only then the new by-path directory exists

Thanks for the reproducer. As I suspected, this is another issue introduced by systemd-v255.

Before:

> localhost:~ # systemctl --version
> systemd 254 (254.5+suse.17.gce08cd5f66)
> +PAM +AUDIT +SELINUX +APPARMOR +IMA -SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK -XKBCOMMON -UTMP +SYSVINIT default-hierarchy=unified
> localhost:~ # ls -l /dev/disk/by-path/
> total 0
> lrwxrwxrwx. 1 root root  9 Dec 19 08:59 pci-0000:04:00.0 -> ../../vda
> lrwxrwxrwx. 1 root root 10 Dec 19 08:59 pci-0000:04:00.0-part1 -> ../../vda1
> lrwxrwxrwx. 1 root root 10 Dec 19 08:59 pci-0000:04:00.0-part2 -> ../../vda2
> lrwxrwxrwx. 1 root root 10 Dec 19 08:59 pci-0000:04:00.0-part3 -> ../../vda3
> lrwxrwxrwx. 1 root root  9 Dec 19 08:59 virtio-pci-0000:04:00.0 -> ../../vda
> lrwxrwxrwx. 1 root root 10 Dec 19 08:59 virtio-pci-0000:04:00.0-part1 -> ../../vda1
> lrwxrwxrwx. 1 root root 10 Dec 19 08:59 virtio-pci-0000:04:00.0-part2 -> ../../vda2
> lrwxrwxrwx. 1 root root 10 Dec 19 08:59 virtio-pci-0000:04:00.0-part3 -> ../../vda3

After:

> localhost:~ # systemctl --version
> systemd 255 (255+git20231218.c88753db4)
> +PAM +AUDIT +SELINUX +APPARMOR +IMA -SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK +XKBCOMMON +UTMP -SYSVINIT default-hierarchy=unified
> localhost:~ # ls -l /dev/disk/by-path/
> total 0
> lrwxrwxrwx. 1 root root   9 Dec 19 09:06 pci-0000:04:00.0 -> ../../vda
> drwxr-xr-x. 7 root root 140 Dec 19 09:06 pci-0000:04:00.0-part
> lrwxrwxrwx. 1 root root  10 Dec 19 09:06 pci-0000:04:00.0-part1 -> ../../vda1
> lrwxrwxrwx. 1 root root  10 Dec 19 09:06 pci-0000:04:00.0-part2 -> ../../vda2
> lrwxrwxrwx. 1 root root  10 Dec 19 09:06 pci-0000:04:00.0-part3 -> ../../vda3
> lrwxrwxrwx. 1 root root   9 Dec 19 09:06 virtio-pci-0000:04:00.0 -> ../../vda
> lrwxrwxrwx. 1 root root  10 Dec 19 09:06 virtio-pci-0000:04:00.0-part1 -> ../../vda1
> lrwxrwxrwx. 1 root root  10 Dec 19 09:06 virtio-pci-0000:04:00.0-part2 -> ../../vda2
> lrwxrwxrwx. 1 root root  10 Dec 19 09:06 virtio-pci-0000:04:00.0-part3 -> ../../vda3

And dracut doesn't handle this /dev/disk/by-path/pci-0000:04:00.0-part directory well. I would open a different bug for this, because I think it has nothing to do with /usr being read-only.
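To spot such entries on a given system, the directory can be scanned for real directories among the device symlinks. This is only a sketch: `list_non_device_dirs` is a hypothetical helper name, and `-mindepth`/`-maxdepth` are extensions assumed to be supported by the installed find (GNU find supports them).

```shell
#!/bin/sh
# Hypothetical helper: print entries directly below DIR that are real
# directories. Symlinks are not followed, so device links are skipped.
# Usage: list_non_device_dirs DIR
list_non_device_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d
}

# Possible use on a live system:
#   list_non_device_dirs /dev/disk/by-path
```

On the "After" listing above this would be expected to print only the `pci-0000:04:00.0-part` entry, and nothing on the "Before" listing.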
Comment 25 Ludwig Nussel 2023-12-19 08:29:13 UTC
I've filed https://github.com/dracutdevs/dracut/issues/2592
Comment 26 Ludwig Nussel 2023-12-19 08:29:36 UTC
it's post 255 btw
Comment 27 Antonio Feijoo 2023-12-19 14:37:55 UTC
(In reply to Ludwig Nussel from comment #25)
> I've filed https://github.com/dracutdevs/dracut/issues/2592

Check https://github.com/dracutdevs/dracut/pull/2593, tested using the image from comment #23

BTW maybe you can group systemd-boot entries instead of showing this large list of snapshots ;) https://github.com/systemd/systemd/pull/30069
Comment 28 Antonio Feijoo 2024-04-29 07:17:03 UTC
The fix agreed with the new upstream (https://github.com/openSUSE/dracut/commit/14cb75e05b1b1bf8b9f3bfadddd5a483ef7f4dc3) is available since snapshot 20240426.
Closing.