Bug 1197309 - [Build 20220318] systemd 250: failed to parse fstab: permission denied
Summary: [Build 20220318] systemd 250: failed to parse fstab: permission denied
Status: CONFIRMED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Basesystem (show other bugs)
Version: Current
Hardware: Other Other
: P5 - None : Normal (vote)
Target Milestone: ---
Assignee: Ignaz Forster
QA Contact: E-mail List
URL: https://openqa.opensuse.org/tests/225...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-19 10:55 UTC by Dominique Leuenberger
Modified: 2023-10-03 14:10 UTC (History)
3 users (show)

See Also:
Found By: openQA
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---


Attachments
systemd debug log (826.89 KB, text/x-log)
2022-03-22 14:46 UTC, Fabian Vogt
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dominique Leuenberger 2022-03-19 10:55:14 UTC
## Observation

openQA test in scenario microos-Tumbleweed-MicroOS-Image-x86_64-microos@64bit fails in
[trup_smoke](https://openqa.opensuse.org/tests/2252516/modules/trup_smoke/steps/28)

## Test suite description



## Reproducible

Fails since (at least) Build [20220318](https://openqa.opensuse.org/tests/2252516) (current job)


## Expected result

Last good: [20220317](https://openqa.opensuse.org/tests/2249910) (or more recent)


## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=microos&flavor=MicroOS-Image&machine=64bit&test=microos&version=Tumbleweed)
Comment 1 Fabian Vogt 2022-03-21 12:19:49 UTC
It works locally for me and fbui as well as in openQA: https://openqa.opensuse.org/tests/2257035

Even in the linked failed test it successfully boots into a new snapshot during the cockpit test.

In the journal from the test I see these messages:

Mar 19 07:10:14.821930 localhost.localdomain kernel: audit: type=1400 audit(1647673814.816:12): avc:  denied  { read } for  pid=1091 comm="systemd-sysctl" name="protected_hardlinks" dev="proc" ino=14005 scontext=system_u:system_r:systemd_sysctl_t:s0 tcontext=system_u:object_r:proc_security_t:s0 tclass=file permissive=0
Mar 19 07:10:14.827159 localhost.localdomain systemd[1]: Finished Apply Kernel Variables.
Mar 19 07:10:14.829928 localhost.localdomain kernel: audit: type=1400 audit(1647673814.824:13): avc:  denied  { read } for  pid=1091 comm="systemd-sysctl" name="protected_hardlinks" dev="proc" ino=14005 scontext=system_u:system_r:systemd_sysctl_t:s0 tcontext=system_u:object_r:proc_security_t:s0 tclass=file permissive=0
Mar 19 07:10:14.829972 localhost.localdomain kernel: audit: type=1400 audit(1647673814.824:14): avc:  denied  { read } for  pid=1091 comm="systemd-sysctl" name="protected_symlinks" dev="proc" ino=14006 scontext=system_u:system_r:systemd_sysctl_t:s0 tcontext=system_u:object_r:proc_security_t:s0 tclass=file permissive=0

Those don't appear locally here, which indicates that there are some behaviour differences, could be random and induced by some race conditions...
Comment 2 Thomas Blume 2022-03-21 15:04:32 UTC
The sequence in https://openqa.opensuse.org/tests/2252516/file/journal_check-full_journal.log looks quite strange.
Apparently SELINUX gets enable at switch root, just before systemd is restarted in the real root:

-->
Mar 19 07:10:13.382735 localhost.localdomain systemd[1]: Switching root.
Mar 19 07:10:13.432872 localhost.localdomain systemd-journald[172]: Journal stopped
Mar 19 07:10:14.566701 localhost.localdomain systemd-journald[172]: Received SIGTERM from PID 1 (systemd).
Mar 19 07:10:14.566754 localhost.localdomain kernel: audit: type=1400 audit(1647673813.468:5): avc:  denied  { prog_run } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcont
ext=system_u:system_r:kernel_t:s0 tclass=bpf permissive=0
Mar 19 07:10:14.566781 localhost.localdomain kernel: audit: type=1400 audit(1647673813.468:6): avc:  denied  { prog_run } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcont
ext=system_u:system_r:kernel_t:s0 tclass=bpf permissive=0
Mar 19 07:10:14.566796 localhost.localdomain kernel: audit: type=1400 audit(1647673813.468:7): avc:  denied  { prog_run } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=bpf permissive=0
Mar 19 07:10:14.566810 localhost.localdomain kernel: audit: type=1400 audit(1647673813.468:8): avc:  denied  { prog_run } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=bpf permissive=0
Mar 19 07:10:14.566823 localhost.localdomain kernel: audit: type=1400 audit(1647673813.468:9): avc:  denied  { prog_run } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=bpf permissive=0
Mar 19 07:10:14.566837 localhost.localdomain kernel: SELinux:  Class mctp_socket not defined in policy.
Mar 19 07:10:14.566854 localhost.localdomain kernel: SELinux:  Class io_uring not defined in policy.
Mar 19 07:10:14.566867 localhost.localdomain kernel: SELinux: the above unknown classes and permissions will be allowed
Mar 19 07:10:14.566881 localhost.localdomain kernel: SELinux:  Converting 204 SID table entries...
Mar 19 07:10:14.566893 localhost.localdomain kernel: SELinux:  policy capability network_peer_controls=1
Mar 19 07:10:14.566907 localhost.localdomain kernel: SELinux:  policy capability open_perms=1
Mar 19 07:10:14.566920 localhost.localdomain kernel: SELinux:  policy capability extended_socket_class=1
Mar 19 07:10:14.566933 localhost.localdomain kernel: SELinux:  policy capability always_check_network=0
Mar 19 07:10:14.566950 localhost.localdomain kernel: SELinux:  policy capability cgroup_seclabel=1
Mar 19 07:10:14.566964 localhost.localdomain kernel: SELinux:  policy capability nnp_nosuid_transition=1
Mar 19 07:10:14.566977 localhost.localdomain kernel: SELinux:  policy capability genfs_seclabel_symlinks=0
Mar 19 07:10:14.566991 localhost.localdomain kernel: audit: type=1403 audit(1647673813.580:10): auid=4294967295 ses=4294967295 lsm=selinux res=1
Mar 19 07:10:14.567004 localhost.localdomain kernel: audit: type=1400 audit(1647673813.584:11): avc:  denied  { setcurrent } for  pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=process permissive=0
Mar 19 07:10:14.567019 localhost.localdomain systemd[1]: Failed to transition into init label 'system_u:system_r:init_t:s0', ignoring.
Mar 19 07:10:14.567039 localhost.localdomain systemd[1]: Successfully loaded SELinux policy in 94.915ms.
Mar 19 07:10:14.567054 localhost.localdomain systemd[1]: Relabelled /dev, /dev/shm, /run, /sys/fs/cgroup in 65.251ms.
Mar 19 07:10:14.567069 localhost.localdomain systemd[1]: systemd 250.4+suse.35.g8ef8dfd540 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR -IMA -SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
--<


What is starting SELinux?
Is that a userspace daemon or a kernel driver?
Is SELinux supposed to be initialized in the initrd or afterwards?
Comment 3 Fabian Vogt 2022-03-22 13:50:07 UTC
https://github.com/kubic-project/microos-tools/tree/master/selinux/98selinux-microos
relabels /sysroot from within the initrd in the dracut-pre-pivot hook. Note that on MicroOS, /etc is an overlay with /var/lib/overlay/X/etc as upperdir, so /sysroot/var and /sysroot/etc are mounted in the initrd (x-initrd.mount). The issue is that those aren't mounted yet when the pre-pivot hook runs, but they are mounted while the relabel is in progress!

Mar 19 07:10:05.230048 localhost.localdomain systemd[1]: Finished Reload Configuration from the Real Root.
Mar 19 07:10:05.233033 localhost.localdomain systemd[1]: dracut mount hook was skipped because all trigger condition checks failed.
Mar 19 07:10:05.234886 localhost.localdomain systemd[1]: Starting dracut pre-pivot and cleanup hook...
Mar 19 07:10:05.253660 localhost.localdomain dracut-pre-pivot[965]: SELinux: relabeling root filesystem
...
Mar 19 07:10:05.383675 localhost.localdomain dracut-pre-pivot[965]: SELinux: mount root read-write and relabel
Mar 19 07:10:05.403681 localhost.localdomain kernel: BTRFS info (device vda3): using free space tree
Mar 19 07:10:05.407806 localhost.localdomain dracut-pre-pivot[983]: Can't stat exclude path "/var/lib/overlay", No such file or directory - ignoring.
Mar 19 07:10:05.790973 localhost.localdomain systemd[1]: Mounting /sysroot/root...
Mar 19 07:10:05.793760 localhost.localdomain systemd[1]: sysroot-var.mount: Directory /sysroot/var to mount over is not empty, mounting anyway.
Mar 19 07:10:05.798267 localhost.localdomain systemd[1]: Mounting /sysroot/var...
Mar 19 07:10:05.807194 localhost.localdomain systemd[1]: Mounted /sysroot/root.
Mar 19 07:10:05.818591 localhost.localdomain systemd[1]: Mounted /sysroot/var.
Mar 19 07:10:05.820518 localhost.localdomain systemd[1]: sysroot-etc.mount: Directory /sysroot/etc to mount over is not empty, mounting anyway.
Mar 19 07:10:05.822020 localhost.localdomain systemd[1]: Mounting /sysroot/etc...
Mar 19 07:10:05.828763 localhost.localdomain systemd[1]: Starting Grow File System on /sysroot/var...
Mar 19 07:10:05.857164 localhost.localdomain systemd[1]: Reached target Initrd File Systems.
...
Mar 19 07:10:13.180623 localhost.localdomain systemd[1]: sysroot-proc.mount: Deactivated successfully.
Mar 19 07:10:13.194281 localhost.localdomain systemd[1]: Finished dracut pre-pivot and cleanup hook.

This apparently means that "/var/lib/overlay" from the exclusion list is ignored and it relabels /sysroot/etc/ recursively and then also relabels /sysroot/var/lib/overlay, the upperdir of the /etc overlay. Due to caching effects of overlayfs, /sysroot/etc/ still shows the correct labels, even though the upperdir has broken ones. They become visible after remounting the overlay or simply on the next boot.

I could reproduce that locally by running this in the dracut shell (rd.break) of a healthy system:

bash-5.1# ls -Z /sysroot/var/lib/overlay/*/etc/fstab
system_u:object_r:etc_t:s0 /sysroot/var/lib/overlay/2/etc/fstab
bash-5.1# ls -Z /sysroot/etc/fstab
system_u:object_r:etc_t:s0 /sysroot/etc/fstab
bash-5.1# systemctl stop /sysroot/var; (. /usr/lib/dracut-lib.sh; . /usr/lib/dracut/hooks/pre-pivot/50-selinux-microos-relabel.sh) & sleep 2; systemctl start /sysroot/etc
[1] 1310
SELinux: relabeling root filesystem
SELinux: loading policy

SELinux: mount root read-write and relabel
Can't stat exclude path "/var/lib/overlay", No such file or directory - ignoring.
bash-5.1# ls -Z /sysroot/etc/fstab
system_u:object_r:etc_t:s0 /sysroot/etc/fstab
[1]+  Done                    ( . /usr/lib/dracut-lib.sh; . /usr/lib/dracut/hooks/pre-pivot/50-selinux-microos-relabel.sh )
bash-5.1# ls -Z /sysroot/var/lib/overlay/*/etc/fstab
system_u:object_r:var_lib_t:s0 /sysroot/var/lib/overlay/2/etc/fstab
bash-5.1# mount -o remount /sysroot/etc
bash-5.1# ls -Z /sysroot/etc/fstab
system_u:object_r:var_lib_t:s0 /sysroot/etc/fstab

As it changes labels in /var/lib/overlay, it'll also break booting from old snapshots (!)

The question is why dracut-pre-pivot.service starts before the /sysroot/{var,etc} mounts are up and initrd-fs.target reached.

bash-5.1# systemctl show dracut-pre-pivot.service -P After --no-pager
dracut-initqueue.service system.slice initrd.target dracut-mount.service sysroot.mount dracut-pre-mount.service systemd-journald.socket initrd-parse-etc.service remote-fs.target
bash-5.1# systemctl show initrd.target -P After --no-pager
initrd-root-fs.target initrd-root-device.target initrd-usr-fs.target "dev-disk-by\\x2duuid-68348daf\\x2dc38c\\x2d4ffc\\x2dbd9b\\x2da0bdb9ac225d.device" ignition-complete.target basic.target rescue.target ignition-remove-reconfig_system.service rescue.service initrd-fs.target
bash-5.1# systemctl show initrd-fs.target -P After --no-pager
initrd-parse-etc.service sysroot-etc.mount sysroot-root.mount systemd-growfs@sysroot-var.service sysroot-var.mount
Comment 4 Fabian Vogt 2022-03-22 14:46:44 UTC
Created attachment 857254 [details]
systemd debug log

I wrote a combustion script which adds a debug script to the pre-pivot hook on boot. With that, it's visible that initrd-fs.target isn't active yet when dracut-pre-pivot.service runs. The full systemd debug log is attached. Which of the units are active is random, in this particular run sysroot-etc.mount was fast enough and is shown as active.
Comment 5 Thomas Blume 2022-03-22 15:57:13 UTC
(In reply to Fabian Vogt from comment #4)
> Created attachment 857254 [details]
> systemd debug log
> 
> I wrote a combustion script which adds a debug script to the pre-pivot hook
> on boot. With that, it's visible that initrd-fs.target isn't active yet when
> dracut-pre-pivot.service runs. The full systemd debug log is attached. Which
> of the units are active is random, in this particular run sysroot-etc.mount
> was fast enough and is shown as active.

There is probably a dependency missing.
On my tumbleweed machine I can see that dracut-pre-pivot.service has no dependency to sysroot-etc.mount, only to sysroot.mount:

-->
# cat /usr/lib/systemd/system/dracut-pre-pivot.service
#  This file is part of dracut.
#
# See dracut.bootup(7) for details

[Unit]
Description=dracut pre-pivot and cleanup hook
Documentation=man:dracut-pre-pivot.service(8)
DefaultDependencies=no
After=initrd.target initrd-parse-etc.service sysroot.mount
After=dracut-initqueue.service dracut-pre-mount.service dracut-mount.service
Before=initrd-cleanup.service
Wants=remote-fs.target
After=remote-fs.target
--<

Fabian, can you add such a dep and retry?
Comment 6 Fabian Vogt 2022-03-22 16:39:21 UTC
I found these lines in the log:

ignition-mount.service: Trying to enqueue job ignition-mount.service/stop/replace
ignition-complete.target: Installed new job ignition-complete.target/stop as 88
initrd.target: Job 1 initrd.target/start finished, result=canceled

It looks like the fix/workaround for boo#1176746 is the cause. The added
"systemctl stop ignition-mount.serivce" indirectly stops initrd.target, which
causes systemd to ignore ordering relative to that as it's no longer part of the
transaction.

So we need to find a different solution to the bug, e.g.:

a) Splitting ignition-mount.service's ExecStop into a separate ignition-umount.service
b) Disabling ignition-mount's ExecStop in a dropin and run that from combustion
c) Some other way...

(In reply to Thomas Blume from comment #5)
> (In reply to Fabian Vogt from comment #4)
> > Created attachment 857254 [details]
> > systemd debug log
> > 
> > I wrote a combustion script which adds a debug script to the pre-pivot hook
> > on boot. With that, it's visible that initrd-fs.target isn't active yet when
> > dracut-pre-pivot.service runs. The full systemd debug log is attached. Which
> > of the units are active is random, in this particular run sysroot-etc.mount
> > was fast enough and is shown as active.
> 
> There is probably a dependency missing.
> On my tumbleweed machine I can see that dracut-pre-pivot.service has no
> dependency to sysroot-etc.mount, only to sysroot.mount:

That's expected, you'll only see that if you have an /etc mountpoint with
x-initrd.mount in /etc/fstab. It'll then get pulled in through
initrd-fs.target -> initrd.target.

> -->
> # cat /usr/lib/systemd/system/dracut-pre-pivot.service
> #  This file is part of dracut.
> #
> # See dracut.bootup(7) for details
> 
> [Unit]
> Description=dracut pre-pivot and cleanup hook
> Documentation=man:dracut-pre-pivot.service(8)
> DefaultDependencies=no
> After=initrd.target initrd-parse-etc.service sysroot.mount
> After=dracut-initqueue.service dracut-pre-mount.service dracut-mount.service
> Before=initrd-cleanup.service
> Wants=remote-fs.target
> After=remote-fs.target
> --<
> 
> Fabian, can you add such a dep and retry?
Comment 7 Fabian Vogt 2022-03-24 12:34:42 UTC
(In reply to Fabian Vogt from comment #6)
> I found these lines in the log:
> 
> ignition-mount.service: Trying to enqueue job
> ignition-mount.service/stop/replace
> ignition-complete.target: Installed new job ignition-complete.target/stop as
> 88
> initrd.target: Job 1 initrd.target/start finished, result=canceled
> 
> It looks like the fix/workaround for boo#1176746 is the cause. The added
> "systemctl stop ignition-mount.serivce" indirectly stops initrd.target, which
> causes systemd to ignore ordering relative to that as it's no longer part of
> the
> transaction.
> 
> So we need to find a different solution to the bug, e.g.:
> 
> a) Splitting ignition-mount.service's ExecStop into a separate
> ignition-umount.service
> b) Disabling ignition-mount's ExecStop in a dropin and run that from
> combustion

For now we decided to go with b): https://build.opensuse.org/request/show/964654

Ideally we find some better solution for compatibility with ignition upstream.
I'll leave the bug open for that.

> c) Some other way...
> 
> (In reply to Thomas Blume from comment #5)
> > (In reply to Fabian Vogt from comment #4)
> > > Created attachment 857254 [details]
> > > systemd debug log
> > > 
> > > I wrote a combustion script which adds a debug script to the pre-pivot hook
> > > on boot. With that, it's visible that initrd-fs.target isn't active yet when
> > > dracut-pre-pivot.service runs. The full systemd debug log is attached. Which
> > > of the units are active is random, in this particular run sysroot-etc.mount
> > > was fast enough and is shown as active.
> > 
> > There is probably a dependency missing.
> > On my tumbleweed machine I can see that dracut-pre-pivot.service has no
> > dependency to sysroot-etc.mount, only to sysroot.mount:
> 
> That's expected, you'll only see that if you have an /etc mountpoint with
> x-initrd.mount in /etc/fstab. It'll then get pulled in through
> initrd-fs.target -> initrd.target.
> 
> > -->
> > # cat /usr/lib/systemd/system/dracut-pre-pivot.service
> > #  This file is part of dracut.
> > #
> > # See dracut.bootup(7) for details
> > 
> > [Unit]
> > Description=dracut pre-pivot and cleanup hook
> > Documentation=man:dracut-pre-pivot.service(8)
> > DefaultDependencies=no
> > After=initrd.target initrd-parse-etc.service sysroot.mount
> > After=dracut-initqueue.service dracut-pre-mount.service dracut-mount.service
> > Before=initrd-cleanup.service
> > Wants=remote-fs.target
> > After=remote-fs.target
> > --<
> > 
> > Fabian, can you add such a dep and retry?