Bugzilla – Bug 1193172
systemd: klog.service: Failed with result 'exit-code'.
Last modified: 2021-12-09 09:09:23 UTC
After updating TW (32-bit) from 20211116-1275.1 to 20211126-1296.1 klog.service fails to start on boot.
Attaching journal excerpt.
Created attachment 854130 [details]
Can confirm this issue on TW 20211128.
It most likely happens due to https://en.opensuse.org/openSUSE:Security_Features#Systemd_hardening_effort being applied to https://build.opensuse.org/package/show/Base:System/syslogd, which causes access errors. Disabling the service file hardening for the syslogd services fixes the issue, I created a request to do so:
However, I really don't know if disabling the service file hardening for syslogd is as manageable/acceptable as it is for CUPS (it was also affected), we'll see whether the request gets accepted or not.
I'd like to know *which* of the settings
# disabled for this service due to access errors, see boo#1193172
does/do cause this crash?
hm, I'm trying to figure this out, but currently the daemon doesn't start for me even none of these in all three service files
Now this bug is getting really weird. Even after installing the fully protected version of klogd again the bug is not reproducible for me anymore, on a freshly installed system it isn't reproducible either. Perhaps some scheduled task was completed sucessfully and now the system doens't try to perform it anymore?
According to common sense, I would suspect the line
to be the culprit as the error is likely an access error to kernel logs.
However, I'm not 100% sure.
I think I found what cause the problems. Can you please give
a try if that works in your setup?
(In reply to Simon Vogl from comment #5)
I also had some issues reproducing this. I think this is caused by how the services relate to each other. ProtectKernelLogs is part of the problem (and a stupid error of me, that one is kind of obvious).
So I got a few updates:
1. The issue seems to re-appear on kernel updates, in my case it did going from 5.15.3 to 5.15.5.
2. The package from home:jsegitz:branches:Base:System does not fix the issue for me unfortunately.
Hmm ... maybe there is a new capability about kernel logs?
I'm also on 5.15.5-1-default and don't see the problem. I'm not aware of new capabilities.
@Simon: Which service is failing for you?
(In reply to Johannes Segitz from comment #10)
> I'm also on 5.15.5-1-default and don't see the problem. I'm not aware of new
> @Simon: Which service is failing for you?
klog.service, same as OP reported. All other services are running fine.
My symptom is that klog.service fails to start when it botches
/bin/sh -c "test -s /var/log/boot.msg && \
/bin/mv -f /var/log/boot.msg /var/log/boot.omsg || :"
saying in syslog:
klog.service: Failed to set up standard output: Operation not permitted
klog.service: Failed at step STDOUT spawning /bin/sh: Operation not permitted
Failed to start Early Kernel Boot Messages.
Commenting out all the Protect commands (and RefuseManualStart=true),
systemctl daemon-reload, and restarting klog.service made it start
successfully (although I probably trashed my /var/log/boot.msg).
I'm testing on a development VM so trashing logs doesn't matter.
Commenting them out one at a time: "fails" means it has the same
symptom as the unmodified service unit.
# ProtectSystem=full -- fails.
# ProtectHome=true -- fails.
# PrivateDevices=true -- fails.
# ProtectHostname=true -- fails.
# ProtectClock=true -- fails.
# ProtectKernelTunables=true -- fails.
# ProtectKernelModules=true -- fails.
# ProtectKernelLogs=true -- fails.
# ProtectControlGroups=true -- fails.
# RestrictRealtime=true -- fails.
Summary: if only one of the protections is turned off, it still fails.
So someone is going to have to do a fair amount of work to get to the
bottom of this. It might be a good idea to revert the hardening until
the cause of failure is found and fixed.
klog.service does these steps:
* Preserves the old /var/log/boot.msg
* Runs dmesg -r saving kernel boot messages in /var/log/boot.msg
* Appends messages from the initrd to /var/log/boot.msg
* Sets the tty on which ongoing kernel log messages are shown
* Sets the log level from the kernel (dmesg -n)
But on the failure we're seeing, it executes none of them. So after a
fix, if the last two items are important on a particular host, one might
consider restarting klog.service (which would trash /var/log/boot.msg)
or rebooting. Or doing those steps by hand. Otherwise, there's no
point in restarting a fixed klog.service.
Yep there is a reason why klog.service is a oneshot as well it refuses to be started manually (means via e.g. systemctl) as otherwise it would trash/smash the /var/log/boot.msg as well as /var/log/boot.omsg
And klog.service should be able to access the character devices below /dev as otherwise /usr/bin/setlogcons as well as /bin/dmesg can not work ... also boot.msg and boot.omsg should be in the real upper filesystem and not within a temporary userspace
Something must be wrong apart from the hardenings. It doesn't make sense that this service fails if just ProtectHome is set.
ExecStart=/bin/sh -c "test -s /var/log/boot.msg && /bin/mv -f /var/log/boot.msg /var/log/boot.omsg || :"
ExecStart=/bin/sh -c "/bin/dmesg -r > /var/log/boot.msg"
ExecStart=/bin/sh -c "test -s /dev/shm/initrd.msg && /bin/cat /dev/shm/initrd.msg >> /var/log/boot.msg || :"
ExecStart=/bin/sh -c "test -c /dev/tty$KLOG_CONSOLE && /usr/bin/setlogcons $KLOG_CONSOLE || :"
ExecStart=/bin/sh -c "if test -n \"$CONSOLE_LOGLEVEL\" ; then /bin/dmesg -n $CONSOLE_LOGLEVEL || : ; fi"
nothing here accesses the home directories.
For me doing a daemon-reload wasn't enough to test this. I had to reboot the machine to get to a proper state. I have the service running with
enabled. As this is a one-shot service I don't mind if we remove those hardenings for this service, but it is quite strange.
Can someone who has the problem please remove all but the ProtectHome hardenings,
reboot and verify that it still fails, then comment out the ExecStart commands one by one (rebooting each time) to see if we can find the command that trips?
For ProtectHome=true ... maybe a
grep :/var/log /etc/passwd
shows if there is a HOME below /var/log ... like lirc
I tested this with lirc and also had a quick look at the systemd sources. I don't think that custom home directories are considered there.
To make sure I created a service with the hardening options to give me a reverse shell into the service. I'm able to execute all the commands in there without issues.
For the reporters you can do this with
zypper in gnu-netcat
cat << EOF > /etc/systemd/system/test.service
ExecStart= /usr/bin/nc 127.0.0.1 3000 -e /bin/bash
in one shell start the listener:
nc -nlp 3000
then start the service:
systemctl start test
you now should have a shell where you can go through the commands. For me they all work (but for me the service also starts ;))
Within this shell your are able to change e.g. the default log tty if PrivateDevices= is not touched. Is this visible *outside* then?
Btw: There still exists loop hole if SystemCallFilter= is not used to protect all those Private, Protect, ReadOnly, ReadWrite, and Inaccessible options
I changed my local klog.service and klogd.service to match those from the link you provided. I commented manuallly the lines which are not present in your files. Then I rebooted.
Result: the error messages are gone and klog.service starts normally now.
Here is the changed configuration which works:
# grep -E '^#?Protect' klog*.service
Well well well well.....
After retrying the patch in my home repository and installing that, it works, even after kernel updates!
Turns out I made some severe mistake building the package I used previously that made the service files revert to the old state, my sincere sorry.
Johannes , the request 934727 works fine, just my brain unfortunately didn't.
good to hear, thank you for testing.
@Werner: Can you please accept the request?
(In reply to Johannes Segitz from comment #20)
> good to hear, thank you for testing.
> @Werner: Can you please accept the request?
This is an autogenerated message for OBS integration:
This bug (1193172) was mentioned in
https://build.opensuse.org/request/show/935468 Factory / syslogd
I can confirm the bug is fixed in TW 20211206. Yay!
thanks for the confirmation