Bug 1193172 - systemd[1]: klog.service: Failed with result 'exit-code'.
systemd[1]: klog.service: Failed with result 'exit-code'.
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Other
Current
i686 Other
: P5 - None : Normal (vote)
: ---
Assigned To: Johannes Segitz
E-mail List
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2021-11-29 11:08 UTC by Suse User
Modified: 2021-12-09 09:09 UTC (History)
6 users (show)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
journal (1.14 KB, text/plain)
2021-11-29 11:09 UTC, Suse User
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Suse User 2021-11-29 11:08:17 UTC
After updating TW (32-bit) from 20211116-1275.1 to 20211126-1296.1 klog.service fails to start on boot.

Attaching journal excerpt.
Comment 1 Suse User 2021-11-29 11:09:59 UTC
Created attachment 854130 [details]
journal
Comment 2 Simon Vogl 2021-11-29 20:13:02 UTC
Can confirm this issue on TW 20211128.
It most likely happens due to https://en.opensuse.org/openSUSE:Security_Features#Systemd_hardening_effort being applied to https://build.opensuse.org/package/show/Base:System/syslogd, which causes access errors. Disabling the service file hardening for the syslogd services fixes the issue, I created a request to do so:
https://build.opensuse.org/request/show/934625
However, I really don't know if disabling the service file hardening for syslogd is as manageable/acceptable as it is for CUPS (it was also affected), we'll see whether the request gets accepted or not.
Comment 3 Dr. Werner Fink 2021-11-30 08:28:39 UTC
I'd like to know *which* of the settings

 # disabled for this service due to access errors, see boo#1193172
 # ProtectSystem=full
 # ProtectHome=true
 # PrivateDevices=true
 # ProtectHostname=true
 # ProtectClock=true
 # ProtectKernelTunables=true
 # ProtectKernelModules=true
 # ProtectKernelLogs=true
 # ProtectControlGroups=true
 # RestrictRealtime=true

does/do cause this crash?
Comment 4 Johannes Segitz 2021-11-30 12:27:16 UTC
hm, I'm trying to figure this out, but currently the daemon doesn't start for me even none of these in all three service files
Comment 5 Simon Vogl 2021-11-30 13:18:11 UTC
Now this bug is getting really weird. Even after installing the fully protected version of klogd again the bug is not reproducible for me anymore, on a freshly installed system it isn't reproducible either. Perhaps some scheduled task was completed sucessfully and now the system doens't try to perform it anymore?

According to common sense, I would suspect the line

ProtectKernelLogs=true

to be the culprit as the error is likely an access error to kernel logs.
However, I'm not 100% sure.
Comment 6 Johannes Segitz 2021-11-30 13:24:35 UTC
I think I found what cause the problems. Can you please give
https://build.opensuse.org/package/show/home:jsegitz:branches:Base:System/syslogd
a try if that works in your setup?
Comment 7 Johannes Segitz 2021-11-30 13:25:40 UTC
(In reply to Simon Vogl from comment #5)
I also had some issues reproducing this. I think this is caused by how the services relate to each other. ProtectKernelLogs is part of the problem (and a stupid error of me, that one is kind of obvious).
Comment 8 Simon Vogl 2021-11-30 14:10:21 UTC
So I got a few updates:

1. The issue seems to re-appear on kernel updates, in my case it did going from 5.15.3 to 5.15.5.

2. The package from home:jsegitz:branches:Base:System does not fix the issue for me unfortunately.
Comment 9 Dr. Werner Fink 2021-11-30 14:19:12 UTC
Hmm ... maybe there is a new capability about kernel logs?
Comment 10 Johannes Segitz 2021-11-30 15:23:30 UTC
I'm also on 5.15.5-1-default and don't see the problem. I'm not aware of new capabilities. 

@Simon: Which service is failing for you?
Comment 11 Simon Vogl 2021-11-30 16:06:51 UTC
(In reply to Johannes Segitz from comment #10)
> I'm also on 5.15.5-1-default and don't see the problem. I'm not aware of new
> capabilities. 
> 
> @Simon: Which service is failing for you?

klog.service, same as OP reported. All other services are running fine.
Comment 12 James Carter 2021-12-01 05:42:02 UTC
Mee2...  Versions:
openSUSE-release-20211128-1300.1.x86_64
syslog-service-2.0-794.1.noarch
kernel-default-5.15.3-1.3.x86_64

My symptom is that klog.service fails to start when it botches
/bin/sh -c "test -s /var/log/boot.msg && \
    /bin/mv -f /var/log/boot.msg /var/log/boot.omsg || :" 
    (code=exited, status=209/STDOUT)
saying in syslog:
klog.service: Failed to set up standard output: Operation not permitted
klog.service: Failed at step STDOUT spawning /bin/sh: Operation not permitted
Failed to start Early Kernel Boot Messages.

Commenting out all the Protect commands (and RefuseManualStart=true),
systemctl daemon-reload, and restarting klog.service made it start
successfully (although I probably trashed my /var/log/boot.msg).  
I'm testing on a development VM so trashing logs doesn't matter.  

Commenting them out one at a time: "fails" means it has the same
symptom as the unmodified service unit. 
# ProtectSystem=full -- fails.
# ProtectHome=true -- fails.
# PrivateDevices=true -- fails.
# ProtectHostname=true -- fails. 
# ProtectClock=true -- fails.
# ProtectKernelTunables=true -- fails.
# ProtectKernelModules=true -- fails.
# ProtectKernelLogs=true -- fails.
# ProtectControlGroups=true -- fails. 
# RestrictRealtime=true -- fails. 
Summary: if only one of the protections is turned off, it still fails. 
So someone is going to have to do a fair amount of work to get to the
bottom of this.  It might be a good idea to revert the hardening until
the cause of failure is found and fixed.  

klog.service does these steps: 
* Preserves the old /var/log/boot.msg
* Runs dmesg -r saving kernel boot messages in /var/log/boot.msg
* Appends messages from the initrd to /var/log/boot.msg
* Sets the tty on which ongoing kernel log messages are shown
* Sets the log level from the kernel (dmesg -n)

But on the failure we're seeing, it executes none of them.  So after a
fix, if the last two items are important on a particular host, one might
consider restarting klog.service (which would trash /var/log/boot.msg)
or rebooting.  Or doing those steps by hand.  Otherwise, there's no
point in restarting a fixed klog.service.
Comment 13 Dr. Werner Fink 2021-12-01 07:17:49 UTC
Yep there is a reason why klog.service is a oneshot as well it refuses to be started manually (means via e.g. systemctl) as otherwise it would trash/smash the /var/log/boot.msg as well as /var/log/boot.omsg

And klog.service should be able to access the character devices below /dev as otherwise /usr/bin/setlogcons as well as /bin/dmesg can not work ... also boot.msg  and boot.omsg should be in the real upper filesystem and not within a temporary userspace
Comment 14 Johannes Segitz 2021-12-01 07:58:11 UTC
Something must be wrong apart from the hardenings. It doesn't make sense that this service fails if just ProtectHome is set. 

ExecStart=/bin/sh -c "test -s /var/log/boot.msg && /bin/mv -f /var/log/boot.msg /var/log/boot.omsg || :"
ExecStart=/bin/sh -c "/bin/dmesg -r > /var/log/boot.msg"
ExecStart=/bin/sh -c "test -s /dev/shm/initrd.msg && /bin/cat /dev/shm/initrd.msg >> /var/log/boot.msg || :"
ExecStart=/bin/sh -c "test -c /dev/tty$KLOG_CONSOLE && /usr/bin/setlogcons $KLOG_CONSOLE || :"
ExecStart=/bin/sh -c "if test -n \"$CONSOLE_LOGLEVEL\" ; then /bin/dmesg -n $CONSOLE_LOGLEVEL || : ; fi"

nothing here accesses the home directories.

For me doing a daemon-reload wasn't enough to test this. I had to reboot the machine to get to a proper state. I have the service running with

ProtectSystem=full
ProtectHome=true
ProtectHostname=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictRealtime=true

enabled. As this is a one-shot service I don't mind if we remove those hardenings for this service, but it is quite strange. 

Can someone who has the problem please remove all but the ProtectHome hardenings,
reboot and verify that it still fails, then comment out the ExecStart commands one by one (rebooting each time) to see if we can find the command that trips?
Comment 15 Dr. Werner Fink 2021-12-01 09:05:09 UTC
For ProtectHome=true ... maybe a

  grep :/var/log /etc/passwd

shows if there is a HOME below /var/log ... like lirc
Comment 16 Johannes Segitz 2021-12-01 10:16:30 UTC
I tested this with lirc and also had a quick look at the systemd sources. I don't think that custom home directories are considered there.

To make sure I created a service with the hardening options to give me a reverse shell into the service. I'm able to execute all the commands in there without issues.

For the reporters you can do this with
zypper in gnu-netcat
cat << EOF > /etc/systemd/system/test.service
[Unit]
Description=Test service
Requires=local-fs.target
After=local-fs.target

[Service]
ProtectSystem=full
ProtectHome=true
ProtectHostname=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictRealtime=true
Type=oneshot
RemainAfterExit=yes
ExecStart= /usr/bin/nc 127.0.0.1 3000 -e /bin/bash

[Install]
WantedBy=basic.target
Alias=bootmsg.service
EOF

in one shell start the listener:
nc -nlp 3000

then start the service:
systemctl daemon-reload
systemctl start test

you now should have a shell where you can go through the commands. For me they all work (but for me the service also starts ;))
Comment 17 Dr. Werner Fink 2021-12-01 10:32:13 UTC
Within this shell your are able to change e.g. the default log tty if  PrivateDevices= is not touched.   Is this visible *outside* then?

Btw: There still exists loop hole if SystemCallFilter= is not used to protect all those Private, Protect, ReadOnly, ReadWrite, and Inaccessible options
Comment 18 Suse User 2021-12-03 09:24:09 UTC
Johannes,

I changed my local klog.service and klogd.service to match those from the link you provided. I commented manuallly the lines which are not present in your files. Then I rebooted.

Result: the error messages are gone and klog.service starts normally now.

Here is the changed configuration which works:
# grep -E '^#?Protect' klog*.service
klog.service:ProtectSystem=full
klog.service:ProtectHome=true
klog.service:ProtectHostname=true
klog.service:#ProtectClock=true
klog.service:ProtectKernelTunables=true
klog.service:ProtectKernelModules=true
klog.service:#ProtectKernelLogs=true
klog.service:ProtectControlGroups=true
klogd.service:ProtectSystem=full
klogd.service:ProtectHome=true
klogd.service:ProtectHostname=true
klogd.service:ProtectClock=true
klogd.service:ProtectKernelTunables=true
klogd.service:ProtectKernelModules=true
klogd.service:#ProtectKernelLogs=true
klogd.service:ProtectControlGroups=true
Comment 19 Simon Vogl 2021-12-03 10:38:09 UTC
Well well well well.....
After retrying the patch in my home repository and installing that, it works, even after kernel updates!

Turns out I made some severe mistake building the package I used previously that made the service files revert to the old state, my sincere sorry.

Johannes , the request 934727 works fine, just my brain unfortunately didn't.
Comment 20 Johannes Segitz 2021-12-03 13:46:27 UTC
good to hear, thank you for testing.

@Werner: Can you please accept the request?
Comment 21 Dr. Werner Fink 2021-12-03 14:02:18 UTC
(In reply to Johannes Segitz from comment #20)
> good to hear, thank you for testing.
> 
> @Werner: Can you please accept the request?

done
Comment 22 OBSbugzilla Bot 2021-12-03 15:40:06 UTC
This is an autogenerated message for OBS integration:
This bug (1193172) was mentioned in
https://build.opensuse.org/request/show/935468 Factory / syslogd
Comment 23 Simon Vogl 2021-12-08 14:43:50 UTC
I can confirm the bug is fixed in TW 20211206. Yay!
Comment 24 Johannes Segitz 2021-12-09 09:09:23 UTC
thanks for the confirmation