Bug 1227282 - [docs][SELinux]: kernel params security=selinux selinux=1 appends selinux behind bpf, leading to broken system in Leap 15.6
Status: REOPENED
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Documentation
Version: Leap 15.6
Hardware: Other Other
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: Kernel Bugs
QA Contact: Cathy Hu
URL:
Whiteboard: https://jira.suse.com/browse/DOCTEAM-...
Keywords:
Depends on: 1226937
Blocks:
Reported: 2024-07-02 10:47 UTC by Cathy Hu
Modified: 2024-07-18 15:03 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description Cathy Hu 2024-07-02 10:47:37 UTC
+++ This bug was initially created as a clone of Bug #1226937 +++

see https://bugzilla.suse.com/show_bug.cgi?id=1226937#c5
Comment 1 Cathy Hu 2024-07-04 13:12:16 UTC
I can reproduce it, but I am still working on finding the root cause.
Comment 2 Cathy Hu 2024-07-09 08:15:33 UTC
Reassigning to kernel people:

In Leap 15.6 with kernel version 6.4.0-150600.23.7.3 (the current release), when I set the following kernel parameters in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub:

security=selinux selinux=1

this results in the error reported by Felix: https://bugzilla.suse.com/show_bug.cgi?id=1226937#c5

I think it is because the kernel appends `selinux` after `bpf`, like this:

/sys/kernel/security/lsm -> lockdown,capability,bpf,selinux

However, selinux should be loaded before bpf. When I overwrite the lsm list via `lsm=` parameter like this, it works and the system boots up:

lsm=selinux,bpf selinux=1

/sys/kernel/security/lsm -> lockdown,capability,selinux,bpf


In Tumbleweed (kernel-default-6.9.7-1.1), this seems to be fixed, so setting security=selinux selinux=1 results in:
/sys/kernel/security/lsm -> lockdown,capability,landlock,yama,selinux,bpf,ima,evm


Can this be fixed on the kernel side? Please let me know if you need more info or I am doing something really wrong :D Thanks!
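
A quick way to check a given machine for the ordering described above is sketched below. This is only an illustrative helper, not part of any SUSE tooling: the function name `selinux_before_bpf` is made up for this example, and it assumes securityfs is mounted so that /sys/kernel/security/lsm is readable.

```python
from pathlib import Path


def selinux_before_bpf(lsm_list: str) -> bool:
    """Return False only if both LSMs are listed and bpf precedes selinux.

    `lsm_list` is the comma-separated string found in
    /sys/kernel/security/lsm, e.g. "lockdown,capability,selinux,bpf".
    (Hypothetical helper name, used for illustration only.)
    """
    names = [n.strip() for n in lsm_list.split(",") if n.strip()]
    if "selinux" not in names or "bpf" not in names:
        # Nothing to compare: the ordering problem needs both LSMs.
        return True
    return names.index("selinux") < names.index("bpf")


if __name__ == "__main__":
    try:
        active = Path("/sys/kernel/security/lsm").read_text().strip()
    except OSError as exc:
        # securityfs may be missing or unreadable on this machine.
        print(f"cannot read LSM list: {exc}")
    else:
        verdict = "ok" if selinux_before_bpf(active) else "bpf precedes selinux"
        print(f"{active} -> {verdict}")
```

Applied to the two lists quoted in this comment, the helper accepts `lockdown,capability,selinux,bpf` and rejects `lockdown,capability,bpf,selinux`.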
Comment 3 Takashi Iwai 2024-07-09 13:31:50 UTC
My wild guess is that it could be the difference of CONFIG_LSM.

SLE15-SP6 / Leap 15.6 config doesn't contain selinux in its CONFIG_LSM while TW has both selinux and bpf in CONFIG_LSM, and selinux is defined before bpf.
The kernel appends the LSM specified via security= option if it isn't present, and that's the case for SLE15-SP6 / Leap 15.6.

The ALP-current (SL Micro 6.0) kernel has selinux in its CONFIG_LSM, while its code base is more or less identical to SLE15-SP6.  Could you verify whether the problem persists with the ALP-current kernel?
Comment 4 Cathy Hu 2024-07-10 08:35:16 UTC
The problem does not happen in SL Micro 6.0. I just set up a machine with the current image, which has SELinux enabled by default:

kernel version 6.4.0-17.1

/etc/default/grub:
security=selinux selinux=1

localhost:~ # cat /sys/kernel/security/lsm
lockdown,capability,landlock,yama,selinux,bpf

Please let me know if you need more information :)
Comment 5 Takashi Iwai 2024-07-10 11:27:53 UTC
So it's indeed about CONFIG_LSM, the default lsm entries.

Unlike the SLM-6.0 (ALP-current) kernel, SLE15-SP6 keeps a minimalistic CONFIG_LSM, as suggested at:
  https://bugzilla.suse.com/show_bug.cgi?id=1205603#c16

Adding Jiri to Cc.

Would adding only selinux to CONFIG_LSM be acceptable?

Meanwhile, I wonder whether it's a regression in SLE15-SP6.  Didn't SLE15-SP5 show the same problem at all?
Comment 6 David Disseldorp 2024-07-10 11:57:47 UTC
(In reply to Takashi Iwai from comment #5)
...
> Meanwhile, I wonder whether it's a regression in SLE15-SP6.  Didn't
> SLE15-SP5 show the same problem at all?

"bpf" appearing before the appended security=selinux LSM could well be considered a regression due to:

origin/SLE15-SP5:config/x86_64/default:CONFIG_LSM="integrity,apparmor"
origin/SLE15-SP6:config/x86_64/default:CONFIG_LSM="integrity,apparmor,bpf"

I don't know what the best way to proceed here would be - perhaps we just document the "lsm=" workaround?
Comment 7 Takashi Iwai 2024-07-10 12:32:18 UTC
Ah thanks, indeed bpf has been part of CONFIG_LSM only since SLE15-SP6, bsc#1219440.

If adding selinux wouldn't lead to a significant regression, we can update CONFIG_LSM on SLE15-SP6 as well.  i.e. CONFIG_LSM="integrity,apparmor,selinux,bpf"

But I'm not 100% sure about it.
Comment 8 Felix Niederwanger 2024-07-10 12:49:48 UTC
(In reply to Takashi Iwai from comment #5)
> Meanwhile, I wonder whether it's a regression in SLE15-SP6.  Didn't
> SLE15-SP5 show the same problem at all?

I think I was the first person to test SELinux here, and I did this by myself on Leap 15.6. Since SELinux is not officially supported but comes as-is, I think nobody else has tested this so far. I will also ask QE Security whether they know more.
Comment 9 Cathy Hu 2024-07-10 13:05:38 UTC
Ahh okay, thanks for the investigation!

Hmm, so SELinux on Leap 15.6 is more or less a "tech preview" anyway; AppArmor is still the default MAC there.

Probably it is the safest way then if I ask the documentation team to change the setup docs to `lsm=selinux,bpf selinux=1` for 15.6 before we introduce more regressions?
Comment 10 David Disseldorp 2024-07-10 13:23:38 UTC
(In reply to Cathy Hu from comment #9)
...
> Probably it is the safest way then if I ask the documentation team to change
> the setup docs to `lsm=selinux,bpf selinux=1` for 15.6 before we introduce
> more regressions?

I think that would likely be the best approach. We could also consider removing the new "bpf" entry, but I'd like to first learn about what systemd (and others) want it for. I've asked via https://bugzilla.suse.com/show_bug.cgi?id=1205603#c24 .
Comment 11 Cathy Hu 2024-07-10 14:52:37 UTC
Thanks, I will reassign this to the docs team.

@docs team, could you change the SELinux documentation for Leap 15.6 (https://doc.opensuse.org/documentation/leap/security/html/book-security/cha-selinux.html#sec-selinux-getpolicy) so that the parameters added to GRUB_CMDLINE_LINUX_DEFAULT= are:

lsm=selinux,bpf selinux=1

instead of

(wrong) security=selinux selinux=1

This applies only to 15.6; versions before 15.6 should stay as before.

Thanks a lot :)
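
For illustration, the requested substitution on a GRUB_CMDLINE_LINUX_DEFAULT value can be sketched as follows. The function name `switch_to_lsm_parameter` and the sample option strings are assumptions made for this example; the sketch splits on whitespace only, so it does not handle quoted option values containing spaces, and it does not edit /etc/default/grub itself.

```python
def switch_to_lsm_parameter(cmdline: str) -> str:
    """Replace `security=selinux` with `lsm=selinux,bpf` in a kernel command line.

    Hypothetical helper for illustration. The command line is returned
    unchanged if it already carries an explicit lsm= option or if it does
    not contain security=selinux.
    """
    options = cmdline.split()
    if any(opt.startswith("lsm=") for opt in options):
        return cmdline  # an explicit lsm= list is already in place
    if "security=selinux" not in options:
        return cmdline  # nothing to replace
    replaced = ["lsm=selinux,bpf" if opt == "security=selinux" else opt
                for opt in options]
    return " ".join(replaced)


if __name__ == "__main__":
    # Sample value; the leading options are made up for the example.
    before = "splash=silent quiet security=selinux selinux=1"
    print(switch_to_lsm_parameter(before))
```

Other options such as `selinux=1` are kept as they are. After changing /etc/default/grub, the GRUB configuration still has to be regenerated for the new parameters to take effect.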
Comment 12 Amrita Sakthivel 2024-07-10 15:34:45 UTC
Thanks, Cathy, for adding me to the Cc. I will take a look.
Comment 13 Jiri Wiesner 2024-07-10 15:52:54 UTC
(In reply to Takashi Iwai from comment #7)
> If adding selinux wouldn't lead to a significant regression, we can update
> CONFIG_LSM on SLE15-SP6 as well.  i.e.
> CONFIG_LSM="integrity,apparmor,selinux,bpf"

I spent some time reading the parsing code in ordered_lsm_parse(). Adding selinux after apparmor and before bpf will make it possible to boot a system where security=selinux has been passed to the kernel. With selinux in the mentioned position in CONFIG_LSM, these outcomes are expected:
1. When security=apparmor is passed to the kernel, only apparmor will be enabled, as it is the selected major LSM.
2. When no security= argument is passed to the kernel, only apparmor will be enabled, as it is the first exclusive LSM in the CONFIG_LSM option.
3. When security=selinux is passed to the kernel, only selinux will be enabled, as it is the selected major LSM.

In the above 3 cases, the order of the LSMs will be determined by the CONFIG_LSM option. It should be noted that the security= argument is a legacy approach and the lsm= argument should be the preferred way to specify the LSMs to enable as well as their ordering. On the other hand, the lsm= argument makes it possible for users to get it wrong and end up with a system that does not boot, e.g. passing lsm=bpf,selinux.
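
The outcomes listed above can be reproduced with a small model. This is a deliberately simplified sketch of the behaviour described in this thread, not a port of ordered_lsm_parse(): the function name and the three name sets are assumptions for illustration, fixed-position LSMs (lockdown, capability, integrity) are left out, and only the resulting order of the remaining enabled LSMs is modelled.

```python
# Assumed classification, for illustration only.
LEGACY_MAJORS = {"selinux", "apparmor", "smack", "tomoyo"}
EXCLUSIVE = {"selinux", "apparmor", "smack"}
FIXED_POSITION = {"lockdown", "capability", "integrity"}  # not modelled


def model_lsm_order(config_lsm, security=None, lsm=None):
    """Return the modelled order of enabled, freely ordered LSMs.

    config_lsm: value of CONFIG_LSM, e.g. "integrity,apparmor,bpf"
    security:   value of the legacy security= parameter, if any
    lsm:        value of the lsm= parameter, if any (takes precedence
                over both CONFIG_LSM and security= in this model)
    """
    order = lsm if lsm is not None else config_lsm
    chosen_major = None if lsm is not None else security
    result = []

    def append(name):
        if name in result:
            return
        # Only the first exclusive LSM in the list gets enabled.
        if name in EXCLUSIVE and any(r in EXCLUSIVE for r in result):
            return
        result.append(name)

    for name in (n.strip() for n in order.split(",")):
        if not name or name in FIXED_POSITION:
            continue
        # security= disables every other legacy major LSM.
        if chosen_major and name in LEGACY_MAJORS and name != chosen_major:
            continue
        append(name)

    # A major LSM chosen via security= but absent from the list is appended.
    if chosen_major:
        append(chosen_major)
    return result


if __name__ == "__main__":
    print(model_lsm_order("integrity,apparmor,bpf", security="selinux"))
    print(model_lsm_order("integrity,apparmor,selinux,bpf", security="selinux"))
```

With the SLE15-SP6 GA value "integrity,apparmor,bpf" and security=selinux, the model puts selinux after bpf; with "integrity,apparmor,selinux,bpf" it puts selinux first. In cases 1 and 2 the model also lists bpf, since it is not a major LSM and stays enabled alongside apparmor.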
Comment 14 Amrita Sakthivel 2024-07-10 16:06:02 UTC
Cathy, Jiri,

Based on comment 13 (specifically: "On the other hand, the lsm= argument makes it possible for users to get it wrong and end up with a system that does not boot, e.g. passing lsm=bpf,selinux."), I am a little confused.

Can you please confirm that I need to update to:
lsm=selinux,bpf selinux=1
Comment 15 Jiri Wiesner 2024-07-10 16:42:20 UTC
(In reply to Amrita Sakthivel from comment #14)
> Cathy, Jiri,
>
> Based on comment 13 (specifically: "On the other hand, the lsm= argument
> makes it possible for users to get it wrong and end up with a system that
> does not boot, e.g. passing lsm=bpf,selinux."), I am a little confused.

I would say this proves my point.

> Can you please confirm that I need to update to:
> lsm=selinux,bpf selinux=1
 
Yes, this is the needed change. It will work on the GA release of 15sp6 as well as on later updates. The order of the LSMs in the lsm= parameter matters: lsm=selinux,bpf is right and will work, whereas lsm=bpf,selinux will result in a system that does not boot. security=selinux does not work on the GA release of 15sp6, but it might work on later releases because I think we will change CONFIG_LSM to "integrity,apparmor,selinux,bpf". lsm=selinux,bpf will always work, but there is a slight possibility of someone getting the order of the LSMs wrong (because the person might think it does not matter).
 
I must admit I do not understand the exact reason why a system that is passed lsm=bpf,selinux does not boot. I think it should be fixed along with changing the documentation. The bpf and selinux LSMs are initialized very early in the boot sequence and the root switch happens much later:
> [    0.217801] LSM: initializing lsm=lockdown,capability,bpf,selinux,integrity
> [    0.217801] LSM support for eBPF active
> [    0.217801] SELinux:  Initializing.
I suspect it's not the kernel causing this but I don't really know.
Comment 16 Jiri Wiesner 2024-07-11 09:18:08 UTC
I have tried reproducing the issue. It cannot be reproduced without installing the SELinux policies no matter the value of the lsm= or security= parameters. I can reproduce it after installing the SELinux policies and passing "lsm=bpf,selinux selinux=1" to the kernel. The system boots up when SELinux policies are installed but SELinux is off - "lsm=bpf,selinux selinux=0" passed to the kernel. So, it takes a specific ordering of the LSMs, SELinux policies and SELinux being on to get a failure. I tried a simple command to search the SELinux policies:
> grep bpf $(for p in restorecond policycoreutils setools-console selinux-policy-targeted selinux-policy-devel; do rpm -ql $p; done) 2> /dev/null
but I could not make much sense of the output. Could someone in charge of SELinux look at this issue and clarify the root cause of the failure to boot up?
Comment 17 Amrita Sakthivel 2024-07-11 15:50:59 UTC
Thanks, Cathy, for confirming. I will go ahead with your suggested change.
Comment 18 Amrita Sakthivel 2024-07-12 06:31:35 UTC
Merged
Comment 19 Cathy Hu 2024-07-12 09:17:58 UTC
@Amrita, thanks for changing the docs :)

@Jiri
The issue happens in permissive mode as well, so it is probably not the policy, as the policy should not block anything in permissive mode (add `security=selinux selinux=1 enforcing=0` to test).

I can have a look next week to see whether it is something else in the userspace setup of SELinux that is not the policy.
Comment 20 Jiri Wiesner 2024-07-12 10:11:35 UTC
Thanks. I think userspace should not be sensitive to the order of LSMs. The current state is brittle: a user error can cause a failure to boot. AFAIK, the order of LSMs matters for initialization. The kernel carries out initialization early in the boot sequence, before the init process is started. If it's not the userspace component of SELinux, then it's probably systemd that should be improved.
Comment 21 Fabian Vogt 2024-07-15 14:18:27 UTC
Is this just bug 1197746 again?
Comment 22 Cathy Hu 2024-07-16 08:40:25 UTC
Looks like it, and the root cause already was debugged as well: https://bugzilla.suse.com/show_bug.cgi?id=1197746#c1

@Jiri, i dont think we can change that in userspace, but please let me know if i misunderstood something
Comment 23 Jiri Wiesner 2024-07-18 15:03:06 UTC
(In reply to Fabian Vogt from comment #21)
> Is this just bug 1197746 again?

Thank you, Fabian. That helped.

(In reply to Cathy Hu from comment #22)
> Looks like it, and the root cause already was debugged as well:
> https://bugzilla.suse.com/show_bug.cgi?id=1197746#c1

I needed to see more data to understand how the errors reported in user space are caused by the LSMs. There are many errors reported during the boot sequence. I just took the first error:
systemd[1]: Failed to set SELinux security context system_u:object_r:device_t:s0 for /dev/core: Invalid argument
and enabled tracing for the kernel:
quiet tp_printk log_buf_len=64M trace_buf_size=8M kprobe_event=p,do_sys_openat2,path=+0(%si):ustring\;r,do_sys_openat2,\$retval\;p,ksys_write,fd=%di,count=%dx\;r,ksys_write,ret=\$retval\;p,security_setprocattr,lsm=+0(%di):string,lsmp=%di,name=+0(%si):string\;r,security_setprocattr,ret=\$retval

The error is reported after systemd fails to execute setfscreatecon_raw(), which is implemented by libselinux. The output of the trace shows that it was the write syscall that returned an error:
> kernel: p_do_sys_openat2_0: (do_sys_openat2+0x0/0x320) path="/proc/thread-self/attr/fscreate"
> kernel: r_do_sys_openat2_0: (do_sys_open+0x57/0x80 <- do_sys_openat2) arg1=0x4
> kernel: p_ksys_write_0: (ksys_write+0x0/0xe0) fd=0x4 count=0x1e
> kernel: p_security_setprocattr_0: (security_setprocattr+0x0/0x70) lsm=(fault) lsmp=0x0 name="fscreate"
> kernel: r_security_setprocattr_0: (proc_pid_attr_write+0x10d/0x160 <- security_setprocattr) ret=0xffffffea
> kernel: r_ksys_write_0: (do_syscall_64+0x5b/0x80 <- ksys_write) ret=0xffffffffffffffea
> systemd[1]: Failed to set SELinux security context system_u:object_r:device_t:s0 for /dev/core: Invalid argument
The error (ret=0xffffffea, which is -22 or -EINVAL, matching "Invalid argument") originated from security_setprocattr(), which executed the hook of the bpf LSM.

This is a successful boot:
> kernel: [    6.779348][    T1] p_do_sys_openat2_0: (do_sys_openat2+0x0/0x320) path="/proc/thread-self/attr/fscreate"
> kernel: [    6.779478][    T1] r_do_sys_openat2_0: (do_sys_open+0x57/0x80 <- do_sys_openat2) arg1=0x4
> kernel: [    6.779486][    T1] p_ksys_write_0: (ksys_write+0x0/0xe0) fd=0x4 count=0x1e
> kernel: [    6.779497][    T1] p_security_setprocattr_0: (security_setprocattr+0x0/0x70) lsm=(fault) lsmp=0x0 name="fscreate"
> kernel: [    6.779508][    T1] r_security_setprocattr_0: (proc_pid_attr_write+0x10d/0x160 <- security_setprocattr) ret=0x1e
> kernel: [    6.779513][    T1] r_ksys_write_0: (do_syscall_64+0x5b/0x80 <- ksys_write) ret=0x1e
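
The return values in the two traces can be decoded with a few lines of Python. This is just a reading aid for the hex values above; the helper name `as_signed` is made up for the example.

```python
import errno


def as_signed(value: int, bits: int) -> int:
    """Interpret an unsigned register value as a two's-complement integer."""
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


if __name__ == "__main__":
    # Failing boot: return values of security_setprocattr() and ksys_write().
    hook_ret = as_signed(0xFFFFFFEA, 32)
    write_ret = as_signed(0xFFFFFFFFFFFFFFEA, 64)
    print(hook_ret, write_ret, errno.errorcode[-hook_ret])

    # Successful boot: write() returns the number of bytes written, which
    # equals the count argument (0x1e) seen in the trace.
    context = "system_u:object_r:device_t:s0"
    print(0x1E, len(context) + 1)
```

Both failing values decode to -22, which is EINVAL. In the successful boot the write returns 0x1e = 30, the same as the requested count, which is consistent with the 29-character context string plus one terminating byte.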

> i dont think we can change that in userspace

Yes, it's on the kernel side where improvements need to be made. I have submitted a commit for 15sp6 changing CONFIG_LSM to "integrity,apparmor,selinux,bpf". This will resolve the issue with the security=selinux parameter, but it will not protect against user error when passing lsm=bpf,selinux to the kernel. I don't think there is a better solution at the moment. Changing the default value of selected LSM hooks would be required (I expect resistance against that upstream), or SUSE would have to provide BPF code for the LSM hooks causing the boot failure. Both solutions seem like overkill when all that is needed is getting the order of LSMs right.