Bug 1225738 (CVE-2024-36888) - VUL-0: CVE-2024-36888: kernel: workqueue: fix selection of wake_cpu in kick_pool()
Summary: VUL-0: CVE-2024-36888: kernel: workqueue: fix selection of wake_cpu in kick_p...
Status: RESOLVED INVALID
Alias: CVE-2024-36888
Product: SUSE Security Incidents
Classification: Novell Products
Component: Incidents
Version: unspecified
Hardware: Other Other
Priority: P3 - Medium
Severity: Normal
Target Milestone: ---
Assignee: Security Team bot
QA Contact: Security Team bot
URL: https://smash.suse.de/issue/408220/
Whiteboard: CVSSv3.1:SUSE:CVE-2024-36888:4.1:(AV:...
Keywords:
Depends on:
Blocks:
 
Reported: 2024-05-31 12:01 UTC by SMASH SMASH
Modified: 2024-06-05 18:46 UTC
CC List: 2 users

See Also:
Found By: Security Response Team
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description SMASH SMASH 2024-05-31 12:01:39 UTC
In the Linux kernel, the following vulnerability has been resolved:

workqueue: Fix selection of wake_cpu in kick_pool()

With cpu_possible_mask=0-63 and cpu_online_mask=0-7 the following
kernel oops was observed:

smp: Bringing up secondary CPUs ...
smp: Brought up 1 node, 8 CPUs
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000803
[..]
 Call Trace:
arch_vcpu_is_preempted+0x12/0x80
select_idle_sibling+0x42/0x560
select_task_rq_fair+0x29a/0x3b0
try_to_wake_up+0x38e/0x6e0
kick_pool+0xa4/0x198
__queue_work.part.0+0x2bc/0x3a8
call_timer_fn+0x36/0x160
__run_timers+0x1e2/0x328
__run_timer_base+0x5a/0x88
run_timer_softirq+0x40/0x78
__do_softirq+0x118/0x388
irq_exit_rcu+0xc0/0xd8
do_ext_irq+0xae/0x168
ext_int_handler+0xbe/0xf0
psw_idle_exit+0x0/0xc
default_idle_call+0x3c/0x110
do_idle+0xd4/0x158
cpu_startup_entry+0x40/0x48
rest_init+0xc6/0xc8
start_kernel+0x3c4/0x5e0
startup_continue+0x3c/0x50

The crash is caused by calling arch_vcpu_is_preempted() for an offline
CPU. To avoid this, select the CPU with cpumask_any_and_distribute(),
which intersects __pod_cpumask with cpu_online_mask. If no online CPU is
left in the pool, skip the assignment.

tj: This doesn't fully fix the bug, as CPUs can still go down between picking
the target CPU and the wake call. Fixing that likely requires adding a
cpu_online() test to either the sched or the s390 arch code. However, regardless
of how that is fixed, workqueue shouldn't pick a CPU that isn't online, as
that would result in unpredictable and worse behavior.

References:
http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2024-36888
https://git.kernel.org/pub/scm/linux/security/vulns.git/plain/cve/published/2024/CVE-2024-36888.mbox
https://git.kernel.org/stable/c/c57824d4fe07c2131f8c48687cbd5ee2be60c767
https://git.kernel.org/stable/c/6d559e70b3eb6623935cbe7f94c1912c1099777b
https://git.kernel.org/stable/c/57a01eafdcf78f6da34fad9ff075ed5dfdd9f420
https://www.cve.org/CVERecord?id=CVE-2024-36888
Comment 1 Miroslav Franc 2024-06-05 12:09:15 UTC
This pertains only to the stable branch, which already includes the fix. Switching back to the security team.