Bug 1227893 (CVE-2024-40976) - VUL-0: CVE-2024-40976: kernel: drm/lima: mask irqs in timeout path before hard reset
Summary: VUL-0: CVE-2024-40976: kernel: drm/lima: mask irqs in timeout path before har...
Status: NEW
Alias: CVE-2024-40976
Product: SUSE Security Incidents
Classification: Novell Products
Component: Incidents (show other bugs)
Version: unspecified
Hardware: Other Other
: P3 - Medium : Normal
Target Milestone: ---
Assignee: Kernel Bugs
QA Contact: Security Team bot
URL: https://smash.suse.de/issue/413903/
Whiteboard: CVSSv3.1:SUSE:CVE-2024-40976:4.7:(AV:...
Keywords:
Depends on:
Blocks:
 
Reported: 2024-07-16 08:43 UTC by SMASH SMASH
Modified: 2024-07-19 16:22 UTC (History)
1 user (show)

See Also:
Found By: Security Response Team
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description SMASH SMASH 2024-07-16 08:43:40 UTC
In the Linux kernel, the following vulnerability has been resolved:

drm/lima: mask irqs in timeout path before hard reset

There is a race condition in which a rendering job might take just long
enough to trigger the drm sched job timeout handler but also still
complete before the hard reset is done by the timeout handler.
This runs into race conditions not expected by the timeout handler.
In some very specific cases it currently may result in a refcount
imbalance on lima_pm_idle, with a stack dump such as:

[10136.669170] WARNING: CPU: 0 PID: 0 at drivers/gpu/drm/lima/lima_devfreq.c:205 lima_devfreq_record_idle+0xa0/0xb0
...
[10136.669459] pc : lima_devfreq_record_idle+0xa0/0xb0
...
[10136.669628] Call trace:
[10136.669634]  lima_devfreq_record_idle+0xa0/0xb0
[10136.669646]  lima_sched_pipe_task_done+0x5c/0xb0
[10136.669656]  lima_gp_irq_handler+0xa8/0x120
[10136.669666]  __handle_irq_event_percpu+0x48/0x160
[10136.669679]  handle_irq_event+0x4c/0xc0

We can prevent that race condition entirely by masking the irqs at the
beginning of the timeout handler, at which point we give up on waiting
for that job entirely.
The irqs will be enabled again at the next hard reset which is already
done as a recovery by the timeout handler.

References:
http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2024-40976
https://www.cve.org/CVERecord?id=CVE-2024-40976
https://git.kernel.org/stable/c/03e7b2f7ae4c0ae5fb8e4e2454ba4008877f196a
https://git.kernel.org/stable/c/58bfd311c93d66d8282bf21ebbf35cc3bb8ad9db
https://git.kernel.org/stable/c/70aa1f2dec46b6fdb5f6b9f37b6bfa4a4dee0d3a
https://git.kernel.org/stable/c/9fd8ddd23793a50dbcd11c6ba51f437f1ea7d344
https://git.kernel.org/stable/c/a421cc7a6a001b70415aa4f66024fa6178885a14
https://git.kernel.org/stable/c/bdbc4ca77f5eaac15de7230814253cddfed273b1
https://git.kernel.org/pub/scm/linux/security/vulns.git/plain/cve/published/2024/CVE-2024-40976.mbox