Bug 1227437 (CVE-2024-39476) - VUL-0: CVE-2024-39476: kernel: md/raid5: fix deadlock that raid5d() wait for itself to clear MD_SB_CHANGE_PENDING
Summary: VUL-0: CVE-2024-39476: kernel: md/raid5: fix deadlock that raid5d() wait for ...
Status: NEW
Alias: CVE-2024-39476
Product: SUSE Security Incidents
Classification: Novell Products
Component: Incidents
Version: unspecified
Hardware: Other Other
Priority: P3 - Medium, Severity: Normal
Target Milestone: ---
Assignee: Coly Li
QA Contact: Security Team bot
URL: https://smash.suse.de/issue/412896/
Whiteboard: CVSSv3.1:SUSE:CVE-2024-39476:5.1:(AV:...
Keywords:
Depends on:
Blocks:
 
Reported: 2024-07-05 09:21 UTC by SMASH SMASH
Modified: 2024-07-08 04:41 UTC
CC List: 2 users

See Also:
Found By: Security Response Team
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description SMASH SMASH 2024-07-05 09:21:20 UTC
In the Linux kernel, the following vulnerability has been resolved:

md/raid5: fix deadlock that raid5d() wait for itself to clear MD_SB_CHANGE_PENDING

Xiao reported that the lvm2 test lvconvert-raid-takeover.sh can
occasionally hang; the root cause is exactly the same as in commit
bed9e27baf52 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"").

However, Dan reported another hang after that. Junxiao investigated
the problem and found that it is caused by plugged bios that cannot be
issued from raid5d().

The current implementation of raid5d() has an odd dependency:

1) md_check_recovery() from raid5d() must hold 'reconfig_mutex' to clear
   MD_SB_CHANGE_PENDING;
2) raid5d() handles IO in a dead loop until all IO is issued;
3) IO from raid5d() must wait for MD_SB_CHANGE_PENDING to be cleared.

This behaviour was introduced before v2.6. As a consequence, if another
context holds 'reconfig_mutex' and md_check_recovery() cannot update the
super_block, raid5d() spins in the dead loop and consumes 100% of one CPU
until 'reconfig_mutex' is released.
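
The dependency described above can be illustrated with a small toy model. This is a sketch only, not kernel code: ToyArray, check_recovery(), old_daemon() and the max_spins cap are hypothetical names invented for the illustration, and the model assumes the flag can only be cleared when no other context holds the mutex.

```python
class ToyArray:
    """Hypothetical stand-in for md device state (not a kernel structure)."""

    def __init__(self, pending_io, mutex_held_by_other):
        self.sb_change_pending = True                   # models MD_SB_CHANGE_PENDING
        self.mutex_held_by_other = mutex_held_by_other  # models 'reconfig_mutex'
        self.pending_io = pending_io                    # IO units waiting to be issued
        self.issued = 0


def check_recovery(arr):
    """Models md_check_recovery(): the flag is cleared only if the mutex is free."""
    if not arr.mutex_held_by_other:
        arr.sb_change_pending = False


def old_daemon(arr, max_spins):
    """Models the pre-fix loop; max_spins only keeps the toy model finite."""
    spins = 0
    while arr.issued < arr.pending_io and spins < max_spins:
        spins += 1
        check_recovery(arr)
        if arr.sb_change_pending:
            continue  # IO must wait for the flag, but the loop never exits
        arr.issued += 1
    return spins


busy = ToyArray(pending_io=3, mutex_held_by_other=True)
print(old_daemon(busy, max_spins=1000), busy.issued)
```

With the mutex held elsewhere, the model burns every allowed iteration without issuing any IO, which corresponds to the 100% CPU symptom.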

Following the implementation in raid1 and raid10, fix this problem by
skipping IO submission if MD_SB_CHANGE_PENDING is still set after
md_check_recovery(); the daemon thread will be woken up when
'reconfig_mutex' is released. This fixes the hang as well.
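
The shape of the fix can be sketched with the same kind of toy model. Again this is an illustration under assumptions, not the actual patch: ToyArray, check_recovery(), fixed_daemon() and release_mutex_and_wake() are hypothetical names, and the wake-up on mutex release is modelled as a direct function call.

```python
class ToyArray:
    """Hypothetical stand-in for md device state (not a kernel structure)."""

    def __init__(self, pending_io, mutex_held_by_other):
        self.sb_change_pending = True                   # models MD_SB_CHANGE_PENDING
        self.mutex_held_by_other = mutex_held_by_other  # models 'reconfig_mutex'
        self.pending_io = pending_io                    # IO units waiting to be issued
        self.issued = 0


def check_recovery(arr):
    """Models md_check_recovery(): the flag is cleared only if the mutex is free."""
    if not arr.mutex_held_by_other:
        arr.sb_change_pending = False


def fixed_daemon(arr):
    """Models the post-fix behaviour: skip issuing IO while the flag is set."""
    check_recovery(arr)
    if arr.sb_change_pending:
        return 0  # give up this run instead of spinning; wait for a wake-up
    issued_now = 0
    while arr.issued < arr.pending_io:
        arr.issued += 1
        issued_now += 1
    return issued_now


def release_mutex_and_wake(arr):
    """Models the other context dropping the mutex and waking the daemon."""
    arr.mutex_held_by_other = False
    return fixed_daemon(arr)


arr = ToyArray(pending_io=3, mutex_held_by_other=True)
print(fixed_daemon(arr), release_mutex_and_wake(arr))
```

The first run returns immediately without issuing anything, and the pending IO is issued only after the mutex holder releases the lock and triggers the next run.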

References:
http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2024-39476
https://git.kernel.org/pub/scm/linux/security/vulns.git/plain/cve/published/2024/CVE-2024-39476.mbox
https://git.kernel.org/stable/c/b32aa95843cac6b12c2c014d40fca18aef24a347
https://git.kernel.org/stable/c/634ba3c97ec413cb10681c7b196db43ee461ecf4
https://git.kernel.org/stable/c/aa64464c8f4d2ab92f6d0b959a1e0767b829d787
https://git.kernel.org/stable/c/098d54934814dd876963abfe751c3b1cf7fbe56a
https://git.kernel.org/stable/c/3f8d5e802d4cedd445f9a89be8c3fd2d0e99024b
https://git.kernel.org/stable/c/cd2538e5af495b3c747e503db346470fc1ffc447
https://git.kernel.org/stable/c/e332a12f65d8fed8cf63bedb4e9317bb872b9ac7
https://git.kernel.org/stable/c/151f66bb618d1fd0eeb84acb61b4a9fa5d8bb0fa
https://www.cve.org/CVERecord?id=CVE-2024-39476