Bug 1227954 (CVE-2022-48800) - VUL-0: CVE-2022-48800: kernel: mm: vmscan: remove deadlock due to throttling failing to make progress
Summary: VUL-0: CVE-2022-48800: kernel: mm: vmscan: remove deadlock due to throttling ...
Status: RESOLVED FIXED
Alias: CVE-2022-48800
Product: SUSE Security Incidents
Classification: Novell Products
Component: Incidents (show other bugs)
Version: unspecified
Hardware: Other Other
: P4 - Low : Normal
Target Milestone: ---
Assignee: Security Team bot
QA Contact: Security Team bot
URL: https://smash.suse.de/issue/414218/
Whiteboard: CVSSv3.1:SUSE:CVE-2022-48800:2.5:(AV:...
Keywords:
Depends on:
Blocks:
 
Reported: 2024-07-16 14:45 UTC by SMASH SMASH
Modified: 2024-07-18 11:00 UTC (History)
2 users (show)

See Also:
Found By: Security Response Team
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description SMASH SMASH 2024-07-16 14:45:11 UTC
In the Linux kernel, the following vulnerability has been resolved:

mm: vmscan: remove deadlock due to throttling failing to make progress

A soft lockup bug in kcompactd was reported in a private bugzilla with
the following visible in dmesg;

  watchdog: BUG: soft lockup - CPU#33 stuck for 26s! [kcompactd0:479]
  watchdog: BUG: soft lockup - CPU#33 stuck for 52s! [kcompactd0:479]
  watchdog: BUG: soft lockup - CPU#33 stuck for 78s! [kcompactd0:479]
  watchdog: BUG: soft lockup - CPU#33 stuck for 104s! [kcompactd0:479]

The machine had 256G of RAM with no swap and an earlier failed
allocation indicated that node 0 where kcompactd was run was potentially
unreclaimable;

  Node 0 active_anon:29355112kB inactive_anon:2913528kB active_file:0kB
    inactive_file:0kB unevictable:64kB isolated(anon):0kB isolated(file):0kB
    mapped:8kB dirty:0kB writeback:0kB shmem:26780kB shmem_thp:
    0kB shmem_pmdmapped: 0kB anon_thp: 23480320kB writeback_tmp:0kB
    kernel_stack:2272kB pagetables:24500kB all_unreclaimable? yes

Vlastimil Babka investigated a crash dump and found that a task
migrating pages was trying to drain PCP lists;

  PID: 52922  TASK: ffff969f820e5000  CPU: 19  COMMAND: "kworker/u128:3"
  Call Trace:
     __schedule
     schedule
     schedule_timeout
     wait_for_completion
     __flush_work
     __drain_all_pages
     __alloc_pages_slowpath.constprop.114
     __alloc_pages
     alloc_migration_target
     migrate_pages
     migrate_to_node
     do_migrate_pages
     cpuset_migrate_mm_workfn
     process_one_work
     worker_thread
     kthread
     ret_from_fork

This failure is specific to CONFIG_PREEMPT=n builds.  The root of the
problem is that kcompact0 is not rescheduling on a CPU while a task that
has isolated a large number of the pages from the LRU is waiting on
kcompact0 to reschedule so the pages can be released.  While
shrink_inactive_list() only loops once around too_many_isolated, reclaim
can continue without rescheduling if sc->skipped_deactivate == 1 which
could happen if there was no file LRU and the inactive anon list was not
low.

References:
http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2022-48800
https://git.kernel.org/pub/scm/linux/security/vulns.git/plain/cve/published/2022/CVE-2022-48800.mbox
https://git.kernel.org/stable/c/3980cff6349687f73d5109f156f23cb261c24164
https://git.kernel.org/stable/c/b485c6f1f9f54b81443efda5f3d8a5036ba2cd91
https://www.cve.org/CVERecord?id=CVE-2022-48800