Bug 1220704 (CVE-2021-46987)

Summary: VUL-0: CVE-2021-46987: kernel: btrfs: fix deadlock when cloning inline extents and using qgroups
Product: [Novell Products] SUSE Security Incidents Reporter: SMASH SMASH <smash_bz>
Component: IncidentsAssignee: Security Team bot <security-team>
Status: RESOLVED INVALID QA Contact: Security Team bot <security-team>
Severity: Normal    
Priority: P3 - Medium CC: andrea.mattiazzo, vasant.karasulli
Version: unspecified   
Target Milestone: ---   
Hardware: Other   
OS: Other   
URL: https://smash.suse.de/issue/395436/
Whiteboard: CVSSv3.1:SUSE:CVE-2021-46987:5.5:(AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H)
Found By: Security Response Team Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description SMASH SMASH 2024-02-29 15:44:09 UTC
In the Linux kernel, the following vulnerability has been resolved:

btrfs: fix deadlock when cloning inline extents and using qgroups

There are a few exceptional cases where cloning an inline extent needs to
copy the inline extent data into a page of the destination inode.

When this happens, we end up starting a transaction while having a dirty
page for the destination inode and while having the range locked in the
destination's inode iotree too. Because when reserving metadata space
for a transaction we may need to flush existing delalloc in case there is
not enough free space, we have a mechanism in place to prevent a deadlock,
which was introduced in commit 3d45f221ce627d ("btrfs: fix deadlock when
cloning inline extent and low on free metadata space").

However when using qgroups, a transaction also reserves metadata qgroup
space, which can also result in flushing delalloc in case there is not
enough available space at the moment. When this happens we deadlock, since
flushing delalloc requires locking the file range in the inode's iotree
and the range was already locked at the very beginning of the clone
operation, before attempting to start the transaction.

When this issue happens, stack traces like the following are reported:

  [72747.556262] task:kworker/u81:9   state:D stack:    0 pid:  225 ppid:     2 flags:0x00004000
  [72747.556268] Workqueue: writeback wb_workfn (flush-btrfs-1142)
  [72747.556271] Call Trace:
  [72747.556273]  __schedule+0x296/0x760
  [72747.556277]  schedule+0x3c/0xa0
  [72747.556279]  io_schedule+0x12/0x40
  [72747.556284]  __lock_page+0x13c/0x280
  [72747.556287]  ? generic_file_readonly_mmap+0x70/0x70
  [72747.556325]  extent_write_cache_pages+0x22a/0x440 [btrfs]
  [72747.556331]  ? __set_page_dirty_nobuffers+0xe7/0x160
  [72747.556358]  ? set_extent_buffer_dirty+0x5e/0x80 [btrfs]
  [72747.556362]  ? update_group_capacity+0x25/0x210
  [72747.556366]  ? cpumask_next_and+0x1a/0x20
  [72747.556391]  extent_writepages+0x44/0xa0 [btrfs]
  [72747.556394]  do_writepages+0x41/0xd0
  [72747.556398]  __writeback_single_inode+0x39/0x2a0
  [72747.556403]  writeback_sb_inodes+0x1ea/0x440
  [72747.556407]  __writeback_inodes_wb+0x5f/0xc0
  [72747.556410]  wb_writeback+0x235/0x2b0
  [72747.556414]  ? get_nr_inodes+0x35/0x50
  [72747.556417]  wb_workfn+0x354/0x490
  [72747.556420]  ? newidle_balance+0x2c5/0x3e0
  [72747.556424]  process_one_work+0x1aa/0x340
  [72747.556426]  worker_thread+0x30/0x390
  [72747.556429]  ? create_worker+0x1a0/0x1a0
  [72747.556432]  kthread+0x116/0x130
  [72747.556435]  ? kthread_park+0x80/0x80
  [72747.556438]  ret_from_fork+0x1f/0x30

  [72747.566958] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]
  [72747.566961] Call Trace:
  [72747.566964]  __schedule+0x296/0x760
  [72747.566968]  ? finish_wait+0x80/0x80
  [72747.566970]  schedule+0x3c/0xa0
  [72747.566995]  wait_extent_bit.constprop.68+0x13b/0x1c0 [btrfs]
  [72747.566999]  ? finish_wait+0x80/0x80
  [72747.567024]  lock_extent_bits+0x37/0x90 [btrfs]
  [72747.567047]  btrfs_invalidatepage+0x299/0x2c0 [btrfs]
  [72747.567051]  ? find_get_pages_range_tag+0x2cd/0x380
  [72747.567076]  __extent_writepage+0x203/0x320 [btrfs]
  [72747.567102]  extent_write_cache_pages+0x2bb/0x440 [btrfs]
  [72747.567106]  ? update_load_avg+0x7e/0x5f0
  [72747.567109]  ? enqueue_entity+0xf4/0x6f0
  [72747.567134]  extent_writepages+0x44/0xa0 [btrfs]
  [72747.567137]  ? enqueue_task_fair+0x93/0x6f0
  [72747.567140]  do_writepages+0x41/0xd0
  [72747.567144]  __filemap_fdatawrite_range+0xc7/0x100
  [72747.567167]  btrfs_run_delalloc_work+0x17/0x40 [btrfs]
  [72747.567195]  btrfs_work_helper+0xc2/0x300 [btrfs]
  [72747.567200]  process_one_work+0x1aa/0x340
  [72747.567202]  worker_thread+0x30/0x390
  [72747.567205]  ? create_worker+0x1a0/0x1a0
  [72747.567208]  kthread+0x116/0x130
  [72747.567211]  ? kthread_park+0x80/0x80
  [72747.567214]  ret_from_fork+0x1f/0x30

  [72747.569686] task:fsstress        state:D stack:    
---truncated---

References:
http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2021-46987
https://www.cve.org/CVERecord?id=CVE-2021-46987
https://bugzilla.redhat.com/show_bug.cgi?id=2266752
https://lore.kernel.org/linux-cve-announce/2024022825-CVE-2021-46987-f73f@gregkh/T/#u

Patch:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f9baa501b4fd
Comment 1 Andrea Mattiazzo 2024-02-29 15:48:24 UTC
Tracking as affected:
-cve/linux-5.3

cve/linux-5.14,stable and SLE15-SP6 are already patched. Below cve/linux-5.3 are not affected.
Comment 4 David Sterba 2024-05-07 13:59:00 UTC
(In reply to Andrea Mattiazzo from comment #1)
> Tracking as affected:
> -cve/linux-5.3

Not affected because it fixes commit e076ab2a2ca70a ("btrfs: shrink delalloc pages instead of full inodes") which is in 5.10 and we don't have backport to 5.3.
Comment 5 Andrea Mattiazzo 2024-05-16 15:17:08 UTC
Thanks for the info. Closing.