Bugzilla – Bug 1221128
VUL-0: REJECTED: CVE-2024-26628: kernel: drm/amdkfd: Fix lock dependency warning
Last modified: 2024-05-31 13:22:04 UTC
In the Linux kernel, the following vulnerability has been resolved: drm/amdkfd: Fix lock dependency warning ====================================================== WARNING: possible circular locking dependency detected 6.5.0-kfd-fkuehlin #276 Not tainted ------------------------------------------------------ kworker/8:2/2676 is trying to acquire lock: ffff9435aae95c88 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550 but task is already holding lock: ffff9435cd8e1720 (&svms->lock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&svms->lock){+.+.}-{3:3}: __mutex_lock+0x97/0xd30 kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu] kfd_ioctl+0x1b2/0x5d0 [amdgpu] __x64_sys_ioctl+0x86/0xc0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> #1 (&mm->mmap_lock){++++}-{3:3}: down_read+0x42/0x160 svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu] process_one_work+0x27a/0x540 worker_thread+0x53/0x3e0 kthread+0xeb/0x120 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x11/0x20 -> #0 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}: __lock_acquire+0x1426/0x2200 lock_acquire+0xc1/0x2b0 __flush_work+0x80/0x550 __cancel_work_timer+0x109/0x190 svm_range_bo_release+0xdc/0x1c0 [amdgpu] svm_range_free+0x175/0x180 [amdgpu] svm_range_deferred_list_work+0x15d/0x340 [amdgpu] process_one_work+0x27a/0x540 worker_thread+0x53/0x3e0 kthread+0xeb/0x120 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x11/0x20 other info that might help us debug this: Chain exists of: (work_completion)(&svm_bo->eviction_work) --> &mm->mmap_lock --> &svms->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&svms->lock); lock(&mm->mmap_lock); lock(&svms->lock); lock((work_completion)(&svm_bo->eviction_work)); I believe this cannot really lead to a deadlock in practice, because svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO refcount is non-0. That means it's impossible that svm_range_bo_release is running concurrently. However, there is no good way to annotate this. To avoid the problem, take a BO reference in svm_range_schedule_evict_svm_bo instead of in the worker. That way it's impossible for a BO to get freed while eviction work is pending and the cancel_work_sync call in svm_range_bo_release can be eliminated. v2: Use svm_bo_ref_unless_zero and explained why that's safe. Also removed redundant checks that are already done in amdkfd_fence_enable_signaling. References: http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2024-26628 https://www.cve.org/CVERecord?id=CVE-2024-26628 https://bugzilla.redhat.com/show_bug.cgi?id=2268212 https://lore.kernel.org/linux-cve-announce/2024030649-CVE-2024-26628-f6ce@gregkh/ Patch: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=47bf0f83fc86
Already patched: - ALP-current - SLE15-SP5 - SLE15-SP6 - stable @kernel-bugs could you please add CVE reference? Tracking as affected: - SLE15-SP4-LTSS - cve/linux-5.14
Reading through the changelog: I believe this cannot really lead to a deadlock in practice, because svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO refcount is non-0. That means it's impossible that svm_range_bo_release is running concurrently. However, there is no good way to annotate this. suggests the CVE is just bogus. The patch might be an improvement but I do not see it fixes an actual bug. Let me bring that up upstream.
REJECTED: https://lore.kernel.org/linux-cve-announce/20240320164818.3778843-2-lee@kernel.org/T/#u
(In reply to Robert Frohl from comment #3) > REJECTED: > > https://lore.kernel.org/linux-cve-announce/20240320164818.3778843-2- > lee@kernel.org/T/#u Reset assigner because CVE be rejected.
Closing as invalid since rejected.