Bugzilla – Bug 64652
VUL-0: CVE-2005-0001: kernel: page fault on SMP machines
Last modified: 2021-10-19 14:01:37 UTC
Paul Starzetz found a kernel bug again.

From: Paul Starzetz <ihaquer@isec.pl>
To: vendor-sec <vendor-sec@lst.de>
Subject: [vendor-sec] page fault @ SMP
Errors-To: vendor-sec-admin@lst.de
Date: Wed, 5 Jan 2005 21:57:58 +0100 (CET)

Hi,

I have found an exploitable flaw in the page fault handler, however only in the SMP case. The problem is this: [...] see attachment.

Directly assigned to Andrea for verifying the bug+patch.
Created attachment 27460 [details] vendor-sec discussion
Created attachment 27461 [details] smp-fault-2.4.diff
Created attachment 27462 [details] smp-fault-2.6.diff
CAN-2005-0001
I don't see the point of the VM_GROWSUP part since that code cannot handle growsup, but it's harmless. Anyway it's buggy:

    + } else if ((vma->vm_flags & VM_GROWSUP) && vma->vm_start >= address) {

(Hugh also pointed it out in the email discussion.) It must be "vma->vm_end >= address". Both vm_start should be replaced with vm_end, and once the growsup code is enabled (as said, currently it's a useless noop) special care should be taken with the ">=", which might have to become a ">".

Same goes for the 2.6 patch, it must be written like this:

    address += 4 + PAGE_SIZE - 1;
    address &= PAGE_MASK;

    /* already expanded while waiting for anon_vma lock? */
    if (vma->vm_end >= address) {
        anon_vma_unlock(vma);
        return 0;
    }

BTW, I designed the 2.6 anon-vma locking so I'm partly to blame for not noticing this race. They CC'ed various mainline folks but not me, so I guess they don't need my help.

So just fix the above bit s/vm_start/vm_end/ in the VM_GROWSUP and everything should be fine. It's a very minor bug, since it's not going to be easily reproducible and it requires SMP.
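To make the vm_start versus vm_end point concrete, here is a minimal userspace sketch. It is not kernel code: `struct vma_range`, the `SKETCH_*` constants and all three function names are hypothetical, and applying the same round-up to the first-proposed check is an assumption made only so the two checks can be compared on the same input.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative constants; the real values are architecture specific. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical stand-in for the two vm_area_struct fields under discussion. */
struct vma_range {
    unsigned long vm_start; /* first address covered by the area */
    unsigned long vm_end;   /* first address past the end of the area */
};

/* Round a faulting address up to the end the area would need (as in the 2.6 snippet). */
static unsigned long growsup_target_end(unsigned long address)
{
    address += 4 + SKETCH_PAGE_SIZE - 1;
    return address & SKETCH_PAGE_MASK;
}

/* Corrected check: an upward-growing area is done once vm_end reaches the target. */
static bool growsup_already_expanded(const struct vma_range *vma, unsigned long address)
{
    assert(vma->vm_start < vma->vm_end);
    return vma->vm_end >= growsup_target_end(address);
}

/* Check as first proposed: compares against vm_start, which growsup never moves. */
static bool growsup_check_as_first_proposed(const struct vma_range *vma, unsigned long address)
{
    assert(vma->vm_start < vma->vm_end);
    return vma->vm_start >= growsup_target_end(address);
}
```

With an area that another thread has already grown past the faulting address, only the vm_end comparison reports that there is nothing left to do; the vm_start comparison would let the caller carry on and "expand" again.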
* This comment was added by mail.

JFYI

Date: Fri, 7 Jan 2005 10:58:52 +0000 (GMT)
From: Mark J Cox <mjc@redhat.com>
To: Paul Starzetz <ihaquer@isec.pl>
Cc: vendor-sec <vendor-sec@lst.de>
Subject: Re: [vendor-sec] page fault @ SMP

On Wed, 5 Jan 2005, Paul Starzetz wrote:
|I have found an exploitable flaw in the page fault handler, however only
|in the SMP case.

We noticed yesterday that our kernel folks had previously already fixed this in a December 20th update to our kernels - it was in response to a bug logged by one of our customers and didn't get escalated to the security team as it wasn't seen to have a security consequence at the time:

http://rhn.redhat.com/errata/RHBA-2004-550.html

contains kernel-2.4.21-27.EL.src.rpm which has the following patch inside:

    +
    +	/* check if another thread has already expanded the stack */
    +	if (address >= vma->vm_start) {
    +		spin_unlock(&vma->vm_mm->page_table_lock);
    +		vm_validate_enough("exiting expand_stack - NOTHING TO DO");
    +		return 0;
    +	}
    +

This fix was also scheduled for a RHEL2.1 kernel update, but that isn't out yet.

Cheers, Mark
_______________________________________________
Vendor Security mailing list
Vendor Security@lst.de
https://www.lst.de/cgi-bin/mailman/listinfo/vendor-sec
* This comment was added by mail.

JFYI

Date: Fri, 7 Jan 2005 13:11:40 -0200
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: Paul Starzetz <ihaquer@isec.pl>, vendor-sec <vendor-sec@lst.de>, Andrew Morton <akpm@osdl.org>, mingo@elte.hu
Subject: Re: [vendor-sec] page fault @ SMP

| Thanks for the review! Attached are updated patches.

Description: Fix expand_stack() SMP race

Two threads sharing the same VMA can race in expand_stack, resulting in incorrect VMA size accounting and possibly an "uncovered-by-VMA" pte leak.

Fix is to check if the stack has already been expanded after acquiring a lock which guarantees exclusivity (page_table_lock in v2.4 and vma_anon lock in v2.6).

v2.4:

    --- linux-2.4.28.orig/include/linux/mm.h 2005-01-07 09:12:48.000000000 -0200
    +++ linux-2.4.28/include/linux/mm.h 2005-01-07 14:51:20.595060272 -0200
     	unsigned long grow;

     	/*
    -	 * vma->vm_start/vm_end cannot change under us because the caller is required
    -	 * to hold the mmap_sem in write mode. We need to get the spinlock only
    -	 * before relocating the vma range ourself.
    +	 * vma->vm_start/vm_end cannot change under us because the caller
    +	 * is required to hold the mmap_sem in read mode. We need the
    +	 * page_table_lock lock to serialize against concurrent expand_stacks.
     	 */
     	address &= PAGE_MASK;
     	spin_lock(&vma->vm_mm->page_table_lock);
    +
    +	/* already expanded while we were spinning? */
    +	if (vma->vm_start <= address) {
    +		spin_unlock(&vma->vm_mm->page_table_lock);
    +		return 0;
    +	}
    +
     	grow = (vma->vm_start - address) >> PAGE_SHIFT;
     	if (vma->vm_end - address > current->rlim[RLIMIT_STACK].rlim_cur ||
     	    ((vma->vm_mm->total_vm + grow) << PAGE_SHIFT) > current->rlim[RLIMIT_AS].rlim_cur) {

v2.6:

    --- linux-2.6.10-mm1.orig/mm/mmap.c 2005-01-05 15:58:26.000000000 -0200
    +++ linux-2.6.10-mm1/mm/mmap.c 2005-01-07 14:47:05.894780600 -0200
     	 */
     	address += 4 + PAGE_SIZE - 1;
     	address &= PAGE_MASK;
    +
    +	/* already expanded while waiting for anon_vma lock? */
    +	if (vma->vm_end >= address) {
    +		anon_vma_unlock(vma);
    +		return 0;
    +	}
    +
     	grow = (address - vma->vm_end) >> PAGE_SHIFT;

     	/* Overcommit.. */
     		return -ENOMEM;
     	anon_vma_lock(vma);

    +	/* already expanded while waiting for anon_vma lock? */
    +	if (vma->vm_start <= address) {
    +		anon_vma_unlock(vma);
    +		return 0;
    +	}
    +
     	/*
     	 * vma->vm_start/vm_end cannot change under us because the caller
     	 * is required to hold the mmap_sem in read mode. We need the
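The "check again once the lock is held" idea from the description can be tried out in userspace. The following is only a sketch under stated assumptions: `struct stack_area`, `expand_down()` and `SKETCH_PAGE_SHIFT` are hypothetical names, a pthread mutex stands in for page_table_lock / the anon_vma lock, and `total_pages` stands in for the total_vm accounting. It models the grows-down case only.

```c
#include <assert.h>
#include <pthread.h>

#define SKETCH_PAGE_SHIFT 12 /* illustrative: 4 KiB pages */

/* Hypothetical userspace model of a grows-down area and its accounting. */
struct stack_area {
    pthread_mutex_t lock;      /* stands in for page_table_lock or the anon_vma lock */
    unsigned long vm_start;    /* lowest covered address, page aligned */
    unsigned long vm_end;      /* first address past the area */
    unsigned long total_pages; /* stands in for mm->total_vm */
};

/*
 * Grow the area down to a page-aligned address.
 * Returns the number of pages added, or 0 if another caller got there first.
 */
static unsigned long expand_down(struct stack_area *area, unsigned long address)
{
    unsigned long grow;

    /* precondition of this sketch: the caller already masked the address */
    assert((address & ((1UL << SKETCH_PAGE_SHIFT) - 1)) == 0);

    pthread_mutex_lock(&area->lock);

    /* already expanded while we were waiting for the lock? */
    if (area->vm_start <= address) {
        pthread_mutex_unlock(&area->lock);
        return 0;
    }

    grow = (area->vm_start - address) >> SKETCH_PAGE_SHIFT;
    area->vm_start = address;
    area->total_pages += grow;

    pthread_mutex_unlock(&area->lock);
    return grow;
}
```

If a caller asking for the lower address wins the lock, the later caller asking for a higher address returns 0 and leaves both vm_start and the page accounting untouched, which is the behaviour the re-check in the patch is meant to guarantee.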
Added the folks from atsec to the Cc: List: klaus@atsec.com,smueller@atsec.com

Andrea: Unfortunately, a bug being "minor" has nothing to do with the probability of it getting exploited. Even if a race window is very narrow, it can still be exploited, and it's a breach of the policy in the kernel (local privilege escalation). Consequently, it needs fixing in all cases.

Roman.
Sure, I didn't mean it shouldn't be fixed. The new patches attached today are correct, thanks.
Marcelo suggests CRD 12.01.05
swamp id: 111
i would like to have this patch in the next kernel update... Hubert, can you apply please? all branches, except sles9-sp1 branch
btw, the v2.6 patch as shown above is mangled; it is missing a few lines (but lines that have nothing to do with the + part). If in doubt feel free to ask.
is public now.

Synopsis: Linux kernel i386 SMP page fault handler privilege escalation
Product: Linux kernel
Version: 2.2 up to and including 2.2.27-rc1, 2.4 up to and including 2.4.29-rc1, 2.6 up to and including 2.6.10
Vendor: http://www.kernel.org/
URL: http://isec.pl/vulnerabilities/isec-0022-pagefault.txt
CVE: CAN-2005-0001
Author: Paul Starzetz <ihaquer@isec.pl>
Date: Jan 12, 2005

Issue:
======

Locally exploitable flaw has been found in the Linux page fault handler code that allows users to gain root privileges if running on multiprocessor machine.

Details:
========

The Linux kernel is the core software component of a Linux environment and is responsible for handling of machine resources. One of the functions of an operating system kernel is handling of virtual memory. On Linux virtual memory is provided on demand if an application accesses virtual memory areas.

One of the core components of the Linux VM subsystem is the page fault handler that is called if applications try to access virtual memory currently not physically mapped or not available in their address space.

The page fault handler has the function to properly identify the type of the requested virtual memory access and take the appropriate action to allow or deny application's VM request. Actions taken may also include a stack expansion if the access goes just below application's actual stack limit.

An exploitable race condition exists in the page fault handler if two concurrent threads sharing the same virtual memory space request stack expansion at the same time. It is only exploitable on multiprocessor machines (that also includes systems with hyperthreading).
Discussion:
===========

The vulnerable code resides for the i386 architecture in arch/i386/mm/fault.c in your kernel source code tree:

    [186]   down_read(&mm->mmap_sem);

            vma = find_vma(mm, address);
            if (!vma)
                    goto bad_area;
            if (vma->vm_start <= address)
                    goto good_area;
            if (!(vma->vm_flags & VM_GROWSDOWN))
                    goto bad_area;
            if (error_code & 4) {
                    /*
                     * accessing the stack below %esp is always a bug.
                     * The "+ 32" is there due to some instructions (like
                     * pusha) doing post-decrement on the stack and that
                     * doesn't show up until later..
                     */
    [*]             if (address + 32 < regs->esp)
                            goto bad_area;
            }
            if (expand_stack(vma, address))
                    goto bad_area;

where the line number has been given for the kernel 2.4.28 version.

Since the page fault handler is executed with the mmap_sem semaphore held for reading only, two concurrent threads may enter the section after the line 186.

The checks following line 186 ensure that the VM request is valid and in case it goes just below the actual stack limit [*], that the stack is expanded accordingly. On Linux the notion of stack includes any VM_GROWSDOWN virtual memory area, that is, it need not to be the actual process's stack.

The exploitable race condition scenario looks as follows:

A. thread_1 accesses a VM_GROWSDOWN area just below its actual starting address, lets call it fault_1,

B. thread_2 accesses the same area at address fault_2 where fault_2 + PAGE_SIZE <= fault_1, that is:

    [  NOPAGE  ] [fault_1   ] [   VMA   ]  ---> higher addresses
    [fault_2   ] [  NOPAGE  ] [   VMA   ]

where one [] bracket pair stands for a page frame in the application's page table.

C. if thread_2 is slightly faster than thread_1 following happens:

    [  PAGE2   ] [PAGE1  VMA            ]

that is, the stack is first expanded inside the expand_stack() function to cover fault_2, however it is right after 'expanded' to cover only fault_1 since the necessary checks have already been passed.
In other words, the process's page table includes now two page references (PTEs) but only one is covered by the virtual memory area descriptor (namely only page1). The race window is very small but it is exploitable.

Once the reference to page2 is available in the page table, it can be freely read or written by both threads. It will also not be released to the virtual memory management on process termination. Similar techniques like in

http://www.isec.pl/vulnerabilities/isec-0014-mremap-unmap.txt

may be further used to inject these lost page frames into a setuid application in order to gain elevated privileges (due to kmod this is also possible without any executable setuid binaries).

Impact:
=======

Unprivileged local users can gain elevated (root) privileges on SMP machines.

Credits:
========

Paul Starzetz <ihaquer@isec.pl> has identified the vulnerability and performed further research. RedHat reported that a customer also pointed out some problems with the page fault handler on SMP about 20.12.2004 and they already included a patch for this vulnerability in the kernel-2.4.21-27.EL release, however the bug did not make it to the security division.

COPYING, DISTRIBUTION, AND MODIFICATION OF INFORMATION PRESENTED HERE IS ALLOWED ONLY WITH EXPRESS PERMISSION OF ONE OF THE AUTHORS.

Disclaimer:
===========

This document and all the information it contains are provided "as is", for educational purposes only, without warranty of any kind, whether express or implied.

The authors reserve the right not to be responsible for the topicality, correctness, completeness or quality of the information provided in this document. Liability claims regarding damage caused by the use of any information provided, including any kind of information which is incomplete or incorrect, will therefore be rejected.

Appendix:
=========

A proof of concept code won't be disclosed now.

Special thanks goes to OSDL and Marcelo Tosatti for providing a SMP testbed.
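For reviewers who want to follow steps A to C of the advisory with concrete numbers, the interleaving can be replayed deterministically in userspace. This is an illustrative model only, not the withheld proof of concept: `struct area_model`, `expand_stack_unfixed()`, `replay_unfixed_race()` and the `SKETCH_*` constants are hypothetical, and the "race" is simply two calls made in the losing order.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative constants; the real values are architecture specific. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical stand-in for a VM_GROWSDOWN area. */
struct area_model {
    unsigned long vm_start;
    unsigned long vm_end;
};

/* Pre-fix behaviour: move vm_start to the faulting page without re-checking it. */
static void expand_stack_unfixed(struct area_model *vma, unsigned long address)
{
    vma->vm_start = address & SKETCH_PAGE_MASK;
}

/*
 * Replay steps A to C in a fixed order: both threads have already passed the
 * fault handler checks, thread_2 expands first, thread_1 second.
 * Returns true if the page mapped for fault_2 ends up below vm_start.
 */
static bool replay_unfixed_race(struct area_model *vma,
                                unsigned long fault_1, unsigned long fault_2)
{
    unsigned long page2;

    assert(fault_2 + SKETCH_PAGE_SIZE <= fault_1); /* condition from step B */
    assert(fault_1 < vma->vm_start);               /* both faults are below the area */

    expand_stack_unfixed(vma, fault_2); /* thread_2: area now covers fault_2 */
    expand_stack_unfixed(vma, fault_1); /* thread_1: area shrinks back to fault_1 */

    /* Each thread then gets a page mapped at its own faulting address. */
    page2 = fault_2 & SKETCH_PAGE_MASK;
    return page2 < vma->vm_start;
}
```

Starting from an area at 0x10000-0x12000 with fault_1 = 0xF010 and fault_2 = 0xE000, the model ends with vm_start at 0xF000 while a page exists at 0xE000, i.e. the PTE that is not covered by any VMA.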
reassign component to enterprise server
Created attachment 27606 [details] proposed patch for 2.4.21 (SLES8) The patch for 2.4 did not apply cleanly to our 2.4.21 tree as we have an additional check there already. I'm going to use attached rediffed version. Please double check that it is correct.
Oops, Andrea is no longer in the cc list; re-adding for verification of the rediffed version.
Ok, fixes have been added to all trees.
The patch in id 19606 is actually wrong (it won't hurt at runtime because it cannot handle growsup anyway, but the code doing address >= vm_start is buggy should that code ever run). The patches we have to apply are the ones in comment #8; the previous ones were not completely correct (even if practically harmless for i386/x86-64/ia64 and most other archs with growsdown, since only the growsup part was buggy). In 2.4 there's no need for a growsup check at all, because the expand_stack of 2.4 can only handle growsdown. Marcelo fixed that bit as well, fixing the address >= s/vm_start/vm_end/ in the 2.6 version where growsup is implemented.
andrea, can you create good patches for us please?
Created attachment 27626 [details] fix-smp-pagefault fix-smp-pagefault for 2.6 ... I replaced vm_start with vm_end in the first case.
Comment on attachment 19461 [details] /etc/dhcpd.conf broken
Comment on attachment 19606 [details] 3rd patch broken too
Created attachment 27628 [details] fix-smp-race-2.4 proposed 2.4 patch for smprace
andrea, can you please verify the last 2 attachments for correctness? I redid them according to your notes.
19628 and 19626 look fine, thanks!
hubert, please replace the erroneous fix with the 2 new ones in the submitted kernels ...
Done. Kernels are waiting for checkin.
Created attachment 27647 [details] exploit posted on full-disclosure
When reviewing this patch I have some difficulty in understanding the following: when threads share the same VMA, an anon_vma_lock() spin lock applies to all threads, therefore we have such a race condition. Now why is only this particular code path being guarded with this patch? Why are all other occurrences of anon_vma_lock() not vulnerable to such a race condition in general (regardless of whether we see an attack vector or not)? Or, on the other hand, why is it acceptable that all other anon_vma_lock() calls are left as is? Could someone please enlighten me? Thanks a lot.
Sure, as the one who invented the anon_vma_lock I can answer this very easily (I hope very clearly too ;).

The growsdown/growsup page faults are two special cases. They're the only two cases where the vma vm_start/vm_end fields can be modified without holding the mm->mmap_sem in write mode. The place where the growsdown/growsup fault happens holds the mmap_sem but only in read mode (which means only other growsdown/growsup can happen from under us, no other modification can happen at the same time).

The growsup/growsdown is only allowed for anonymous memory, and anonymous memory for a concurrent growsdown/growsup will share the same anon_vma. The anon_vma is guaranteed to be the same for the same thread, and in turn to serialize different threads doing growsdown/growsup at the same time, I had the idea to take the anon_vma_lock (starting from 2.6.6+). Problem is that I used a logic similar to 2.4 and I didn't notice 2.4 had a race when writing the anon_vma code.

The reason the anon_vma_lock is not necessary in the other vma modifications is that all other modifications (i.e. mmap/mremap/mprotect etc.) take the mmap_sem in _write_ mode, and in turn they cannot race with growsdown/growsup either (which takes it in read mode) and they serialize against each other as well, without the need of spinlocks.

The patch itself is obviously correct, as I confirmed to Olaf by phone in time for SP1 (thanks Olaf!). Hope this helps.
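The locking argument above boils down to a small decision rule, which can be written out as a sketch. Everything here is a hypothetical model for illustration, not kernel code: the enum names and the three helper functions are made up, and the list of operations is limited to the ones named in the explanation.

```c
#include <assert.h>
#include <stdbool.h>

/* Mode in which an operation holds mmap_sem. */
enum sem_mode { SEM_READ, SEM_WRITE };

/* Operations that can change a vma's vm_start/vm_end (names are illustrative). */
enum vma_op { OP_MMAP, OP_MREMAP, OP_MPROTECT, OP_STACK_GROW_FAULT };

/* Only the growsdown/growsup fault path holds mmap_sem in read mode. */
static enum sem_mode mmap_sem_mode(enum vma_op op)
{
    return op == OP_STACK_GROW_FAULT ? SEM_READ : SEM_WRITE;
}

/* A read/write semaphore lets two holders overlap only if both are readers. */
static bool may_run_concurrently(enum vma_op a, enum vma_op b)
{
    return mmap_sem_mode(a) == SEM_READ && mmap_sem_mode(b) == SEM_READ;
}

/* An extra lock is needed only where mmap_sem does not already serialize. */
static bool needs_anon_vma_lock(enum vma_op a, enum vma_op b)
{
    return may_run_concurrently(a, b);
}
```

In this model the only pair that still needs the additional lock is two stack-growing faults running against each other; every pair involving mmap, mremap or mprotect is already excluded by the write-mode semaphore.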
fix is out for all versions and branches now.
CVE-2005-0001: CVSS v2 Base Score: 6.9 (AV:L/AC:H/Au:N/C:C/I:C/A:C)