Bug 64652 (CVE-2005-0001)

Summary: VUL-0: CVE-2005-0001: kernel: page fault on SMP machines
Product: [Novell Products] SUSE Security Incidents
Reporter: Thomas Biege <thomas>
Component: Incidents
Assignee: Security Team bot <security-team>
Status: RESOLVED FIXED
QA Contact: Security Team bot <security-team>
Severity: Major
Priority: P3 - Medium
CC: ihno, klaus, mfrueh, patch-request, security-team, smueller
Version: unspecified   
Target Milestone: ---   
Hardware: All   
OS: Linux   
Whiteboard: CVE-2005-0001: CVSS v2 Base Score: 6.9 (AV:L/AC:H/Au:N/C:C/I:C/A:C)
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: vendor-sec discussion
smp-fault-2.4.diff
smp-fault-2.6.diff
proposed patch for 2.4.21 (SLES8)
fix-smp-pagefault
fix-smp-race-2.4
exploit posted on full-disclosure

Description Thomas Biege 2005-01-07 23:56:53 UTC
Paul Starzetz found a kernel bug again. 
 
From: Paul Starzetz <ihaquer@isec.pl> 
To: vendor-sec <vendor-sec@lst.de> 
Subject: [vendor-sec] page fault @ SMP 
Errors-To: vendor-sec-admin@lst.de 
Date: Wed, 5 Jan 2005 21:57:58 +0100 (CET) 
 
Hi, 
 
I have found an exploitable flaw in the page fault handler, however only 
in the SMP case. 
 
The problem is this: 
[...] 
 
see attachment. 
 
Directly assigned to Andrea for verifying the bug+patch.
Comment 1 Thomas Biege 2005-01-07 23:56:53 UTC
Comment 2 Thomas Biege 2005-01-08 00:00:50 UTC
Created attachment 27460 [details]
vendor-sec discussion
Comment 3 Thomas Biege 2005-01-08 00:01:08 UTC
Created attachment 27461 [details]
smp-fault-2.4.diff
Comment 4 Thomas Biege 2005-01-08 00:01:28 UTC
Created attachment 27462 [details]
smp-fault-2.6.diff
Comment 5 Thomas Biege 2005-01-08 01:25:25 UTC
CAN-2005-0001 
Comment 6 Andrea Arcangeli 2005-01-08 08:30:42 UTC
I don't see the point of the VM_GROWSUP part since that code cannot handle 
growsup, but it's harmless. Anyway it's buggy: 
 
+	} else if ((vma->vm_flags & VM_GROWSUP) && vma->vm_start >= address) { 
 
(Hugh also pointed it out in the email discussion.) It must be "vma->vm_end >= 
address". Both vm_start should be replaced with vm_end, and once the growsup 
code is actually enabled (as said, currently it's a useless no-op), special care 
should be taken with the ">=", which might have to become a ">". 
 
Same goes for the 2.6 patch, it must be written like this: 
 
	address += 4 + PAGE_SIZE - 1; 
 	address &= PAGE_MASK; 
 
	/* already expanded while waiting for anon_vma lock? */ 
	if (vma->vm_end >= address) { 
		anon_vma_unlock(vma); 
		return 0; 
	} 
 
BTW, I designed the 2.6 anon-vma locking so I'm partly to blame for not 
noticing this race. They CC'ed various mainline folks but not me, so I guess 
they don't need my help. 
 
So just fix the above bit s/vm_start/vm_end/ in the VM_GROWSUP and everything 
should be fine. 
 
It's a very minor bug, since it's not going to be easily reproducible and it 
requires SMP. 
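
Note: the difference between the two comparisons can be illustrated with a small userspace sketch. This is not kernel code: struct vma_sketch and both helper functions are hypothetical names introduced here, and address is assumed to be already rounded up to the page boundary that should become the new vm_end, as in the 2.6 snippet above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct vm_area_struct. */
struct vma_sketch {
    unsigned long vm_start; /* first byte of the area */
    unsigned long vm_end;   /* first byte after the area */
};

/* Form from the first patch: compares the wrong end of the area. */
static bool growsup_done_buggy(const struct vma_sketch *vma,
                               unsigned long address)
{
    assert(vma->vm_start < vma->vm_end);
    return vma->vm_start >= address;
}

/*
 * Corrected form: an upward-growing area already covers the fault
 * once vm_end has reached the requested end address.
 */
static bool growsup_done_fixed(const struct vma_sketch *vma,
                               unsigned long address)
{
    assert(vma->vm_start < vma->vm_end);
    return vma->vm_end >= address;
}
```

For an area spanning 0x1000 to 0x3000 and a requested end of 0x2000, the corrected check reports that nothing is left to do, while the vm_start form does not, so the caller would go on to compute a growth from an end address that is already covered.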
Comment 7 Ludwig Nussel 2005-01-10 18:09:30 UTC
* This comment was added by mail.
JFYI

Date: Fri, 7 Jan 2005 10:58:52 +0000 (GMT)
From: Mark J Cox <mjc@redhat.com>
To: Paul Starzetz <ihaquer@isec.pl>
Cc: vendor-sec <vendor-sec@lst.de>
Subject: Re: [vendor-sec] page fault @ SMP

On Wed, 5 Jan 2005, Paul Starzetz wrote:
|I have found an exploitable flaw in the page fault handler, however only
|in the SMP case.

We noticed yesterday that our kernel folks had previously already fixed 
this in a December 20th update to our kernels - it was in response to a 
bug logged by one of our customers and didn't get escalated to the 
security team as it wasn't seen to have a security consequence at the 
time:

http://rhn.redhat.com/errata/RHBA-2004-550.html contains
kernel-2.4.21-27.EL.src.rpm which has the following patch inside:

+
+       /* check if another thread has already expanded the stack */
+       if (address >= vma->vm_start) {
+               spin_unlock(&vma->vm_mm->page_table_lock);
+               vm_validate_enough("exiting expand_stack - NOTHING TO DO");
+               return 0;
+       }
+

This fix was also scheduled for a RHEL2.1 kernel update, but that isn't 
out yet.

Cheers,
Mark
_______________________________________________
Vendor Security mailing list
Vendor Security@lst.de
https://www.lst.de/cgi-bin/mailman/listinfo/vendor-sec
Comment 8 Ludwig Nussel 2005-01-10 18:11:33 UTC
* This comment was added by mail.
JFYI

Date: Fri, 7 Jan 2005 13:11:40 -0200
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: Paul Starzetz <ihaquer@isec.pl>, vendor-sec <vendor-sec@lst.de>,
	Andrew Morton <akpm@osdl.org>, mingo@elte.hu
Subject: Re: [vendor-sec] page fault @ SMP

| Thanks for the review!

Attached are updated patches.

Description: Fix expand_stack() SMP race

Two threads sharing the same VMA can race in expand_stack, resulting in incorrect VMA 
size accounting and possibly an "uncovered-by-VMA" pte leak.

Fix is to check if the stack has already been expanded after acquiring a lock which 
guarantees exclusivity (page_table_lock in v2.4 and the anon_vma lock in v2.6).

v2.4:

--- linux-2.4.28.orig/include/linux/mm.h	2005-01-07 09:12:48.000000000 -0200
+++ linux-2.4.28/include/linux/mm.h	2005-01-07 14:51:20.595060272 -0200
 	unsigned long grow;
 
 	/*
-	 * vma->vm_start/vm_end cannot change under us because the caller is required
-	 * to hold the mmap_sem in write mode. We need to get the spinlock only
-	 * before relocating the vma range ourself.
+	 * vma->vm_start/vm_end cannot change under us because the caller
+	 * is required to hold the mmap_sem in read mode.  We need the
+	 * page_table_lock lock to serialize against concurrent expand_stacks.
 	 */
 	address &= PAGE_MASK;
  	spin_lock(&vma->vm_mm->page_table_lock);
+
+	/* already expanded while we were spinning? */
+	if (vma->vm_start <= address) {
+		spin_unlock(&vma->vm_mm->page_table_lock);
+		return 0;
+	}
+
 	grow = (vma->vm_start - address) >> PAGE_SHIFT;
 	if (vma->vm_end - address > current->rlim[RLIMIT_STACK].rlim_cur ||
 	    ((vma->vm_mm->total_vm + grow) << PAGE_SHIFT) > current->rlim[RLIMIT_AS].rlim_cur) {

v2.6: 

--- linux-2.6.10-mm1.orig/mm/mmap.c	2005-01-05 15:58:26.000000000 -0200
+++ linux-2.6.10-mm1/mm/mmap.c	2005-01-07 14:47:05.894780600 -0200
 	 */
 	address += 4 + PAGE_SIZE - 1;
 	address &= PAGE_MASK;
+
+	/* already expanded while waiting for anon_vma lock? */
+	if (vma->vm_end >= address) {
+		anon_vma_unlock(vma);
+		return 0;
+	}
+
 	grow = (address - vma->vm_end) >> PAGE_SHIFT;
 
 	/* Overcommit.. */
 		return -ENOMEM;
 	anon_vma_lock(vma);
 
+	/* already expanded while waiting for anon_vma lock? */
+	if (vma->vm_start <= address) {
+		anon_vma_unlock(vma);
+		return 0;
+	}
+
 	/*
 	 * vma->vm_start/vm_end cannot change under us because the caller
 	 * is required to hold the mmap_sem in read mode.  We need the
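
Note: the accounting problem described in the patch can be replayed in a single-threaded userspace sketch. Everything below is a simplified model under stated assumptions: the struct and macro names are hypothetical, the rlimit checks of the real expand_stack() are left out, and the lock is omitted because the two calls are simply issued in the order the race would produce. The recheck flag switches the "already expanded" test on and off.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT_SKETCH 12UL
#define PAGE_SIZE_SKETCH  (1UL << PAGE_SHIFT_SKETCH)
#define PAGE_MASK_SKETCH  (~(PAGE_SIZE_SKETCH - 1))

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mm_sketch {
    unsigned long total_vm;     /* pages accounted to the address space */
};

struct vma_sketch {
    unsigned long vm_start;
    unsigned long vm_end;
    struct mm_sketch *mm;
};

/*
 * Grow a VM_GROWSDOWN area down to cover `address`.
 * In the 2.4 kernel this body runs under page_table_lock; the
 * rlimit checks are omitted in this sketch.
 */
static void expand_stack_sketch(struct vma_sketch *vma,
                                unsigned long address, bool recheck)
{
    unsigned long grow;

    assert(vma->vm_start < vma->vm_end);
    address &= PAGE_MASK_SKETCH;

    /* already expanded by the other thread? */
    if (recheck && vma->vm_start <= address)
        return;

    /* Unsigned arithmetic: wraps around if vm_start < address. */
    grow = (vma->vm_start - address) >> PAGE_SHIFT_SKETCH;
    vma->vm_start = address;
    vma->mm->total_vm += grow;
}
```

With the recheck, a second call for a higher address is a no-op and total_vm stays correct. Without it, the second call moves vm_start back up and adds a wrapped-around page count to total_vm, which is the "incorrect VMA size accounting" the description refers to.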
Comment 9 Roman Drahtmueller 2005-01-10 22:14:42 UTC
Added the folks from atsec to the Cc: List: klaus@atsec.com,smueller@atsec.com

Andrea: Unfortunately, whether a bug is "minor" has nothing to do with the 
probability of it being exploited. Even if a race window is very narrow, it can 
still be exploited, and it is a breach of the kernel's security policy (local 
privilege escalation). Consequently, it needs fixing in all cases.

Roman.
Comment 10 Andrea Arcangeli 2005-01-10 22:19:24 UTC
Sure, I didn't mean it shouldn't be fixed. 
 
The new patches attached today are correct, thanks. 
Comment 11 Ludwig Nussel 2005-01-11 17:09:19 UTC
Marcelo suggests CRD 12.01.05 
Comment 12 Thomas Biege 2005-01-11 21:23:11 UTC
swamp id: 111 
Comment 13 Marcus Meissner 2005-01-11 23:58:38 UTC
i would like to have this patch in the next kernel update... 
 
Hubert, can you apply please? 
 
all branches, except sles9-sp1 branch 
Comment 14 Andrea Arcangeli 2005-01-12 05:44:55 UTC
btw, the v2.6 patch as shown above is mangled: it is missing a few lines (but 
lines that have nothing to do with the + part). If in doubt, feel free to ask. 
Comment 15 Marcus Meissner 2005-01-12 20:07:00 UTC
is public now. 
 
Synopsis:  Linux kernel i386 SMP page fault handler privilege escalation 
Product:   Linux kernel 
Version:   2.2 up to and including 2.2.27-rc1, 2.4 up to and including 
           2.4.29-rc1, 2.6 up to and including 2.6.10 
Vendor:    http://www.kernel.org/ 
URL:       http://isec.pl/vulnerabilities/isec-0022-pagefault.txt 
CVE:       CAN-2005-0001 
Author:    Paul Starzetz <ihaquer@isec.pl> 
Date:      Jan 12, 2005 
 
 
Issue: 
====== 
 
A locally exploitable flaw has been found in the Linux page fault handler 
code that allows users to gain root privileges if running on a 
multiprocessor machine. 
 
 
Details: 
======== 
 
The  Linux  kernel is the core software component of a Linux environment 
and is responsible  for  handling  of  machine  resources.  One  of  the 
functions  of  an operating system kernel is handling of virtual memory. 
On Linux virtual memory is provided on demand if an application accesses 
virtual memory areas. 
 
One  of  the core components of the Linux VM subsystem is the page fault 
handler that is called if applications  try  to  access  virtual  memory 
currently not physically mapped or not available in their address space. 
 
The page fault handler has the function to properly identify the type of 
the  requested  virtual memory access and take the appropriate action to 
allow or deny application's VM request. Actions taken may also include a 
stack expansion if the access goes just below application's actual stack 
limit. 
 
An exploitable race condition exists in the page fault  handler  if  two  
concurrent  threads  sharing the same virtual memory space request stack 
expansion at the same time. It is  only  exploitable  on  multiprocessor 
machines (that also includes systems with hyperthreading). 
 
 
Discussion: 
=========== 
 
The   vulnerable   code   resides   for   the   i386   architecture   in 
arch/i386/mm/fault.c in your kernel source code tree: 
 
[186]  down_read(&mm->mmap_sem); 
 
       vma = find_vma(mm, address); 
       if (!vma) 
              goto bad_area; 
       if (vma->vm_start <= address) 
              goto good_area; 
       if (!(vma->vm_flags & VM_GROWSDOWN)) 
              goto bad_area; 
       if (error_code & 4) { 
              /* 
               * accessing the stack below %esp is always a bug. 
               * The "+ 32" is there due to some instructions (like 
               * pusha) doing post-decrement on the stack and that 
               * doesn't show up until later.. 
               */ 
[*]           if (address + 32 < regs->esp) 
                     goto bad_area; 
       } 
       if (expand_stack(vma, address)) 
              goto bad_area; 
 
where the line number has been given for the kernel 2.4.28 version. 
 
Since the page fault handler is executed  with  the  mmap_sem  semaphore 
held  for  reading  only,  two  concurrent threads may enter the section 
after the line 186. 
 
The checks following line 186 ensure that the VM request is valid and in 
case  it  goes  just below the actual stack limit [*], that the stack is 
expanded accordingly. On Linux the notion of stack includes any 
VM_GROWSDOWN virtual memory area, that is, it need not be the process's 
actual stack. 
 
The exploitable race condition scenario looks as follows: 
 
 
A. thread_1 accesses a VM_GROWSDOWN area just below its actual starting 
address, let's call it fault_1, 
 
B.  thread_2  accesses  the same area at address fault_2 where fault_2 + 
PAGE_SIZE <= fault_1, that is: 
 
[   NOPAGE    ] [fault_1      ] [     VMA     ]  --->  higher  addresses 
[fault_2      ] [   NOPAGE    ] [     VMA     ] 
 
where  one  [] bracket pair stands for a page frame in the application's 
page table. 
 
C. if thread_2 is slightly faster than thread_1, the following happens: 
 
[   PAGE2     ] [PAGE1                VMA     ] 
 
 
that is, the stack is first expanded inside the expand_stack()  function 
to  cover  fault_2,  however  it is right after 'expanded' to cover only 
fault_1 since the necessary checks have already been  passed.  In  other 
words,  the process's page table includes now two page references (PTEs) 
but only one is covered by the virtual memory  area  descriptor  (namely 
only page1). The race window is very small but it is exploitable. 
 
Once the reference to page2 is available in the page table, it can be 
freely read or written by both threads. It will also not be released to 
the virtual memory management on process termination. Techniques similar 
to those in 
 
http://www.isec.pl/vulnerabilities/isec-0014-mremap-unmap.txt 
 
may be further used to inject these  lost  page  frames  into  a  setuid 
application  in  order  to gain elevated privileges (due to kmod this is 
also possible without any executable setuid binaries). 
 
 
Impact: 
======= 
 
Unprivileged local users can gain  elevated  (root)  privileges  on  SMP 
machines. 
 
 
Credits: 
======== 
 
Paul  Starzetz  <ihaquer@isec.pl>  has  identified the vulnerability and 
performed further research. RedHat reported that a customer also pointed 
out  some  problems  with the page fault handler on SMP about 20.12.2004 
and they  already  included  a  patch  for  this  vulnerability  in  the 
kernel-2.4.21-27.EL  release,  however  the  bug  did not make it to the 
security division. 
 
COPYING, DISTRIBUTION, AND MODIFICATION OF INFORMATION PRESENTED HERE IS 
ALLOWED ONLY WITH EXPRESS PERMISSION OF ONE OF THE AUTHORS. 
 
 
Disclaimer: 
=========== 
 
This  document and all the information it contains are provided "as is", 
for educational purposes only, without warranty  of  any  kind,  whether 
express or implied. 
 
The  authors reserve the right not to be responsible for the topicality, 
correctness, completeness or quality of  the  information   provided  in 
this  document.  Liability  claims regarding damage caused by the use of 
any information provided, including any kind  of  information  which  is 
incomplete or incorrect, will therefore be rejected. 
 
 
Appendix: 
========= 
 
Proof-of-concept code won't be disclosed now. Special thanks go to 
OSDL and Marcelo Tosatti for providing an SMP testbed. 
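
Note: the interleaving from steps A to C of the advisory can be replayed deterministically in userspace. The sketch below is a toy model, not the exploit and not kernel code: the address space is eight pages, page numbers stand in for addresses, and all names (toy_mm, replay_race and the helpers) are hypothetical. It counts page table entries that end up outside the VMA after both faults have been handled.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 8u   /* toy address space size, in pages */

struct toy_mm {
    unsigned vm_start;   /* first page covered by the VM_GROWSDOWN area */
    unsigned vm_end;     /* one past the last covered page */
    bool pte[NPAGES];    /* true if a page is mapped at that index */
};

/* The check done in the fault handler before expand_stack() is called. */
static bool needs_expand(const struct toy_mm *mm, unsigned page)
{
    return page < mm->vm_start;
}

/* expand_stack() model; `patched` adds the "already expanded" recheck. */
static void expand(struct toy_mm *mm, unsigned page, bool patched)
{
    if (patched && mm->vm_start <= page)
        return;
    mm->vm_start = page;
}

static void map_page(struct toy_mm *mm, unsigned page)
{
    assert(page < NPAGES);
    mm->pte[page] = true;
}

/* Count mapped pages that lie outside [vm_start, vm_end). */
static unsigned orphaned_ptes(const struct toy_mm *mm)
{
    unsigned page, n = 0;

    for (page = 0; page < NPAGES; page++)
        if (mm->pte[page] && (page < mm->vm_start || page >= mm->vm_end))
            n++;
    return n;
}

/* Both threads pass the check first; thread_2 then expands before thread_1. */
static unsigned replay_race(bool patched)
{
    struct toy_mm mm = { .vm_start = 5, .vm_end = 8, .pte = { false } };
    const unsigned fault_1 = 4, fault_2 = 3;
    unsigned page;
    bool t1_expands, t2_expands;

    for (page = mm.vm_start; page < mm.vm_end; page++)
        map_page(&mm, page);

    t1_expands = needs_expand(&mm, fault_1);
    t2_expands = needs_expand(&mm, fault_2);
    assert(t1_expands && t2_expands);

    expand(&mm, fault_2, patched);   /* thread_2 wins the race */
    map_page(&mm, fault_2);
    expand(&mm, fault_1, patched);   /* thread_1 acts on its stale check */
    map_page(&mm, fault_1);

    return orphaned_ptes(&mm);
}
```

Calling replay_race(false) leaves one mapped page below vm_start, corresponding to PAGE2 in the diagram; replay_race(true) leaves none, because the second expansion returns early.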
 
Comment 16 Marcus Meissner 2005-01-12 20:17:57 UTC
reassign component to enterprise server 
Comment 17 Hubert Mantel 2005-01-13 15:33:14 UTC
Created attachment 27606 [details]
proposed patch for 2.4.21 (SLES8)

The patch for 2.4 did not apply cleanly to our 2.4.21 tree as we have an
additional check there already. I'm going to use attached rediffed version.
Please double check that it is correct.
Comment 18 Hubert Mantel 2005-01-13 15:34:42 UTC
Oops, Andrea is no longer in the cc list; re-adding for verification of the
rediffed version.
Comment 19 Hubert Mantel 2005-01-13 15:54:30 UTC
Ok, fixes have been added to all trees.
Comment 20 Andrea Arcangeli 2005-01-13 23:52:31 UTC
The patch in id 19606 is actually wrong (it won't hurt at runtime because it 
cannot handle growsup anyway, but the code doing address >= vm_start is buggy 
should that code ever run). The patches we have to apply are the ones in comment 
#8; the previous ones were not completely correct (even if practically harmless 
for i386/x86-64/ia64 and most other archs with growsdown; only the 
growsup part was buggy). 
 
In 2.4 there's no need for a growsup check at all, because the expand_stack of 
2.4 can only handle growsdown. Marcelo fixed that bit as well, fixing the 
address >= s/vm_start/vm_end/ in the 2.6 version where growsup is implemented. 
Comment 21 Marcus Meissner 2005-01-13 23:57:18 UTC
andrea, can you create good patches for us please? 
Comment 22 Marcus Meissner 2005-01-14 00:00:05 UTC
Created attachment 27626 [details]
fix-smp-pagefault

fix-smp-pagefault for 2.6 ... I replaced vm_start with vm_end in the first
case.
Comment 23 Marcus Meissner 2005-01-14 00:34:19 UTC
Comment on attachment 19461 [details]
/etc/dhcpd.conf

broken
Comment 24 Marcus Meissner 2005-01-14 00:34:39 UTC
Comment on attachment 19606 [details]
3rd patch

broken too
Comment 25 Marcus Meissner 2005-01-14 00:42:16 UTC
Created attachment 27628 [details]
fix-smp-race-2.4

proposed 2.4 patch for smprace
Comment 26 Marcus Meissner 2005-01-14 00:43:25 UTC
andrea, can you please verify the last 2 attachments for correctness? 
 
I redid them according to your notes. 
Comment 27 Andrea Arcangeli 2005-01-14 00:47:53 UTC
19628 and 19626 look fine, thanks! 
Comment 28 Marcus Meissner 2005-01-14 00:51:34 UTC
hubert, please replace the erroneous fix with the 2 new ones 
in the submitted kernels ... 
Comment 29 Hubert Mantel 2005-01-14 21:15:38 UTC
Done. Kernels are waiting for checkin.
Comment 30 Ludwig Nussel 2005-01-14 21:31:32 UTC
Created attachment 27647 [details]
exploit posted on full-disclosure
Comment 31 Stephan Müller 2005-01-18 19:24:59 UTC
When reviewing this patch I have some difficulty understanding the following:
when threads share the same VMA, an anon_vma_lock() spin lock applies to all
threads, therefore we have such a race condition. Now why is only this
particular code path being guarded with this patch? Why are all other occurrences
of anon_vma_lock() not vulnerable to such a race condition in general
(regardless of whether we see an attack vector or not)? Or, on the other hand, why
is it acceptable that all other anon_vma_lock() calls are left as is? Could
someone please enlighten me? Thanks a lot.
Comment 32 Andrea Arcangeli 2005-01-18 20:44:57 UTC
Sure, as the one who invented the anon_vma_lock I can answer this very easily 
(I hope very clearly too ;). 
 
The growsdown/growsup page faults are two special cases. They're the only two 
cases where the vma vm_start/vm_end fields can be modified without holding the 
mm->mmap_sem in write mode. The place where the growsdown/growsup fault happens 
holds the mmap_sem but only in read mode (which means only other 
growsdown/growsup can happen from under us, no other modification can happen at 
the same time). The growsup/growsdown is only allowed for anonymous memory, and 
anonymous memory for a concurrent growsdown/growsup will share the same 
anon_vma. The anon_vma is guaranteed to be the same for the same thread, and in 
turn to serialize different threads doing growsdown/growsup at the same time, I 
had the idea to take the anon_vma_lock (starting from 2.6.6+). 
 
Problem is that I used a logic similar to 2.4 and I didn't notice 2.4 had a 
race when writing the anon_vma code. 
 
The reason the anon_vma_lock is not necessary in the other vma modifications, 
is that all other modifications (i.e. mmap/mremap/mprotect etc..) all take the 
mmap_sem in _write_ mode, and in turn they cannot race with growsdown/growsup 
either (which takes it in read mode) and they serialize against each other as 
well, without the need of spinlocks. 
 
The patch itself is obviously correct, as I confirmed to Olaf by phone in time 
for SP1 (thanks Olaf!). 
 
Hope this helps. 
Comment 33 Marcus Meissner 2005-01-21 21:58:06 UTC
fix is out for all versions and branches now. 
Comment 34 Thomas Biege 2009-10-13 20:56:03 UTC
CVE-2005-0001: CVSS v2 Base Score: 6.9 (AV:L/AC:H/Au:N/C:C/I:C/A:C)