Bug 114933

Summary: gcc miscompiles kpartx with -O2 or bigger.
Product: [openSUSE] SUSE LINUX 10.0 Reporter: Carl-Daniel Hailfinger <kernel01>
Component: BasesystemAssignee: Hannes Reinecke <hare>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P5 - None CC: hare, lmb, matz
Version: Beta 4   
Target Milestone: ---   
Hardware: All   
OS: All   
Whiteboard:
Found By: Development Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: hdimage.tar.bz2 (sparse image, needs 1 MB diskspace unpacked)

Description Carl-Daniel Hailfinger 2005-09-02 10:53:46 UTC
# losetup -r /dev/loop0 harddisk.img
# kpartx -l -v /dev/loop0
llseek error
llseek error
llseek error
llseek error

This worked in 9.3. A strace shows that llseek does NOT fail. I suspect that
kpartx gets confused because a loop device doesn't return the expected value for
the BLKSZGET and BLKGETSIZE ioctls.

"fdisk -l /dev/loop0" works fine.
Comment 1 Carl-Daniel Hailfinger 2005-09-03 11:50:09 UTC
Bad news. The problem is not within kpartx, but a miscompilation.

If I compile without RPM_OPT_FLAGS, it works. If I strip -O2 from RPM_OPT_FLAGS,
it works.

Summary:
Flags  | Result
(none) | OK
-O     | OK
-O2    | Miscompilation
-O3    | Miscompilation

carldani@piii-1000:~> gcc --version
gcc (GCC) 4.0.2 20050826 (prerelease) (SUSE Linux)

System is a PentiumIII with 1 GHz, fresh install of SUSE 10.0beta4

Michael, Andreas: Can one of you take that bug? Thanks!
Comment 2 Andreas Schwab 2005-09-03 12:13:51 UTC
Could also be an aliasing bug.  Does it work when compiled with 
-fno-strict-aliasing? 
Comment 3 Carl-Daniel Hailfinger 2005-09-03 12:21:37 UTC
It still doesn't work when using "-O2 -fno-strict-aliasing".
Comment 4 Carl-Daniel Hailfinger 2005-09-03 12:27:44 UTC
Hm. -fno-unit-at-a-time seems to help.
Comment 5 Michael Matz 2005-09-05 09:37:56 UTC
Just to be sure, if anyone expect me or some other compiler guy to have  
a look at this, then this is missing much information.  For instance  
what package we are talking about.  As this seems also to be somewhat  
hard to use (needing a HD image), this would require someone setting  
up a buildsystem, with the sources of kpartx, where one can see the fault.  
  
But don't expect that this is not a bug in kpartx. 
Comment 6 Carl-Daniel Hailfinger 2005-09-05 13:52:07 UTC
Michael, what do you need? A simple way to reproduce this miscompilation?

kpartx is in package multipath-tools. Install the multipath-tools and the source
rpm of multipath-tools. Download the tar.bz2 of the harddisk image I'll attach
in bugzilla. Then do the following as root on a i386 machine running 10.0beta4:

# tar xfjS hdimage.tar.bz2
# losetup -r /dev/loop0 harddisk.img
# kpartx -l /dev/loop0
llseek error
llseek error
llseek error
llseek error

Expected output is:
loop0p1 : 0 48132 /dev/loop0 63
loop0p5 : 0 80262 /dev/loop0 48258
loop0p6 : 0 96327 /dev/loop0 128583
loop0p7 : 0 64197 /dev/loop0 224973
loop0p8 : 0 112392 /dev/loop0 289233

# cd /usr/src/packages/
# rpmbuild -bp SPECS/multipath-tools.spec
# cd BUILD/multipath-tools-0.4.4/
# make OPTFLAGS="-O2 -march=i586 -mtune=i686 -fmessage-length=0 -Wall
-D_FORTIFY_SOURCE=2 -g -fno-unit-at-a-time" BUILD=glibc
# kpartx/kpartx -l /dev/loop0
loop0p1 : 0 48132 /dev/loop0 63
loop0p5 : 0 80262 /dev/loop0 48258
loop0p6 : 0 96327 /dev/loop0 128583
loop0p7 : 0 64197 /dev/loop0 224973
loop0p8 : 0 112392 /dev/loop0 289233
# make OPTFLAGS="-O2 -march=i586 -mtune=i686 -fmessage-length=0 -Wall
-D_FORTIFY_SOURCE=2 -g" BUILD=glibc
# kpartx/kpartx -l /dev/loop0
llseek error
llseek error
llseek error
llseek error

Do you need any additional information?
Comment 7 Andreas Schwab 2005-09-05 13:55:07 UTC
Have you tried compiling without _FORTIFY_SOURCE? 
Comment 8 Carl-Daniel Hailfinger 2005-09-05 13:59:44 UTC
Created attachment 48811 [details]
hdimage.tar.bz2 (sparse image, needs 1 MB diskspace unpacked)
Comment 9 Carl-Daniel Hailfinger 2005-09-05 14:00:38 UTC
I have tried without _FORTIFY_SOURCE and every other flag except -O2. Didn't help.
Comment 10 Carl-Daniel Hailfinger 2005-09-05 14:02:14 UTC
# make OPTFLAGS="-O2 -fno-unit-at-a-time" BUILD=glibc
will compile correctly.
# make OPTFLAGS="-O2" BUILD=glibc
will miscompile.

I hope that is better for tracking it down.
Comment 11 Michael Matz 2005-09-05 15:30:11 UTC
Argh.  This code is horrible.  Anyway, it was a bug in the source code,   
which arguably is a bug in the kernel headers.  The problem is that   
kparts likes to define its own _llseek syscall via the syscall5 macro of   
linux/unistd.h.  But the assembler to which it expands doesn't capture all   
effects it has, namely:   
   
int _llseek (uint fd,ulong hi,ulong lo,long long * res,uint wh)   
{   
   long __res;   
   __asm__ volatile ("int $0x80"   
                     : "=a" (__res)   
                     : "0" (140),"b" ((long)(fd)),   
                       "c" ((long)(hi)), "d" ((long)(lo)),   
                       "S" ((long)(res)),"D" ((long)(wh)));   
   
The problem is that *res (where the asm simply sees "res") is written to.   
But the asm doesn't describe this sideeffect, so when gcc sees this call:   
   if (_llseek (fd, in>>32, in & 0xffffffff, &out, 0) != 0   
       || out != in)   
_and_ inlines _llseek (containing the asm above, created by the syscall5  
macro), which it only does with -O2 in unit-at-a-time mode by default (hence  
-fno-inline also "fixes" this problem), then GCC doesn't see anymore that  
"out" might have changed due to that call.  So it regards it as unchanged,  
and optimizes the later "out != in" into "1 != in" (as "1" was the initial  
value of "out", which according to all GCC knows didn't change.  
  
So, the lesson learned: Don't use the kernel headers in user space for  
defining own syscall functions.  Don't use the kernel headers at all,  
if possible.  In this case we have a perfectly fine lseek64 library function  
which can be used, when one defines _LARGEFILE64_SOURCE.  
Comment 12 Michael Matz 2005-09-05 15:45:50 UTC
Submitted fixed package removing this hack to STABLE. 
Comment 13 Carl-Daniel Hailfinger 2005-09-05 16:20:51 UTC
Thanks!

However, does it still compile and work with klibc instead of glibc? klibc
doesn't have a lseek64 function, resulting in this hack. IIRC every package
which can also be linked against klibc has similar hacks in its source to work
around the lack of  lseek64.
Comment 14 Michael Matz 2005-09-06 12:47:07 UTC
I was only concerned about this package, and it seems to build 
exclusively for glibc.  If klibc doesn't provide this syscall, then this 
should simply be fixed in klibc.