Bug 84600

Summary: grub cannot load current STABLE kernel
Product: [openSUSE] SUSE LINUX 10.0 Reporter: Andreas Jaeger <aj>
Component: KernelAssignee: Michael Matz <matz>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Blocker    
Priority: P5 - None CC: adrian.schroeter, dmueller, duwe, lmuelle, ro
Version: Stable GCC Snapshot1   
Target Milestone: ---   
Hardware: Other   
OS: All   
Whiteboard:
Found By: Other Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: preprocessed fsys_reiserfs, gcc 3 and 4 commands and resulting .o files
2 typescripts, from gcc-3 and gcc-4 generated code resp.
preprocessed instrumented fsys_reiserfs.i showing the problem

Description Andreas Jaeger 2005-05-19 13:45:26 UTC
grub cannot load the current STABLE kernel, it reports 
Executable format not recognized.

If I use a 9.3 kernel (with the same grub!), everything works fine.
Comment 1 Andreas Jaeger 2005-05-19 13:58:21 UTC
Using the 9.3 kernel did not really help - now I get
invalid compressed format (err=1)

System halted
Comment 2 Andreas Jaeger 2005-05-19 14:07:50 UTC
Adrian mentioned that using the 9.3 kernel worked for him - 
and he only had this problem on one machine, the other worked fine.
Comment 3 Adrian Schröter 2005-05-19 14:09:45 UTC
yes, I do run a full-i386 grub with the 9.3 kernel rpm atm. 
Comment 4 Stanislav Visnovsky 2005-05-19 14:17:20 UTC
I've done minimal X11 install successfully - even with STABLE kernel. 
Comment 5 Andreas Jaeger 2005-05-19 14:29:52 UTC
I had to downgrade to the 9.3 grub - and then it works.

grub seems to be broken.
Comment 6 Torsten Duwe 2005-05-19 14:37:45 UTC
Indeed it is. On my test machine it doesn't even reach stage2 (stuck in  
stage1.5, Error 16) Up/Downgrading kernels _might_ be, independent of the 
actual kernel file contents, a game of chance.  
Comment 7 Andreas Jaeger 2005-05-19 15:14:58 UTC
Downgrading to an older grub version shows less problems - but still some.

Either our gcc4 patches are broken, not complete - or we have a grub miscompilation.
Comment 8 Torsten Duwe 2005-05-23 15:15:10 UTC
Does this happen on anything else than reiserfs on your test setups? 
Comment 9 Adrian Schröter 2005-05-23 15:23:46 UTC
I didn't test anything else. 
Comment 10 Andreas Jaeger 2005-05-24 06:47:33 UTC
I didn't test anything else either.
Comment 11 Stanislav Visnovsky 2005-05-24 06:56:45 UTC
Same with me, only reiser. 
Comment 12 Torsten Duwe 2005-05-24 09:18:03 UTC
Then this narrows down to the grub's reiserfs code. Building with gcc-3.3.5 
makes no difference it seems. The linux kernel bug of disjoint caches keeps 
complicating things, BTW. 
Comment 13 Lars Müller 2005-05-24 10:44:06 UTC
I have the same problem with grub while booting from reiserfs.  The grub command
line doesn't see /boot at all.  Even if the dir is there in the filesystem.

I've dongraded to the grub of 9.3.
Comment 14 Torsten Duwe 2005-05-24 11:27:13 UTC
Correction: working around the kernel bug, I can now clearly identify this as 
gcc4 miscompilation. I Comment#12 there were leftovers in the /dev/hda buffer 
cache. 
Comment 15 Torsten Duwe 2005-05-27 14:18:35 UTC
Created attachment 38248 [details]
preprocessed fsys_reiserfs, gcc 3 and 4 commands and resulting .o files
Comment 16 Torsten Duwe 2005-05-27 14:20:42 UTC
So here is the test case. 
Have a look at the variable "depth" in funtion "next_key()". 
 
GCC3 stores it in %ebx, and does increment, array access and test properly, 
addresses 0x3b9 to 0x3bf. 
 
GCC4 happens to store it in %edi, I think, because this is initialized with 1 
and incremented. The rest however is totally bogus (addresses 0x6cf to 0x6d7). 
 
 
Comment 17 Torsten Duwe 2005-05-27 14:37:23 UTC
Looking deeper into the generated code, it seems gcc4 has heavily shifted 
around the expressions. But whatever it did, on sufficiently populated 
ReiserFSes, it doesn't work; sorry I cannot make an attachment with a sample 
image. Hopefully the asm code is enough. 
Comment 18 Michael Matz 2005-05-27 15:36:04 UTC
There are many functions in that .i file.  Do we at least know _which_ one is miscompiled? 
Do we know if some compiler options (like removing -Os, or doing just -O0) works around 
this?  To limit the effect of reordering basic blocks, you can use the option 
-fno-reorder-blocks, and to even limit the effects of scheduling also -fno-schedule-insns2 . 
This should make the functions better comparable. 
Comment 19 Torsten Duwe 2005-06-03 12:04:32 UTC
Here are the typescripts that lead me to said conclusion. These were captured 
with debug enabled, in case that makes a difference. The first debug prints 
that differ are from next_key; both are on the same set of data 
(e.g.partition), both are reproducable. 
Comment 20 Torsten Duwe 2005-06-03 12:06:50 UTC
Created attachment 38590 [details]
2 typescripts, from gcc-3 and gcc-4 generated code resp.

preferably feed these 2 scripts to the diff command.
Comment 21 Michael Matz 2005-06-07 16:13:31 UTC
Steven or Zdenek, can either of you please look at this bug report and try 
to identify either the miscompilation or create a testcase which shows 
the misbehaviour without the need to boot every time (perhaps by writing a 
wrapper function calling the miscompiled one?)?  I severely lack the time 
due to other more pressing issues. 
Comment 22 Dirk Mueller 2005-06-09 13:15:18 UTC
just FYI: I tried patching grub with fedora rawhide's gcc4 patch (which is 4 
times as big) but it didn't make a difference either. still same failure here.  
 
 
Comment 23 Michael Matz 2005-06-10 14:57:35 UTC
Might be related to 
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21985 
Comment 24 Michael Matz 2005-06-11 01:15:27 UTC
Thorsten (or someone else), can you please show a sequence of commands you 
use to test if this works, or not, without rebooting the machine?  I can't just debug this 
with just looking at the assembler.  I see you use qemu in the typescript, so there 
must be a way.  I want a HD boot image, the full commands to change the grub in it, 
and the relevant snippets for running qemu testing that newly built grub. 
 
This way we can also check if a fix for the above mentioned PR21985 fixes also this bug. 
Or at least I can look further at where exactly the miscompile is. 
Comment 25 Michael Matz 2005-06-11 01:50:32 UTC
I've now looked at the assembler some more.  I looked at all places, where 
INFO->next_key_nr[] is accessed in next_key() and search_stat().  The typescript 
indicates that this loop is repeated with GCC 4 (next_key:749): 
     do { 
          if (depth == INFO->tree_depth)  { ... ; goto found; } 
          depth++; 
      } while (INFO->next_key_nr[depth] == 0); 
i.e. during the first iteration INFO->next_key_nr[depth] is zero, where with GCC-3 it's not 
(depth being 2).  The corresponding loop in gcc-4 asm looks like so: 
 3a4:   bb 01 00 00 00          mov    $0x1,%ebx 
 3a9:   0f b7 05 1e e0 06 00    movzwl 0x6e01e,%eax 
 3b0:   ba 00 e0 06 00          mov    $0x6e000,%edx 
 3b5:   39 c3                   cmp    %eax,%ebx 
 3b7:   74 41                   je     3fa <next_key+0x8c> 
 3b9:   43                      inc    %ebx 
 3ba:   83 7c 9a 44 00          cmpl   $0x0,0x44(%edx,%ebx,4) 
 3bf:   74 e8                   je     3a9 <next_key+0x3b> 
 
%ebx is depth (starting from 1), and the code looks correct.  Address 0x6e000 is INFO, 
the ->tree_depth member at offset 0x1e, and the ->next_key_nr[] array at offset 0x44. 
 
So this loop itself is not miscompiled, ergo I think next_key_nr is really 0 (wrongly so!) 
at that point.  I looked at the places where this array is changed.  First some more code 
in next_key: 
      do { 
          int nr_item = BLOCKHEAD (cache)->blk_nr_item; 
          int key_nr = INFO->next_key_nr[depth]++; 
          if (key_nr == nr_item) 
            INFO->next_key_nr[depth] = 0; 
          ... --depth ... 
      } while (depth > DISK_LEAF_NODE_LEVEL); 
 
Here it can be incremented or set to zero depending on other conditions.  The assembler 
looks like so: 
 409:   be 00 e0 06 00          mov    $0x6e000,%esi 
 40e:   0f b7 4f 02             movzwl 0x2(%edi),%ecx 
 412:   8b 54 9e 44             mov    0x44(%esi,%ebx,4),%edx 
 416:   8d 42 01                lea    0x1(%edx),%eax 
 419:   39 ca                   cmp    %ecx,%edx 
 41b:   89 44 9e 44             mov    %eax,0x44(%esi,%ebx,4) 
 41f:   75 08                   jne    429 <next_key+0xbb> 
 421:   c7 44 9e 44 00 00 00    movl   $0x0,0x44(%esi,%ebx,4) 
 428:   00 
 
The above is the code up to the first if, and it's implemented correctly.  If the array 
elements _before_ incrementing is equal to nr_item the array element will be zeroed. 
 
 429:   c1 e1 04                shl    $0x4,%ecx 
 42c:   4b                      dec    %ebx 
.... 
 443:   83 fb 01                cmp    $0x1,%ebx 
 446:   7f c1                   jg     409 <next_key+0x9b> 
 
Rest of loop also is correct.  Now the only other place, where next_key_nr[] is changed is 
in search_stat(): 
      INFO->next_key_nr[depth] = (i == nr_item) ? 0 : i+1; 
      cache = read_tree_node (DC (cache)[i].dc_block_number, --depth); 
This corresponds to (%ebx is 'i', %ecx is 'nr_item' and %esi is 'depth'): 
 2b6:   31 d2                   xor    %edx,%edx 
 2b8:   39 cb                   cmp    %ecx,%ebx 
 2ba:   74 03                   je     2bf <search_stat+0x59> 
 2bc:   8d 53 01                lea    0x1(%ebx),%edx 
 2bf:   c1 e1 04                shl    $0x4,%ecx 
 2c2:   b8 00 e0 06 00          mov    $0x6e000,%eax 
 2c7:   89 54 b0 44             mov    %edx,0x44(%eax,%esi,4) 
 
This correctly implements the ?: operator, the rest is the decrement of depth 
and the call: 
 
 2cb:   8d 04 d9                lea    (%ecx,%ebx,8),%eax 
 2ce:   4e                      dec    %esi 
 2cf:   89 f2                   mov    %esi,%edx 
 2d1:   8b 44 38 18             mov    0x18(%eax,%edi,1),%eax 
 2d5:   e8 0f ff ff ff          call   1e9 <read_tree_node> 
 
So, I really need a way to test various theories at runtime by changing and instrumenting 
grub itself.  Hence to make progress it's mandatory that we have an easy way to 
get a testing environment for grub. 
Comment 26 Michael Matz 2005-06-11 03:34:06 UTC
Crap!  It's cd-dce which eliminates necessary instructions.  I've found a way to  
test this with qemu on willimas, and instrumented the code some more.  The instrumented  
version exhibits a problem in t59.cddce in this code in search_stats:  
      INFO->next_key_nr[depth] = (i == nr_item) ? 0 : i+1;  
      printf ("  CCC depth=%d nr=%d nr2=%d\n", depth, INFO->next_key_nr[depth],  
              INFO->next_key_nr[2]);  
When running this the output will be e.g.:  
   CCC depth=2 nr=148 nr2=0  
Note that depth is 2, and hence the last two numbers should be the same (148).  That  
they are not is because the store to next_key_nr[depth] in front of the printf() is  
deleted.  Like shown in comment #25 this is not the immediate case when no 
instrumentation code is there, but I bet that some other store will be deleted for the same 
reasons.  Anyway for the above code in question we have this in .t58.redphi3: 
 
  D.3338_120 = 450560B; 
  ivtmp.234_121 = &450560B->next_key_nr[depth_13]; 
..... 
  # iftmp.32_6 = PHI <iftmp.32_96(14), 0(13)>; 
<L11>:; 
  D.3339_122 = (unsigned int *) ivtmp.234_118; 
  *D.3339_122 = iftmp.32_6; 
  D.2280_72 = 450560B->next_key_nr[2]; 
  grub_printf (&"  CCC depth=%d nr=%d nr2=%d\n"[0], depth_60, iftmp.32_6, D.2280_72); 
 
iftmp.32 is the result of the ?: operator (i.e. 0 or i+1) which is stored at *D.3339, which holds 
the address of 450560B->next_key_nr[depth].  Fine so far.  (For depth==2 this means 
that 450560B->next_key_nr[2] == D.2280 and *D.3339 alias!).  But now cd-dce comes 
and deletes the store: 
 
Deleting : D.3339_122 = (unsigned int *) ivtmp.234_118; 
Deleting : *D.3339_122 = iftmp.32_6; 
.... 
  # iftmp.32_6 = PHI <iftmp.32_96(10), 0(9)>; 
<L11>:; 
  D.2280_72 = 450560B->next_key_nr[2]; 
  grub_printf (&"  CCC depth=%d nr=%d nr2=%d\n"[0], depth_60, iftmp.32_6, D.2280_72); 
 
So, while iftmp.32 hold the correct temporary value (so it's printed correctly), D.2280 
will be wrong, because the store which would have set it is now removed.  From then 
everything goes downhill. 
 
I'll also attach my version of fsys_reiserfs.i 
It is to be compiled with 
  -falign-jumps=1 -falign-loops=1 -falign-functions=1 -m32 -O2 -Os -fno-strict-aliasing 
  -minline-all-stringops -fno-asynchronous-unwind-tables 
to see the problem. 
Comment 27 Michael Matz 2005-06-11 03:36:41 UTC
Created attachment 39001 [details]
preprocessed instrumented fsys_reiserfs.i showing the problem
Comment 28 Michael Matz 2005-06-11 03:41:28 UTC
If I should theorize, I would guess GCC thinks the move is unnecessary because of 
the strange "addresses" (literal numbers casted to pointer type and then dereferenced). 
This is of course non-conformant C, as the address space is accessed directly.  I.e. 
it violates object and pointer rules of C (e.g. that a pointer has to point into a valid 
(sub-)object, which obviously can't be the case for a pointer like "(int*)450560", except 
if the object is "whole memory"). 
 
But that's exactly why this source is compiled with -fno-strict-aliasing.  So the load 
and the store _should_ alias (and hence the store be necessary), but my theory is, 
that this is somehow not the case. 
Comment 29 Michael Matz 2005-06-16 14:56:40 UTC
Okay, it's all analyzed and well.  The problem was indeed a miscompilation, 
which in fact was already fixed no mainline, but was missing from the 4.0 
branch (it's there now).  The bug was http://gcc.gnu.org/PR21171 . 
 
Fixed package was submitted, and seems to fix the bug.  At least the snapshot2 
does not have the problems, and my testing with qemu also showed that 
grub was working.