Bug 104065 - Grub Error 18 on vmware gsxserver
Summary: Grub Error 18 on vmware gsxserver
Status: RESOLVED WONTFIX
Duplicates: 130000 131587 138425 159426
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Basesystem
Version: Beta 1
Hardware: x86 All
Priority: P5 - None
Severity: Minor
Target Milestone: ---
Assignee: Torsten Duwe
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-08-11 09:48 UTC by Andreas Girardet
Modified: 2007-08-11 15:38 UTC
CC List: 4 users

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
bootsector (512 bytes, application/octet-stream)
2005-08-18 13:10 UTC, Andreas Girardet
grub config (796 bytes, text/plain)
2005-08-18 13:10 UTC, Andreas Girardet

Description Andreas Girardet 2005-08-11 09:48:44 UTC
- Grub Error 18 when installing Beta1 onto a standard VMWARE gsxserver image.

System boots and loads Grub and halts before loading any kernel with Error 18.
No grub shell is available. System hangs.


After CD1 is installed and the system reboots, I get a Grub Error 18 on VMware
gsxserver, installing on a 20 GB virtual hard disk using the SCSI BusLogic
driver. The initrd is created fine (verified by doing a rescue chroot, recreating
the initrd and installing grub again). Error 18 happens when the bootloader is
beyond 1024 cylinders. I have never seen it happen before on gsxserver. I assume
it is grub related.


The default partitioning chosen:

/dev/sda1 swap 1 GB
/dev/sda2 (18.9 GB) for /



################
To verify whether the above error is genuine and related to the 1024-cylinder
limit, I created the following partitions on my second trial:

/dev/sda1 /boot 512M
/dev/sda2 swap 10248M
/dev/sda3 / the rest



This time the install is fine, so I assume that the 1024-cylinder limit is
actually the reason for the error. I have installed many other Linux
distributions under VMware before (including my own distro, Yoper, which I
developed under VMware) and I have never seen an Error 18. I have seen other
errors, but not that one.

Hope this helps.
Comment 1 Torsten Duwe 2005-08-11 10:02:16 UTC
The generated /etc/grub.conf for the failing case would be nice, 
as well as the dump of the failing MBR. 
I suspect it's related to #103031; investigating... 
Comment 2 Torsten Duwe 2005-08-18 11:28:06 UTC
No info within a week -- cannot be that critical. 
 
Meanwhile I'm pretty sure this is related. I _guess_ (since no info from 
reporter) that a generic MBR was installed and the virtual disk is erroneously 
partitioned and/or the virtual BIOS is flaky. 
Comment 3 Andreas Girardet 2005-08-18 13:10:01 UTC
Created attachment 46477
bootsector
Comment 4 Andreas Girardet 2005-08-18 13:10:55 UTC
Created attachment 46478
grub config
Comment 5 Andreas Girardet 2005-08-18 13:14:29 UTC
Good things take a while ;) ... sorry for the delay. I created the dump with:

dd if=/dev/sda of=firstsect.img bs=512 count=1


/etc/grub.conf reads:

setup --stage2=/boot/grub/stage2 (hd0,1) (hd0,1)
quit
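
For readers who want to inspect a 512-byte dump like the one produced by the dd command above, the following is a minimal sketch of decoding the classic MBR partition table. It assumes the standard PC MBR layout (four 16-byte entries starting at offset 446 and a 0x55AA signature at offset 510). The function name parse_mbr is hypothetical, and the file name firstsect.img is only taken from the dd command; nothing here reflects the actual contents of the attached bootsector.

```python
import struct
import sys

def parse_mbr(sector: bytes):
    """Return the non-empty partition entries of a classic 512-byte MBR."""
    if len(sector) != 512:
        raise ValueError("expected exactly 512 bytes")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    partitions = []
    for index in range(4):
        start = 446 + 16 * index
        # status, CHS of first sector, type, CHS of last sector,
        # LBA of first sector, number of sectors (little-endian)
        status, _chs_first, ptype, _chs_last, lba_start, num_sectors = \
            struct.unpack("<B3sB3sII", sector[start:start + 16])
        if ptype == 0:
            continue  # unused slot
        partitions.append({
            "index": index + 1,
            "bootable": status == 0x80,
            "type": ptype,
            "lba_start": lba_start,
            "num_sectors": num_sectors,
        })
    return partitions

# Optional command-line use: python parse_mbr.py firstsect.img
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        for part in parse_mbr(handle.read(512)):
            print(part)
```

The LBA start and sector count of each entry show where a partition lies on the disk, which is the information needed to judge whether boot files could sit beyond what the BIOS can read.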

Comment 6 Torsten Duwe 2005-08-18 13:56:13 UTC
OK. I may not have access to the (virtual) machine, but I'm nevertheless very 
confident that the virtual SCSI BIOS doesn't support LBA reads. No, this is 
not related to #103031, this is "only" a dysfunctional BIOS paired with a bad 
partition layout. 
 
Without LBA, the BIOS can only access the first 8GB of the disk (using XCHS). 
If anything that's required for booting (stage2, kernel, initrd files) is 
beyond that, the bootstrap fails. This even exceeds your (correct) assumption 
about the 1024 cyl limit. 
 
I suggest you use a /boot partition at the beginning of the disk. 
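
The "first 8GB" figure above can be reproduced with a little arithmetic. The sketch below assumes the commonly used translated CHS geometry of 1024 cylinders, 255 heads and 63 sectors per track with 512-byte sectors, which gives 8,422,686,720 bytes (roughly 7.8 GiB). The helper names and the partition sizes are illustrative only: they approximate the two layouts from the description and are not taken from the real partition table.

```python
SECTOR_BYTES = 512
MIB = 1024 ** 2
GIB = 1024 ** 3

def chs_limit_bytes(cylinders=1024, heads=255, sectors_per_track=63):
    """Bytes addressable through CHS-only BIOS reads for a given geometry."""
    return cylinders * heads * sectors_per_track * SECTOR_BYTES

def extends_past_limit(start_bytes, size_bytes, limit=None):
    """True if any part of the partition lies beyond the CHS limit."""
    if limit is None:
        limit = chs_limit_bytes()
    return start_bytes + size_bytes > limit

print("CHS limit:", chs_limit_bytes(), "bytes")

# Approximation of the failing layout: 1 GB swap first, then an 18.9 GB root.
print("root past limit:", extends_past_limit(1 * GIB, int(18.9 * GIB)))

# Approximation of the working layout: a 512 MB /boot at the start of the disk.
print("/boot past limit:", extends_past_limit(0, 512 * MIB))
```

A root partition that extends past the limit does not guarantee a failure; it only means that stage2, the kernel or the initrd may end up in blocks that a CHS-only BIOS cannot read. A small /boot at the start of the disk rules that out.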
Comment 7 Andreas Girardet 2005-08-18 20:03:40 UTC
I might be wrong but:

Your answer makes logically NO sense. I have been running VMware for many years,
also in many ESX/GSX environments in large corporations with very large discs. I
can assure you that I have never received this error, even when I did not set
up /boot on the first partition. SUSE and other distros do not do this by
default anyhow, so you would expect this to have been tested again and
again by many people. Logically your assertion does not make sense and I cannot
accept it. You will see this bug come up again and again with VMware
users who use the default partitioning and a larger disc file. If you don't fix
it now, you will have to fix it at some point.

With SLES this was NOT a problem.

Comment 8 Andreas Girardet 2005-08-18 22:17:06 UTC
Just double-checked with NLD and it shows a related issue. Grub fails to
install, and when installed manually it also gives Error 18.

But when I use lilo all is fine and regardless of where /boot is the system runs
fine. I used the rescue disc to create a lilo.conf on the Error 18 failing system.

 Would that not just prove it is a grub issue and not a VMWARE bios issue?
Comment 9 Torsten Duwe 2005-08-19 09:39:06 UTC
Not quite. Lilo has a different philosophy, and gathers _no_ information at 
boot time, but relies on the settings determined when the lilo binary was run. 
Grub works with the BIOS to find a way to get the kernel into memory; if this 
cooperation is bad, grub will fail. 
 
On the matter of default partitioning I'm also amongst the ones saying that 
it's, well, sub-optimal, to put it mildly. 
 
I don't have vmware here (don't like it & don't want it), but if there's a 
test I can think of I'll ask you to run that. In the meantime, can you attach 
the lilo.conf you succeeded with? 
Comment 10 Andreas Girardet 2005-08-24 05:23:47 UTC
# Modified by YaST2. Last modification on Thu Aug 18 17:06:35 2005


timeout = 80
prompt
default = Linux
Here the lilo.conf.

boot = /dev/sda

image = /boot/vmlinuz
    label = Linux
    initrd = /boot/initrd
    optional
    root = /dev/sda2
    vga = 0x332
    append = "selinux=0 splash=silent resume=/dev/sda1 elevator=as showopts"

Comment 11 Andreas Girardet 2005-08-24 05:37:29 UTC
BTW .... 

Why should lilo work and grub not? Seems like grub is buggy or is missing
something that lilo has?

Of course lilo.conf does not contain "Here the lilo.conf."

timeout = 80
prompt
default = Linux

boot = /dev/sda

image = /boot/vmlinuz
    label = Linux
    initrd = /boot/initrd
    optional
    root = /dev/sda2
    vga = 0x332
    append = "selinux=0 splash=silent resume=/dev/sda1 elevator=as showopts"
Comment 12 Torsten Duwe 2005-08-24 09:43:17 UTC
Re-read the first paragraph of comment #9. 
Comment 13 Andreas Girardet 2005-08-24 10:27:07 UTC
Ok, of course ... It seems odd to me, however, that a modern BIOS would have such
an issue. For example, on the Novell internal linux-desktop mailing list someone
asked about the same error just today: he got it on an IBM T41p when installing a
new kernel. It seems to me that this "philosophy" even rejects new
machines and new BIOSes. If that is the case, I would consider this "philosophy"
buggy, and someone (you?) should write a patch to change its behaviour. I find it
very disturbing that the default bootloader we are using has such flaky behaviour
when default partitioning is selected (/boot not first). I think that either
way this causes trouble in normal userland, and a lot of users are just thrown
in at the deep end and give up on Linux, just because of the "philosophy" of the
"GRand" Unified Bootloader. IMHO this is a bug and not a feature.

Comment 14 Torsten Duwe 2005-08-24 11:50:08 UTC
"modern bios" is an oxymoron. Additionally, you'd be surprised how buggy a 
BIOS can be. 
 
Which SLES are you talking about in comment #7 ? NLD is based on SLES9. 
 
Which bug was filed for the T41p? 
 
 
Comment 15 Torsten Duwe 2005-08-24 12:28:06 UTC
Can you do me a big favor and tell me 
what /sys/firmware/edd/int13_dev80/sectors contains for the failing disk? 
Comment 16 Andreas Girardet 2005-08-24 20:27:24 UTC
nld:~ # cat /sys/firmware/edd/int13_dev80/sectors
0x4ffd625
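
The sysfs value above is a hexadecimal sector count. As a rough sketch, assuming 512-byte sectors, it can be converted to a capacity as follows; the function name edd_sectors_to_bytes is hypothetical. Under that assumption 0x4ffd625 is 83,875,365 sectors, or just under 40 GiB, well beyond what CHS-only reads can reach.

```python
def edd_sectors_to_bytes(text, sector_bytes=512):
    """Convert the hex string from /sys/firmware/edd/int13_dev80/sectors to bytes."""
    return int(text.strip(), 16) * sector_bytes

reported = "0x4ffd625"  # value from the output above
size_bytes = edd_sectors_to_bytes(reported)
print(int(reported, 16), "sectors")
print(size_bytes, "bytes")
print(round(size_bytes / 1024 ** 3, 2), "GiB")
```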
Comment 17 Dr. Werner Fink 2005-08-26 10:12:09 UTC
Why did I get this bug back? I'm not familiar with grub or vmware.
Comment 18 Torsten Duwe 2005-11-04 13:39:08 UTC
*** Bug 131587 has been marked as a duplicate of this bug. ***
Comment 19 Torsten Duwe 2006-06-13 11:12:28 UTC
*** Bug 130000 has been marked as a duplicate of this bug. ***
Comment 22 Forgotten User VV1WK2zJAY 2006-06-15 21:48:31 UTC
This issue was recently brought to VMware's attention.

Can someone provide the following info:

Version and build number of VMware GSX Server where the problem was discovered?
Host operating system type and version?
For the SUSE LINUX 10 install, are you using the virtual SCSI or virtual IDE driver?

Thx!

Comment 23 Andreas Girardet 2006-06-15 23:28:45 UTC
3.1.0-9089
Host: SLES9-SP3
Virtual SCSI
Comment 24 Forgotten User VV1WK2zJAY 2006-06-21 00:52:38 UTC
Setting up a duplicate environment to reproduce the problem.

I'll report any findings on this bug.

Comment 25 Torsten Duwe 2007-02-02 13:01:12 UTC
*** Bug 138425 has been marked as a duplicate of this bug. ***
Comment 26 Torsten Duwe 2007-02-02 14:11:10 UTC
*** Bug 159426 has been marked as a duplicate of this bug. ***
Comment 27 Andreas Jaeger 2007-08-11 15:38:00 UTC
No reaction for more than 2 months, therefore closing as CANTFIX (aka WONTFIX).

If you can provide the needed information, feel free to reopen the bug.