Bug 115232

Summary: "Boot-offboard" is problematic
Product: [openSUSE] SUSE LINUX 10.0
Reporter: Jan Engelhardt <jengelh>
Component: Installation
Assignee: Steffen Winterfeldt <snwint>
Status: VERIFIED FIXED
QA Contact: Klaus Kämpf <kkaempf>
Severity: Normal
Priority: P5 - None
CC: andreas.pfaller, behlert, ihno
Version: Beta 3
Target Milestone: ---
Hardware: x86
OS: Linux
Whiteboard:
Found By: Beta-Customer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Bug Depends on: 104517
Bug Blocks:

Description Jan Engelhardt 2005-09-04 16:31:21 UTC
The SUSE kernels all include the option "boot from offboard IDE chipsets first" (CONFIG_BLK_DEV_OFFBOARD). For laptops with PCMCIA CD drives, this makes hda the CD drive (at least on my Sony thing) and hdc/hde the harddisk. This is likely to cause problems with the bootloader (not only LILO) when the CD drive is not present at boot, because the harddisk would then be at hda rather than hdc.
Comment 1 Jens Axboe 2005-09-05 13:12:48 UTC
Have you actually seen problems with this, or are you just guessing? Even if the
external CDROM is hda, it's still not assigned the 0x80 bios disk that typically
is the boot device.
Comment 2 Jan Engelhardt 2005-09-05 15:06:51 UTC
What does this have to do with the 0x80 device? It's hdc, and therefore (I
guess _at this point_) the bootloader config that yast will write contains
/dev/hdc, at least if you choose LILO. I do not know how GRUB translates
(hd0,x) into /dev/hdX. But the kernel will certainly get it wrong
when parsing the root=, resume= or whatever attributes.
Comment 3 Christoph Thiel 2005-09-05 15:09:43 UTC
Jan, you didn't actually answer the question asked ;) The bug is still in NEEDINFO.

To answer your grub question:

$ cat /boot/grub/device.map
(fd0)   /dev/fd0
(hd0)   /dev/hda
$
Comment 4 Jan Engelhardt 2005-09-05 15:48:34 UTC
>Jan, you didn't actually answer the question asked ;)

That's why I left it at NEEDINFO.

>To answer your grub question:

No, I did not mean that. Anyway, GRUB interprets (hd0) as 0x80, which is
correct. See below for the remaining problem I see.

Ok. From the start. I insert the BOOT-CD, hypothetically doing a fresh install.
Kernel spouts:

	hda: TOSHIBA CD-ROM XM-7002Bc
	ide0 at 0x180-0x187,0x386 on irq 3
	hda: ATAPI 16X CD-ROM drive, 128kB Cache
	ALI15X3: chipset revision 196
	..
	  ide1: BM-DMA at 0x1400-0x1407 ..
	  ide2: BM-DMA at 0x1408-0x140f ..
	Probing IDE interface ide1...
	hdc: TOSHIBA MK2003GAH, ATA DISK drive
	ide1 at 0x1f0-0x1f7,0x3f6 on irq 14
	 hdc: hdc1 hdc2
	Probing IDE interface ide2...

Now, if the user chooses to have LILO installed as a bootloader instead of
GRUB, and we know LILO uses /dev/ notation rather than (hd*), YAST must put
boot=/dev/hdc into the lilo configuration to successfully write the MBR. LILO
of course translates the boot=/dev/hdc into 0x80, but if the user reruns lilo
under the finally-installed-system, it will stumble through hdc which really
should have been hda.

Another thing which happens before the first boot from harddisk: YAST will put
root=/dev/hdc2 and resume=/dev/hdc1 as boot options to the kernel in the
bootloader configs (GRUB or LILO, affects both). But when the installed kernel
boots, the harddrive is always hda(*), so the kernel fails to mount the root.

See what I mean?

(*) Exception is possible if PCMCIA is _compiled in_ (not good!) and
boot-from-offboard is active.
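
To make the mismatch concrete, here is a minimal Python sketch of what would have to happen to the boot options if the names seen at installation time differ from those seen on the installed system. It is only an illustration: the function `remap_devices` and the rename table are hypothetical and are not something YaST or the bootloader tools provide.

```python
import re

def remap_devices(cmdline, rename):
    """Rewrite /dev/hdX[N] references in a kernel command line.

    `rename` maps install-time disk names to runtime disk names,
    e.g. {"hdc": "hda"}. Both the function and the table are
    hypothetical; they only illustrate the renaming problem.
    """
    def fix(match):
        disk, part = match.group(1), match.group(2)
        # Disks not listed in the table keep their name.
        return "/dev/" + rename.get(disk, disk) + part

    # One pass over the string, so a swap (hda <-> hdc) is not applied twice.
    return re.sub(r"/dev/(hd[a-z])(\d*)", fix, cmdline)

install_time = "root=/dev/hdc2 resume=/dev/hdc1"
runtime = remap_devices(install_time, {"hdc": "hda"})
print(runtime)
```

With the table `{"hdc": "hda"}` the options written during installation turn into the ones the installed kernel would actually need (root on hda2, resume on hda1).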
Comment 5 Jan Engelhardt 2005-09-05 15:49:19 UTC
I have to add: the PCMCIA devices are loaded before "ide-disk" kernel module is
- this is why the CDROM becomes hda in the first place.
Comment 6 Jens Axboe 2005-09-05 17:29:31 UTC
I see your point, we should load ide-disk first always and avoid this confusion
(and possible bug with lilo). Hubert, can you assign this to the appropriate
person for the module loading stuff?
Comment 7 Steffen Winterfeldt 2005-09-05 17:45:35 UTC
Btw, see bug 104517 for people actually trying this. He doesn't actually 
get remotely far enough for this to matter to him, though. :-( 
Comment 8 Jan Engelhardt 2005-09-05 19:03:11 UTC
Move PCMCIA detection after ide-disk loading or change the boot-offboard-first
thing, that's the question ;)
Comment 9 Hubert Mantel 2005-09-07 14:16:00 UTC
I don't think we will change the kernel configuration at this point of time as
it might be far more wide reaching. I propose to just change the module load
order. Steffen, isn't this your arena?
Comment 10 Steffen Winterfeldt 2005-09-08 09:34:47 UTC
I've no idea how a kernel compile time option relates to some
module loading order. And ide-disk _is_ loaded before pcmcia, isn't it?

Passing the bug to our pcmcia experts.
Comment 11 Christian Zoz 2005-09-08 17:41:56 UTC
Steffen, what can I do? This is not a pcmcia problem. pcmcia detects and
initializes the device properly. And whatever comes first gets the first interface;
that's a kernel 'problem'. So it seems that ide-disk is not loaded early enough.
At least the disk interface registration is later than the pcmcia CD interface
registration.

I don't know which module is loaded when in initrd/linuxrc. Steffen, that is
definitely your part.
Comment 12 Andreas Pfaller 2005-09-08 23:39:53 UTC
I am not sure if this is related. In the past (9.2) I have
been able to get the "normal" ide device numbering by passing
"ide=reverse" as a kernel boot option. This does not work with
RC1. The motherboard features a VIA8235 and also an HPT372
controller. My boot disk is connected to the VIA controller.
I have not tried to complete the RC1 installation so I don't
know if it would lead to a bootable installation - given
my past experiences with this issue I doubt it.

Personally I consider SUSE's nonstandard choice for the setting
of CONFIG_BLK_DEV_OFFBOARD a mistake since it is surprising - just
check about any Linux FAQ. It also easily leads to fatal errors
(e.g. when imaging disks with the rescue system). It has happened
to me (I have identical disks, both in type and partitioning,
connected as master to channel one on each controller) and I
consider myself a very experienced Linux user and developer.

Furthermore, in the installation screens (concerning partitioning)
I found no direct clue that helps identify which disks are
concerned. Showing the type of controller to which the disks
are connected would certainly help. In my current configuration
there were some clues due to the presence of other drives
with unexpected device names after booting with ide=reverse
(e.g. installation source CD still at hdg instead of hdc) but
this is easily overlooked, with possible data loss as a consequence.
Comment 13 Steffen Winterfeldt 2005-09-09 09:19:13 UTC
Sorry, I've no idea what you want from me. The original report is about a 
kernel compile time option. 
 
And this discussion about ide-disk loading too late is complete crap. During 
installation (and that's what I care about) it's loaded way before any 
pcmcia thing. 
Comment 14 Hubert Mantel 2005-09-09 10:29:25 UTC
So who determines the order of modules to get loaded in the initrd for the
installed system? This is where this issue needs to get fixed. We certainly will
NOT change a kernel configuration option between RC1 and RC2. Hannes, isn't this
your area?
Comment 15 Hannes Reinecke 2005-09-09 10:49:56 UTC
Yes and no.

We have hit another 'kernel changes the device enumeration' bug.
Of course we can try to mimic the estimated enumeration within the installed
system and load the modules in a certain order.
However, there is no guarantee that this will work.
And no, I won't make any modifications to do so.

After all, that's exactly what persistent device names are for.
If you put root=/dev/disk/by-path/pci-XXXX-ide-XXX or
root=/dev/disk/by-id/ata-XXXXX in there everything would be fine.

So, lookup the respective symlink under /dev/disk/by-XXX and use that as the
argument to root=.

Of course it would be nice if YaST supports it.
Persistent naming support was promised for two releases now ...
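
For looking up the symlink, something along these lines would do. This is a rough Python sketch, not an existing tool: the helper name `persistent_names` is made up, and which entries actually appear under /dev/disk/by-id or /dev/disk/by-path depends on the udev rules of the running system.

```python
import os

def persistent_names(devnode, by_dir="/dev/disk/by-id"):
    """Return the symlinks under `by_dir` that resolve to `devnode`.

    `persistent_names` is a hypothetical helper; the directory layout
    (/dev/disk/by-id, /dev/disk/by-path) is created by udev.
    """
    target = os.path.realpath(devnode)
    matches = []
    if not os.path.isdir(by_dir):
        return matches
    for entry in sorted(os.listdir(by_dir)):
        link = os.path.join(by_dir, entry)
        # Compare fully resolved paths, since the links are relative.
        if os.path.realpath(link) == target:
            matches.append(link)
    return matches
```

Calling `persistent_names("/dev/hdc2")` during installation and writing the first result after `root=` would make the entry independent of whether the disk later shows up as hda or hdc. An empty list means no persistent name exists for that node.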
Comment 16 Jan Engelhardt 2005-09-09 12:30:57 UTC
root= is not really meaningful for the installation cd, as the root is an initrd
after all. Check the order of `lsmod` - ide-disk is probably loaded after pcmcia
/ ide_cs.
Comment 17 Hannes Reinecke 2005-09-09 12:46:21 UTC
I didn't mean the installation cd. I meant the 'root=' commandline argument for
the _installed_ system. After all, you can boot with the installation cd, right?
It's the installed system which doesn't boot.

Unless I'm totally off kilter here.
Comment 18 Christian Zoz 2005-09-12 10:21:21 UTC
You got that wrong. This bug is about installation time. The problem is that hda
and hdc are swapped just at installation time, but not later at a normal boot.

Of course, if YaST used by-path or by-id symlinks to devnodes, that would
be better. But I guess we cannot use them for bootloader configuration or
for the resume= kernel parameter.

The goal is to see the disk as hda not only at normal boot, but also at
installation time.
Comment 19 Christian Zoz 2005-09-12 10:23:16 UTC
I'm just testing this, but there is another bug blocking (bug 104517): ide_cs is
not loaded.

When I load it manually I can proceed with the installation, and the disk is hda
and the CDROM is hde. But looking at the kernel messages on console 4, it looks as
if the ide disks were checked after pcmcia initialisation. Thus it might be that I
would hit the problem of this bug if ide_cs were loaded immediately (and not later
manually).

In the list of loaded modules I can verify that ide_disk is loaded before pcmcia
was initialized. I will check that again as soon as I have a linuxrc that loads
ide_cs immediately.
Comment 20 Christian Zoz 2005-09-12 11:06:23 UTC
Booting with 'insmod=ide_cs' works. Even then ide_cs is loaded much later than
ide_disk. But alim15x3 is loaded after ide_cs. So it is clear why the cdrom
becomes hda and the disk hdc.

Steffen, is that now something you can change?
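
A simplified Python model of the naming rule shows why the order matters. It assumes that interfaces are numbered ide0, ide1, ... in the order the host drivers register them, and that interface N owns the letters for its master and slave (ide0: hda/hdb, ide1: hdc/hdd, ide2: hde/hdf). The interface counts per module (one for ide_cs, two for alim15x3) are read off the kernel log in comment 4, and `assign_ide_names` is a made-up name; the real kernel logic has more special cases.

```python
import string

def assign_ide_names(load_order, drives):
    """Assign hdX names the way interface registration order implies.

    load_order: list of (module, number_of_interfaces) in load order.
    drives: {(module, local_interface): [(unit, label), ...]}, where
            unit 0 is the master and unit 1 the slave.
    This is a simplified, hypothetical model, not kernel code.
    """
    names = {}
    next_iface = 0  # global interface number: ide0, ide1, ...
    for module, iface_count in load_order:
        for local_iface in range(iface_count):
            for unit, label in drives.get((module, local_iface), []):
                letter = string.ascii_lowercase[2 * next_iface + unit]
                names[label] = "hd" + letter
            next_iface += 1
    return names

drives = {
    ("ide_cs", 0): [(0, "PCMCIA CD-ROM")],
    ("alim15x3", 0): [(0, "internal disk")],
}

# Order seen during installation: ide_cs registers before alim15x3.
install = assign_ide_names([("ide_cs", 1), ("alim15x3", 2)], drives)
# Internal controller first.
swapped = assign_ide_names([("alim15x3", 2), ("ide_cs", 1)], drives)
print(install)
print(swapped)
```

In the first order the CD-ROM ends up as hda and the disk as hdc; with the internal controller first, the disk is hda and the CD-ROM hde, which matches what comment 19 observed when ide_cs was loaded manually at a late stage.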
Comment 21 Jan Engelhardt 2005-09-12 15:01:54 UTC
ide-cs IS loaded sometime during the installation -- otherwise linuxrc could not
continue. Remember that the CDROM is PCMCIA and thus _must need_ ide-cs somehow.
Apart from the hda/hdc swapping, everything else is 100% ok.
Comment 22 Steffen Winterfeldt 2005-09-13 14:19:57 UTC
ide-cs is now loaded automatically. But before internal controllers. 
 
You can use 'scsibeforeusb=1' to swap that. 
Comment 23 Christian Zoz 2005-09-14 07:16:05 UTC
The problem that ide_cs was not loaded automatically was not topic of this bug.

I don't think this is a proper solution. Why don't you initialize internal
controllers first?
Comment 24 Christian Zoz 2005-09-14 07:17:06 UTC
See bug 104517 comment 34
Comment 25 Jan Engelhardt 2005-09-14 07:58:01 UTC
Please note: This issue originally arose with the SUSE 9.3 Boot-CD. The reason
it is marked as 10.0 is that the 10.0 _kernel_ still includes that option.

The _loading procedure_ however might indeed have changed between 9.3 and 10.0,
and I have not seen this until now. Now it sounds like the kernel module loading
order is: ide-cs ide-disk alim15x3, so that the kernel option does not help at all.
Comment 26 Steffen Winterfeldt 2005-09-14 08:41:13 UTC
ad 23: because there's no way I'm going to make comment 22 the default 
while we're at RC stage. 
Comment 27 Christian Zoz 2005-09-14 09:02:59 UTC
Of course not for this release, but for the next. Is this changed for the next
release? If yes, then I'll accept 'fixed'.
Comment 28 Steffen Winterfeldt 2005-09-14 09:07:06 UTC
yes 
Comment 29 Ihno Krumreich 2005-12-18 18:58:15 UTC
Closed.