Bug 114718

Summary: boot process fails on primergy rx-300 (dual xeon, "serverworks" chipset)
Product: [openSUSE] SUSE LINUX 10.0 Reporter: Stephan Lauffer <lauffer>
Component: Kernel Assignee: Hannes Reinecke <hare>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P5 - None CC: acpi, novell
Version: Beta 4   
Target Milestone: ---   
Hardware: Other   
OS: All   
Whiteboard:
Found By: Other Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: boot messages from console
boot messages from console

Description Stephan Lauffer 2005-09-01 14:14:39 UTC
both beta3 and beta4 hang shortly after:
"System Boot Control: The system has been set up
System Boot Control: Running /etc/init.d/boot.local"

This happens during the first boot from the hard disk, after the packages from
CD1 have been installed.

A few minutes ago I found a small workaround to continue the installation:
start the installation twice from CD1, abort it, and continue with "boot from
installed system". That way I got the machine installed with a minimal base
system ("text mode" package selection).
After the YaST setup I saw two failed init scripts: "irq_balancer" and
"acpid".

After this I tried to boot again, but the machine still hung shortly after the
first setup scripts. On tty10, however, I found what is hopefully a hint:
iop0: DMA/IO allocation for I2O controller failed
acpi: PCI interrupt for device 0000:03:08.0 disabled
dpti0: Trying to abort cmd=8303
This is the last message on tty10 in all other boot tests (as long as I boot
from the hard disk).

let me know how I can help. greetings, stephan
Comment 1 Hubert Mantel 2005-09-05 12:54:12 UTC
Did you try a failsafe install? Do ACPI related kernel parameters (such as
"pci=noacpi" or even "acpi=off") help?
Comment 2 Stephan Lauffer 2005-09-05 13:27:15 UTC
Booting in "failsafe" mode did not help (it uses acpi=off). But I had never
tried pci=off in the default boot mode (here with vga=0x31a selinux=0
splash=silent showopts)... *test* ok, it fails too. But here I can read some
other errors as well: a lot of "Sworks_agp: Unknown symbol" error messages on
tty10. I also noticed "piix4_smbus: Illegal Interrupt configuration (or out of
date)!" with the note that I should try fix_hstcfg=1, but this didn't help. :-/

btw: the "latest" Linux I have running on these machines is
kernel-smp-2.6.5-7.155.29 from SLES9.
Comment 3 Hubert Mantel 2005-09-07 08:57:38 UTC
"pci=off" is not a legal kernel parameter AFAIK. Still it seems to be some sort
of interrupt problem. Could you please also try "apic" or "noapic" as parameter?
Andi, what is the latest status here? Is the APIC enabled per default?
Comment 4 Stephan Lauffer 2005-09-07 09:15:45 UTC
I tried "noapic" when booting in failsafe mode; there the options are:
acpi=off noapic

(adding acpi=on to the default boot doesn't help either; IMHO it's on by default anyway)


Comment 5 Stephan Lauffer 2005-09-07 09:24:07 UTC
sorry, I forgot to mention: yes, I did try pci=noacpi.
Comment 6 Hubert Mantel 2005-09-07 09:28:38 UTC
Then I'm out of ideas. Maybe Andi knows some additional trick or Thomas knows
about some special ACPI options that could help...
Comment 7 Stephan Lauffer 2005-09-07 09:33:52 UTC
I'll try booting without the Ethernet kernel modules, since the basic
installation works... (maybe I'll also try a new initrd; in the past the switch
from bcm5700 to tg3 caused some trouble)
Comment 8 Andreas Kleen 2005-09-07 09:38:44 UTC
In beta4 it is still on, but in CVS it is off.

But on an SMP machine the APIC has to work anyway, because the SMP kernel
always enables it by default.

I would suggest you configure a serial console and attach a full boot log.
Comment 9 Thomas Renninger 2005-09-07 10:16:00 UTC
Maybe one of these helps (maybe combined with pci=noacpi?):
acpi_irq_balance        [HW,ACPI] ACPI will balance active IRQs
                                default in APIC mode

acpi_irq_nobalance      [HW,ACPI] ACPI will not move active IRQs (default)
                                default in PIC mode

acpi_irq_pci=   [HW,ACPI] If irq_balance, Clear listed IRQs for use by PCI
                        Format: <irq>,<irq>...

I don't know much about the interrupt handling internals; this is just an idea
based on how I got some machines to boot. You should still provide Andi with the
requested information, he knows much more about this stuff...
Comment 10 Stephan Lauffer 2005-09-07 11:13:48 UTC
Created attachment 49034 [details]
boot messages from console

hope this is what you asked for (:

Note: here I used the default boot options of 10b4.
Comment 11 Stephan Lauffer 2005-09-07 11:15:24 UTC
Andreas, please have a look at my attachment from #10 and tell me if you need
any other information... thx
Comment 12 Stephan Lauffer 2005-09-07 12:18:24 UTC
oh, sorry: the boot_msg_rx300.txt from #10 used my own initrd, in which I
changed some modules (I used the same ones as on the SLES9 machine). I'll add a
second one in a minute...
Comment 13 Stephan Lauffer 2005-09-07 12:31:01 UTC
Created attachment 49045 [details]
boot messages from console

This shows the output on ttyS0 with SuSE-10b4 and the default kernel append
line; the initrd is the one shipped by SuSE, unchanged.
Comment 14 Stephan Lauffer 2005-09-07 13:38:28 UTC
Something looks strange to me:

<--snip-->

Loading kernel/drivers/scsi/dpt_i2o.ko
Loading Adaptec I2O RAID: Version 2.4 Build 5go
Detecting Adaptec I2O RAID controllers...

[...]

ACPI: PCI Interrupt 0000:03:08.0[A] -> GSI 24 (level, low) -> IRQ 169
Adaptec I2O RAID controller 0 irq=169
     BAR0 f8c80000 - size= 100000
     BAR1 f8e00000 - size= 1000000

<--snap-->

but later:

<--snip-->

i2o: Checking for PCI I2O controllers...
ACPI: PCI Interrupt 0000:03:08.0[A] -> GSI 24 (level, low) -> IRQ 169
iop0: controller found (0000:03:08.0)
PCI: Unable to reserve mem region #1:100000@f8400000 for device 0000:03:08.0
iop0: device already claimed
iop0: DMA / IO allocation for I2O controller  failed

<--snap-->

This sounds to me like "something" wants to initialize the adapter twice. I
didn't find such lines in boot.msg from 2.6.5-7.201-smp (SLES9). Would you like
to have the boot.msg from the working 2.6.5-7.201-smp, too?
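The two excerpts fit the "initialized twice" reading if a card's memory regions can be reserved by only one driver at a time: whichever module asks second is refused, which is what "Unable to reserve mem region" and "device already claimed" report. The toy model below only illustrates that exclusive-ownership rule, assuming dpt_i2o reserved the region first; `toy_region` and `toy_request_region()` are hypothetical names, and this is not the kernel's resource code (real PCI drivers call pci_request_regions()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model (hypothetical, NOT the kernel's resource tree) of one memory
 * BAR that a driver must claim exclusively before using it. */
struct toy_region {
    unsigned long start;
    unsigned long size;
    const char *owner;            /* NULL while nobody has claimed it */
};

/* Try to claim region r for 'driver'. Returns true on success; returns
 * false and logs the current owner if another driver got there first. */
static bool toy_request_region(struct toy_region *r, const char *driver)
{
    assert(r != NULL && driver != NULL);
    if (r->owner != NULL) {
        fprintf(stderr, "%s: unable to reserve %#lx@%#lx (owned by %s)\n",
                driver, r->size, r->start, r->owner);
        return false;
    }
    r->owner = driver;
    return true;
}
```

Calling toy_request_region() first for "dpt_i2o" and then for "i2o" on the same region makes the second call fail, mirroring the load order seen in the log.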
Comment 15 Thomas Renninger 2005-09-07 14:20:12 UTC
okir had some pci setup problems recently, maybe it's related?
Comment 16 Stephan Lauffer 2005-09-08 17:12:16 UTC
short notice: same on 2.6.13-8-smp from RC1
Comment 17 Olaf Kirch 2005-09-12 07:33:27 UTC
Please try again with RC2, or a current KOTD. This will probably be fixed 
by this change: 
 
Wed Sep  7 10:31:06 CEST 2005 - olh@suse.de 
- remove patches.fixes/revert-pci-rom.patch 
  add patches.suse/pci-rom-mapping.patch 
  [PATCH] Fix PCI ROM mapping 
  add patches.suse/pci_assign_unassigned_resources.patch 
  [PATCH] x86: pci_assign_unassigned_resources() update (115118) 
 
Comment 18 Stephan Lauffer 2005-09-12 08:16:13 UTC
failed: tested with kernel-smp-2.6.13-20050911121049.i586.rpm
Do you want the boot messages again? Should I test with
kernel-smp-debuginfo-2.6.13-20050911121049.i586.rpm?
Comment 20 Thomas Renninger 2005-11-15 12:49:44 UTC
I wonder why all these unknown symbols appear for the AGP driver. This is probably unrelated, though.

It seems as if ACPI exports wrong resources (IRQ/memory region)?
First, you should check for a new BIOS.
If the problems persist, try the pci=noacpi boot parameter.
If they still persist, please also attach the dmesg output with pci=noacpi and the acpidmp output.
Comment 21 Hannes Reinecke 2005-11-15 14:13:06 UTC
Nope. All wrong. Say thank you to your friendly ADAPTEC driver writer:

drivers/scsi/dpt_i2o.c:
static struct pci_device_id dptids[] = {
        { PCI_DPT_VENDOR_ID, PCI_DPT_DEVICE_ID, PCI_ANY_ID, PCI_ANY_ID,},
        { PCI_DPT_VENDOR_ID, PCI_DPT_RAPTOR_DEVICE_ID, PCI_ANY_ID, PCI_ANY_ID,},
        { 0, }
};

drivers/message/i2o/pci.c:
/* PCI device id table for all I2O controllers */
static struct pci_device_id __devinitdata i2o_pci_ids[] = {
        {PCI_DEVICE_CLASS(PCI_CLASS_INTELLIGENT_I2O << 8, 0xffff00)},
        {PCI_DEVICE(PCI_VENDOR_ID_DPT, 0xa511)},
        {.vendor = PCI_VENDOR_ID_INTEL,.device = 0x1962,
         .subvendor = PCI_VENDOR_ID_PROMISE,.subdevice = PCI_ANY_ID},
        {0}
};

So dpt_i2o is loaded first, and i2o_block is loaded afterwards; both try to initialize the same adapter. Knowing how picky this I2O stuff is, I'd guess the card is out to lunch after this.

You can try to disable the i2o modules via an alias in /etc/modprobe.conf.
Comment 22 Jacob Fingerfold 2005-11-18 18:41:12 UTC
I just submitted another bug report #133448 on this same bug. Sorry, I didn't find this one until after I had already submitted. I've added the following lines to /etc/modules.conf without success:

alias i2o_block off
alias i2o_config off
Comment 23 Jacob Fingerfold 2005-11-21 17:15:27 UTC
Testing with Fedora Core 4 and reading the "I2O on Linux - Home" page (http://i2o.shadowconnect.com/index.php) indicates that the aacraid module works properly. I've tried to override the module selections during the installation of SuSE 10.0 without success.

Please explain how to override the auto-detected modules during the installation of SuSE 10.
Comment 24 Andreas Kleen 2005-11-21 17:26:12 UTC
Quit the graphical YaST with the Abort button. The text installer and the
subsequent graphical YaST will then be in manual mode.
Comment 25 Hannes Reinecke 2005-11-22 07:49:05 UTC
*** Bug 134448 has been marked as a duplicate of this bug. ***
Comment 26 Jacob Fingerfold 2005-12-01 00:34:45 UTC
I used the method described at http://whocares.de/archive/000940.php and it seems to be booting OK. 
Comment 27 Hannes Reinecke 2006-02-07 09:10:13 UTC
This issue has finally been resolved by this patch:

Author: Ben Collins <bcollins@ubuntu.com>  2005-12-18 03:39:23
Committer: Linus Torvalds <torvalds@g5.osdl.org>  2005-12-18 20:19:43
Parent: 48ea753075aa15699bd5fac26faa08431aaa697b (Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6)
Child:  e5508c13ac25b07585229b144a45cf64a990171e ([PATCH] dpt_i2o fix for deadlock condition)

    [PATCH] i2o: Do not disable pci device when it's in use
    
    When dpt_i2o is loaded first, i2o being loaded would cause it to call
    pci_device_disable, thus breaking dpt_i2o's use of the device.  Based on
    similar usage of pci_disable_device in other drivers.
    
    Signed-off-by: Ben Collins <bcollins@ubuntu.com>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Patch is included for 10.1 Beta4.
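The commit message states the fix in one sentence. A toy model of that pattern: record whether the device was already enabled before this probe touched it, and on the error path only disable it if this probe did the enabling. `toy_pci_dev`, `toy_second_probe()` and the `guarded` switch are hypothetical; this is a sketch of the idea, not the actual patch, whose code may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for struct pci_dev: only the "enabled" state matters here. */
struct toy_pci_dev {
    bool is_enabled;
};

static void toy_enable_device(struct toy_pci_dev *d)  { d->is_enabled = true; }
static void toy_disable_device(struct toy_pci_dev *d) { d->is_enabled = false; }

/* Probe of a second driver whose region reservation fails because another
 * driver already owns the card. guarded == false models the old behaviour
 * (enable, then disable on the error path unconditionally); guarded == true
 * only undoes an enable this probe performed itself. Returns 0 or -1. */
static int toy_second_probe(struct toy_pci_dev *d, bool regions_busy,
                            bool guarded)
{
    assert(d != NULL);
    bool was_enabled = d->is_enabled;            /* state before this probe */
    bool owns_enable = !guarded || !was_enabled; /* may we undo the enable? */

    if (owns_enable)
        toy_enable_device(d);

    if (regions_busy) {
        fprintf(stderr, "toy: device already claimed\n");
        if (owns_enable)
            toy_disable_device(d);  /* old code: also breaks the first driver */
        return -1;
    }
    return 0;
}
```

With dpt_i2o having enabled the device first, the unguarded probe leaves it disabled after failing, while the guarded probe fails without touching the device, which is what the commit describes.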