|
Bugzilla – Full Text Bug Listing |
| Summary: | boot process fails on primergy rx-300 (dual xeon, "serverworks" chipset) | ||
|---|---|---|---|
| Product: | [openSUSE] SUSE LINUX 10.0 | Reporter: | Stephan Lauffer <lauffer> |
| Component: | Kernel | Assignee: | Hannes Reinecke <hare> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Major | ||
| Priority: | P5 - None | CC: | acpi, novell |
| Version: | Beta 4 | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | All | ||
| Whiteboard: | |||
| Found By: | Other | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
boot messages from console
boot messages from console |
||
Did you try a failsafe install? Do ACPI related kernel parameters (such as "pci=noacpi" or even "acpi=off") help?
Booting in "failsafe" mode did not help (there we have acpi=off). But I never tried pci=off in the default boot mode (here with vga=0x31a selinux=0 splash=silent showopts)... *test* ok, fails too. But here I can read some other errors, too: a lot of "Sworks_agp: Unknown symbol" error messages on tty10. I also noticed "pIIX4_smbus: Illegal Interrupt configuration (or out of date)!" with the note that I should try fix_hstcfg=1, but this didn't help. :-/ btw: the "latest" linux running on this machine is kernel-smp-2.6.5-7.155.29 from sles9.
"pci=off" is not a legal kernel parameter AFAIK. Still, it seems to be some sort of interrupt problem. Could you please also try "apic" or "noapic" as parameter? Andi, what is the latest status here? Is the APIC enabled by default?
I tried "noapic" during the boot in failsafe mode. There we have: acpi=off noapic (adding acpi=on to the default boot doesn't help either - imho it's on by default)
sorry - forgot to mention: yes, I tried pci=noacpi for sure
Then I'm out of ideas. Maybe Andi knows some additional trick or Thomas knows about some special ACPI options that could help...
I'll play around without the eth kernel modules, since the basic installation works... (maybe I'll play around with a new initrd... in the past the switch from bcm5700 to tg3 caused some trouble)
In beta4 it is still on, but in CVS it is off. But on an SMP machine the APIC has to work anyway, because the SMP kernel always enables it by default. I would suggest you configure a serial console and attach a full boot log.
Maybe one of these helps (maybe combined with pci=noacpi?):
acpi_irq_balance [HW,ACPI] ACPI will balance active IRQs
default in APIC mode
acpi_irq_nobalance [HW,ACPI] ACPI will not move active IRQs (default)
default in PIC mode
acpi_irq_pci= [HW,ACPI] If irq_balance, Clear listed IRQs for use by PCI
Format: <irq>,<irq>...
I don't know much about the interrupt handling internals, just an idea how I got
some machines booting. You probably should still provide Andi with requested
information, he has much more knowledge about this stuff...
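For illustration only, the suggestions above could be combined on one kernel command line roughly as follows. This is a sketch: the serial port and the baud rate (ttyS0, 115200n8) are assumptions, and nothing in this report confirms that this combination changes the outcome on the RX300.

```
pci=noacpi acpi_irq_nobalance console=tty0 console=ttyS0,115200n8
```

The two console= entries send kernel messages to both the local screen and the serial line, so the full boot log can be captured on a second machine.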
Created attachment 49034 [details]
boot messages from console
hope this is what you asked for (:
notice: here I've used the default boot options by 10b4
Andreas, please have a look at my attachment from #10 and tell me if you need other information... thx
oh - sorry. The boot_msg_rx300.txt from #10 used my initrd, where I changed some modules (I used the same ones as on the sles9 machine). I'll add a second one in a minute...
Created attachment 49045 [details]
boot messages from console
This shows the output on ttyS0 with SuSE-10b4, default kernel appends...
initrd by SuSE with no changes by me.
something strange to me:
<--snip-->
Loading kernel/drivers/scsi/dpt_Loading Adaptec I2O RAID: Version 2.4 Build 5go
i2o.ko
Detecting Adaptec I2O RAID controllers...
[...]
ACPI: PCI Interrupt 0000:03:08.0[A] -> GSI 24 (level, low) -> IRQ 169
Adaptec I2O RAID controller 0 irq=169
BAR0 f8c80000 - size= 100000
BAR1 f8e00000 - size= 1000000
<--snap-->
but then later:
<--snip-->
i2o: Checking for PCI I2O controllers...
ACPI: PCI Interrupt 0000:03:08.0[A] -> GSI 24 (level, low) -> IRQ 169
iop0: controller found (0000:03:08.0)
PCI: Unable to reserve mem region #1:100000@f8400000 for device 0000:03:08.0
iop0: device already claimed
iop0: DMA / IO allocation for I2O controller failed
<--snap-->
This sounds to me like "something" wants to initialize the adapter twice. I
didn't find such lines in boot.msg from 2.6.5-7.201-smp (sles9). Would you like
to have the boot.msg from the working 2.6.5-7.201-smp, too?
okir had some pci setup problems recently, maybe it's related?
short notice: same on 2.6.13-8-smp from RC1
Please try again with RC2, or a current KOTD. This will probably be fixed by this change:
Wed Sep 7 10:31:06 CEST 2005 - olh@suse.de
- remove patches.fixes/revert-pci-rom.patch
  add patches.suse/pci-rom-mapping.patch
  [PATCH] Fix PCI ROM mapping
  add patches.suse/pci_assign_unassigned_resources.patch
  [PATCH] x86: pci_assign_unassigned_resources() update (115118)
failed: tested with kernel-smp-2.6.13-20050911121049.i586.rpm. Do you want the boot messages again? Should I test with kernel-smp-debuginfo-2.6.13-20050911121049.i586.rpm?
I wonder why all these unknown symbols for the agp driver appear? However, this should be unrelated? It seems as if wrong resources (IRQ/MEMregion) are exported by ACPI? First, you should check for a new BIOS. If you still have problems, try the pci=noacpi boot param. If you still have problems after that, please also attach dmesg output with pci=noacpi and acpidmp output.
Nope. All wrong. Say thank you to your friendly ADAPTEC driver writer:
drivers/scsi/dpt_i2o.c:
static struct pci_device_id dptids[] = {
    { PCI_DPT_VENDOR_ID, PCI_DPT_DEVICE_ID, PCI_ANY_ID, PCI_ANY_ID,},
    { PCI_DPT_VENDOR_ID, PCI_DPT_RAPTOR_DEVICE_ID, PCI_ANY_ID, PCI_ANY_ID,},
    { 0, }
};
drivers/message/i2o/pci.c:
/* PCI device id table for all I2O controllers */
static struct pci_device_id __devinitdata i2o_pci_ids[] = {
    {PCI_DEVICE_CLASS(PCI_CLASS_INTELLIGENT_I2O << 8, 0xffff00)},
    {PCI_DEVICE(PCI_VENDOR_ID_DPT, 0xa511)},
    {.vendor = PCI_VENDOR_ID_INTEL,.device = 0x1962,
     .subvendor = PCI_VENDOR_ID_PROMISE,.subdevice = PCI_ANY_ID},
    {0}
};
So first dpt_i2o is loaded, and then i2o_block is loaded afterwards. Both try to initialize the same adapter. Knowing how picky this i2o stuff is, I'd guess the card's out to lunch after this.
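The overlap between the two ID tables can be checked outside the kernel with a small userspace model. The following is only a sketch: `struct id_entry`, `struct pci_ident` and the three functions are hypothetical stand-ins, not kernel APIs, and the numeric vendor, device and class values are written from memory of the kernel headers, so treat them as assumptions. The matching rule (wildcard or equal on each ID field, plus a masked class comparison) is modelled on how the PCI core compares a device against a `pci_device_id` entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wildcard, standing in for the kernel's PCI_ANY_ID. */
#define ANY_ID         0xffffffffu

/* Numeric values below are assumptions recalled from kernel headers. */
#define VENDOR_DPT     0x1044u
#define VENDOR_INTEL   0x8086u
#define VENDOR_PROMISE 0x105au
#define CLASS_I2O      0x0e00u   /* PCI_CLASS_INTELLIGENT_I2O */

/* Simplified userspace stand-in for struct pci_device_id. */
struct id_entry {
    uint32_t vendor, device, subvendor, subdevice;
    uint32_t class_code, class_mask;
};

/* What a PCI function reports about itself (hypothetical type). */
struct pci_ident {
    uint32_t vendor, device, subvendor, subdevice, class_code;
};

/* Model of the dpt_i2o table: two DPT device IDs, any subsystem, any class. */
static const struct id_entry dptids[] = {
    { VENDOR_DPT, 0xa501u, ANY_ID, ANY_ID, 0, 0 },
    { VENDOR_DPT, 0xa511u, ANY_ID, ANY_ID, 0, 0 },
    { 0, 0, 0, 0, 0, 0 }
};

/* Model of the generic i2o table: any I2O-class device, plus two explicit IDs. */
static const struct id_entry i2o_pci_ids[] = {
    { ANY_ID, ANY_ID, ANY_ID, ANY_ID, CLASS_I2O << 8, 0xffff00u },
    { VENDOR_DPT, 0xa511u, ANY_ID, ANY_ID, 0, 0 },
    { VENDOR_INTEL, 0x1962u, VENDOR_PROMISE, ANY_ID, 0, 0 },
    { 0, 0, 0, 0, 0, 0 }
};

/* One entry matches if every ID field is wild or equal and the masked class agrees. */
bool entry_matches(const struct id_entry *e, const struct pci_ident *d)
{
    return (e->vendor == ANY_ID || e->vendor == d->vendor) &&
           (e->device == ANY_ID || e->device == d->device) &&
           (e->subvendor == ANY_ID || e->subvendor == d->subvendor) &&
           (e->subdevice == ANY_ID || e->subdevice == d->subdevice) &&
           ((e->class_code ^ d->class_code) & e->class_mask) == 0;
}

/* Walk a table until the all-zero terminator entry. */
bool table_matches(const struct id_entry *t, const struct pci_ident *d)
{
    for (; t->vendor || t->subvendor || t->class_mask; t++)
        if (entry_matches(t, d))
            return true;
    return false;
}

/* True when both drivers would bind to the same device. */
bool both_drivers_claim(const struct pci_ident *d)
{
    return table_matches(dptids, d) && table_matches(i2o_pci_ids, d);
}
```

In this model a device reporting the DPT vendor ID with device 0xa511 is matched by both tables regardless of its class code, and a 0xa501 device with an I2O class code is matched by both as well, which is the double claim described above. The IDs actually reported by the controller at 0000:03:08.0 are not given in this report.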
You can try to disable the i2o modules per alias in /etc/modprobe.conf.
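A minimal sketch of what such entries might look like, assuming the module-init-tools syntax read by modprobe on 2.6 kernels. The module names i2o_block and i2o_config are the ones mentioned in this report; whether these lines are honoured for modules loaded from the installer's initrd is not confirmed here.

```
# /etc/modprobe.conf (sketch): run /bin/true instead of loading the module
install i2o_block /bin/true
install i2o_config /bin/true
```

Note that this only affects loads that go through modprobe; a module inserted directly with insmod, for example from an initrd script, is not covered.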
I just submitted another bug report #133448 on this same bug. Sorry, didn't find this until after I had already submitted. I've added the following lines to /etc/modules.conf without success:
alias i2o_block off
alias i2o_config off
Testing with Fedora Core 4 and reading the "I2O on Linux - Home" page (http://i2o.shadowconnect.com/index.php) indicates that the aacraid module works properly. I've tried to override the module selections during installation of SuSE 10.0 without success. Please advise how to override auto-detected modules during the install of SuSE 10.
Quit graphical yast with the abort button. The text installer and the following graphical yast will be in manual mode.
*** Bug 134448 has been marked as a duplicate of this bug. ***
I used the method described at http://whocares.de/archive/000940.php and it seems to be booting OK.
This issue has finally been resolved by this patch:
Author: Ben Collins <bcollins@ubuntu.com> 2005-12-18 03:39:23
Committer: Linus Torvalds <torvalds@g5.osdl.org> 2005-12-18 20:19:43
Parent: 48ea753075aa15699bd5fac26faa08431aaa697b (Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6)
Child: e5508c13ac25b07585229b144a45cf64a990171e ([PATCH] dpt_i2o fix for deadlock condition)
[PATCH] i2o: Do not disable pci device when it's in use
When dpt_i2o is loaded first, i2o being loaded would cause it to call pci_device_disable, thus breaking dpt_i2o's use of the device. Based on similar usage of pci_disable_device in other drivers.
Signed-off-by: Ben Collins <bcollins@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch is included for 10.1 Beta4. |
Both beta3 and beta4 hang shortly after:
"System Boot Control: The system has been set up
System Boot Control: Running /etc/init.d/boot.local"
This happened after installing stuff from CD1, during the 1st boot from hdd. Some minutes ago I found a small workaround to continue the installation: start the installation a second time from CD1, abort the installation and continue with "boot from installed system". So I got the machine installed with a minimal ("text mode" package selection) system. There (after the yast setup) I saw two failed init scripts, "irq_balancer" and "acpid". After this I tried to boot again, but the machine still hangs shortly after the first setup scripts. But on tty10 I found - hopefully - a hint:
iop0: DMA/IO allocation for I2O controller failed
acpi: PCI interrupt for device 0000:03:08.0 disabled
dpti0: Trying to abort cmd=8303
This is the last message on tty10 in all other boot tests (as long as I try to boot from hdd). Let me know how I can help. greetings, stephan