Bugzilla – Bug 116763
HP nx8220 no longer installing as in 9.3
Last modified: 2005-12-08 16:49:23 UTC
I have an HP nx8220 here that stops during kernel initialization while the installation is running. With 9.3 the installation worked flawlessly; with 10.0RC2 I have to disable ACPI for installation (which results in 'no network found', but that's another story). That's a clear regression :( It looks like interrupts are set up incorrectly or with the wrong type.
did you try the other options like irqpoll?
pci=noacpi boots the machine and installation should work, still a blocker?
Please connect serial console and add boot log, also add acpidmp
I'd happily do so if the machine had a serial connector :( All these new notebooks don't have one, and we do not have a docking station for that type of notebook :(
Sigh. We really, really need a USB or FireWire console. You could boot with vga=0x0f01 (to make the font as small as possible) and then photograph it (including scrollback). acpidmp should work with pci=noacpi/acpi=off anyway.
OK, booting with vga=0x0f01 and photographing the point where it stopped worked. Sorry, scrollback is not working. I'll do an installation with pci=noapic and try acpidmp then, if that's OK (or should it work while in linuxrc? I don't think so, does it?)
Created attachment 49783 [details] Jpeg of point of stop
Created attachment 49785 [details] output of acpidmp That's the output with pci=noacpi
Lowering severity.
Ok, two weeks have passed. Anything new? 10.0 is hitting the street in a few months and it looks more and more like no HP from that generation will work without workaround and manual intervention (See also the nc6230 we gave you, Thomas).
*** Bug 116971 has been marked as a duplicate of this bug. ***
I think I got it. The problem is, I don't know why it works, so I am adding some info.

The problem is an endless loop when iterating over a list. Adding this debug output to our (or vanilla 2.6.14-rcX) kernels (watch out for mangled whitespace):

--- vanilla-linux-2.6.14-rc3/drivers/acpi/scan.c.orig 2005-10-03 18:27:00.000000000 +0200
+++ vanilla-linux-2.6.14-rc3/drivers/acpi/scan.c 2005-10-03 18:27:51.000000000 +0200
@@ -555,6 +555,7 @@
  spin_lock(&acpi_device_lock);
  list_for_each_safe(node, next, &acpi_device_list) {
+ printk(KERN_ERR "Driver attach: prev %p - node %p - next %p\n", node->prev, node, node->next);
  struct acpi_device *dev = container_of(node, struct acpi_device, g_list);

results in a never-ending list iteration (I also added some more debug output; note that the prev/node/next pointers are all equal):

acpi_bus_match for driver [motherboard] for device [C23D]
Driver attach: prev c18eb820 - node c18eb820 - next c18eb820
 scan-0616 [02] acpi_driver_attach : Found driver [motherboard] for device [C23D]
acpi_bus_match for driver [motherboard] for device [C23D]
Driver attach: prev c18eb820 - node c18eb820 - next c18eb820
 scan-0616 [02] acpi_driver_attach : Found driver [motherboard] for device [C23D]
acpi_bus_match for driver [motherboard] for device [C23D]
...

After I realised that the bug may not be located where the list is actually touched, I compared changes against older (2.6.12.6) kernels. The third attempt was the hit; reverting this change makes the machine boot again:

--- vanilla-linux-2.6.14-rc3.orig/drivers/acpi/scan.c 2005-10-03 18:21:35.000000000 +0200
+++ vanilla-linux-2.6.14-rc3/drivers/acpi/scan.c 2005-10-03 18:21:58.000000000 +0200
@@ -1111,7 +1111,7 @@
  *
  * TBD: Assumes LDM provides driver hot-plug capability.
  */
- result = acpi_bus_find_driver(device);
+ acpi_bus_find_driver(device);

 end:
  if (!result)

I will also attach this as a patch that fixes the problem. However, a kernel hacker has to review it, because I don't know why this fixes anything ...
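To illustrate the symptom in the debug output above, here is a minimal userspace sketch of why a node whose prev and next both point back at itself makes a list_for_each_safe-style walk spin forever. The list_head type and the helpers init_list_head, list_add_tail, self_link and walk_bounded are simplified stand-ins written for this comment, not the kernel's implementation, and the sketch says nothing about how the node got into that state.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Hypothetical helper: put a node into the state seen in the log,
 * where prev, node and next are all the same address. */
static void self_link(struct list_head *n)
{
    n->next = n;
    n->prev = n;
}

/*
 * Walk the list the way list_for_each_safe does, but give up after
 * `limit` steps. Returns the number of nodes visited, or -1 if the
 * walk did not get back to `head` within the limit.
 */
static int walk_bounded(const struct list_head *head, int limit)
{
    const struct list_head *node, *next;
    int steps = 0;

    for (node = head->next, next = node->next; node != head;
         node = next, next = node->next) {
        if (++steps > limit)
            return -1;   /* never reached head again: the endless loop */
    }
    return steps;
}
```

Once the walk reaches a self-linked node, next is always that same node, so the termination test against the list head can never succeed. The unbounded kernel loop has no step limit, which would match the repeating "Driver attach" lines.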
Created attachment 51332 [details] Reverted check of function value - please review someone...
I asked the author of this change (rajesh.shah@intel.com), and it turned out that the check was added accidentally:

FWD:
________________________________________________
Looking at this closely now, checking for the result does appear to be wrong. Binding a driver for a device should be optional, and should not fail adding the device to the acpi list. I suspect a previous iteration through this code failed to find a driver match, returned failure to the caller and caused bad things to happen. So, your patch looks good to me. cc'ing Bob for his expert opinion.
________________________________________________

Olaf, can you add the patch to 10.0 and, if it makes sense, to 10.1 Alpha?
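As a rough illustration of the point that driver binding should be optional, the following userspace sketch contrasts the two variants. Everything here is hypothetical and simplified: struct device, find_driver, add_device and count_discarded_on_list are made-up names, and the assumption that the error path drops a device that is already linked into the list is only one possible reading of "caused bad things to happen", not something confirmed in this bug.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical, simplified device record; not struct acpi_device. */
struct device {
    struct list_head g_list;
    int discarded;   /* stand-in for freeing: set instead of calling free() */
};

static struct list_head device_list = { &device_list, &device_list };

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Pretend that no driver matches this device. */
static int find_driver(struct device *dev)
{
    (void)dev;
    return -1;
}

/*
 * check_driver_result != 0 models the code before the patch
 * ("result = find_driver(dev)"); 0 models the patch, which ignores
 * the return value because binding a driver is optional.
 */
static int add_device(struct device *dev, int check_driver_result)
{
    int result = 0;

    dev->discarded = 0;
    list_add_tail(&dev->g_list, &device_list);   /* device is on the list now */

    if (check_driver_result)
        result = find_driver(dev);
    else
        find_driver(dev);

    if (result)
        dev->discarded = 1;   /* assumed error path: dropped but still linked */
    return result;
}

/* Count list entries that were discarded while still linked. */
static int count_discarded_on_list(void)
{
    int n = 0;
    struct list_head *p;

    for (p = device_list.next; p != &device_list; p = p->next) {
        struct device *d =
            (struct device *)((char *)p - offsetof(struct device, g_list));
        n += d->discarded;
    }
    return n;
}
```

Under that assumption, a device without a matching driver ends up as a stale entry on the list when the result is checked, and as a normal entry when it is ignored.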
*** Bug 98280 has been marked as a duplicate of this bug. ***
Done. It's still scary that this mistake can mess up that list. There are other conditions where we bail out of that function without adding the device entry - will this have similar devastating effects?
No idea, if something similar pops up, I know at least where to search now ...
The patch helps an Asus K8N-DL to boot too. Previously it would also hang in an endless loop in the device list. Unfortunately, it still doesn't boot without pci=noacpi, but that's probably a different problem and it's better to have basic ACPI enabled.
*** Bug 117452 has been marked as a duplicate of this bug. ***
Thomas, are you sure about this duplicate? These are totally different symptoms: one machine crashes immediately at boot, the other hangs much later during package installation...
After installation the system crashes on bootup with several stack traces, but I can't scroll up to see more. I see only a stack trace of ata_piix.
-> You are right, it is strange that he could boot the system for installation. Could it be that some device's resources (firewire, ..., whatever) were not touched during installation, but get requested after the post-installation reboot? Then it would make sense. Still, it is strange that the oopses happen in the disk driver?!?
-> I installed the 10_0 CVS branch kernel and it booted fine without adding acpi=oldboot.
Matthias: Please reopen if you still see problems with the kernel I installed or with the next YOU update kernel ...
Thomas, I only saw this today. You installed the default kernel from the 10_0 CVS branch, but I need the smp kernel (HT CPU). The problem still exists in the -smp kernel. I installed the kernel-smp from dist.
This is not in the current 10.0 update (Matthias confirmed it).
Sorry for the confusion. The "HP nx8220 no longer installing as in 9.3" fix is in the update kernel. It seems that comment #23 is valid and Matthias' problem is not a duplicate. I will close this one again and reopen the initial report from Matthias.