Bug 154964

Summary: apic=verbose/debug enables local apic
Product: [openSUSE] SUSE Linux 10.1
Reporter: Timo Hoenig <thoenig>
Component: Kernel
Assignee: Andreas Kleen <ak>
Status: VERIFIED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: behlert, dkukawka, jnelson-suse, len.brown, luming.yu, michel.munnix, trenn
Version: Beta 6
Target Milestone: ---
Hardware: i386
OS: Other
Whiteboard:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: hwinfo for Dell Latitude D600
acpidmp, acpidump and DSDT for Dell D600
dmidecode for Dell D600

Description Timo Hoenig 2006-03-03 12:10:51 UTC
Dell D600 locks up hard on lid switch events.

Pressing the lid switch on my Dell D600 locks up the machine.  Sometimes it locks up on the first lid switch event, sometimes it takes a couple more (max. ~10).  SysRq does not work.  It occurs with init=/bin/bash, too.  No modules loaded.

Maybe this helps a little: it is a regression:

 * The system does not hang with 2.6.16-rc3-git3-2-default
 * The bug occurs when using 2.6.16-rc5-git2-2-default (Beta6 kernel)

Thomas, did we have any ACPI changes which could be responsible?

Danny, could you please check whether your system (a similar model of the D600 series) is affected as well?
Comment 1 Timo Hoenig 2006-03-03 12:52:16 UTC
Andi, booting with 'noapic' helps.
Comment 2 Timo Hoenig 2006-03-03 12:54:48 UTC
Sorry, false alarm.  After pressing the lid switch button for another few times the system crashed again.
Comment 3 Andreas Kleen 2006-03-03 13:16:28 UTC
Just because it affects your laptop doesn't mean it's a blocker.

I don't know what could have caused it. How about you do a binary
search using the kerneltest kernels to narrow down the breaking point?
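Andi's suggestion amounts to an ordinary bisection between the last known-good kernel (2.6.16-rc3-git3-2-default) and the first known-bad one (2.6.16-rc5-git2-2-default). A minimal sketch in C, assuming the available kerneltest snapshots can be numbered in build order; bisect_first_bad(), is_bad_fn and simulated_is_bad() are hypothetical names, not part of any SUSE tool. Because the hang can take around ten lid presses to show up, a snapshot should only be marked good after well more presses than that, otherwise the search can converge on the wrong build.

```c
#include <stddef.h>

/* Hypothetical predicate: returns nonzero if snapshot `idx` shows the
 * lid-switch hang. In practice this means booting that kerneltest build
 * and pressing the lid switch many times, since the hang is intermittent. */
typedef int (*is_bad_fn)(size_t idx, void *ctx);

/* Snapshots are numbered oldest to newest. Precondition: snapshot `good`
 * does not hang, snapshot `bad` does, and good < bad. Returns the index
 * of the first bad snapshot, assuming a single good->bad transition. */
size_t bisect_first_bad(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;   /* regression is at or before mid */
        else
            good = mid;  /* regression is after mid */
    }
    return bad;
}

/* Simulated predicate for a dry run: every snapshot at or after the
 * index stored in *ctx counts as "bad". */
int simulated_is_bad(size_t idx, void *ctx)
{
    return idx >= *(const size_t *)ctx;
}
```

Each step halves the remaining range, so n snapshots need only about log2(n) test boots.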
Comment 4 Timo Hoenig 2006-03-03 13:21:26 UTC
It's not my laptop.  It's one widely used in our company, that's why it is a blocker.

As I am not into this stupid severity game feel free to mark it as enhancement. I could not care less.

Comment 6 Olaf Kirch 2006-03-03 14:01:27 UTC
Reassigning to Thomas Renninger, as it seems to be an ACPI issue.

Please provide hwinfo for this machine.
Comment 8 Timo Hoenig 2006-03-03 14:19:25 UTC
Created attachment 71155 [details]
hwinfo for Dell Latitude D600
Comment 9 Timo Hoenig 2006-03-03 14:28:53 UTC
Booting with 'ec_intr=0' does not help.

Other ACPI events (power button, AC adapter) do not affect the system.
Comment 10 Timo Hoenig 2006-03-03 14:52:47 UTC
2.6.16-rc5-git5-20060302183043-default is affected, too.

Did we have any non-upstream changes for the ACPI code?

The upstream changes do not look suspicious to me.

Andi Kleen	  [PATCH] x86_64: Better ATI timer fix
Andi Kleen	  [PATCH] x86_64: Disable ACPI blacklist by year for now on x86-64
Andi Kleen	  [PATCH] x86-64/i386: Use common X86_PM_TIMER option and make it EMBEDDED
Pavel Machek  [PATCH] suspend-to-ram: allow video options to be set at runtime
Bjorn Helgaas [PATCH] ACPI: fix vendor resource length computation
Bjorn Helgaas [PATCH] HPET: handle multiple ACPI EXTENDED_IRQ resources

Danny, any news at your front?
Comment 11 Danny Al-Gaaf 2006-03-03 14:59:30 UTC
Yes, the same here with a DELL C640 and the Beta 6 kernel
Comment 12 Danny Al-Gaaf 2006-03-03 15:05:30 UTC
And yet another one here: DELL Inspiron 8200
Comment 13 Timo Hoenig 2006-03-03 15:11:03 UTC
OK, thanks for the info.

Adjusting summary.
Comment 14 Thomas Renninger 2006-03-03 15:49:41 UTC
nolapic helps.
No idea whether it is a hidden ACPI bug that got revealed by enabling the local APIC or whether these machines just don't work with it...

Andi, did this get activated in the latest kernel by enabling the APIC by default on i386 UP systems?
Do you think it's worth digging in to fix the ACPI/LAPIC code, or do you think it's just broken hardware? At least some events/interrupts seem to be processed correctly.

Hmm, even though it claims to enable the APIC, ACPI interrupt routing is still done via the PIC...
This machine has a serial console; I'll try to get something out of it.
Will play with this one a bit more...
Comment 15 Andreas Kleen 2006-03-03 16:06:08 UTC
As an easy solution we could blacklist it to disable the local APIC.
Attach dmidecode output for that.

It would be better to find where the problem is and fix it with the APIC on.
Can you attach acpidmp etc?

Len, could you please have someone look at this then?
Comment 16 Thomas Renninger 2006-03-03 16:15:34 UTC
Adding Luming who does the EC/interrupt ACPI stuff...
Luming any ideas on that or have you already seen something similar?
Comment 17 Timo Hoenig 2006-03-03 16:16:20 UTC
Created attachment 71176 [details]
acpidmp, acpidump and DSDT for Dell D600
Comment 20 Timo Hoenig 2006-03-03 18:08:52 UTC
Created attachment 71197 [details]
dmidecode for Dell D600
Comment 21 Forgotten User ZhJd0F0L3x 2006-03-03 18:14:24 UTC
It is not noapic but nolapic.
Maybe we should look for similarities between the different Dell machines that all seem to have the same problem.
And maybe we have a Dell contact who would be interested in this?

Timo, could you try a vanilla kernel and if it has the same problem report
on LKML/acpi-devel?
Comment 22 Andreas Kleen 2006-03-03 18:23:10 UTC
nolapic could be blacklisted too. The power of DMI.

But if it's more than one type of laptop you would need to collect
dmidecode from all of them and there might be always one missing.

Better to look into the problem with APIC on.
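The blacklist route from comments 15 and 22 boils down to a table lookup on DMI strings. A minimal sketch in C, loosely modelled on the kernel's DMI quirk tables but self-contained: struct dmi_ident, lapic_blacklist and lapic_blacklisted() are hypothetical, and the vendor/product strings are assumed for illustration rather than copied from the attached dmidecode output. It also illustrates Andi's objection: a machine nobody reported (for example the Inspiron 2650 from comment 23) simply falls through.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the DMI fields a machine reports
 * (the same kind of strings dmidecode prints). */
struct dmi_ident {
    const char *sys_vendor;
    const char *product_name;
};

/* Illustrative entries only: these substrings are assumed, not taken
 * from the attached dmidecode output. */
static const struct dmi_ident lapic_blacklist[] = {
    { "Dell", "Latitude D600" },
    { "Dell", "C640" },
    { "Dell", "Inspiron 8200" },
};

/* Returns 1 if the machine matches a table entry (substring match on
 * both fields), i.e. the local APIC should stay off as with "nolapic". */
int lapic_blacklisted(const struct dmi_ident *m)
{
    size_t n = sizeof(lapic_blacklist) / sizeof(lapic_blacklist[0]);
    for (size_t i = 0; i < n; i++) {
        if (strstr(m->sys_vendor, lapic_blacklist[i].sys_vendor) != NULL &&
            strstr(m->product_name, lapic_blacklist[i].product_name) != NULL)
            return 1;
    }
    return 0;
}
```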
Comment 23 Jon Nelson 2006-03-04 03:48:12 UTC
I don't know if it is related, but I've got an Inspiron 2650 and I've had a /ton/ of problems with mysterious hangs with beta6.  I have yet to complete an install, actually, and the hangs happen without me doing anything at all (no ACPI events, etc.).  At one point I came back and the laptop was frozen, took no keyboard input, etc., and I had to power cycle it. When it came back up it was VERY hot and in fact complained of thermal sensor issues.  A second reboot made it all better (but still did not complete the install - I'm on try #4 now).  This is a machine that has run 9.0 through 10.0 in a rock solid fashion.

I'll try to provide whatever information I can.
I'll also try install #4 with acpi off and with noapic.
Comment 24 Jon Nelson 2006-03-04 03:48:51 UTC
I forgot to mention that this is a Mobile Celeron at 1.6GHz and is NOT a 64 bit CPU.
Comment 25 Andreas Kleen 2006-03-04 11:33:03 UTC
Just try nolapic
Comment 26 Jon Nelson 2006-03-04 13:40:35 UTC
I got a successful install with acpi off and noapic. I'll try again with just nolapic.
Comment 27 Andreas Kleen 2006-03-04 14:03:06 UTC
Best to add the dmidecode output in case we choose to go the blacklist route.
Comment 28 Jon Nelson 2006-03-04 20:29:35 UTC
Is there a possible relationship between this and bug 154709?
Comment 29 Andreas Kleen 2006-03-06 22:00:35 UTC
Ah I see what the problem is. Thanks to Len for hitting me with the cluehammer.

The automatic APIC enable code was too aggressive and forced the apic,
which would overwrite the BIOS setting. Bad thing.
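The policy change described in comments 29 and 30 can be reduced to two boolean expressions. A minimal sketch in C: lapic_enabled_old() and lapic_enabled_new() are hypothetical, bios_year stands in for whatever DMI date check dmi_enable_apic() performs, and cutoff_year is a parameter because the real cutoff is not given in this report.

```c
#include <stdbool.h>

/* Before the fix: a new enough BIOS forces the local APIC on, even if the
 * BIOS disabled it (the "Local APIC disabled by BIOS -- reenabling." case);
 * otherwise whatever the BIOS left is kept. */
bool lapic_enabled_old(int bios_year, int cutoff_year, bool bios_disabled)
{
    return bios_year >= cutoff_year || !bios_disabled;
}

/* After the fix: an old BIOS only ever causes the APIC to be disabled,
 * and a BIOS "no APIC" setting is always respected. */
bool lapic_enabled_new(int bios_year, int cutoff_year, bool bios_disabled)
{
    return bios_year >= cutoff_year && !bios_disabled;
}
```

On a machine like the D600 (recent BIOS, local APIC disabled by the BIOS) the old rule yields true and the new one false, which is consistent with nolapic making the hang go away.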

Comment 30 Thomas Renninger 2006-03-07 05:58:27 UTC
Yes, I also saw it... Just didn't come to it yesterday.
To be honest it's rather obvious if you have dmesg in front of your eyes...:
"Local APIC disabled by BIOS -- reenabling."
Ahh, you already fixed it?
That's exactly how I wanted to modify it ... inverting the dmi_enable_apic() assignment to disable it if the BIOS is too old, instead of forcing it to be enabled if the BIOS is newer (overriding the BIOS "no apic" flag).
Tested ... ohh, it does not work ... ahh, apic=debug clashes with the apic option that re-enables the local APIC. Andi, can we just delete the apic boot option?
Tested again ... Works ... Thanks.
Comment 31 Andreas Kleen 2006-03-07 11:07:18 UTC
Ah, I fixed these command line parsing bugs for x86-64, but not for 32-bit.
apic is still needed for the old machines. But I can fix it.
Comment 32 Stefan Behlert 2006-03-07 17:18:55 UTC
Thanks Andi!
Comment 33 Andreas Kleen 2006-03-08 14:02:50 UTC
Command line parsing for the apic option is fixed. A couple of other options
have the same problem, but I won't fix that for CODE10.
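The clash in comment 30 (apic=debug turning the local APIC on) is what happens when the bare apic option is matched as a prefix of the argument. A minimal sketch of the difference, assuming prefix matching was the cause; matches_apic_prefix() and matches_apic_exact() are hypothetical and are not the actual patch, which is not quoted in this report.

```c
#include <stdbool.h>
#include <string.h>

/* Buggy: prefix match. "apic" also matches "apic=debug" and
 * "apic=verbose", so asking for more APIC output enables the
 * local APIC as a side effect. */
bool matches_apic_prefix(const char *arg)
{
    return strncmp(arg, "apic", strlen("apic")) == 0;
}

/* Fixed: the bare "apic" option must be the whole argument;
 * "apic=..." is left to the verbosity handler. */
bool matches_apic_exact(const char *arg)
{
    return strcmp(arg, "apic") == 0;
}
```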
Comment 34 Timo Hoenig 2006-03-09 12:42:29 UTC
DELL D600 works fine with Beta 7.
Comment 35 Thomas Renninger 2006-03-09 13:40:51 UTC
*** Bug 156115 has been marked as a duplicate of this bug. ***