Bug 113886

Summary: suspend to disk with SMP: only with noapic
Product: [openSUSE] SUSE LINUX 10.0 Reporter: Forgotten User ZhJd0F0L3x <forgotten_ZhJd0F0L3x>
Component: Kernel Assignee: Pavel Machek <pavel>
Status: RESOLVED WONTFIX QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: behlert, markus.walser
Version: Beta 3   
Target Milestone: ---   
Hardware: All   
OS: All   
Whiteboard:
Found By: Component Test Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: Image of kernel panic

Description Forgotten User ZhJd0F0L3x 2005-08-29 15:10:18 UTC
suspend to disk (Toshiba P10-554, init=/bin/bash) only works with "noapic",
otherwise it is stuck in "booting cpu #1" and then panics "not enough cpus" -- a
funny one ;-)

Do you need a screenshot?

It works with noapic
Comment 1 Forgotten User ZhJd0F0L3x 2005-08-29 15:16:21 UTC
interesting (from dmesg after resume): The APIC error. "noapic", remember? ;-)

Restarting tasks... done
Thawing cpus ...
Booting processor 1/1 eip 3000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5586.45 BogoMIPS (lpj=11172906)
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400
00000000 00000000
CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000 00004400
00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebfbff 00000000 00000000 00000080 00004400
00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
CPU1: Thermal monitoring enabled
CPU1: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 09
APIC error on CPU1: 00(40)
CPU1 is up
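
Note on reading the "APIC error on CPU1: 00(40)" line: the two hex numbers appear to be the local APIC Error Status Register (ESR) as read before and after the kernel re-arms it. Below is a minimal Python sketch for decoding such a value. It assumes the bit meanings listed in the comment above the kernel's APIC error interrupt handler; the function name decode_apic_esr is made up for illustration.

```python
# Bit meanings of the local APIC Error Status Register, as listed in the
# comment above the kernel's APIC error interrupt handler (assumption: the
# listing applies unchanged to the kernel that produced this log).
ESR_BITS = [
    "Send CS error",             # bit 0
    "Receive CS error",          # bit 1
    "Send accept error",         # bit 2
    "Receive accept error",      # bit 3
    "Reserved",                  # bit 4
    "Send illegal vector",       # bit 5
    "Received illegal vector",   # bit 6
    "Illegal register address",  # bit 7
]


def decode_apic_esr(value):
    """Return the names of all error bits set in an 8-bit ESR value."""
    return [name for bit, name in enumerate(ESR_BITS) if value & (1 << bit)]


# 0x40 is the second number in "APIC error on CPU1: 00(40)".
print(decode_apic_esr(0x40))
```

Under that assumption, 0x40 is bit 6, i.e. CPU1 received an interrupt with an illegal vector while coming back up.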

Comment 2 Pavel Machek 2005-08-29 22:33:01 UTC
Can you try if the same problem happens with suspend-to-RAM? That way we may get
Intel people to fix it... 

APIC error after you used noapic is interesting... There are actually two apics
(OpenAPIC and LocalAPIC), the second one can be turned off with nolapic IIRC.
That may make the message go away.
Comment 3 Forgotten User ZhJd0F0L3x 2005-08-31 15:46:07 UTC
Holger will get the machine, so he will check this ;-)
Comment 4 Holger Macht 2005-09-01 11:47:57 UTC
As so often: suspend to RAM suspends but doesn't resume. Screen remains black,
no keyboard action possible. Powersave logs say that it resumed correctly.

nolapic (if spelled correctly) does not prevent the APIC error from being shown.
Comment 5 Forgotten User ZhJd0F0L3x 2005-09-01 11:57:31 UTC
suspend to ram might need the "acpi_sleep=s3_bios",
"acpi_sleep=s3_bios,s3_mode" or even the binary NVidia X driver (and the README
at http://www.susewiki.org/index.php?title=Suspend_NVidia_HOWTO) to get the
display back. If powersave logs say that it resumed correctly, it probably has.
Can you trigger a clean shutdown with the powerbutton after resume? Network access?
Comment 6 Forgotten User ZhJd0F0L3x 2005-09-01 11:58:57 UTC
in /usr/src/linux/Documentation/power/video.txt the P10-554 is mentioned as

Toshiba Satellite P10-554       s3_bios,s3_mode (4)(****)
(****) Not with SMP kernel, UP only.

but the (****) may be outdated.
Comment 7 Holger Macht 2005-09-02 08:08:54 UTC
So it works even with a running X server with "acpi_sleep=s3_bios,s3_mode noapic".
And surprisingly also with the SMP kernel!

Error message
  APIC error on CPU1: 00(40)
  CPU1 is up
is also shown.

With apic enabled and init=/bin/bash during resume:

Restarting tasks... done
Thawing cpus ...
Booting processor 1/1 eip 3000
Initializing CPU#1
Stuck ??
Inquiring remote APIC #1...
... APIC #1 ID: failed
... APIC #1 VERSION: failed
... APIC #1 SPIV: failed
Error taking cpu 1 up: -22
Kernel panic - not syncing: Not enough cpus
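
Note on the "-22" in "Error taking cpu 1 up: -22": this looks like a negated errno value, which is the usual convention for kernel return codes. A small sketch to translate such codes follows. It assumes Python's standard errno table matches the kernel's numbering on the machine where it is run (true on Linux), and errno_name is a hypothetical helper, not anything from the kernel or powersave.

```python
import errno
import os


def errno_name(code):
    """Map a kernel-style negative return code to its symbolic errno name."""
    return errno.errorcode.get(abs(code), "unknown")


# -22 is the value printed in "Error taking cpu 1 up: -22".
print(errno_name(-22), "-", os.strerror(22))
```

On Linux, 22 maps to EINVAL, so the CPU bring-up path rejected the request as invalid rather than, say, timing out.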
Comment 8 Pavel Machek 2005-09-02 08:13:48 UTC
BTW andi is pushing for noapic to be the default, so this issue may go away
automagically :-).
Comment 9 Forgotten User ZhJd0F0L3x 2005-09-02 09:17:48 UTC
but only on UP, so no, it won't help us :-)
Comment 10 Holger Macht 2005-09-02 09:32:42 UTC
No, as mentioned in comment #7, it works also with smp kernel and noapic. So it
would help. Or did I misunderstand you?
Comment 11 Forgotten User ZhJd0F0L3x 2005-09-02 09:40:32 UTC
only UP will get noapic, SMP will not IIUC
Comment 12 Pavel Machek 2005-09-06 21:08:46 UTC
So, suspend-to-RAM works even in SMP+apic mode, suspend-to-disk does not.  I'll
have to try to reproduce it here. Yes, screenshot of the problem would be nice.
Comment 13 Pavel Machek 2005-11-29 09:44:25 UTC
I'll need more info on this one. I tried suspending on an SMP machine here, and it works in all modes I tried -- UP/noapic, UP/apic and SMP/apic. It was with a pretty recent vanilla kernel, but I do not think this area changed much.
Comment 14 Pavel Machek 2006-01-11 10:23:00 UTC
*** Bug 134548 has been marked as a duplicate of this bug. ***
Comment 15 Forgotten User ZhJd0F0L3x 2006-02-13 17:56:34 UTC
Holger has the machine
Comment 16 Holger Macht 2006-03-22 09:19:42 UTC
I attach an image of the freeze. Picture taken with SUSE kernel 2.6.16-2-smp and init=/bin/bash. The kernel panic appears after reloading the image. No 'noapic' given.

BTW: s2ram has exactly the same error. With noapic as a boot parameter, s2disk and s2ram work well, provided acpi_sleep=s3_bios,s3_mode is given (s2ram -a 3).
Comment 17 Holger Macht 2006-03-22 09:30:17 UTC
Created attachment 74375 [details]
Image of kernel panic
Comment 18 Pavel Machek 2006-03-22 09:50:21 UTC
Thanks...

...we try to kick the second CPU but fail for some reason. We could actually continue in this case (not panic) -- one CPU is enough :-). Backtrace above is innocent.

Could you try to emulate hotplug with echo > /sys/... ? That should produce the failure, too. Then, I'm afraid this needs to go to bugzilla.kernel.org, so that Intel/IBM people take a look. This is more a problem with the underlying cpu hotplug subsystem than with suspend... (And Intel people actually care about suspend to ram).
Comment 19 Holger Macht 2006-03-22 10:14:24 UTC
I already tried CPU hotplugging. It works as expected. I can enable and disable cpu1 as often as I like.
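
For anyone wanting to repeat the hotplug test, here is a minimal Python sketch of one offline/online cycle. It assumes the standard sysfs CPU hotplug control file, /sys/devices/system/cpu/cpuN/online; the thread does not spell out the exact path that was used, so treat the path and the helper names as assumptions. Writing to the real file needs root.

```python
from pathlib import Path


def cpu_online_path(cpu, sysfs_root="/sys"):
    """Path of the hotplug control file for one CPU (assumed sysfs layout)."""
    return Path(sysfs_root) / "devices" / "system" / "cpu" / f"cpu{cpu}" / "online"


def set_cpu_online(cpu, online, sysfs_root="/sys"):
    """Write '1' to bring the CPU up or '0' to take it down."""
    cpu_online_path(cpu, sysfs_root).write_text("1" if online else "0")


def is_cpu_online(cpu, sysfs_root="/sys"):
    """Read the control file back and report whether the CPU is online."""
    return cpu_online_path(cpu, sysfs_root).read_text().strip() == "1"
```

Calling set_cpu_online(1, False) and then set_cpu_online(1, True) would mimic one cycle for cpu1. The sysfs_root parameter exists only so the helpers can be tried against a scratch directory first.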
Comment 20 Holger Macht 2006-03-22 15:09:31 UTC
Bug filed upstream:
http://bugzilla.kernel.org/show_bug.cgi?id=6270
Comment 21 Pavel Machek 2006-03-27 09:34:57 UTC
Thanks, let's see if Intel people can debug it. What is the bugzilla state for "being solved in another bugzilla"? ;-)
Comment 22 Holger Macht 2006-03-27 09:38:19 UTC
Not sure either. But WONTFIX seems to fit best, because you actually won't fix it in regard to this bug ;-)
Comment 23 Pavel Machek 2006-03-27 09:44:41 UTC
Ok, agreed. Hopefully intel people can solve it.