Bug 153062 - T41p does not power down properly on suspend-to-disk
Summary: T41p does not power down properly on suspend-to-disk
Status: RESOLVED FIXED
Duplicates: 153682 156053
Alias: None
Product: SUSE Linux 10.1
Classification: openSUSE
Component: Kernel
Version: Beta 5
Hardware: Other Other
Priority: P5 - None  Severity: Normal
Target Milestone: ---
Assignee: Thomas Renninger
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-23 11:53 UTC by Christoph Thiel
Modified: 2006-03-08 17:58 UTC

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Get rid of the slab debug errors (2.58 KB, patch)
2006-03-07 16:00 UTC, Thomas Renninger

Description Christoph Thiel 2006-02-23 11:53:08 UTC
Performing a suspend-to-disk, my T41p actually does the suspend, but keeps the power on while the little moon is blinking. Pressing and holding the power button for a while turns it off. Resuming afterwards works.

What kind of logs/debug output do you need?
Comment 1 Thomas Renninger 2006-02-23 13:57:53 UTC
As far as I know, the Ferrari also does not switch off?
Do you know about this, Pavel?
Comment 2 Bodo Bauer 2006-02-23 14:11:12 UTC
Correct, the Ferrari 4000 doesn't power down. However, I'm not sure if this is only related to suspend. If I use gfxmenu in grub, it doesn't power off on a regular init 0 either; you need to disable gfxmenu.

For suspend however it doesn't make a difference if gfxmenu is enabled or not...
Comment 3 Pavel Machek 2006-02-23 15:59:41 UTC
The T41p is a very different machine from the Ferrari. A blinking moon means you are using platform mode... could you use shutdown mode instead? (cd /sys/power; echo shutdown > disk; echo disk > state; see what happens.)
Comment 4 Christoph Thiel 2006-02-23 16:18:42 UTC
Using shutdown mode instead of platform works OK.
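
For anyone who wants to script the workaround from comment 3, a minimal sketch could look like the following. The function name set_disk_mode and its optional directory argument are made up for illustration (the directory parameter exists only so the function can be dry-run against a scratch directory); the sysfs files /sys/power/disk and /sys/power/state are the ones named in comment 3.

```shell
# Sketch only: select the suspend-to-disk mode via sysfs.
# $1 = mode to write (e.g. "shutdown" or "platform")
# $2 = power directory, defaults to /sys/power (hypothetical parameter,
#      added so the function can be tried on a scratch directory)
set_disk_mode() {
    mode="$1"
    power_dir="${2:-/sys/power}"

    # Writing to sysfs normally requires root; bail out early otherwise.
    if [ ! -w "$power_dir/disk" ]; then
        echo "cannot write $power_dir/disk" >&2
        return 1
    fi

    printf '%s\n' "$mode" > "$power_dir/disk" || return 1

    # Print whatever the file reports after the write.
    cat "$power_dir/disk"
}
```

As root, `set_disk_mode shutdown && echo disk > /sys/power/state` would then trigger the suspend in shutdown mode, as in comment 3.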
Comment 5 Christoph Thiel 2006-02-23 20:41:26 UTC
Gerald, can you reproduce this bug on your T41p?
Comment 6 Gerald Pfeifer 2006-02-23 22:00:09 UTC
Yes, I can reproduce.  (That is, I haven't been able to boot that machine
regularly, but from safe mode, once I added resume=/dev/hda5 to the grub
command line, suspend seemed to work, the machine did not power off, though.)
Comment 7 Pavel Machek 2006-02-27 09:44:18 UTC
OK, what do we do here? Platform mode is probably broken by the ACPI interpreter code; this could be solved by asking the Intel people in bugzilla.suse.de (or perhaps just by finding out where it hangs, via serial console or something). Or just work around it by using shutdown mode instead...
Comment 8 Gerald Pfeifer 2006-02-27 10:18:15 UTC
Thomas, Seife, Stefan, do you have contacts at Intel who could help?

Rolla, Scott, can you help please?
Comment 9 Thomas Renninger 2006-02-27 13:00:14 UTC
Can you please attach dmesg output (or better, grep /var/log/messages to check whether ACPI functions failed during shutdown) and acpidump output?
Pavel: what kind of ACPI functions are additionally invoked in platform mode?
I know the _WAK function is additionally called when resuming; AFAIK there isn't additional ACPI stuff called when powering off?

Hmm, there was a:
+       local_irq_save(flags);
added from 10.0 to current, without a local_irq_restore(..).
in dev_t swsusp_resume_device (kernel/power/disk.c). I wonder what sense that makes, no idea.
Comment 10 Thomas Renninger 2006-02-27 13:01:43 UTC
> AFAIK there isn't additional ACPI stuff called when powering off?
Oh, it seems that all capable devices are enumerated and D3 is invoked in platform mode; probably this is where it hangs or something goes wrong?
Comment 11 Christoph Thiel 2006-02-27 15:54:58 UTC
Feb 27 16:49:28 t41p kernel: Stopping tasks: ===========================================================================|
Feb 27 16:49:28 t41p kernel: Shrinking memory... done (78738 pages freed)
Feb 27 16:49:28 t41p kernel: pnp: Device 00:0b disabled.
Feb 27 16:49:28 t41p kernel: pnp: Device 00:0a disabled.
Feb 27 16:49:28 t41p kernel: [fglrx:drm_alloc] *ERROR* [buflist] Allocating 0 bytes
Feb 27 16:49:28 t41p kernel: ................................swsusp: Need to copy 94916 pages
Feb 27 16:49:28 t41p kernel: Intel machine check architecture supported.
Feb 27 16:49:28 t41p kernel: Intel machine check reporting enabled on CPU#0.
Feb 27 16:49:28 t41p kernel: swsusp: Restoring Highmem
Feb 27 16:49:28 t41p kernel: Debug: sleeping function called from invalid context at mm/slab.c:2666
Feb 27 16:49:28 t41p kernel: in_atomic():0, irqs_disabled():1
Feb 27 16:49:28 t41p kernel:  [<c01cf09b>] acpi_os_acquire_object+0xb/0x36
Feb 27 16:49:28 t41p kernel:  [<c014d4d7>] kmem_cache_alloc+0x20/0x7c
Feb 27 16:49:28 t41p kernel:  [<c01cf09b>] acpi_os_acquire_object+0xb/0x36
Feb 27 16:49:28 t41p kernel:  [<c01e45b1>] acpi_ut_allocate_object_desc_dbg+0x10/0x3e
Feb 27 16:49:28 t41p kernel:  [<c01e45f4>] acpi_ut_create_internal_object_dbg+0x15/0x68
Feb 27 16:49:28 t41p kernel:  [<c01e0911>] acpi_rs_set_srs_method_data+0x3d/0xb7
Feb 27 16:49:28 t41p kernel:  [<c014c8e1>] cache_alloc_debugcheck_after+0xb8/0xea
Feb 27 16:49:28 t41p kernel:  [<c01e7d03>] acpi_pci_link_set+0x40/0x1c0
Feb 27 16:49:28 t41p kernel:  [<c01e7dbc>] acpi_pci_link_set+0xf9/0x1c0
Feb 27 16:49:28 t41p kernel:  [<c01e7ed3>] irqrouter_resume+0x50/0x6f
Feb 27 16:49:28 t41p kernel:  [<c020a7df>] __sysdev_resume+0x11/0x53
Feb 27 16:49:28 t41p kernel:  [<c020a91f>] sysdev_resume+0x16/0x47
Feb 27 16:49:28 t41p kernel:  [<c020e85a>] device_power_up+0x5/0xa
Feb 27 16:49:29 t41p kernel:  [<c012f46a>] swsusp_suspend+0x6b/0x85
Feb 27 16:49:29 t41p kernel:  [<c01302a0>] pm_suspend_disk+0x44/0xd3
Feb 27 16:49:29 t41p kernel:  [<c012e9c0>] enter_state+0x50/0x16c
Feb 27 16:49:29 t41p kernel:  [<c012eb64>] state_store+0x88/0x95
Feb 27 16:49:29 t41p kernel:  [<c012eadc>] state_store+0x0/0x95
Feb 27 16:49:29 t41p kernel:  [<c01833f6>] subsys_attr_store+0x1e/0x22
Feb 27 16:49:29 t41p kernel:  [<c01834f7>] sysfs_write_file+0x9b/0xc1
Feb 27 16:49:29 t41p kernel:  [<c018345c>] sysfs_write_file+0x0/0xc1
Feb 27 16:49:29 t41p kernel:  [<c01503a1>] vfs_write+0xa1/0x146
Feb 27 16:49:29 t41p kernel:  [<c01508b7>] sys_write+0x3c/0x63
Feb 27 16:49:29 t41p kernel:  [<c0102ab9>] syscall_call+0x7/0xb
Feb 27 16:49:29 t41p kernel: ACPI: PCI Interrupt 0000:00:1d.7[D] -> Link [LNKH] -> GSI 11 (level, low) -> IRQ 11
Feb 27 16:49:29 t41p kernel: PCI: Setting latency timer of device 0000:00:1d.7 to 64
Feb 27 16:49:29 t41p kernel: usb usb4: root hub lost power or was reset
Feb 27 16:49:29 t41p kernel: ehci_hcd 0000:00:1d.7: debug port 1
Feb 27 16:49:29 t41p kernel: PCI: cache line size of 32 is not supported by device 0000:00:1d.7
Feb 27 16:49:29 t41p kernel: ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
Feb 27 16:49:29 t41p kernel: PCI: Setting latency timer of device 0000:00:1e.0 to 64
Feb 27 16:49:29 t41p kernel: ACPI: PCI Interrupt 0000:00:1f.1[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
Feb 27 16:49:29 t41p kernel: ACPI: PCI Interrupt 0000:00:1f.5[B] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
Feb 27 16:49:29 t41p kernel: PCI: Setting latency timer of device 0000:00:1f.5 to 64
Feb 27 16:49:29 t41p kernel: ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
Feb 27 16:49:29 t41p kernel: [fglrx:drm_free] *ERROR* [buflist] Attempt to free NULL pointer
Feb 27 16:49:29 t41p kernel: ACPI: PCI Interrupt 0000:02:00.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
Feb 27 16:49:29 t41p kernel: ACPI: PCI Interrupt 0000:02:00.1[B] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
Feb 27 16:49:29 t41p kernel: ACPI: PCI Interrupt 0000:02:01.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
Feb 27 16:49:30 t41p kernel: pnp: Device 00:07 does not supported activation.
Feb 27 16:49:30 t41p kernel: pnp: Device 00:08 does not supported activation.
Feb 27 16:49:30 t41p kernel: pnp: Device 00:0a activated.
Feb 27 16:49:30 t41p kernel: pnp: Device 00:0b activated.
Feb 27 16:49:30 t41p kernel: ACPI Exception (acpi_bus-0072): AE_NOT_FOUND, No context for object [dffdf640] [20060127]
Feb 27 16:49:30 t41p kernel: Restarting tasks... done
Comment 12 Christoph Thiel 2006-02-27 15:55:59 UTC
That's what I get when I suspend to disk in platform mode.
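
To pull the interesting lines out of a longer log like the one in comment 11, a small filter along these lines may help. It is only a sketch: the function name scan_suspend_log is made up, and the patterns are simply the two message types visible in the log above (the "sleeping function called from invalid context" warning and "ACPI Exception" lines), not a complete list of ACPI failure messages.

```shell
# Sketch only: show ACPI exceptions and atomic-sleep warnings from a log.
# $1 = log file, defaults to /var/log/messages
scan_suspend_log() {
    log="${1:-/var/log/messages}"
    grep -E 'ACPI (Exception|Error)|sleeping function called from invalid context' "$log"
}
```

Running `scan_suspend_log` right after a failed suspend attempt should show whether the ACPI layer reported anything at that point.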
Comment 13 Thomas Renninger 2006-02-28 18:30:28 UTC
I changed:
	void *object = kmem_cache_alloc(cache, GFP_KERNEL);
to
	void *object = kmem_cache_alloc(cache, GFP_ATOMIC);
in drivers/acpi/osl.c (line 1255).
This should make it work; however, I am not sure whether this is a good idea.
Still evaluating ..., Pavel?

I started test builds for default/smp for i386/x86_64:
mbuild -q stravinsky-trenn-107 and mbuild -q stravinsky-trenn-108

Please give it a try...
Comment 14 Christoph Thiel 2006-03-01 15:22:11 UTC
Thomas, unfortunately your mbuild kernels don't fix the problem ;(
Comment 15 Thomas Renninger 2006-03-02 08:56:14 UTC
I discussed the problem with Hannes, who also ran into it.
The slab error that the kernel from comment #13 should fix seems to be unrelated.

Is PIC used instead of APIC on this machine? According to Hannes, ACPI PIC initialisation does not work perfectly here, which causes this. Digging a bit, it seems not all devices have "resource data" attached; searching for it on one such device therefore fails with "AE_NOT_FOUND", and the suspend code assumes something went wrong and aborts the suspend to be safe.

Have you booted with noapic? If yes, please try without it; if not (noapic still seems to be the default), you might want to try the apic boot parameter.

We should also try to find the bug in the ACPI PIC code, as noapic is an often-used workaround for a lot of machines.
Could you install a debug kernel and try with:
log_buf_len=33554432 acpi_dbg_level=0x1001f
Before that, also increase the dmesg buffer in /etc/init.d/boot.klogd by replacing the line:
       /bin/dmesg -s16384 > /var/log/boot.msg
with
       /bin/dmesg -s33554432 > /var/log/boot.msg
This increases the dmesg buffer to 32MB, which should hopefully be sufficient to log all ACPI debug messages. Please post the dmesg and acpidump output then.
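
The boot.klogd edit described above could be applied with something like the sketch below. The function name bump_dmesg_buffer and the ".orig" backup suffix are made up for illustration; the path and the two dmesg lines are taken from this comment. 33554432 is 32 * 1024 * 1024, matching the log_buf_len value.

```shell
# Sketch only: raise the dmesg read buffer used by the boot script.
# $1 = script to patch, defaults to /etc/init.d/boot.klogd
bump_dmesg_buffer() {
    script="${1:-/etc/init.d/boot.klogd}"

    # Keep a copy of the original so the change is easy to revert.
    cp -p "$script" "$script.orig" || return 1

    # Replace the 16 KB buffer size with 32 MB.
    sed 's|/bin/dmesg -s16384|/bin/dmesg -s33554432|' "$script.orig" > "$script"
}
```

To undo the change, copy the ".orig" file back over the script.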

Hmmm, Christoph, could I have the machine for a day? That would probably be easier.
Comment 17 Holger Macht 2006-03-03 14:49:10 UTC
*** Bug 153682 has been marked as a duplicate of this bug. ***
Comment 18 Holger Macht 2006-03-03 14:50:29 UTC
Also present on T42p. "shutdown" method helps.
Comment 19 Stefan Behlert 2006-03-03 14:54:07 UTC
Ok, we have a T42p currently available, Thomas, that we can give you.
Does "shutdown" help on the T41p, too?
Comment 20 Christoph Thiel 2006-03-03 15:42:10 UTC
Stefan, shutdown mode does work on my T41p (as stated above).
Comment 21 Christoph Thiel 2006-03-06 08:36:16 UTC
Thomas, do you still need a laptop for testing?
Comment 22 Thomas Renninger 2006-03-07 16:00:03 UTC
Created attachment 71593
Get rid of the slab debug errors

Pavel, could you please review this one?
Not sure whether in_atomic already includes in_interrupt...
Also not sure whether this could be optimised somehow; checking three flags on each ACPI mem/mutex access does not look nice.
Comment 23 Thomas Renninger 2006-03-08 07:48:10 UTC
Hmmm, with the newest kernel I can suspend.
This either got fixed in mainline or it only happens when one of the kernel module packages is active (this would be nasty to debug...).

I checked in a fix for the slab errors.
I set this one to fixed for now; please verify whether the kotd (mount/dist/kerneltest/...) works for you and reopen if not:
rpm -qp --changelog *.rpm should include:
Wed Mar  8 08:46:12 CET 2006 - trenn@suse.de

- patches.arch/acpi_set_resources_atomic_alloc.patch: Delete.
- patches.fixes/acpi_osl_atomics.patch: No GFP_KERNEL buffer
  alloc or mutexes if resuming (153062).
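
The changelog check could be scripted roughly like this. It is only a sketch: the helper name changelog_has_fix is made up, and it simply looks for the patch file name from the entry above in whatever changelog text is piped in.

```shell
# Sketch only: read an RPM changelog on stdin and succeed if the
# acpi_osl_atomics patch entry is present.
changelog_has_fix() {
    grep -qF 'patches.fixes/acpi_osl_atomics.patch'
}
```

Typical use would be `rpm -qp --changelog *.rpm | changelog_has_fix && echo "fix included"`.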
Comment 24 Forgotten User ZhJd0F0L3x 2006-03-08 08:00:13 UTC
not powering down could be related to the APIC screwup.
Christoph, can you check with an old kernel, but "noapic nolapic" boot parameters?
Comment 25 Christoph Thiel 2006-03-08 08:32:08 UTC
Seife, with the old kernel that I'm running on my laptop (beta6 something) plus the "noapic nolapic" boot parameters, it actually works.
Comment 26 Thomas Renninger 2006-03-08 08:55:30 UTC
Great. 
Andi already checked in a fix for that. That is why it powered down on my system. So now also the slab errors should be gone.
Comment 27 Forgotten User ZhJd0F0L3x 2006-03-08 09:07:20 UTC
there were no slab corruption errors in this bug.
There is a (known, harmless) "sleeping while atomic" warning that will go away once we disable the debug options anyway, and IIRC the upstream consensus was that slowing down acpi_os_allocate was a bad idea (and I think the "acpi_in_resume" part already took care of that...).
So if this "fix" is not upstream, you should discuss it at least on kernel@suse IMO.
Comment 28 Stefan Behlert 2006-03-08 17:58:40 UTC
*** Bug 156053 has been marked as a duplicate of this bug. ***