Bugzilla – Bug 153062
T41p does not power down properly on suspend-to-disk
Last modified: 2006-03-08 17:58:40 UTC
Performing a suspend-to-disk, my T41p actually does the suspend, but keeps the power up and running while the little moon is blinking. Holding the power button for a while turns it off. Resuming afterwards works. What kind of logs/debug output do you need?
As far as I know, the Ferrari does not switch off either. Pavel, do you know about this?
Correct, the Ferrari 4000 doesn't power down. However, I'm not sure whether this is only related to suspend. If I use gfxmenu in GRUB, it doesn't power off on a regular init 0 either; you need to disable gfxmenu. For suspend, however, it makes no difference whether gfxmenu is enabled or not...
The T41p is a very different machine from the Ferrari. A blinking moon means you are using platform mode... could you try shutdown mode instead? (cd /sys/power; echo shutdown > disk; echo disk > state; then see what happens.)
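The sysfs sequence suggested above can be wrapped in a small shell function. This is a sketch, assuming a 2.6-era kernel whose /sys/power/disk accepts "shutdown" (power off through the normal kernel power-off path rather than entering ACPI S4 via "platform"). The function name suspend_in_shutdown_mode and its optional directory argument are hypothetical; the argument exists only so the writes can be tried against a scratch directory.

```shell
# Hypothetical helper: select swsusp "shutdown" mode, then start suspend-to-disk.
# $1 optionally overrides the sysfs power directory (default /sys/power);
# the override is only meant for testing without actually suspending.
suspend_in_shutdown_mode() {
    power_dir="${1:-/sys/power}"
    # Power off via the regular shutdown path instead of ACPI "platform" mode.
    echo shutdown > "$power_dir/disk" || return 1
    # Writing "disk" to the state file starts the suspend-to-disk.
    echo disk > "$power_dir/state"
}
```

Run it as root with no argument; on a real system the second write begins the suspend immediately.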
Using shutdown mode instead of platform works OK.
Gerald, can you reproduce this bug on your T41p?
Yes, I can reproduce it. (That is, I haven't been able to boot that machine normally, but when booting in safe mode with resume=/dev/hda5 added to the GRUB command line, suspend seemed to work; the machine did not power off, though.)
OK, what do we do here? Platform mode is probably broken by the ACPI interpreter code; this could be solved by asking the Intel people in bugzilla.suse.de (or perhaps by finding out where it hangs, using a serial console or something similar). Or we just work around it by using shutdown mode instead...
Thomas, Seife, Stefan, do you have contacts at Intel who could help? Rolla, Scott, can you help, please?
Can you attach the dmesg output (or, better, grep /var/log/messages to see whether ACPI functions failed during shutdown) and the acpidump output, please? Pavel: which ACPI functions are additionally invoked in platform mode? I know the _WAK function is additionally called when resuming; AFAIK there isn't additional ACPI stuff called when powering off? Hmm, between 10.0 and current there was a
+ local_irq_save(flags);
added without a local_irq_restore(..), in dev_t swsusp_resume_device (kernel/power/disk.c). I have no idea what sense that makes.
> AFAIK there isn't additional ACPI stuff called when powering off?

Oh, it seems that in platform mode all capable devices are enumerated and put into D3; maybe that is where it hangs or something goes wrong?
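The check of /var/log/messages requested above could be done with grep. A sketch, assuming the ACPI interpreter reports failures with an "ACPI Error", "ACPI Exception" or "ACPI Warning" prefix (the log attached to this bug contains such an "ACPI Exception" line); the function name acpi_failures is hypothetical.

```shell
# Hypothetical helper: print ACPI errors, exceptions and warnings from a syslog file.
# $1 optionally names the log (default /var/log/messages).
acpi_failures() {
    log_file="${1:-/var/log/messages}"
    grep -E 'ACPI (Error|Exception|Warning)' "$log_file"
}
```

Attaching the output of acpi_failures together with acpidump keeps the report small compared with the full log.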
Feb 27 16:49:28 t41p kernel: Stopping tasks: ===========================================================================|
Feb 27 16:49:28 t41p kernel: Shrinking memory... done (78738 pages freed)
Feb 27 16:49:28 t41p kernel: pnp: Device 00:0b disabled.
Feb 27 16:49:28 t41p kernel: pnp: Device 00:0a disabled.
Feb 27 16:49:28 t41p kernel: [fglrx:drm_alloc] *ERROR* [buflist] Allocating 0 bytes
Feb 27 16:49:28 t41p kernel: ................................swsusp: Need to copy 94916 pages
Feb 27 16:49:28 t41p kernel: Intel machine check architecture supported.
Feb 27 16:49:28 t41p kernel: Intel machine check reporting enabled on CPU#0.
Feb 27 16:49:28 t41p kernel: swsusp: Restoring Highmem
Feb 27 16:49:28 t41p kernel: Debug: sleeping function called from invalid context at mm/slab.c:2666
Feb 27 16:49:28 t41p kernel: in_atomic():0, irqs_disabled():1
Feb 27 16:49:28 t41p kernel: [<c01cf09b>] acpi_os_acquire_object+0xb/0x36
Feb 27 16:49:28 t41p kernel: [<c014d4d7>] kmem_cache_alloc+0x20/0x7c
Feb 27 16:49:28 t41p kernel: [<c01cf09b>] acpi_os_acquire_object+0xb/0x36
Feb 27 16:49:28 t41p kernel: [<c01e45b1>] acpi_ut_allocate_object_desc_dbg+0x10/0x3e
Feb 27 16:49:28 t41p kernel: [<c01e45f4>] acpi_ut_create_internal_object_dbg+0x15/0x68
Feb 27 16:49:28 t41p kernel: [<c01e0911>] acpi_rs_set_srs_method_data+0x3d/0xb7
Feb 27 16:49:28 t41p kernel: [<c014c8e1>] cache_alloc_debugcheck_after+0xb8/0xea
Feb 27 16:49:28 t41p kernel: [<c01e7d03>] acpi_pci_link_set+0x40/0x1c0
Feb 27 16:49:28 t41p kernel: [<c01e7dbc>] acpi_pci_link_set+0xf9/0x1c0
Feb 27 16:49:28 t41p kernel: [<c01e7ed3>] irqrouter_resume+0x50/0x6f
Feb 27 16:49:28 t41p kernel: [<c020a7df>] __sysdev_resume+0x11/0x53
Feb 27 16:49:28 t41p kernel: [<c020a91f>] sysdev_resume+0x16/0x47
Feb 27 16:49:28 t41p kernel: [<c020e85a>] device_power_up+0x5/0xa
Feb 27 16:49:29 t41p kernel: [<c012f46a>] swsusp_suspend+0x6b/0x85
Feb 27 16:49:29 t41p kernel: [<c01302a0>] pm_suspend_disk+0x44/0xd3
Feb 27 16:49:29 t41p kernel: [<c012e9c0>] enter_state+0x50/0x16c
Feb 27 16:49:29 t41p kernel: [<c012eb64>] state_store+0x88/0x95
Feb 27 16:49:29 t41p kernel: [<c012eadc>] state_store+0x0/0x95
Feb 27 16:49:29 t41p kernel: [<c01833f6>] subsys_attr_store+0x1e/0x22
Feb 27 16:49:29 t41p kernel: [<c01834f7>] sysfs_write_file+0x9b/0xc1
Feb 27 16:49:29 t41p kernel: [<c018345c>] sysfs_write_file+0x0/0xc1
Feb 27 16:49:29 t41p kernel: [<c01503a1>] vfs_write+0xa1/0x146
Feb 27 16:49:29 t41p kernel: [<c01508b7>] sys_write+0x3c/0x63
Feb 27 16:49:29 t41p kernel: [<c0102ab9>] syscall_call+0x7/0xb
Feb 27 16:49:29 t41p kernel: ACPI: PCI Interrupt 0000:00:1d.7[D] -> Link [LNKH] -> GSI 11 (level, low) -> IRQ 11
Feb 27 16:49:29 t41p kernel: PCI: Setting latency timer of device 0000:00:1d.7 to 64
Feb 27 16:49:29 t41p kernel: usb usb4: root hub lost power or was reset
Feb 27 16:49:29 t41p kernel: ehci_hcd 0000:00:1d.7: debug port 1
Feb 27 16:49:29 t41p kernel: PCI: cache line size of 32 is not supported by device 0000:00:1d.7
Feb 27 16:49:29 t41p kernel: ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
Feb 27 16:49:29 t41p kernel: PCI: Setting latency timer of device 0000:00:1e.0 to 64
Feb 27 16:49:29 t41p kernel: ACPI: PCI Interrupt 0000:00:1f.1[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
Feb 27 16:49:29 t41p kernel: ACPI: PCI Interrupt 0000:00:1f.5[B] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
Feb 27 16:49:29 t41p kernel: PCI: Setting latency timer of device 0000:00:1f.5 to 64
Feb 27 16:49:29 t41p kernel: ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
Feb 27 16:49:29 t41p kernel: [fglrx:drm_free] *ERROR* [buflist] Attempt to free NULL pointer
Feb 27 16:49:29 t41p kernel: ACPI: PCI Interrupt 0000:02:00.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
Feb 27 16:49:29 t41p kernel: ACPI: PCI Interrupt 0000:02:00.1[B] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
Feb 27 16:49:29 t41p kernel: ACPI: PCI Interrupt 0000:02:01.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
Feb 27 16:49:30 t41p kernel: pnp: Device 00:07 does not supported activation.
Feb 27 16:49:30 t41p kernel: pnp: Device 00:08 does not supported activation.
Feb 27 16:49:30 t41p kernel: pnp: Device 00:0a activated.
Feb 27 16:49:30 t41p kernel: pnp: Device 00:0b activated.
Feb 27 16:49:30 t41p kernel: ACPI Exception (acpi_bus-0072): AE_NOT_FOUND, No context for object [dffdf640] [20060127]
Feb 27 16:49:30 t41p kernel: Restarting tasks... done
That's what I get when I suspend to disk in platform mode.
In drivers/acpi/osl.c (line 1255) I changed:
    void *object = kmem_cache_alloc(cache, GFP_KERNEL);
to:
    void *object = kmem_cache_alloc(cache, GFP_ATOMIC);
This should make it work, but I am not sure whether it is a good idea. Still evaluating... Pavel? I started test builds of default/smp for i386/x86_64:
    mbuild -q stravinsky-trenn-107
    mbuild -q stravinsky-trenn-108
Please give them a try...
Thomas, unfortunately your mbuild kernels don't fix the problem ;(
I discussed the problem with Hannes, who also ran into it. The slab error that the kernel from comment #13 should fix seems to be unrelated. Is PIC used instead of APIC on this machine? According to Hannes, ACPI PIC initialisation does not work perfectly here, which causes this. Digging a bit, it seems that not all devices have "resource data" attached, so the lookup for one device fails with "AE_NOT_FOUND"; the suspend code then assumes something went wrong and aborts suspending to be safe. Have you booted with noapic? If yes, please try without it; if not, noapic still seems to be the default, so you might want to try the apic boot parameter. We should also try to find the bug in the ACPI PIC code, since noapic is a commonly used workaround on a lot of machines. Could you install a debug kernel and boot it with:
    log_buf_len=33554432 acpi_dbg_level=0x1001f
Beforehand, also increase the dmesg buffer in /etc/init.d/boot.klogd by replacing the line:
    /bin/dmesg -s16384 > /var/log/boot.msg
with:
    /bin/dmesg -s33554432 > /var/log/boot.msg
This increases the dmesg buffer to 32 MB, which should hopefully be sufficient to log all ACPI debug messages. Please post the dmesg and acpidump output then. Hmmm, Christoph, could I have the machine for a day? That would probably be easier.
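The boot.klogd edit described above can be scripted with sed. A minimal sketch, assuming the line appears in the script exactly as quoted; the function name enlarge_boot_msg_buffer is hypothetical, and the kernel parameters log_buf_len=33554432 acpi_dbg_level=0x1001f still have to be added to the boot command line separately. The size 33554432 is 32 × 1024 × 1024 bytes, i.e. 32 MiB.

```shell
# Hypothetical helper: raise the dmesg read size in boot.klogd from 16 KiB to 32 MiB.
# $1 optionally names the script to edit (default /etc/init.d/boot.klogd).
enlarge_boot_msg_buffer() {
    klogd_script="${1:-/etc/init.d/boot.klogd}"
    # 33554432 = 32 * 1024 * 1024 bytes, matching log_buf_len=33554432.
    # Keep a .orig backup and replace only the exact dmesg line.
    sed -i.orig 's|/bin/dmesg -s16384 > /var/log/boot.msg|/bin/dmesg -s33554432 > /var/log/boot.msg|' "$klogd_script"
}
```

The .orig backup makes it easy to restore the original script once the debugging session is over.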
*** Bug 153682 has been marked as a duplicate of this bug. ***
Also present on the T42p. The "shutdown" method helps.
OK, Thomas, we currently have a T42p available that we can give you. Does "shutdown" help on the T41p, too?
Stefan, shutdown mode does work on my T41p (as stated above).
Thomas, do you still need a laptop for testing?
Created attachment 71593
Get rid of the slab debug errors

Pavel, could you please review this one? I'm not sure whether in_atomic already includes in_interrupt... I'm also not sure how this could be optimised; checking three flags on every ACPI memory/mutex access does not look nice.
Hmmm, with the newest kernel I can suspend. Either this got fixed in mainline, or it only happens when one of the kernel module packages is active (which would be nasty to debug...). I checked in a fix for the slab errors. I am setting this one to fixed for now; please verify whether the kotd (mount/dist/kerneltest/...) works for you, and reopen if not. The output of rpm -qp --changelog *.rpm should include:

Wed Mar 8 08:46:12 CET 2006 - trenn@suse.de
- patches.arch/acpi_set_resources_atomic_alloc.patch: Delete.
- patches.fixes/acpi_osl_atomics.patch: No GFP_KERNEL buffer alloc or mutexes if resuming (153062).
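To verify that a kotd package carries the fix, its changelog can be searched for the patch name. A sketch; the function name has_acpi_osl_fix is hypothetical, and it reads changelog text on standard input so that it can be fed directly from rpm -qp --changelog.

```shell
# Hypothetical helper: succeed if the changelog text on stdin mentions
# the acpi_osl_atomics fix for bug 153062.
has_acpi_osl_fix() {
    grep -qF 'patches.fixes/acpi_osl_atomics.patch'
}

# Usage (needs rpm and a downloaded kotd package):
#   rpm -qp --changelog *.rpm | has_acpi_osl_fix && echo "fix present"
```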
The failure to power down could be related to the APIC screwup. Christoph, can you check with an old kernel, but with the "noapic nolapic" boot parameters?
Seife, with the old kernel I'm running on my laptop (some beta6 build) plus the "noapic nolapic" boot parameters, it actually works.
Great. Andi already checked in a fix for that, which is why it powered down on my system. So now the slab errors should be gone as well.
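When comparing results, it helps to confirm that the running kernel really was booted with both parameters. A sketch; the function name booted_without_apic is hypothetical, and its optional argument only exists so that a saved command line can be checked instead of /proc/cmdline.

```shell
# Hypothetical helper: succeed only if both "noapic" and "nolapic" appear
# as whole words on the kernel command line.
# $1 optionally names the file to check (default /proc/cmdline).
booted_without_apic() {
    cmdline_file="${1:-/proc/cmdline}"
    # -w rejects longer options that merely contain the name, e.g. "noapictimer".
    grep -qw noapic "$cmdline_file" && grep -qw nolapic "$cmdline_file"
}
```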
There were no slab corruption errors in this bug. There is a (known, harmless) "sleeping while atomic" warning that will go away anyway once we disable the debug options, and IIRC the upstream consensus was that slowing down acpi_os_allocate was a bad idea (and I think the "acpi_in_resume" part already took care of that...). So if this "fix" is not upstream, IMO you should at least discuss it on kernel@suse.
*** Bug 156053 has been marked as a duplicate of this bug. ***