Bug 113930 - suspend-to-disk without resume / error -12 suspending
Status: RESOLVED WONTFIX
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Mobile Devices
Version: RC 1
Hardware: Other All
Priority: P5 - None; Severity: Normal
Target Milestone: ---
Assignee: Pavel Machek
QA Contact: E-mail List
Reported: 2005-08-29 17:18 UTC by Kevin Ivory
Modified: 2005-09-28 08:25 UTC

Found By: Other

Attachments
hwinfo.txt (287.60 KB, text/plain)
2005-08-29 17:18 UTC, Kevin Ivory
powersave_logs.tar.gz (1.56 KB, application/x-gzip)
2005-08-29 17:19 UTC, Kevin Ivory
messages.gz (27.59 KB, application/x-gzip)
2005-09-13 10:12 UTC, Kevin Ivory
messages.gz (14.21 KB, application/x-gzip)
2005-09-13 10:43 UTC, Kevin Ivory
This might help/provide better diagnostics. (1013 bytes, patch)
2005-09-14 12:37 UTC, Pavel Machek
suspend.jpg (picture before reboot) (61.09 KB, image/jpeg)
2005-09-15 07:45 UTC, Kevin Ivory

Description Kevin Ivory 2005-08-29 17:18:06 UTC
As requested by the "most-annoying-bugs" page, here is my suspend-to-disk experience.
hwinfo.txt and powersave_logs.tar.gz will be attached after this.
My system is a Yakumo Centrino 1400 MHz on which I have never successfully used
suspend-to-disk before (there is no non-SUSE OS on it to test with).
The system goes down faster than usual, noting that it is writing to swap,
and then switches off.
When switching on again, the system boots normally - no resume.
Comment 1 Kevin Ivory 2005-08-29 17:18:55 UTC
Created attachment 48047
hwinfo.txt
Comment 2 Kevin Ivory 2005-08-29 17:19:30 UTC
Created attachment 48048
powersave_logs.tar.gz
Comment 3 Forgotten User ZhJd0F0L3x 2005-08-29 18:55:14 UTC
try setting SUSPEND2DISK_SHUTDOWN_MODE="reboot" in
/etc/sysconfig/powersave/sleep and try again. The machine will not shut down but
reboot after suspend. If this works (resumes), then we have a disk shutdown
problem (not all data is written out before powerdown)
Comment 4 Kevin Ivory 2005-08-31 07:54:07 UTC
the system did not resume. I called powersave_logs after that once more. Is it
needed?
Comment 5 Pavel Machek 2005-08-31 08:22:14 UTC
I'd need the kernel messages from just before the shutdown... Serial console or
digital camera...
Comment 6 Forgotten User ZhJd0F0L3x 2005-08-31 09:27:07 UTC
maybe we could try to inspect the swap partition from the rescue system after
"suspend", try the following:

- suspend (with SHUTDOWN_MODE="platform" or "shutdown", not "reboot")
- boot the rescue system from CD
- try "swapon /dev/hda1"

if this swapon succeeds, something during suspend went wrong (the image was not
correctly / completely written). If this swapon fails, something during resume
goes wrong (resume from initramfs)
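
The swapon test works because swsusp marks the swap partition while an image is
stored on it. A minimal sketch of the same check done by reading the signature
directly, assuming a 4096-byte page, a signature kept in the last 10 bytes of
the first page, and the 2.6-era markers SWAPSPACE2 / SWAP-SPACE (normal swap)
and S1SUSPEND (suspend image present). The helper names are hypothetical and
not part of any existing tool:

```python
PAGE_SIZE = 4096  # assumption: i386 page size

def classify_swap_signature(first_page: bytes) -> str:
    """Classify a swap partition from the first page of the device."""
    if len(first_page) < PAGE_SIZE:
        raise ValueError("need at least one full page")
    # The signature occupies the last 10 bytes of the first page.
    sig = first_page[PAGE_SIZE - 10:PAGE_SIZE]
    if sig in (b"SWAPSPACE2", b"SWAP-SPACE"):
        return "normal swap"      # swapon would be expected to succeed
    if sig.startswith(b"S1SUSPEND"):
        return "suspend image"    # swapon would be expected to fail
    return "unknown"

def read_first_page(device: str) -> bytes:
    """Read the first page of a block device (needs root)."""
    with open(device, "rb") as f:
        return f.read(PAGE_SIZE)
```

From the rescue system this would be called as
classify_swap_signature(read_first_page("/dev/hda1")). "normal swap" after a
suspend points at the suspend side, "suspend image" at the resume side.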
Comment 7 Kevin Ivory 2005-08-31 09:35:11 UTC
I will collect all suggestions and act this evening. (I am taking a null modem
cable home as well).
Comment 8 Kevin Ivory 2005-09-01 06:26:21 UTC
my notebook doesn't have a serial interface and I didn't have a camera available.

I am sorry I cannot provide the info at the moment since I installed Beta 4 - I
was thinking, either the problem will still be there or not. Well, I didn't
think there might be a different problem altogether.
Now the system stops with what looks like a kernel oops (but doesn't say so).
These are some lines that I copied by hand:

swsusp: Need to copy 28628 pages
Error -12 suspending
--- [ cut here ]-----
kernel BUG at kernel/power/swsusp.c:905!
invalid operand: 0000 [#1]
Comment 9 Pavel Machek 2005-09-02 08:11:24 UTC
....swsusp failed with -ENOMEM, and then something went wrong in the error
handling path.

Are you sure you had swap enabled before trying to suspend? I really need the
messages above that to know _why_ it failed with -ENOMEM.
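
For reference, the number in "Error -12 suspending" is a negated errno value.
A small sketch of decoding it with Python's standard errno module, assuming
Linux errno numbering; decode_kernel_error is a hypothetical helper:

```python
import errno
import os

def decode_kernel_error(code: int) -> str:
    """Turn a negative kernel return code such as -12 into its errno name."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name}: {os.strerror(number)}"
```

decode_kernel_error(-12) gives a string starting with "ENOMEM", which says only
that an allocation failed, not which one. The messages printed before it are
still needed for that.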
Comment 10 Jens Benecke 2005-09-09 23:25:49 UTC
I don't know if this is related, but I had swsusp problems on a Pentium-M  
Centrino notebook with suse 9.3 as well. It would suspend ok, but on resume it  
would fail with "Stopping tasks: =" and then tell me it could not stop one  
task (kseriod), and then continue booting normally.  
 
With 10.0 RC1 both suspend and resume work, but it takes very long to resume.  
Most of the time is spent after the resume started, with a blank screen, and a 
constantly lit harddisk LED (I guess) swapping the running tasks back into 
RAM. But I had nothing big running, just a default KDE desktop with a couple 
konq windows.  
  
suspend: 42sec altogether 
resume: 60 sec altogether 
 
This works, but is not much faster than booting ... 
Comment 11 Forgotten User ZhJd0F0L3x 2005-09-10 07:33:27 UTC
(In reply to comment #10)
> I don't know if this is related, but I had swsusp problems on a Pentium-M  
> Centrino notebook with suse 9.3 as well. It would suspend ok, but on resume it  
> would fail with "Stopping tasks: =" and then tell me it could not stop one  
> task (kseriod), and then continue booting normally.  

this was (fixed) bug #74170, not sure if it is publicly accessible but should be
fixed for 10.0

> With 10.0 RC1 both suspend and resume work, but it takes very long to resume.  
> Most of the time is spent after the resume started, with a blank screen, and a 
> constantly lit harddisk LED (I guess) swapping the running tasks back into 
> RAM. But I had nothing big running, just a default KDE desktop with a couple 
> konq windows.  

I have seen this, too. Also suspending takes a relatively long time in "freeing
memory" so it might be a swapping performance regression, but has nothing to do
with this bug in general.

> This works, but is not much faster than booting ... 

So it works => different bug :-)
Comment 12 Forgotten User ZhJd0F0L3x 2005-09-10 07:39:22 UTC
i created bug #116313 for the possible swap performance problem
Comment 13 Kevin Ivory 2005-09-13 08:34:08 UTC
I have my notebook at work today. The "kernel BUG" message is still there in RC-1.
swap was active (at least it was shown in "free")
All messages available before [cut here] line:
Stopping tasks: ===...
Freeing memory... done (25768 pages freed)
ACPI: PCI interrupt for device 0000:01:04.0 disabled
ACPI: PCI interrupt for device 0000:01:01.0 disabled
ACPI: PCI interrupt for device 0000:00:1f.5 disabled
ACPI: PCI interrupt for device 0000:00:1d.7 disabled
    ACPI-0212: *** Warning: Device is not power manageable
swsusp: Need to copy 28717 pages
Error -12 suspending
Comment 14 Kevin Ivory 2005-09-13 10:12:56 UTC
Created attachment 49748
messages.gz

Several ACPI messages are also visible via syslog. Attaching /var/log/messages
(.gz)
As mentioned: I have the notebook here today - any other info needed is easily
obtained (sorry, no serial port available)
Comment 15 Kevin Ivory 2005-09-13 10:43:36 UTC
Created attachment 49749
messages.gz

attachment #49748 was an old file; this one should be correct
Comment 16 Kevin Ivory 2005-09-13 10:46:06 UTC
forgot to click the "info provided" button
Comment 17 Pavel Machek 2005-09-14 10:29:56 UTC
ACPI: PCI interrupt for device 0000:00:1d.7 disabled
    ACPI-0212: *** Warning: Device is not power manageable

This is the key. Find out what device causes this message (probably using printk).
Comment 18 Kevin Ivory 2005-09-14 11:17:06 UTC
the device info is in the attachment hwinfo.txt, but here is output of lspci -v:
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI
Controller (rev 03) (prog-if 20 [EHCI])
        Subsystem: Mitac: Unknown device 8080
        Flags: medium devsel, IRQ 7
        Memory at febff000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Debug port
Comment 19 Forgotten User ZhJd0F0L3x 2005-09-14 11:28:26 UTC
I am not so sure if this is the problem (i have those Warnings, too but suspend
works fine).
Try unloading ehci_hcd manually before suspend - if it works with unloaded
ehci_hcd, add it to SUSPEND2*_UNLOAD_MODULES in /etc/sysconfig/powersave/sleep.
You should read the comment above those variables for instructions.
Comment 20 Kevin Ivory 2005-09-14 12:10:54 UTC
after unloading ehci_hcd manually before suspend, the oops or kernel bug or
whatever still persists as above - but missing the two lines mentioned in
comment #17
Comment 21 Pavel Machek 2005-09-14 12:35:20 UTC
So you still get "Error -12 suspending"? The thing below it is a kernel BUG().
It is similar to an oops, but it happens because the kernel explicitly tests
for a condition.

BUG() happens in swsusp_suspend, right? Hmm, that BUG_ON is actually wrong; if
arch_suspend fails, I could understand nr_copy_pages not matching.

Can you enable DEBUG in swsusp.c and see what happens? It fails somewhere in
suspend_prepare_image...
Comment 22 Pavel Machek 2005-09-14 12:37:12 UTC
Created attachment 49896
This might help/provide better diagnostics.

BTW do you have enough swap space available? (cat /proc/swaps)? Do you feel
like having enough memory for your workload? Are you using highmem?
Comment 23 Kevin Ivory 2005-09-14 15:01:24 UTC
Compiling a new kernel will take some time (long list of todos at work today).
For the other infos:
Swap is fine, all tests are done with a fresh system - nothing special running.
(KDE defaults + 2 konsoles). Memory is 223768, swap is 514040 (used 0 in
/proc/swaps)
I don't know if I am running highmem. This is a default SUSE 10.0 RC-1 install.
How do I find out?
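
A rough way to check both points from /proc, sketched under stated assumptions:
sizes in /proc/swaps are in KiB, pages are 4 KiB, and the image size is the
page count from the "swsusp: Need to copy N pages" line. The helper names are
hypothetical, and HighTotal is only present on kernels that report it in
/proc/meminfo (a value of 0, or no such line, suggests highmem is not in use):

```python
PAGE_KIB = 4  # assumption: 4 KiB pages (i386)

def parse_proc_swaps(text):
    """Return one dict per swap area listed in /proc/swaps (sizes in KiB)."""
    areas = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) >= 4:
            areas.append({"name": fields[0],
                          "size": int(fields[2]),
                          "used": int(fields[3])})
    return areas

def image_fits(pages_to_copy, areas):
    """True if the image (pages * 4 KiB) fits into the unused swap space."""
    free_kib = sum(a["size"] - a["used"] for a in areas)
    return pages_to_copy * PAGE_KIB <= free_kib

def high_total_kib(meminfo_text):
    """Return HighTotal from /proc/meminfo text in KiB, or 0 if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("HighTotal:"):
            return int(line.split()[1])
    return 0
```

With the figures from this report, 28717 pages come to 28717 * 4 = 114868 KiB,
well below the 514040 of unused swap (assuming that figure is in KiB), so swap
size alone would not explain the -ENOMEM. On a live system the inputs would be
open("/proc/swaps").read() and open("/proc/meminfo").read().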
Comment 24 Kevin Ivory 2005-09-14 17:36:29 UTC
With the patch, the kernel bug is gone. Now I have the original behaviour from
comment #1 back. With lots of debug output shortly before reboot. I'll have to
find someone with a digital camera tomorrow.
Comment 25 Kevin Ivory 2005-09-15 07:45:04 UTC
Created attachment 49991
suspend.jpg (picture before reboot)

here is the rather unfocused screenshot. With a little good will almost
everything is decipherable ;-)
Comment 26 Forgotten User ZhJd0F0L3x 2005-09-15 10:01:41 UTC
can you try it with init=/bin/bash as described in
http://www.susewiki.org/index.php?title=ACPI_suspend ? It may still be some
device driver although i somehow doubt it :-(
Comment 27 Pavel Machek 2005-09-15 10:21:26 UTC
The system seems to write data to the swap okay. Are you sure you have just one
swap partition, resume=XXX on the command line, etc? Is your swap signature
damaged after reboot? Are you booting the right kernel?
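
A minimal sketch of the resume= check, assuming the running kernel's command
line is read from /proc/cmdline; resume_device is a hypothetical helper:

```python
def resume_device(cmdline):
    """Return the value of resume= from a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("resume="):
            return token[len("resume="):]
    return None
```

resume_device(open("/proc/cmdline").read()) should name the one active swap
partition. None means the kernel was booted without a resume device and will
not look for an image at all.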
Comment 28 Kevin Ivory 2005-09-15 11:42:23 UTC
Now I see progress and success!

ad comment #27: yes, only one swap partition; correct resume line.
booting right kernel: actually: no; the default was still the official RC-1
kernel, for the suspend-patched kernel manual interaction was needed.
Now the suspend-patched kernel is my default.

ad comment #26: the init=/bin/bash method works.
After ensuring that the suspend-patched kernel is the default, both the command
line "echo disk > /sys/power/state" and the KDE version (right mouse click on
the KPowersave plug-button, click on "Suspend to Disk") work!

Now, how do we continue? (Which part of that suspend patch is really needed, and
will it be in the 10.0 release?)
Comment 29 Pavel Machek 2005-09-19 13:49:50 UTC
Well, the patch only fixed error handling, AFAICT. Can you revert it to see if
bug reappears? I think you somehow fixed the core problem in the meantime.
Comment 30 Kevin Ivory 2005-09-19 14:22:57 UTC
Reverting will bring us back to comment #13. That is the official RC-1 version
with default kernel and shows me the kernel BUG.
[My notebook is at home and I will bring it to work tomorrow.]
Comment 31 Pavel Machek 2005-09-19 14:36:30 UTC
"I think you somehow fixed the core problem in the meantime." The patch really
should not matter, outside of error handling.
Comment 32 Kevin Ivory 2005-09-20 07:19:40 UTC
You are correct. Back to the original default kernel of RC-1 and suspend to disk
/ resume works as intended.
Comment 33 Pavel Machek 2005-09-28 08:25:55 UTC
So it somehow fixed itself :-(.