|
Bugzilla – Full Text Bug Listing |
| Summary: | suspend-to-disk hangs during suspend | ||
|---|---|---|---|
| Product: | [openSUSE] SUSE LINUX 10.0 | Reporter: | Carl-Daniel Hailfinger <kernel01> |
| Component: | Kernel | Assignee: | Pavel Machek <pavel> |
| Status: | RESOLVED INVALID | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | dkukawka, nimdae, vojtech |
| Version: | RC 1 | ||
| Target Milestone: | --- | ||
| Hardware: | i386 | ||
| OS: | All | ||
| Whiteboard: | |||
| Found By: | Development | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
full suspend log including sysrq-p and sysrq-t
Full backtrace including SysRq-P and SysRq-T SysRq-P, SysRq-T during hanging suspend-to-disk backtrace of hang during suspend-to-disk (vanilla 2.6.13.1) |
||
Created attachment 48668 [details]
full suspend log including sysrq-p and sysrq-t
Interesting. SysRq-s,u,b gives another backtrace. Should I open a separate bug for that? SysRq : Emergency Sync Emergency Sync complete SysRq : Emergency Remount R/O SysRq : Resetting Badness in pci_get_subsys at drivers/pci/search.c:234 [<c01fa241>] pci_get_subsys+0xe1/0xf0 [<c01fa25b>] pci_get_device+0xb/0x10 [<c0115d62>] mach_reboot_fixups+0x12/0x30 [<c01122da>] machine_emergency_restart+0x3a/0xb0 [<c0246c82>] __handle_sysrq+0x72/0x100 [<c0246d27>] handle_sysrq+0x17/0x20 [<c0241ddc>] kbd_event+0x9c/0xc0 [<c026c5e3>] input_event+0xb3/0x400 [<c026fbb2>] atkbd_report_key+0x22/0xa0 [<c026fdc9>] atkbd_interrupt+0x199/0x590 [<c0249d89>] serio_interrupt+0x19/0x3d [<c024a9f1>] i8042_interrupt+0xf1/0x200 [<c013bc23>] handle_IRQ_event+0x33/0x60 [<c013bcd5>] __do_IRQ+0x85/0xd0 [<c01055e8>] do_IRQ+0x38/0x60 [<c0103dea>] common_interrupt+0x1a/0x20 [<e0830ac3>] acpi_processor_idle+0x10e/0x26f [processor] [<c01010ae>] cpu_idle+0xe/0x50 [<c03a8730>] start_kernel+0x160/0x1b0 No reboot fixup found for your hardware The machine replayed dozens of transactions on all mounted filesystems on the next boot although I had used SysRq-s. Strange. Booting up to kdm, inserting a cdrom, waiting a few seconds, ejecting it again and suspending afterwards also hangs during suspend. So we can rule out mounting by hand and strange kde effects. However, I cannot reproduce with runlevel 3. This is getting so strange. Boot to runlevel 3, insert cd, suspend -> works Boot to runlevel 3, insert cd, init 5, suspend -> BREAKS Boot to runlevel 3, insert cd, init 5, init 3, suspend -> works Boot to runlevel 3, insert cd, init 5, init 3, init 5, suspend -> works Boot to runlevel 3, insert cd, init 5, init 3, init 5, eject cd, insert cd, suspend -> works Boot to runlevel 3, insert cd, init 5, init 3, init 5, eject cd, insert cd, log into kde, suspend -> works this is really scary... Yes, scary. Can you try latest vanilla kernel to see if it is suse or generic problem? Can you try to generate smaller testcase? init 5 does really a lot of stuff, break_suspend.c would be great. Did you already look at the backtrace? Especially the following seems fishy to me (the process is called hald-addon-storage). How can a process be in the refrigerator (at the same place as all other frozen processes) and enter __invalidate_device from there? hald-addon-st D EE899100 0 15147 14499 15103 (NOTLB) Call Trace: [<c016fe10>] __invalidate_device+0x30/0x40 [<c0135575>] refrigerator+0x55/0x70 [<c0102b95>] do_signal+0x85/0x110 [<c01604db>] do_open+0x22b/0x2f0 [<c0160649>] blkdev_open+0x39/0x50 [<c0158743>] dentry_open+0x153/0x1d0 [<c01585de>] filp_open+0x3e/0x50 [<c0102c49>] do_notify_resume+0x29/0x40 [<c0102e0e>] work_notifysig+0x13/0x25 Besides that, I already spent 15 hours chasing this bug :-( I'm really glad I lowered it down from "random freeze at suspend" to "freezes if cdrom is/was inserted and is/was in runlevel 5 the first time after boot". Oh, and another hint: Suspend-to-disk spins down all disks before writing to swap. It then spins up again the disks to write the data to swap. My swap partition is on hdb and it seems (from hearing) that only hda and not hdb is spun up again, so the data can't be written to swap. is hal-addon-storage maybe already in state D before suspend? But how can the freezer appear in the middle of the hald-addon-storage backtrace? Should it not refuse to freeze if anything is in state D? Hm. hald-addon-storage is in state D for a few seconds after ejecting the data cd. After that, it returns to state S. And now the machine seems to suspend just fine. AARGH! I will now get a backtrace of hald-addon-storage in state D to compare it with the backtrace of hanging suspend. On top of all existing problems, this is a hard-to-trigger race condition. hald-addon-storage hangs in state D for 0.1 seconds every two seconds. hald-addon-storage hangs in state D for 10-20 seconds after closing the cd drive. hald-addon-storage hangs in state D for 5 seconds after opening the cd drive. So if the process freeze happens at exactly the right(wrong) time, you will hit the bug. However, the backtrace of hald-addon-storage in state D during "normal" use has nothing to do with the backtrace found here. refrigerator() should just fail, if some process is in D state. Not sure what is going on here. Pavel, please take a look. Is there some easy way to reproduce "D state on hald-addon-storage" without actually running hald? Do you have those backtraces you mentioned in comment #10? No, there is no way to run hald-addon-storage without HAL. Is there way to at least strace it to show what it does/create small application to "emulate" hald-addon-storage? I have to dig out these backtraces. If you want a small application to emulate the relevant effects of hald-addon-storage, you probably should just poll the cdrom in an endless loop. Danny, can you extract the polling code? Pavel: Here are the backtraces from "normal" hald-addon-storage in state D.
hald-addon-st D 2C9BC600 0 5379 5095 5639 5177 (NOTLB)
d9393d6c 00000086 00000000 2c9bc600 003d092b 2c9bc600 003d092b d81d46d8
d81d45b0 2c9bc600 003d092b 2c9bc600 003d092b 007a1200 00000000 d9392000
d9393d9c e0876ce8 d9393d90 c02f6ed6 00000001 d81d45b0 c0119580 d9393da0
Call Trace:
[<c02f6ed6>] wait_for_completion+0x76/0xb0
[<c0119580>] default_wake_function+0x0/0x10
[<e08607e1>] ide_do_drive_cmd+0xa1/0xf0 [ide_core]
[<e151b8c4>] cdrom_queue_packet_command+0x34/0xc0 [ide_cd]
[<c0119580>] default_wake_function+0x0/0x10
[<e08607fd>] ide_do_drive_cmd+0xbd/0xf0 [ide_core]
[<e151c48f>] cdrom_check_status+0x5f/0x70 [ide_cd]
[<c02595d0>] blk_end_sync_rq+0x0/0x20
[<e151d4ae>] ide_cdrom_check_media_change_real+0x1e/0x40 [ide_cd]
[<e151086a>] media_changed+0x4a/0x90 [cdrom]
[<c01601e3>] check_disk_change+0x13/0x60
[<e150fe33>] cdrom_open+0x63/0xe0 [cdrom]
[<e151e229>] idecd_open+0x49/0xa0 [ide_cd]
[<c0160325>] do_open+0x75/0x2f0
[<c016062c>] blkdev_open+0x1c/0x50
[<c01586ae>] dentry_open+0xbe/0x1d0
[<c01585de>] filp_open+0x3e/0x50
[<c01588e7>] sys_open+0x37/0xc0
[<c0102d1b>] sysenter_past_esp+0x54/0x79
hald-addon-st D 30A95F00 0 5639 5095 5379 (NOTLB)
df323d6c 00000082 00000000 30a95f00 003d092b 30a95f00 003d092b df2dfb88
df2dfa60 30a95f00 003d092b 30a95f00 003d092b 03567e00 00000000 df322000
df323d9c e0876af0 df323d90 c02f6ed6 00000001 df2dfa60 c0119580 df323da0
Call Trace:
[<c02f6ed6>] wait_for_completion+0x76/0xb0
[<c0119580>] default_wake_function+0x0/0x10
[<e08607e1>] ide_do_drive_cmd+0xa1/0xf0 [ide_core]
[<e151b8c4>] cdrom_queue_packet_command+0x34/0xc0 [ide_cd]
[<c02f6ed6>] wait_for_completion+0x76/0xb0
[<c0119580>] default_wake_function+0x0/0x10
[<e08607fd>] ide_do_drive_cmd+0xbd/0xf0 [ide_core]
[<e151c48f>] cdrom_check_status+0x5f/0x70 [ide_cd]
[<c02595d0>] blk_end_sync_rq+0x0/0x20
[<e151d4ae>] ide_cdrom_check_media_change_real+0x1e/0x40 [ide_cd]
[<e151086a>] media_changed+0x4a/0x90 [cdrom]
[<c01601e3>] check_disk_change+0x13/0x60
[<e150fe33>] cdrom_open+0x63/0xe0 [cdrom]
[<e151e229>] idecd_open+0x49/0xa0 [ide_cd]
[<c0160325>] do_open+0x75/0x2f0
[<c016062c>] blkdev_open+0x1c/0x50
[<c01586ae>] dentry_open+0xbe/0x1d0
[<c01585de>] filp_open+0x3e/0x50
[<c01588e7>] sys_open+0x37/0xc0
[<c0102d1b>] sysenter_past_esp+0x54/0x79
Argh. With RC1 it even hangs if no cdrom is/was ever inserted. I'm now in the unfortunate situation that every suspend hangs. Pavel: Is there any more information I can give you? Does it hang in the same way as with cdrom before? Any processes in D state with RC1? Does it work in init=/bin/bash? Argh! I still cannot reproduce without X running. And the reduced testcase I had does not help at all :(
Bah. Getting it to hang without X is near impossible...
To make matters worse, suspend-to-disk via "powersave -U" makes some dbus calls.
And the backtraces get even worse... How can a process in the refrigerator cause
page faults?!? (process "substrate"). And hald-addon-storage isn't involved with
the hang anymore.
Excerpt from my most recent hang:
ksoftirqd/0 S 854E6D00 0 2 1 3 (L-TLB)
dfd83fc0 00000046 00000000 854e6d00 003d08f1 c029d051 dd7d9400 dfdd3658
dfdd3530 854e6d00 003d08f1 00000001 c0412710 00f42400 00000000 dfd82000
dfd81fa0 00000000 c0120f60 c0120fc2 fffffffc c012db65 ffffffff ffffffff
Call Trace:
[<c029d051>] qdisc_restart+0x11/0x1a0
[<c0120f60>] ksoftirqd+0x0/0xb0
[<c0120fc2>] ksoftirqd+0x62/0xb0
[<c012db65>] kthread+0x85/0x90
[<c012dae0>] kthread+0x0/0x90
[<c01012f1>] kernel_thread_helper+0x5/0x14
events/0 R running 0 3 1 4 2 (L-TLB)
khelper S 34FF5900 0 4 1 5 3 (L-TLB)
dfddbf70 00000046 00000000 34ff5900 003d08e6 0000007b 00000001 dffe1b88
dffe1a60 34ff5900 003d08e6 d504be68 dfddbf6c 01312d00 00000000 dfff7548
00000202 d504be68 dfff7540 c012a59e dfff7548 dfff7550 dfff7558 dfdda000
Call Trace:
[<c012a59e>] worker_thread+0x1ee/0x200
[<c012a1a0>] __call_usermodehelper+0x0/0x50
[<c01195c7>] __wake_up_common+0x37/0x60
[<c0119580>] default_wake_function+0x0/0x10
[<c012a3b0>] worker_thread+0x0/0x200
[<c012db65>] kthread+0x85/0x90
[<c012dae0>] kthread+0x0/0x90
[<c01012f1>] kernel_thread_helper+0x5/0x14
kthread S 8F1C6E00 0 5 1 9 377 4 (L-TLB)
dff49f70 00000046 00000000 8f1c6e00 003d08c3 8e655300 003d08c3 dffe1678
dffe1550 8f1c6e00 003d08c3 8f1c6e00 003d08c3 00000000 00000000 dffd5e68
00000282 dc6adf64 dffd5e60 c012a59e dffd5e68 dffd5e70 dffd5e78 dff48000
Call Trace:
[<c012a59e>] worker_thread+0x1ee/0x200
[<c012db70>] keventd_create_kthread+0x0/0x50
[<c01195c7>] __wake_up_common+0x37/0x60
[<c0119580>] default_wake_function+0x0/0x10
[<c012a3b0>] worker_thread+0x0/0x200
[<c012db65>] kthread+0x85/0x90
[<c012dae0>] kthread+0x0/0x90
[<c01012f1>] kernel_thread_helper+0x5/0x14
kacpid S 15186700 0 9 5 340 (L-TLB)
dfe85f70 00000046 00000000 15186700 003d08df c022c6cd 00000001 dffc8188
dffc8060 15186700 003d08df d4382960 dfe85f6c 00000000 00000000 dffd5268
00000286 d4382960 dffd5260 c012a59e dffd5268 dffd5270 dffd5278 dfe84000
Call Trace:
[<c022c6cd>] acpi_bus_check_device+0x3e/0x61
[<c012a59e>] worker_thread+0x1ee/0x200
[<c021447f>] acpi_os_execute_deferred+0x0/0x35
[<c01195c7>] __wake_up_common+0x37/0x60
[<c0119580>] default_wake_function+0x0/0x10
[<c012a3b0>] worker_thread+0x0/0x200
[<c012db65>] kthread+0x85/0x90
[<c012dae0>] kthread+0x0/0x90
[<c01012f1>] kernel_thread_helper+0x5/0x14
kblockd/0 S CE707600 0 340 5 375 9 (L-TLB)
dfe87f70 00000046 00000000 ce707600 003d08f0 afaeed00 00000001 dff62bc8
dff62aa0 ce707600 003d08f0 df6dd65c dfe87f6c 007a1200 00000000 dff5fb88
00000247 df6dd65c dff5fb80 c012a59e dff5fb88 dff5fb90 dff5fb98 dfe86000
Call Trace:
[<c012a59e>] worker_thread+0x1ee/0x200
[<c02653e0>] cfq_kick_queue+0x0/0x60
[<c01195c7>] __wake_up_common+0x37/0x60
[<c0119580>] default_wake_function+0x0/0x10
[<c012a3b0>] worker_thread+0x0/0x200
[<c012db65>] kthread+0x85/0x90
[<c012dae0>] kthread+0x0/0x90
[<c01012f1>] kernel_thread_helper+0x5/0x14
pdflush S BCCF6E00 0 375 5 376 340 (L-TLB)
dfe89f9c 00000046 00000000 bccf6e00 003d08ba c0118be7 dffd0700 dff626b8
dff62590 bccf6e00 003d08ba dfe89fc8 00000046 00000000 00000000 dfe89fc0
dfe89fb4 dfe88000 c0142b20 c0142a47 fffffffc dfd81f98 00000000 c0142b3a
Call Trace:
[<c0118be7>] activate_task+0x57/0x70
[<c0142b20>] pdflush+0x0/0x20
[<c0142a47>] __pdflush+0x77/0x150
[<c0142b3a>] pdflush+0x1a/0x20
[<c012db65>] kthread+0x85/0x90
[<c012dae0>] kthread+0x0/0x90
[<c01012f1>] kernel_thread_helper+0x5/0x14
pdflush D 82EC1300 0 376 5 378 375 (L-TLB)
dfe8bde0 00000046 00000000 82ec1300 003d08f1 0000000e 00000000 dff621a8
dff62080 82ec1300 003d08f1 dfe8bdc4 dfe8bdc4 029f6300 00000000 dfe8be10
00000000 dfe8be18 dfe8bde8 c02fa16e c1401008 c015a2cb c02fa330 c015a2a0
Call Trace:
[<c02fa16e>] io_schedule+0xe/0x20
[<c015a2cb>] sync_buffer+0x2b/0x30
[<c02fa330>] __wait_on_bit+0x40/0x70
[<c015a2a0>] sync_buffer+0x0/0x30
[<c015a2a0>] sync_buffer+0x0/0x30
[<c02fa3cf>] out_of_line_wait_on_bit+0x6f/0x80
[<c012ded0>] wake_bit_function+0x0/0x40
[<c015a333>] __wait_on_buffer+0x23/0x30
[<e15ac723>] flush_commit_list+0x353/0x490 [reiserfs]
[<e15b0c0d>] do_journal_end+0x90d/0x9b0 [reiserfs]
[<e15afb1a>] journal_end_sync+0x5a/0x70 [reiserfs]
[<e159d420>] reiserfs_sync_fs+0x40/0x50 [reiserfs]
[<c015ee4d>] sync_supers+0x6d/0xc0
[<c01422de>] wb_kupdate+0x2e/0x130
[<c02f9883>] schedule+0x343/0x610
[<c0142b20>] pdflush+0x0/0x20
[<c0142a82>] __pdflush+0xb2/0x150
[<c0142b3a>] pdflush+0x1a/0x20
[<c01422b0>] wb_kupdate+0x0/0x130
[<c012db65>] kthread+0x85/0x90
[<c012dae0>] kthread+0x0/0x90
[<c01012f1>] kernel_thread_helper+0x5/0x14
aio/0 S BE3DA400 0 378 5 970 376 (L-TLB)
df92bf70 00000046 00000000 be3da400 003d08ba be3da400 003d08ba dff4d6d8
dff4d5b0 be3da400 003d08ba be3da400 003d08ba 00000000 00000000 df92a000
dfd81f64 dff5f480 dff5f480 c012a59e dff5f488 dff5f490 df92bfc8 df92a000
Call Trace:
[<c012a59e>] worker_thread+0x1ee/0x200
[<c01195c7>] __wake_up_common+0x37/0x60
[<c0119580>] default_wake_function+0x0/0x10
[<c012a3b0>] worker_thread+0x0/0x200
[<c012db65>] kthread+0x85/0x90
[<c012dae0>] kthread+0x0/0x90
[<c01012f1>] kernel_thread_helper+0x5/0x14
kseriod S 1768CA00 0 970 5 1235 378 (L-TLB)
dff4bfa4 00000046 00000000 1768ca00 003d08bb ffffffea c036cb88 dffe1168
dffe1040 1768ca00 003d08bb c036cb9c 00000246 007a1200 00000000 dff4a000
dff4a000 00000000 c024bd20 c024bddd 00000000 dffe1040 c012dea0 c0367c1c
Call Trace:
[<c024bd20>] serio_thread+0x0/0xf0
[<c024bddd>] serio_thread+0xbd/0xf0
[<c012dea0>] autoremove_wake_function+0x0/0x30
[<c012db65>] kthread+0x85/0x90
[<c012dae0>] kthread+0x0/0x90
[<c01012f1>] kernel_thread_helper+0x5/0x14
reiserfs/0 D 88E1F400 0 1235 5 2732 970 (L-TLB)
dfdddef8 00000046 00000000 88e1f400 003d08f1 c1380d00 c13ae5a0 dffc8ba8
dffc8a80 88e1f400 003d08f1 c11a7d00 c1329420 007a1200 00000000 ce70c4f8
00000286 dffc8a80 ce70c500 c02f93ee 00000001 dffc8a80 c0119580 ce70c500
Call Trace:
[<c02f93ee>] __down+0xbe/0xe0
[<c0119580>] default_wake_function+0x0/0x10
[<c02f9513>] __down_failed+0x7/0xc
[<e15b0d4a>] .text.lock.journal+0x8/0x10e [reiserfs]
[<e15afb60>] flush_async_commits+0x30/0x70 [reiserfs]
[<c012a519>] worker_thread+0x169/0x200
[<e15afb30>] flush_async_commits+0x0/0x70 [reiserfs]
[<c01195c7>] __wake_up_common+0x37/0x60
[<c0119580>] default_wake_function+0x0/0x10
[<c012a3b0>] worker_thread+0x0/0x200
[<c012db65>] kthread+0x85/0x90
[<c012dae0>] kthread+0x0/0x90
[<c01012f1>] kernel_thread_helper+0x5/0x14
khubd S B512FE00 0 2732 5 2835 1235 (L-TLB)
def13fa4 00000046 00000000 b512fe00 003d08df dc704940 daf94400 df2dfb88
df2dfa60 b512fe00 003d08df 00000000 00000001 016e3600 00000000 def12000
def12000 00000000 e18e3540 e18e35fa 00000000 df2dfa60 c012dea0 e18fa828
Call Trace:
[<e18e3540>] hub_thread+0x0/0xf0 [usbcore]
[<e18e35fa>] hub_thread+0xba/0xf0 [usbcore]
[<c012dea0>] autoremove_wake_function+0x0/0x30
[<c012db65>] kthread+0x85/0x90
[<c012dae0>] kthread+0x0/0x90
[<c01012f1>] kernel_thread_helper+0x5/0x14
kgameportd S 8F968000 0 2835 5 2732 (L-TLB)
dc70ffa4 00000046 0021e88e 8f968000 003d08c3 c022cc0c dc7cbcac dc6a4678
dc6a4550 90c7ad00 003d08c3 dc7cbc00 00000246 01312d00 00000000 dc70e000
dc70e000 00000000 e1921630 e19216ed 00000000 dc6a4550 c012dea0 e19245fc
Call Trace:
[<c022cc0c>] acpi_platform_notify+0x25/0x4d
[<e1921630>] gameport_thread+0x0/0xf0 [gameport]
[<e19216ed>] gameport_thread+0xbd/0xf0 [gameport]
[<c012dea0>] autoremove_wake_function+0x0/0x30
[<c012db65>] kthread+0x85/0x90
[<c012dae0>] kthread+0x0/0x90
[<c01012f1>] kernel_thread_helper+0x5/0x14
do_acpi_sleep D 9678F500 0 6128 5223 (NOTLB)
ce167dac 00000082 00000000 9678f500 003d08f1 df718200 0000000a d840cbc8
d840caa0 9678f500 003d08f1 c0264323 e0876434 bef92900 00000000 ce166000
ce167ddc e0876434 ce167dd0 c02f9bc6 00000001 d840caa0 c0119580 ce167de0
Call Trace:
[<c0264323>] cfq_next_request+0x33/0x90
[<c02f9bc6>] wait_for_completion+0x76/0xb0
[<c0119580>] default_wake_function+0x0/0x10
[<e08607e1>] ide_do_drive_cmd+0xa1/0xf0 [ide_core]
[<e085e49d>] generic_ide_resume+0x7d/0x90 [ide_core]
[<c025bf80>] blk_end_sync_rq+0x0/0x20
[<c02589e8>] resume_device+0x48/0xb0
[<c0258ade>] dpm_resume+0x8e/0xa0
[<c0258b01>] device_resume+0x11/0x20
[<c0136836>] swsusp_write+0x6/0x20
[<c0137379>] pm_suspend_disk+0x79/0x90
[<c0135395>] enter_state+0x45/0x70
[<c01354ca>] state_store+0x8a/0x98
[<c0135440>] state_store+0x0/0x98
[<c01912de>] subsys_attr_store+0x1e/0x30
[<c01914ed>] flush_write_buffer+0x1d/0x30
[<c0191530>] sysfs_write_file+0x30/0x50
[<c0191500>] sysfs_write_file+0x0/0x50
[<c015935d>] vfs_write+0x8d/0x170
[<c01594ec>] sys_write+0x3c/0x70
[<c0102d1b>] sysenter_past_esp+0x54/0x79
substrate D AC586F00 0 6131 6130 (NOTLB)
d5009ef8 00000082 00000000 ac586f00 003d08f0 00000002 00000001 c15c3168
c15c3040 ac586f00 003d08f0 00000000 c0117370 09c67100 00000000 d5008000
00000000 c0117370 d5008000 c0135575 d5009fbc 00000000 c0102b95 cdda5074
Call Trace:
[<c0117370>] do_page_fault+0x0/0x5ef
[<c0117370>] do_page_fault+0x0/0x5ef
[<c0135575>] refrigerator+0x55/0x70
[<c0102b95>] do_signal+0x85/0x110
[<c014c9e8>] __do_mmap_pgoff+0x508/0x780
[<c014ca12>] __do_mmap_pgoff+0x532/0x780
[<c02f9883>] schedule+0x343/0x610
[<c0117370>] do_page_fault+0x0/0x5ef
[<c0102c49>] do_notify_resume+0x29/0x40
[<c0102e0e>] work_notifysig+0x13/0x25
Interesting similarities about both hang backtraces: Processes reiserfs/0 and do_acpi_sleep. Partial diff follows. reiserfs/0 Call Trace: - __down+0xbe/0xe0 - default_wake_function+0x0/0x10 - __down_failed+0x7/0xc - .text.lock.journal+0x8/0x10e [reiserfs] - flush_async_commits+0x30/0x70 [reiserfs] - worker_thread+0x169/0x200 + worker_thread+0x1ee/0x200 flush_async_commits+0x0/0x70 [reiserfs] __wake_up_common+0x37/0x60 default_wake_function+0x0/0x10 worker_thread+0x0/0x200 kthread+0x85/0x90 kthread+0x0/0x90 kernel_thread_helper+0x5/0x14 do_acpi_sleep Call Trace: cfq_next_request+0x33/0x90 wait_for_completion+0x76/0xb0 default_wake_function+0x0/0x10 ide_do_drive_cmd+0xa1/0xf0 [ide_core] generic_ide_resume+0x7d/0x90 [ide_core] blk_end_sync_rq+0x0/0x20 resume_device+0x48/0xb0 dpm_resume+0x8e/0xa0 device_resume+0x11/0x20 swsusp_write+0x6/0x20 pm_suspend_disk+0x79/0x90 enter_state+0x45/0x70 state_store+0x8a/0x98 state_store+0x0/0x98 subsys_attr_store+0x1e/0x30 flush_write_buffer+0x1d/0x30 sysfs_write_file+0x30/0x50 sysfs_write_file+0x0/0x50 vfs_write+0x8d/0x170 sys_write+0x3c/0x70 sysenter_past_esp+0x54/0x79 BTW: do_acpi_sleep is a shellscript that does nothing but "echo disk > /sys/power/state" That means the backtrace of do_acpi_sleep tells us where suspend is hanging? If so, how to debug further? Hm. SysRq-S doesn't work. Neither does SysRq-U. They seem to hang like the rest of this system. Created attachment 49610 [details]
Full backtrace including SysRq-P and SysRq-T
Jens, could you have a look at this? I suspect a problem in the block layer.
I did another backtrace after SysRq-S and SysRq-U. The only task with different
backtrace was one of the pdflush tasks.
Before SysRq-S, SysRq-U:
pdflush S 6F530100 0 375 5 376 340 (L-TLB)
dfe89f9c 00000046 00000000 6f530100 003d08ba c0118be7 dffd0700 dff636b8
dff63590 6f530100 003d08ba dfe89fc8 00000046 00000000 00000000 dfe89fc0
dfe89fb4 dfe88000 c0142b20 c0142a47 fffffffc dfd81f98 00000000 c0142b3a
Call Trace:
[<c0118be7>] activate_task+0x57/0x70
[<c0142b20>] pdflush+0x0/0x20
[<c0142a47>] __pdflush+0x77/0x150
[<c0142b3a>] pdflush+0x1a/0x20
[<c012db65>] kthread+0x85/0x90
[<c012dae0>] kthread+0x0/0x90
[<c01012f1>] kernel_thread_helper+0x5/0x14
After SysRq-S, SysRq-U:
pdflush D 91641A00 0 375 5 376 340 (L-TLB)
dfe89d8c 00000046 00000000 91641a00 003d0945 dffcd17c c014aeba dff636b8
dff63590 91641a00 003d0945 00000000 00000000 00000000 00000000 dfe89dbc
dfe89dc4 c1400f98 dfe89d94 c02fa16e 00000000 c015a2cb c02fa41c c015a2a0
Call Trace:
[<c014aeba>] __handle_mm_fault+0xda/0x130
[<c02fa16e>] io_schedule+0xe/0x20
[<c015a2cb>] sync_buffer+0x2b/0x30
[<c02fa41c>] __wait_on_bit_lock+0x3c/0x70
[<c015a2a0>] sync_buffer+0x0/0x30
[<c015a2a0>] sync_buffer+0x0/0x30
[<c02fa4bf>] out_of_line_wait_on_bit_lock+0x6f/0x80
[<c012ded0>] wake_bit_function+0x0/0x40
[<c015a2f3>] __lock_buffer+0x23/0x30
[<c015ba5c>] __block_write_full_page+0x1fc/0x2e0
[<c015fb50>] blkdev_get_block+0x0/0x50
[<c015d042>] block_write_full_page+0x92/0xc0
[<c014572c>] pagevec_lookup_tag+0x1c/0x30
[<c015fb50>] blkdev_get_block+0x0/0x50
[<c0179645>] mpage_writepages+0x205/0x3b0
[<c015fc70>] blkdev_writepage+0x0/0x10
[<c014255a>] do_writepages+0x1a/0x30
[<c0177c10>] __sync_single_inode+0x50/0x220
[<c0177e03>] __writeback_single_inode+0x23/0x100
[<c0178084>] generic_sync_sb_inodes+0x1a4/0x280
[<c0142b20>] pdflush+0x0/0x20
[<c0178287>] sync_inodes_sb+0x67/0x70
[<c0178302>] __sync_inodes+0x52/0x90
[<c0178351>] sync_inodes+0x11/0x30
[<c015a681>] do_sync+0x11/0x60
[<c0142a82>] __pdflush+0xb2/0x150
[<c0142b3a>] pdflush+0x1a/0x20
[<c015a670>] do_sync+0x0/0x60
[<c012db65>] kthread+0x85/0x90
[<c012dae0>] kthread+0x0/0x90
[<c01012f1>] kernel_thread_helper+0x5/0x14
Another interesting fact: SysRq-O didn't poweroff the machine and it gave
interesting messages:
SysRq : Power Off
md: stopping all md devices.
md: md0 switched to read-only mode.
Shutdown: hdb
However, I have *two* harddisks (hda and hdb). And even stranger, SysRq-O caused
one of the harddisks to spin up again which it didn't during suspend.
I did another backtrace after SysRq-O. The only thing that changed was that
events/0 was no longer in state R, but in state D. Backtrace of this task follows:
events/0 D D9891200 0 3 1 4 2 (L-TLB)
dfd91e0c 00000046 062ccf80 d9891200 003d09cb e08763a0 e0864b7a dfdd3148
dfdd3020 17492d00 003d09cc 00000000 00000000 3f6b5a00 00000000 dfd90000
dfd91e3c e087662c dfd91e30 c02f9bc6 00000001 dfdd3020 c0119580 dfd91e40
Call Trace:
[<e0864b7a>] do_rw_taskfile+0x1fa/0x210 [ide_core]
[<c02f9bc6>] wait_for_completion+0x76/0xb0
[<c0119580>] default_wake_function+0x0/0x10
[<e08607e1>] ide_do_drive_cmd+0xa1/0xf0 [ide_core]
[<e085e40a>] generic_ide_suspend+0x7a/0x90 [ide_core]
[<c025bf80>] blk_end_sync_rq+0x0/0x20
[<e081ed30>] ide_device_shutdown+0x30/0x40 [ide_disk]
[<c0258509>] device_shutdown+0x49/0x66
[<c0127cb0>] kernel_power_off+0x20/0x30
[<c012a519>] worker_thread+0x169/0x200
[<c01375f0>] do_poweroff+0x0/0x10
[<c01195c7>] __wake_up_common+0x37/0x60
[<c0119580>] default_wake_function+0x0/0x10
[<c012a3b0>] worker_thread+0x0/0x200
[<c012db65>] kthread+0x85/0x90
[<c012dae0>] kthread+0x0/0x90
[<c01012f1>] kernel_thread_helper+0x5/0x14
While trying to reproduce, it hung during resume instead of during suspend. Diff between backtrace during suspend (still working) and resume (hang directly after): -do_acpi_sleep D CF7A5900 0 6029 5264 5950 (NOTLB) -df5fbdac 00000082 0007a120 cf7a5900 003d08fd e08763a0 e0864b7a dffc8698 - dffc8570 cfb76200 003d08fd 00000000 00000000 7b803600 00000000 df5fa000 - df5fbddc e0876434 df5fbdd0 c02f9bc6 00000001 dffc8570 c0119580 df5fbde0 +do_acpi_sleep D 93F0CB00 0 6418 5264 6340 (NOTLB) +d367ddb0 00000082 00000000 93f0cb00 000000d7 e08763a0 e0864b7a d2ae6168 + d2ae6040 93f0cb00 000000d7 00000000 00000000 09059100 0000000b d367c000 + d367dde0 e0876434 d367ddd4 c02f9bc6 00000001 d2ae6040 c0119580 d367dde4 Call Trace: [<e0864b7a>] do_rw_taskfile+0x1fa/0x210 [ide_core] [<c02f9bc6>] wait_for_completion+0x76/0xb0 [<c0119580>] default_wake_function+0x0/0x10 [<e08607e1>] ide_do_drive_cmd+0xa1/0xf0 [ide_core] [<e085e49d>] generic_ide_resume+0x7d/0x90 [ide_core] [<c025bf80>] blk_end_sync_rq+0x0/0x20 [<e0864cd0>] task_no_data_intr+0x0/0x70 [ide_core] [<c02589e8>] resume_device+0x48/0xb0 [<c0258ade>] dpm_resume+0x8e/0xa0 [<c0258b01>] device_resume+0x11/0x20 - [<c0136836>] swsusp_write+0x6/0x20 - [<c0137379>] pm_suspend_disk+0x79/0x90 + [<c01371f5>] finish+0x5/0x40 + [<c0137355>] pm_suspend_disk+0x55/0x90 [<c0135395>] enter_state+0x45/0x70 [<c01354ca>] state_store+0x8a/0x98 [<c0135440>] state_store+0x0/0x98 [<c01912de>] subsys_attr_store+0x1e/0x30 [<c01914ed>] flush_write_buffer+0x1d/0x30 [<c0191530>] sysfs_write_file+0x30/0x50 [<c0191500>] sysfs_write_file+0x0/0x50 [<c015935d>] vfs_write+0x8d/0x170 [<c01594ec>] sys_write+0x3c/0x70 [<c0102d1b>] sysenter_past_esp+0x54/0x79 +kdesktop_lock D 62CD9000 0 6419 6292 (NOTLB) +cee5fef8 00200082 00000000 62cd9000 000000cb 00000000 deafbc4c deef2b68 + deef2a40 62cd9000 000000cb 00000000 00000000 179a7b00 00000000 cee5e000 + 00000000 00000000 cee5e000 c0135575 cee5ffbc 00000000 c0102b95 c025b6fb +Call Trace: + [<c0135575>] refrigerator+0x55/0x70 + [<c0102b95>] do_signal+0x85/0x110 + [<c025b6fb>] __freed_request+0x5b/0x60 + [<c025b718>] freed_request+0x18/0x60 + [<e085eb7e>] __ide_end_request+0x9e/0x140 [ide_core] + [<e085ec72>] ide_end_request+0x52/0x60 [ide_core] + [<c02594b2>] elv_queue_empty+0x12/0x20 + [<e085ff99>] ide_do_request+0xb9/0x3b0 [ide_core] + [<c02f9883>] schedule+0x343/0x610 + [<c013bc23>] handle_IRQ_event+0x33/0x60 + [<c013bcf3>] __do_IRQ+0xa3/0xd0 + [<c0102c49>] do_notify_resume+0x29/0x40 + [<c0102e0e>] work_notifysig+0x13/0x25 More notes on what I do to get the hang during suspend: Log into KDE, click on Logout->Suspend. Calling powersave -U from console doesn't trigger it reliably. Upping severity because I can reproduce this 100% reliably, and it worked just fine in 9.3 I suspect this is actually in the cdrom code, but lets try with elevator=deadline please Unfortunately, elevator=deadline didn't help. Will retry with elevator=noop. A cdrom is no longer needed in RC1 to trigger the bug reliably. elevator=noop didn't help either. But after >150 journal replays in the last few days I really appreciate having reiserfs. Complete backtrace follows as attachment. Created attachment 49704 [details]
SysRq-P, SysRq-T during hanging suspend-to-disk
The above backtrace was generated with elevator=noop.
Some info about my system (dmesg excerpt):
VP_IDE: IDE controller at PCI slot 0000:00:04.1
ACPI-0212: *** Warning: Device is not power manageable
VP_IDE: chipset revision 16
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt82c686a (rev 22) IDE UDMA66 controller on pci0000:00:04.1
ide0: BM-DMA at 0xd800-0xd807, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xd808-0xd80f, BIOS settings: hdc:DMA, hdd:DMA
hda: ST340823A, ATA DISK drive
hdb: ST3200822A, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: max request size: 128KiB
hda: 78165360 sectors (40020 MB) w/1024KiB Cache, CHS=65535/16/63, UDMA(66)
hda: cache flushes not supported
hda: hda1 hda2 < hda5 hda6 hda7 hda8 hda9 hda10 >
hdb: max request size: 1024KiB
hdb: 390721968 sectors (200049 MB) w/8192KiB Cache, CHS=24321/255/63, UDMA(66)
hdb: cache flushes supported
hdb: hdb1 hdb2 < hdb5 hdb6 hdb7 hdb8 >
hdc: LITE-ON CD-RW SOHR-5238S, ATAPI CD/DVD-ROM drive
hdd: LITEON DVD-ROM LTD122, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hdc: ATAPI 52X CD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
hdd: ATAPI 40X DVD-ROM drive, 512kB Cache, UDMA(33)
I don't know whether the ACPI warning applies to my IDE controller or just
happens to be printed at that time.
All filesystems (except swap and /boot) are on LVM on /dev/hdb.
swap and /boot are on normal dos-type partitions on /dev/hdb.
Please note that at the end of the backtrace, SysRq-S and SysRq-U seem to do
nothing and SysRq-O hangs after the "Shutdown: hdb" message.
It even hangs after deleting cdrom.ko and ide-cd.ko from /lib/modules and from the initrd, (and I verified with lsmod that they're indeed not loaded). So it's not cdrom related. What else can I try? Were the backtraces helpful? Vojtech, it looks like we're stuck in ide land after resume, could you please take a quick look? Small correction: The hang occurs almost always during suspend, not during resume. I'm having this problem as well. When my panics happen, the screen corrupts so I'm not able to see the backtraces and I have no way to connect to the machine via serial (it's a laptop). I'm going to try with vanilla sources tomorrow, and I will also watch this bug for resolution. Hardware is HP ZE2000z (ZE2200 CTO) with ATI Radeon XPRESS 200M mainboard and AMD Turion 64 processor running x86_64. I've had one successful suspend in all of my attempts, but that didn't resume. Suspend to ram works fine and I use it often. Carl: can you confirm that you have no processes in "D" state before you attempt suspend? Does vanilla kernel work for you? Would it be possible to try it without second harddrive, and preferably with LVM out of the picture? People don't usually use LVM/hdb on notebooks, so that gets little testing. Oh and decreasing severity. This is normal problem. "9.3 worked" is not too helpfull, unless you want to binary search where it broke. Yes, there are no processes in D state before suspend. And this is not a notebook, but a normal desktop. Trying without a second harddrive is unfortunately not possible because the first harddrive has the boot loader and the BIOS can't boot from the second hard drive because it's bigger than 128 GB. Killing LVM would mean I have to burn 200 GB to CD-ROMs (I have no DVD burner, nor would that help because some files are >10 GB) before repartioning. So what is left to check? Vanilla 2.6.13? Yes, checking vanilla 2.6.13 would be nice. Plus if you see processes in "D" state during normal operation... that should not be. File a separate bug about that, and mark this one as "depends on that". I tried vanilla sources and I got the same display corruption, but I could not see any signs of a kernel panic (that is not to say a panic didn't happen). I think I would like to try an older 2.6 kernel as I've been experiencing another problem which I had not experienced on other distributions that also continued on the vanilla sources. Oh, my last attempt, I did check for D state processes and none existed. For the first time ever, I was able to suspend to disk and then resume. The kernel that I used was vanilla 2.6.12 compiled from make oldconfig. I'll have to do some additional testing later. With vanilla 2.6.13.1 and identical .config, it did survive one suspend/resume
cycle. The second cycle hung during suspend, again in IDE land. Short excerpt
from backtrace, full backtrace will be attached:
do_acpi_sleep D E08512A0 0 5786 4885 5720 (NOTLB)
c9df7dac 00000082 0008b824 e08512a0 e083fb6a d24296d8 d24295b0 9d82ea00
003d0931 00000000 00000000 7d2b7500 00000000 c9df6000 c9df7ddc e0851330
c9df7dd0 c02ec846 00000001 d24295b0 c01183d0 c9df7de0 c9df7de0 c9df7e04
Call Trace:
[<e083fb6a>] do_rw_taskfile+0x1fa/0x210 [ide_core]
[<c02ec846>] wait_for_completion+0x76/0xb0
[<c01183d0>] default_wake_function+0x0/0x10
[<e083b7d1>] ide_do_drive_cmd+0xa1/0xf0 [ide_core]
[<e083948d>] generic_ide_resume+0x7d/0x90 [ide_core]
[<c024f9c0>] blk_end_sync_rq+0x0/0x20
[<e083fcc0>] task_no_data_intr+0x0/0x70 [ide_core]
[<c024c428>] resume_device+0x48/0xb0
[<c024c51e>] dpm_resume+0x8e/0xa0
[<c024c541>] device_resume+0x11/0x20
[<c0135096>] swsusp_write+0x6/0x20
[<c0135bd9>] pm_suspend_disk+0x79/0x90
[<c0133bf5>] enter_state+0x45/0x70
[<c0133d2a>] state_store+0x8a/0x98
[<c0133ca0>] state_store+0x0/0x98
[<c018e80e>] subsys_attr_store+0x1e/0x30
[<c018ea1d>] flush_write_buffer+0x1d/0x30
[<c018ea60>] sysfs_write_file+0x30/0x50
[<c018ea30>] sysfs_write_file+0x0/0x50
[<c0157339>] vfs_write+0x99/0x160
[<c01574ac>] sys_write+0x3c/0x70
[<c0102d1b>] sysenter_past_esp+0x54/0x79
and it hangs there forever.
Created attachment 50228 [details]
backtrace of hang during suspend-to-disk (vanilla 2.6.13.1)
this last backtrace was with vanilla 2.6.13.1. Vojtech, can you please have a look? OK, the vanilla kernel is much better than the SUSE kernel.
* SUSE kernel from RC1 always hangs during suspend
* vanilla kernel always survives the first suspend cycle but will hang on the
second cycle
Excerpt of backtrace of hang during resume:
do_acpi_sleep D E08512A0 0 5792 4958 (NOTLB)
ddb95db0 00000086 0008b824 e08512a0 e083fb6a d112d1c8 d112d0a0 0f9b0500
003d0919 00000000 00000000 ad9d9200 00000018 ddb94000 ddb95de0 e0851330
ddb95dd4 c02ec846 00000001 d112d0a0 c01183d0 ddb95de4 ddb95de4 ddb95e08
Call Trace:
[<e083fb6a>] do_rw_taskfile+0x1fa/0x210 [ide_core]
[<c02ec846>] wait_for_completion+0x76/0xb0
[<c01183d0>] default_wake_function+0x0/0x10
[<e083b7d1>] ide_do_drive_cmd+0xa1/0xf0 [ide_core]
[<e083948d>] generic_ide_resume+0x7d/0x90 [ide_core]
[<c024f9c0>] blk_end_sync_rq+0x0/0x20
[<e083fcc0>] task_no_data_intr+0x0/0x70 [ide_core]
[<c024c428>] resume_device+0x48/0xb0
[<c024c51e>] dpm_resume+0x8e/0xa0
[<c024c541>] device_resume+0x11/0x20
[<c0135a55>] finish+0x5/0x40
[<c0135bb5>] pm_suspend_disk+0x55/0x90
[<c0133bf5>] enter_state+0x45/0x70
[<c0133d2a>] state_store+0x8a/0x98
[<c0133ca0>] state_store+0x0/0x98
[<c018e80e>] subsys_attr_store+0x1e/0x30
[<c018ea1d>] flush_write_buffer+0x1d/0x30
[<c018ea60>] sysfs_write_file+0x30/0x50
[<c018ea30>] sysfs_write_file+0x0/0x50
[<c0157339>] vfs_write+0x99/0x160
[<c01574ac>] sys_write+0x3c/0x70
[<c0102d1b>] sysenter_past_esp+0x54/0x79
Common part of backtrace: do_acpi_sleep D E08512A0 0 5792 4958 (NOTLB) Call Trace: [<e083fb6a>] do_rw_taskfile+0x1fa/0x210 [ide_core] [<c02ec846>] wait_for_completion+0x76/0xb0 [<c01183d0>] default_wake_function+0x0/0x10 [<e083b7d1>] ide_do_drive_cmd+0xa1/0xf0 [ide_core] [<e083948d>] generic_ide_resume+0x7d/0x90 [ide_core] [<c024f9c0>] blk_end_sync_rq+0x0/0x20 [<e083fcc0>] task_no_data_intr+0x0/0x70 [ide_core] [<c024c428>] resume_device+0x48/0xb0 [<c024c51e>] dpm_resume+0x8e/0xa0 [<c024c541>] device_resume+0x11/0x20 During each hang, hda has already spun up while hdb still is not spinning. On my testing with 2.6.13.1, I was unable to get any successful suspends. However, with 2.6.12, I was able to suspend and resume 5 times in a row without displayed errors, but it wasn't without problems. The problem I ran into is being investigated in another bug. It seems my noapictimer kernel flag is ignored on resume so the system clock goes back to 2x speed or faster. Can you find which patch between 2.6.12 and 2.6.13 breaks it? I am going through the various versions from 2.6.12 to 2.6.13.1 with clean compiles. It's taking some time, but I am honestly not liking the way suspend to disk is working in 2.6.12.* with the previously mentioned problem. However, I hope to have at least up to 2.6.12.4 tested by today, and the rest tomorrow. It's possible the 2x speed problem will be resolved with bug 113323. Uff, no. Going through 2.6.12.1, 2.6.12.2, 2.6.12.3 is not going to be helpfull. These are stable releases, nothing significant changes there. Go through 2.6.12, 2.6.13-rc1, 2.6.13-rc2... I've found that this starts between 2.6.13-rc1 and 2.6.13-rc3. I've not been able to get 2.6.13-rc2 to compile yet, I'll work on that more. I was able to do multiple successful suspends with 2.6.13-rc1. I'll try to have information on rc2 by tomorrow. And yeah, I wasn't thinking when I orginally posted my testing method. I started breaking it down into groups after a couple of tests. Also, I noticed 2.6.13.2 was released so I tested this, and it is not fixed. Removed a pci audio device module and rc2 compiled. Problem exists between 2.6. 13-rc2 and 2.6.13-rc3. 2.6.13-rc2 to -rc3 has some differences in SMP swsusp handling, but they should not matter to you (right?). So it has to be something in surrounding code. Carl, can you confirm that -rc2 works for you? Beau, can you do binary search with git? Diff between -rc2 and -rc3 is still 2MB :-(. [Alternatively, just try few daily snapshots, if it is easier than getting git up/running.] I'm not sure about using git, so I'll check out the daily snapshots this weekend. I apologize I can't work much faster on this, doing this when I have time. It has been way more than a weekend, sorry. |
Suspend-to-disk hangs _during suspend_ if a cdrom has been mounted before suspend. If the cdrom is already unmounted again, the same problem happens. Perfectly reproducible, clean install of default beta4. Machine is a Pentium3 desktop with 1GHz, no other special features. It suspends and resumes just fine if I don't use the CD drive. normal suspend-to-disk kernel messages: uhci_hcd 0000:00:04.3: remove, state 1 usb usb2: USB disconnect, address 1 uhci_hcd 0000:00:04.3: USB bus 2 deregistered ACPI: PCI interrupt for device 0000:00:04.3 disabled uhci_hcd 0000:00:04.2: remove, state 1 usb usb1: USB disconnect, address 1 uhci_hcd 0000:00:04.2: USB bus 1 deregistered ACPI: PCI interrupt for device 0000:00:04.2 disabled end_request: I/O error, dev fd0, sector 0 Stopping tasks: ============================================================| Freeing memory... done (104114 pages freed) swsusp: Need to copy 16869 pages swsusp: critical section/: done (16936 pages copied) swsusp: Restoring Highmem PCI: Setting latency timer of device 0000:00:01.0 to 64 ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKD] -> GSI 5 (level, low) -> IRQ 5 eth0: link up, 100Mbps, full-duplex, lpa 0x45E1 Writing data to swap (16936 pages)... done Writing pagedir...done (67 pages) S| Shutdown: hdb Shutdown: hda messages when the box hangs: uhci_hcd 0000:00:04.3: remove, state 1 usb usb2: USB disconnect, address 1 uhci_hcd 0000:00:04.3: USB bus 2 deregistered ACPI: PCI interrupt for device 0000:00:04.3 disabled uhci_hcd 0000:00:04.2: remove, state 1 usb usb1: USB disconnect, address 1 uhci_hcd 0000:00:04.2: USB bus 1 deregistered ACPI: PCI interrupt for device 0000:00:04.2 disabled end_request: I/O error, dev fd0, sector 0 Stopping tasks: ===============================================================| Freeing memory... done (25192 pages freed) swsusp: Need to copy 11903 pages swsusp: critical section/: done (11950 pages copied) swsusp: Restoring Highmem PCI: Setting latency timer of device 0000:00:01.0 to 64 ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKD] -> GSI 5 (level, low) -> IRQ 5 eth0: link up, 100Mbps, full-duplex, lpa 0x45E1 <== it hangs here forever (at least 2 hours) I got backtraces with sysrq-t on serial console. Probably interesting things: hald-addon-storage and do_acpi_sleep because of disk access and ktorrent because of being stopped and in interrupt. The following is an excerpt from sysrq-t (only processes not sitting in refrigerator+0x55/0x70). ksoftirqd/0 S 08C56E00 0 2 1 3 (L-TLB) dfd83fc0 00000046 00000000 08c56e00 003d104e c029a361 d61e9000 dfdd2658 dfdd2530 08c56e00 003d104e 00000001 c040c710 007a1200 00000000 dfd82000 dfd81fa0 00000000 c0120f60 c0120fc2 fffffffc c012db65 ffffffff ffffffff Call Trace: [<c029a361>] qdisc_restart+0x11/0x1a0 [<c0120fc2>] ksoftirqd+0x62/0xb0 [<c012db65>] kthread+0x85/0x90 [<c012dae0>] kthread+0x0/0x90 [<c01012f1>] kernel_thread_helper+0x5/0x14 events/0 R running 0 3 1 4 2 (L-TLB) khelper S 9E3E7B00 0 4 1 5 3 (L-TLB) dff4df70 00000046 00000000 9e3e7b00 003d1046 0000007b 00000001 dffe1b88 dffe1a60 9e3e7b00 003d1046 cab27e9c dff4df6c 029f6300 00000000 dfff7548 00000202 cab27e9c dfff7540 c012a59e dfff7548 dfff7550 dfff7558 dff4c000 Call Trace: [<c012a59e>] worker_thread+0x1ee/0x200 [<c012a1a0>] __call_usermodehelper+0x0/0x50 [<c01195c7>] __wake_up_common+0x37/0x60 [<c0119580>] default_wake_function+0x0/0x10 [<c012a3b0>] worker_thread+0x0/0x200 [<c012db65>] kthread+0x85/0x90 [<c012dae0>] kthread+0x0/0x90 [<c01012f1>] kernel_thread_helper+0x5/0x14 kthread S AE408100 0 5 1 7 107 4 (L-TLB) df84bf70 00000046 00000000 ae408100 003d08d1 ad4c5d00 003d08d1 dffe1678 dffe1550 ae408100 003d08d1 ae408100 003d08d1 00000000 00000000 dffd5e68 00000282 db9a1f64 dffd5e60 c012a59e dffd5e68 dffd5e70 dffd5e78 df84a000 Call Trace: [<c012a59e>] worker_thread+0x1ee/0x200 [<c012db70>] keventd_create_kthread+0x0/0x50 [<c01195c7>] __wake_up_common+0x37/0x60 [<c0119580>] default_wake_function+0x0/0x10 [<c012a3b0>] worker_thread+0x0/0x200 [<c012db65>] kthread+0x85/0x90 [<c012dae0>] kthread+0x0/0x90 [<c01012f1>] kernel_thread_helper+0x5/0x14 kacpid S 5BE0F600 0 7 5 74 (L-TLB) dff49f70 00000046 00000000 5be0f600 003d1039 c022a4c1 00000001 dffe1168 dffe1040 5be0f600 003d1039 dba51e20 dff49f6c 00000000 00000000 dffd5268 00000282 dba51e20 dffd5260 c012a59e dffd5268 dffd5270 dffd5278 dff48000 Call Trace: [<c022a4c1>] acpi_bus_check_device+0x3e/0x61 [<c012a59e>] worker_thread+0x1ee/0x200 [<c021443f>] acpi_os_execute_deferred+0x0/0x16 [<c01195c7>] __wake_up_common+0x37/0x60 [<c0119580>] default_wake_function+0x0/0x10 [<c012a3b0>] worker_thread+0x0/0x200 [<c012db65>] kthread+0x85/0x90 [<c012dae0>] kthread+0x0/0x90 [<c01012f1>] kernel_thread_helper+0x5/0x14 kblockd/0 S A5870C00 0 74 5 105 7 (L-TLB) dfea3f70 00000046 00000000 a5870c00 003d104c d564172c 00000001 dff76698 dff76570 a5870c00 003d104c df6d765c dfea3f6c 02625a00 00000000 dff73b88 00000247 df6d765c dff73b80 c012a59e dff73b88 dff73b90 dff73b98 dfea2000 Call Trace: [<c012a59e>] worker_thread+0x1ee/0x200 [<c0262a30>] cfq_kick_queue+0x0/0x60 [<c01195c7>] __wake_up_common+0x37/0x60 [<c0119580>] default_wake_function+0x0/0x10 [<c012a3b0>] worker_thread+0x0/0x200 [<c012db65>] kthread+0x85/0x90 [<c012dae0>] kthread+0x0/0x90 [<c01012f1>] kernel_thread_helper+0x5/0x14 pdflush D 936E5D00 0 105 5 106 74 (L-TLB) dfee3ef4 00000046 00000000 936e5d00 003d1058 e66e5f00 003d1049 dff76188 dff76060 936e5d00 003d1058 e66e5f00 dfee3efc 00b71b00 00000000 001ecfce c0365428 000010e5 dfee3f24 c02f7517 c040d058 dfed8ab4 001ecfce 4b87ad6e Call Trace: [<c02f7517>] schedule_timeout+0x47/0xb0 [<c01246a0>] process_timeout+0x0/0x10 [<c02f749e>] io_schedule_timeout+0xe/0x20 [<c025965e>] blk_congestion_wait+0x6e/0x90 [<c012dea0>] autoremove_wake_function+0x0/0x30 [<c0142392>] wb_kupdate+0xe2/0x130 [<c0142b20>] pdflush+0x0/0x20 [<c0142a82>] __pdflush+0xb2/0x150 [<c0142b3a>] pdflush+0x1a/0x20 [<c01422b0>] wb_kupdate+0x0/0x130 [<c012db65>] kthread+0x85/0x90 [<c012dae0>] kthread+0x0/0x90 [<c01012f1>] kernel_thread_helper+0x5/0x14 pdflush S 5EF83600 0 106 5 108 105 (L-TLB) df923f9c 00000046 00000000 5ef83600 003d1016 5ef83600 003d1016 dfeb3bc8 dfeb3aa0 5ef83600 003d1016 5ef83600 003d1016 01ab3f00 00000000 df923fc0 df923fb4 df922000 c0142b20 c0142a47 fffffffc dfd81f98 00000000 c0142b3a Call Trace: [<c0142b20>] pdflush+0x0/0x20 [<c0142a47>] __pdflush+0x77/0x150 [<c0142b3a>] pdflush+0x1a/0x20 [<c012db65>] kthread+0x85/0x90 [<c012dae0>] kthread+0x0/0x90 [<c01012f1>] kernel_thread_helper+0x5/0x14 aio/0 S 6737CF00 0 108 5 698 106 (L-TLB) df927f70 00000046 00000000 6737cf00 003d08ba 6737cf00 003d08ba dfeb31a8 dfeb3080 6737cf00 003d08ba 6737cf00 003d08ba 00000000 00000000 df926000 dfd81f64 dff73480 dff73480 c012a59e dff73488 dff73490 df927fc8 df926000 Call Trace: [<c012a59e>] worker_thread+0x1ee/0x200 [<c01195c7>] __wake_up_common+0x37/0x60 [<c0119580>] default_wake_function+0x0/0x10 [<c012a3b0>] worker_thread+0x0/0x200 [<c012db65>] kthread+0x85/0x90 [<c012dae0>] kthread+0x0/0x90 [<c01012f1>] kernel_thread_helper+0x5/0x14 kseriod S C1571900 0 698 5 977 108 (L-TLB) dfe63fa4 00000046 00000000 c1571900 003d08ba ffffffea c03679a8 dff76ba8 dff76a80 c1571900 003d08ba c03679bc 00000246 016e3600 00000000 dfe62000 dfe62000 00000000 c0249370 c024942d 00000000 dff76a80 c012dea0 c0362bdc Call Trace: [<c0249370>] serio_thread+0x0/0xf0 [<c024942d>] serio_thread+0xbd/0xf0 [<c012dea0>] autoremove_wake_function+0x0/0x30 [<c012db65>] kthread+0x85/0x90 [<c012dae0>] kthread+0x0/0x90 [<c01012f1>] kernel_thread_helper+0x5/0x14 reiserfs/0 S F128F400 0 977 5 12269 698 (L-TLB) deeb7f70 00000046 00000000 f128f400 003d1049 00000000 00000001 dee536d8 dee535b0 f128f400 003d1049 d828de00 deeb7f6c 00000000 00000000 dee611e8 00000246 d828de00 dee611e0 c012a59e dee611e8 dee611f0 dee611f8 deeb6000 Call Trace: [<c012a59e>] worker_thread+0x1ee/0x200 [<e15afb30>] flush_async_commits+0x0/0x70 [reiserfs] [<c01195c7>] __wake_up_common+0x37/0x60 [<c0119580>] default_wake_function+0x0/0x10 [<c012a3b0>] worker_thread+0x0/0x200 [<c012db65>] kthread+0x85/0x90 [<c012dae0>] kthread+0x0/0x90 [<c01012f1>] kernel_thread_helper+0x5/0x14 khubd S 5C6ACA00 0 12269 5 12443 977 (L-TLB) dc98dfa4 00000046 00000000 5c6aca00 003d103a 5c6aca00 003d103a dbf44148 dbf44020 5c6aca00 003d103a 5c6aca00 003d103a 00000000 00000000 dc98c000 dc98c000 00000000 e194c540 e194c5fa 00000000 dbf44020 c012dea0 e1963828 Call Trace: [<e194c540>] hub_thread+0x0/0xf0 [usbcore] [<e194c5fa>] hub_thread+0xba/0xf0 [usbcore] [<c012dea0>] autoremove_wake_function+0x0/0x30 [<c012db65>] kthread+0x85/0x90 [<c012dae0>] kthread+0x0/0x90 [<c01012f1>] kernel_thread_helper+0x5/0x14 kgameportd S AE7D8A00 0 12443 5 12269 (L-TLB) d6093fa4 00000046 0028b0aa ae7d8a00 003d08d1 c022aa00 d63126ac d6328148 d6328020 afebc000 003d08d1 d6312600 00000246 016e3600 00000000 d6092000 d6092000 00000000 e197f630 e197f6ed 00000000 d6328020 c012dea0 e19825fc Call Trace: [<c022aa00>] acpi_platform_notify+0x25/0x4d [<e197f630>] gameport_thread+0x0/0xf0 [gameport] [<e197f6ed>] gameport_thread+0xbd/0xf0 [gameport] [<c012dea0>] autoremove_wake_function+0x0/0x30 [<c012db65>] kthread+0x85/0x90 [<c012dae0>] kthread+0x0/0x90 [<c01012f1>] kernel_thread_helper+0x5/0x14 hald-addon-st D EE899100 0 15147 14499 15103 (NOTLB) d1021ef8 00000086 00000000 ee899100 003d1049 df77d964 c016fe10 dbed0148 dbed0020 ee899100 003d1049 00000000 b7f46ff4 1a39de00 00000000 d1020000 00000000 b7f46ff4 d1020000 c0135575 d1021fbc 00000000 c0102b95 c01604db Call Trace: [<c016fe10>] __invalidate_device+0x30/0x40 [<c0135575>] refrigerator+0x55/0x70 [<c0102b95>] do_signal+0x85/0x110 [<c01604db>] do_open+0x22b/0x2f0 [<c0160649>] blkdev_open+0x39/0x50 [<c0158743>] dentry_open+0x153/0x1d0 [<c01585de>] filp_open+0x3e/0x50 [<c0102c49>] do_notify_resume+0x29/0x40 [<c0102e0e>] work_notifysig+0x13/0x25 ktorrent D ECDE5200 0 17552 1 17791 15948 (NOTLB) cfea7ef8 00200086 00061a80 ecde5200 003d1049 c0103dea cfea6000 d14c9bc8 d14c9aa0 ed1b5b00 003d1049 cfea6000 d14c9aa0 577fe700 00000000 cfea6000 00000000 4129cce8 cfea6000 c0135575 cfea7fbc 00000000 c0102b95 00000000 Call Trace: [<c0103dea>] common_interrupt+0x1a/0x20 [<c0135575>] refrigerator+0x55/0x70 [<c0102b95>] do_signal+0x85/0x110 [<c016a170>] __pollwait+0x0/0xa0 [<c016a910>] sys_select+0x290/0x360 [<c0102c49>] do_notify_resume+0x29/0x40 [<c0102e0e>] work_notifysig+0x13/0x25 do_acpi_sleep D 19B2ED00 0 18064 14534 (NOTLB) c7469dac 00000082 00000000 19b2ed00 003d104e 00000006 00000008 d11d2148 d11d2020 19b2ed00 003d104e c0261973 e087662c 863acb00 00000000 c7468000 c7469ddc e0876434 c7469dd0 c02f6ed6 00000001 d11d2020 c0119580 c7469de0 Call Trace: [<c0261973>] cfq_next_request+0x33/0x90 [<c02f6ed6>] wait_for_completion+0x76/0xb0 [<c0119580>] default_wake_function+0x0/0x10 [<e08607e1>] ide_do_drive_cmd+0xa1/0xf0 [ide_core] [<e085e49d>] generic_ide_resume+0x7d/0x90 [ide_core] [<c02595d0>] blk_end_sync_rq+0x0/0x20 [<c0256038>] resume_device+0x48/0xb0 [<c025612e>] dpm_resume+0x8e/0xa0 [<c0256151>] device_resume+0x11/0x20 [<c0136836>] swsusp_write+0x6/0x20 [<c0137379>] pm_suspend_disk+0x79/0x90 [<c0135395>] enter_state+0x45/0x70 [<c01354ca>] state_store+0x8a/0x98 [<c0135440>] state_store+0x0/0x98 [<c01912de>] subsys_attr_store+0x1e/0x30 [<c01914ed>] flush_write_buffer+0x1d/0x30 [<c0191530>] sysfs_write_file+0x30/0x50 [<c0191500>] sysfs_write_file+0x0/0x50 [<c015935d>] vfs_write+0x8d/0x170 [<c01594ec>] sys_write+0x3c/0x70 [<c0102d1b>] sysenter_past_esp+0x54/0x79 Full backtrace will follow as attachment. May be related to Bug 99589.