Bugzilla – Bug 104647
Suspend to Disk broken for SATA hard drives
Last modified: 2006-01-26 12:39:11 UTC
When trying to suspend to disk from a KDE session, the system freezes before reaching the suspend state and displays the following messages:
    Writing data to swap (40876 pages)...0%
    ata1: error occurred, port reset
    ata1: status=0x01 { Error }
    ata1: called with no error (01)!
    sda: Current: sense key: Medium Error
    Additional sense: Unrecovered read error - auto reallocate failed
    end_request: I/O error, dev sda, sector 90811040
Suspending to disk from a minimal environment (boot option init=/bin/bash) works as expected, but after waking up, no HDD access is possible. Error messages:
    ata1: error occurred, port reset
    ata1: status=0x01 { Error }
    ata1: called with no error (01)!
    <these three lines occur several times.>
    SCSI error: <0 0 0 0> return code = 0x8000002
    sda: Current: sense key: Medium Error
    Additional sense: Unrecovered read error - auto reallocate failed
    end_request: I/O error, dev sda, sector 116893847
    ReiserFS: sda4: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [5435 24741 0x0 SD]
Suspend to RAM also fails on this system (suspending works, but the system does not wake up any more). I am sure that this does not depend (at least not only) on a display problem, so possibly it has the same cause?
Test system: FSC AMILO Pro 8010, Intel Centrino i915GM chipset, Pentium M 1.73 GHz, 60 GB SATA HDD
Possibly this behavior is caused by the same circumstances as the suspend to RAM problem described in Bug #105800 (same test system).
Still not working in beta3, raised severity (please adjust back if not appropriate)
This is very bad indeed. AJ: More and more new laptops have SATA, and in the last reviews I saw that tested distros, at least 40% of the test machines had SATA :( I find it strange that we have one Acer where STD works with SATA and the FSC where it doesn't.
Stefan and Jens, please work together on this.
Andreas, is it using ata_piix or ahci as the disk driver? Please post full dmesg from a booted system, thanks.
Stefan, what are the hardware details of the acer STD and FSC?
The FSC is the one Andreas mentioned. I'll add hwinfo, lspci and lsmod from the Acer in a few minutes (the machine is currently resuming)
Created attachment 48149 [details] Acer dmesg
Created attachment 48150 [details] Acer lsmod
Created attachment 48151 [details] Acer lspci -vv
If you need anything else for the Acer, please let me know. Suspend is working there, except for the 'pause' before writing to disk described in bug 113335.
Created attachment 48153 [details] dmesg FSC
Created attachment 48154 [details] lsmod FSC
Created attachment 48155 [details] lspci -vv FSC
Hope this is enough info about the FSC for the moment... If not, please tell me, I'll add everything necessary.
Interesting, so the FSC is using ahci while the Acer is using ata_piix. Is it possible to configure the FSC to not use ahci in the BIOS, for testing purposes? Check for any IDE options in there.
"AHCI configuration" can be disabled in the BIOS setup, but this does not change the behavior of the machine, it still displays the same errors when trying to suspend.
Have you verified that it actually loads ata_piix instead when you disable ahci?
Regardless of whether AHCI is disabled, it loads both modules (possibly because of the DVD drive??), but I don't know how to determine which one is actually used for the hard drive.
The dmesg from comment #12 has: ahci(0000:00:1f.2) AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0x5 impl SATA mode ahci(0000:00:1f.2) flags: 64bit ncq pm led slum part which means it actually detected and used ahci. But just grep for the scsi0: xx line, if xx is ata_piix it is using ata_piix, vice versa for ahci.
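The check described above can be sketched in a few lines of Python. This is only an illustration: it assumes the host registration line looks like "scsi0 : ahci" (the exact spacing is a guess, so the pattern is lenient), and the sample text is a hypothetical excerpt, not a copy of the attached dmesg.

```python
import re

# Assumed line shape: "scsi0 : <driver>" as printed when a SCSI host is
# registered. Spacing around the colon is a guess, so the regex is lenient.
SCSI_HOST_RE = re.compile(r"^scsi(\d+)\s*:\s*([A-Za-z0-9_]+)", re.MULTILINE)


def scsi_host_drivers(dmesg_text):
    """Return a dict mapping SCSI host number to the driver name in dmesg."""
    return {int(num): name for num, name in SCSI_HOST_RE.findall(dmesg_text)}


# Hypothetical excerpt for illustration only.
sample = """\
ahci(0000:00:1f.2) AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0x5 impl SATA mode
scsi0 : ahci
scsi1 : ahci
"""

if __name__ == "__main__":
    drivers = scsi_host_drivers(sample)
    print("scsi0 is driven by:", drivers.get(0, "unknown"))
```

On a real system the input would be the saved output of dmesg; if scsi0 maps to ata_piix, the disk is running in piix mode.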
Grepping for scsi shows that it still uses ahci. Can I do something about that?
Hmm, and you disabled everything that looks like ahci in the bios? You can try and make sure that ata_piix is loaded first (or remove ahci and ata_piix, then load ata_piix again), I forget what the order is by default.
In the BIOS, "AHCI configuration" (the one I changed) was the only option that seems to have anything to do with AHCI. I removed ahci from /etc/sysconfig/kernel now, and a reboot with the new initrd showed the following:
- booting up worked without any obvious problems
- the grep in dmesg described above shows that ata_piix is now definitely used
- suspend to disk still does not work, but it generates different error messages:
<snip>
Writing data to swap (26976 pages)... 0%<3>ata1: command 0x35 timeout, stat 0x50 host_stat 0x24
ata1: command 0x35 timeout, stat 0x50 host_stat 0x24
<repeating continuously every few seconds>
</snip>
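For readers who want to decode these messages, here is a minimal Python sketch. It assumes the message format quoted above and uses the classic ATA status register bit names; the helper names are made up for illustration. Command 0x35 is WRITE DMA EXT, which fits a timeout while the image is being written to swap. The host_stat field is deliberately left undecoded.

```python
import re

# Classic bit names of the ATA status register, highest bit first.
ATA_STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]

# Matches e.g. "ata1: command 0x35 timeout, stat 0x50 host_stat 0x24".
TIMEOUT_RE = re.compile(
    r"(ata\d+): command (0x[0-9a-fA-F]+) timeout, stat (0x[0-9a-fA-F]+)"
)


def decode_status(value):
    """Return the names of the status bits set in value."""
    return [name for mask, name in ATA_STATUS_BITS if value & mask]


def parse_timeout(line):
    """Return (port, command, status bit names), or None if no match."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    port, command, status = match.groups()
    return port, int(command, 16), decode_status(int(status, 16))


if __name__ == "__main__":
    print(parse_timeout("ata1: command 0x35 timeout, stat 0x50 host_stat 0x24"))
    print(decode_status(0x01))  # the status=0x01 { Error } case seen with ahci
```

A status of 0x50 decodes to DRDY and DSC with neither BSY nor ERR set, so the drive itself reports ready and the command simply never completed from the host's point of view.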
Ok, so it breaks with both ata_piix and ahci on that hardware. It could be that we are not reenabling the hardware properly, we really do nothing special except the generic way. Ehm looking at the kernel source branch, the libata suspend patch isn't even there anymore. Very strange, I will check up on that, but it definitely could explain your issues!
Created attachment 48189 [details] My libata suspend patch I'm committing this to CVS now for easier testing, I'll let you know when a KOTD kernel with it is available.
kernel-default-2.6.13-20050830151713.i586.rpm from KOTD has it, please install that kernel ASAP and retest!
Alexander, have you had a chance to test that kernel yet? It's important that we move a little fast on this bug, so if you could test I would much appreciate it.
I tried the KOTD now in all variations. The results are as follows:
- When I turn off AHCI in the BIOS and do not load the ahci module, suspending and resuming both work fine now.
- The situation with AHCI has significantly improved: the system suspends perfectly, but HDD access is not possible after resuming. This means that the system still cannot resume at all when running an X server (KDE), because the missing hard disk leads to an immediate crash.
Do you need any further information? (The error messages of the KDE problem passed too fast to read and I was not able to stop them with <break> or something similar, but I believe they were similar to the ones I mentioned in the initial bug description.) Sorry for the delay, I'm still quite new here at SuSE and I had to ask for advice several times... Hope this is no big deal...
No worries, feel free to ask here as well if you have questions/problems. So at least ata_piix works now. It actually could not have worked reliably on the Acer before; it was only by luck that SATA suspend/resume worked without the synchronization stuff I added to libata. Does ahci work with suspend-to-disk (not sure if you are testing suspend-to-ram or to disk)? I'm not sure we can do much about ahci resume yet, I think we need more hardware details. So it's possible that we can only support suspend on ata_piix for 10.0, which is not a problem in my opinion since all the notebooks out there should be able to run in piix mode instead of ahci.
Wait, I think I see that ahci is missing a hunk. I'll update the patch and ask you to retest a KOTD soon!
Ok, committed, I'll watch for a KOTD for you to test. I'll add suspend/resume support to the other libata drivers as well.
It's ready, please test kernel-default-2.6.13-20050831104926.i586.rpm right away!
Hmmm, I believe there is something broken in that kernel. I can't boot the system any more, regardless of whether I have AHCI activated in the BIOS or whether the AHCI module is loaded. :-( I only see a black screen just after GRUB. Is there any information about this that I could provide to you? Otherwise I would remove my installation and set up beta4 instead of recovering my old system.
How annoying, perhaps some other patch screwed it up in the meantime. Please try adding 'apic' as a boot parameter.
Thanks, apic made my system boot again :-) But the ahci behavior has not improved: suspending (to disk) works fine, but resuming still doesn't. The system hangs after printing out the following (just as with the last kernel):
<snip>
... ...
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ACPI: PCI Interrupt 0000:06:09.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:06:09.2[C] -> GSI 18 (level, low) -> IRQ 185
ata1: dev 0 configured for UDMA/100
Restarting tasks... done
</snip>
ata_piix works as expected, though. My first suspect was the framebuffer, but it is not even loaded. I guess there is still no hard disk access. Is there anything else left I can do? By the way: my suspend to RAM problem is described in bug #105800, but I didn't do anything more about it yet since seife told me that this seems to be a much larger problem.
If there is no more information I can provide at the moment, I would like to install beta4 to verify some other bugs meanwhile. Would that be ok for you?
Tested beta4 on the mentioned Acer: suspend2disk is still working. suspend2ram, tested with init=/bin/bash and acpi_sleep=s3_bios,s3_mode, is not working (as in beta3). The disk seems to be the problem in that case. (This is just for completeness.)
Andi, please see comment #33-35, I'm guessing your patches.arch/i386-apic-up broke the boot of Alexander's notebook.
Stefan, please try the KOTD I mentioned in comment #32, or a newer one, and see if that makes suspend-to-ram work any better. Alexander, when you use ata_piix, what comes after the "Restarting tasks... done" line? Do you get any disk errors with ahci, or is that the last message you ever see? Is the machine pingable at that point and responding to key presses?
When I use ata_piix, "Restarting tasks... done" is the last thing I see (for about 2 secs) before the X server gets started. After that, KDE is back and I can work on as normal. But when using ahci, the system hangs at that point and does not react to key presses. Just a moment, I'll reproduce the situation to find out whether ping works...
That (funny as it may sound) does not sound like a disk problem since at this point you already paged in lots of data from swap. Can you try ahci with a minimal boot and see if the disk works after resume?
done, suspending in minimal boot works well - but I have the same error messages as mentioned in description when trying to access the hard disk after resuming (e.g. with 'ls' command).
Installed kernel-default-2.6.13-20050831104926.i586.rpm. suspend2disk: coredump. suspend2ram: not tested after the results with s2d :( Jens, I will open an extra bug for the core dump with that kernel, I don't think we should put everything into this bug.
I've added the information for the dump mentioned in comment 43 to bug 114648 since I think that one should be easier to fix.
Stefan, any changes with the newer KOTD kernels?
2.6.13-20050901172817-default was the last one I tried on the Acer, there everything was fine (see bug 114648). I'll trigger Alexander to test it on the FSC. I have not seen any newer KotD.
Alexander, please test! I would love to close this bug.
I tried 2.6.13-20050901172817-default, and still not working, I'm afraid :-( Resume from suspend to disk from KDE hangs after printing "Restarting tasks... done" and before starting the X server. Somehow resume from an "init=/bin/bash" environment does not work any more either - instead of resuming (without hard drive access, but at least with a prompt and responding as it was before), the system now freezes with the following output:
    ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 185
    ACPI-0212: *** Warning: Device is not power manageable
    ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 193
    PCI: Setting latency timer of device 0000:00:1f.2 to 64
    ACPI: PCI Interrupt 0000:06:09.0[A] -> GSI 16 (level, low) -> IRQ 169
I believe this came in with the last changes, but I'm not completely sure. Is there another KOTD that could possibly improve my situation? I'm extremely sorry for messing up your weekend :-(
That's too bad. I'm still not convinced this is a SATA problem, but it's hard to diagnose this further without actually having the laptop at hand. Don't worry, you are not messing up my weekend, I won't even be home :-). So followups will have to wait until Monday.
CC'ing Pavel. He's the expert in suspend kernel debugging and hopefully could help here...
I'm testing on an FSC Amilo M1437G and I am seeing the same problem when doing suspend to disk: Writing data to swap (17715 pages)... 0% ata1: command 0x35 timeout, stat 0x50 host-stat 0x4 [repeat until poweroff] I'll try the KOTD when I get home. The file currently on ftp is kernel-default-2.6.13-20050904145455.i586.rpm. Is it ok to test that version?
Yes, please test that version when you get home!
Suspend completed this time. Resume too. As far as I am concerned, kernel-default-2.6.13-20050904145455.i586.rpm solves the problem.
I'm downgrading this issue to major, since I don't consider it a blocker anymore. SATA suspend should work well on most machines now, and it's likely that the remaining problems on some machines aren't SATA related.
Tried suspend to disk with kernel-default-2.6.13-20050904145455 using AHCI, still with the following result: suspend works, but during resume I get lots of the error messages already described in the initial bug description - then the system freezes. Using ata_piix (just removed the ahci module from the initrd, I don't even need to change the AHCI BIOS option), everything works perfectly, even with a running X server.
Just an update (unfortunately): I'm currently running beta4 with kernel kernel-default-2.6.13-20050906125922. Doing suspend (powersave -U) and resume works fine - but only the first time. If I do a powersave -U again on a system that was resumed, several things went wrong in several tests. In most cases, the system hung before the line mentioning that pages are being written to disk. In one case the system managed to succeed on the second suspend only to hang during resume.
Created attachment 49281 [details] dmesg output There are several messages like: ACPI-0212: *** Warning: Device is not power manageable
Joerg: Is this a x86_64 machine? I have the feeling that this problem is unrelated to SATA? Just an idea, you could change the suspend mode from platform to shutdown in /etc/sysconfig/powersave/sleep SUSPEND2DISK_SHUTDOWN_MODE="platform" -> SUSPEND2DISK_SHUTDOWN_MODE="shutdown" maybe it helps? Better open a new bug for that and assign it to pavel@suse.de.
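The suggested setting change can also be scripted. The following is only a sketch under assumptions: it operates on the text of the sleep configuration file, expects the quoted VARIABLE="value" form shown above, and the function name is hypothetical.

```python
import re

# Assumes the quoted sysconfig form: SUSPEND2DISK_SHUTDOWN_MODE="platform"
MODE_RE = re.compile(r'^(SUSPEND2DISK_SHUTDOWN_MODE=)"[^"]*"', re.MULTILINE)


def set_shutdown_mode(config_text, mode):
    """Return config_text with SUSPEND2DISK_SHUTDOWN_MODE set to mode."""
    new_text, count = MODE_RE.subn(
        lambda m: '{}"{}"'.format(m.group(1), mode), config_text
    )
    if count == 0:
        raise ValueError("SUSPEND2DISK_SHUTDOWN_MODE not found")
    return new_text


if __name__ == "__main__":
    before = 'SUSPEND2DISK_SHUTDOWN_MODE="platform"\n'
    print(set_shutdown_mode(before, "shutdown"), end="")
```

To apply it for real, one would read /etc/sysconfig/powersave/sleep, pass its contents through the function, and write the result back as root after keeping a backup copy.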
No, it isn't - it's a pentium-m (just have a look at the dmesg output that I attached in #57). Setting that value doesn't change anything and why should it. The box normally fails after initiating the suspend process and closely before actually writing out the data to disk. This time, I actually got an oops (unfortunately it scrolled out very quickly), followed by an endless stream of the following message: atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly. Note: The system is in runlevel 3, no X running. Does it really make sense to open a new bug for that? The problem and the hardware are similar to that mentioned above (see #51)
Jörg: Is your problem maybe related to bug 115095 ? A SysRq-T backtrace captured via serial console when the system hangs during suspend could help find out what's going on.
OK, turned on alt-sysrq and verified that it works. Then I started wondering about how to get a serial console (read the file in Documentation), but the laptop doesn't have a serial port, only USB. Any ideas?
Just use a small font and a digital camera. USB is not useful for a serial console.
Please open a new bug for 10.1-beta2 if the problem still exists there, it should work on ata_piix as well as ahci (10.0 missed an important ahci fix).