Bugzilla – Bug 143285
Resume after Suspend-To-RAM "hangs" the harddrive on IBM T41P
Last modified: 2007-06-05 10:01:27 UTC
I just enabled suspend to RAM, and suspending itself seems to work. But on resume, the harddrive seems to hang. I can switch between the consoles, and on console 10 I see the following message:

Jan 15 20:34:45 linux kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

I will attach the log files (which seem to indicate that everything is OK).

Versions:
powersave-0.10.15.2-0.1
hal-0.5.4-6.5
dbus-1-0.35.2-8.1
2.6.13-15.7-default

I am using the following kernel line when booting the laptop:

kernel /boot/vmlinuz root=/dev/hda2 vga=0x317 selinux=0 acpi_sleep=s3_bios,s3_mode splash=silent showopts
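For readers unfamiliar with the message, the status byte in the log line above can be decoded by hand. The following is a minimal sketch, assuming the standard ATA status register bit layout and the flag names the Linux IDE driver prints; the function name decode_ata_status is hypothetical and not part of any kernel or tool.

```python
# ATA status register bits, highest bit first, labelled with the names
# the Linux IDE driver prints in its dma_intr messages.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ata_status(status):
    """Return the names of all flags set in an ATA status byte (hypothetical helper)."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


# 0x51 = 0x40 + 0x10 + 0x01
print(decode_ata_status(0x51))  # ['DriveReady', 'SeekComplete', 'Error']
```

So status=0x51 means the drive reports itself ready and idle but flags the last command as failed, which matches a drive that is awake yet rejects a request.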
Created attachment 63356 [details] suspend2ram.log
Created attachment 63357 [details] suspend2ram-state.resume
This is very probably another case of "HDD needs ACPI resume methods to resume after STR". These are not implemented in the kernel yet. The newer HPs behave exactly like that. Magnus: do you have some special settings, e.g. a harddrive password?
The way I've been trying this is to simply boot up my machine, login to GNOME, wait for everything to finish loading, then I close the lid. I then wait a while before opening the lid again. I do have a harddrive password. I will try to disable it and see if it starts working.
The harddrive password and suspend to RAM don't work well together yet. You were not prompted for the password, but the drive was switched off during suspend to RAM, so it needs the password before it allows access again. That requires OS support that just is not there yet.
I removed the harddrive password but it still gives me the same issue. Are there any specific IBM modules that should be loaded? This is from lsmod:

Module                Size  Used by
cpufreq_ondemand      6044  1
cpufreq_userspace     4444  0
deflate               3840  0
zlib_deflate         23960  1 deflate
twofish              48384  0
cpufreq_powersave     1792  0
speedstep_centrino    7508  1
serpent              28160  0
freq_table            4612  1 speedstep_centrino
aes_i586             38912  0
blowfish              9728  0
sha256               11008  0
sha1                  2560  0
vmnet                35236  9
crypto_null           2304  0
vmmon               110060  0
af_key               33552  0
ibm_acpi             25600  0
button                7056  0
pcmcia               37176  0
firmware_class        9856  1 pcmcia
af_packet            21384  2
ipv6                242752  8
battery              10244  0
ac                    5252  0
edd                   9824  0
snd_pcm_oss          59168  0
snd_mixer_oss        18944  1 snd_pcm_oss
ath_pci              70300  0
ath_rate_sample      16136  1 ath_pci
snd_seq              51984  0
wlan                138652  3 ath_pci,ath_rate_sample
yenta_socket         23820  2
rsrc_nonstatic       12800  1 yenta_socket
pcmcia_core          39952  3 pcmcia,yenta_socket,rsrc_nonstatic
ath_hal             148432  3 ath_pci,ath_rate_sample
e1000               100660  0
snd_seq_device        8588  1 snd_seq
generic               4484  0 [permanent]
snd_intel8x0         33504  1
snd_ac97_codec       91004  1 snd_intel8x0
snd_ac97_bus          2432  1 snd_ac97_codec
snd_pcm              93064  3 snd_pcm_oss,snd_intel8x0,snd_ac97_codec
snd_timer            24452  2 snd_seq,snd_pcm
snd                  60420  10 snd_pcm_oss,snd_mixer_oss,snd_seq,snd_seq_device,snd_intel8x0,snd_ac97_codec,snd_pcm,snd_timer
soundcore             9184  1 snd
i2c_i801              8844  0
i2c_core             20368  1 i2c_i801
snd_page_alloc       10632  2 snd_intel8x0,snd_pcm
uhci_hcd             32016  0
ehci_hcd             32136  0
hw_random             5268  0
intel_agp            22044  1
agpgart              33096  1 intel_agp
usbcore             112512  3 uhci_hcd,ehci_hcd
shpchp               88676  0
pci_hotplug          26164  1 shpchp
parport_pc           38980  1
lp                   11460  0
parport              33864  2 parport_pc,lp
dm_mod               54972  0
reiserfs            250480  2
ide_cd               39684  0
cdrom                36896  1 ide_cd
fan                   4996  0
thermal              14472  0
processor            24380  2 speedstep_centrino,thermal
piix                  9988  0 [permanent]
ide_disk             17152  4
ide_core            122380  4 generic,ide_cd,piix,ide_disk
No, it probably needs those unimplemented methods even without the password. HPs do so, for example; maybe your Thinkpad does the same.
Any chance of us implementing a patch for this?
I'm afraid this is a WONTFIX for 10.0. We might have a chance to support suspend on these notebooks on 10.1 if things flesh out. Does the T41 use a PATA drive on a SATA bridge? If you don't know, check dmesg around sda detection looking for "applying bridge limits".
Created attachment 63372 [details] Output from dmesg
Couldn't find anything about "applying bridge limits" in the output from dmesg. Attached the output above. Also, I'm on a T41P, just in case it makes a difference. Would be great to have this in 10.1.
Ah, it's plain PATA then (the newer T notebooks have PATA drives on SATA controllers, I thought the T41 did as well). As far as I know, there is nothing in progress to make PATA drives that require ACPI help to resume work... It should work if you disable any password and host protected area on the drive.
Well, I disabled the password on the harddrive but that didn't help. Other than that, there are no security features enabled. The harddrive actually works for a couple of seconds before it "hangs". If this is supposed to work on the T41P, do you have any suggestions on what to do next?
I have no further suggestions, I can't say what goes wrong. The IDE suspend/resume code is 100% generic, it does what is needed to bring the devices up and down. If some drives require extra commands to wake them up properly, then it won't work. The extra commands are _usually_ due to security features as noted further up, but could also be because of things like the host protected area (check your BIOS for that as well). A patch like this one: http://lwn.net/Articles/162958/ may make it work on your notebook. I will give it a spin on my T43 and see if it finds any taskfiles to execute in the ACPI tables.
Please do as root:

# acpidump > acpi.dump

and attach that to this bug, then I can see what ACPI thinks should be done to your drive.
Created attachment 63421 [details] acpi.dump as requested
Looking at your ACPI dump, it doesn't contain anything weird for suspend/resume. I'd be inclined to think the IDE code is fine and that it could be other parts of your system not liking STR very much. Does STD work? I'd also like you to try removing all of the USB and sound modules and retesting. Boot in runlevel 3, rmmod ehci_hcd, uhci_hcd and all the snd_* modules, and do a STR cycle.
Ok, I've got a few findings...

1. STD works without any problems
2. On my laptop, I have 3 partitions: hda1 = swap, hda2 = /, hda3 = /home
3. When doing STM as root (either in runlevel 3 or runlevel 5) it works fine
4. Creating a new user and having the home directory on hda2 will make STM work for that user
5. Creating a new user and having the home directory on hda3 will result in the original issue as I've reported here
6. An fsck.reiser /hda3 will not have any issues after a clean reboot
7. An fsck.reiser /hda3 will report bad blocks after an STM has been performed
8. After an STM as root or as a user with the home dir on /, any access to /home will result in the error message "Jan 16 xx:xx:xx linux kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }" on console 10.
You still have host protected area enabled, does the hda3 partition extend to the end of the drive? You probably can't access that after the resume, unless we issue a new set_max command.
Yes, hda3 does extend to the end of the drive. I do not know what "host protected area" is. How can I turn it off?
Created attachment 63427 [details] Hack to check if HPA is the reason for the problem

This could work around the issue, it reissues a set max to the native capacity on resume.
Thanks. I'll apply this tomorrow and let you know. It's after midnight here and I need to keep customers happy tomorrow so... Thanks for your help so far... I'm on 2.6.13-15.7 but will try to tweak your patch for my kernel.
I'm building you an image, should be ready in about 20 minutes or so. Test it whenever you can (it doesn't have to be today, if you are beat by all means go home and get some rest :-)
Build failed due to some abuild reason. So if you can test this yourself, great. I'll just wait for feedback tomorrow. I'm attaching a patch that applies to the 10.0 kernel source (the one I generated was for HEAD).
Created attachment 63430 [details] SL10 version of the patch
HPA is something I never thought of. Will have to check it on the HPs, too :-) Magnus, if you ever need to debug stuff like this again, http://www.opensuse.org/ACPI_Suspend_debugging contains a "hardcore ACPI debugging HOWTO" that may help.
I added the changes to ide-disk.c and ide-io.c as suggested. But when I tried to resume from STM, I got a kernel panic. The information regarding the panic quickly scrolled off the screen so I can't give any more information about that. I reverted back and enabled DEBUG_PM. Not sure if this helps you, but it produced the following:

Jan 17 11:47:38 linux kernel: hda: Wakeup request inited, waiting for !BSY...
Jan 17 11:47:38 linux kernel: hda: start_power_step(step: 1000)
Jan 17 11:47:38 linux kernel: hda: complete_power_step(step: 1000, stat: 50, err: 0)
Jan 17 11:47:38 linux kernel: hda: start_power_step(step: 1001)
Jan 17 11:47:38 linux kernel: hda: completing PM request, resume
Jan 17 11:47:38 linux kernel: hdc: Wakeup request inited, waiting for !BSY...
Jan 17 11:47:38 linux kernel: hdc: start_power_step(step: 1000)
Jan 17 11:47:38 linux kernel: hdc: completing PM request, resume
Jan 17 11:47:38 linux kernel: Restarting tasks... done
Magnus, the last log with debug enabled looks like it succeeded, correct?
Just a guess: of course it succeeds at first, but as soon as the fs wants to access something inside the HPA, it blows up, since the drive is no longer 40GB but only 38GB (numbers made up).
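The guess can be put into numbers. The following is only an illustrative sketch using the made-up 40GB and 38GB figures from the comment above, assuming 512-byte sectors; the helper names are hypothetical and this is not the code from the attached patch.

```python
SECTOR_BYTES = 512  # assumed sector size


def sectors(gigabytes):
    """Convert a capacity in decimal gigabytes to a sector count."""
    return gigabytes * 10**9 // SECTOR_BYTES


def unreachable_sectors(partition_last_lba, visible_last_lba):
    """Sectors of a partition that lie beyond the last LBA the drive exposes."""
    return max(0, partition_last_lba - visible_last_lba)


native = sectors(40)   # made-up native capacity, HPA disabled by the driver at boot
clipped = sectors(38)  # made-up capacity once the HPA is active again after resume

# A partition that extends to the end of the drive, as hda3 does here.
hda3_last_lba = native - 1

lost = unreachable_sectors(hda3_last_lba, clipped - 1)
print(lost)                        # 3906250
print(lost * SECTOR_BYTES / 1e9)   # 2.0
```

Under these assumed numbers the last 2GB of hda3 cannot be addressed after resume, while partitions that end below the clipped capacity (hda1 and hda2 in this report) stay fully reachable.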
Oh missed the reverted back bit, dang. The patch is supposed to reset the size to the native size, I might have messed something up that makes it crash of course. Magnus, any chance you can get a little of that oops info out? Just the EIP (name + offset) would help a lot.
Created attachment 63523 [details] Better patch for SL10

This fixes a bug in the previous patch, and also adds the support properly (the way it should be included). Magnus, can you test with this one?
I'll try the latest patch in an hour or so... Just have to stop drinking beers at the pub :( I'll let you know how the patch goes... Also, for comment#29, the resume did not succeed
First run resulted in a kernel panic before the screen was even turned on (CAPS lock is flashing and nothing else is happening)
The patch didn't touch anything that early, must be something else going wrong. How are you building that kernel?
I'm only building the modules in the IDE directory, then I'm doing a modules_install and finally a mkinitrd. I'll verify that I got everything correct from your patch... Is this patch supposed to be an add-on to your previous one, or a replacement? Are you on the Novell GroupWise Messenger? I'm there as mboman
In /usr/src/linux/drivers/ide I execute:

make -C /usr/src/linux M=$(pwd)
make -C /usr/src/linux M=$(pwd) modules_install

Then I go to /lib/modules/2.6.13-15.7-default/extra and copy ide-*.* to:

/lib/modules/2.6.13-15.7-default/kernel/drivers/ide

Then I run mkinitrd
I'm assuming you did a

# zcat /proc/config.gz > .config
# make oldconfig

before that? The more correct approach would be (after the above):

# cd /usr/src/linux
# make SUBDIRS=drivers/ide modules

then copy the drivers/ide/*.ko files on top of the ones in /lib/modules/2.6.13-15.7-default/kernel/drivers/ide, then run mkinitrd. Does that make a difference?
Probably wouldn't make a difference? But I'll try your suggestion and let you know.... My ide-disk.c and ide-io.c are now confirmed to be the same as yours according to the patch.
It probably won't, I'm just more curious about whether you did the oldconfig steps correctly. The patch is a replacement, but you probably noticed this already.
Yeah, I normally do make cloneconfig and then make prepare-all as I have to install vmware on my machine. Anyhow, it didn't make a difference and I still get the kernel panic. Not sure what happened the first time, when the screen didn't show anything, but... I see something about "not syncing" and "fatal not syncing" before it scrolls off the screen. The rest of the error messages are complaining that X11 might be trying to access something directly. STD still works without any issues... And as I said before, if I keep it to one partition only, it seems to work fine. Also, the protected area seems to be disabled by the ide driver according to dmesg.
I'm afraid I can't do much about it unless you capture those messages. What if you boot to runlevel 3 (to exclude X) and add vga=ext to the boot line, are you able to see at least the call trace? The problem is exactly that Linux has disabled the host protected area, but when you suspend, the machine likely enables it again. So if the resume doesn't disable it, you can't reach the last few gigabytes of the drive, which the file system doesn't really like very much.
Anything new here, Magnus?
Yes, sort of... I finally got my hands on a spare T41P so that I can play with this. I installed SL10.1B3 today and tried the STR and it works... I will reinstall that machine tomorrow to make sure I have the same setup (partitioning etc).
This works fine with SL10.1B3. I'm happy to close this bug report if you are :)
Thanks, closing...