Bug 143285 - Resume after Suspend-To-RAM "hangs" the harddrive on IBM T41P
Status: VERIFIED FIXED
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Mobile Devices
Version: RC 4
Hardware: i586 Other
Priority: P5 - None
Severity: Normal
Target Milestone: ---
Assignee: Holger Macht
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-15 10:01 UTC by Magnus Boman
Modified: 2007-06-05 10:01 UTC
CC List: 2 users

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
suspend2ram.log (4.03 KB, text/x-log)
2006-01-15 10:02 UTC, Magnus Boman
suspend2ram-state.resume (110 bytes, application/octet-stream)
2006-01-15 10:02 UTC, Magnus Boman
Output from dmesg (18.25 KB, text/plain)
2006-01-16 08:48 UTC, Magnus Boman
acpi.dump as requested (230.39 KB, application/octet-stream)
2006-01-16 11:52 UTC, Magnus Boman
Hack to check if HPA is the reason for the problem (1.91 KB, patch)
2006-01-16 13:10 UTC, Jens Axboe
SL10 version of the patch (1.85 KB, patch)
2006-01-16 13:23 UTC, Jens Axboe
Better patch for SL10 (2.48 KB, patch)
2006-01-17 08:20 UTC, Jens Axboe

Description Magnus Boman 2006-01-15 10:01:20 UTC
I just enabled suspend to RAM, and suspending itself seems to work. But when I try to resume, the hard drive seems to hang.
I can switch between the consoles, and on console 10 I see the following message:

Jan 15 20:34:45 linux kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
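
The status byte in this message can be decoded bit by bit. The following is a minimal sketch, assuming the standard ATA status register layout; the bit names mirror the ones the driver prints in the log line above, and `STATUS_BITS` and `decode_status` are illustrative names, not part of any kernel or library API.

```python
# ATA status register bits, highest bit first.
# Names follow the wording used in the kernel log message.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if status & mask]


# 0x51 = 0x40 | 0x10 | 0x01
print(decode_status(0x51))
```

So 0x51 means the drive is ready and has finished seeking, but the last command ended with the error bit set; the drive is responding, it is just rejecting the request.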

Will attach the log files (which seem to indicate that everything is OK).

Versions:
powersave-0.10.15.2-0.1
hal-0.5.4-6.5
dbus-1-0.35.2-8.1
2.6.13-15.7-default

Using the following kernel load line when booting the laptop:
kernel /boot/vmlinuz root=/dev/hda2 vga=0x317 selinux=0 acpi_sleep=s3_bios,s3_mode splash=silent showopts
Comment 1 Magnus Boman 2006-01-15 10:02:26 UTC
Created attachment 63356
suspend2ram.log
Comment 2 Magnus Boman 2006-01-15 10:02:41 UTC
Created attachment 63357
suspend2ram-state.resume
Comment 4 Forgotten User ZhJd0F0L3x 2006-01-15 13:01:57 UTC
this is very probably another case of "HDD needs ACPI resume methods to resume after STR". These are not implemented in the kernel yet.
The newer HPs are behaving exactly like that.

Magnus: do you have some special settings, e.g. a harddrive password?
Comment 5 Magnus Boman 2006-01-15 21:43:55 UTC
The way I've been trying this is to simply boot up my machine, login to GNOME, wait for everything to finish loading, then I close the lid. I then wait a while before opening the lid again.

I do have a harddrive password. I will try to disable it and see if it starts working.
Comment 6 Forgotten User ZhJd0F0L3x 2006-01-15 22:04:55 UTC
The harddrive password and suspend to ram don't work together well yet. I mean - you were not prompted for the password, but the drive was switched off during suspend to ram, so it needs the password to allow access again. There is OS support needed that just is not there yet.
Comment 7 Magnus Boman 2006-01-15 22:09:42 UTC
I removed the harddrive password but it still gives me the same issue.
Are there any specific IBM modules that should be loaded?

This is from lsmod:

Module                  Size  Used by
cpufreq_ondemand        6044  1 
cpufreq_userspace       4444  0 
deflate                 3840  0 
zlib_deflate           23960  1 deflate
twofish                48384  0 
cpufreq_powersave       1792  0 
speedstep_centrino      7508  1 
serpent                28160  0 
freq_table              4612  1 speedstep_centrino
aes_i586               38912  0 
blowfish                9728  0 
sha256                 11008  0 
sha1                    2560  0 
vmnet                  35236  9 
crypto_null             2304  0 
vmmon                 110060  0 
af_key                 33552  0 
ibm_acpi               25600  0 
button                  7056  0 
pcmcia                 37176  0 
firmware_class          9856  1 pcmcia
af_packet              21384  2 
ipv6                  242752  8 
battery                10244  0 
ac                      5252  0 
edd                     9824  0 
snd_pcm_oss            59168  0 
snd_mixer_oss          18944  1 snd_pcm_oss
ath_pci                70300  0 
ath_rate_sample        16136  1 ath_pci
snd_seq                51984  0 
wlan                  138652  3 ath_pci,ath_rate_sample
yenta_socket           23820  2 
rsrc_nonstatic         12800  1 yenta_socket
pcmcia_core            39952  3 pcmcia,yenta_socket,rsrc_nonstatic
ath_hal               148432  3 ath_pci,ath_rate_sample
e1000                 100660  0 
snd_seq_device          8588  1 snd_seq
generic                 4484  0 [permanent]
snd_intel8x0           33504  1 
snd_ac97_codec         91004  1 snd_intel8x0
snd_ac97_bus            2432  1 snd_ac97_codec
snd_pcm                93064  3 snd_pcm_oss,snd_intel8x0,snd_ac97_codec
snd_timer              24452  2 snd_seq,snd_pcm
snd                    60420  10 snd_pcm_oss,snd_mixer_oss,snd_seq,snd_seq_device,snd_intel8x0,snd_ac97_codec,snd_pcm,snd_timer
soundcore               9184  1 snd
i2c_i801                8844  0 
i2c_core               20368  1 i2c_i801
snd_page_alloc         10632  2 snd_intel8x0,snd_pcm
uhci_hcd               32016  0 
ehci_hcd               32136  0 
hw_random               5268  0 
intel_agp              22044  1 
agpgart                33096  1 intel_agp
usbcore               112512  3 uhci_hcd,ehci_hcd
shpchp                 88676  0 
pci_hotplug            26164  1 shpchp
parport_pc             38980  1 
lp                     11460  0 
parport                33864  2 parport_pc,lp
dm_mod                 54972  0 
reiserfs              250480  2 
ide_cd                 39684  0 
cdrom                  36896  1 ide_cd
fan                     4996  0 
thermal                14472  0 
processor              24380  2 speedstep_centrino,thermal
piix                    9988  0 [permanent]
ide_disk               17152  4 
ide_core              122380  4 generic,ide_cd,piix,ide_disk
Comment 8 Forgotten User ZhJd0F0L3x 2006-01-15 22:17:24 UTC
no, it probably needs those unimplemented methods even without the password. HPs do so for example, maybe your Thinkpad does the same.
Comment 9 Magnus Boman 2006-01-16 08:06:30 UTC
Any chance of us implementing a patch for this?
Comment 10 Jens Axboe 2006-01-16 08:32:34 UTC
I'm afraid this is a WONTFIX for 10.0. We might have a chance to support suspend on these notebooks on 10.1 if things flesh out.

Does the T41 use a PATA drive on a SATA bridge? If you don't know, check dmesg around sda detection looking for "applying bridge limits".
Comment 11 Magnus Boman 2006-01-16 08:48:32 UTC
Created attachment 63372
Output from dmesg
Comment 12 Magnus Boman 2006-01-16 08:49:28 UTC
Couldn't find anything about applying bridge limits in the output from dmesg. Attached the output above. Also, I'm on a T41P, just in case it makes a difference. Would be great to have this in 10.1.
Comment 13 Jens Axboe 2006-01-16 09:23:46 UTC
Ah, it's plain PATA then (the newer T notebooks have PATA drives on SATA controllers, I thought the T41 did as well).

As far as I know, there is nothing in progress to support PATA drives that need ACPI help to resume... It should work if you disable any password and host protected area on the drive.
Comment 14 Magnus Boman 2006-01-16 10:18:15 UTC
Well, I disabled the password on the harddrive but that didn't help. Other than that, there are no security features enabled. The harddrive actually works for a couple of seconds before it "hangs". If this is supposed to work on the T41P, do you have any suggestions on what to do next?
Comment 15 Jens Axboe 2006-01-16 10:24:46 UTC
I have no further suggestions, I can't say what goes wrong. The IDE suspend/resume code is 100% generic, it does what is needed to bring the devices up and down. If some drives require extra commands to wake them up properly, then it won't work. The extra commands are _usually_ due to security features as noted further up, but could also be because of things like the host protected area (check your BIOS for that as well).

A patch like this one:

http://lwn.net/Articles/162958/

may make it work on your notebook. I will give it a spin on my T43 and see if it finds any taskfiles to execute in the ACPI tables.
Comment 16 Jens Axboe 2006-01-16 11:18:06 UTC
Please do as root:

# acpidump > acpi.dump

and attach that to this bug, then I can see what ACPI thinks should be done to your drive.
Comment 17 Magnus Boman 2006-01-16 11:52:51 UTC
Created attachment 63421
acpi.dump as requested
Comment 18 Jens Axboe 2006-01-16 12:09:51 UTC
Looking at your ACPI dump, it doesn't contain anything weird for suspend/resume. I'd be inclined to think the IDE code is fine and that it could be other parts of your system not liking to STR very much. Does STD work?

I'd also like you to try removing all of usb and sound modules and retesting. Boot in run level 3 and rmmod ehci_hcd, uhci_hcd and all the snd_* modules and do a STR cycle.
Comment 19 Magnus Boman 2006-01-16 12:50:29 UTC
Ok, I've got a few findings...

1. STD works without any problems
2. On my laptop, I have 3 partitions;
  hda1 = swap
  hda2 = /
  hda3 = /home
3. When doing STM as root (either in runlevel 3 or runlevel 5) it works fine
4. Creating a new user and having the home directory on hda2 will make STM work for that user
5. Creating a new user and having the home directory on hda3 will result in the original issue as I've reported here
6. An fsck.reiserfs on /dev/hda3 reports no issues after a clean reboot
7. An fsck.reiserfs on /dev/hda3 reports bad blocks after an STM has been performed
8. After an STM as root or as a user with the home dir on /, any access to /home results in the error message "Jan 16 xx:xx:xx linux kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }" on console 10.
Comment 20 Jens Axboe 2006-01-16 12:54:29 UTC
You still have host protected area enabled, does the hda3 partition extend to the end of the drive? You probably can't access that after the resume, unless we issue a new set_max command.
Comment 21 Magnus Boman 2006-01-16 12:58:16 UTC
Yes, hda3 does extend to the end of the drive. I do not know what "host protected area" is. How can I turn it off?
Comment 22 Jens Axboe 2006-01-16 13:10:02 UTC
Created attachment 63427
Hack to check if HPA is the reason for the problem

This could work around the issue: it reissues a set max command to restore the native capacity on resume.
Comment 23 Magnus Boman 2006-01-16 13:14:54 UTC
Thanks. I'll apply this tomorrow and let you know. It's after midnight here and I need to keep customers happy tomorrow so... Thanks for your help so far...
I'm on 2.6.13-15.7 but will try to tweak your patch for my kernel.
Comment 24 Jens Axboe 2006-01-16 13:21:53 UTC
I'm building you an image, should be ready in about 20 minutes or so. Test it whenever you can (it doesn't have to be today, if you are beat by all means go home and get some rest :-)
Comment 25 Jens Axboe 2006-01-16 13:23:14 UTC
Build failed due to some abuild reason. So if you can test this yourself, great. I'll just wait for feedback tomorrow. I'm attaching a patch that applies to the 10.0 kernel source (the one I generated was for HEAD).
Comment 26 Jens Axboe 2006-01-16 13:23:44 UTC
Created attachment 63430
SL10 version of the patch
Comment 27 Forgotten User ZhJd0F0L3x 2006-01-16 13:34:16 UTC
HPA is something I never thought of. Will have to check it on the HPs, too :-)

Magnus, if you ever need to debug stuff like this again, http://www.opensuse.org/ACPI_Suspend_debugging contains a "hardcore ACPI debugging HOWTO" that may help.
Comment 28 Magnus Boman 2006-01-17 00:51:36 UTC
I added the changes to ide-disk.c and ide-io.c as suggested. But when I tried to resume from STM, I got a kernel panic. The information regarding the panic quickly scrolled off the screen so I can't give any more information about that. I reverted back and enabled DEBUG_PM. Not sure if this helps you, but it produced the following:

Jan 17 11:47:38 linux kernel: hda: Wakeup request inited, waiting for !BSY...
Jan 17 11:47:38 linux kernel: hda: start_power_step(step: 1000)
Jan 17 11:47:38 linux kernel: hda: complete_power_step(step: 1000, stat: 50, err: 0)
Jan 17 11:47:38 linux kernel: hda: start_power_step(step: 1001)
Jan 17 11:47:38 linux kernel: hda: completing PM request, resume
Jan 17 11:47:38 linux kernel: hdc: Wakeup request inited, waiting for !BSY...
Jan 17 11:47:38 linux kernel: hdc: start_power_step(step: 1000)
Jan 17 11:47:38 linux kernel: hdc: completing PM request, resume
Jan 17 11:47:38 linux kernel: Restarting tasks... done
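
For longer DEBUG_PM traces it can help to group the power-management steps per drive. The following is a minimal sketch, assuming the log lines have exactly the format shown above; `STEP_RE` and `power_steps` are illustrative names, and the meaning of the step numbers is not interpreted here.

```python
import re

LOG = """\
Jan 17 11:47:38 linux kernel: hda: Wakeup request inited, waiting for !BSY...
Jan 17 11:47:38 linux kernel: hda: start_power_step(step: 1000)
Jan 17 11:47:38 linux kernel: hda: complete_power_step(step: 1000, stat: 50, err: 0)
Jan 17 11:47:38 linux kernel: hda: start_power_step(step: 1001)
Jan 17 11:47:38 linux kernel: hda: completing PM request, resume
Jan 17 11:47:38 linux kernel: hdc: Wakeup request inited, waiting for !BSY...
Jan 17 11:47:38 linux kernel: hdc: start_power_step(step: 1000)
Jan 17 11:47:38 linux kernel: hdc: completing PM request, resume
Jan 17 11:47:38 linux kernel: Restarting tasks... done
"""

# Matches e.g. "kernel: hda: start_power_step(step: 1000"
STEP_RE = re.compile(r"kernel: (hd[a-z]): (start|complete)_power_step\(step: (\d+)")


def power_steps(log):
    """Map each drive name to its ordered list of (phase, step) tuples."""
    steps = {}
    for line in log.splitlines():
        match = STEP_RE.search(line)
        if match:
            drive, phase, step = match.groups()
            steps.setdefault(drive, []).append((phase, int(step)))
    return steps


for drive, seq in power_steps(LOG).items():
    print(drive, seq)
```

On this trace, hda goes through steps 1000 and 1001 while hdc only logs step 1000, and both drives report completing the PM request.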

Comment 29 Jens Axboe 2006-01-17 07:37:59 UTC
Magnus, the last log with debug enabled looks like it succeeded, correct?
Comment 30 Forgotten User ZhJd0F0L3x 2006-01-17 08:00:50 UTC
just a guess: of course it succeeds first, but as soon as the fs wants to access something inside the HPA, it blows up, since the drive is no longer 40GB but only 38GB (numbers made up)
Comment 31 Jens Axboe 2006-01-17 08:08:55 UTC
Oh missed the reverted back bit, dang. The patch is supposed to reset the size to the native size, I might have messed something up that makes it crash of course. Magnus, any chance you can get a little of that oops info out? Just the EIP (name + offset) would help a lot.
Comment 32 Jens Axboe 2006-01-17 08:20:39 UTC
Created attachment 63523
Better patch for SL10

This fixes a bug in the previous patch, and also adds the support properly (the way it should be included). Magnus, can you test with this one?
Comment 33 Magnus Boman 2006-01-17 09:41:47 UTC
I'll try the latest patch in an hour or so... Just have to stop drinking beers at the pub :(
I'll let you know how the patch goes...
Also, for comment#29, the resume did not succeed
Comment 34 Magnus Boman 2006-01-17 11:36:44 UTC
First run resulted in a kernel panic before the screen was even turned on (CAPS lock is flashing and nothing else is happening)
Comment 35 Jens Axboe 2006-01-17 11:38:13 UTC
The patch didn't touch anything that early, must be something else going wrong. How are you building that kernel?
Comment 36 Magnus Boman 2006-01-17 11:43:10 UTC
I'm only building the modules in the IDE directory, then I'm doing a modules_install and finally a mkinitrd.
I'll verify that I got everything correct from your patch... Is this patch supposed to be an add-on to your previous one, or a replacement?

Are you on the Novell GroupWise Messenger? I'm there as mboman
Comment 37 Magnus Boman 2006-01-17 11:58:31 UTC
In /usr/src/linux/drivers/ide I execute;
make -C /usr/src/linux M=$(pwd)
make -C /usr/src/linux M=$(pwd) modules_install

Then I go to /lib/modules/2.6.13-15.7-default/extra and copy ide-*.* to;
/lib/modules/2.6.13-15.7-default/kernel/drivers/ide

Then I run mkinitrd
Comment 38 Jens Axboe 2006-01-17 12:06:55 UTC
I'm assuming you did a

# zcat /proc/config.gz > .config
# make oldconfig

before that? The more correct approach would be (after the above):

# cd /usr/src/linux
# make SUBDIRS=drivers/ide modules

then copy the drivers/ide/*.ko files on top of the ones in /lib/modules/2.6.13-15.7-default/kernel/drivers/ide, then run mkinitrd. Does that make a difference?
Comment 39 Magnus Boman 2006-01-17 12:12:45 UTC
Probably wouldn't make a difference, but I'll try your suggestion and let you know... My ide-disk.c and ide-io.c are now confirmed to be the same as yours according to the patch.
Comment 40 Jens Axboe 2006-01-17 12:16:17 UTC
It probably won't, I'm just more curious about whether you did the oldconfig steps correctly?

The patch is a new replacement, but you probably noticed this already.
Comment 41 Magnus Boman 2006-01-17 12:25:51 UTC
Yeah, I normally do make cloneconfig and then make prepare-all as I have to install vmware on my machine. Anyhow, it didn't make a difference and I still get the kernel panic. Not sure what happened the first time, when the screen didn't show anything, but... I see something about "not syncing" and "fatal not syncing" before it scrolls off the screen. The rest of the error messages are complaining about X11 possibly trying to access something directly.
STD still works without any issues... And as I said before, if I keep it to one partition only, it seems to work fine. Also, the protected area seems to be disabled by the IDE driver according to dmesg.
Comment 42 Jens Axboe 2006-01-17 13:47:46 UTC
I'm afraid I can't do much about it, unless you capture those messages. What if you boot to runlevel 3 (to exclude X) and add vga=ext to the boot line, are you able to see at least the call trace?

The problem is exactly that Linux has disabled the host protected area, but when you suspend, the machine likely enables it again. So if the resume doesn't disable it again, you can't reach the last few gigabytes of the drive, which the file system doesn't really like very much.
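
The effect described here can be illustrated with a little sector arithmetic. The following is a minimal sketch with made-up numbers: the sector counts and partition boundaries are hypothetical and not taken from the reporter's drive, and `Partition` and `unreachable_partitions` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    start: int  # first sector of the partition
    size: int   # length in sectors

    @property
    def last_sector(self):
        return self.start + self.size - 1


def unreachable_partitions(partitions, visible_sectors):
    """Names of partitions with sectors at or beyond the visible capacity."""
    return [p.name for p in partitions if p.last_sector >= visible_sectors]


# Hypothetical capacities, for illustration only.
NATIVE_MAX = 78_140_160   # sectors visible with the HPA disabled
HPA_CLIPPED = 70_000_000  # sectors visible once the HPA is back in effect

# Hypothetical layout in which the last partition runs to the end of the drive.
layout = [
    Partition("hda1", 63, 2_000_000),
    Partition("hda2", 2_000_063, 30_000_000),
    Partition("hda3", 32_000_063, NATIVE_MAX - 32_000_063),
]

print(unreachable_partitions(layout, NATIVE_MAX))   # before suspend
print(unreachable_partitions(layout, HPA_CLIPPED))  # after resume, HPA re-enabled
```

With the full native capacity visible, every partition is reachable; once the capacity is clipped, only the partition that extends to the end of the drive is affected. That would be consistent with the findings above, where / on hda2 keeps working while /home on hda3 produces errors.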
Comment 43 Holger Macht 2006-02-13 10:43:00 UTC
Anything new here, Magnus?
Comment 44 Magnus Boman 2006-02-13 10:50:45 UTC
Yes, sort of... I finally got my hands on a spare T41P so that I can play with this. I installed SL10.1B3 today and tried the STR and it works... I will reinstall that machine tomorrow to make sure I have the same setup (partitioning etc).
Comment 45 Magnus Boman 2006-02-14 22:16:10 UTC
This works fine with SL10.1B3. I'm happy to close this bug report if you are :)
Comment 46 Holger Macht 2006-02-15 07:04:09 UTC
Thanks, closing...