Bugzilla – Bug 151517
FSC Lifebook does not resume
Last modified: 2006-03-17 07:56:27 UTC
FSC Lifebooks C-series (I have an C1320 here) doesn't resume properly. It definitely needs the SATA ACPI support (cf bugzilla #142375), but even with that it doesn't resume properly. Well, actually it does resume, but the disk stalls; the _GTF commands are send to the device but the device never responds. So appearently there is some initialisation missing...
This is what I get during bootup: ata1: SATA link up 1.5 Gbps (SStatus 113) ata1: dev 0 cfg 00:045a 49:2f00 82:346b 83:7f09 84:6063 85:3469 86:3f09 87:6063 88:203f 93:0000 ata1: dev 0 ATA-7, max UDMA/100, 78140160 sectors: LBA48 ata_acpi_push_id: ap->id: 1, ix = 0, port#: 0, hard_port#: 0 sata_get_dev_handle: SATA dev addr=0x1f0002, handle=0xdf672e28 :\_SB_.PCI0.SATA: matches pcidevfn (0x1f0002) GOT ONE: (\_SB_.PCI0.SATA.PRT0) root_port = 0x0, port_num = 0xffff GOT ONE: (\_SB_.PCI0.SATA.PRT0) root_port = 0x0, port_num = 0xffff THIS ^^^^^ is the requested SATA drive (handle = 0xdf6672a4) GOT ONE: (\_SB_.PCI0.SATA.PRT2) root_port = 0x2, port_num = 0xffff ata1: dev 0 configured for UDMA/100 ata_acpi_exec_tfs: ENTER: ata_acpi_exec_tfs: call get_GTF, ix=0 do_drive_get_GTF: ENTER: ap->id: 1, port#: 0, hard_port#: 0 sata_get_dev_handle: SATA dev addr=0x1f0002, handle=0xdf672e28 do_drive_get_GTF: returning gtf_length=14, gtf_address=0xded46834, obj_loc=0xded 46824 ata_acpi_exec_tfs: call set_taskfiles, ix=0 do_drive_set_taskfiles: ENTER: ap->id: 1, port#: 0, hard_port#: 0 do_drive_set_taskfiles: total GTF bytes=14 (0xe), gtf_count=2, addr=0xded46834 taskfile_load_raw: (0x1f1-1f7): hex: 10 03 00 00 00 a0 ef call ata_exec_internal: taskfile_load_raw: (0x1f1-1f7): hex: 00 00 00 00 00 a0 f5 call ata_exec_internal: ata_acpi_exec_tfs: call get_GTF, ix=1 do_drive_get_GTF: ENTER: ap->id: 1, port#: 0, hard_port#: 0 do_drive_get_GTF: ERR: ata_dev_present: 0, PORT_DISABLED: 0 ata_acpi_exec_tfs: get_GTF error (-19) ata_acpi_exec_tfs: ret=-19 scsi0 : ahci
It's sending first a 'SET FEATURES' command, with subcommand 'Set CFA Power 1', followed by some vendor specific command '0xf5'.
Created attachment 68857 [details] sata-acpi-fsc-debug.patch Patch to call ata_set_xfermode _after_ the _GTF methods.
When using the above patch, I'm actually able to resume the notebook under ideal conditions (ie syslog & hal off, rmmod ipw2200, runlevel 3). I'm using these commands: echo platform > /sys/power/disk echo disk > /sys/power/state Jens, given that the _GTF method might be unlocking the device, shouldn't we call set_xfermode after _GTF?
Hannes, yep makes sense. We should always execute the machine specific parts first, then do what we need after that.
Created attachment 69602 [details] DSDT.dsl Disassembled DSDT from the FSC laptop.
Created attachment 69656 [details] enable ACPI via AML debug statement With this patch you can debug the GTF function without tons of ACPI debug garbage: Put a: Store("acpi_dbg_level=0xFFFFF", debug) (or whatever debug value) at the beginning of the GTF function and a Store("acpi_dbg_level=0xF", debug) at the end. ials -sa dsdt.dsl cp DSDT.aml /etc/DSDT.aml vi /etc/sysconfig/kernel -> ACPI_DSDT="/etc/DSDT.aml" mkinitrd Good luck.
enable higher ACPI debug tracing via AML debug statement describes it better ... You must have ACPI_DEBUG=y comipled in!
Hmm. Doesn't appear to be the fault of the ACPI patches. The machine _does_ resume properly, but upon resume the drive locks up. Either I get a message like: ata1: resume device ata_acpi_exec_tfs: ENTER: ata_acpi_exec_tfs: call get_GTF, ix=0 do_drive_get_GTF: ENTER: ap->id: 1, port#: 0, hard_port#: 0 sata_get_dev_handle: SATA dev addr=0x1f0002, handle=0xdf66e1a8 do_drive_get_GTF: returning gtf_length=14, gtf_address=0xdea6c1d0, obj_loc=0xdea 6c1c0 ata_acpi_exec_tfs: call set_taskfiles, ix=0 do_drive_set_taskfiles: ENTER: ap->id: 1, port#: 0, hard_port#: 0 do_drive_set_taskfiles: total GTF bytes=14 (0xe), gtf_count=2, addr=0xdea6c1d0 taskfile_load_raw: (0x1f1-1f7): hex: 10 03 00 00 00 a0 ef call ata_qc_issue: Uhhuh. NMI received. Dazed and confused, but trying to continue You probably have a hardware problem with your RAM chips ata1: port reset, p_is 24000000 is 1 pis 0 cmd 4017 tf d0 ss 113 se 0 taskfile_load_raw: (0x1f1-1f7): hex: 00 00 00 00 00 a0 f5 call ata_qc_issue: ata1: port reset, p_is 24000000 is 1 pis 0 cmd 4017 tf 1d0 ss 113 se 0 and the entire device is gone after that, or it's more like: ata1: resume device ata_acpi_exec_tfs: ENTER: ata_acpi_exec_tfs: call get_GTF, ix=0 do_drive_get_GTF: ENTER: ap->id: 1, port#: 0, hard_port#: 0 pata_get_dev_handle: enter: dev->bus_id='0000:00:1f.2' pata_get_dev_handle: dev_handle: 0xdf66e1a8, parent_handle: 0xdf67c628 pata_get_dev_handle: for dev=0x1f.2, addr=0x1f0002, parent=0xdf640800, *handle= 0xdf66e1a8 do_drive_get_GTF: returning gtf_length=14, gtf_address=0xdc2223f0, obj_loc=0xdc 2223e0 ata_acpi_exec_tfs: call set_taskfiles, ix=0 do_drive_set_taskfiles: ENTER: ap->id: 1, port#: 0, hard_port#: 0 do_drive_set_taskfiles: total GTF bytes=14 (0xe), gtf_count=2, addr=0xdc2223f0 taskfile_load_raw: (0x1f1-1f7): hex: 10 03 00 00 00 a0 ef call ata_qc_issue: taskfile_load_raw: (0x1f1-1f7): hex: 00 00 00 00 00 a0 f5 call ata_qc_issue: ata_acpi_exec_tfs: call get_GTF, ix=1 do_drive_get_GTF: ENTER: ap->id: 1, port#: 0, hard_port#: 0 do_drive_get_GTF: ERR: ata_dev_present: 0, PORT_DISABLED: 0 ata_acpi_exec_tfs: get_GTF error (-19) ata_acpi_exec_tfs: ret=-19 ata1: dev 0 configured for UDMA/100 Restarting tasks... done ReiserFS: warning: is_tree_node: node level 8240 does not match to the expected one 4 and the machine continues operation; only reiserfs is quite unhappy and switches to read-only. I'm a bit at a loss how to debug this further. Vojtech, any ideas?
As the patch should be ready by the end of the month, this is getting somewhat urgent.
Well, I'm no ACPI expert nor SATA expert, so I'm not sure why you've chosen me to help. ;) Anyway, from the above output it seems that the ACPI BIOS is asking sata_acpi to send two commands to the drive, of which the second fails badly, either confusing the BIOS (1st case) or only the drive (2nd case). I guess Jens Axboe could help you by looking at what the commands are and why they confuse the drive. Jens, any idea?
This seems to be the problem: upon resume, the status is: ata1: enter resume, p_is 0 is 0 pis 0 cmd 4017 tf 50 ss 113 se 0 then I issue a reset and the status is: ata1: exit resume, p_is 0 is 0 pis 0 cmd 4017 tf 150 ss 113 se 0
Hannes: That enter/exit resume info didn't tell me anything. What does it tell you?
It tells me that the disk's gone south. tf bit 8 means disk error.
Ok. Finally nailed him. It turns out that AHCI is not correctly initialised after all. There are four registers (PxCLB, PxCLBU, PxFB, PxFBU) which contain the addresses for the DMA memory space. And of course these addresses are not necessarily identical across reboots. IOW they might be different before and after a resume. So we _need_ to update them after a resume. What I've actually done was to stop DMA entirely upon suspend (via ahci_port_stop()) and re-enable it upon resume (via ahci_port_start()). Those two function ensure that the said registers are always up-to-date.
Encouraging news Hannes, the stop/start of the port sounds like a correct fix too.
I think it would be best to split out the read of the address and initializing from the port_stop/start routines and call these on suspend and resume. There's no reason to free and reallocate memory on suspend/resume, that might just get us into other problems (and it can't be GFP_KERNEL) currently either from that path.
Ah. Like this? static void ahci_port_suspend(struct ata_port *ap) { struct device *dev = ap->host_set->dev; struct ahci_port_priv *pp = ap->private_data; void *mmio = ap->host_set->mmio_base; void *port_mmio = ahci_port_base(mmio, ap->port_no); u32 tmp; tmp = readl(port_mmio + PORT_CMD); tmp &= ~(PORT_CMD_START | PORT_CMD_FIS_RX); writel(tmp, port_mmio + PORT_CMD); readl(port_mmio + PORT_CMD); /* flush */ /* spec says 500 msecs for each PORT_CMD_{START,FIS_RX} bit, so * this is slightly incorrect. */ msleep(500); } static void ahci_port_stop(struct ata_port *ap) { ahci_port_suspend(ap); ap->private_data = NULL; dma_free_coherent(dev, AHCI_PORT_PRIV_DMA_SZ, pp->cmd_slot, pp->cmd_slot_dma); kfree(pp); }
Yep that looks good, and ditto for ahci_port_start() of course, I guess you'll add ahci_port_resume() for that? The msleep() might be troublesome too, which is a little unfortunate since it's a quite long sleep. Doesn't it complain about non-atomic sleep with that called from the suspend path?
Not really (this is 10.0. That doesn't complain). How should we replace the msleep?
mdelay looks like a candidate.
It might need to become mdelay() - however that is a little nasty on module load, as it will busy block for half a second... It's a little ugly, but we could pass in a 'int is_resume/suspend' parameter and use mdelay() then and msleep() for the module load path.
Even better; the spec actually says what we should wait for. So we can use udelay(10) within the wait loop and everything should be ok. I'll attach the patch.
Created attachment 70613 [details] sata-acpi-suspend.patch Patch to enable suspend to AHCI w/ ACPI integration. Not sure whether the ACPI integration is actually needed, but they certainly don't seem to cause any harm. This is mostly a backport of Randy's original patch with the ahci suspend / resume stuff done by me.
So the comment is incorrect? 500ms is a huge amount of time.
Yes. The spec says: Software is allowed to manipulate PxCMD.FRE so that it may move the FIS receive area to a new location. When this bit is cleared to ‘0’, software must first wait for PxCMD.FR to clear to ‘0’, indicating that the DMA engine for FIS reception is in an idle condition. When PxCMD.FR and PxCMD.FRE are both cleared to ‘0’, software may update the values of PxFB and PxFBU. Prior to setting PxCMD.FRE to ‘1’, software shall ensure that PxFB and PxFBU are set to valid values. Software shall not write PxFB and PxFBU while PxCMD.FRE is set to ‘1’. Might be that there are some timing things somewhere else, but this looks far more reliable.
Overall that patch looks like a good thing. Can you please split it up, though? Lets keep Randys patch separate, and do the suspend/resume fixes separately from the HPA stuff. The udelay(10) splitup and check look good.
I'll rediff them with libata-dev and post them to linux-ide. But that's something I wanted to do anyway, as the ahci reset stuff is too important to _not_ have it.
Adjusting the product.
Don't the ahci pci_driver .suspend and .resume method pointers need to be added also? At least they are not yet in mainline.
Yes they do, I think that's already in our tree. Hannes, any reason it's not part of this rollup?
Yes, it is. It's been your previous patch (ahci-suspend) which included those hooks already. The mainline patch will have them.
So I think we can close this as it works as designed now.
Created attachment 73634 [details] ahci-update.patch AHCI update backported from mainline.
There is actually a small glitch in the patch from comment #25; the spec mandates that the DMA should only be turned on after port probing. Hence we might get those spurious 'ataX; dma not running' during booting. The above patch fixes this. I've also backported the mainline version as it's far clearer by design and does have more reliable error messages. It helps to track down errors (like a missing BIOS reset :-) far better than the original one.