Bug 150189 - ICH2 / ST3120023A hangs post beta3 during IDE initialization
Status: RESOLVED WONTFIX
Alias: None
Product: openSUSE 10.2
Classification: openSUSE
Component: Kernel
Version: Alpha 1
Hardware: i386 Other
Priority / Severity: P5 - None : Major
Target Milestone: ---
Assignee: Thomas Renninger
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-11 21:44 UTC by Andreas Kleen
Modified: 2006-07-17 18:05 UTC

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description Andreas Kleen 2006-02-11 21:44:55 UTC
I updated a beta3 32-bit Xeon box to 2.6.15-rc2-git5-3 from stable. It has one
IDE disk in an IDE change box as hdc (the main file systems are on SCSI). Now the box doesn't boot anymore.

When booting with an earlier kernel the problem disappears, so it must be caused by some kernel change.

Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SCSI subsystem initialized
md: raid1 personality registered for level 1
ICH2: IDE controller at PCI slot 0000:00:1f.1
ICH2: chipset revision 4
ICH2: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:pio
hda: TOSHIBA DVD-ROM SD-M1612, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: ST3120023A, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hdc: max request size: 128KiB
hdc: 234441648 sectors (120034 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
hdc: cache flushes supported
 hdc:<4>hdc: dma_timer_expiry: dma status == 0x21
hdc: DMA timeout error
hdc: dma timeout error: status=0x50 { DriveReady SeekComplete }
ide: failed opcode was: unknown
hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: task_in_intr: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: task_in_intr: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: task_in_intr: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: task_in_intr: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
ide1: reset: success
hdc: lost interrupt
hdc: lost interrupt
... continues with more lost interrupts ....

With barrier=off it gets past that point, but eventually generates
the same timeouts. At least the system continues booting, though very slowly.

I tried to work around it with hdc=noprobe etc.,
but that didn't work either (I will file a new bug for that).

There is also another issue: in the non-barrier=off case the timeouts continue for a long time (I let it run for more than 5 minutes at one point). That's broken; it should eventually really time out and stop trying to access the disk. I will open
a separate bug for that.
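For readers who want to check the hex values in the log above against the register bits, here is a minimal Python sketch. It is not part of the original report. The status names follow the labels the driver prints in the log; the error-register names are the usual ATA abbreviations (the driver prints ABRT as DriveStatusError), and the bus-master DMA status names are descriptive labels chosen for this sketch, not driver identifiers.

```python
# Illustrative decoder for the register values seen in the log.
# Tables are ordered from the highest bit to the lowest.

ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Subset of the ATA error register; ABRT is what the log calls DriveStatusError.
ATA_ERROR_BITS = [
    (0x80, "ICRC"),
    (0x40, "UNC"),
    (0x10, "IDNF"),
    (0x04, "ABRT"),
]

# Bus-master IDE status register (names are assumptions made for this sketch).
BMDMA_STATUS_BITS = [
    (0x80, "Simplex"),
    (0x40, "Drive1DmaCapable"),
    (0x20, "Drive0DmaCapable"),
    (0x04, "Interrupt"),
    (0x02, "Error"),
    (0x01, "Active"),
]


def decode(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


if __name__ == "__main__":
    print("dma status 0x21:", decode(0x21, BMDMA_STATUS_BITS))
    print("status 0x50:", decode(0x50, ATA_STATUS_BITS))
    print("status 0x51:", decode(0x51, ATA_STATUS_BITS))
    print("error 0x04:", decode(0x04, ATA_ERROR_BITS))
```

Read this way, dma status 0x21 means the DMA engine was still marked active with no interrupt bit set when the timer expired, and error 0x04 means the drive aborted the command.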
Comment 1 Jens Axboe 2006-02-13 10:46:13 UTC
Must be the ACPI update...
Comment 2 Thomas Renninger 2006-02-22 11:40:48 UTC
If barrier=off makes the machine boot (even if slowly), how can that be ACPI?
The only thing ACPI can break at this point is suggesting a wrong interrupt, or possibly offering some wrong IO/mem resources, but I doubt that.
The machine should also not boot with barrier=off then, Jens?
If it is really ACPI then pci=noacpi should make it work. Please provide acpidump output then.
Comment 3 Andreas Kleen 2006-02-22 11:43:07 UTC
I agree with Thomas. This has nothing to do with ACPI.
Comment 4 Jens Axboe 2006-02-22 12:07:44 UTC
Andi, you said you still get interrupt timeouts with barrier=off, so clearly something isn't working correctly. With barriers we do some cache flushes that then also time out, so it slows things down some more. Perhaps there's something to be done for error recovery there, but the fundamental problem here seems to be that we have irq delivery problems.

I'd suggest to treat this bug as such.
Comment 5 Andreas Kleen 2006-02-22 12:43:08 UTC
The IDE interrupts work just fine - otherwise the CD access wouldn't work.
Comment 6 Thomas Renninger 2006-02-22 14:11:16 UTC
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: ST3120023A, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
The interrupts look sane? After proposing the irqs to the driver ACPI is out of the game...
Isn't ICH2 quite old? Maybe laptopteam has something similar and could verify whether it's an ICH2-related problem, Seife? Not sure whether it's worth it...
Comment 7 Thomas Renninger 2006-03-01 09:25:13 UTC
This is starting to become a sleeper...
How shall we proceed here?
Andi, if you still get the interrupt errors with pci=noacpi or noapic/apic I am quite sure it's not ACPI.
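One way to test the irq delivery question directly would be to compare the counters for irq 14 and irq 15 in /proc/interrupts before and after some disk access. The Python sketch below is only an illustration of that idea and was not part of the original discussion; the helper name irq_counts is hypothetical.

```python
def irq_counts(proc_interrupts_text, irqs):
    """Return {irq: count summed over all CPUs} for the requested IRQ numbers."""
    counts = {}
    for line in proc_interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        # Skip the CPU header line and non-numeric rows such as NMI or ERR.
        if not sep or not label.isdigit() or int(label) not in irqs:
            continue
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # per-CPU counters end where the controller name begins
            total += int(field)
        counts[int(label)] = total
    return counts


if __name__ == "__main__":
    # Reads the live counters; only meaningful on a Linux system.
    with open("/proc/interrupts") as f:
        print(irq_counts(f.read(), {14, 15}))
```

If the counter for irq 15 does not increase between two snapshots taken while hdc is being accessed, that would point to interrupts not arriving for ide1; if it does increase, the problem is more likely elsewhere.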
Comment 8 Thomas Renninger 2006-07-17 14:55:11 UTC
Now it's too late anyway...
Comment 9 Forgotten User ZhJd0F0L3x 2006-07-17 18:05:01 UTC
Well, if the problem still exists, we surely need to fix it sooner or later, don't we?