Bugzilla – Bug 150189
ICH2 / ST3120023A hangs post beta3 during IDE initialization
Last modified: 2006-07-17 18:05:01 UTC
I updated a beta3 Xeon 32-bit box from stable to 2.6.15-rc2-git5-3. It has one IDE disk, hdc, in a removable IDE drive bay (the main file systems are on SCSI). Now the box doesn't boot anymore. When booting with an earlier kernel the problem disappears, so it must be some kernel change:

    Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
    ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
    SCSI subsystem initialized
    md: raid1 personality registered for level 1
    ICH2: IDE controller at PCI slot 0000:00:1f.1
    ICH2: chipset revision 4
    ICH2: not 100% native mode: will probe irqs later
        ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio
        ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:pio
    hda: TOSHIBA DVD-ROM SD-M1612, ATAPI CD/DVD-ROM drive
    ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
    hdc: ST3120023A, ATA DISK drive
    ide1 at 0x170-0x177,0x376 on irq 15
    hdc: max request size: 128KiB
    hdc: 234441648 sectors (120034 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
    hdc: cache flushes supported
    hdc:<4>hdc: dma_timer_expiry: dma status == 0x21
    hdc: DMA timeout error
    hdc: dma timeout error: status=0x50 { DriveReady SeekComplete }
    ide: failed opcode was: unknown
    hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
    hdc: task_in_intr: error=0x04 { DriveStatusError }
    ide: failed opcode was: unknown
    hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
    hdc: task_in_intr: error=0x04 { DriveStatusError }
    ide: failed opcode was: unknown
    hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
    hdc: task_in_intr: error=0x04 { DriveStatusError }
    ide: failed opcode was: unknown
    hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
    hdc: task_in_intr: error=0x04 { DriveStatusError }
    ide: failed opcode was: unknown
    ide1: reset: success
    hdc: lost interrupt
    hdc: lost interrupt
    ... continues with more lost interrupts ....
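A minimal sketch for reading the register values printed in the log above, assuming the standard ATA status/error register bit layout and the SFF-8038i bus-master IDE status register. The helper `decode` and the bit tables are hypothetical; the label strings were picked to match what the kernel printed here, and the error table covers only the one error bit seen in this log.

```python
# Hypothetical helper: decode register bytes from the hdc boot log.
# Assumes the standard ATA status/error register layout and the SFF-8038i
# bus-master IDE status register; labels are chosen to match the log text.

ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Only the error bit that appears in this log (ATA calls it ABRT, command aborted).
ATA_ERROR_BITS = [
    (0x04, "DriveStatusError"),
]

# Bus-master IDE status register, one per channel (ide1 here).
BMDMA_STATUS_BITS = [
    (0x01, "Active"),            # DMA transfer still in progress
    (0x02, "Error"),             # DMA transfer error
    (0x04, "Interrupt"),         # device raised its interrupt line
    (0x20, "Drive0DMACapable"),  # master (hdc on ide1)
    (0x40, "Drive1DMACapable"),  # slave (hdd on ide1)
    (0x80, "SimplexOnly"),
]


def decode(value, table):
    """Return the names of the bits set in value, in table order."""
    return [name for mask, name in table if value & mask]


if __name__ == "__main__":
    print("dma status 0x21:", decode(0x21, BMDMA_STATUS_BITS))
    print("status 0x50:", decode(0x50, ATA_STATUS_BITS))
    print("status 0x51:", decode(0x51, ATA_STATUS_BITS))
    print("error 0x04:", decode(0x04, ATA_ERROR_BITS))
```

Under these assumptions, `dma status == 0x21` decodes to Active plus Drive0DMACapable with the Interrupt bit clear: the controller had not flagged completion when the DMA timer expired. That fits an interrupt-delivery problem, but on its own it does not distinguish a lost interrupt from a drive that never finished the transfer.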
With barrier=off it gets past that point and eventually produces the same timeouts, but at least the system continues booting, very slowly. I tried to work around it with hdc=noprobe etc., but that didn't work either (will file a new bug for that). There is also another issue: without barrier=off the timeouts continue for a long time (at one point I let it run for more than 5 minutes). That's broken; it should eventually time out for real and stop trying to access the disk. I will open a separate bug for that.
Must be the ACPI update...
If barrier=off makes the machine boot (even if slowly), how can it be ACPI? The only thing ACPI can break at this point is suggesting a wrong interrupt, or possibly offering wrong I/O or memory resources, but I doubt that. Then the machine should not boot with barrier=off either, right, Jens? If it really is ACPI, pci=noacpi should make it work; in that case please provide acpidump output.
I agree with Thomas. This has nothing to do with ACPI.
Andi, you said you still get interrupt timeouts with barrier=off, so clearly something isn't working correctly. With barriers we issue cache flushes that then also time out, which slows things down even more. Perhaps something could be done for error recovery there, but the fundamental problem here seems to be that we have IRQ delivery problems. I'd suggest treating this bug as such.
The IDE interrupts work just fine - otherwise the CD access wouldn't work.
    ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
    hdc: ST3120023A, ATA DISK drive
    ide1 at 0x170-0x177,0x376 on irq 15

The interrupts look sane? Once the IRQs have been proposed to the driver, ACPI is out of the game... Isn't ICH2 quite old? Maybe the laptop team has something similar and could verify whether it's an ICH2-related problem, Seife? Not sure whether it's worth it...
This is starting to become a sleeper... How shall we proceed here? Andi, if you still get the interrupt errors with pci=noacpi or noapic/apic, I am quite sure it's not ACPI.
Now it's too late anyway...
Well, if the problem still exists, we surely need to fix it sooner or later, don't we?