Bugzilla – Bug 113234
SCSI driver atp870u: fatal I/O ERRORS
Last modified: 2005-09-01 09:34:26 UTC
The combination of ACARD AEC-67160 SCSI host and hard disk QUANTUM ATLAS10K2-TY184L (driver: atp870u V2.6+ac) no longer works with SUSE 10.0 beta3 (error msgs attached). Verified both on my workstation (Dell Precision WS340) and on a new HP ProLiant ML310 G2. Immediately afterwards, the same SCSI host and disk passed a bonnie test under the SuSE 9.0 update kernel 2.4.21-266, so the hardware is not at fault. I'll attach the error messages I get upon the first attempt to access the disk during SUSE 10.0b3 installation...
Created attachment 47722: /var/log/boot.msg of SUSE 10.0b3 installation
boot.msg is intended to document the normal boot process. No error messages in here yet.
Created attachment 47723: dmesg output upon the SCSI I/O ERRORS
This is the collection of error msgs issued by the atp870u driver. It coincided with the first mkfs attempt on a /dev/sda partition. Needless to say, the installation got stuck here.
I could attempt a SUSE 9.1 or 9.3 installation on this hardware in order to narrow down the release in which it stopped working. If you're interested, please say so. Ditto for more logs (hwinfo etc.).
Jens, any idea?
Klaus, if you could try 9.1 through 9.3 it would be very helpful.
Tested on the HP ProLiant ML310 G2: all is well with
- 9.1: installation kernel 2.6.4-52
- 9.3: installation kernel 2.6.11.4-20a
In each case I copied a few hundred MB from the installation medium to the hard disk in question without encountering any problems or error msgs. On the other hand, with 10.0b3, not even the "Installation -- safe settings" kernel parameters (acpi=off barrier=off ... ) prevent messages like those attached in comment #2 and the subsequent hang.
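The results reported in this bug can be summarized as a regression window. The following is only a minimal Python sketch of that reasoning, not part of the original report: the names RESULTS, version_key and regression_window are hypothetical, the 10.0b3 kernel version is the one named later in this bug (2.6.13-rc6-git13-4), and the sketch assumes a single good-to-bad transition.

```python
# Test results reported in this bug: (kernel version string, disk access works?)
RESULTS = [
    ("2.4.21-266", True),            # SuSE 9.0 update kernel
    ("2.6.4-52", True),              # 9.1 installation kernel
    ("2.6.11.4-20a", True),          # 9.3 installation kernel
    ("2.6.13-rc6-git13-4", False),   # SL 10.0b3 kernel
]


def version_key(version):
    """Return the upstream numeric part as a tuple.

    Everything after the first '-' (distribution release, rc/git tags)
    is ignored, so '2.6.13-rc6-git13-4' sorts as plain 2.6.13.
    """
    upstream = version.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def regression_window(results):
    """Return (last_good, first_bad) version strings.

    Assumes one good-to-bad transition; results after the first
    failing version are not considered.
    """
    ordered = sorted(results, key=lambda item: version_key(item[0]))
    last_good = None
    first_bad = None
    for version, works in ordered:
        if first_bad is not None:
            break
        if works:
            last_good = version
        else:
            first_bad = version
    return last_good, first_bad


print(regression_window(RESULTS))
```

Under these assumptions the window lies between the 9.3 kernel (2.6.11.4) and the 10.0b3 kernel (2.6.13-rc6), which points at changes that went into 2.6.12 or 2.6.13.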
(sorry, forgot to tag previous comment as "provides needed info")
Raising severity to "critical" after consulting QA (also taking into account the upcoming SLES SP3).
SP3 is not affected; this looks like a bug introduced in 2.6.12 or 2.6.13. I'm attaching a patch for testing.
Created attachment 48308: Remove the huge diff from 2.6.12
2.6.12 introduced support for a new card; let's test whether it broke the older card in the process. Please test a kernel with this applied!
OK, tested in the following way:
- used kernel-source-2.6.13_rc6_git13-4 from SL 10.0b3
- patched drivers/scsi/atp870u.[ch] using patch from comment #10
- retrieved /proc/config.gz from running kernel 2.6.13-rc6-git13-4-default (from SL 10.0b3)
- built the modules from drivers/scsi according to instructions in www.suse.de/~agruen/kernel-doc/; retrieved drivers/scsi/atp870u.ko
- Started a new "manual" SL 10.0b3 installation on HP ProLiant ML310 G2 using mini-ISO CD ("manual" in order to avoid premature loading of new, bad atp870u driver which would mess things up immediately)
- Loaded modules for SCSI (mptscsih) and Ethernet (tg3).
- Started installation via ftp from dist.suse.de
- As soon as the "Software Agreement Request" popped up (prior to HW detection and driver loading!): loaded patched atp870u.ko module manually.
- Then "yes" to "Software Agreement Request".
- Now YaST2 started, did HW detection, found all disks.
- Selected again the disk connected to the AEC67160 for installation.

_Result_: this time disk access works with no errors, installation is going smoothly (currently still in progress).

I'll add more info about the AEC67160 SCSI controller in a moment...
Created attachment 48328: Tarball: lspci, hwinfo about AEC67160 on ProLiant ML310G2
OK, so that is promising. The question is how to proceed with this. We don't even know if support for the new card works, so I'd be inclined to back out the 2.6.12 patch so we at least don't have regressions in this area, and report the issue to the ACARD maintainer. Andreas, what do you think?
Agreed (unsurprisingly, since I'd profit from your proposal ;-). Of course, I don't have the authority to decide anything, though. Add-on note: Here, at least, no more recent atp870u hardware seems available while, on the other hand, the AEC67160 even got a SuSE certification once.
Jens, go ahead.
Done, committed.