Bug 145759 - Installation not possible on SCSI-system; Module aic7xxx
Summary: Installation not possible on SCSI-system; Module aic7xxx
Status: RESOLVED FIXED
Alias: None
Product: SUSE Linux 10.1
Classification: openSUSE
Component: Installation (show other bugs)
Version: Beta 2
Hardware: x86 Other
: P5 - None : Major (vote)
Target Milestone: ---
Assignee: Hannes Reinecke
QA Contact: Klaus Kämpf
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-26 10:07 UTC by Forgotten User jstLBAkSfa
Modified: 2006-02-20 11:02 UTC (History)
3 users (show)

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
dmesg output (120.75 KB, text/plain)
2006-02-01 17:07 UTC, Thomas Roelz
Details
aic7xxx-disable-tcq-fix (2.09 KB, patch)
2006-02-03 10:00 UTC, Hannes Reinecke
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Forgotten User jstLBAkSfa 2006-01-26 10:07:33 UTC
After selecting "Installation" in the boot menu of CD1 of Beta2, the installation starts but after the entry "starting yast" nothing else happens. (On Beta1 it went very slowly a bit further, but then stopped latest when writing the partition table).

Hardware: 

Dual Pentium III 

Output of lspci:
-------
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 03)
00:04.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 02)
00:04.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:04.2 USB Controller: Intel Corporation 82371AB/EB/MB PIIX4 USB (rev 01)
00:04.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
00:06.0 SCSI storage controller: Adaptec AHA-2940U2/U2W / 7890/7891
00:0a.0 SCSI storage controller: LSI Logic / Symbios Logic 53c895 (rev 01)
00:0b.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
00:0c.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
01:00.0 VGA compatible controller: nVidia Corporation NV4 [RIVA TNT] (rev 04)
--------


Output of dmesg has repeatedly the following entries:
--------
sr 0:0:6:0: Device is active, asserting ATN
Recovery code sleeping
Recovery code awake
Timer Expired
aic7xxx_abort returns 0x2003
sd 0:0:0:0: Attempting to queue an ABORT message
CDB: 0x28 0x0 0x0 0x0 0x0 0x40 0x0 0x0 0x80 0x0
scsi0: At time of recovery, card was not paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State in Command phase, at SEQADDR 0xb8
Card was paused
ACCUM = 0x80, SINDEX = 0xa0, DINDEX = 0xe4, ARG_2 = 0x2
HCNT = 0xc SCBPTR = 0x7
SCSISIGI[0x54]:(BSYI|ATNI|IOI) ERROR[0x0] SCSIBUSL[0x0]
LASTPHASE[0x80]:(CDI) SCSISEQ[0x12]:(ENAUTOATNP|ENRSELI)
SBLKCTL[0xa]:(SELWIDE|SELBUSB) SCSIRATE[0x15]:(SINGLE_EDGE)
SEQCTL[0x10]:(FASTMODE) SEQ_FLAGS[0x0] SSTAT0[0x0]
SSTAT1[0x3]:(REQINIT|PHASECHG) SSTAT2[0x50]:(EXP_ACTIVE|SHVALID)
SSTAT3[0x8] SIMODE0[0x8]:(ENSWRAP) SIMODE1[0xac]:(ENSCSIPERR|ENBUSFREE|ENSCSIRST|ENSELTIMO)
SXFRCTL0[0x80]:(DFON) DFCNTRL[0x24]:(DIRECTION|SCSIEN)
DFSTATUS[0x80]:(PRELOAD_AVAIL)
STACK: 0x0 0x167 0x17d 0x35
SCB count = 12
Kernel NEXTQSCB = 5
Card NEXTQSCB = 4
QINFIFO entries: 4
Waiting Queue entries:
Disconnected Queue entries:
QOUTFIFO entries:
Sequencer Free SCB List: 6 5 4 3 2 0 1 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Sequencer SCB Info:
 0 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7]
SCB_LUN[0x0] SCB_TAG[0xff]
  1 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7]
SCB_LUN[0x0] SCB_TAG[0xff]
  2 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7]
SCB_LUN[0x0] SCB_TAG[0xff]
  3 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7]
SCB_LUN[0x0] SCB_TAG[0xff]
  4 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7]
SCB_LUN[0x0] SCB_TAG[0xff]
  5 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7]
SCB_LUN[0x0] SCB_TAG[0xff]
  6 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7]
SCB_LUN[0x0] SCB_TAG[0xff]
  7 SCB_CONTROL[0x40]:(DISCENB) SCB_SCSIID[0x67] SCB_LUN[0x0]
SCB_TAG[0xb]
 8 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
  9 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 10 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 11 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 12 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 13 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 14 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 15 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 16 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 17 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 18 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 19 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 20 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 21 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 22 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 23 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 24 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 25 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 26 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 27 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 28 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 29 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 30 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
 31 SCB_CONTROL[0x0] SCB_SCSIID[0xff]:(TWIN_CHNLB|OID|TWIN_TID)
SCB_LUN[0xff]:(SCB_XFERLEN_ODD|LID) SCB_TAG[0xff]
Pending list:
  4 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) SCB_SCSIID[0x7]
SCB_LUN[0x0]
 11 SCB_CONTROL[0x40]:(DISCENB) SCB_SCSIID[0x67] SCB_LUN[0x0]
Kernel Free SCB list: 6 0 3 7 2 1 10 9 8
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi0:0:0:0: Cmd aborted from QINFIFO
aic7xxx_abort returns 0x2002
sd 0:0:0:0: Attempting to queue an ABORT message
CDB: 0x0 0x0 0x0 0x0 0x0 0x0
scsi0: At time of recovery, card was not paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State in Command phase, at SEQADDR 0xb8
Card was paused
ACCUM = 0x80, SINDEX = 0xa0, DINDEX = 0xe4, ARG_2 = 0x2
HCNT = 0xc SCBPTR = 0x7
SCSISIGI[0x54]:(BSYI|ATNI|IOI) ERROR[0x0] SCSIBUSL[0x0]
LASTPHASE[0x80]:(CDI) SCSISEQ[0x12]:(ENAUTOATNP|ENRSELI)
SBLKCTL[0xa]:(SELWIDE|SELBUSB) SCSIRATE[0x15]:(SINGLE_EDGE)
SEQCTL[0x10]:(FASTMODE) SEQ_FLAGS[0x0] SSTAT0[0x0]
SSTAT1[0x3]:(REQINIT|PHASECHG) SSTAT2[0x50]:(EXP_ACTIVE|SHVALID)
SSTAT3[0x8] SIMODE0[0x8]:(ENSWRAP) SIMODE1[0xac]:(ENSCSIPERR|ENBUSFREE|ENSCSIRST|ENSELTIMO)
SXFRCTL0[0x80]:(DFON) DFCNTRL[0x24]:(DIRECTION|SCSIEN)
DFSTATUS[0x80]:(PRELOAD_AVAIL)
STACK: 0x0 0x167 0x17d 0x35
SCB count = 12
Kernel NEXTQSCB = 4
Card NEXTQSCB = 5
QINFIFO entries: 5
Waiting Queue entries:
Disconnected Queue entries:
QOUTFIFO entries:
...
--------

Notes: 
Another machine with an Adaptec controler had the same issues.
Choosing safe settings etc. doesn't make a difference.
Installation of SLES9 on same hardware is no problem.
Comment 1 Martin Lasarsch 2006-01-26 10:35:36 UTC
have you tried with aic7xxx_old?
Comment 2 Forgotten User jstLBAkSfa 2006-01-26 10:44:13 UTC
No - what do I have to enter at the boot prompt to use that module?

Comment 3 Martin Lasarsch 2006-01-26 10:58:44 UTC
boot with manual=1 and select aic7xxx_old at the modules list. look also at loaded modules after that if it loaded and not aic7xxx.
Comment 4 Forgotten User jstLBAkSfa 2006-01-26 13:04:47 UTC
aic7xxx_old worked to some extent. 

After unloading aic7xxx and loading aic7xxx_old YaST started up okay and I could partition the harddisk etc. Software installation proceeded as expected, but the machine didn't start up after reboot. I was offered a shell prompt only.

I started the installation again (as the rescue system didn't seem to work) and modified the initrd (changed to console 2, mounted the root partition, chrooted to it, edited /etc/sysconfig/kernel to include aic7xxx_old, called mkinitrd, exited, unmounted root partition, reset). 

This helped a bit, the machine found the root partition and mounted it. However a little later there was a kernel panic, the output looked to me like it tried to load aic7xxx anyway and then failed. (But maybe aic7xxx_old also uses aic7xxx in its output, so it could also have been the aic7xxx_old module.)
Comment 5 Martin Lasarsch 2006-01-26 13:39:05 UTC
any chance to capture the panic?
Comment 6 Forgotten User jstLBAkSfa 2006-01-26 14:03:05 UTC
The last messages on the screen (typed off by hand, so might have some typos):

aic7xxx: PCI0:6:0 Mem Region 0xdd800000 unavailable. Cannot memory map device
aic7xxx: PCI0:6:0 IO Region 0xdd000[0..255] unavailable. Cannot map device
aic7xxx: probe of 0000:00:06.0 failed with error -12

(scsi0) BRKADRINT error (0xff)
 Illegal Host Access
 Illegal Sequence Address referenced
 Illegal Opcode in sequencer program
 Sequencer Ram Parity Error
 Data Path Ram Parity Error
 Scratch Ram/SCB Array Ram Parity Error
 PCI Error detected
(scsi0) SEQADDR=0xff
Kernel panic - not syncing: aic7xxx unrecoverable BRKADRINT

(Note: this is not from the dual processor machine, but the other one. Don't know if that makes any difference.)
Comment 7 Forgotten User jstLBAkSfa 2006-01-26 14:38:00 UTC
(scsi0) SEQADDR=0xff above should read
(scsi0) SEQADDR=0x1ff
Comment 8 Martin Lasarsch 2006-01-26 14:46:24 UTC
Hannes: any ideas?
Comment 9 Thomas Roelz 2006-02-01 17:07:50 UTC
Created attachment 66055 [details]
dmesg output

Encounterd this bug too. System (PIII-SCSI) worked with
all previous distros up to 10.0. Obviously this bug is
not too seldom.

Piped dmesg output to attached file.
Comment 10 Hannes Reinecke 2006-02-02 07:29:31 UTC
yeah, I know. aic7{x,9}xx are notoriously bad at error recovery.
Quite surprisingly, given the amount of code involved :-(

Anyway, if you want to use aic7xxx_old be sure to enter an alias in /etc/modprobe.conf, otherwise udev will load the wrong adapter.

I see what I can do for fixing this.
Comment 11 Hannes Reinecke 2006-02-03 10:00:14 UTC
Created attachment 66313 [details]
aic7xxx-disable-tcq-fix

Setting default queue depth to '1' instead of '2' if tagged queueing is disabled.
Comment 12 Hannes Reinecke 2006-02-03 10:03:42 UTC
I have found an error (or at least a questionable behaviour) in the aic7xxx driver; it sets the queue depth to '2' if tagged queueing is disabled.
Hence the midlayer might send two commands simultaneously which can confuse the driver and / or the device in question.

If you have the means to compile a kernel yourself please test the above patch. Otherwise please wait for Beta4 which will include the patch.

Oh, and please post the results.
Your patience is very much appreciated here, as the aic7xxx driver occasionally moves in mysterious ways.
Comment 13 Hannes Reinecke 2006-02-06 12:54:55 UTC
Well, actually I've found some more errors (bugzilla #148061). A fix will be in Beta4. Can you please re-test and check whether the problem is fixed?
Comment 14 Eric Whiting 2006-02-08 00:00:57 UTC
Sun W2100z dual opteron boxes -- same issue with 10.1 beta2 and beta3 (64 bit versions)

I booted the mini iso. same issue. I tried the CD1 with manual=1 -- once I loaded the aic7* module the box hung. 

Actually the 32bit version of beta3 almost worked, but not quite. 

I'll try the aic*_old module and see what happens. 
Comment 15 Eric Whiting 2006-02-08 00:02:06 UTC
More details:

13:04.0 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
        Subsystem: Sun Microsystems Computer Corp.: Unknown device 534d
        Flags: bus master, 66MHz, slow devsel, latency 72, IRQ 169
        I/O ports at 4400 [disabled] [size=256]
        Memory at 00000000ea000000 (64-bit, non-prefetchable) [size=8K]
        I/O ports at 4000 [disabled] [size=256]
        [virtual] Expansion ROM at 00000000e0200000 [disabled] [size=512K]
        Capabilities: [dc] Power Management version 2
        Capabilities: [a0] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
        Capabilities: [94] PCI-X non-bridge device

13:04.1 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
        Subsystem: Sun Microsystems Computer Corp.: Unknown device 534d
        Flags: bus master, 66MHz, slow devsel, latency 72, IRQ 177
        I/O ports at 4c00 [disabled] [size=256]
        Memory at 00000000ea010000 (64-bit, non-prefetchable) [size=8K]
        I/O ports at 4800 [disabled] [size=256]
        [virtual] Expansion ROM at 00000000e0280000 [disabled] [size=512K]
        Capabilities: [dc] Power Management version 2
        Capabilities: [a0] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
        Capabilities: [94] PCI-X non-bridge device
Comment 16 Eric Whiting 2006-02-08 14:05:32 UTC
I tried aic7xxx_old.
It gives and error -1 No such device.

I tried loading the aic* modules in a different order. I'm not sure this is signficant, but first I load amd74xx (for ide), next I load aic7xxx. 

Next when I load aic79xx I get something like:

BUG spinlock recursion swapper/0
spinlock lockup. 

I was able to replicate this several times.. (just don't have a console to capture the output..)
Comment 17 Andreas Jaeger 2006-02-12 13:16:52 UTC
Tom, do you have the same problem?
Comment 18 Thomas Roelz 2006-02-13 13:48:56 UTC
Sorry, no time to test this now. Machine is in Room 3.1.9.
Comment 19 Hannes Reinecke 2006-02-13 16:02:41 UTC
Thomas' machine works with KOTD.
Comment 20 Forgotten User jstLBAkSfa 2006-02-15 15:22:31 UTC
I had another try (or better several tries) with Beta 3.91, however the installation failed each time with

/usr/lib/YaST2/startup/YaST2.call: line 301: 2380 Aborted (core dumped) Y2base "Y2_MODULE_NAME" $Y2_MODE_FLAGS $Y2_MODULE_ARGS $Y2_MODE $Y2_QT_ARGS

y2log says:
[DEFINE_LOGGROUP] Exception.cc(log):83 MediaCD.cc(MediaCD):65 THROW Unsupported scheme in the URL cd:/?devices=/dev/sr0 

As I didn't see any error messages as in my first description and as it happened with aic7xxx and aic7xxx_old I could imagine that it is not related to the original issue. In any event, due to the above I cannot find out if the matter has been fixed or not. 
Comment 21 Forgotten User jstLBAkSfa 2006-02-20 10:12:40 UTC
Installing Beta4 on the hardware where I first observed the bug now went well.

Thank you for fixing this bug!
Comment 22 Hannes Reinecke 2006-02-20 11:02:58 UTC
Ah. Good. finally.