Bug 144628 - Suspect aic7xxx driver is broken.
Summary: Suspect aic7xxx driver is broken.
Status: RESOLVED WONTFIX
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: YaST2
Version: Final
Hardware: i386 SuSE Linux 10.0
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: E-mail List
QA Contact: Klaus Kämpf
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-21 21:30 UTC by Ben Phillips
Modified: 2008-06-25 09:53 UTC

See Also:
Found By: Customer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Testing installation method (2.79 KB, text/plain)
2006-01-24 11:45 UTC, Ben Phillips
Log files from ram disk /var/log/* (119.39 KB, application/zip)
2006-02-03 21:47 UTC, Ben Phillips
log files from RAID device /mnt/var/log/* (5.32 KB, application/zip)
2006-02-03 21:48 UTC, Ben Phillips

Description Ben Phillips 2006-01-21 21:30:11 UTC
Forgive me if I have not logged this bug correctly; I am a relative newb to Linux. I can only assume that the problem I am having is indeed a bug, as I cannot find a fix for it anywhere.

I suspect that the aic7xxx SCSI drivers may be broken or incompatible with the 2.6.13-15 kernel, as I cannot get SuSE Linux 10.0 OSS to install in the manner that I wish it to.

I am trying to use a striped raid0 partition that spans a mix of 14 disks: 7 IDE disks (spread between the onboard IDE controller and a PCI IDE controller) and 7 SCSI disks (all attached to the same AHA2940UW SCSI adaptor, although there are two AHA2940 adaptors in this machine). The primary master IDE disk is partitioned with a 100MB /boot partition at the start of the drive and a 1GB swap partition at the end of that drive; all blocks in between the /boot and the swap partition are dedicated to the raid0 volume md0. All other disks in this machine are also dedicated to the raid0 partition md0.

At this point I should state that this partition scheme works perfectly under both Mandriva2006 and Fedora C4, but these operating systems fall well short of SuSE Linux in many ways and I would prefer not to use them if I can at all avoid it.

When I try to install SuSE Linux 10.0 OSS with this partition scheme, the yast2 installer does not ever proceed beyond the first installation disk; indeed, the computer ALWAYS crashes with between 100 and 200 files remaining to be extracted from the first installation CD.

Needless to say, I have found this problem to be quite perplexing and frustrating. I can be very confident that there is no problem with my hardware as, like I said earlier, this scheme works perfectly under other flavours of Linux. I have also spent a considerable amount of time ensuring that my hardware is not at fault by replacing the SCSI adaptors with other versions of the AHA2940 Adaptec SCSI controller, and by trying several different hardware configurations (e.g. 1 SCSI adaptor instead of two, no SCSI adaptors etc).

SuSE Linux 10.0 OSS installs and operates very well if the raid0 volume used for the root partition only spans the IDE disks and not the SCSI disks. Conversely, it does not install at all if the root partition spans the SCSI disks only and not the IDE disks.

I can discern no difference between using any particular combination of SCSI disks and any other combination of SCSI disks. This, coupled with the fact that it makes no difference which type of AHA2940 SCSI host controller I use, and the fact that this works under other Linux distros, leads me to the conclusion that the problem is caused by software (i.e. drivers) and not hardware.

I have at this point not been able to retrieve any log files post installation attempt which give me any further information; as I said, I am a relative newb to Linux and I am still learning my way around the operating system. I have, however, been a power user under Windows for some time (not that it helps in this case).

Following is a list of the hardware in this computer:

Intel Celeron 1.7G (socket 478)
Gigabyte GA-8I848 mainboard (Intel 848 chipset)
1GB DDR RAM
1 x Gigabyte R955 (ATI 9550) AGP Graphics Card
2 x Adaptec AHA-2940UW SCSI host controllers
1 x Silicon Image Sil 0649 Ultra ATA/100 PCI to ATA host controller
2 x Surecom EP320X-R 10/100 LAN adaptors

The results of the lsmod shell command are shown below. NOTE: this is from an installation with the raid0 volume on the IDE disks only, the scsi controllers and disks are present but they are idle (not being used).

Module                  Size  Used by
joydev                  9408  0
parport_pc             38980  1
lp                     11460  0
parport                33864  2 parport_pc,lp
nls_utf8                2048  0
hfsplus                75140  0
vfat                   12800  0
fat                    49692  1 vfat
subfs                   7552  2
ipt_pkttype             1664  1
ipt_LOG                 6912  7
ipt_limit               2304  7
speedstep_lib           4228  0
freq_table              4612  0
snd_pcm_oss            59168  0
snd_mixer_oss          18944  1 snd_pcm_oss
snd_seq                51984  0
snd_seq_device          8588  1 snd_seq
button                  7056  0
battery                10244  0
ac                      5252  0
edd                     9824  0
usbhid                 43616  0
usblp                  12544  0
ide_cd                 39684  0
cdrom                  36896  1 ide_cd
sk98lin               193112  1
8139too                26112  0
snd_intel8x0           33408  1
snd_ac97_codec         90876  1 snd_intel8x0
mii                     5504  1 8139too
ehci_hcd               32136  0
snd_ac97_bus            2432  1 snd_ac97_codec
snd_pcm                93064  3 snd_pcm_oss,snd_intel8x0,snd_ac97_codec
snd_timer              24452  2 snd_seq,snd_pcm
i2c_i801                8844  0
i2c_core               20368  1 i2c_i801
snd                    60420  10 snd_pcm_oss,snd_mixer_oss,snd_seq,snd_seq_device,snd_intel8x0,snd_ac97_codec,snd_pcm,snd_timer
soundcore               9184  1 snd
snd_page_alloc         10632  2 snd_intel8x0,snd_pcm
ip6t_REJECT             5504  3
generic                 4484  0 [permanent]
intel_agp              22044  1
agpgart                33096  1 intel_agp
uhci_hcd               32016  0
shpchp                 88676  0
pci_hotplug            26164  1 shpchp
usbcore               112640  5 usbhid,usblp,ehci_hcd,uhci_hcd
ipt_REJECT              5632  3
ipt_state               1920  12
iptable_mangle          2688  0
iptable_nat            22228  0
iptable_filter          2816  1
ip6table_mangle         2304  0
ip_conntrack           42168  2 ipt_state,iptable_nat
ip_tables              19456  8 ipt_pkttype,ipt_LOG,ipt_limit,ipt_REJECT,ipt_state,iptable_mangle,iptable_nat,iptable_filter
ip6table_filter         2688  1
ip6_tables             18176  3 ip6t_REJECT,ip6table_mangle,ip6table_filter
ipv6                  242752  11 ip6t_REJECT
dm_mod                 54972  0
reiserfs              250480  2
raid0                   8448  1
fan                     4996  0
thermal                14472  0
processor              24252  1 thermal
st                     38944  0
sg                     35744  0
aic7xxx               176308  0
scsi_transport_spi     20864  1 aic7xxx
cmd64x                 11164  0 [permanent]
piix                    9988  0 [permanent]
sd_mod                 18576  0
scsi_mod              131304  5 st,sg,aic7xxx,scsi_transport_spi,sd_mod
ide_disk               17152  16
ide_core              122380  5 ide_cd,generic,cmd64x,piix,ide_disk
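
For readers who want to check a listing like this mechanically, here is a minimal Python sketch (not part of the original report) that parses lsmod output into module name, size, use count and dependent modules. The function name parse_lsmod and the dictionary layout are illustrative assumptions; the sample rows are copied from the listing above.

```python
# Sample rows copied from the lsmod listing in this report.
SAMPLE = """\
Module                  Size  Used by
raid0                   8448  1
aic7xxx               176308  0
scsi_transport_spi     20864  1 aic7xxx
cmd64x                 11164  0 [permanent]
piix                    9988  0 [permanent]
sd_mod                 18576  0
scsi_mod              131304  5 st,sg,aic7xxx,scsi_transport_spi,sd_mod
ide_disk               17152  16
"""


def parse_lsmod(text):
    """Return {module: {"size": int, "used": int, "users": [names]}}."""
    modules = {}
    # The first line is the column header, so skip it.
    for line in text.strip().splitlines()[1:]:
        fields = line.split()
        name, size, used = fields[0], int(fields[1]), int(fields[2])
        users = []
        # A fourth column is either a comma-separated list of dependent
        # modules or a flag such as [permanent]; only the former is a list.
        if len(fields) > 3 and not fields[3].startswith("["):
            users = fields[3].split(",")
        modules[name] = {"size": size, "used": used, "users": users}
    return modules


def unused(modules):
    """Names of modules that are loaded but have a use count of 0."""
    return sorted(name for name, info in modules.items() if info["used"] == 0)
```

Applied to the sample rows, unused() lists aic7xxx and sd_mod among the idle modules, while raid0 and ide_disk have non-zero use counts. That matches the note that the SCSI controllers were present but not in use for this listing.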


This problem is highly repeatable, so if you need more info (like log files) please let me know and I will attempt to retrieve it.

Thank you,
Ben.
Comment 1 Ben Phillips 2006-01-21 22:03:32 UTC
I should also add that it makes no difference which filesystem is used (ie Reiser, xfs, ext2/3 etc) nor does it make any difference if I try to use LVM instead to create the spanned volume. The computer still locks up before completing the extraction of files from the first installation cd.
Comment 2 Martin Lasarsch 2006-01-23 12:09:08 UTC
Is there a kernel panic (keyboard LEDs blinking)?

Have you tried safe settings?

Does it always happen at the same point?
Comment 3 Michael Gross 2006-01-23 14:07:55 UTC
Hello Ben!

If you're using the Adaptec AHA2940UW, you'll have to use the kernel module aic7xxx_old, as the support for these rather old models has been moved to this module. You can set this up in /etc/sysconfig/kernel (MODULES_LOADED_ON_BOOT).
Still, this might be a problem, as YaST should detect this adapter correctly. Please attach the YaST logs in /var/log/YaST2, thanks.
Comment 4 Martin Lasarsch 2006-01-23 14:59:24 UTC
Right, I forgot about the _old driver. To activate this during installation, boot with manual=1 or abort the installation at the first YaST screen. When you are in linuxrc, select Kernel-Modules/SCSI and load aic7xxx_old.
Comment 5 Ben Phillips 2006-01-24 11:45:53 UTC
Created attachment 64704 [details]
Testing installation method
Comment 6 Ben Phillips 2006-01-24 11:46:36 UTC
O.K. tried your suggestions several times and varied some other settings as well, still a no go I'm sorry to say (please see the attached text file "installation method" for details). There was a significant improvement though. Instead of the computer freezing completely, this time it gave me an error code and it gave a slight variant of the same error code each time I attempted an install with a RAID0 partition for / that spanned both the IDE and SCSI disks.

Please bear in mind that this computer has functioned for nearly twelve months under Windows, where it had a primary NTFS partition for the OS on the primary master IDE disk and all other IDE and SCSI disks were grouped into 1 RAID0 volume as drive D:\

This partition scheme has also proven to function perfectly under both Mandriva2006 and Fedora Core 4. I have serious doubts that the hardware or its configuration is at fault. I have performed low level formats and surface scans on all drives, both SCSI and IDE. I have updated the AHA2940UW BIOS to the latest version (2.20) released by Adaptec, and I have even tried installing SuSE in an 8 cycle run, isolating each SCSI drive in turn to verify the integrity of each drive and its connection to the bus. I have checked and verified that each SCSI bus is correctly terminated at each end.
I have spent nearly a month trying to get SuSE 10.0 OSS to install to this partition scheme.

Please also note, that the scsi disks appear to function normally if the / root partition is constrained to the IDE disks in a RAID0 array and the scsi disks are mounted in their own RAID0 array. My Tandberg SLR5 8GB SCSI tape drive which is the only device on the second AHA2940UW adaptor, also functions faultlessly using tar commands from the shell konsole.

Would you like me to try to retrieve any log files from the failed installations? If so, which would you like, and where do I look for them?

I think I will also try building a set of floppy based SuSE installation disks which I can then modify to contain the aic7xxx.ko driver file from either the Mandriva2006 and/or Fedora C4 install CDs. This will take me several days to complete, I expect, as I will first have to extract it from those kernel RPMs and learn how to use RPM repackage to build it into the SuSE kernel-default RPM. I have a four day weekend coming up due to a national holiday here in Australia, so I will attempt it then. Any thoughts on this idea?

Ben
Comment 7 Michael Gross 2006-01-24 15:49:13 UTC
Let me summarize: You were able to get the RAID into working order; it just didn't work once you included the IDE disks. You can attach the YaST log files (/var/log/YaST2/*) and also attach the boot messages (/var/log/boot.msg) and especially 500 lines of your syslog in /var/log/messages.

Using a compiled kernel module of another distribution is a bad idea because it will not work. You could try compiling the module from source, though, which has a good chance of success.

You should try installing 10.1 Beta1 and check if it works there. If it does not work there, it will be handled with a higher priority.

I will CC the kernel maintainers and ask for a comment. Is there something known about this problem? Ben is trying to use a RAID-0 with one IDE and one SCSI stack, using the Adaptec AHA2940-UW.
Comment 8 Hannes Reinecke 2006-01-24 16:32:58 UTC
Are all modules loaded in the correct order?

If you're trying to boot from this combined RAID device _all_ modules (ie aic7xxx and the ide driver) have to be present in the initrd, otherwise not all devices will be activated during boot.

Also there have been issues where boot.localfs runs too early; to eliminate that add

# Required-Start:    boot.rootfsck boot.udev

to /etc/init.d/boot.localfs
Comment 9 Ben Phillips 2006-01-24 19:43:09 UTC
Yes, that's basically correct. Each stack can be made into a separate RAID array and it will work (that is md0=IDE disks only and md1=SCSI disks only), but if I try to put both stacks into the same RAID or LVM array (md0=IDE+SCSI disks) it does not work.

O.K. I'll download SuSE 10.1 and try it to see what it does, although I am dubious as 10.1 is still only an alpha release....

While 10.1 is downloading, I'll try manually forcing the driver modules to load in different combinations of order, that is piix 1st, cmd64x 2nd, aic7xxx last, then in a different order. Does it matter in which order they appear in the BIOS?

I'll try to recover those log files for you as well in the meantime.

As for editing /etc/init.d/boot.localfs, I can't; I'm booting from the install CD to do a system install when this happens. I'm not booting from the hard disks. Would it help to try this with a set of boot floppies? Can this file be found and edited on a set of boot floppies?

thanks

Ben.
Comment 10 Michael Gross 2006-01-25 11:59:14 UTC
Ben: It's almost Beta2, you can wait for that release.
Comment 11 Michael Gross 2006-01-25 12:02:02 UTC
Ben, you can get into linuxrc with the parameter manual=1 - there you have control over which modules are loaded and you can also add modules from floppies or other disks...
Comment 12 Ben Phillips 2006-01-26 03:54:40 UTC
O.K. I've downloaded the first disc from SuSE 10.1 Beta1 (sorry I wasn't aware that the alpha version had been updated already).

It didn't work. In fact it wouldn't even allow me to open the disk partitioner. It crashed and exited to linuxrc as soon as the system configuration screen appeared. I can see no difference in its behaviour whether I use aic7xxx or aic7xxx_old. However, I have not yet managed to recover the log files you asked for; therein may lie the information we are looking for.

I did see something else peculiar though. When I was trying some different system BIOS settings, I noticed some strange behaviour in my system BIOS that suggested it may have become corrupted. I don't think this was caused by SuSE; I have a feeling that it was caused by Mandriva, as that OS did a BIOS scan on its first boot which always crashed (second boot always worked fine). So I then decided to update the mainboard BIOS to the latest version in the hope that it would fix this issue. It did. The BIOS corruption no longer appears, but this changed the behaviour of the SuSE 10.0 installation. Now it will not even format the partition md0 (IDE+SCSI) in reiserfs. I tried all the different supported file systems after seeing this and discovered that now the only file system that will format is xfs. None of the others will work.

I have also noticed that MS-DOS (win98 startup disk) also crashes when trying to install its ASPI drivers for CD-ROM operation unless I enter the AHA2940UW BIOS and disable int13 extensions and prevent the SCSI drives from being included in the system BIOS scan. Then it worked fine. Unfortunately this made no difference to SuSE. I will from here on leave the SCSI disks configured this way, as it works with more operating systems than the previous configuration (logic being that the more OSes that work, the closer the settings are to being correct for all OSes).

Finally, I also noticed that pressing alt+F4 in the YAST2 installer brings up the kernel messages (I did not know this earlier). Each time I was able to bring up the kernel msgs after the install crashed I noticed that the error is generated by the same disk drive, that being /dev/sdb, each time. I am surprised at this as I have already run disk verification and low-level format utils on these disks several times and they have always been O.K. I am currently re-running the low-level format utility and media verification scans on ALL SCSI drives just to be sure they are not faulty.

I am yet to try SuSE 10.1 since updating the system BIOS. I will keep you posted of further developments and I will try to get those log files as well.

Ben.
Comment 13 Martin Lasarsch 2006-01-26 13:38:22 UTC
please reopen the bug when you can provide new information, thanks.
Comment 14 Ben Phillips 2006-01-29 11:12:10 UTC
New Information.

This is what I did.

1> downloaded UBCD (Ultimate Boot CD) with INSERT Live.
2> changed hardware configuration (added three scsi disks)
3> re-examined all of my hardware, scsi terminations, cables, bios, etc. - No Problems
4> ran diagnostics on as much hardware as possible, disks, memory, controllers, cpu, etc - No Problems
5> ran reference install of Fedora Core 4, worked O.K.
6> retried SuSE 10.0 install, Failed.

I then tried different install configurations in SuSE and I noticed something very interesting. Using different combinations of disk controllers yields different results.

Intel 848 (piix driver) - works fine in RAID array on its own
CSA649U (cmd64x driver) - works fine in RAID array on its own
both AHA2940UW (aic7xxx driver) - work fine in RAID array both together and individually.

Intel 848 + CSA649U - work fine together in a RAID array

Intel 848 + AHA2940UW + AHA2940UW - USUALLY FAILS BUT HAS WORKED ONCE.

Intel 848 + CSA649U + AHA2940UW - FAILS EVERY TIME!!
Intel 848 + CSA649U + AHA2940UW + AHA2940UW - FAILS EVERY TIME!!

In each case, the redundant controller is not removed or disabled, it is simply not used in the array.

As usual, the system either crashes outright (freezes) or returns some obscure error report which is different almost every time.

Using the aic7xxx_old driver makes no difference (manual=1)
Pressing Alt-F3 early in the install shows that the drivers are being loaded in the correct order automatically, that is they are loaded with respect to the order in which the devices are physically installed.
SAFE Settings makes no difference.
No ACPI makes no difference
File system type (reiserfs, xfs...) makes no difference.
chunk size in RAID superblock makes no difference

I think we can safely say that there IS a problem with the aic7xxx driver, especially since there has been another bug report involving this driver in the last few days; although he was using 10.1 beta2, he still had problems with this driver.

I have thus far been unable to retrieve the install log files you asked for, as the failed install produces a corrupted file system. Using the INSERT Live distro from the UBCD, the command "mount -t reiserfs /dev/md0 /mnt/hd" returns the error "Bad option, incorrect file system, Bad superblock or too many mounted devices".

Bad option - no, the only option I specified was that of the file system type.
Incorrect File System - I KNOW that it was formatted with reiserfs
Too many mounted devices - only 5 devices are mounted when this occurs.

This leaves the Bad Superblock as the culprit by logical deduction. Also the file system can be mounted in this way with no problems after a successful install and the file system created by a Fedora install can always be mounted in this way. Is there some way I can repair the superblock without destroying the log files?

SuSE 10.1 Beta1 crashes and reverts to the linuxrc screen as soon as the install configuration screen appears. It flashes a whole heap of error messages on the screen as it does so, but they are too quick to read and the pause key on the keyboard has no effect, nor does Ctrl-Break for that matter.

I am not going to even contemplate trying 10.1 beta2 after reading the bug logged by the other fellow, unless you can give some reason as to why you think it will work.

Sorry I couldn't give you more info but this is all I can do. I am hampered as much by my frustration as by my inexperience with Linux. If you can suggest a way I might be able to recover the log files after a failed install, despite the file system corruption, then I'll give it a go, but I am at a complete loss as to what I should try next. The only thing I can think of is perhaps obtaining a replacement SCSI controller of a different brand (requiring a different driver), but I am reluctant to do this as I cannot justify the cost of replacing hardware that IS NOT FAULTY, especially when the operating system it has a problem with is free.

Ben.
Comment 15 Michael Gross 2006-01-30 16:07:16 UTC
Ben, sorry, but I am at a loss, too - I don't know to whom to assign this problem at the moment. Indeed, we weren't able to trace the problem down to the component causing it, which naturally makes this a little difficult.

In order to do something, we would need at least:
+ your hardware configuration (output of hwinfo)
+ /etc/fstab
+ fdisk -l
+ yast logs of the broken or failing installation (/var/log/YaST2)
+ boot messages (/var/log/boot.msg)
+ your syslog in /var/log/messages

You could help by investigating further whether others have had this problem and maybe even solved it (www)?
Comment 16 Michael Gross 2006-01-31 10:16:31 UTC
I don't know why the last message was marked private? It was at least not intended that way. Ben, please read comment #15.
Comment 17 Ben Phillips 2006-01-31 19:32:10 UTC
Is there some way I can forcibly drop to a console from YAST2 or linuxrc? If there is, I may be able to access the log files without the need to reboot and mount from another operating system.

I'm also going to try another brand of SCSI adaptor. I know I said that I was reluctant to do this, but there are other advantages to going in this direction for me. I have just purchased a Mylex DAC960 on eBay. It's a 3 channel SCSI RAID adaptor. This has a hardware advantage for me, as it will allow me to free a PCI slot for other uses while still increasing my hard disk capacity. I chose this adaptor as there are plenty of Linux and Unix based drivers available for this card and they have been around for a long time (theory is that they should be very well developed).

It will take about a week for this new card to arrive. In the meantime I will try pulling the redundant controller out of the machine in each of the previous hardware configurations. I have a funny feeling that it is actually caused by a conflict between the CSA649U and the AHA2940. This is just a theory of course.

Have you personally read bug #145759? This guy sounds like he is having a very similar problem, only he has apparently had better luck than I. He is not using a PCI-IDE adaptor, which may be making all the difference. He also seems to have been able to get a console, presumably by pressing Alt-F2. I have tried this, but I didn't think that this was a console, just a message output screen. Can I enter commands here?
Comment 18 Michael Gross 2006-02-01 16:44:56 UTC
> Is there some way I can forcibly drop to a console from YAST2 or linuxrc? If
> there is, I may be able to access the log files without the need to reboot and
> mount from another operating system.

You can switch to a console by pressing Alt+F-keys, AFAIK console 3 is a working one, but there is at least one, just search for it ;) A working console will give you a prompt.

If PCI is making problems here, you can try an installation in safe mode, with ACPI=off (...), PCI IRQ-routing will be handled in a different way then.

Comment 19 Ben Phillips 2006-02-03 21:44:13 UTC
O.K. I had a partial success!!

This is what I did:

Booted from CD1
Selected:
  Installation
  Pressed F3
Typed:
  pci=routeirq manual=1 (acpi=1 makes the computer freeze)

At the Linuxrc Main Menu
Selected:
  Settings
  Debug
  Enable/Disable SSH Mode
  Start SSH for text install - YES

Returned to Linuxrc Main Menu
Selected:
  Kernel Modules (Hardware Drivers)
  Load IDE/RAID/SCSI Modules
    Loaded:
    piix
    cmd64x
    aic7xxx (aic7xxx_old makes no difference)
  Load Network Card Modules
    sk98lin
    8139too

Returned to Linuxrc Main Menu
Selected:
  Start installation or system
  Start installation or update
  CDROM
  Entered arbitrary password (000000)
  Configured eth0 with static IP address

Switched to console alt-F2
Typed:
  yast

Allowed the installation of additional modules:
  usb-storage
  dm-mod
  dm-snapshot

Set time zone
Selected KDE desktop
Selected Partitioning
  Formatted /dev/hda1 reiserfs /Boot (101.9MB)
            /dev/hda3 swap     swap  (1024MB)
            /dev/md0  reiserfs /     (110GB) inc. IDE+SCSI disks
Selected:
  Accept
  Install.

I then let the installation run until it stalled. At this point I switched to console alt-F5 and I started poking around.
I found the log files that you asked for, but I found them in two places. I found them at:

/var/log
/var/log/YaST2

and

/mnt/var/log
/mnt/var/log/YaST2

with respect to the ram disk rd0 as the root file system. Not sure which ones you wanted, so I copied them all, each set to its own floppy disk, which were then labelled rd0 and md0 respectively. I have created zip files with these names and attached them for your perusal.

As usual the installation stalled as a result of md0 suddenly becoming a read-only file system.

The only problem I had was getting the output of the "fdisk -l" command and the "hwinfo" command. These two commands produce more than one screen of information each, so I can't copy very much of either output by hand.

How do I print the output of these commands to a file so that I can send it to you?
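
From a plain shell prompt, the usual way to do this is output redirection, for example "fdisk -l > /tmp/fdisk.txt 2>&1". The same idea is sketched below in Python, assuming a system where Python 3 is available (the installer's ram disk may well not have it). The function name save_command_output and the output paths are illustrative assumptions, not part of any SuSE tool.

```python
import subprocess


def save_command_output(command, path):
    """Run command (a list of arguments) and write its output to path.

    Standard error is merged into standard output so that error
    messages end up in the same file. Returns the exit status.
    """
    with open(path, "w") as handle:
        result = subprocess.run(command, stdout=handle, stderr=subprocess.STDOUT)
    return result.returncode


# Hypothetical usage on the affected machine (paths are examples only):
#   save_command_output(["fdisk", "-l"], "/tmp/fdisk.txt")
#   save_command_output(["hwinfo"], "/tmp/hwinfo.txt")
```

The resulting files can then be copied to a floppy in the same way as the log files and attached to the report.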
Comment 20 Ben Phillips 2006-02-03 21:47:48 UTC
Created attachment 66431 [details]
Log files from ram disk /var/log/*
Comment 21 Ben Phillips 2006-02-03 21:48:51 UTC
Created attachment 66432 [details]
log files from RAID device /mnt/var/log/*
Comment 22 Ben Phillips 2006-02-05 08:09:36 UTC
O.K.
Today I repeated the last install method used so that I could try and capture some of the error messages seen on the kernel message console screen.
These are out of order because they were repeatedly scrolling up the screen, so I just wrote down whatever I could at the time. I hope it gives you some more clues.
Not all of these error messages are repeated; some only appear once during each install attempt.

kernel:  losing some ticks ... checking if CPU frequency has changed
hdb: cdrom_pc_intr: the drive appears confused (ireason = 0x0)
spurious 8259A interrupt: IRQ7
udevd[1739]: get_netlink_msg:no ACTION inpayload found, skip event 'mount'
udevd[1739]: get_netlink_msg:no ACTION inpayload found, skip event 'umount'

aic7xxx_abort returns 0x2002
aic7xxx_abort returns 0x2003

scsi2: PCI error Interrupt at seqaddr= 0x8
scsi2: PCI error Interrupt at seqaddr= 0x7d
scsi2: PCI error Interrupt at seqaddr= 0x7c
scsi3: PCI error Interrupt at seqaddr= 0x8
scsi2: received a Target Abort
CBD: 0x2a 0x0 0x0 0x12 0x87 0xff 0x0 0x1 0xc0 0x0
aic7xxx_dev_reset returns 0x2003
kernel Free SCB list: 0 5 1 2 7 4
untagged Q(5): 4
untagged Q(5): 3
3 SCB_Control [0x0] SCB_SCSIID[0x5f]:(OID) SCB_LUN[0x0}
6 SCB_Control [0x0] SCB_SCSIID[0x5f]:(OID) SCB_LUN[0x0}

hdb: status error: status=0x58{DriveReady SeekComplete DataRequest}
ide: failed opcode was: unknown
hdb: drive not ready for command
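
The line labelled CBD above is presumably the SCSI command descriptor block (CDB) of the command that was being aborted. As a rough aid to reading it, the Python sketch below decodes a 10-byte READ(10) or WRITE(10) CDB using the standard layout: opcode in byte 0, big-endian logical block address in bytes 2-5, transfer length in bytes 7-8. The function name decode_rw10 is an illustrative assumption and is not taken from the driver.

```python
# Opcodes of the two 10-byte read/write commands handled by this sketch.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_rw10(cdb_text):
    """Decode a 10-byte READ(10)/WRITE(10) CDB given as hex tokens."""
    cdb = [int(token, 16) for token in cdb_text.split()]
    if len(cdb) != 10:
        raise ValueError("expected exactly 10 CDB bytes")
    if cdb[0] not in OPCODES:
        raise ValueError("not a READ(10) or WRITE(10) command")
    return {
        "command": OPCODES[cdb[0]],
        # Bytes 2-5: logical block address, most significant byte first.
        "lba": int.from_bytes(bytes(cdb[2:6]), "big"),
        # Bytes 7-8: number of blocks to transfer.
        "blocks": int.from_bytes(bytes(cdb[7:9]), "big"),
    }


# The bytes as written down in this comment.
REPORTED = "0x2a 0x0 0x0 0x12 0x87 0xff 0x0 0x1 0xc0 0x0"
```

For the bytes quoted above, decode_rw10(REPORTED) gives a WRITE(10) of 448 blocks starting at logical block 1214463, which suggests the aborted command was a write to the disk.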

There is much much more information than what I have written here, but I don't know how to capture all of it so I can send it to you.

Any ideas?

Ben
Comment 23 Michael Gross 2006-02-06 15:53:24 UTC
So just to be sure: You chose `Safe Settings' for the installation? If not, please select this boot option.

scsi2: PCI error Interrupt at seqaddr= 0x8 <-- this still looks like an IRQ problem. What is the content of /proc/interrupts for the booted system?

Are you sure your hardware is OK? Have you run a test?
Comment 24 Ben Phillips 2006-02-06 19:39:37 UTC
I have tried "safe settings" several times. The only difference it makes is that the computer freezes instead of just giving me error messages, unless I omit the 'acpi=off' option, then it just cycles the same error messages over and over.

My hardware is fine. As I have stated previously, I have run diagnostic test after diagnostic test and they reveal nothing. In addition to this, as I have also stated before, this configuration works fine under both Fedora C4 AND Mandriva 2006. I have also used the IDE and SCSI drives to form a RAID0 partition under Windows XP as a secondary drive, where again, it works fine. It is not a hardware issue: it works under three other operating systems, two of them Linux based. I also have several different AHA2940 adaptors and they all do the same thing.

I did a bit more research yesterday (upon deciding that you were right about the irq conflict) and I found several sites that mention problems between aic7xxx and network adaptors sharing the same irq. On reading this I disabled ALL the hardware that I could via the system BIOS (i.e. onboard sound, onboard LAN, onboard USB etc); no difference. I then pulled out both the PCI LAN adaptors, so now the only PCI cards in this machine are the SCSI and IDE controllers; no difference, the computer still crashes.

Tonight I will try moving the cards around between different PCI slots (again) and I will alternate between "normal" and "safe settings" in the installation menu. I will do this because I think you are right about the irq conflict.

I will also try to get the contents of /proc/interrupts for you to have a look at.

Currently both the AHA2940UW cards are sharing IRQ11 in the BIOS as they are installed in PCI1 and PCI5 for that reason. It seems to make no difference what PCI slots they are installed in though. Originally they were installed in PCI2 and PCI3 with the CMD649U in PCI1. Needless to say that this didn't work either. (yes I tried 'safe settings' here too). I have also tried forcing the PCI slots to use various INT numbers in the BIOS instead of letting the computer automatically assign them, no difference.
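
Once the contents of /proc/interrupts have been saved, shared IRQ lines can be picked out mechanically. The Python sketch below is one way to do that, assuming the 2.6-era layout with one count column per CPU followed by a single interrupt-controller column and a comma-separated device list. The function name shared_irqs is an illustrative assumption, and the sample text is invented for illustration (it only mirrors the statement above that both AHA2940UW cards share IRQ 11); it is not output from this machine.

```python
# Invented sample in the assumed 2.6-era format; counts are made up.
SAMPLE = """\
           CPU0
  0:     904215          XT-PIC  timer
  1:       1210          XT-PIC  i8042
 11:      58231          XT-PIC  aic7xxx, aic7xxx
 14:      40112          XT-PIC  ide0
NMI:          0
"""


def shared_irqs(text):
    """Return {irq_number: [devices]} for IRQs used by more than one device."""
    lines = text.strip("\n").splitlines()
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if not label.strip().isdigit():
            continue  # rows such as NMI or ERR carry no device list
        # Split off the per-CPU counts and the controller column; whatever
        # remains is the comma-separated device list.
        fields = rest.split(None, cpu_count + 1)
        if len(fields) < cpu_count + 2:
            continue
        devices = [name.strip() for name in fields[-1].split(",") if name.strip()]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared
```

With the invented sample, only IRQ 11 is reported as shared, by the two aic7xxx instances. Newer kernels print extra columns in this file, so the column assumption would need adjusting there.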

Here are the URLs I mentioned earlier for your perusal:

http://www.scyld.com/pci_irq.htm
http://www.ictp.trieste.it/~radionet/nuc1996/ref/howto-html/scsi-howto-5.html

One of these sites mentions having to remove the "SA_INTERRUPT" line from the aic7xxx driver code to prevent irq conflicts. Would this help?

I expect my new SCSI card (a Mylex DAC960-3) to arrive sometime tomorrow or the following day. I will be very interested to see if the problem still occurs with a different SCSI adaptor and driver.
Comment 25 Michael Gross 2006-02-07 13:27:16 UTC
Michael: Can you give us a comment here?
Short version: Ben tries to create a RAID consisting of a mix of SCSI and IDE-disks, using an Adaptec AHA2940UW (so far), but the RAID does not work properly. Is there something he should know? Can we support/fix this somehow?
Comment 26 Michael Gross 2006-02-07 14:29:10 UTC
Ben: I talked to some people and they told me you should wait for Beta4 and try it again, as many md bugs in particular will be fixed in that release. This might also solve your problem. I will close this bug for the moment. If the problem still exists with Beta4, please reopen this report.
Comment 27 Ben Phillips 2006-02-13 11:29:38 UTC
Guys and Girls,
Last week I received a new DAC960PD 3 channel SCSI RAID controller. After learning how to set it up, I took both the AHA2940 adaptors out of the machine and attempted another installation of SuSE 10.0.

It failed again. This means that the problem is NOT the aic7xxx driver and is most likely one of two problems:

1: a problem with the way Yast assigns IRQs to devices, especially given that using the acpi=off and pci=routeirq options only seems to make Yast crash faster.

2: a problem with the md subsystem.

Bear in mind that this problem only occurs when trying to build a RAID0 array consisting of both IDE and SCSI disks regardless of the type of SCSI adaptor used.

Any idea when 10.1b4 will be ready for download? I'm interested to try it if you're trying to fix bugs in the md system.
Comment 28 Ben Phillips 2006-02-18 09:01:48 UTC
OK

I have given up on 10.0 ever working in this hardware config.
As I am still waiting for 10.1beta4 to be released, I decided to try again with 10.1beta1 and I noticed something very interesting.

If I low level format the SCSI drives attached to my new DAC960PDU 3 channel controller, the installation proceeds perfectly until it tries to install the bootloader. At this point Yast returns an error which reads:

cp: cannot stat '/sbin/raidstart': no such file or directory
Shared libs:    ldd:/sbin/raidstart: no such file or directory

If I try to install 10.1beta1 without performing a low level format on the SCSI drives, Yast crashes at the "Installation Configuration" screen while scanning the existing partitions.

This might be what I did differently with the AHA2940 adaptors. I can remember attempting the installation after low level formatting ALL of my SCSI drives only once. This must be the one occasion on which the installation worked. The reason why I didn't low level format the SCSI drives before every installation is (A) on the AHA2940 this is a VERY time consuming task and (B) I didn't think it would make a lot of difference. I guess I was wrong.

Awaiting 10.1beta4..........
Comment 29 Stephan Kulow 2008-06-25 09:34:50 UTC
mass reopening all SuSE Linux bugs that are set to REMIND+LATER to change the resolution to WONTFIX (adapting to new policy)
Comment 30 Stephan Kulow 2008-06-25 09:36:53 UTC
mass reopening all SuSE Linux bugs that are set to REMIND+LATER to change the resolution to WONTFIX (adapting to new policy)
Comment 31 Stephan Kulow 2008-06-25 09:41:51 UTC
mass reopening all SuSE Linux bugs that are set to REMIND+LATER to change the resolution to WONTFIX (adapting to new policy)
Comment 32 Stephan Kulow 2008-06-25 09:53:19 UTC
Closing old LATER+REMIND bugs as WONTFIX - if you still plan to work on it, feel free to reopen and set to ASSIGNED.

In case the report saw repeated reopen comments, it's due to bugzilla timing out on the huge request ;(