Bug 246701

Summary: ahci SATA RAID - mdraid segment fault
Product: [openSUSE] openSUSE 10.2
Component: Installation
Status: RESOLVED WONTFIX
Severity: Critical
Priority: P2 - High
Version: Final
Target Milestone: ---
Hardware: x86-64
OS: SUSE Other
Whiteboard:
Reporter: Alun Peck <peckaj>
Assignee: Matthias Koenig <mkoenig>
QA Contact: Jiri Srain <jsrain>
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: dmesg log
hwinfo --storage
hwinfo --all
lspci -v output
YaST logs
dmraid output
dmraid output
strace output
Output of dmraid -rD
gz of requested files for 2 dmraid commands

Description Alun Peck 2007-02-19 10:10:12 UTC
System
Mainboard: S5000VSA
CPU: 2x xeon dual core 64bit 2.0 GHz
RAM: 2 GB
HD: 4x SATA 500 GB in RAID 10 configuration
Other: pata DVD writer

Installation aborts (drops the user to a text console) after probing the hard drives, i.e. after loading the kernel modules for the hard drives (presumably the ahci module).

Significant error message before dumping data to USB stick (/dev/sde):
dmraid[3786]: segfault at 00002b51bafeea00 rip 00002b51bafeea00 rsp 00007ffff0439a18 error 15
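For reference, the trailing "error 15" in a kernel segfault line of this form is the x86 page-fault error code, a small bit mask. Assuming the standard bit layout (bit 0: page present, bit 1: write, bit 2: user mode, bit 3: reserved bit set in a page-table entry, bit 4: instruction fetch), it can be decoded by hand:

```shell
# Decode the page-fault error code from the segfault line above.
err=15   # the "error 15" value reported by the kernel
[ $((err & 1))  -ne 0 ] && echo "protection violation" || echo "page not present"
[ $((err & 2))  -ne 0 ] && echo "write access"         || echo "read access"
[ $((err & 4))  -ne 0 ] && echo "fault in user mode"   || echo "fault in kernel mode"
[ $((err & 8))  -ne 0 ] && echo "reserved bit set in a page-table entry" || true
[ $((err & 16)) -ne 0 ] && echo "instruction fetch"    || true
```

For error 15 this yields: protection violation, write access, user mode, with a reserved page-table bit set. Note also that rip equals the faulting address, i.e. the process jumped to a bad address.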

Note: had to pass the kernel parameter startshell=1 to be able to mount a USB stick after the installation aborted.

data to follow in attachments
Comment 1 Alun Peck 2007-02-19 10:13:05 UTC
Created attachment 119885 [details]
dmesg log
Comment 2 Alun Peck 2007-02-19 10:14:16 UTC
Created attachment 119886 [details]
hwinfo --storage
Comment 3 Alun Peck 2007-02-19 10:15:32 UTC
Created attachment 119888 [details]
hwinfo --all
Comment 4 Alun Peck 2007-02-19 10:16:45 UTC
Created attachment 119893 [details]
lspci -v output
Comment 5 Alun Peck 2007-02-19 10:17:29 UTC
Created attachment 119896 [details]
YaST logs
Comment 6 Andreas Jaeger 2007-02-21 08:00:01 UTC
Please do not set any bug to PO-Critsit; those are for internal use only!
Comment 7 Matthias Koenig 2007-02-21 10:40:39 UTC
Please boot the rescue system and try running dmraid manually.
Provide the output of:
dmraid -r -vvv -d
If that fails, please provide also the output of the strace of the above command.
Comment 8 Alun Peck 2007-02-21 11:10:33 UTC
Created attachment 120274 [details]
dmraid output

Output provided by dmraid -r -vvv -d

Note: the raid logical drive size is 1 TB.
Comment 9 Alun Peck 2007-02-21 11:18:43 UTC
Forgot to tick the option to remove the NEEDINFO status from this bug.
Comment 10 Matthias Koenig 2007-02-21 12:07:32 UTC
Ok, thanks. Next please try the same with the command
dmraid -s -ccc -d
Comment 11 Alun Peck 2007-02-22 04:24:03 UTC
Created attachment 120444 [details]
dmraid output

Using dmraid -s -ccc -d
Comment 12 Matthias Koenig 2007-02-22 09:30:30 UTC
Now let's try the same with the RAID set name as argument, because that seems to be the one that fails according to the YaST log. Please try this:
dmraid -s -ccc -d ddf1_4c534920202020201000005500000000330adc0c00000a28
If this fails, please also provide the output of the strace of this command.
Comment 13 Alun Peck 2007-02-22 10:04:55 UTC
Created attachment 120483 [details]
strace output

Running the above command caused a segmentation fault.
Ran strace -o outputFilename on the above command.
Comment 14 Matthias Koenig 2007-02-28 12:31:36 UTC
It would be very helpful to get more debugging information.
Do you think you could provide a gdb backtrace of the command in comment #12?

Comment 15 Alun Peck 2007-02-28 13:00:49 UTC
Obtained the following response:

-bash: gdb: command not found

How do I boot the installation CD to get this application?
Comment 16 Matthias Koenig 2007-02-28 14:25:11 UTC
Unfortunately gdb and the debuginfo packages are not in the rescue system, so this will be some more work. I am not sure whether you have the resources to do this, but if you want to give it a try, here are some hints:
You would have to use another 10.2 system of the same architecture which has dmraid, the dmraid-debuginfo package and gdb installed, and then mount that system into your rescue system.
I think it should be enough to export /usr and /sbin and mount them into the rescue system at the same locations. It should also be possible to copy /usr and /sbin to a USB stick and mount that.

Independently of the steps above:
Currently I do not have the ability to test the ddf1 metadata format; as soon as this changes, I can try to set up your RAID configuration. Could you please dump your RAID metadata with
dmraid -rD
and provide all resulting files.
Comment 17 Matthias Koenig 2007-02-28 16:44:24 UTC
Ah, please forget the complicated stuff about setting up a debugging environment.
All I need is a core file of the command in comment #12.
You can create this core file by raising the resource limit for the maximum core size,
for example:
ulimit -c 10000
and then running the command. The core file will then be written to the current directory.
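The steps above can be sketched as follows. The dmraid invocation (using the set name from comment #12) is commented out here, since it only makes sense on the affected machine:

```shell
# Raise the per-process core-file size limit for this shell
# (under bash the -c value is in 1024-byte blocks).
ulimit -c 10000
ulimit -c    # confirm the new limit is in effect; prints 10000

# Then run the crashing command from the same shell:
# dmraid -s -ccc -d ddf1_4c534920202020201000005500000000330adc0c00000a28
# ls -l core   # on segfault, the kernel writes "core" to the current directory
```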
Comment 18 Alun Peck 2007-03-01 04:54:03 UTC
Created attachment 121722 [details]
Output of dmraid -rD

No core dump or segmentation fault was generated.
Comment 19 Matthias Koenig 2007-03-01 09:06:48 UTC
Sorry, I think we have a misunderstanding here; maybe my instructions were not clear enough.

1. You need to create the core dump for the command that failed in comment #12:
dmraid -s -ccc -d ddf1_4c534920202020201000005500000000330adc0c00000a28
With this core file I am able to debug dmraid in the state of the failure.

2. dmraid -rD will write files to the current directory containing the metadata.
Please provide these files, not the stdout output of the program.
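Step 2 can be sketched like this. It is a runnable sketch only: the dmraid call is commented out, and two placeholder files (with hypothetical names; dmraid derives the real names from the block devices) stand in for the metadata files it would write:

```shell
# Work in a writable directory, since dmraid -rD dumps into the CWD.
workdir=$(mktemp -d)
cd "$workdir"

# dmraid -rD                      # on the real system: writes the metadata files here
touch sda_ddf1.dat sdb_ddf1.dat   # placeholders standing in for the dumped files

# Bundle everything up for attaching to the bug:
tar czf dmraid-metadata.tar.gz ./*.dat
tar tzf dmraid-metadata.tar.gz    # list the archive contents as a sanity check
```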
Comment 20 Alun Peck 2007-03-01 09:49:57 UTC
Created attachment 121759 [details]
gz of requested files for 2 dmraid commands

Changed my current directory from /root to /var/log, then ran the 2 commands. This time I ended up with a core file (327680 bytes) and 4 sets of sd* files (41926656, 13, 10 bytes each).
Comment 21 Matthias Koenig 2007-03-01 14:18:36 UTC
A major problem is that the RAID10 configuration is currently not supported by dmraid with the DDF1 metadata format. So even if the segfault is fixed, you will not be able to use the RAID in this configuration with dmraid.
Comment 22 Alun Peck 2007-03-02 04:25:39 UTC
If needed, I can fall back to using 2 logical drives in a RAID 1 configuration instead of RAID 10, until such time as RAID 10 is properly supported.
Comment 23 Matthias Koenig 2007-03-06 14:21:48 UTC
Did you try to configure 2 RAIDs in RAID 1 configuration and then try to install?
Does the segfault still happen?
Comment 24 Alun Peck 2007-03-08 04:49:27 UTC
Built 2 RAIDs in RAID 1 configuration. The segfault still occurs.

What data do you require for this segfault with RAID 1?
Comment 25 Matthias Koenig 2007-03-14 14:56:57 UTC
Thanks, there is no more data required.
Comment 26 Alun Peck 2007-04-03 05:18:11 UTC
Could you provide a status update on progress toward supporting at least RAID 1, though RAID 10 is preferred?
Comment 27 Matthias Koenig 2007-04-04 13:28:29 UTC
I am sorry, I currently do not have the resources to debug this further. It will still take some time.
Comment 28 Alun Peck 2007-08-15 06:10:29 UTC
I thought I had posted this reply I received from Intel. Sorry about that.

PS: He still does not have the drivers.

---------------------------
Dear Mr. Peck

Unfortunately we don't have source code for SuSE* Linux* 10 available as yet.
I have requested the drivers but don't have an ETA for when we'll have them available.

Unfortunately the current drivers we have on the web are for Linux* kernels 2.6.16.

If you still have any questions please feel free to contact us.


Kind regards,

Herbert H
Senior Customer Support Engineer · Enterprise Products
Intel® Customer Support (EMEA)
Intel Corporation (UK) Limited
Comment 32 Matthias Koenig 2007-08-22 08:29:15 UTC
I have seen some possibly related segfault issue on the ataraid list.
I got the following response from Intel:

"The S5000VSA board should have BIOS/orom support for ISW metadata, any
reason why DDF is being used?

Jason Gaston and Ying Fang released a patch fixing a segfault issue when
using ISW metadata, perhaps there is a similar issue with DDF.
http://marc.info/?l=ataraid-list&m=118315445123823&w=2"

I am wondering why your RAID is detected as DDF1 format; according to this information it should be ISW format. There has been work by Intel on better ISW support in dmraid.
Can you figure out whether it is possible to configure your RAID in a way that puts it in ISW format? Or can you describe how you configured your RAID and why it is ddf1 and not isw? Do you use the onboard RAID functionality or do you have an additional add-in card?

I got another reply from Intel, which might help:
"If you can risk loosing the array set, you could enable the S5000VSA's
onboard RAID support (assuming that there is not an add-in card being
used).
Buried in the BIOS of the S5000VSA is an option to enable RAID.  Select
Advanced -> ATA Controller Configuration -> set On board SATA Controller
as Enabled -> set SATA Mode as Enhanced -> set Configure SATA as RAID as
Enabled.
This just enables the 'Intel RAID option ROM' which can then be used to
create an ISW RAID set.  Then you can use the ISW segfault patch..."

In this case I could apply the Intel patches, which might resolve this issue.
Comment 33 Stephan Kulow 2007-10-04 12:23:44 UTC
Shall we close this bug? There seems to be little interest.
Comment 34 Stephan Kulow 2007-11-10 16:46:58 UTC
No objections, it seems.