Bug 115877 - libata crashes when starting xcdroast
Summary: libata crashes when starting xcdroast
Status: RESOLVED INVALID
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Kernel (show other bugs)
Version: Beta 4 Plus
Hardware: x86-64 All
: P5 - None : Critical
Target Milestone: ---
Assignee: Jens Axboe
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-08 14:33 UTC by Hannes Reinecke
Modified: 2006-08-14 05:55 UTC (History)
1 user (show)

See Also:
Found By: Development
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
screenlog.0 (17.29 KB, text/plain)
2005-09-09 08:43 UTC, Hannes Reinecke
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Hannes Reinecke 2005-09-08 14:33:13 UTC
And just as we thought we had it fixed:

Kernel BUG at "arch/x86_64/kernel/pci-gart.c":537
invalid operand: 0000 [1] SMP
CPU 2
Modules linked in: st freq_table ipv6 pcc_acpi sony_acpi thermal processor fan
button iscsi_tcp battery ac scsi_transport_iscsi edd af_packet floppy aic79xx
scsi_transport_spi e1000 i2c_amd8111 ohci_hcd usbcore i2c_amd756 i2c_core
hw_random generic shpchp pci_hotplug parport_pc lp parport dm_mod reiserfs
amd74xx sr_mod cdrom sg sata_sil libata sd_mod scsi_mod ide_disk ide_core
Pid: 7652, comm: cdrecord Tainted: G     U 2.6.13-9-smp
RIP: 0010:[<ffffffff801215b8>] <ffffffff801215b8>{dma_map_sg+248}
RSP: 0018:ffff81003d3358e8  EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff81003fe0b0c8 RCX: 000000007feee000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff81003f5a1428 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000019a28 R11: ffffffff802b9930 R12: ffff81003fe0b0c0
R13: 0000000000000001 R14: ffff81004169e870 R15: ffff81003fe0b0a8
FS:  00002aaaaaef1e40(0000) GS:ffffffff8050e900(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaaac1000 CR3: 000000003d32f000 CR4: 00000000000006e0
Process cdrecord (pid: 7652, threadinfo ffff81003d334000, task ffff81007fd40130)
Stack: 0000000000000003 0000000200000002 0000000200000086 0000000000000000
       000000003edb1ef8 ffff81003f5a1958 ffff81003f5a1428 ffff81003fe0b0c0
       ffff81003f5a9000 ffff81003f5a19f8
Call Trace:<ffffffff8806c228>{:libata:ata_qc_issue+536}
<ffffffff88070d60>{:libata:atapi_xlat+0}
       <ffffffff88070f20>{:libata:ata_scsi_translate+208}
<ffffffff880359b0>{:scsi_mod:scsi_done+0}
       <ffffffff880359b0>{:scsi_mod:scsi_done+0}
<ffffffff880712f5>{:libata:ata_scsi_queuecmd+293}
       <ffffffff88035bf4>{:scsi_mod:scsi_dispatch_cmd+548}
       <ffffffff8803ca69>{:scsi_mod:scsi_request_fn+1097}
<ffffffff802aeab8>{generic_unplug_device+24}
       <ffffffff802aeb97>{blk_execute_rq+199}
<ffffffff8019115f>{bio_phys_segments+15} 
       <ffffffff802b41ff>{sg_io+479} <ffffffff802b4688>{scsi_cmd_ioctl+872}
       <ffffffff801a6ae1>{__d_lookup+257} <ffffffff8809e228>{:cdrom:cdrom_ioctl+72}
       <ffffffff802b2440>{blkdev_driver_ioctl+112}
<ffffffff802b2bb6>{blkdev_ioctl+1718}
       <ffffffff8035c5c6>{thread_return+0} <ffffffff8035c618>{thread_return+82}
       <ffffffff8018014f>{free_pages_and_swap_cache+127}
<ffffffff8017a8ba>{unmap_region+362}
       <ffffffff80145139>{lock_timer_base+41} <ffffffff80193d0b>{block_ioctl+27}
       <ffffffff801a03cf>{do_ioctl+47} <ffffffff801a06eb>{vfs_ioctl+683}
       <ffffffff801a0796>{sys_ioctl+118} <ffffffff8010ed7e>{system_call+126}


Code: 0f 0b a3 60 b0 37 80 ff ff ff ff c2 19 02 49 8b 86 c0 01 00
RIP <ffffffff801215b8>{dma_map_sg+248} RSP <ffff81003d3358e8>

just starting up xcdroast did the trick here ...
Comment 1 Hannes Reinecke 2005-09-08 14:37:13 UTC
The bug is here:

		struct scatterlist *s = &sg[i];
		dma_addr_t addr = page_to_phys(s->page) + s->offset;
		s->dma_address = addr;
		BUG_ON(s->length == 0); 

So appearently the lenght of the scatterlist is 0. Hmm.
Comment 2 Hannes Reinecke 2005-09-09 06:39:06 UTC
Upped severity.
Comment 3 Hannes Reinecke 2005-09-09 08:43:21 UTC
Created attachment 49339 [details]
screenlog.0

screenlog with debug patch.
Comment 4 Hannes Reinecke 2005-09-09 08:43:38 UTC
As I thought: padding is to blame.
Comment 5 Jens Axboe 2005-09-09 08:57:27 UTC
Yeah, that was my suspicion as well. Thanks for doing the debug run, Hannes.
Comment 6 Hannes Reinecke 2005-09-09 09:05:07 UTC
It's getting even better:

ENTER(ata2): elems 1 length 2

just before the crash. elems is qc->n_elem, len is lsg->length.
Comment 7 Jens Axboe 2005-09-09 09:11:33 UTC
Let me kill the alignment patch for now, I'm not even convinced it is strictly
necessary. If it is, a better fix for now would be to make sure that unaligned
transfers just use pio instead of dma. This is what ide-cd has always done. The
padding code is horrible.
Comment 8 Jens Axboe 2005-09-09 09:12:16 UTC
Hannes, wrt commen $6, there's nothing wrong with that. 1 sg element, 2 bytes in
length. The problem is that the padding code always pads down, which of course
breaks when lenght < sizeof(word).
Comment 9 Hannes Reinecke 2005-09-09 09:14:33 UTC
Yeah, that's what kept me wondering, too.
The comment says it tries to pad upwards, but the code seems to imply different.
But it complicated as hell, so we should get rid of it.
Comment 10 Jens Axboe 2005-09-09 09:15:25 UTC
It is indeed a horrible piece of code. I'm killing it now and readjusting the
following patches. We should do a trial run with a new kernel afterwards.
Comment 11 Jens Axboe 2005-09-09 09:28:13 UTC
It's committed now, Hannes can you do a test with current HEAD?
Comment 12 Hannes Reinecke 2005-09-09 09:40:15 UTC
Testing here with patched removed locally (faster spin-around time :-)
Comment 13 Hannes Reinecke 2005-09-09 09:53:57 UTC
xcdroast started up flawlessly. Trying to burn a CD now.
Comment 14 Hannes Reinecke 2005-09-09 12:15:58 UTC
Doesn't work:

g40:/tmp # cdrecord dev=ATAPI:0,0,0 k3b_image.img -tao
cdrecord: No write mode specified.
cdrecord: Asuming -tao mode.
cdrecord: Future versions of cdrecord may have different drive dependent defaults.
cdrecord: Continuing in 5 seconds...
Cdrecord-Clone 2.01 (x86_64-suse-linux) Copyright (C) 1995-2004 Jörg Schilling
Note: This version is an unofficial (modified) version
Note: and therefore may have bugs that are not present in the original.
Note: Please send bug reports or support requests to http://www.suse.de/feedback
Note: The author of cdrecord should not be bothered with problems in this version.
scsidev: 'ATAPI:0,0,0'
devname: 'ATAPI'
scsibus: 0 target: 0 lun: 0
Warning: Using ATA Packet interface.
Warning: The related Linux kernel interface code seems to be unmaintained.
Warning: There is absolutely NO DMA, operations thus are slow.
Using libscg version 'schily-0.8'.
cdrecord: Warning: controller creates hard SCSI failure when retrieving CD
capabilities page.
Device type    : Removable CD-ROM
Version        : 5
Response Format: 2
Capabilities   : 
Vendor_info    : 'PLEXTOR '
Identifikation : 'DVDR   PX-712A  '
Revision       : '1.03'
Device seems to be: Generic mmc2 DVD-R/DVD-RW.
cdrecord: This version of cdrecord does not include DVD-R/DVD-RW support code.
cdrecord: If you need DVD-R/DVD-RW support, ask the Author for cdrecord-ProDVD.
cdrecord: Free test versions and free keys for personal use are at
ftp://ftp.berlios.de/pub/cdrecord/ProDVD/
Using generic SCSI-3/mmc   CD-R/CD-RW driver (mmc_cdr).
Driver flags   : MMC-3 SWABAUDIO BURNFREE VARIREC FORCESPEED SPEEDREAD
SINGLESESSION HIDECDR 
Supported modes: TAO PACKET SAO SAO/R96P SAO/R96R RAW/R16 RAW/R96P RAW/R96R
cdrecord: Input/output error. read disk info: scsi sendcmd: no error
CDB:  51 00 00 00 00 00 00 00 22 00
status: 0x2 (CHECK CONDITION)
Sense Bytes: 70 00 00 00 00 00 00 0A 00 00 00 00 00 00 00 00 00 00
Sense Key: 0x0 No Additional Sense, Segment 0
Sense Code: 0x00 Qual 0x00 (no additional sense information) Fru 0x0
Sense flags: Blk 0 (not valid) 
cmd finished after 599.966s timeout 240s
cdrecord: Cannot get disk type.
Comment 15 Hannes Reinecke 2005-09-09 13:08:44 UTC
And this with some SCSI logging enabled:

g40:~ # scsi <1:0:0:0> done TIMEOUT        0 scsi1 : destination target 0, lun 0
        command: Mode Sense (10): 5a 00 2a 00 00 00 00 00 02 00
Error handler scsi_eh_1 waking up
SCSI error : <1 0 0 0> return code = 0x8000002
sr0: Current: sense key: No Sense
    Additional sense: No additional sense information
Error handler scsi_eh_1 sleeping
scsi <1:0:0:0> done FAILED         2 scsi1 : destination target 0, lun 0
        command: Read DVD structure: ad 00 00 00 00 00 00 00 00 20 00 00
Unrecognized sense data (in hex):
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Error handler scsi_eh_1 waking up
Error handler scsi_eh_1 sleeping
scsi <1:0:0:0> done FAILED         2 scsi1 : destination target 0, lun 0
        command: Mode Sense (10): 5a 00 30 00 00 00 00 00 08 00
Unrecognized sense data (in hex):
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Error handler scsi_eh_1 waking up
Error handler scsi_eh_1 sleeping
scsi: unknown opcode 0xe9
scsi: unknown opcode 0xed
scsi: unknown opcode 0x01
scsi <1:0:0:0> done TIMEOUT        0 scsi1 : destination target 0, lun 0
        command: Xpwrite, Read disk info: 51 00 00 00 00 00 00 00 22 00
Error handler scsi_eh_1 waking up
SCSI error : <1 0 0 0> return code = 0x8000002
sr0: Current: sense key: No Sense
    Additional sense: No additional sense information
Error handler scsi_eh_1 sleeping
scsi <1:0:0:0> done FAILED         2 scsi1 : destination target 0, lun 0
        command: Mode Select (10): 55 10 00 00 00 00 00 00 10 00
Unrecognized sense data (in hex):
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Error handler scsi_eh_1 waking up
Error handler scsi_eh_1 sleeping

Bastard. I thought we had that one fixed.
Comment 16 Hannes Reinecke 2005-09-09 14:13:54 UTC
Well, seems to be a red herring.
The sense data is retrieved correctly in ata_qc_timeout().

This is from the debug output of cdrecord:

Executing 'mode select g1' command on Bus 0 Target 0, Lun 0 timeout 40s
CDB:  55 10 00 00 00 00 00 00 10 00
Sending 16 (0x10) bytes of data.
Write Data:  00 00 00 00 00 00 00 08 00 00 00 00 00 00 08 00
cdrecord: Input/output error. mode select g1: scsi sendcmd: no error
CDB:  55 10 00 00 00 00 00 00 10 00
status: 0x2 (CHECK CONDITION)
Sense Bytes: 70 00 05 00 00 00 00 0A 00 00 00 00 26 00 00 00 00 00
Sense Key: 0x5 Illegal Request, Segment 0
Sense Code: 0x26 Qual 0x00 (invalid field in parameter list) Fru 0x0
Sense flags: Blk 0 (not valid) 
Comment 17 Greg Kroah-Hartman 2006-08-14 05:55:50 UTC
Closing this, as it is quite stale.