Bugzilla – Bug 115877
libata crashes when starting xcdroast
Last modified: 2006-08-14 05:55:50 UTC
And just as we thought we had it fixed: Kernel BUG at "arch/x86_64/kernel/pci-gart.c":537 invalid operand: 0000 [1] SMP CPU 2 Modules linked in: st freq_table ipv6 pcc_acpi sony_acpi thermal processor fan button iscsi_tcp battery ac scsi_transport_iscsi edd af_packet floppy aic79xx scsi_transport_spi e1000 i2c_amd8111 ohci_hcd usbcore i2c_amd756 i2c_core hw_random generic shpchp pci_hotplug parport_pc lp parport dm_mod reiserfs amd74xx sr_mod cdrom sg sata_sil libata sd_mod scsi_mod ide_disk ide_core Pid: 7652, comm: cdrecord Tainted: G U 2.6.13-9-smp RIP: 0010:[<ffffffff801215b8>] <ffffffff801215b8>{dma_map_sg+248} RSP: 0018:ffff81003d3358e8 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff81003fe0b0c8 RCX: 000000007feee000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff81003f5a1428 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000019a28 R11: ffffffff802b9930 R12: ffff81003fe0b0c0 R13: 0000000000000001 R14: ffff81004169e870 R15: ffff81003fe0b0a8 FS: 00002aaaaaef1e40(0000) GS:ffffffff8050e900(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002aaaaaac1000 CR3: 000000003d32f000 CR4: 00000000000006e0 Process cdrecord (pid: 7652, threadinfo ffff81003d334000, task ffff81007fd40130) Stack: 0000000000000003 0000000200000002 0000000200000086 0000000000000000 000000003edb1ef8 ffff81003f5a1958 ffff81003f5a1428 ffff81003fe0b0c0 ffff81003f5a9000 ffff81003f5a19f8 Call Trace:<ffffffff8806c228>{:libata:ata_qc_issue+536} <ffffffff88070d60>{:libata:atapi_xlat+0} <ffffffff88070f20>{:libata:ata_scsi_translate+208} <ffffffff880359b0>{:scsi_mod:scsi_done+0} <ffffffff880359b0>{:scsi_mod:scsi_done+0} <ffffffff880712f5>{:libata:ata_scsi_queuecmd+293} <ffffffff88035bf4>{:scsi_mod:scsi_dispatch_cmd+548} <ffffffff8803ca69>{:scsi_mod:scsi_request_fn+1097} <ffffffff802aeab8>{generic_unplug_device+24} <ffffffff802aeb97>{blk_execute_rq+199} <ffffffff8019115f>{bio_phys_segments+15} <ffffffff802b41ff>{sg_io+479} <ffffffff802b4688>{scsi_cmd_ioctl+872} <ffffffff801a6ae1>{__d_lookup+257} <ffffffff8809e228>{:cdrom:cdrom_ioctl+72} <ffffffff802b2440>{blkdev_driver_ioctl+112} <ffffffff802b2bb6>{blkdev_ioctl+1718} <ffffffff8035c5c6>{thread_return+0} <ffffffff8035c618>{thread_return+82} <ffffffff8018014f>{free_pages_and_swap_cache+127} <ffffffff8017a8ba>{unmap_region+362} <ffffffff80145139>{lock_timer_base+41} <ffffffff80193d0b>{block_ioctl+27} <ffffffff801a03cf>{do_ioctl+47} <ffffffff801a06eb>{vfs_ioctl+683} <ffffffff801a0796>{sys_ioctl+118} <ffffffff8010ed7e>{system_call+126} Code: 0f 0b a3 60 b0 37 80 ff ff ff ff c2 19 02 49 8b 86 c0 01 00 RIP <ffffffff801215b8>{dma_map_sg+248} RSP <ffff81003d3358e8> just starting up xcdroast did the trick here ...
The bug is here: struct scatterlist *s = &sg[i]; dma_addr_t addr = page_to_phys(s->page) + s->offset; s->dma_address = addr; BUG_ON(s->length == 0); So appearently the lenght of the scatterlist is 0. Hmm.
Upped severity.
Created attachment 49339 [details] screenlog.0 screenlog with debug patch.
As I thought: padding is to blame.
Yeah, that was my suspicion as well. Thanks for doing the debug run, Hannes.
It's getting even better: ENTER(ata2): elems 1 length 2 just before the crash. elems is qc->n_elem, len is lsg->length.
Let me kill the alignment patch for now, I'm not even convinced it is strictly necessary. If it is, a better fix for now would be to make sure that unaligned transfers just use pio instead of dma. This is what ide-cd has always done. The padding code is horrible.
Hannes, wrt commen $6, there's nothing wrong with that. 1 sg element, 2 bytes in length. The problem is that the padding code always pads down, which of course breaks when lenght < sizeof(word).
Yeah, that's what kept me wondering, too. The comment says it tries to pad upwards, but the code seems to imply different. But it complicated as hell, so we should get rid of it.
It is indeed a horrible piece of code. I'm killing it now and readjusting the following patches. We should do a trial run with a new kernel afterwards.
It's committed now, Hannes can you do a test with current HEAD?
Testing here with patched removed locally (faster spin-around time :-)
xcdroast started up flawlessly. Trying to burn a CD now.
Doesn't work: g40:/tmp # cdrecord dev=ATAPI:0,0,0 k3b_image.img -tao cdrecord: No write mode specified. cdrecord: Asuming -tao mode. cdrecord: Future versions of cdrecord may have different drive dependent defaults. cdrecord: Continuing in 5 seconds... Cdrecord-Clone 2.01 (x86_64-suse-linux) Copyright (C) 1995-2004 Jörg Schilling Note: This version is an unofficial (modified) version Note: and therefore may have bugs that are not present in the original. Note: Please send bug reports or support requests to http://www.suse.de/feedback Note: The author of cdrecord should not be bothered with problems in this version. scsidev: 'ATAPI:0,0,0' devname: 'ATAPI' scsibus: 0 target: 0 lun: 0 Warning: Using ATA Packet interface. Warning: The related Linux kernel interface code seems to be unmaintained. Warning: There is absolutely NO DMA, operations thus are slow. Using libscg version 'schily-0.8'. cdrecord: Warning: controller creates hard SCSI failure when retrieving CD capabilities page. Device type : Removable CD-ROM Version : 5 Response Format: 2 Capabilities : Vendor_info : 'PLEXTOR ' Identifikation : 'DVDR PX-712A ' Revision : '1.03' Device seems to be: Generic mmc2 DVD-R/DVD-RW. cdrecord: This version of cdrecord does not include DVD-R/DVD-RW support code. cdrecord: If you need DVD-R/DVD-RW support, ask the Author for cdrecord-ProDVD. cdrecord: Free test versions and free keys for personal use are at ftp://ftp.berlios.de/pub/cdrecord/ProDVD/ Using generic SCSI-3/mmc CD-R/CD-RW driver (mmc_cdr). Driver flags : MMC-3 SWABAUDIO BURNFREE VARIREC FORCESPEED SPEEDREAD SINGLESESSION HIDECDR Supported modes: TAO PACKET SAO SAO/R96P SAO/R96R RAW/R16 RAW/R96P RAW/R96R cdrecord: Input/output error. read disk info: scsi sendcmd: no error CDB: 51 00 00 00 00 00 00 00 22 00 status: 0x2 (CHECK CONDITION) Sense Bytes: 70 00 00 00 00 00 00 0A 00 00 00 00 00 00 00 00 00 00 Sense Key: 0x0 No Additional Sense, Segment 0 Sense Code: 0x00 Qual 0x00 (no additional sense information) Fru 0x0 Sense flags: Blk 0 (not valid) cmd finished after 599.966s timeout 240s cdrecord: Cannot get disk type.
And this with some SCSI logging enabled: g40:~ # scsi <1:0:0:0> done TIMEOUT 0 scsi1 : destination target 0, lun 0 command: Mode Sense (10): 5a 00 2a 00 00 00 00 00 02 00 Error handler scsi_eh_1 waking up SCSI error : <1 0 0 0> return code = 0x8000002 sr0: Current: sense key: No Sense Additional sense: No additional sense information Error handler scsi_eh_1 sleeping scsi <1:0:0:0> done FAILED 2 scsi1 : destination target 0, lun 0 command: Read DVD structure: ad 00 00 00 00 00 00 00 00 20 00 00 Unrecognized sense data (in hex): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Error handler scsi_eh_1 waking up Error handler scsi_eh_1 sleeping scsi <1:0:0:0> done FAILED 2 scsi1 : destination target 0, lun 0 command: Mode Sense (10): 5a 00 30 00 00 00 00 00 08 00 Unrecognized sense data (in hex): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Error handler scsi_eh_1 waking up Error handler scsi_eh_1 sleeping scsi: unknown opcode 0xe9 scsi: unknown opcode 0xed scsi: unknown opcode 0x01 scsi <1:0:0:0> done TIMEOUT 0 scsi1 : destination target 0, lun 0 command: Xpwrite, Read disk info: 51 00 00 00 00 00 00 00 22 00 Error handler scsi_eh_1 waking up SCSI error : <1 0 0 0> return code = 0x8000002 sr0: Current: sense key: No Sense Additional sense: No additional sense information Error handler scsi_eh_1 sleeping scsi <1:0:0:0> done FAILED 2 scsi1 : destination target 0, lun 0 command: Mode Select (10): 55 10 00 00 00 00 00 00 10 00 Unrecognized sense data (in hex): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Error handler scsi_eh_1 waking up Error handler scsi_eh_1 sleeping Bastard. I thought we had that one fixed.
Well, seems to be a red herring. The sense data is retrieved correctly in ata_qc_timeout(). This is from the debug output of cdrecord: Executing 'mode select g1' command on Bus 0 Target 0, Lun 0 timeout 40s CDB: 55 10 00 00 00 00 00 00 10 00 Sending 16 (0x10) bytes of data. Write Data: 00 00 00 00 00 00 00 08 00 00 00 00 00 00 08 00 cdrecord: Input/output error. mode select g1: scsi sendcmd: no error CDB: 55 10 00 00 00 00 00 00 10 00 status: 0x2 (CHECK CONDITION) Sense Bytes: 70 00 05 00 00 00 00 0A 00 00 00 00 26 00 00 00 00 00 Sense Key: 0x5 Illegal Request, Segment 0 Sense Code: 0x26 Qual 0x00 (invalid field in parameter list) Fru 0x0 Sense flags: Blk 0 (not valid)
Closing this, as it is quite stale.