Bug 105377 - oops in reiserfs_writepage
Summary: oops in reiserfs_writepage
Status: RESOLVED INVALID
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Kernel
Version: Beta 1
Hardware: Other All
Priority: P5 - None, Severity: Normal
Target Milestone: ---
Assignee: Chris L Mason
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-08-18 00:36 UTC by Andreas Kleen
Modified: 2005-09-13 09:19 UTC

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description Andreas Kleen 2005-08-18 00:36:13 UTC
alfano (running SLES9, but with a HEAD kernel) just threw this nice oops.

alfano login: general protection fault: 0000 [1] SMP
CPU 0
Modules linked in: freq_table edd autofs4 ipv6 thermal processor fan button
battery ac af_packet tg3 i2c_i801 i2c_core ehci_hcd generic uhci_hcd usbcore
shpchp pci_hotplug parport_pc lp parport video1394 ohci1394 raw1394 ieee1394
dm_mod reiserfs ata_piix ahci libata piix ide_disk ide_cd ide_core sr_mod cdrom
sd_mod scsi_mod
Pid: 384, comm: pdflush Not tainted 2.6.13-rc5-git3-3-smp
RIP: 0010:[<ffffffff880d2748>] <ffffffff880d2748>{:reiserfs:reiserfs_writepage+392}
RSP: 0018:ffff81015f511a38  EFLAGS: 00010246
RAX: 000000008fb6c9fb RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffff81015e0d05b0
RBP: ffff810143397c70 R08: 0000000000001000 R09: 0000000000000019
R10: 0000000000000258 R11: 0000000000000019 R12: ffff81014ba92270
R13: 0000000000000000 R14: ffff810150c22920 R15: 068f4832bb77ee4a
FS:  0000000000000000(0000) GS:ffffffff80588800(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000594170 CR3: 0000000155aab000 CR4: 00000000000006e0
Process pdflush (pid: 384, threadinfo ffff81015f510000, task ffff81015f814760)
Stack: 0000000000000000 ffff81015f511e38 ffff810005566e00 00000000805d2800
       ffff81015e1a6800 000000015f4f80a8 00000000000001cd 000001cc00000000
       00000000001cc001 0000000000000000
Call Trace:<ffffffff8036e217>{thread_return+145} <ffffffff80166888>{find_get_pages_tag+152}
       <ffffffff801b3267>{mpage_writepages+455} <ffffffff880d25c0>{:reiserfs:reiserfs_writepage+0}
       <ffffffff801b189c>{__writeback_single_inode+428} <ffffffff8016d770>{pdflush+0}
       <ffffffff801b1ec2>{generic_sync_sb_inodes+546} <ffffffff8016d770>{pdflush+0}
       <ffffffff80152cc0>{keventd_create_kthread+0} <ffffffff801b220d>{writeback_inodes+125}
       <ffffffff8016cde6>{wb_kupdate+214} <ffffffff8016d8a5>{pdflush+309}
       <ffffffff8016cd10>{wb_kupdate+0} <ffffffff80152f73>{kthread+243}
       <ffffffff80137e30>{schedule_tail+64} <ffffffff8010fa52>{child_rip+8}
       <ffffffff80152cc0>{keventd_create_kthread+0} <ffffffff80152e80>{kthread+0}
       <ffffffff8010fa4a>{child_rip+0}

Code: 41 8b 07 89 c0 a8 02 0f 84 a2 05 00 00 41 8b 07 89 c0 a8 20
RIP <ffffffff880d2748>{:reiserfs:reiserfs_writepage+392} RSP <ffff81015f511a38>

Workload was probably just autobuild/icecream.
Comment 1 Andreas Kleen 2005-08-18 03:55:31 UTC
While testing on AIM7 on Adams (16 core Opteron) I hit the following 
lockup too. Two CPUs ran into the same backtrace while spinning on the BKL
in reiserfs_setattr. This was a mainline 2.6.13rc6-git3 kernel with some x86-64
patches, but should be near HEAD.

reiserfs to blame too?  

(same lockup on CPU 11, after that panic reboot stopped things)
NMI Watchdog detected LOCKUP on CPU14
CPU 14
Modules linked in:
Pid: 19381, comm: reaim Not tainted 2.6.13-rc6-git7
RIP: 0010:[<ffffffff80416989>] <ffffffff80416989>{_spin_lock_irqsave+9}
RSP: 0018:ffff81013be31c40  EFLAGS: 00000002
RAX: 0000000000000000 RBX: ffffffff804bcd20 RCX: ffff81013be30000
RDX: ffff81013ef90000 RSI: ffff81013be0d0b0 RDI: ffffffff804bcd28
RBP: 0000000000000282 R08: ffff81013be30000 R09: 0000000000000002
R10: 00000000ffffffff R11: ffff810180e235e0 R12: ffffffff804bcd28
R13: ffff81013be0d0b0 R14: ffff81013be31c50 R15: 00000000000001ff
FS:  00002aaaaaf3b0a0(0000) GS:ffffffff805f7f00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaaf1f7e8 CR3: 000000013be2f000 CR4: 00000000000006a0
Process reaim (pid: 19381, threadinfo ffff81013be30000, task ffff81013be0d0b0)
Stack: 0000000000000282 ffffffff80414860 0000000000000001 ffff81013be0d0b0
       ffffffff80131d60 ffff81007d84fc68 ffff81007a8e1c68 ffff81013be31dd8
       ffff81013be31dd8 ffff8100cdfb9cb0
Call Trace:<ffffffff80414860>{__down+160} <ffffffff80131d60>{default_wake_function+0}
       <ffffffff80416789>{__down_failed+53} <ffffffff80416d56>{.text.lock.kernel_lock+25}
       <ffffffff801c385c>{reiserfs_setattr+44} <ffffffff80131d43>{try_to_wake_up+1043}
       <ffffffff80416613>{__down_write+51} <ffffffff8019a574>{notify_change+340}
       <ffffffff8017d011>{do_truncate+65} <ffffffff8018e2c4>{may_open+468}
       <ffffffff8018fc0e>{open_namei+734} <ffffffff80415b05>{thread_return+0}
       <ffffffff8017cc57>{filp_open+39} <ffffffff8017ca0b>{get_unused_fd+219}
       <ffffffff8017ccd4>{sys_open+84} <ffffffff8010d95e>{system_call+126}

Comment 2 Jeff Mahoney 2005-08-18 17:39:54 UTC
The Oops in the description maps to the buffer_dirty check in the mapping loop
of reiserfs_write_full_page():

        bh = head;
        block = page->index << (PAGE_CACHE_SHIFT - s->s_blocksize_bits);
        /* first map all the buffers, logging any direct items we find */
        do {
                                /* v----- oops */
                if ((checked || buffer_dirty(bh)) && (!buffer_mapped(bh) ||
                                                      (buffer_mapped(bh)
                                                       && bh->b_blocknr ==
                                                       0))) {
                        /* not mapped yet, or it points to a direct item, search
                         * the btree for the mapping info, and log any direct
                         * items found
                         */
                        if ((error = map_block_for_writepage(inode, bh, block))) {
                                goto fail;
                        }
                }
                bh = bh->b_this_page;
                block++;
        } while (bh != head);
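
For readers who do not know the buffer-head ring, here is a rough userspace sketch of what the loop above does. It is not the kernel code: struct mock_bh, MOCK_DIRTY, link_ring(), count_needing_map() and first_block() are hypothetical names made up for illustration, and only the 0x2 dirty mask is taken from the disassembly in this report. The walk dereferences the current pointer before every test, so a single corrupted link faults on the dirty check of the next iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head: only what the loop touches. */
struct mock_bh {
    unsigned long state;        /* bit mask 0x2 = dirty (assumed from this report) */
    int mapped;                 /* stands in for buffer_mapped(bh) */
    unsigned long blocknr;      /* stands in for bh->b_blocknr */
    struct mock_bh *this_page;  /* circular link, like b_this_page */
};

#define MOCK_DIRTY 0x2UL

/* Link an array of n buffers into a ring, as the buffers of one page are. */
static void link_ring(struct mock_bh *v, size_t n)
{
    assert(v != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        v[i].this_page = &v[(i + 1) % n];
}

/* First block number covered by a page: index << (page_shift - block_bits). */
static unsigned long first_block(unsigned long page_index,
                                 unsigned page_shift, unsigned block_bits)
{
    assert(page_shift >= block_bits);
    return page_index << (page_shift - block_bits);
}

/* Count the buffers the loop would pass to the mapping helper. */
static int count_needing_map(struct mock_bh *head, int checked)
{
    struct mock_bh *bh = head;
    int n = 0;

    assert(head != NULL);
    do {
        /* First access to *bh in each iteration; this is where a bad link faults. */
        int dirty = (bh->state & MOCK_DIRTY) != 0;

        /* Same truth table as the original condition:
         * !mapped || (mapped && blocknr == 0)  ==  !mapped || blocknr == 0 */
        if ((checked || dirty) && (!bh->mapped || bh->blocknr == 0))
            n++;
        bh = bh->this_page;
    } while (bh != head);
    return n;
}
```

With a 4096-byte page and 1024-byte blocks (shift 12, block bits 10), the page at index 3 starts at block 12 and its ring holds four buffers.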
Comment 3 Jeff Mahoney 2005-08-18 18:17:10 UTC
More detail:
    b748:       41 8b 07                mov    (%r15),%eax # bh->b_state
    b74b:       89 c0                   mov    %eax,%eax
    b74d:       a8 02                   test   $0x2,%al    # BH_Dirty

r15 contains garbage, definitely not a kernel address: 068f4832bb77ee4a

This looks like memory corruption. Can you try with a more recent kernel?
2.6.13-rc5-git3 is two weeks old already.
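
A quick way to sanity-check register values like this one is a small userspace sketch. It assumes 48-bit virtual addresses (4-level paging), and the helper names is_canonical48() and looks_like_kernel_pointer() are made up for illustration. On x86-64, dereferencing a non-canonical address raises a general protection fault rather than a page fault, which would be consistent with the "general protection fault" line in the oops.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses, so bits 63..47 must all be equal. */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;  /* the 17 bits 63..47 */

    return top == 0 || top == 0x1ffff;
}

/* Upper canonical half, where kernel addresses such as ffff81... live. */
static bool looks_like_kernel_pointer(uint64_t addr)
{
    return (addr >> 47) == 0x1ffff;
}
```

Applied to the oops in the description, the R15 value 068f4832bb77ee4a fails both checks, while RBP (ffff810143397c70) and RSP (ffff81015f511a38) pass both.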
Comment 4 Andreas Kleen 2005-08-25 18:29:13 UTC
The machine has been running rc7 fine for some days, but I haven't retried
the AIM7 stress test yet.
Comment 5 Andreas Kleen 2005-09-13 09:19:24 UTC
Reproduced lots of deadlocks (tracked in another bug), but not the memory
corruption. So it might have been a one-off.