Bugzilla – Bug 105377
oops in reiserfs_writepage
Last modified: 2005-09-13 09:19:24 UTC
alfaro (running SLES9, but with a HEAD kernel) just threw this nice oops.

alfano login: general protection fault: 0000 [1] SMP
CPU 0
Modules linked in: freq_table edd autofs4 ipv6 thermal processor fan button battery ac af_packet tg3 i2c_i801 i2c_core ehci_hcd generic uhci_hcd usbcore shpchp pci_hotplug parport_pc lp parport video1394 ohci1394 raw1394 ieee1394 dm_mod reiserfs ata_piix ahci libata piix ide_disk ide_cd ide_core sr_mod cdrom sd_mod scsi_mod
Pid: 384, comm: pdflush Not tainted 2.6.13-rc5-git3-3-smp
RIP: 0010:[<ffffffff880d2748>] <ffffffff880d2748>{:reiserfs:reiserfs_writepage+392}
RSP: 0018:ffff81015f511a38 EFLAGS: 00010246
RAX: 000000008fb6c9fb RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffff81015e0d05b0
RBP: ffff810143397c70 R08: 0000000000001000 R09: 0000000000000019
R10: 0000000000000258 R11: 0000000000000019 R12: ffff81014ba92270
R13: 0000000000000000 R14: ffff810150c22920 R15: 068f4832bb77ee4a
FS: 0000000000000000(0000) GS:ffffffff80588800(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000594170 CR3: 0000000155aab000 CR4: 00000000000006e0
Process pdflush (pid: 384, threadinfo ffff81015f510000, task ffff81015f814760)
Stack: 0000000000000000 ffff81015f511e38 ffff810005566e00 00000000805d2800
       ffff81015e1a6800 000000015f4f80a8 00000000000001cd 000001cc00000000
       00000000001cc001 0000000000000000
Call Trace:<ffffffff8036e217>{thread_return+145} <ffffffff80166888>{find_get_pages_tag+152}
       <ffffffff801b3267>{mpage_writepages+455} <ffffffff880d25c0>{:reiserfs:reiserfs_writepage+0}
       <ffffffff801b189c>{__writeback_single_inode+428} <ffffffff8016d770>{pdflush+0}
       <ffffffff801b1ec2>{generic_sync_sb_inodes+546} <ffffffff8016d770>{pdflush+0}
       <ffffffff80152cc0>{keventd_create_kthread+0} <ffffffff801b220d>{writeback_inodes+125}
       <ffffffff8016cde6>{wb_kupdate+214} <ffffffff8016d8a5>{pdflush+309}
       <ffffffff8016cd10>{wb_kupdate+0} <ffffffff80152f73>{kthread+243}
       <ffffffff80137e30>{schedule_tail+64} <ffffffff8010fa52>{child_rip+8}
       <ffffffff80152cc0>{keventd_create_kthread+0} <ffffffff80152e80>{kthread+0}
       <ffffffff8010fa4a>{child_rip+0}

Code: 41 8b 07 89 c0 a8 02 0f 84 a2 05 00 00 41 8b 07 89 c0 a8 20
RIP <ffffffff880d2748>{:reiserfs:reiserfs_writepage+392} RSP <ffff81015f511a38>

Workload was probably just autobuild/icecream.
While testing with AIM7 on Adams (16-core Opteron) I hit the following lockup too. Two CPUs ran into the same backtrace while spinning on the BKL in reiserfs_setattr. This was a mainline 2.6.13rc6-git3 kernel with some x86-64 patches, but should be near HEAD. Is reiserfs to blame here too? (Same lockup on CPU 11; after that the panic reboot stopped things.)

NMI Watchdog detected LOCKUP on CPU14CPU 14
Modules linked in:
Pid: 19381, comm: reaim Not tainted 2.6.13-rc6-git7
RIP: 0010:[<ffffffff80416989>] <ffffffff80416989>{_spin_lock_irqsave+9}
RSP: 0018:ffff81013be31c40 EFLAGS: 00000002
RAX: 0000000000000000 RBX: ffffffff804bcd20 RCX: ffff81013be30000
RDX: ffff81013ef90000 RSI: ffff81013be0d0b0 RDI: ffffffff804bcd28
RBP: 0000000000000282 R08: ffff81013be30000 R09: 0000000000000002
R10: 00000000ffffffff R11: ffff810180e235e0 R12: ffffffff804bcd28
R13: ffff81013be0d0b0 R14: ffff81013be31c50 R15: 00000000000001ff
FS: 00002aaaaaf3b0a0(0000) GS:ffffffff805f7f00(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaaf1f7e8 CR3: 000000013be2f000 CR4: 00000000000006a0
Process reaim (pid: 19381, threadinfo ffff81013be30000, task ffff81013be0d0b0)
Stack: 0000000000000282 ffffffff80414860 0000000000000001 ffff81013be0d0b0
       ffffffff80131d60 ffff81007d84fc68 ffff81007a8e1c68 ffff81013be31dd8
       ffff81013be31dd8 ffff8100cdfb9cb0
Call Trace:<ffffffff80414860>{__down+160} <ffffffff80131d60>{default_wake_function+0}
       <ffffffff80416789>{__down_failed+53} <ffffffff80416d56>{.text.lock.kernel_lock+25}
       <ffffffff801c385c>{reiserfs_setattr+44} <ffffffff80131d43>{try_to_wake_up+1043}
       <ffffffff80416613>{__down_write+51} <ffffffff8019a574>{notify_change+340}
       <ffffffff8017d011>{do_truncate+65} <ffffffff8018e2c4>{may_open+468}
       <ffffffff8018fc0e>{open_namei+734} <ffffffff80415b05>{thread_return+0}
       <ffffffff8017cc57>{filp_open+39} <ffffffff8017ca0b>{get_unused_fd+219}
       <ffffffff8017ccd4>{sys_open+84} <ffffffff8010d95e>{system_call+126}
The Oops in the description maps to the buffer_dirty check in the mapping loop of reiserfs_write_full_page():

    bh = head;
    block = page->index << (PAGE_CACHE_SHIFT - s->s_blocksize_bits);
    /* first map all the buffers, logging any direct items we find */
    do {
        /*              v----- oops */
        if ((checked || buffer_dirty(bh)) && (!buffer_mapped(bh) ||
            (buffer_mapped(bh) && bh->b_blocknr == 0))) {
            /* not mapped yet, or it points to a direct item, search
             * the btree for the mapping info, and log any direct
             * items found */
            if ((error = map_block_for_writepage(inode, bh, block))) {
                goto fail;
            }
        }
        bh = bh->b_this_page;
        block++;
    } while (bh != head);
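To make the shape of that loop easier to follow, here is a minimal user-space sketch of the same walk. It is not the reiserfs code: `struct toy_bh`, the `TOY_DIRTY`/`TOY_MAPPED` bit positions, and both function names are hypothetical stand-ins, and the real `map_block_for_writepage()` call is replaced by a counter. The point it illustrates is that the buffers of a page form a ring linked through `b_this_page`, so a corrupt pointer anywhere in the ring would be dereferenced by the dirty test at the top of the next iteration, which would be consistent with where the oops points.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head: only the fields the loop touches. */
struct toy_bh {
    unsigned long state;        /* bit flags, models b_state */
    unsigned long blocknr;      /* models b_blocknr */
    struct toy_bh *this_page;   /* circular list, models b_this_page */
};

/* Assumed bit positions for this sketch only; not the kernel's values. */
#define TOY_DIRTY  (1UL << 0)
#define TOY_MAPPED (1UL << 1)

/* First block covered by a page: index << (page shift - block size bits). */
unsigned long first_block_of_page(unsigned long page_index,
                                  unsigned int page_shift,
                                  unsigned int blocksize_bits)
{
    return page_index << (page_shift - blocksize_bits);
}

/*
 * Walk the ring once, starting and ending at head, and count the buffers
 * for which the original loop would call map_block_for_writepage().
 */
int count_buffers_needing_map(struct toy_bh *head, int checked)
{
    struct toy_bh *bh = head;
    int n = 0;

    do {
        int dirty = (bh->state & TOY_DIRTY) != 0;    /* first read through bh */
        int mapped = (bh->state & TOY_MAPPED) != 0;

        if ((checked || dirty) && (!mapped || bh->blocknr == 0))
            n++;
        bh = bh->this_page;                          /* next buffer in the ring */
    } while (bh != head);

    return n;
}
```

As a worked example of the block calculation, assuming a 4 KiB page (shift 12) and 1 KiB blocks (10 bits), page index 3 starts at block 3 << 2 = 12.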
More detail:

    b748:  41 8b 07    mov    (%r15),%eax    # bh->b_state
    b74b:  89 c0       mov    %eax,%eax
    b74d:  a8 02       test   $0x2,%al       # BH_Dirty

r15 contains garbage, definitely not a kernel address: 068f4832bb77ee4a

This looks like memory corruption. Can you try with a more recent kernel? 2.6.13-rc5-git3 is two weeks old already.
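One way to sanity-check a register value like the one in r15 is to test whether it is a canonical x86-64 address at all. The sketch below assumes 48-bit virtual addressing, where bits 63..47 must all be equal; the function names are made up for illustration. A load through a non-canonical pointer raises a general protection fault rather than a page fault, which would fit the "general protection fault" header of the oops.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses (4-level paging). */
#define VA_BITS 48

/* Canonical means bits 63..47 are all zero or all one. */
bool is_canonical(uint64_t addr)
{
    uint64_t upper = addr >> (VA_BITS - 1);   /* the top 17 bits */
    return upper == 0 || upper == 0x1ffff;
}

/* Kernel-half addresses are canonical with the top bit set. */
bool looks_like_kernel_address(uint64_t addr)
{
    return is_canonical(addr) && (addr >> 63) != 0;
}
```

Applied to the register dump in the description, `is_canonical(0x068f4832bb77ee4aULL)` is false for R15, while RDI (`0xffff81015e0d05b0ULL`) passes `looks_like_kernel_address()`.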
The machine has been running rc7 fine for some days, but I haven't retried the AIM7 stress test yet.
Reproduced lots of deadlocks (tracked in another bug), but not the memory corruption. So it might have been a one-off.