Bugzilla – Bug 113237
XFS oops when trying installation repair
Last modified: 2005-09-05 23:55:46 UTC
When prompted to choose between update and install, I clicked on Other, chose automatic repair, and activated the swap. The repair did not proceed. I tried to reproduce the problem, but it did not occur again.

Here is an extract of dmesg:

md: ... autorun DONE.
end_request: I/O error, dev fd0, sector 0
end_request: I/O error, dev fd0, sector 0
end_request: I/O error, dev fd0, sector 0
end_request: I/O error, dev fd0, sector 0
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Adding 1052216k swap on /dev/hda1. Priority:-1 extents:1
Adding 1052216k swap on /dev/hda1. Priority:-2 extents:1
SGI XFS with ACLs, security attributes, realtime, large block numbers, no debug enabled
SGI XFS Quota Management subsystem
Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c0118c4d
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: xfs exportfs reiserfs ext3 jbd parport_pc parport edd dm_snapshot multipath raid6 raid5 xor raid1 raid0 dm_mod st piix fan thermal processor usb_storage usbhid uhci_hcd usbcore ide_disk ide_cd ide_core sg sr_mod sd_mod scsi_mod cdrom cramfs vfat fat nls_iso8859_1 nls_cp437 af_packet nvram
CPU: 0
EIP: 0060:[<c0118c4d>] Not tainted VLI
EFLAGS: 00210046 (2.6.13-rc6-git7-3-default)
EIP is at try_to_wake_up+0xd/0x90
eax: 00000000 ebx: 00000000 ecx: 00000000 edx: 0000000f
esi: 00200246 edi: 00000000 ebp: c4c21d74 esp: c4c21d68
ds: 007b es: 007b ss: 0068
Process y2base (pid: 2499, threadinfo=c4c20000 task=c1617530)
Stack: 00000000 c3bbafc0 000000c0 0000000b c928d534 c0145f91 00000000 00000000
       00000000 000000d2 00000000 00000180 0000658f 00000006 c4c21e18 00000080
       0000000b c0146f88 000000d2 c035b4c4 0000658e 00000009 00000060 00000003
Call Trace:
 [<c928d534>] xfsbufd_wakeup+0x24/0x30 [xfs]
 [<c0145f91>] shrink_slab+0xa1/0x180
 [<c0146f88>] try_to_free_pages+0xe8/0x1b0
 [<c0140eaf>] __alloc_pages+0x1ef/0x420
 [<c0149fef>] do_wp_page+0x9f/0x2e0
 [<c014aefb>] __handle_mm_fault+0x11b/0x130
 [<c0117497>] do_page_fault+0x127/0x5ef
 [<c016a178>] poll_freewait+0x48/0x60
 [<c016a64a>] do_select+0x30a/0x340
 [<c0117370>] do_page_fault+0x0/0x5ef
 [<c0103f0f>] error_code+0x4f/0x60
 [<c016007b>] bd_acquire+0x2b/0xa0
 [<c01e33de>] __put_user_4+0x12/0x18
 [<c016a8ba>] sys_select+0x21a/0x360
 [<c0102d79>] syscall_call+0x7/0xb
Code: 3c 54 3f c0 55 0f 94 c0 25 ff 00 00 00 89 e5 5d c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 57 89 cf 56 53 89 c3 9c 5e fa <8b> 08 31 c0 85 ca 74 0d 8b 53 28 85 d2 74 14 c7 03 00 00 00 00
Walks like an XFS bug, quacks like an XFS bug, must be an XFS bug :)
SGI, is this enough information for you to fix this?
Created attachment 47757: xfsbufd startup and wakeup fixes
What does this automatic repair choice do? Anyway, this oops means xfsbufd_task is NULL. The only reason I see this could happen is due to a race when the xfsbufd thread is started - xfsbufd_wakeup is called before the child had a chance to run. The patch below switches xfsbufd startup to the kthread infrastructure that avoids this race, and adds some small fixes to xfsbufd_wakeup. Could you spin a kernel with that fix for the reporter? (sorry, this should have gotten out with the attachment, but @#W%#% bugzilla ignores the comment when you're adding an attachment)
http://portal.suse.de/sdb/de/2003/11/YaST-System-Repair.html contains a German description of system repair. This feature most certainly has nothing to do with this bug, though.

It's a bit tricky to test this: system repair is offered during installation only. Can we maybe trigger this with a simple test case? At least a trivial loop didn't fail for me, with or without the fix:

#! /bin/sh
set -v
while :; do
    modprobe xfs
    mount /dev/hda7 /mnt
    dd if=/dev/urandom of=/mnt/random bs=16 count=1
    umount /mnt
    rmmod xfs
done

I saw you submitted this change to the xfs cvs, but without this hunk:

@@ -1744,10 +1743,15 @@
     int                 priority,
     unsigned int        mask)
 {
-    if (xfsbufd_force_sleep)
+    if (xfsbufd_force_sleep || !priority)
         return 0;
+    if (!xfsbufd_task) {
+        WARN_ON(1);
+        return 0;
+    }
+
     xfsbufd_force_flush = 1;
-    barrier();
+    wmb();
     wake_up_process(xfsbufd_task);
     return 0;
 }

Christoph, I think we can just take the fix. Which version should we go with? Thanks.
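As a rough illustration of what the extra hunk guards against, here is a user-space C model of the control flow in the patched xfsbufd_wakeup. Everything prefixed model_ is a hypothetical stand-in invented for this sketch, not a kernel symbol. The real function also takes a mask argument and issues wmb(), and the model omits both. It only shows that a wakeup arriving while the task pointer is still NULL gets counted as a warning instead of being dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel symbols. */
struct model_task {
    int woken;
};

struct model_task *model_task_ptr; /* plays the role of xfsbufd_task */
int model_force_sleep;             /* plays the role of xfsbufd_force_sleep */
int model_force_flush;             /* plays the role of xfsbufd_force_flush */
int model_warnings;                /* counts hits on the WARN_ON(1) branch */

/* Stand-in for wake_up_process(): it uses the task pointer, so passing
 * NULL is a bug. The assert models the oops seen in try_to_wake_up. */
void model_wake_up(struct model_task *t)
{
    assert(t != NULL);
    t->woken = 1;
}

/* Mirrors the branches of the patched hunk quoted above. */
int model_wakeup(int priority)
{
    if (model_force_sleep || !priority)
        return 0;

    if (!model_task_ptr) {
        /* Task not published yet: warn and bail out. */
        model_warnings++;
        return 0;
    }

    model_force_flush = 1;
    model_wake_up(model_task_ptr);
    return 0;
}
```

Calling model_wakeup(1) before model_task_ptr is set bumps model_warnings and returns. Once the pointer is set, the same call marks the task as woken. A call with priority 0 returns early without touching anything.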
The additional fixes shouldn't be necessary. They were just some more defensive programming, but I talked to Nathan and am pretty sure either the diff I checked into CVS or today's "Make sure the threads and shaker in xfs_buf are de-initialized in reverse startup order" checkin should fix the issue.
Okay, I have added the two changesets from the xfs cvs as patches.fixes/xfs-switch-to-kthread-api and patches.fixes/xfs-switch-to-kthread-api-2. Thanks.