Bug 145204

Summary: dmapi seems to break xfs
Product: [openSUSE] SUSE Linux 10.1
Reporter: Andreas Gruenbacher <agruen>
Component: Kernel
Assignee: Forgotten User aHtZ2osk0j <forgotten_aHtZ2osk0j>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Blocker
Priority: P5 - None
CC: adrian.schroeter, aj, asklein, bobo, bugproxy, dgc, forgotten_4Cp5OYkKcG, forgotten_mbQyAD5r4K, gholmer, gp, jengelh, kuenne, nathans, suse-beta, susedev
Version: Beta 2
Target Milestone: ---
Hardware: Other
OS: Other
See Also: http://bugworks.engr.sgi.com/query.cgi/948724
Whiteboard:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Bug Depends on:
Bug Blocks: 144595
Attachments: NULL pointer dereference in xfs_buf_rele
Possible fix for xfs_buf_rele panic

Description Andreas Gruenbacher 2006-01-24 15:22:57 UTC
We have the following two issues with xfs right now:

- xfs.ko currently has a dependency on dmapi.ko and exportfs.ko. At least
  the dependency on dmapi.ko is definitely wrong. Can this please be fixed?

- When trying to mount an xfs filesystem, we get a NULL pointer
  dereference attempt in xfs_buf_rele.

This is with the 2.6.16-rc1-git3-3-default kernel from SUSE Linux 10.1 Beta 2. Can you please look into these problems?
Comment 1 Andreas Gruenbacher 2006-01-24 15:23:39 UTC
Created attachment 64766 [details]
NULL pointer dereference in xfs_buf_rele
Comment 2 Chris L Mason 2006-01-24 15:39:38 UTC
AJ FYI -> blocker
Comment 3 Jan Engelhardt 2006-01-24 20:09:26 UTC
This already happened with most kotd based on 2.6.15 and 2.6.15-gits.
Comment 4 David Chinner 2006-01-24 23:55:39 UTC
Nathan is at linux.conf.au at the moment, so he won't be able
to answer directly. In the meantime, I'll do my best.

If I understand things correctly, when you build your kernel with CONFIG_XFS_DMAPI (as it appears to be from the attached oops report) XFS becomes dependent on dmapi.ko. This appears to be the same module
dependency tree as in sles9sp3 (from modules.dep):

/lib/modules/2.6.5-7.244-default/kernel/fs/xfs/xfs.ko: /lib/modules/2.6.5-7.244-default/kernel/fs/exportfs/exportfs.ko /lib/modules/2.6.5-7.244-default/kernel/fs/dmapi/dmapi.ko
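
For anyone comparing dependency trees between kernels, a modules.dep entry like the one above can be split into the module and its dependencies with a few lines of Python. This is only a sketch: the helper name `parse_modules_dep_line` is made up for illustration, and it assumes the "path: dep dep ..." layout quoted above.

```python
import os


def parse_modules_dep_line(line):
    """Split one modules.dep line into (module, [dependencies]).

    Hypothetical helper; assumes the "path: dep1 dep2 ..." layout
    quoted above and returns base file names only.
    """
    module, _, deps = line.partition(":")
    return os.path.basename(module.strip()), [
        os.path.basename(dep) for dep in deps.split()
    ]


line = (
    "/lib/modules/2.6.5-7.244-default/kernel/fs/xfs/xfs.ko: "
    "/lib/modules/2.6.5-7.244-default/kernel/fs/exportfs/exportfs.ko "
    "/lib/modules/2.6.5-7.244-default/kernel/fs/dmapi/dmapi.ko"
)
module, deps = parse_modules_dep_line(line)
print(module, "depends on", deps)
```

Run against the sles9sp3 line, this lists exportfs.ko and dmapi.ko as the dependencies of xfs.ko, which is the same pair reported for the 10.1 Beta 2 kernel.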

Can you explain in a little more detail what the problem is here?

As to the oops in xfs_buf_rele(), I have not seen this before. It looks
to be a reference counting problem on the buffer being used to read
the log resulting in it being freed too early and the wrong way. 
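
To illustrate the kind of failure meant here (and only the general pattern, since the actual cause was not yet known at this point), the following toy model shows how one unbalanced release on a reference-counted buffer ends up touching an object that has already been freed. `ToyBuffer` and its methods are invented for this sketch and are not the XFS buffer code.

```python
class ToyBuffer:
    """Toy stand-in for a reference-counted buffer (not XFS code)."""

    def __init__(self):
        self.refcount = 1      # the creator holds the first reference
        self.freed = False

    def hold(self):
        if self.freed:
            raise RuntimeError("hold on freed buffer")
        self.refcount += 1

    def release(self):
        if self.freed:
            raise RuntimeError("release on freed buffer")
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # last reference dropped: buffer goes away


buf = ToyBuffer()
buf.hold()       # a second user takes a reference (count is now 2)
buf.release()    # the second user is done (count is now 1)
buf.release()    # the creator is done (count is now 0, buffer freed)

try:
    buf.release()  # unbalanced extra release on an already freed buffer
    extra_release_failed = False
except RuntimeError:
    extra_release_failed = True

print("extra release detected:", extra_release_failed)
```

The toy raises an exception on the extra release; kernel code has no such safety net, so the same mistake would show up as an access to freed or NULL memory.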

It is not immediately obvious what is wrong here, so can you give us some indication of what the filesystem is doing during log recovery? I.e.,
if it still oopses during mount, can you run 'xfs_logprint -t <device>'
before attempting to mount it and attach the output? FWIW, did the system crash prior to this problem, or was it after a clean unmount?
Comment 5 Andreas Gruenbacher 2006-01-25 16:34:43 UTC
Update: we still managed to disable DMAPI in the configs for Beta2.
SGI, could you please look into what's broken?
Comment 6 Andreas Gruenbacher 2006-01-25 17:07:57 UTC
Comment 4: It doesn't seem right to me to pull in dmapi.ko when dmapi isn't being used. I didn't notice that SP3 has the same dependency.

The Oops goes away with CONFIG_XFS_DMAPI=n, so it's some interaction with the DMAPI code. The patches I received from Bob were incomplete: at least the things in patches.suse/dmapi-enable2 were missing. Likely something went wrong there. I had asked Bob to check if our KOTD worked but didn't receive feedback. So either Bob didn't hit this case, or he didn't get to testing a KOTD with the dmapi patches in.
Comment 7 Andreas Gruenbacher 2006-01-25 18:06:44 UTC
*** Bug 145517 has been marked as a duplicate of this bug. ***
Comment 8 Robert Kierski 2006-01-25 20:05:07 UTC
I've been trying to reproduce "The DMAPI problem" but I haven't been able to see anything wrong.  I built 1) XFS w/DMAPI, 2) XFS wo/DMAPI, 3) DMAPI wo/XFS, 4) XFS and DMAPI as loadable modules, and 5) XFS and DMAPI as in-kernel modules.

None of the above combinations cause any problems for me.

I did test the KOTD with respect to the DMAPI changes you made.  I didn't run it through a full course of tests, but I was able to verify that basic functionality was working -- files migrated, unmigrated, the DM attributes were reported correctly.

Unfortunately, there seems to be a delay between your KOTD and our KOTD.  Maybe I've been testing with something different than what you've got.  I'll start over using a fresh workarea.
Comment 9 Jan Engelhardt 2006-01-25 20:33:00 UTC
#8: Try the KOTD from 2006-01-11.
Comment 10 Gerald Pfeifer 2006-01-25 21:28:09 UTC
I'm afraid I may be missing something, but why should Robert try that old kernel
version?  If something reproduces there, but not with the current kernel,
that wouldn't be a problem for us to worry about, would it?
Comment 11 Andreas Gruenbacher 2006-01-25 21:32:29 UTC
Indeed, rebuilding a current KOTD or the Beta2 kernel with CONFIG_XFS_DMAPI enabled would be more helpful.
Comment 12 Robert Kierski 2006-01-25 22:26:20 UTC
Sorry... I was building and not paying attention to the bug.  I built 2.6.16-rc1-git3-sn2 (kernel-source-2.6.16_rc1_git3-20060124182340.src.rpm).

I did a bunch of tests with different mount options.  I tried hitting reset while a file was being written.  None of the tests resulted in failures or errors of any kind.

Comment 13 Andreas Gruenbacher 2006-01-25 22:31:13 UTC
You did that after enabling CONFIG_XFS_DMAPI and CONFIG_DMAPI in the configs, right?
Comment 14 Eric Sandeen 2006-01-25 23:03:38 UTC
Replying for Bob... yep with dmapi configured on.

I (Eric) also did this test; I installed the
kernel-source-2.6.16_rc1_git3-20060124182340
kernel-default-2.6.16_rc1_git3-20060124182340

packages, and edited the .config to enable CONFIG_XFS_DMAPI=y and CONFIG_DMAPI=m

Then I rebuilt just xfs & dmapi modules:
make -j2 O=`pwd`/../linux-2.6.16-rc1-git3-20060124182340-obj/ia64/default/ oldconfig
make O=`pwd`/../linux-2.6.16-rc1-git3-20060124182340-obj/ia64/default/ M=fs/xfs/ modules
make O=`pwd`/../linux-2.6.16-rc1-git3-20060124182340-obj/ia64/default/ M=fs/dmapi modules

and loaded up these new modules.  Clean & dirty xfs filesystems also mount 
fine for me.  Any tips on reproducing this bug...?

dmesg & modinfo output for successful mount:

dmapi: module not supported by Novell, setting U taint flag.
xfs: module not supported by Novell, setting U taint flag.
SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, dmapi support, no debug enabled
xfs_quota: module not supported by Novell, setting U taint flag.
SGI XFS Quota Management subsystem
XFS mounting filesystem sda10
Ending clean XFS mount for filesystem: sda10

cxfsopus9:/usr/src/linux-2.6.16-rc1-git3-20060124182340-obj/ia64/default/fs/xfs # modinfo ./xfs.ko
filename:       ./xfs.ko
author:         Silicon Graphics, Inc.
description:    SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, dmapi support, no debug enabled
license:        GPL
vermagic:       2.6.16-rc1-git3-20060124182340-default SMP ia64gcc-4.1
depends:
srcversion:     D9DCEFBADB45A357649C361

p.s. for some reason we're not getting email for traffic on this bug, apologies
if replies are a bit slow.
Comment 15 Andreas Gruenbacher 2006-01-25 23:20:14 UTC
That's all very weird. Ludwig, can we do some more testing on your laptop and try to reproduce?
Comment 16 Christoph Thiel 2006-01-25 23:36:58 UTC
Andreas, I guess you wanted to ask s/Ludwig/Christoph/ to reproduce, right? I've already updated my laptop and don't have any XFS partition any longer :( But AFAIK Adrian ran into this kind of problem as well. CCing Adrian.
Comment 17 Eric Sandeen 2006-01-25 23:41:38 UTC
Can I ask, how many failures with dmapi, and how many successes without dmapi,
were seen?  The backtrace for the oops really doesn't look like it could possibly
have much to do with dmapi, for what it's worth.
Comment 19 Nathan Scott 2006-01-30 05:03:14 UTC
Created attachment 65575 [details]
Possible fix for xfs_buf_rele panic

Can someone who can reproduce this (the xfs_buf_rele panic during a journal read, I mean) please try this attached patch and report back?

thanks!
Comment 20 Forgotten User 4Cp5OYkKcG 2006-01-30 05:53:37 UTC
I have reproduced the problem (the xfs_buf_rele panic) on i386. 
After upgrading to 10.1 beta2 (kernel linux-2.6.15-git12-6) it crashed every
time. I have tried to recompile with CONFIG_XFS_DMAPI=y and CONFIG_DMAPI=m
without change.

I have then tried the kotd linux-2.6.16-rc1-git3-20060128210603 with the same result - both the default and CONFIG_XFS_DMAPI=y, CONFIG_DMAPI=m crashed.

The patch from #19 fixes the problem for me and I can mount xfs file systems as usual.

linux:~ # uname -a
Linux linux 2.6.16-rc1-git3-20060128210603-default #1 Sat Jan 28 21:06:03 UTC 2006 i686 i686 i386 GNU/Linux
linux:~ # lsmod | grep xfs
xfs_quota              44896  0
xfs                   508248  4 xfs_quota
exportfs                5504  1 xfs
dmapi                  43688  1 xfs,[permanent]
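
To check the same thing on other test machines, the "used by" column of lsmod output like the above can be pulled apart with a short Python sketch. The helper name `parse_lsmod` is hypothetical, and the code assumes the "name size count users" columns shown here, dropping markers such as "[permanent]".

```python
def parse_lsmod(text):
    """Map each module name to the list of modules that use it.

    Hypothetical helper; assumes the "name size count users" columns
    shown above and ignores markers such as "[permanent]".
    """
    used_by = {}
    for row in text.strip().splitlines():
        fields = row.split()
        if len(fields) < 3:
            continue  # not a module row
        users = fields[3].split(",") if len(fields) > 3 else []
        used_by[fields[0]] = [
            user for user in users if user and not user.startswith("[")
        ]
    return used_by


sample = """\
xfs_quota              44896  0
xfs                   508248  4 xfs_quota
exportfs                5504  1 xfs
dmapi                  43688  1 xfs,[permanent]
"""
for name, users in parse_lsmod(sample).items():
    print(name, "<-", users)
```

For the output above this shows both exportfs and dmapi held by xfs, i.e. the module dependency from the original report is present on this machine as well.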


Comment 21 Andreas Gruenbacher 2006-01-30 15:10:50 UTC
Gerald, many thanks for testing this!
Nathan, can I check in the fix and re-enable DMAPI?
Comment 22 Nathan Scott 2006-01-30 22:42:47 UTC
Hi Andreas,

Sure thing.  I'll make sure this gets into mainline before 2.6.16.

cheers.
Comment 23 Andreas Gruenbacher 2006-02-01 00:28:11 UTC
Ah, not being in the CC list explains why I didn't notice your comment, thanks. We didn't make it for Beta 3 unfortunately.
Comment 24 Andreas Gruenbacher 2006-02-02 13:15:33 UTC
Yesterday I had a machine on which this bug triggered, and the patch fixed it for me as well.
Comment 25 Nathan Scott 2006-02-02 22:34:22 UTC
Oh, forgot to update here - this was merged into mainline a day or two ago,
so will be there if/when the next -rc merge happens.

cheers.
Comment 26 Andreas Gruenbacher 2006-02-03 12:00:25 UTC
*** Bug 147960 has been marked as a duplicate of this bug. ***
Comment 27 Chris L Mason 2006-02-04 01:34:49 UTC
*** Bug 147962 has been marked as a duplicate of this bug. ***
Comment 28 Chris L Mason 2006-02-06 20:15:23 UTC
*** Bug 148491 has been marked as a duplicate of this bug. ***
Comment 29 Christoph Thiel 2006-02-07 10:32:09 UTC
*** Bug 146060 has been marked as a duplicate of this bug. ***
Comment 30 Chris L Mason 2006-02-23 18:11:07 UTC
*** Bug 152347 has been marked as a duplicate of this bug. ***