Bug 145204 - dmapi seems to break xfs
Summary: dmapi seems to break xfs
Status: RESOLVED FIXED
Duplicates: 145517 146060 147960 147962 148491 152347
Alias: None
Product: SUSE Linux 10.1
Classification: openSUSE
Component: Kernel
Version: Beta 2
Hardware: Other Other
Priority: P5 - None, Severity: Blocker
Target Milestone: ---
Assignee: Forgotten User aHtZ2osk0j
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 144595
Reported: 2006-01-24 15:22 UTC by Andreas Gruenbacher
Modified: 2006-02-23 18:11 UTC
CC List: 15 users

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
NULL pointer dereference in xfs_buf_rele (3.13 KB, text/plain)
2006-01-24 15:23 UTC, Andreas Gruenbacher
Possible fix for xfs_buf_rele panic (381 bytes, patch)
2006-01-30 05:03 UTC, Nathan Scott

Description Andreas Gruenbacher 2006-01-24 15:22:57 UTC
We have the following two issues with xfs right now:

- xfs.ko currently has a dependency on dmapi.ko and exportfs.ko. At least
  the dependency on dmapi.ko is definitely wrong. Can this please be fixed?

- When trying to mount an xfs filesystem, we get a NULL pointer
  dereference attempt in xfs_buf_rele.

This is with the 2.6.16-rc1-git3-3-default kernel from SUSE Linux 10.1 Beta 2. Can you please look into these problems?
Comment 1 Andreas Gruenbacher 2006-01-24 15:23:39 UTC
Created attachment 64766 [details]
NULL pointer dereference in xfs_buf_rele
Comment 2 Chris L Mason 2006-01-24 15:39:38 UTC
AJ FYI -> blocker
Comment 3 Jan Engelhardt 2006-01-24 20:09:26 UTC
This already happened with most kotd based on 2.6.15 and 2.6.15-gits.
Comment 4 David Chinner 2006-01-24 23:55:39 UTC
Nathan is at linux.conf.au at the moment, so he won't be able
to answer directly. In the meantime, I'll do my best.

If I understand things correctly, when you build your kernel with CONFIG_XFS_DMAPI (as it appears to be from the attached oops report) XFS becomes dependent on dmapi.ko. This appears to be the same module
dependency tree as in sles9sp3 (from modules.dep):

/lib/modules/2.6.5-7.244-default/kernel/fs/xfs/xfs.ko: /lib/modules/2.6.5-7.244-default/kernel/fs/exportfs/exportfs.ko /lib/modules/2.6.5-7.244-default/kernel/fs/dmapi/dmapi.ko

Can you explain in a little more detail what the problem is here?
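
For anyone who wants to check the dependency on their own system, here is a minimal sketch that pulls the entries listed for one module out of modules.dep-style text. It assumes the "module: dep dep ..." layout quoted above; the helper name module_deps is made up for illustration and is not part of any module tool.

```shell
#!/bin/sh
# module_deps NAME: read modules.dep-style lines on stdin and print the
# file names of the modules that NAME.ko is listed as depending on,
# one per line. (Illustrative helper, not an existing tool.)
module_deps() {
    awk -v mod="$1.ko" '
        {
            # The first field is "path/to/module.ko:"; drop the colon.
            target = $1
            sub(/:$/, "", target)
            n = split(target, parts, "/")
            if (parts[n] != mod) next
            # Every remaining field is the path of one dependency.
            for (i = 2; i <= NF; i++) {
                m = split($i, dep, "/")
                print dep[m]
            }
        }'
}

# Usage on a live system (path assumes the running kernel's module tree):
#   module_deps xfs < "/lib/modules/$(uname -r)/modules.dep"
```

Fed the sles9sp3 line quoted above, this would list exportfs.ko and dmapi.ko for xfs.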

As to the oops in xfs_buf_rele(), I have not seen this before. It looks
like a reference counting problem on the buffer used to read the log,
which results in the buffer being freed too early and in the wrong way.

It is not immediately obvious what is wrong here, so can you give us some indication of what the filesystem is doing during log recovery? I.e.,
if it still oopses during mount, can you run 'xfs_logprint -t <device>'
before attempting to mount it and attach the output? FWIW, did the system crash prior to this problem, or was it after a clean unmount?
Comment 5 Andreas Gruenbacher 2006-01-25 16:34:43 UTC
Update: we still managed to disable DMAPI in the configs for Beta2.
SGI, could you please look into what's broken?
Comment 6 Andreas Gruenbacher 2006-01-25 17:07:57 UTC
Comment 4: It doesn't seem right to me to pull in dmapi.ko when dmapi isn't being used. I didn't notice that SP3 has the same dependency.

The Oops goes away with CONFIG_XFS_DMAPI=n, so it's some interaction with the DMAPI code. The patches I received from Bob were incomplete: at least the things in patches.suse/dmapi-enable2 were missing. Likely something went wrong there. I had asked Bob to check if our KOTD worked but didn't receive feedback. So either Bob didn't hit this case, or he didn't get to testing a KOTD with the dmapi patches in.
Comment 7 Andreas Gruenbacher 2006-01-25 18:06:44 UTC
*** Bug 145517 has been marked as a duplicate of this bug. ***
Comment 8 Robert Kierski 2006-01-25 20:05:07 UTC
I've been trying to reproduce "The DMAPI problem" but I haven't been able to see anything wrong.  I built 1) XFS w/DMAPI, 2) XFS wo/DMAPI, 3) DMAPI wo/XFS, 4) XFS and DMAPI as loadable modules, and 5) XFS and DMAPI as in kernel modules.

None of the above combinations cause any problems for me.

I did test the KOTD with respect to the DMAPI changes you made.  I didn't run it through a full course of tests, but I was able to verify that basic functionality was working -- files migrated, unmigrated, the DM attributes were reported correctly.

Unfortunately, there seems to be a delay between your KOTD and our KOTD.  Maybe I've been testing with something different than what you've got.  I'll start over using a fresh workarea.
Comment 9 Jan Engelhardt 2006-01-25 20:33:00 UTC
#8: Try the KOTD from 2006-01-11.
Comment 10 Gerald Pfeifer 2006-01-25 21:28:09 UTC
I'm afraid I may be missing something, but why should Robert try that old kernel
version?  If something reproduces there, but not with the current kernel,
that wouldn't be a problem for us to worry about, would it?
Comment 11 Andreas Gruenbacher 2006-01-25 21:32:29 UTC
Indeed, rebuilding a current KOTD or the Beta2 kernel with CONFIG_XFS_DMAPI enabled would be more helpful.
Comment 12 Robert Kierski 2006-01-25 22:26:20 UTC
Sorry... I was building and not paying attention to the bug.  I built 2.6.16-rc1-git3-sn2 (kernel-source-2.6.16_rc1_git3-20060124182340.src.rpm).

I did a bunch of tests with different mount options.  I tried hitting reset while a file was being written.  None of the tests resulted in failures or errors of any kind.

Comment 13 Andreas Gruenbacher 2006-01-25 22:31:13 UTC
You did that after enabling CONFIG_XFS_DMAPI and CONFIG_DMAPI in the configs, right?
Comment 14 Eric Sandeen 2006-01-25 23:03:38 UTC
Replying for Bob... yep with dmapi configured on.

I (Eric) also did this test; I installed the
kernel-source-2.6.16_rc1_git3-20060124182340
kernel-default-2.6.16_rc1_git3-20060124182340

packages, and edited the .config to enable CONFIG_XFS_DMAPI=y and CONFIG_DMAPI=m
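
A quick way to confirm what a given .config actually has for these two options is sketched below. The function name dmapi_config is hypothetical; the only assumption is the usual .config convention that a disabled option appears as "# CONFIG_FOO is not set".

```shell
#!/bin/sh
# dmapi_config: read kernel .config text on stdin and print only the lines
# that set or explicitly unset CONFIG_DMAPI and CONFIG_XFS_DMAPI.
# (Illustrative helper; the name is not from any existing tool.)
dmapi_config() {
    grep -E '^(# )?CONFIG_(XFS_)?DMAPI(=| is not set)'
}

# Usage, run from the kernel object directory:
#   dmapi_config < .config
```

For the configuration described here it should print CONFIG_XFS_DMAPI=y and CONFIG_DMAPI=m.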

Then I rebuilt just xfs & dmapi modules:
make -j2 O=`pwd`/../linux-2.6.16-rc1-git3-20060124182340-obj/ia64/default/ oldconfig
make O=`pwd`/../linux-2.6.16-rc1-git3-20060124182340-obj/ia64/default/ M=fs/xfs/ modules
make O=`pwd`/../linux-2.6.16-rc1-git3-20060124182340-obj/ia64/default/ M=fs/dmapi modules

and loaded up these new modules.  Clean & dirty xfs filesystems also mount 
fine for me.  Any tips on reproducing this bug...?

dmesg & modinfo output for successful mount:

dmapi: module not supported by Novell, setting U taint flag.
xfs: module not supported by Novell, setting U taint flag.
SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, dmapi support, no debug enabled
xfs_quota: module not supported by Novell, setting U taint flag.
SGI XFS Quota Management subsystem
XFS mounting filesystem sda10
Ending clean XFS mount for filesystem: sda10

cxfsopus9:/usr/src/linux-2.6.16-rc1-git3-20060124182340-obj/ia64/default/fs/xfs # modinfo ./xfs.ko
filename:       ./xfs.ko
author:         Silicon Graphics, Inc.
description:    SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, dmapi support, no debug enabled
license:        GPL
vermagic:       2.6.16-rc1-git3-20060124182340-default SMP ia64gcc-4.1
depends:
srcversion:     D9DCEFBADB45A357649C361

p.s. for some reason we're not getting email for traffic on this bug, apologies
if replies are a bit slow.
Comment 15 Andreas Gruenbacher 2006-01-25 23:20:14 UTC
That's all very weird. Ludwig, can we do some more testing on your laptop and try to reproduce?
Comment 16 Christoph Thiel 2006-01-25 23:36:58 UTC
Andreas, I guess you wanted to ask s/Ludwig/Christoph/ to reproduce, right? I've already updated my laptop and don't have any XFS partition any longer :( But AFAIK Adrian ran into this kind of problem as well. CCing Adrian.
Comment 17 Eric Sandeen 2006-01-25 23:41:38 UTC
Can I ask, how many failures with dmapi, and how many successes without dmapi,
were seen?  The backtrace for the oops really doesn't look like it could possibly
have much to do with dmapi, for what it's worth.
Comment 19 Nathan Scott 2006-01-30 05:03:14 UTC
Created attachment 65575 [details]
Possible fix for xfs_buf_rele panic

Can someone who can reproduce this (the xfs_buf_rele panic during a journal read, I mean) please try this attached patch and report back?

thanks!
Comment 20 Forgotten User 4Cp5OYkKcG 2006-01-30 05:53:37 UTC
I have reproduced the problem (the xfs_buf_rele panic) on i386. 
After upgrading to 10.1 beta2 (kernel linux-2.6.15-git12-6) it crashed every
time. I have tried to recompile with CONFIG_XFS_DMAPI=y and CONFIG_DMAPI=m
without change.

I have then tried the kotd linux-2.6.16-rc1-git3-20060128210603 with the same result - both the default and CONFIG_XFS_DMAPI=y, CONFIG_DMAPI=m crashed.

The patch from #19 fixes the problem for me and I can mount xfs file systems as usual.

linux:~ # uname -a
Linux linux 2.6.16-rc1-git3-20060128210603-default #1 Sat Jan 28 21:06:03 UTC 2006 i686 i686 i386 GNU/Linux
linux:~ # lsmod | grep xfs
xfs_quota              44896  0
xfs                   508248  4 xfs_quota
exportfs                5504  1 xfs
dmapi                  43688  1 xfs,[permanent]
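
The "Used by" column above is what shows dmapi being held by xfs. As a rough sketch, that column can be extracted for one module as follows; module_users is an illustrative name, and the code assumes only the four-column lsmod layout shown in this listing.

```shell
#!/bin/sh
# module_users NAME: read lsmod-style output on stdin and print the
# "Used by" column (fourth field) for the module called NAME.
# (Illustrative helper, not an existing tool.)
module_users() {
    awk -v mod="$1" '$1 == mod { print $4 }'
}

# Usage on a live system:
#   lsmod | module_users dmapi
```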


Comment 21 Andreas Gruenbacher 2006-01-30 15:10:50 UTC
Gerald, many thanks for testing this!
Nathan, can I check in the fix and re-enable DMAPI?
Comment 22 Nathan Scott 2006-01-30 22:42:47 UTC
Hi Andreas,

Sure thing.  I'll make sure this gets into mainline before 2.6.16.

cheers.
Comment 23 Andreas Gruenbacher 2006-02-01 00:28:11 UTC
Ah, not being in the CC list explains why I didn't notice your comment, thanks. We didn't make it for Beta 3 unfortunately.
Comment 24 Andreas Gruenbacher 2006-02-02 13:15:33 UTC
Yesterday I had a machine on which this bug triggered, and the patch fixed it for me as well.
Comment 25 Nathan Scott 2006-02-02 22:34:22 UTC
Oh, forgot to update here - this was merged into mainline a day or two ago,
so will be there if/when the next -rc merge happens.

cheers.
Comment 26 Andreas Gruenbacher 2006-02-03 12:00:25 UTC
*** Bug 147960 has been marked as a duplicate of this bug. ***
Comment 27 Chris L Mason 2006-02-04 01:34:49 UTC
*** Bug 147962 has been marked as a duplicate of this bug. ***
Comment 28 Chris L Mason 2006-02-06 20:15:23 UTC
*** Bug 148491 has been marked as a duplicate of this bug. ***
Comment 29 Christoph Thiel 2006-02-07 10:32:09 UTC
*** Bug 146060 has been marked as a duplicate of this bug. ***
Comment 30 Chris L Mason 2006-02-23 18:11:07 UTC
*** Bug 152347 has been marked as a duplicate of this bug. ***