Bugzilla – Bug 145204
dmapi seems to break xfs
Last modified: 2006-02-23 18:11:07 UTC
We have the following two issues with xfs right now:
- xfs.ko currently has a dependency on dmapi.ko and exportfs.ko. At least the dependency on dmapi.ko is definitely wrong. Can this please be fixed?
- When trying to mount an xfs filesystem, we get a NULL pointer dereference attempt in xfs_buf_rele.
This is with the 2.6.16-rc1-git3-3-default kernel from SUSE Linux 10.1 Beta 2. Can you please look into these problems?
Created attachment 64766 [details] NULL pointer dereference in xfs_buf_rele
AJ FYI -> blocker
This already happened with most kotd based on 2.6.15 and 2.6.15-gits.
Nathan is at linux.conf.au at the moment, so he won't be able to answer directly. In the meantime, I'll do my best....

If I understand things correctly, when you build your kernel with CONFIG_XFS_DMAPI (as appears to be the case from the attached oops report), XFS becomes dependent on dmapi.ko. This appears to be the same module dependency tree as in sles9sp3 (from modules.dep):

    /lib/modules/2.6.5-7.244-default/kernel/fs/xfs/xfs.ko: /lib/modules/2.6.5-7.244-default/kernel/fs/exportfs/exportfs.ko /lib/modules/2.6.5-7.244-default/kernel/fs/dmapi/dmapi.ko

Can you explain in a little more detail what the problem is here?

As to the oops in xfs_buf_rele(), I have not seen this before. It looks to be a reference counting problem on the buffer being used to read the log, resulting in it being freed too early and the wrong way. It is not immediately obvious what is wrong here, so can you give us some indication of what the filesystem is doing during log recovery? I.e., if it still oopses during mount, can you run 'xfs_logprint -t <device>' before attempting to mount it and attach the output?

FWIW, did the system crash prior to this problem, or was it after a clean unmount?
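For anyone who wants to compare the dependency tree on their own system, here is a minimal sketch. It assumes the classic modules.dep layout shown in the entry quoted in this comment (module path, a colon, then space-separated dependency paths). The helper name `list_module_deps` is hypothetical and not part of any module tooling.

```shell
#!/bin/sh
# list_module_deps FILE MODULE
# Print the dependencies recorded for MODULE (e.g. xfs.ko) in a
# modules.dep-style FILE, one path per line.
# Hypothetical helper for illustration only.
list_module_deps() {
    awk -v mod="$2" '
        {
            colon = index($0, ":")
            if (colon == 0) next                  # not a "path: deps" line
            path = substr($0, 1, colon - 1)
            n = split(path, parts, "/")
            if (parts[n] != mod) next             # file name must match exactly
            m = split(substr($0, colon + 1), deps, " ")
            for (i = 1; i <= m; i++) print deps[i]
        }
    ' "$1"
}
```

Possible usage: `list_module_deps /lib/modules/$(uname -r)/modules.dep xfs.ko`. If the output includes a path ending in dmapi.ko, the installed xfs.ko was built with a dependency on dmapi. `modinfo -F depends <module>` reports the dependency list stored in the module file itself.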
Update: we still managed to disable DMAPI in the configs for Beta2. SGI, could you please look into what's broken?
Comment 4: It doesn't seem right to me to pull in dmapi.ko when dmapi isn't being used. I didn't notice that SP3 has the same dependency. The Oops goes away with CONFIG_XFS_DMAPI=n, so it's some interaction with the DMAPI code. The patches I received from Bob were incomplete: at least the things in patches.suse/dmapi-enable2 were missing. Likely something went wrong there. I had asked Bob to check if our KOTD worked but didn't receive feedback. So either Bob didn't hit this case, or he didn't get to testing a KOTD with the dmapi patches in.
*** Bug 145517 has been marked as a duplicate of this bug. ***
I've been trying to reproduce "The DMAPI problem" but I haven't been able to see anything wrong. I built 1) XFS w/DMAPI, 2) XFS wo/DMAPI, 3) DMAPI wo/XFS, 4) XFS and DMAPI as loadable modules, and 5) XFS and DMAPI as in kernel modules. None of the above combinations cause any problems for me. I did test the KOTD with respect to the DMAPI changes you made. I didn't run it through a full course of tests, but I was able to verify that basic functionality was working -- files migrated, unmigrated, the DM attributes were reported correctly. Unfortunately, there seems to be a delay between your KOTD and our KOTD. Maybe I've been testing with something different than what you've got. I'll start over using a fresh workarea.
#8: Try the KOTD from 2006-01-11.
I may be missing something, but why should Robert try that old kernel version? If something reproduces there, but not with the current kernel, that wouldn't be a problem for us to worry about, would it?
Indeed, rebuilding a current KOTD or the Beta2 kernel with CONFIG_XFS_DMAPI enabled would be more helpful.
Sorry... I was building and not paying attention to the bug. I built 2.6.16-rc1-git3-sn2 (kernel-source-2.6.16_rc1_git3-20060124182340.src.rpm). I did a bunch of tests with different mount options. I tried hitting reset while a file was being written. None of the tests resulted in failures or errors of any kind.
You did that after enabling CONFIG_XFS_DMAPI and CONFIG_DMAPI in the configs, right?
Replying for Bob... yep with dmapi configured on.

I (Eric) also did this test; I installed the

    kernel-source-2.6.16_rc1_git3-20060124182340
    kernel-default-2.6.16_rc1_git3-20060124182340

packages, and edited the .config to enable CONFIG_XFS_DMAPI=y and CONFIG_DMAPI=m.

Then I rebuilt just xfs & dmapi modules:

    make -j2 O=`pwd`/../linux-2.6.16-rc1-git3-20060124182340-obj/ia64/default/ oldconfig
    make O=`pwd`/../linux-2.6.16-rc1-git3-20060124182340-obj/ia64/default/ M=fs/xfs/ modules
    make O=`pwd`/../linux-2.6.16-rc1-git3-20060124182340-obj/ia64/default/ M=fs/dmapi modules

and loaded up these new modules. Clean & dirty xfs filesystems also mount fine for me. Any tips on reproducing this bug...?

dmesg & modinfo output for successful mount:

    dmapi: module not supported by Novell, setting U taint flag.
    xfs: module not supported by Novell, setting U taint flag.
    SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, dmapi support, no debug enabled
    xfs_quota: module not supported by Novell, setting U taint flag.
    SGI XFS Quota Management subsystem
    XFS mounting filesystem sda10
    Ending clean XFS mount for filesystem: sda10

    cxfsopus9:/usr/src/linux-2.6.16-rc1-git3-20060124182340-obj/ia64/default/fs/xfs # modinfo ./xfs.ko
    filename: ./xfs.ko
    author: Silicon Graphics, Inc.
    description: SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, dmapi support, no debug enabled
    license: GPL
    vermagic: 2.6.16-rc1-git3-20060124182340-default SMP ia64gcc-4.1
    depends:
    srcversion: D9DCEFBADB45A357649C361

p.s. for some reason we're not getting email for traffic on this bug, apologies if replies are a bit slow.
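One way to confirm that a rebuilt xfs.ko really has DMAPI compiled in is to look for "dmapi support" in the module description, as it appears in the modinfo output in this comment. The sketch below assumes that a build without CONFIG_XFS_DMAPI omits that phrase from the description; the helper name `has_dmapi_support` is hypothetical.

```shell
#!/bin/sh
# has_dmapi_support
# Read a module description string on stdin and succeed (exit 0) if it
# lists "dmapi support" among the compiled-in features.
# Assumption: builds without CONFIG_XFS_DMAPI omit this phrase.
has_dmapi_support() {
    grep -q 'dmapi support'
}
```

Possible usage: `modinfo -F description ./xfs.ko | has_dmapi_support && echo "DMAPI compiled in"`.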
That's all very weird. Ludwig, can we do some more testing on your laptop and try to reproduce?
Andreas, I guess you wanted to ask s/Ludwig/Christoph/ to reproduce, right? I've already updated my laptop and don't have any XFS partition any longer :( But AFAIK Adrian ran into this kind of problem as well. CCing Adrian.
Can I ask, how many failures with dmapi, and how many successes without dmapi, were seen? The backtrace for the oops really doesn't look like it could possibly have much to do with dmapi, for what it's worth.
Created attachment 65575 [details] Possible fix for xfs_buf_rele panic Can someone who can reproduce this (the xfs_buf_rele panic during a journal read, I mean) please try this attached patch and report back? thanks!
I have reproduced the problem (the xfs_buf_rele panic) on i386. After upgrading to 10.1 beta2 (kernel linux-2.6.15-git12-6) it crashed every time. I have tried to recompile with CONFIG_XFS_DMAPI=y and CONFIG_DMAPI=m without change. I have then tried the kotd linux-2.6.16-rc1-git3-20060128210603 with the same result - both the default and CONFIG_XFS_DMAPI=y, CONFIG_DMAPI=m crashed. The patch from #19 fixes the problem for me and I can mount xfs file systems as usual.

    linux:~ # uname -a
    Linux linux 2.6.16-rc1-git3-20060128210603-default #1 Sat Jan 28 21:06:03 UTC 2006 i686 i686 i386 GNU/Linux
    linux:~ # lsmod | grep xfs
    xfs_quota 44896 0
    xfs 508248 4 xfs_quota
    exportfs 5504 1 xfs
    dmapi 43688 1 xfs,[permanent]
Gerald, many thanks for testing this! Nathan, can I check in the fix and re-enable DMAPI?
Hi Andreas, Sure thing. I'll make sure this gets into mainline before 2.6.16. cheers.
Ah, not being in the CC list explains why I didn't notice your comment, thanks. We didn't make it for Beta 3 unfortunately.
Yesterday I had a machine on which this bug triggered, and the patch fixed it for me as well.
Oh, forgot to update here - this was merged into mainline a day or two ago, so it will be there if/when the next -rc merge happens. cheers.
*** Bug 147960 has been marked as a duplicate of this bug. ***
*** Bug 147962 has been marked as a duplicate of this bug. ***
*** Bug 148491 has been marked as a duplicate of this bug. ***
*** Bug 146060 has been marked as a duplicate of this bug. ***
*** Bug 152347 has been marked as a duplicate of this bug. ***