Bug 471249

Summary: mount hangs trying to loopback an image on a fuse mounted NTFS partition
Product: [openSUSE] openSUSE 11.1
Reporter: Philip Ashmore <contact>
Component: Kernel
Assignee: E-mail List <kernel-maintainers>
Status: VERIFIED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: archie.cobbs, forgotten_qMyteedNxa, forgotten_sLJ7K2dvxj, meissner
Version: Final
Target Milestone: ---
Hardware: x86-64
OS: openSUSE 11.1
Whiteboard: maint:released:11.1:23622 maint:released:sle11:23621
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: proposed fix

Description Philip Ashmore 2009-01-31 02:17:27 UTC
User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.9.0.5) Gecko/2008121300 SUSE/3.0.5-1.1 Firefox/3.0.5

In Vista I reduced the two NTFS partitions to their minimum to make space for a GNU/Linux install.
Now I can dual boot OpenSuse and Vista.

As it turned out, Vista still has over 20GB of free space, so I created a 15GB file using a small C++ program that calls truncate() on the specified file.
I could be wrong but I think that by calling truncate, I'm creating a sparse file.
It's thousands of times faster than "dd if=/dev/zero ..." which I think also creates a sparse file.
    # touch /windows/C/Users/me/Desktop/Linux/15G.img
    # truncate /windows/C/Users/me/Desktop/Linux/15G.img 15G
    # mkfs -t ext2 /windows/C/Users/me/Desktop/Linux/15G.img
Finally I mounted the filesystem using a loopback device.
    # mount -o,loop,exec /windows/C/Users/me/Desktop/Linux/15G.img /home/butthaed/WinPart

I've used this loopback partition intensively and it works fine.
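
Whether truncate() really produced a sparse file can be checked by comparing the file's apparent size with the space actually allocated for it. The following is a minimal Python sketch of that check, not the C++ program mentioned above. The helper names `make_sparse` and `allocated_bytes` and the temporary path are made up for illustration, and the result depends on the underlying filesystem, so it may differ on an NTFS partition mounted through ntfs-3g. Note that "dd if=/dev/zero" without a seek writes real zero blocks, so it normally does not produce a sparse file, which would explain the speed difference.

```python
import os
import tempfile


def make_sparse(path, size):
    """Create the file if needed and set its length to `size` bytes without writing data."""
    with open(path, "ab"):
        pass  # like `touch`: make sure the file exists
    # Extend the file; the filesystem decides whether blocks get allocated.
    os.truncate(path, size)


def allocated_bytes(path):
    """Bytes actually allocated on disk; st_blocks counts 512-byte units on Linux."""
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        img = os.path.join(tmp, "test.img")  # hypothetical path, not the 15G.img above
        make_sparse(img, 64 * 1024 * 1024)
        print("apparent size:", os.stat(img).st_size)
        print("allocated:    ", allocated_bytes(img))
```

If the allocated figure is far smaller than the apparent size, the file is sparse.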

Recently, however, I'm getting a kerneloops...
    http://www.kerneloops.org/submitresult.php?number=205350
...and the mount never completes.
Shutdown also doesn't complete (5 seconds on the power off button solves this) and I get the "recovering journal" and "orphaned inodes" when the machine boots up.

Reproducible: Always

Steps to Reproduce:
1. Install OpenSuse on a Vista machine dual boot.
2. Create a block file on an NTFS partition.
3. Create an ext2 filesystem on the block file.
4. Mount it and use it.
Actual Results:  
Worked great for a while. Now it kerneloops'es.

Expected Results:  
No kerneloops'es.
Comment 1 Forgotten User qMyteedNxa 2009-01-31 10:37:30 UTC
That could be either a fuse or an ntfs-3g problem.

Here is a probably similar one: http://forums.opensuse.org/applications/402160-truecrypt-6-1a-opensuse-11-1-a-2.html

It could also be a SUSE-specific problem, as googling for "kernel BUG at fs/fuse/dir.c:1162" gives exactly ONE hit (the one above).

I'd try upgrading to the latest ntfs-3g first and if that doesn't help, also upgrade to latest fuse.
Comment 2 Philip Ashmore 2009-01-31 22:15:48 UTC
> I'd try upgrading to the latest ntfs-3g first
The only version available is the one on the DVD - ntfs-3g-1.5012-2.15
> and if that doesn't help, also upgrade to latest fuse.
The only version available is the one on the DVD - fuse-2.7.2-61.16
Comment 3 Philip Ashmore 2009-02-01 11:59:58 UTC
I found a (testing?) repository for the kernel...
    http://download.opensuse.org/repositories/Kernel:/SL111_BRANCH/openSUSE_11.1/
...and updated the kernel to use it.

The graphics performance is a LOT better (must be the intel driver) :)

Unfortunately I've still got the mount-hang problem.

Could you point me at a repository for fuse/ntfs-3g updates?
Comment 4 Philip Ashmore 2009-02-03 11:08:38 UTC
I tried doing the same thing from Fedora 10.
It refused to mount the block device saying that it basically couldn't identify it.
So I ran...
    # e2fsck -f <block file on ntfs partition>
...and I can mount this loopback from Fedora 10 just fine, although I did notice that F10 is using a later kernel.
Comment 5 Forgotten User sLJ7K2dvxj 2009-03-05 15:56:04 UTC
Looks like this BUG is caused by the loop driver calling the filesystem's fsync method without holding i_mutex.

The responsible SUSE specific patch is 'patches.fixes/loop-barriers2'.

Seems to affect SLE10 and SLE11 as well.
Comment 6 Forgotten User sLJ7K2dvxj 2009-03-05 16:20:56 UTC
Created attachment 277405 [details]
proposed fix

I'm not sure what the original 'loop-barriers2' patch does or if it is really needed.  It hasn't been pushed to mainline, which is a sign that it isn't so important.

But in case it's a needed feature, this patch should fix the unlocked fsync calls.
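
The failure mode described in comments 5 and 6 can be illustrated with a small user-space analogy. This is a sketch in Python, not kernel code and not the attached patch: `Inode`, `fsync`, `sync_file_unlocked` and `sync_file_locked` are hypothetical names, and it assumes that the check at fs/fuse/dir.c:1162 asserts that i_mutex is held by the caller.

```python
import threading


class Inode:
    """Stand-in for a kernel inode; only the lock matters here."""

    def __init__(self):
        self.i_mutex = threading.Lock()


def fsync(inode):
    """Stand-in for a filesystem fsync method that requires i_mutex to be held."""
    if not inode.i_mutex.locked():
        # Analogous to hitting a kernel BUG: the caller broke the locking rule.
        raise RuntimeError("fsync called without i_mutex held")
    return "synced"


def sync_file_unlocked(inode):
    """The diagnosed problem: the caller invokes fsync without taking the lock."""
    return fsync(inode)


def sync_file_locked(inode):
    """The general shape of a fix: hold the lock around the fsync call."""
    with inode.i_mutex:
        return fsync(inode)


if __name__ == "__main__":
    inode = Inode()
    print(sync_file_locked(inode))
    try:
        sync_file_unlocked(inode)
    except RuntimeError as err:
        print("bug:", err)
```

In the kernel the consequence is harsher than an exception: the thread that trips the check is killed, which would be consistent with the mount never completing.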
Comment 7 Forgotten User qMyteedNxa 2009-03-05 16:58:40 UTC
>I'm not sure what the original 'loop-barriers2' patch does 
>or if it is really needed.  

cat loop-barriers |head -n 15
From: Jeff Mahoney <jeffm@suse.com>
Subject: [PATCH] loop: add support for O_SYNC
References: 189051
Patch-mainline: never - this is a temporary band-aid for SLES10

 This patch adds support for O_SYNC to the block loop device. When the
 backing file is opened with O_SYNC, the loop device will sync writes before
 returning successful.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>



cat loop-barriers2 |head -n 15
From: kraxel@suse.de
Subject: Make the loop driver handle barrier requests.
Patch-mainline: no

Make the loop driver handle sync and barrier requests correctly.
Depends on loop-barrier patch.


>It hasn't been pushed to mainline, which is a sign that it isn't so important.

Oh, I think there is a LOT of important kernel stuff around that is NOT in mainline.
Comment 8 Archie Cobbs 2009-03-13 18:37:50 UTC
I am seeing this same problem. I'm not using NTFS, but rather a custom FUSE filesystem (s3backer) on openSUSE 11.1.

Kernel: 2.6.27.19-3.2-trace

------------[ cut here ]------------
kernel BUG at fs/fuse/dir.c:1162!
invalid opcode: 0000 [1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:18.3/modalias
CPU 0
Modules linked in: cbc crypto_blkcipher serpent cryptoloop binfmt_misc af_packet wctc4xxp(N) dahdi_transcode(N) wcfxo(N) wctdm24xxp(N) dahdi(N) crc_ccitt cpufreq_conservative cpufreq_userspace cpufreq_powersave powernow_k8 fuse nls_utf8 loop dm_mod dcdbas(X) container rtc_cmos rtc_core button rtc_lib shpchp pci_hotplug sr_mod cdrom i2c_nforce2 pcspkr i2c_core serio_raw tg3 libphy sg usbhid hid ff_memless raid456 async_xor async_memcpy async_tx xor raid0 sd_mod crc_t10dif ohci_hcd ehci_hcd usbcore edd raid1 reiserfs fan ide_pci_generic ide_core ata_generic sata_nv libata scsi_mod dock thermal processor thermal_sys hwmon
Supported: No
Pid: 15802, comm: loop1 Tainted: G          2.6.27.19-3.2-trace #1
RIP: 0010:[<ffffffffa02384a9>]  [<ffffffffa02384a9>] fuse_set_nowrite+0x22/0xc6 [fuse]
RSP: 0018:ffff880059c95d40  EFLAGS: 00010246
RAX: ffff880059d88c00 RBX: ffff880059c57800 RCX: 0000000000000000
RDX: ffff880059d8fa50 RSI: ffff880059d8fb48 RDI: ffff880059d8f940
RBP: ffff880059d8f940 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 00000000c09611c5 R12: 0000000000000000
R13: ffff880059c57800 R14: 0000000000000000 R15: ffff88006d5fa300
FS:  00007fa4a5d1a780(0000) GS:ffffffff80a3c080(0000) knlGS:00000000b7295970
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000007da1a8 CR3: 0000000059c61000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process loop1 (pid: 15802, threadinfo ffff880059c94000, task ffff880059c5a540)
Stack:  0000000000000000 7fffffffffffffff 0000000000000000 0000000000000000
 7fffffffffffffff 0000000000000000 0000000000000000 ffff880059d8f940
 0000000000000000 ffffffffa023aabe ffff88007e185a00 00000001802871b8
Call Trace:
 [<ffffffffa023aabe>] fuse_fsync_common+0x79/0x149 [fuse]
 [<ffffffffa0230176>] sync_file+0x4d/0x6d [loop]
 [<ffffffffa02301cd>] do_bio_filebacked+0x37/0x226 [loop]
 [<ffffffffa0230594>] loop_thread+0x1d8/0x20e [loop]
 [<ffffffff8024f9fb>] kthread+0x47/0x73
 [<ffffffff8020cec9>] child_rip+0xa/0x11


Code: 66 ff 45 00 41 5a 5b 5d c3 41 54 55 48 89 fd 53 48 83 ec 30 83 bf b8 00 00 00 01 48 8b 87 f8 00 00 00 48 8b 98 90 02 00 00 75 04 <0f> 0b eb fe 48 89 df e8 ed 0b 26 e0 8b 85 80 02 00 00 85 c0 79
RIP  [<ffffffffa02384a9>] fuse_set_nowrite+0x22/0xc6 [fuse]
 RSP <ffff880059c95d40>
---[ end trace 22f62985c27a0788 ]---
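
To read the call path out of an oops like the one above, the "Call Trace" lines can be parsed mechanically. A minimal Python sketch follows. The regular expression and the `parse_call_trace` helper are illustrative assumptions that cover only the `[<address>] symbol+0xoffset/0xsize [module]` line shape seen in this trace.

```python
import re

# Matches e.g. " [<ffffffffa0230176>] sync_file+0x4d/0x6d [loop]"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_call_trace(text):
    """Return (symbol, module) pairs, innermost frame first.

    module is None for frames with no [module] suffix (built-in kernel code).
    """
    return [(m.group("symbol"), m.group("module")) for m in FRAME_RE.finditer(text)]


# Call Trace lines copied from the oops above.
TRACE = """\
 [<ffffffffa023aabe>] fuse_fsync_common+0x79/0x149 [fuse]
 [<ffffffffa0230176>] sync_file+0x4d/0x6d [loop]
 [<ffffffffa02301cd>] do_bio_filebacked+0x37/0x226 [loop]
 [<ffffffffa0230594>] loop_thread+0x1d8/0x20e [loop]
 [<ffffffff8024f9fb>] kthread+0x47/0x73
 [<ffffffff8020cec9>] child_rip+0xa/0x11
"""

if __name__ == "__main__":
    for symbol, module in parse_call_trace(TRACE):
        print(f"{symbol:20s} {module or '(kernel)'}")
```

Read from the bottom up, the frames show the loop thread calling sync_file in the loop module, which calls into fuse's fsync path. That matches the diagnosis in comment 5.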
Comment 9 Nikanth Karthikesan 2009-03-17 07:05:08 UTC
Both the loop-barriers patch and the loop-barriers2 patch have been removed now.
Comment 10 Nikanth Karthikesan 2009-03-26 09:17:25 UTC
As the offending patches are removed, I am marking this as fixed. Please reopen if it does not fix the issue.
Comment 11 Archie Cobbs 2009-03-26 13:56:47 UTC
Can you indicate which kernel version contains the fix and when it will be available?

I still see the latest version in the openSUSE 11.1-update repo as 2.6.27.7-19.3.2.1, which was built on Feb. 25.

Thanks.
Comment 13 Archie Cobbs 2009-03-27 14:09:28 UTC
Thanks, that kernel does work for me with FUSE now.

Unfortunately I'm still stuck for another reason... 

This machine not only uses FUSE but also runs Asterisk (PBX software), which in turn requires a kernel module (dahdi-linux-kmp-trace) built in the network:telephony:asterisk OBS project. This kernel module is built against the standard openSUSE 11.1 kernel (currently 2.6.27.19-3.2.1), not the SL111_BRANCH version.

So is this patch going to be merged into the standard openSUSE 11.1 kernel (and thus appear in the 11.1-updates repo), so that I can use it on my Asterisk box?

Thanks.
Comment 14 Archie Cobbs 2009-04-06 14:02:11 UTC
No answer to previous question, try again....

I don't think this bug should be marked as resolved. The bug still exists in the latest openSUSE 11.1 updates kernel (2.6.27.19-3.2.1).

Kernel:/SL111_BRANCH is not mainstream and most people don't know about it. Sure, the bug is fixed in that particular kernel version but so what? That's like saying the bug is fixed in Redhat -- not relevant to this issue, which is reported against 11.1.
Comment 15 Greg Kroah-Hartman 2009-04-06 14:58:59 UTC
When we fix the bug in our tree, that is when we close the bug.  The fix will be in the next kernel update, or you can pull it from the repo you were pointed to, that's our development process.
Comment 16 Archie Cobbs 2009-04-06 15:43:11 UTC
OK, thanks.

Any idea when the next kernel update will be published? Is there a regular schedule for kernel updates?

(I will also ask this question on the forum).
Comment 17 Swamp Workflow Management 2009-04-08 07:24:39 UTC
Update released for: kernel-debug, kernel-debug-base, kernel-debug-debuginfo, kernel-debug-debugsource, kernel-debug-extra, kernel-default, kernel-default-base, kernel-default-debuginfo, kernel-default-debugsource, kernel-default-extra, kernel-docs, kernel-kdump, kernel-kdump-debuginfo, kernel-kdump-debugsource, kernel-pae, kernel-pae-base, kernel-pae-extra, kernel-ppc64, kernel-ppc64-base, kernel-ppc64-debuginfo, kernel-ppc64-debugsource, kernel-ppc64-extra, kernel-ps3, kernel-ps3-debuginfo, kernel-ps3-debugsource, kernel-source, kernel-source-debuginfo, kernel-syms, kernel-trace, kernel-trace-base, kernel-trace-debuginfo, kernel-trace-debugsource, kernel-trace-extra, kernel-vanilla, kernel-vanilla-debuginfo, kernel-vanilla-debugsource, kernel-xen, kernel-xen-base, kernel-xen-debuginfo, kernel-xen-debugsource, kernel-xen-extra
Products:
openSUSE 11.1 (debug, i586, ppc, x86_64)
Comment 18 Marcus Meissner 2009-04-16 15:54:48 UTC
A kernel update for SUSE Linux Enterprise 11 was just released that references / fixes this bug, with RPM version "2.6.27.21-0.1.2".
Comment 19 Swamp Workflow Management 2009-04-16 22:09:02 UTC
Update released for: cluster-network-kmp-default, cluster-network-kmp-xen, ext4dev-kmp-default, ext4dev-kmp-xen, kernel-default, kernel-default-base, kernel-default-debuginfo, kernel-default-debugsource, kernel-default-extra, kernel-source, kernel-source-debuginfo, kernel-syms, kernel-xen, kernel-xen-base, kernel-xen-debuginfo, kernel-xen-debugsource, kernel-xen-extra, ocfs2-kmp-default, ocfs2-kmp-xen
Products:
SLE-DEBUGINFO 11 (x86_64)
SLE-DESKTOP 11 (x86_64)
SLE-HAE 11 (x86_64)
SLE-SERVER 11 (x86_64)