Bugzilla – Bug 1213878
xfs_copy stuck at 90% completion, trying to read past end of device
Last modified: 2023-08-10 12:12:00 UTC
I am using openSUSE-Tumbleweed-DVD-x86_64-Snapshot20230731-Media.iso as a rescue medium to save data from a broken SSD (filesystem corruption). I was trying to copy the file system using "xfs_copy -d /dev/vg/home /some/remote/file". The copy process got stuck at 90%, at which point it suddenly started to use 130% CPU. strace showed that xfs_copy had already reached the end of the input device (judging from /proc/$PID/fdinfo), but was still trying to read from it. It would continuously do something like

  read(4, buf, 1048576) = 0
  write(5, buf, 0) = 0

(in reality it was more complicated because the tool uses threading). I then used dd_rescue to copy the file system, and that worked.
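For reference, the pattern above is what you get from a copy loop that never checks for a zero-byte read. This is not xfs_copy's actual code, just an illustrative POSIX sketch of the EOF check whose absence would produce the endless read() = 0 / write(fd, buf, 0) = 0 cycle:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Generic copy loop: a zero-byte read() means end of input, so stop
 * instead of spinning on empty reads and zero-length writes.
 */
static ssize_t copy_until_eof(int src_fd, int dst_fd)
{
	char buf[65536];
	ssize_t total = 0;

	for (;;) {
		ssize_t n = read(src_fd, buf, sizeof(buf));

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;	/* real read error */
		}
		if (n == 0)
			break;		/* EOF: the crucial check */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(dst_fd, buf + off, n - off);

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;	/* handle short writes */
		}
		total += n;
	}
	return total;
}
```

If instead the number of bytes to copy is computed up front (e.g. from filesystem metadata) and that number is too large, the loop keeps issuing reads past EOF forever, which matches the observed behavior.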
Would it be possible to provide a metadump? I can try to reproduce and see what's going on.
metadump copied internally to /mounts/users-space/mwilck/bsc1213878
(In reply to Martin Wilck from comment #2)
> metadump copied internally to /mounts/users-space/mwilck/bsc1213878

thanks, I can't actually reproduce the behavior you are describing, but it does hit an assertion at some point after 90% progress, which seems related:

> # xfs_copy -d /dev/vdb /tmp/foo
> Creating file /tmp/foo
> 0% ... 10% ... 20% ... 30% ... 40% ... 50% ... 60% ... 70% ... 80% ... 90% xfs_copy: xfs_copy.c:432: read_wbuf: Assertion `res == buf->length' failed.
>
> Aborted

BTW, as long as you have a single destination file, there should only be a single thread doing the copying.

In general I do not see any advantage in using xfs_copy over dd in cases where the filesystem is corrupted: xfs_copy -d bypasses the unclean-filesystem check, but it nevertheless relies on xfs metadata to skip over free blocks (in order to do sparse copying). Honestly, in cases of a corrupted filesystem it is much better to use dd/dd_rescue, or even something like

  cat /dev/vdb | zstd -v > /tmp/vdb.zst

to salvage the image.
(In reply to Anthony Iliopoulos from comment #3)
> Aborted

I just tried again with a device created from my metadump, and I observe the same behavior as originally reported. xfs_copy is hanging with ~100% CPU.

# xfs_copy -V
xfs_copy version 6.4.0

# ls -l /proc/25710/fd
total 0
lrwx------ 1 root root 64 Aug 10 13:09 0 -> /dev/pts/7
lrwx------ 1 root root 64 Aug 10 13:09 1 -> /dev/pts/7
lrwx------ 1 root root 64 Aug 10 13:09 2 -> /dev/pts/7
lrwx------ 1 root root 64 Aug 10 13:09 3 -> /var/tmp/xfs_copy.log.H99Xoc
lr-x------ 1 root root 64 Aug 10 13:09 4 -> /dev/loop0
lrwx------ 1 root root 64 Aug 10 13:09 5 -> /net/192.168.1.167/volume1/Rescue/apollon/xfs.copy

# grep pos /proc/25710/fdinfo/{4,5}
/proc/25710/fdinfo/4:pos: 53687091200
/proc/25710/fdinfo/5:pos: 53687779328

These file positions don't change any more.

strace shows this, endlessly:

25709 rt_sigprocmask(SIG_BLOCK, [CHLD], <unfinished ...>
25710 futex(0x56042060f298, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
25709 <... rt_sigprocmask resumed>NULL, 8) = 0
25709 read(4, "", 1048576) = 0
25709 futex(0x56042060f298, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
25710 <... futex resumed>) = 0
25709 <... futex resumed>) = 1
25710 write(5, "", 0 <unfinished ...>
25709 rt_sigprocmask(SIG_UNBLOCK, [CHLD], <unfinished ...>
25710 <... write resumed>) = 0
25709 <... rt_sigprocmask resumed>NULL, 8) = 0
25710 futex(0x56041fda7540, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
25709 rt_sigprocmask(SIG_BLOCK, [CHLD], <unfinished ...>
25710 <... futex resumed>) = 0
25709 <... rt_sigprocmask resumed>NULL, 8) = 0
25710 futex(0x56042060f298, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
25709 read(4, "", 1048576) = 0
25709 futex(0x56042060f298, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
25710 <... futex resumed>) = 0
25709 <... futex resumed>) = 1
25710 write(5, "", 0 <unfinished ...>
25709 rt_sigprocmask(SIG_UNBLOCK, [CHLD], <unfinished ...>
25710 <... write resumed>) = 0

> BTW as long as you have a single destination file, there should only be a
> single thread doing the copying.

I observe 2 threads, see above. Maybe because the target was on NFS? But that's kind of common for recovery operations, isn't it?

> In general I do not see any advantage in using xfs_copy over dd in cases
> when the filesystem is corrupted

I wasn't aware of that. I was assuming xfs_copy was written for that purpose.

Btw, the metadump belongs to my corrupted /home file system that we discussed on the suse-labs ML. Should I open a new bug for the FS corruption itself?
(In reply to Martin Wilck from comment #4)
> (In reply to Anthony Iliopoulos from comment #3)
>
> > Aborted
>
> I just tried again with a device created from my metadump, and I observe the
> same behavior as originally reported. xfs_copy is hanging with ~100% CPU.
>
> # xfs_copy -V
> xfs_copy version 6.4.0

Same here

> # ls -l /proc/25710/fd
> total 0
> lrwx------ 1 root root 64 Aug 10 13:09 0 -> /dev/pts/7
> lrwx------ 1 root root 64 Aug 10 13:09 1 -> /dev/pts/7
> lrwx------ 1 root root 64 Aug 10 13:09 2 -> /dev/pts/7
> lrwx------ 1 root root 64 Aug 10 13:09 3 -> /var/tmp/xfs_copy.log.H99Xoc
> lr-x------ 1 root root 64 Aug 10 13:09 4 -> /dev/loop0
> lrwx------ 1 root root 64 Aug 10 13:09 5 ->
> /net/192.168.1.167/volume1/Rescue/apollon/xfs.copy
>
> # grep pos /proc/25710/fdinfo/{4,5}
> /proc/25710/fdinfo/4:pos: 53687091200
> /proc/25710/fdinfo/5:pos: 53687779328
>
> These file positions don't change any more.
>
> strace shows this, endlessly:
>
> 25709 rt_sigprocmask(SIG_BLOCK, [CHLD], <unfinished ...>
> 25710 futex(0x56042060f298, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> 25709 <... rt_sigprocmask resumed>NULL, 8) = 0
> 25709 read(4, "", 1048576) = 0
> 25709 futex(0x56042060f298, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> 25710 <... futex resumed>) = 0
> 25709 <... futex resumed>) = 1
> 25710 write(5, "", 0 <unfinished ...>
> 25709 rt_sigprocmask(SIG_UNBLOCK, [CHLD], <unfinished ...>
> 25710 <... write resumed>) = 0
> 25709 <... rt_sigprocmask resumed>NULL, 8) = 0
> 25710 futex(0x56041fda7540, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> 25709 rt_sigprocmask(SIG_BLOCK, [CHLD], <unfinished ...>
> 25710 <... futex resumed>) = 0
> 25709 <... rt_sigprocmask resumed>NULL, 8) = 0
> 25710 futex(0x56042060f298, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> 25709 read(4, "", 1048576) = 0
> 25709 futex(0x56042060f298, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> 25710 <... futex resumed>) = 0
> 25709 <... futex resumed>) = 1
> 25710 write(5, "", 0 <unfinished ...>
> 25709 rt_sigprocmask(SIG_UNBLOCK, [CHLD], <unfinished ...>
> 25710 <... write resumed>) = 0

Not sure why it is falling into a loop instead of asserting like in my case (or how to reproduce this behavior), but I have verified that indeed this is due to xfs_copy relying on the metadata: after xfs_repair, xfs_copy will successfully complete.

> > BTW as long as you have a single destination file, there should only be a
> > single thread doing the copying.
>
> I observe 2 threads, see above. Maybe because the target was on NFS? But
> that's kind of common for recovery operations, isn't it?

It shouldn't matter; xfs_copy is oblivious to the target file being on NFS (other than possibly tuning the IO parameters for the target). Two threads sound normal (the main process plus one thread created for the one target), IOW there should be only a single pthread_create() call / clone3 syscall.

> > In general I do not see any advantage in using xfs_copy over dd in cases
> > when the filesystem is corrupted
>
> I wasn't aware of that. I was assuming xfs_copy was written for that purpose.

I think it was written to speed up replication of xfs partitions to multiple targets and to minimize the copying (by skipping free blocks), but not for filesystem recovery. This is why, without the "-d" option, xfs_copy will bail out saying there's a dirty log. But the utility also predates the v5 format with metadata checksumming, so there is no verification of metadata (even when checksums are available, which is the case in your fs). If xfs_copy did verify metadata, it would also bail out (irrespective of the "-d" option). So it really doesn't make any sense to rely on metadata when trying to copy a corrupted filesystem. In that case xfs_copy would anyway be reduced to read from target and write to source, so nothing much on top of cat/dd.
> Btw, the metadump belongs to my corrupted /home file system that we
> discussed on the suse-labs ML. Should I open a new bug for the FS corruption
> itself?

Not sure what we can do further about that. As you mention in the mailing list, it was not only xfs being affected (thus it doesn't appear to be filesystem-specific), and from the metadata all I can see is that there are indeed severe corruptions that would not normally occur *through* xfs writes (given that your fs has crc enabled). There are no hints whatsoever on what may have caused this.
(In reply to Anthony Iliopoulos from comment #5)
> So it really doesn't make any sense to rely on metadata when trying to copy
> a corrupted filesystem. In that case xfs_copy would be anyway reduced to
> read from target and write to source, so nothing much on top of cat/dd.

Here I meant to say "read from source and write to target".
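For comparison, the sparse copying that xfs_copy derives from XFS metadata can also be achieved without trusting that metadata, by asking the kernel for the allocated extents via lseek(2) with SEEK_DATA/SEEK_HOLE. The sketch below is illustrative only (not part of xfsprogs) and shows the general idea:

```c
#define _GNU_SOURCE		/* SEEK_DATA/SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical sketch: copy only the data extents reported by the
 * kernel, skipping holes, instead of deriving the allocated ranges
 * from filesystem metadata the way xfs_copy does. On filesystems
 * without hole-reporting support the kernel treats the whole file as
 * one data extent, so the copy is still correct, just not sparse.
 */
static int sparse_copy(int src_fd, int dst_fd)
{
	char buf[65536];
	off_t end = lseek(src_fd, 0, SEEK_END);
	off_t data = 0;

	if (end < 0)
		return -1;
	for (;;) {
		data = lseek(src_fd, data, SEEK_DATA);
		if (data < 0) {
			if (errno == ENXIO)
				break;	/* no data past offset: done */
			return -1;
		}
		off_t hole = lseek(src_fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;
		while (data < hole) {
			size_t want = sizeof(buf);
			if ((off_t)want > hole - data)
				want = hole - data;
			ssize_t n = pread(src_fd, buf, want, data);
			if (n <= 0)
				return -1;
			if (pwrite(dst_fd, buf, n, data) != n)
				return -1;
			data += n;	/* holes are never read or written */
		}
	}
	return ftruncate(dst_fd, end);	/* preserve size / trailing hole */
}
```

The key difference from xfs_copy's approach is that the extent map comes from the kernel at read time, so corrupt on-disk metadata cannot send the loop past the end of the device.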
(In reply to Anthony Iliopoulos from comment #5)
> Not sure why it is falling into a loop instead of asserting like in my case
> (or how to reproduce this behavior), but I have verified that indeed this is
> due to xfs_copy relying on the metadata: after xfs_repair, xfs_copy will
> successfully complete.

That makes sense. As I said before, I was (mistakenly) trying to use xfs_copy to create a dump of the corrupted FS. Wrt reproducing, I'd expect you'd see this behavior if you run xfs_copy on my metadump without running xfs_repair first.

> > Btw, the metadump belongs to my corrupted /home file system that we
> > discussed on the suse-labs ML. Should I open a new bug for the FS corruption
> > itself?
>
> Not sure what we can do further about that, as you mention in the mailing
> list it was not only xfs being affected (thus it doesn't appear to be
> filesystem-specific), and from the metadata all I can see is that there are
> indeed severe corruptions that would not normally occur *through* xfs writes
> (given that your fs has crc enabled). There are no hints whatsoever on what
> may have caused this.

That's unfortunate, but ok, let's just leave it at that.

FTR: Meanwhile I found that another FS was also corrupted on the same occasion, but in more subtle ways. That FS contained mostly git repositories, mostly clones of public repos, and was my main work area. There were no errors when this FS was mounted. But as soon as I started working, git spat out lots of warnings about broken objects, packs, and refs. It seems most likely that some writeback happened after kernel memory had been corrupted, possibly in that failed suspend/resume operation.
(In reply to Martin Wilck from comment #7)
> FTR: Meanwhile I found that another FS was also corrupted

... which was also XFS. Forgot to mention that.
Closing per previous comments.
(In reply to Martin Wilck from comment #7)
> (In reply to Anthony Iliopoulos from comment #5)
>
> > Not sure why it is falling into a loop instead of asserting like in my case
> > (or how to reproduce this behavior), but I have verified that indeed this is
> > due to xfs_copy relying on the metadata: after xfs_repair, xfs_copy will
> > successfully complete.
>
> That makes sense. As I said before, I was (mistakenly) trying to use
> xfs_copy to create a dump of the corrupted FS. Wrt reproducing, I'd expect
> you'd see this behavior if you run xfs_copy on my metadump without running
> xfs_repair first.

Before repair I can only see the assertion, but not the infinite looping; that may be due to different IO parameters of the target device.

> FTR: Meanwhile I found that another FS was also corrupted on the same
> occasion, but in more subtle ways. That FS contained mostly git
> repositories, mostly clones of public repos, and was my main work area.
> There were no errors when this FS was mounted. But as soon as I started
> working, git spat out lots of warnings about broken objects, packs, and refs.

Irrespective of potential kernel bugs, I've had reports of git corruption on top of xfs after power failures, which was due to git missing fsync during atomic renames. The (current) solution is to set "git config core.fsync all" for the repos that matter (since there is a perf penalty). By default there's no syncing at all.