Bug 1222847 - Snapper leaves stale qgroups
Summary: Snapper leaves stale qgroups
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Basesystem
Version: Current
Hardware: Other Other
Priority: P5 - None    Severity: Normal
Target Milestone: ---
Assignee: Arvin Schnell
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-04-15 13:34 UTC by Fabian Vogt
Modified: 2024-07-10 08:40 UTC (History)
4 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
strace of snapper delete (4.86 MB, text/x-log)
2024-04-15 13:34 UTC, Fabian Vogt

Description Fabian Vogt 2024-04-15 13:34:58 UTC
Created attachment 874286 [details]
strace of snapper delete

I noticed that I have dozens of <stale> qgroups both on my recently set up system as well as in some older VMs:

localhost:~ # btrfs qgroup show / | grep -c stale
66
fvogt-thinkpad:~ # btrfs qgroup show / | grep -c stale
173

At first I managed to reproduce it only with a call to "snapper cleanup number" after creating some empty snapshots to force a cleanup, but then I also managed to do it with "snapper -v --no-dbus delete 104". strace output attached.

The interesting part appears to be this -EBUSY:

openat(4, "snapshot", O_RDONLY|O_NOFOLLOW|O_NOATIME|O_CLOEXEC) = 5
fstat(5, {st_dev=makedev(0, 0x5d), st_ino=256, st_mode=S_IFDIR|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=142, st_atime=1713181536 /* 2024-04-15T13:45:36.570794904+0200 */, st_atime_nsec=570794904, st_mtime=1701182809 /* 2023-11-28T15:46:49.180000002+0100 */, st_mtime_nsec=180000002, st_ctime=1701182809 /* 2023-11-28T15:46:49.180000002+0100 */, st_ctime_nsec=180000002}) = 0
close(4)                                = 0
ioctl(5, BTRFS_IOC_INO_LOOKUP, {treeid=0, objectid=BTRFS_FIRST_FREE_OBJECTID} => {treeid=369, name=""}) = 0
close(5)                                = 0
ioctl(3, BTRFS_IOC_SNAP_DESTROY, {fd=0, name="snapshot"}) = 0
openat(AT_FDCWD, "/", O_RDONLY|O_NOATIME|O_CLOEXEC) = 4
fstat(4, {st_dev=makedev(0, 0x22), st_ino=256, st_mode=S_IFDIR|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=142, st_atime=1713181536 /* 2024-04-15T13:45:36.570794904+0200 */, st_atime_nsec=570794904, st_mtime=1701182809 /* 2023-11-28T15:46:49.180000002+0100 */, st_mtime_nsec=180000002, st_ctime=1701182809 /* 2023-11-28T15:46:49.180000002+0100 */, st_ctime_nsec=180000002}) = 0
flistxattr(4, NULL, 0)                  = 0
fstat(4, {st_dev=makedev(0, 0x22), st_ino=256, st_mode=S_IFDIR|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=142, st_atime=1713181536 /* 2024-04-15T13:45:36.570794904+0200 */, st_atime_nsec=570794904, st_mtime=1701182809 /* 2023-11-28T15:46:49.180000002+0100 */, st_mtime_nsec=180000002, st_ctime=1701182809 /* 2023-11-28T15:46:49.180000002+0100 */, st_ctime_nsec=180000002}) = 0
openat(4, ".snapshots", O_RDONLY|O_NOFOLLOW|O_NOATIME|O_CLOEXEC) = 5
fstat(5, {st_dev=makedev(0, 0x20), st_ino=256, st_mode=S_IFDIR|0750, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=118, st_atime=1713183328 /* 2024-04-15T14:15:28.889972908+0200 */, st_atime_nsec=889972908, st_mtime=1713183330 /* 2024-04-15T14:15:30.996646362+0200 */, st_mtime_nsec=996646362, st_ctime=1713183330 /* 2024-04-15T14:15:30.996646362+0200 */, st_ctime_nsec=996646362}) = 0
fstat(5, {st_dev=makedev(0, 0x20), st_ino=256, st_mode=S_IFDIR|0750, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=118, st_atime=1713183328 /* 2024-04-15T14:15:28.889972908+0200 */, st_atime_nsec=889972908, st_mtime=1713183330 /* 2024-04-15T14:15:30.996646362+0200 */, st_mtime_nsec=996646362, st_ctime=1713183330 /* 2024-04-15T14:15:30.996646362+0200 */, st_ctime_nsec=996646362}) = 0
close(4)                                = 0
ioctl(5, BTRFS_IOC_QGROUP_CREATE, {create=0, qgroupid=369}) = -1 EBUSY (Device or resource busy)
futex(0x7f1e18ec3070, FUTEX_WAKE_PRIVATE, 2147483647) = 0
close(5)                                = 0
close(3)                                = 0
openat(AT_FDCWD, "/", O_RDONLY|O_NOATIME|O_CLOEXEC) = 3
Comment 1 Arvin Schnell 2024-04-15 13:46:08 UTC
Can you delete the qgroup with e.g. 'btrfs qgroup destroy 0/369 /'?
Comment 2 Fabian Vogt 2024-04-15 14:00:51 UTC
(In reply to Arvin Schnell from comment #1)
> Can you delete the qgroup with e.g. 'btrfs qgroup destroy 0/369 /'?

Yes, though sometimes not immediately, only later.

Manual deletion appears to always work after btrfs quota rescan -w /.
Comment 3 Arvin Schnell 2024-04-15 14:05:11 UTC
It seems that the connection to the higher-level qgroup now has to be
removed before the qgroup itself can be deleted.
Comment 4 Wenruo Qu 2024-04-16 01:05:47 UTC
Just want to make sure a few things.

1. Do the stale qgroups show up again after deleting them and doing a rescan?

2. Does the subvolume corresponding to the qgroup still exist?
   `btrfs subvolume list -d` shows such unlinked but not yet deleted subvolumes.
Comment 5 Arvin Schnell 2024-04-16 07:23:42 UTC
Now I am even more confused and the problem seems even more complicated.
What I am trying to do to remove a qgroup is:

1. remove the corresponding subvolume
2. remove all relations to the group
3. do a rescan
4. remove the qgroup

Still the last step fails with EBUSY. Note that all these steps are done
using ioctls, not the btrfs tool.

Besides the steps not working, I also dislike having to do a rescan, since
that is a very slow operation.

So how can the qgroup be removed quickly and reliably?
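For reference, the four steps above correspond roughly to the following btrfs-progs commands (a sketch only; the subvolume path and qgroup id are made-up examples, and snapper performs the equivalent operations via ioctls):

```shell
# Hypothetical path and qgroup id, for illustration only.
btrfs subvolume delete /.snapshots/104/snapshot   # 1. remove the corresponding subvolume
btrfs qgroup remove 0/369 1/0 /                   # 2. remove the relation to the higher-level qgroup
btrfs quota rescan -w /                           # 3. rescan and wait for completion
btrfs qgroup destroy 0/369 /                      # 4. remove the qgroup (the step failing with EBUSY)
```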
Comment 6 Wenruo Qu 2024-04-16 07:33:23 UTC
Firstly, deleting a subvolume just unlinks it and queues it for background cleanup.
Thus the subvolume can still be there for a while if it is large enough.

Secondly, you cannot remove a qgroup while it still has any usage (the -EBUSY you hit).

Finally, on newer kernels (and maybe newer snapper too), dropping a subvolume with a higher-level qgroup marks the qgroups inconsistent immediately
(to avoid a very heavy subtree rescan).
So you'll need to check whether the qgroups are already inconsistent. If so, a rescan is needed anyway.


To ensure a subvolume is really removed completely, you need to delete the subvolume first and then wait for it to be cleaned up.
btrfs-progs provides the `btrfs subvolume sync` command to wait for subvolume deletion, and a sync is also recommended to make sure
the transaction is committed and the qgroup numbers are updated before trying to delete the qgroup.

In fact, if the subvolume is properly deleted, its qgroup should also be removed by default IIRC.
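Putting that together, a working sequence might look like this (a sketch; the subvolume path and qgroup id are examples, not taken from this bug):

```shell
btrfs subvolume delete /.snapshots/104/snapshot
btrfs subvolume sync /          # wait until the dropped subvolume is fully cleaned up
sync                            # commit the transaction so the qgroup numbers are updated
btrfs qgroup destroy 0/369 /    # should now succeed once excl/rfer are zero
```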
Comment 7 Arvin Schnell 2024-04-16 08:36:20 UTC
OK, adding a step 1b, sync (BTRFS_IOC_SYNC), does help to get the
qgroup deleted. AFAICT all the other steps are required. Deleting a
snapshot now takes 16 seconds on my system (with an almost empty
test btrfs). But if that is required I will add it.
Comment 8 Wenruo Qu 2024-04-16 10:29:02 UTC
Just to be clear, the wait (SYNC) is not needed for the operation itself, but because of the commit interval (which defaults to 30 seconds).

In a lot of fstests test cases, we intentionally remount with the "commit=1" mount option to reduce the wait time.
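For example (a sketch; adjust the mount point and the original commit interval to your setup):

```shell
mount -o remount,commit=1 /     # shorten the commit interval to 1 second
# ... run the subvolume delete / qgroup destroy sequence ...
mount -o remount,commit=30 /    # restore the default 30-second interval
```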
Comment 9 Arvin Schnell 2024-04-17 09:23:12 UTC
I have something working now but as already mentioned it can be very slow.

Why can't btrfs do this kind of cleanup on its own? Maybe in the btrfs
maintenance service. With the '--no-delete-qgroup' behavior being the default
for 'btrfs subvolume delete', people will pile up stale qgroups.

BTW: The '--delete-qgroup' option for 'btrfs subvolume delete' does not
work for me (WARNING: unable to delete qgroup 0/0: Device or resource busy).
Comment 10 Wenruo Qu 2024-04-17 10:34:57 UTC
> Why cannot btrfs do this kind of cleanup on its own?

Btrfs already does the cleanup; the only problem is that the deletion only handles the numbers (so the qgroup's excl/rfer are 0), not the relationships.

IIRC I submitted a similar patch a long time ago, but it didn't get merged.

I can retry, but don't expect this feature to get merged anytime soon.
Comment 11 Arvin Schnell 2024-04-17 12:32:36 UTC
For the problem with --delete-qgroup I created bug #1222963.
Comment 12 Wenruo Qu 2024-04-17 22:20:20 UTC
In fact, you can skip the relationship deletion, as the qgroup deletion ioctl would handle that all by itself.
The only thing you really need is to make sure the deleted subvolume has really been fully dropped, then delete the qgroup.

There is already a tool to do that: "btrfs qgroup clear-stale".

I'll also push the auto-deletion into the upstream kernel soon.

I'm wondering why you're working on a fully ioctl-based solution when btrfs-progs can already handle all of this.
Comment 13 Arvin Schnell 2024-04-18 06:43:10 UTC
(In reply to Wenruo Qu from comment #12)
> In fact, you can skip the relationship deletion, as qgroup deletion ioctl
> would handle that all by itself.

OK, although the btrfs-qgroup man page says the qgroup has to be "isolated".

> I'll also push the auto deletion soon into the upstream kernel.

Thanks, that is good. I will then not continue with implementing the
steps from comment #5 in snapper since the result is a significant
slowdown.
Comment 14 Wenruo Qu 2024-04-18 07:36:16 UTC
(In reply to Arvin Schnell from comment #13)
> (In reply to Wenruo Qu from comment #12)
> > In fact, you can skip the relationship deletion, as qgroup deletion ioctl
> > would handle that all by itself.
> 
> OK, although the btrfs-qgroup man page says the qgroup has to be "isolated".

Then that's another point to improve.
The requirements (at least from the latest upstream kernel code) are:

- No usage
  I.e., 0 excl/rfer.
  Although I have to improve this, as for large snapshots we will
  mark the qgroups inconsistent, and in that case excl/rfer can be non-zero.

- No child
  It is, however, allowed for the qgroup to be a child itself.

> 
> > I'll also push the auto deletion soon into the upstream kernel.
> 
> Thanks, that is good. I will then not continue with implementing the
> steps from comment #5 in snapper since the result is a significant
> slowdown.

You don't really need to bother with the wait/cleanup inside snapper.
The subvolume drop is really asynchronous background work; it's not worth
waiting for each subvolume to be dropped.

Just calling "btrfs qgroup clear-stale" periodically would be more reasonable.
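A minimal sketch of such a periodic cleanup (assuming a recent btrfs-progs that has the clear-stale subcommand):

```shell
btrfs qgroup clear-stale /            # drop all level-0 qgroups without a matching subvolume
btrfs qgroup show / | grep -c stale   # afterwards this count should drop to 0
```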
Comment 15 Arvin Schnell 2024-04-18 13:23:29 UTC
Yes, I have now added a call to "btrfs qgroup clear-stale" to the systemd
cleanup service: https://github.com/openSUSE/snapper/pull/899
Comment 16 Wenruo Qu 2024-04-19 09:56:19 UTC
The upstream patches to be pushed can be found here:

https://github.com/btrfs/linux/issues/1239