Bugzilla – Full Text Bug Listing
| Summary: | Snapper leaves stale qgroups | | |
|---|---|---|---|
| Product: | [openSUSE] openSUSE Tumbleweed | Reporter: | Fabian Vogt <fvogt> |
| Component: | Basesystem | Assignee: | Arvin Schnell <aschnell> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | | |
| Priority: | P5 - None | CC: | aschnell, fvogt, iforster, wqu |
| Version: | Current | | |
| Target Milestone: | --- | | |
| Hardware: | Other | | |
| OS: | Other | | |
| See Also: | https://bugzilla.suse.com/show_bug.cgi?id=1222963 | | |
| Whiteboard: | | | |
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | strace of snapper delete | | |
Can you delete the qgroup with e.g. 'btrfs qgroup destroy 0/369 /'?

(In reply to Arvin Schnell from comment #1)
> Can you delete the qgroup with e.g. 'btrfs qgroup destroy 0/369 /'?

Yes, but sometimes not immediately, only later. Manual deletion appears to always work after `btrfs quota rescan -w /`. It seems as if the relation to the higher-level qgroup now has to be deleted before the qgroup itself can be deleted.

Just want to make sure two things:

1. Do the stale qgroups show up again after deleting the qgroup and doing a rescan?
2. Does the subvolume of the corresponding qgroup still exist? `btrfs subvolume list -d` can show such unlinked but not yet deleted subvolumes.

Now I am even more confused and the problem seems even more complicated. What I am trying to do to remove a qgroup is:

1. remove the corresponding subvolume
2. remove all relations of the qgroup
3. do a rescan
4. remove the qgroup

Still the last step fails with EBUSY. Note that all these steps are done using ioctls, not the btrfs tool. Besides the steps not working, I also dislike having to do a rescan, since that is a very slow operation. So how can the qgroup be removed quickly and reliably?

Firstly, deleting a subvolume just unlinks it and queues it for background cleanup, so the subvolume can still be there if it's large enough. Secondly, you cannot remove a qgroup if it still has any usage (the -EBUSY you hit). Finally, newer kernels (and maybe newer snapper too) mark the qgroup inconsistent immediately when dropping a subvolume that has a higher-level qgroup (to avoid a very heavy subtree rescan). So you'll need to check whether the qgroup is already inconsistent; if so, a rescan is needed anyway. To ensure a subvolume is really removed completely, you will need to delete the subvolume first, then wait for it to be cleaned up.
Btrfs-progs provides the `btrfs subvolume sync` command to wait for subvolume deletion, and a sync is already recommended to make sure the transaction is committed (so that the qgroup numbers are updated) before trying to delete the qgroup. In fact, if the subvolume is properly deleted, its qgroup should also be removed by default, IIRC.

OK, adding a step 1b. sync (BTRFS_IOC_SYNC) does help to get the qgroup deleted. AFAICT all other steps are still required. Deleting a snapshot now takes 16 seconds on my system (with an almost empty test btrfs). But if that is required I will add it.

Just to be clear, the wait (SYNC) is not due to the operation itself, but due to the commit interval (which defaults to 30s). In a lot of fstests test cases we intentionally remount with the "commit=1" mount option to reduce the wait time.

I have something working now, but as already mentioned it can be very slow. Why can't btrfs do this kind of cleanup on its own? Maybe in the btrfs maintenance service. With the '--no-delete-qgroup' option being the default for 'btrfs subvolume delete', people will pile up stale qgroups.

BTW: The '--delete-qgroup' option for 'btrfs subvolume delete' does not work for me (WARNING: unable to delete qgroup 0/0: Device or resource busy).

> Why cannot btrfs do this kind of cleanup on its own?
Btrfs already does the cleanup; the only problem is that the deletion only handles the numbers (so the qgroup's excl/rfer become 0), not the relationships.
IIRC I submitted a similar patch a long time ago, but it didn't get merged.
I can retry, but do not expect this feature to get merged anytime soon.
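Sticking to the btrfs-progs CLI instead of raw ioctls, the sequence discussed in this thread (delete the subvolume, wait for the background drop, then remove the qgroup) could be sketched as below. The helper names are hypothetical, and the code only assembles the commands, since actually running them requires root and a btrfs mount:

```python
# Sketch of the CLI-level removal sequence discussed in this thread.
# removal_commands() only builds the command lines, keeping the required
# ordering explicit; run_removal() would execute them (root + btrfs needed).
import subprocess

def removal_commands(subvol_path: str, qgroupid: str, mount: str) -> list[list[str]]:
    return [
        # 1. unlink the subvolume (it is only queued for background cleanup)
        ["btrfs", "subvolume", "delete", subvol_path],
        # 2. wait until the background drop has actually finished,
        #    so the qgroup's excl/rfer drop to zero
        ["btrfs", "subvolume", "sync", mount],
        # 3. now the qgroup has no usage left and can be destroyed
        ["btrfs", "qgroup", "destroy", qgroupid, mount],
    ]

def run_removal(subvol_path: str, qgroupid: str, mount: str) -> None:
    for cmd in removal_commands(subvol_path, qgroupid, mount):
        subprocess.run(cmd, check=True)

cmds = removal_commands("/.snapshots/104/snapshot", "0/369", "/")
print(len(cmds), cmds[1])
```

As noted above, step 2 is the slow part: it waits on the background cleaner and the commit interval, which is why doing this per snapshot inside snapper was considered too expensive.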
For the problem with --delete-qgroup I created bug #1222963.

In fact, you can skip the relationship deletion, as the qgroup deletion ioctl handles that all by itself. The only thing you really need is to make sure the deleted subvolume really got fully dropped, then delete the qgroup. There are already tools to do that: "btrfs qgroup clear-stale". I'll also push the auto deletion into the upstream kernel soon. I'm wondering why you're working on a fully ioctl-based solution while we already have btrfs-progs to handle all of this.

(In reply to Wenruo Qu from comment #12)
> In fact, you can skip the relationship deletion, as qgroup deletion ioctl
> would handle that all by itself.

OK, although the btrfs-qgroup man page says the qgroup has to be "isolated".

> I'll also push the auto deletion soon into the upstream kernel.

Thanks, that is good. I will then not continue with implementing the steps from comment #5 in snapper, since the result is a significant slowdown.

(In reply to Arvin Schnell from comment #13)
> OK, although the btrfs-qgroup man page says the qgroup has to be "isolated".

Then that is another point to improve. The requirements (at least in the latest upstream kernel code) are:

- No usage, i.e. 0 excl/rfer. Although I have to improve this: for a large snapshot we mark the qgroup inconsistent, and in that case excl/rfer can be non-zero.
- No child qgroup. It is still allowed for the qgroup itself to be a child of a higher-level qgroup.

> Thanks, that is good. I will then not continue with implementing the
> steps from comment #5 in snapper since the result is a significant
> slowdown.

You don't really need to bother with the wait/cleanup inside snapper. The subvolume drop is really asynchronous background work; it's not worth waiting for each subvolume to be dropped.
Just calling "btrfs qgroup clear-stale" periodically would be more reasonable.

Yes, I have now added a call to "btrfs qgroup clear-stale" in the systemd cleanup service: https://github.com/openSUSE/snapper/pull/899

The upstream patches to be pushed can be found here: https://github.com/btrfs/linux/issues/1239
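Snapper's actual change (PR #899) hooks the call into its existing systemd cleanup service. For a system without snapper, a standalone equivalent could look roughly like the following pair of units; the unit names and schedule are hypothetical, only the `btrfs qgroup clear-stale` invocation comes from this thread:

```ini
# /etc/systemd/system/btrfs-clear-stale-qgroups.service  (hypothetical unit)
[Unit]
Description=Remove stale btrfs qgroups

[Service]
Type=oneshot
ExecStart=/usr/sbin/btrfs qgroup clear-stale /
```

```ini
# /etc/systemd/system/btrfs-clear-stale-qgroups.timer  (hypothetical unit)
[Unit]
Description=Periodically remove stale btrfs qgroups

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

Enabling the timer (`systemctl enable --now btrfs-clear-stale-qgroups.timer`) would then remove stale qgroups once a day instead of per snapshot deletion.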
Created attachment 874286 [details]
strace of snapper delete

I noticed that I have dozens of <stale> qgroups, both on my recently set up system and in some older VMs:

```
localhost:~ # btrfs qgroup show / | grep -c stale
66
fvogt-thinkpad:~ # btrfs qgroup show / | grep -c stale
173
```

At first I managed to reproduce it only with a call to "snapper cleanup number" after creating some empty snapshots to force a cleanup, but then I also managed to do it with "snapper -v --no-dbus delete 104". strace output attached. The interesting part appears to be this -EBUSY:

```
openat(4, "snapshot", O_RDONLY|O_NOFOLLOW|O_NOATIME|O_CLOEXEC) = 5
fstat(5, {st_dev=makedev(0, 0x5d), st_ino=256, st_mode=S_IFDIR|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=142, st_atime=1713181536 /* 2024-04-15T13:45:36.570794904+0200 */, st_atime_nsec=570794904, st_mtime=1701182809 /* 2023-11-28T15:46:49.180000002+0100 */, st_mtime_nsec=180000002, st_ctime=1701182809 /* 2023-11-28T15:46:49.180000002+0100 */, st_ctime_nsec=180000002}) = 0
close(4) = 0
ioctl(5, BTRFS_IOC_INO_LOOKUP, {treeid=0, objectid=BTRFS_FIRST_FREE_OBJECTID} => {treeid=369, name=""}) = 0
close(5) = 0
ioctl(3, BTRFS_IOC_SNAP_DESTROY, {fd=0, name="snapshot"}) = 0
openat(AT_FDCWD, "/", O_RDONLY|O_NOATIME|O_CLOEXEC) = 4
fstat(4, {st_dev=makedev(0, 0x22), st_ino=256, st_mode=S_IFDIR|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=142, st_atime=1713181536 /* 2024-04-15T13:45:36.570794904+0200 */, st_atime_nsec=570794904, st_mtime=1701182809 /* 2023-11-28T15:46:49.180000002+0100 */, st_mtime_nsec=180000002, st_ctime=1701182809 /* 2023-11-28T15:46:49.180000002+0100 */, st_ctime_nsec=180000002}) = 0
flistxattr(4, NULL, 0) = 0
fstat(4, {st_dev=makedev(0, 0x22), st_ino=256, st_mode=S_IFDIR|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=142, st_atime=1713181536 /* 2024-04-15T13:45:36.570794904+0200 */, st_atime_nsec=570794904, st_mtime=1701182809 /* 2023-11-28T15:46:49.180000002+0100 */, st_mtime_nsec=180000002, st_ctime=1701182809 /* 2023-11-28T15:46:49.180000002+0100 */, st_ctime_nsec=180000002}) = 0
openat(4, ".snapshots", O_RDONLY|O_NOFOLLOW|O_NOATIME|O_CLOEXEC) = 5
fstat(5, {st_dev=makedev(0, 0x20), st_ino=256, st_mode=S_IFDIR|0750, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=118, st_atime=1713183328 /* 2024-04-15T14:15:28.889972908+0200 */, st_atime_nsec=889972908, st_mtime=1713183330 /* 2024-04-15T14:15:30.996646362+0200 */, st_mtime_nsec=996646362, st_ctime=1713183330 /* 2024-04-15T14:15:30.996646362+0200 */, st_ctime_nsec=996646362}) = 0
fstat(5, {st_dev=makedev(0, 0x20), st_ino=256, st_mode=S_IFDIR|0750, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=118, st_atime=1713183328 /* 2024-04-15T14:15:28.889972908+0200 */, st_atime_nsec=889972908, st_mtime=1713183330 /* 2024-04-15T14:15:30.996646362+0200 */, st_mtime_nsec=996646362, st_ctime=1713183330 /* 2024-04-15T14:15:30.996646362+0200 */, st_ctime_nsec=996646362}) = 0
close(4) = 0
ioctl(5, BTRFS_IOC_QGROUP_CREATE, {create=0, qgroupid=369}) = -1 EBUSY (Device or resource busy)
futex(0x7f1e18ec3070, FUTEX_WAKE_PRIVATE, 2147483647) = 0
close(5) = 0
close(3) = 0
openat(AT_FDCWD, "/", O_RDONLY|O_NOATIME|O_CLOEXEC) = 3
```
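The `grep -c stale` check above can also be done with a few lines of parsing, e.g. when monitoring whether stale qgroups come back. The sample text is a sketch of the `btrfs qgroup show` output of recent btrfs-progs, which prints `<stale>` in the path column for qgroups whose subvolume is gone; the exact column layout is an assumption:

```python
# Find stale qgroups in `btrfs qgroup show` output, mirroring the
# `grep -c stale` check above. Recent btrfs-progs prints "<stale>" in the
# path column; the sample below is a hand-written approximation of that
# output, not captured from a real system.

def stale_qgroups(show_output: str) -> list[str]:
    stale = []
    for line in show_output.splitlines():
        fields = line.split()
        if fields and fields[-1] == "<stale>":
            stale.append(fields[0])  # qgroupid, e.g. "0/369"
    return stale

sample = """\
qgroupid         rfer         excl path
--------         ----         ---- ----
0/5           16.00KiB     16.00KiB <toplevel>
0/368        212.00KiB    212.00KiB .snapshots/103/snapshot
0/369            0.00B        0.00B <stale>
0/370            0.00B        0.00B <stale>
"""
print(stale_qgroups(sample))  # -> ['0/369', '0/370']
```

Each reported qgroupid could then be fed to `btrfs qgroup destroy`, though `btrfs qgroup clear-stale` already does exactly that in one step.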