Bugzilla – Bug 1222346
snapperd can't work
Last modified: 2024-05-22 07:06:20 UTC
This morning I noticed that my home PC is loading 5 out of 20 cores without running application software. In htop I found that the only consumer of CPU time is snapperd. The same lines appear in the /var/log/snapper.log file several times a second: > 2024-04-05 08:57:46 WAR libsnapper(1375) BtrfsUtils.cc(quota_rescan):474 - waiting for old quota rescan to finish > 2024-04-05 08:57:46 WAR libsnapper(1375) BtrfsUtils.cc(quota_rescan):474 - waiting for old quota rescan to finish > 2024-04-05 08:57:46 WAR libsnapper(1375) BtrfsUtils.cc(quota_rescan):474 - waiting for old quota rescan to finish I tried: > [werwolf@home] ~ > ❯ sudo btrfs quota disable / && sudo systemctl restart snapperd the CPU load ended, but after a couple of hours I again found exactly the same behavior with the same messages in the log.
additionally: the sudo snapper list command has been running for about 20 minutes and is not showing any results.
unfortunately, after 2 days the problem went away and no longer appears, I still could not understand what this problem was connected with. I also don’t know what to do with this bug report now.
The problem didn't happen again, I'm closing it
the problem repeated itself. in a few minutes /var/log/snapper.log began to take up 70 gigabytes and occupied the entire system partition.
I have already added a sleep in snapper to avoid the excessive logging (see https://github.com/openSUSE/snapper/issues/892 - not included in Tumbleweed yet). Anyway, btrfs seems to have a problem: snapper does a BTRFS_IOC_QUOTA_RESCAN and if that reports EINPROGRESS it waits using BTRFS_IOC_QUOTA_RESCAN_WAIT. But AFAIS the RESCAN_WAIT returns immediately. Also, the rescan itself takes apparently more that 20 minutes. snapper is waiting for the rescan to finish. Whether such a long time to rescan is a bug in btrfs I cannot say. Also see https://bugzilla.suse.com/show_bug.cgi?id=1206814#c11.
I'm wondering if snapper is not handling rescan correctly. Yes, qgroup is very heavy, but that's so far the only way to get accurate accounting numbers, considering how many tricks we put to get super fast snapshotting. All the super fast snapshotting has put a lot of complexity to qgroups. Not to mention the extra complexity from multi-level qgroups, utilized by snapper. But for the RESCAN_WAIT returns immeidately problem, I think it's something from the snapper side. Snapper should check if qgroup is even enabled and rescan is running. Or RESCAN_WAIT ioctl immediately return 0 if qgroup rescan is not running nor qgroup is enabled. Any clue how snapper triggers qgroup rescan? Did it check if qgroup is enabled or handles qgroup rescan return value correctly?
snapper does a BTRFS_IOC_QUOTA_RESCAN and if that fails and errno is EINPROGRESS it does a BTRFS_IOC_QUOTA_RESCAN_WAIT. And only if that succeeds it calls BTRFS_IOC_QUOTA_RESCAN again. For details: https://github.com/openSUSE/snapper/blob/master/snapper/BtrfsUtils.cc#L464