Bugzilla – Bug 1222313
btrfs storage discrepancy between used and free space of about 1/3 of the total disk capacity
Last modified: 2024-04-09 09:01:06 UTC
I have a btrfs filesystem of about 1 TiB on my laptop where I'm missing about 1/3 of the disk's capacity. According to the output of `du`, my data should occupy about 540 GiB of storage. This includes snapper snapshots while ignoring the possible savings from shared extents, so it is a generous upper limit. However, `df` reports that 778 GiB are currently in use, leaving a discrepancy of about 240 GiB, almost 1/4 of the total SSD capacity, that I am unable to account for. If I take the used space reported by snapper instead of the `du` output of /.snapshots, the discrepancy grows to 314 GiB, about 1/3 of the total SSD capacity.

I wrote to the research@suse.de mailing list on Tuesday (https://mailman.suse.de/mlarch/SuSE/research/2024/research.2024.04/msg00000.html) and we were unable to find where the missing storage went. Two other users confirmed my report by describing similar issues, one on-list and one in a private conversation. In both cases they see increased disk usage with no obvious consumer present.

To exclude the known btrfs metadata bug from kernel 6.7, I ran a full balance this week and also checked the output of `btrfs fi df`, which shows that metadata occupies only 4 GiB. The balance didn't change the overall picture.

I will attach all collected logs and information as requested in the mail thread below.

## System description

Running Tumbleweed 20240402 with kernel 6.8.1-1-default. I'm using the full-disk-encryption layout as suggested by the installer in November 2023, i.e. a single btrfs volume atop a LUKS-encrypted LVM volume. I run scrubs manually, and otherwise left the btrfs maintenance scripts in their default configuration (balance/trim enabled).
Created attachment 874057 [details] btrfs fi df
Created attachment 874058 [details] btrfs fi du
Created attachment 874059 [details] btrfs subvolume list
Created attachment 874060 [details] btrfs fi usage
Created attachment 874061 [details] df -h
Created attachment 874062 [details] du -hs
Created attachment 874063 [details] ls -al
Created attachment 874064 [details] lsblk
Created attachment 874065 [details] snapper ls
du: 459GiB + 80GiB (snapshots)

> 80G	/.snapshots
> 0	/dev
> 142G	/home
> 224K	/opt
> 0	/proc
> 23M	/root
> 2.6M	/run
> 304G	/srv
> 0	/sys
> 4.0K	/tmp
> 13G	/var

df: 778G

> /dev/mapper/system-root  932G  778G  148G  85%  /.snapshots
> /dev/mapper/system-root  932G  778G  148G  85%  /boot/grub2/i386-pc
> /dev/mapper/system-root  932G  778G  148G  85%  /boot/grub2/x86_64-efi
> /dev/mapper/system-root  932G  778G  148G  85%  /home
> /dev/mapper/system-root  932G  778G  148G  85%  /opt
> /dev/mapper/system-root  932G  778G  148G  85%  /usr/local
> /dev/mapper/system-root  932G  778G  148G  85%  /srv
> /dev/mapper/system-root  932G  778G  148G  85%  /root
> /dev/mapper/system-root  932G  778G  148G  85%  /var

btrfs fi df: 769GiB

> Data, single: total=794.00GiB, used=769.06GiB
> System, DUP: total=32.00MiB, used=128.00KiB
> Metadata, DUP: total=7.00GiB, used=4.03GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B

snapper:

> #   | Type   | Pre # | Date                     | User | Used Space | Cleanup | Description                      | Userdata
> ----+--------+-------+--------------------------+------+------------+---------+----------------------------------+-------------
> 0   | single |       |                          | root |            |         | current                          |
> 1*  | single |       | Tue Nov  7 14:02:39 2023 | root | 1.22 MiB   |         | first root filesystem            |
> 3   | single |       | Tue Nov  7 14:23:38 2023 | root | 5.38 GiB   |         | Fresh                            |
> 298 | single |       | Tue Apr  2 08:52:47 2024 | root | 286.58 MiB |         | TW 20240329 - after libzma vuln  |
> 301 | pre    |       | Thu Apr  4 08:03:38 2024 | root | 85.66 MiB  | number  | zypp(zypper)                     | important=no
> 302 | post   | 301   | Thu Apr  4 08:04:52 2024 | root | 19.06 MiB  | number  |                                  | important=no
> 303 | single |       | Thu Apr  4 13:37:29 2024 | root | 944.00 KiB |         | TW 20240402 - after liblzma vuln |

However I count it, a considerable portion of the disk capacity is always being eaten away by something.
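A quick back-of-the-envelope check of the numbers above (all values in GiB, rounded as reported, which explains small differences):

```shell
du_total=$(( 459 + 80 ))   # du of / plus /.snapshots, from the du output above
df_used=778                # "Used" column reported by df

echo "unaccounted: $(( df_used - du_total )) GiB"   # roughly the 240 GiB gap
```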
If you have some random-IO workload, it's very possible that btrfs bookend extents are causing the problem.

Furthermore, truncated files (which were previously very large or preallocated) can also take tons of unexpected space. Another point: preallocation (falloc) is very btrfs-unfriendly. If you have something like VM images, you'd be much better off disabling snapshots for that subvolume and setting the NOCOW flag on those files.

But since you have snapshots, the normal way to solve the problem (defrag) is not suitable, as it would break the shared extents and cause extra space usage. It's recommended to delete all unnecessary snapshots and then try "btrfs fi defrag" to see if it helps (a sync is needed after a full defrag). Although there is a limitation on defrag: truncated files may not be defragged that well.

If regular defrag (after deleting all snapshots) is not helping, you may want to try the following patches:
- kernel part to enhance defrag: https://lore.kernel.org/linux-btrfs/cover.1710213625.git.wqu@suse.com/
- btrfs-progs support: https://lore.kernel.org/linux-btrfs/cover.1710214834.git.wqu@suse.com/
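The suggested recovery path could look roughly like this (a sketch only: snapshot IDs and the mount point are placeholders, deleting snapshots is destructive, and the commands need root on a btrfs filesystem):

```shell
snapper list                        # review which snapshots are expendable
snapper delete 301-302              # example range, not a recommendation
btrfs filesystem defragment -r /    # rewrite extents, releasing bookend waste
sync                                # needed after a full defrag (see above)
btrfs filesystem df /               # re-check the space accounting
```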
(In reply to Wenruo Qu from comment #11) > If you have some random IO workload, it's very possible that btrfs bookend > extents are causing the problem. > > Furthermore, if you have truncated files (which is previously very large or > preallocated), it can also take tons of unexpected space. > Another point is, preallocation (falloc) is very btrfs unfriendly, if you > have something like VM images, you'd be much better disable the snapshot of > that subvolume, and set NOCOW flag for them. I believe this could be the reason. I store my VM disk images on /srv, which by default does not have the NOCOW flag set. I'll try to delete and restore the VM disk images in question after setting the flag, but this is going to take some time, and I'll report back then.
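For reference, the recreate-with-NOCOW procedure I have in mind looks roughly like this (paths and file names are hypothetical; note that the +C attribute only takes effect on files created after it is set, so existing images must be copied back rather than just flagged in place):

```shell
chattr +C /srv/vm-images                       # new files here inherit NOCOW
mv /srv/vm-images/disk.qcow2 /mnt/external/    # move image to external medium
cp --reflink=never /mnt/external/disk.qcow2 /srv/vm-images/
lsattr /srv/vm-images/disk.qcow2               # 'C' attribute should be listed
```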
(In reply to Wenruo Qu from comment #11) > If you have some random IO workload, it's very possible that btrfs bookend > extents are causing the problem. Is there a way to check the bookend extents of a certain file? I have a bunch of VM images that could be the culprit, and I'd like to check this hypothesis.
(In reply to Felix Niederwanger from comment #13) > (In reply to Wenruo Qu from comment #11) > > If you have some random IO workload, it's very possible that btrfs bookend > > extents are causing the problem. > > Is there a way to check the booked extents of a certain file? I have a bunch > of VM images that could be the issue, but I'd like to check this hypothesis. Pretty hard, we do not have any good way to check that. There are tools like compsize, which uses the TREE_SEARCH ioctl to inspect each file extent, but compsize is not designed to account for the wasted bookend bytes, so it doesn't help much here. A more convenient way is to defrag that subvolume (as long as the subvolume is not snapshotted). We're moving towards enhancing the fiemap ioctl to export more info, but that may take years. Meanwhile we may want to develop a tool to do the bookend accounting soon, since this is not the first time an end user has complained about it.
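Lacking a dedicated tool, one rough way to at least inspect a file's extent layout is to sum the extent lengths reported by `filefrag -v`. This is a hypothetical helper, and it does not measure bookend waste directly (per the comment above, no current tool does); it only shows how many blocks the allocated extents cover:

```shell
# Sum the "length" column (in filesystem blocks) of `filefrag -v` output.
# Extent rows start with a numeric index followed by a colon, e.g. "0:".
sum_extent_blocks() {
    awk '$1 ~ /^[0-9]+:$/ { sub(":", "", $6); total += $6 }
         END { print total + 0 }'
}

# Example usage (file name is an assumption):
#   filefrag -v /srv/vm-images/disk.qcow2 | sum_extent_blocks
```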
Moving the disk images to an external medium and back made a HUGE difference; I got almost 200 GiB back:

> /dev/mapper/system-root  932G  586G  338G  64%  /.snapshots
> /dev/mapper/system-root  932G  586G  338G  64%  /boot/grub2/i386-pc
> /dev/mapper/system-root  932G  586G  338G  64%  /boot/grub2/x86_64-efi
> /dev/mapper/system-root  932G  586G  338G  64%  /home
> /dev/mapper/system-root  932G  586G  338G  64%  /opt
> /dev/mapper/system-root  932G  586G  338G  64%  /root
> /dev/mapper/system-root  932G  586G  338G  64%  /srv
> /dev/mapper/system-root  932G  586G  338G  64%  /usr/local
> /dev/mapper/system-root  932G  586G  338G  64%  /var

I have now also disabled COW for the directory in question via `chattr +C -R /srv` and hope this prevents the problem from coming back.

The "storage hole" was my VM disk images, which I keep on this laptop for testing. All VMs are updated once per day during lunchtime, and I keep them until the products reach EOL. We apply the +C attribute to /var by default, and that is the case here as well. I wonder whether it would make sense to apply the same default to /srv, which likely holds the same kind of data as /var.
I recall that btrfs is generally not recommended for storing VM images. I'm using XFS for this kind of thing.