Bugzilla – Bug 1217041
BTRFS: performance is too slow when running LTP fs test
Last modified: 2023-12-20 00:33:06 UTC
Created attachment 870707 [details] ext4 vs btrfs LTP log

## Observation

openQA test in scenario sle-15-SP6-Online-ppc64le-ltp_fs@ppc64le-virtio fails in [fs_fill](https://openqa.suse.de/tests/12789339/modules/fs_fill/steps/7)

The LTP log fails with the following info:

tst_fill_fs.c:36: TINFO: Creating file mntpoint/subdir/thread2/file1 size 4807935
tst_fill_fs.c:36: TINFO: Creating file mntpoint/subdir/thread2/file2 size 90739786
tst_fill_fs.c:36: TINFO: Creating file mntpoint/subdir/thread3/file1 size 76896492
Test timeouted, sending SIGKILL!
tst_test.c:1629: TINFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
tst_test.c:1630: TBROK: Test killed! (timeout?)

A manual check found that the same LTP test case takes 3 s on ext4 but 3 min on btrfs. See the attached log and also the strace log.
Created attachment 870708 [details] strace on btrfs ltp test
Further investigation:

1) Checking the LTP test code: the test case creates multiple threads that write content to the filesystem until they hit ENOSPC.
https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/fs/fs_fill/fs_fill.c

2) Running strace with -T shows the test apparently waits on a futex lock for a very long time:

strace -t -f -o strace_btrfs_with_T_f -T ./fs_fill
awk -F'[<> ?]' '{if ($(NF-1) > 1) print}' strace_btrfs_with_T_f

27724 19:45:54 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=27725, si_uid=0, si_status=1, si_utime=0, si_stime=0} ---
27722 19:45:54 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=27724, si_uid=0, si_status=1, si_utime=0, si_stime=0} ---
27728 19:45:54 +++ exited with 127 +++
27727 19:45:54 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=27728, si_uid=0, si_status=127, si_utime=0, si_stime=0} ---
27727 19:45:54 +++ exited with 127 +++
27722 19:45:54 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=27727, si_uid=0, si_status=127, si_utime=0, si_stime=0} ---
27722 19:45:54 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=27729, si_uid=0, si_status=0, si_utime=1, si_stime=0} ---
27722 19:45:54 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=27752, si_uid=0} ---
27722 19:45:54 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=27752, si_uid=0} ---
27722 19:45:55 <... wait4 resumed>0x7fffe8926ce0, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) <1.622117>
27722 19:45:55 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=27752, si_uid=0} ---
27763 19:50:10 <... unlink resumed>) = 0 <2.707933>
27752 19:51:53 <... futex resumed>) = 0 <356.857147>   <<<<<<<<<<<
27752 19:51:55 <... futex resumed>) = 0 <1.269408>
27722 19:51:55 <... wait4 resumed>0x7fffe8926ce0, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) <359.480639>   <<<<<<
27722 19:51:55 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=27752, si_uid=0} ---
27722 19:51:55 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=27752, si_uid=0, si_status=0, si_utime=0, si_stime=17} ---
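The awk filter keys off the `<seconds>` timing that strace appends to each call when run with -T. A minimal self-contained demo of the same filter (the sample lines below are illustrative, not taken from the real trace):

```shell
#!/bin/sh
# With -F'[<> ?]' the duration that strace -T appends as <seconds>
# becomes the next-to-last field (the trailing '>' yields a final
# empty field), so $(NF-1) is the syscall duration and the filter
# keeps only calls that took longer than 1 second.
printf '%s\n' \
  '27752 19:51:53 <... futex resumed>) = 0 <356.857147>' \
  '1635 04:46:11 write(3, "...", 4096) = -1 ENOSPC <0.000040>' |
awk -F'[<> ?]' '{if ($(NF-1) > 1) print}'
# Only the futex line (356.857147 s) survives the filter.
```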
Created attachment 870710 [details] strace on btrfs with time spent in system calls
Created attachment 870711 [details] strace on ext4 with time spent in system calls
Uploaded strace on both btrfs and ext4 with time spent in system calls.

The issue does not exist if you run only ONE thread; I suspect some internal lock. The memory/IO/CPU load is normal during the btrfs test.

sar -h -r -u -b -q -d 5

05:59:45   CPU    %user  %nice  %system  %iowait  %steal  %idle
05:59:50   all     0.0%   0.0%     0.1%     2.1%    0.0%  97.8%

05:59:45   tps     rtps   wtps    bread/s  bwrtn/s
05:59:50   615.60  0.20   615.40  25.60    111107.20

05:59:45   kbmemfree  kbavail  kbmemused  %memused  kbbuffers  kbcached  kbcommit  %commit  kbactive  kbinact  kbdirty
05:59:50   21.9M      307.0M   92.6M      18.8%     192.0k     274.4M    260.6M    10.3%    170.2M    136.6M   2.4M

05:59:45   runq-sz  plist-sz  ldavg-1  ldavg-5  ldavg-15  blocked
05:59:50   0        168       3.39     1.03     0.71      0

05:59:45   tps     rkB/s  wkB/s  areq-sz  aqu-sz  await  svctm  %util  DEV
05:59:50   615.60  12.8k  54.3M  90.3k    0.11    0.17   0.42   25.8%  vda
05:59:50   0.00    0.0k   0.0k   0.0k     0.00    0.00   0.00   0.0%   sr0
05:59:50   465.60  0.0k   25.1M  55.1k    0.13    0.18   1.04   48.4%  loop0

strace -t -f -o strace_btrfs -T ./fs_fill

In the attached btrfs strace with time spent, the time spent in the write syscall is large compared with ext4. For example (PIDs 1924, 1925, 1926, 1927 are write threads in the strace file), a write takes 0.049914 s:

1927 04:49:02 <... write resumed>) = 4096 <0.049914>
1926 04:49:02 <... write resumed>) = 4096 <0.049910>
1924 04:49:02 <... write resumed>) = 4096 <0.049904>

In the strace on ext4, a write takes 0.000040 s:

1635 04:46:11 write(3, "\245\370\336\202\212\224\215\240\2gyS\371w4i\215\333\346\374\242\312\25\320m\f\302S\210\357,\331"..., 4096) = -1 ENOSPC (No space left on device) <0.000040>
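Rather than comparing individual calls by eye, the per-call `<seconds>` timings from strace -T can be aggregated per trace file. A small sketch (the helper name `sum_write_time` and the sample input lines are illustrative):

```shell
#!/bin/sh
# Sum the strace -T durations of write() calls, to compare the total
# time spent in write between the btrfs and ext4 trace files.
# Usage sketch: sum_write_time < strace_btrfs ; sum_write_time < strace_ext4
sum_write_time() {
  awk -F'[<>]' '/write/ { total += $(NF-1); n++ }
                END { printf "%d write calls, %.6f s total\n", n, total }'
}

# Illustrative sample lines (not from the real trace):
printf '%s\n' \
  '1927 04:49:02 <... write resumed>) = 4096 <0.049914>' \
  '1926 04:49:02 <... write resumed>) = 4096 <0.049910>' |
sum_write_time
# -> 2 write calls, 0.099824 s total
```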
I don't see it in test case file, what is the partition size? On PPC64 the page size is 64K and this causes trouble on filesystems of ~1G size, the threads could compete for a very small number of pages/blocks to actually write the data. There could still be a problem somewhere but the test scenario should be properly scaled.
(In reply to David Sterba from comment #6)
> I don't see it in test case file, what is the partition size? On PPC64 the
> page size is 64K and this causes trouble on filesystems of ~1G size, the
> threads could compete for a very small number of pages/blocks to actually
> write the data. There could still be a problem somewhere but the test
> scenario should be properly scaled.

Test env:

getconf PAGE_SIZE
65536

mount
/dev/loop0 on /tmp/LTP_fs_Fr6GS9/mntpoint type btrfs (rw,relatime,ssd,discard=async,space_cache=v2,subvolid=5,subvol=/)

df -h /dev/loop0
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      300M   29M  214M  12% /tmp/LTP_fs_Fr6GS9/mntpoint

When Use% increases to 14%, the write speed starts to become very slow; a few minutes later Use% can reach 98%.

LTP has two write patterns. One writes a fixed block size using the writev syscall (4096*512 bytes per writev call); this pattern shows no issue. The other writes random sizes; this pattern triggers the issue above.
(In reply to WEI GAO from comment #7)
> Test env:
> getconf PAGE_SIZE
> 65536
>
> df -h /dev/loop0
> Filesystem Size Used Avail Use% Mounted on
> /dev/loop0 300M 29M 214M 12% /tmp/LTP_fs_Fr6GS9/mntpoint
>
> When Use% increase to 14%, the write speed start become very slow, after few
> mins later the Use% can reach 98%.

When I increase the device size to 1G, the case passes.
Unlike ext4, btrfs sets the sectorsize (FS block size) to PAGE_SIZE (64K in this case). To match ext4, can we retry the test case with a 4K sectorsize (btrfs supports sub-page sectorsizes)?

e.g. mkfs.btrfs -s 4k
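This can also be tried outside LTP. A sketch, assuming btrfs-progs is installed (image path is illustrative; mkfs on a regular file needs no root, mounting it does):

```shell
#!/bin/sh
# Create a small btrfs image with a 4K sectorsize instead of the
# PAGE_SIZE default, then verify the sectorsize in the superblock.
truncate -s 300M /tmp/btrfs.img
mkfs.btrfs -f -s 4k /tmp/btrfs.img
btrfs inspect-internal dump-super /tmp/btrfs.img | grep sectorsize

# Mounting the image (root required):
#   mount -o loop /tmp/btrfs.img /mnt
```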
(In reply to Santosh Pradhan from comment #9)
> Unlike ext4, btrfs sets the sectorsize (FS block size) to PAGE_SIZE (which
> is 64k in this case). To match with ext4, can we retry the test case with
> sectorsize 4k (support of sub-page sectorsize).
>
> e.g. mkfs.btrfs -s 4k

With the LTP case set to fs opts "-s 4k", the test case passes.

static struct tst_test test = {
	.max_runtime = 60,
	.dev_min_size = 300,
	.dev_fs_opts = (const char *const[]){"-s 4k", NULL},  <<<<<<<<<
	.needs_root = 1,
	.mount_device = 1,
	.mntpoint = MNTPOINT,
	.all_filesystems = 1,
	.setup = setup,
	.cleanup = cleanup,
	.test = testrun,
	.skip_filesystems = (const char *[]) {
		"tmpfs",
		"ext2",
		"ext3",
		"ext4",
		"xfs",
		"vfat",
		"exfat",
		// "btrfs",
		NULL
	},
	.tcnt = 2
};
Did the test case pass with modified option (i.e. -s 4k)?
(In reply to Santosh Pradhan from comment #11)
> Did the test case pass with modified option (i.e. -s 4k)?

Yes, the case passes with the modified option. See https://bugzilla.suse.com/show_bug.cgi?id=1217041#c9
(In reply to WEI GAO from comment #12)
> (In reply to Santosh Pradhan from comment #11)
> > Did the test case pass with modified option (i.e. -s 4k)?
>
> Yes case is pass with modified option. See
> https://bugzilla.suse.com/show_bug.cgi?id=1217041#c9

Sorry, the link in the above comment is wrong; the feedback is in comment #10.
Thanks much David and Qu. Thanks much Wei for all your effort.

With a 64KB PAGE_SIZE (on PPC), btrfs needs the filesystem (partition) size to be bigger than 1GB, since btrfs by default sets the sectorsize to PAGE_SIZE. The test case passes with a 1G or bigger filesystem.

But to compare with ext4 (or similar filesystems that use a 4K FS block size/sectorsize), the btrfs sectorsize can be tuned to 4K, i.e. smaller than PAGE_SIZE (a sub-page sectorsize, e.g. mkfs.btrfs -s 4k). In fact this makes the test case succeed.

As there is not much left to investigate, I would suggest closing the ticket if-and-only-if David, Wei and Goldwyn agree.
Santosh Pradhan, with this said, you can close the ticket. And we hope the tests will be readjusted. Any of the participants can reopen it.

Santosh, Hector: do we need to document this in the release notes or docs? If so, please create a task or bug for the doc-team.
https://progress.opensuse.org/issues/151759 for tracking LTP case update
(In reply to Radoslav Tzvetkov from comment #15)
> Santosh Pradhan, with this said, you can close the ticket. ANd we hope the
> tests to be readjusted. Any of the participants could reopen it.
>
> Santosh, Hector do we need to document this in the release notes or docs? If
> so, please create a task or bug for the doc-team.

No idea what happened: the result shows a pass for 44.1. Is there already an update from the kernel side?
https://openqa.suse.de/tests/13003538#step/fs_fill/7
Though the decision was to modify the LTP test case, and Wei Gao said he would address it, I have not done anything there myself. Please check if the test case is modified in git; I will also take a look and update.
I am also going to check whether the btrfs code has been modified to address the issue for ppc.
(In reply to Santosh Pradhan from comment #18)
> Though the decision was to modify the LTP test case, Wei Gao said he will
> address it. I have not done anything in that. Please check if the test case
> is modified in git, I will also take a look and update.

My patch is still in the review backlog.
https://patchwork.ozlabs.org/project/ltp/patch/20231218131142.12464-1-wegao@suse.com/
(In reply to Santosh Pradhan from comment #19)
> I am also going to check if btrfs code is modified to address the issue for
> ppc.

I also scheduled reruns on 44.1; both reruns failed, so I cannot explain why 44.1 shows a pass result.
https://openqa.suse.de/tests/13119838#next_previous

Also the latest build 45.1 shows a failure again, so let's keep an eye on this.
https://openqa.suse.de/tests/13085149#