Bugzilla – Bug 1218178
LTP: memcontrol03 failed by over reserve memory in wrong cgroup
Last modified: 2024-06-17 17:56:08 UTC
## Observation

The openQA test in scenario sle-15-SP6-Online-x86_64-ltp_controllers@64bit fails in [memcontrol03](https://openqa.suse.de/tests/13086047/modules/memcontrol03/steps/6):

```
tst_test.c:1650: TINFO: === Testing on ext2 ===
tst_test.c:1106: TINFO: Formatting /dev/loop0 with ext2 opts='' extra opts=''
mke2fs 1.47.0 (5-Feb-2023)
tst_test.c:1120: TINFO: Mounting /dev/loop0 to /tmp/LTP_memk3Sdv4/mntdir fstyp=ext2 flags=0
memcontrol03.c:145: TINFO: Child 1864 in leaf_C: Allocating pagecache: 52428800
memcontrol03.c:145: TINFO: Child 1865 in leaf_D: Allocating pagecache: 52428800
memcontrol03.c:145: TINFO: Child 1866 in leaf_F: Allocating pagecache: 52428800
memcontrol03.c:108: TINFO: Child 1868 in trunk_G: Allocating anon: 155189248
memcontrol03.c:129: TFAIL: Expected child 1868 to exit(0), but instead killed by SIGKILL <<<<<<
memcontrol03.c:208: TFAIL: Expect: (A/B memory.current=64839680) ~= 52428800
memcontrol03.c:214: TPASS: Expect: (A/B/C memory.current=30720000) ~= 34603008
memcontrol03.c:216: TPASS: Expect: (A/B/D memory.current=21671936) ~= 17825792
memcontrol03.c:218: TPASS: Expect: (A/B/E memory.current=0) ~= 0
memcontrol03.c:108: TINFO: Child 1870 in trunk_G: Allocating anon: 178257920
memcontrol03.c:116: TPASS: Child 1870 killed by OOM
memcontrol03.c:224: TPASS: Expect: (A/B memory.current=52555776) ~= 52428800
```

## Reproduce rate

A manual run of memcontrol03 cannot reproduce the failure, but in the openQA environment the failure rate is ~60% (7 failures in 11 runs). For details see the following results: https://openqa.suse.de/tests/13103336#next_previous
## Compare with other products

SLE Micro and ALP results are good, for example:

- SLE Micro: https://openqa.suse.de/tests/13062073#step/memcontrol03/8
- ALP: https://openqa.suse.de/tests/12962380#step/memcontrol03/8

## Case description

1) The test creates the following hierarchy and allocates 50M in C, D and F using a child process within each group (the allocations are page cache):

```
A       memory.min = 50M, memory.max = 200M
A/B     memory.min = 50M
A/B/C   memory.min = 75M,  memory.load = 50M
A/B/D   memory.min = 25M,  memory.load = 50M
A/B/E   memory.min = 500M, memory.load = 0
A/B/F   memory.min = 0,    memory.load = 50M
```

2) It creates A/G and a process within A/G that allocates 148M, then expects the following memory status:

```
A/B   memory.current ~= 50M
A/B/C memory.current ~= 33M
A/B/D memory.current ~= 17M
A/B/E memory.current ~= 0
```

3) The process in A/G is expected to exit normally.

4) It creates another process within A/G that allocates 170M, creating significant memory pressure; the test expects this process to be killed by the system.

The issue happens at step 3: the test expects the process to exit normally, but it is killed instead.
```
tst_test.c:1650: TINFO: === Testing on btrfs ===
tst_test.c:1106: TINFO: Formatting /dev/loop0 with btrfs opts='' extra opts=''
tst_test.c:1120: TINFO: Mounting /dev/loop0 to /tmp/LTP_memrYGmkf/mntdir fstyp=btrfs flags=0
memcontrol03.c:145: TINFO: Child 1928 in leaf_C: Allocating pagecache: 52428800
memcontrol03.c:145: TINFO: Child 1929 in leaf_D: Allocating pagecache: 52428800
memcontrol03.c:145: TINFO: Child 1930 in leaf_F: Allocating pagecache: 52428800
memcontrol03.c:108: TINFO: Child 1931 in trunk_G: Allocating anon: 155189248
memcontrol03.c:129: TFAIL: Expected child 1931 to exit(0), but instead killed by SIGKILL <<<<<
memcontrol03.c:208: TFAIL: Expect: (A/B memory.current=63684608) ~= 52428800 <<<<<<<
memcontrol03.c:214: TPASS: Expect: (A/B/C memory.current=30224384) ~= 34603008
memcontrol03.c:216: TPASS: Expect: (A/B/D memory.current=22204416) ~= 17825792
memcontrol03.c:218: TPASS: Expect: (A/B/E memory.current=0) ~= 0
memcontrol03.c:108: TINFO: Child 1933 in trunk_G: Allocating anon: 178257920
memcontrol03.c:116: TPASS: Child 1933 killed by OOM
memcontrol03.c:224: TPASS: Expect: (A/B memory.current=52477952) ~= 52428800
```
After adding more trace I found that A/B/F still contains 9M of memory (even though it is set with memory.min = 0). This makes A/B hold 62M in total, and after adding the extra ~150M from A/B/G, the total already exceeds 200M (memory.max of A), so the process of A/B/G gets killed. The question is why A/B/F still holds 9M of memory; I suppose it should be less than or close to zero.

https://openqa.suse.de/tests/13109223#step/memcontrol03/6

```
A/B   memory.current=62038016
A/B/C memory.current=30031872
A/B/D memory.current=22065152
A/B/E memory.current=0
A/B/F memory.current=9900032 <<<<<<
```

```
tst_test.c:1650: TINFO: === Testing on btrfs ===
tst_test.c:1106: TINFO: Formatting /dev/loop0 with btrfs opts='' extra opts=''
tst_test.c:1120: TINFO: Mounting /dev/loop0 to /tmp/LTP_memrGrSJS/mntdir fstyp=btrfs flags=0
memcontrol03.c:145: TINFO: Child 28862 in leaf_C: Allocating pagecache: 52428800
memcontrol03.c:145: TINFO: Child 28863 in leaf_D: Allocating pagecache: 52428800
memcontrol03.c:145: TINFO: Child 28864 in leaf_F: Allocating pagecache: 52428800
memcontrol03.c:108: TINFO: Child 28865 in trunk_G: Allocating anon: 155189248
memcontrol03.c:129: TFAIL: Expected child 28865 to exit(0), but instead killed by SIGKILL
memcontrol03.c:208: TFAIL: Expect: (A/B memory.current=62038016) ~= 52428800 <<<<<
memcontrol03.c:214: TPASS: Expect: (A/B/C memory.current=30031872) ~= 34603008
memcontrol03.c:216: TPASS: Expect: (A/B/D memory.current=22065152) ~= 17825792
memcontrol03.c:218: TPASS: Expect: (A/B/E memory.current=0) ~= 0
memcontrol03.c:220: TFAIL: Expect: (A/B/F memory.current=9900032) ~= 0 <<<<<<
memcontrol03.c:108: TINFO: Child 28866 in trunk_G: Allocating anon: 178257920
memcontrol03.c:116: TPASS: Child 28866 killed by OOM
memcontrol03.c:228: TPASS: Expect: !(A/B/C memory.current=30031872) ~= 34603008
memcontrol03.c:230: TPASS: Expect: !(A/B/D memory.current=22065152) ~= 17825792
memcontrol03.c:232: TPASS: Expect: !(A/B/E memory.current=0) ~= 0
memcontrol03.c:234: TFAIL: Expect: !(A/B/F memory.current=274432) ~= 0
memcontrol03.c:240: TPASS: Expect: (A/B memory.current=52412416) ~= 52428800
```

LTP code change adding extra trace:

```
diff --git a/testcases/kernel/controllers/memcg/memcontrol03.c b/testcases/kernel/controllers/memcg/memcontrol03.c
index 9c6c808e0..03bdb145c 100644
--- a/testcases/kernel/controllers/memcg/memcontrol03.c
+++ b/testcases/kernel/controllers/memcg/memcontrol03.c
@@ -216,10 +216,26 @@ static void test_memcg_min(void)
 		"(A/B/D memory.current=%ld) ~= %d", c[1], MB(17));
 	TST_EXP_EXPR(values_close(c[2], 0, 1),
 		"(A/B/E memory.current=%ld) ~= 0", c[2]);
+	TST_EXP_EXPR(values_close(c[3], 0, 1),
+		"(A/B/F memory.current=%ld) ~= 0", c[3]);
 
 	alloc_anon_in_child(trunk_cg[G], MB(170), 1);
 
+	for (i = 0; i < ARRAY_SIZE(leaf_cg); i++)
+		SAFE_CG_SCANF(leaf_cg[i], "memory.current", "%ld", c + i);
+
+	TST_EXP_EXPR(values_close(c[0], MB(33), 20),
+		"!(A/B/C memory.current=%ld) ~= %d", c[0], MB(33));
+	TST_EXP_EXPR(values_close(c[1], MB(17), 20),
+		"!(A/B/D memory.current=%ld) ~= %d", c[1], MB(17));
+	TST_EXP_EXPR(values_close(c[2], 0, 1),
+		"!(A/B/E memory.current=%ld) ~= 0", c[2]);
+	TST_EXP_EXPR(values_close(c[3], 0, 1),
+		"!(A/B/F memory.current=%ld) ~= 0", c[3]);
+
 	SAFE_CG_SCANF(trunk_cg[B], "memory.current", "%ld", c);
+
 	TST_EXP_EXPR(values_close(c[0], MB(50), 5),
 		"(A/B memory.current=%ld) ~= %d", c[0], MB(50));
```
Created attachment 871434 [details] memcontrol03-ltp-log-with-more-trace
(In reply to WEI GAO from comment #1)
> After add more trace i found A/B/F still contain 9M memory(which set with
> memory.min = 0), this lead A/B total have 62M memory, after add extra ~150M
> memory from A/B/G already > 200M(memory.max of A) then process of A/B/G get
> killed.
> [full trace log and diff already shown in comment 1 omitted]

Sorry for the typo in the above comment: A/B/G ==> A/G
memcontrol04 also fails, and there is a bug logged for that: https://bugzilla.suse.com/show_bug.cgi?id=1196298

My assumption is that these two failing cases point to the same issue:
- memcontrol03 fails because A/B/F reserves too much memory; since the memory check has some tolerance, it does not reproduce 100% of the time.
- memcontrol04 fails because A/B/F receives low-memory events which should be 0, so it reproduces 100% in the openQA test results.
Hello Takashi, could you please look at this bug?
Michal, it might be in your area. Could you check it?
(In reply to WEI GAO from comment #1)
> Question is why A/B/F still has 9M memory, i suppose it should be less or
> close to zero.

I observe that

> memcontrol03.c:208: TFAIL: Expect: (A/B memory.current=64839680) ~= 52428800
> memcontrol03.c:214: TPASS: Expect: (A/B/C memory.current=30720000) ~= 34603008
> memcontrol03.c:216: TPASS: Expect: (A/B/D memory.current=21671936) ~= 17825792

B.memory.min - (C.memory.current + D.memory.current) = 9*4096

This small difference can cause protection to spill over to A/B/F (a similar mechanism to bug 1196298). Interestingly, reclaim also gave up too soon (B.memory.current=64839680 > 52428800), so it seems the "overprotection" may be reflected upwards too (only my hypothesis, without deeper analysis). Let me look at this after the sibling overprotection is tackled.
Hello Michal, I am the new project manager for SLE15 and co-pilot for release management. What is the status of this bug? Could you please update it?
Created attachment 872772 [details] collected samples of OOM reports when test fails (Kernel is SLE15-SP6 @ suse-commit: 32d23409c89b97d60ed7c3f0a6c77b0fc68583dc with custom config.)
(In reply to Michal Koutný from comment #10)
> When I can reproduce this, it's ~1/3 of runs of the memcontrol03 test in my
> testing VM.
> I considered two factors:
> - the memory_recursiveprot effect (A/B/F overprotection, bug 1196298)
>   - this is enabled by default
> - slow reclaim (simulated by throttling the VM's access to storage)
>
> I observed the following (~no is likely no, given the low number of tries):
>
>   throttle  recursiveprot  reproduces
>   no        no             ~no
>   no        yes            ~no
>   yes       no             yes
>   yes       yes            yes
>
> from which I conclude the test is primarily sensitive to the reclaim rate,
> and the extra protection of A/B/F due to memory_recursiveprot is an
> unnecessary factor. The result of the test thus depends on the amount of
> the A/G alloc (148M), and consequently on the time available for that
> reclaim (which depends on storage bandwidth).
>
> I don't consider the failures a bug (I suggest WORKSFORME).
>
> I wonder whether you (Wei) can second the hypothesis of possibly varied
> storage bandwidth in the testing environment (depending on which tests
> happen to share the resources of the machine).

Test result:
- storage bandwidth 300MB/sec: reproduce rate ~5% (1 failure in 20 runs)
- storage bandwidth 10MB/sec: reproduce rate ~80% (4 failures in 5 runs)

qemu was throttled with:

```
-drive file=$1,if=virtio,throttling.iops-total=10
```

```
hdparm -t /dev/vda2

/dev/vda2:
 Timing buffered disk reads:  30 MB in 3.00 seconds = 9.99 MB/sec
```
Thanks for the additional measurements. As I've written above, I don't think the failures indicate anything but varied IO bandwidth (i.e. this is within the expected behavior of memory.min).

TL;DR for Jack: memory.min is a "hard" protection (akin to memlock), meaning that if an allocating task cannot reclaim enough memory outside the unprotected amount, it invokes OOM. In this test, there are 150M of dirty data, 50M of spare memory (below the 200M limit), and 50M of the dirty data is under memory.min. The allocating task must reclaim >=98M of dirty data to have space for itself (148M). The test expects this to always succeed (because own(148M)+spare(50M) < 200M) but it occasionally doesn't fails, apparently due to slow writeout.

Writeback already takes memory.max (and memory.high) into account to initiate background threads. I don't think memory.min should affect those thresholds additionally. Additionally, the LRU scanning rate is reduced due to memory.min. Generally, setting memory.low/memory.min is supposed to decrease IO from the memcg; in accordance with that, I don't think WB_REASON_VMSCAN flushing should be changed.

Jack, do you agree that the observed behavior is reasonable? Or should an allocator in the presence of memory.min "try harder" (which would mean "wait longer", as more scanning is impossible due to the protection)?

(The test could also be fixed by: a) increasing the tolerance between the allocator and memory.max-memory.min, or b) using clean page cache to avoid the dependency on IO rate.)
(In reply to Michal Koutný from comment #13)
> own(148M)+spare(50M) < 200M) but it occasionally doesn't fails, apparently

own(148M)+protected(50M) < 200M

(It's not the same 50M.)
(In reply to Michal Koutný from comment #13)
> TL;DR for Jack: memory.min is a "hard" protection (akin to memlock), meaning
> that if an allocating task cannot reclaim enough memory outside unprotected
> amount, it invokes OOM. In this test, there is 150M of dirty data, 50M of
> spare memory (below 200M limit) and 50M of the dirty data is under
> memory.min. The allocating task must reclaim >=98M of dirty to have space
> for itself (148M). Test expects to always succeed (because
> own(148M)+spare(50M) < 200M) but it occasionally doesn't fails, apparently
                                                    ^^^^ does fail, AFAIU the problem, right?
> due to slow writeout.
>
> Writeback already takes into account memory.max (and memory.high) to
> initiate background threads. I don't think memory.min should affect those
> thresholds additionally. Additionally, the LRU scanning rate is reduced due
> to memory.min. Generally, setting memory.low/memory.min is supposed to
> decrease IO from the memcg. In accordance with that, I don't think
> WB_REASON_VMSCAN flushing should be changed.

Yeah, the cleaning of dirty pages is unreliable from the reclaim perspective. Reclaim cannot really issue page writeback itself (lock-ordering problems, and a bad IO pattern resulting in horrendous performance and a locked-up machine), and background writeback makes uncertain progress. That's the reason why we have limits on the amount of dirty pages in the page cache, and they should be reflected (and scaled) in memcg limits as well - thus I would expect you shouldn't be allowed to have more than 40MB worth of dirty pages in a memcg limited to 200MB? Anyway, even 40MB can take too long to write out in this tight setup with heavily throttled storage (if this were a real setup, I'd suggest tuning down the dirty limit).

> Jack, do you agree that the observed behavior is reasonable? Or should an
> allocator in presence of memory.min "try harder" (which would mean "wait
> longer" as more scanning is impossible due to protection)?
>
> (The test could also fixed by: a) increasing tolerance between allocator and
> memory.max-memory.min b) using clean page cache to avoid dependency on IO
> rate.)

Yes, I think the test case is kind of artificial, and everything works as designed here. So for this test verifying memcg reclaim, I'd go for fixing the testcase by creating clean page cache instead, which is easy to reclaim.
Created attachment 872935 [details]
alloc pagecache via reads

(In reply to Jan Kara from comment #15)
> > own(148M)+spare(50M) < 200M) but it occasionally doesn't fails, apparently
>                          ^^^^ does fail AFAIU the problem, right?

Apologies; reclaim for the allocator fails (doesn't succeed), ending up with OOM.

> That's the reason why we have limits on amount of dirty pages in the page
> cache and they should be reflected (and scaled) to memcg limits as well -
> thus I would expect you shouldn't be allowed to have more than 40MB worth of
> dirty pages in a memcg limited to 200MB?

That was an imprecise statement from me: 150M is dirtied in total. The OOM report (comment 11) shows that at OOM time there is no file_dirty, but a few MiB of file_writeback.

> Yes, I think the case is kind of artificial and everything works as designed
> here. So for this test verifying memcg reclaim I'd go for fixing the
> testcase by creating clean page cache instead which is easy to reclaim.

Interestingly, I notice now that the upstream memcontrol kselftest that inspired this LTP test uses mere read page cache too [1]. (The attachment contains the LTP test variant that I used in my experiments.)

[1] tools/testing/selftests/cgroup/cgroup_util.c:alloc_pagecache()
https://progress.opensuse.org/issues/156622 is used for tracking the LTP case update.
The extra sync seems to ONLY fix the following error on the ppc64 platform (the x86 result is good once the sync operation is added):

```
memcontrol03.c:129: TFAIL: Expected child 4201 to exit(0), but instead killed by SIGKILL <<<<<<<<
```

The following error still pops up on ppc64 (with the extra sync):

```
tst_test.c:1701: TINFO: === Testing on xfs ===
tst_test.c:1118: TINFO: Formatting /dev/loop0 with xfs opts='' extra opts=''
tst_test.c:1132: TINFO: Mounting /dev/loop0 to /tmp/LTP_memBgDf44/mntdir fstyp=xfs flags=0
memcontrol03.c:145: TINFO: Child 30469 in leaf_C: Allocating pagecache: 52428800
memcontrol03.c:145: TINFO: Child 30470 in leaf_D: Allocating pagecache: 52428800
memcontrol03.c:145: TINFO: Child 30471 in leaf_F: Allocating pagecache: 52428800
memcontrol03.c:108: TINFO: Child 30472 in trunk_G: Allocating anon: 155189248
memcontrol03.c:121: TPASS: Child 30472 exited
memcontrol03.c:210: TPASS: Expect: (A/B memory.current=52690944) ~= 52428800
memcontrol03.c:216: TFAIL: Expect: (A/B/C memory.current=6750208) ~= 34603008 <<<<
memcontrol03.c:218: TFAIL: Expect: (A/B/D memory.current=36503552) ~= 17825792
memcontrol03.c:220: TPASS: Expect: (A/B/E memory.current=0) ~= 0
memcontrol03.c:108: TINFO: Child 30473 in trunk_G: Allocating anon: 178257920
memcontrol03.c:116: TPASS: Child 30473 killed by OOM
memcontrol03.c:226: TPASS: Expect: (A/B memory.current=52363264) ~= 52428800
```

NOTE: memcontrol03_log_ppc64 in the attachment gives the detailed logs (without sync vs with sync).

Code change adding the extra sync to memcontrol03:

```
diff --git a/testcases/kernel/controllers/memcg/memcontrol03.c b/testcases/kernel/controllers/memcg/memcontrol03.c
index e927dfd19..5dbf5646f 100644
--- a/testcases/kernel/controllers/memcg/memcontrol03.c
+++ b/testcases/kernel/controllers/memcg/memcontrol03.c
@@ -144,6 +144,8 @@ static void alloc_pagecache_in_child(const struct tst_cg_group *const cg,
 	tst_res(TINFO, "Child %d in %s: Allocating pagecache: %"PRIdPTR,
 		getpid(), tst_cg_group_name(cg), size);
 	alloc_pagecache(fd, size);
+	SAFE_FSYNC(fd);
+
 	TST_CHECKPOINT_WAKE(CHILD_IDLE);
 	TST_CHECKPOINT_WAIT(TEST_DONE);
```
Created attachment 873719 [details] memcontrol03_log_ppc64
> memcontrol03.c:216: TFAIL: Expect: (A/B/C memory.current=6750208) ~= 34603008 <<<<

That's a surprisingly low number given the expectation. Do you see any failures also when accessing the page cache in read mode (as in comment 16)? I'm not sure whether
a) alloc_pagecache(READ)
b) alloc_pagecache(WRITE) + fsync
both yield the same result wrt the amount of pages that remain on the LRU lists.
alloc_pagecache(READ) without fsync: https://openqa.suse.de/tests/13858713#next_previous

```
diff --git a/testcases/kernel/controllers/memcg/memcontrol_common.h b/testcases/kernel/controllers/memcg/memcontrol_common.h
index e39e455dd..a5f03be09 100644
--- a/testcases/kernel/controllers/memcg/memcontrol_common.h
+++ b/testcases/kernel/controllers/memcg/memcontrol_common.h
@@ -30,11 +30,13 @@ static inline void alloc_pagecache(const int fd, size_t size)
 {
 	char buf[BUFSIZ];
 	size_t i;
+	off_t len;
 
-	SAFE_LSEEK(fd, 0, SEEK_END);
+	len = SAFE_LSEEK(fd, 0, SEEK_END);
+	SAFE_FTRUNCATE(fd, len + size);
 
 	for (i = 0; i < size; i += sizeof(buf))
-		SAFE_WRITE(SAFE_WRITE_ALL, fd, buf, sizeof(buf));
+		SAFE_READ(1, fd, buf, sizeof(buf));
 }
```

For the failure details please check the attachment; the failing part is not the same:

```
memcontrol03.c:214: TFAIL: Expect: (A/B/C memory.current=10682368) ~= 34603008
```

again a low number given the expectation.
Created attachment 873781 [details] read without fsync log for ppc64
Interesting. Do I get it right that these failures happen only on ppc64le hosts?
FYI memcontrol03 has been fixed with LTP fix [1] using fsync(), first used in 67.1. I guess we can close this. [1] https://github.com/linux-test-project/ltp/commit/ab1c8d16eb131969133edaa79f39e5443216100a
(In reply to Michal Koutný from comment #27) > Interesting. Do I get it right that these fails happen only on ppc64le hosts? Yes, ONLY ppc64.
Please indicate the current expectation for fixing the bug using the Target Milestone field. What do we believe is possible? In cases where we do not plan to deliver, or cannot guess, please do not enter anything; you can comment if you wish to provide more details.
As a PPC64 problem, Hector, could you please look at its severity?
Downgrading priority; this can be addressed in an MU.
Hi mkoutny, chrubis,

The current ppc64le issue seems not related to fsync/sync (dirty cache). Any other suggestions?
It may be related to the larger page size on ppc64le and a larger accumulated error in the counters used to calculate the reclaim amount and protections, as well as rounding/overreclaim in larger quanta.

I.e. it would be interesting to see if it reproduces on ppc64le with:

```
# drop memory_recursiveprot
mount -t cgroup2 -oremount,nsdelegate none /sys/fs/cgroup
```

Note that I have observed failures without recursive protection (comment 10), but that was because of dirty cache, which should have been eliminated.
(In reply to Michal Koutný from comment #37)
> It may be related to larger page size on ppc64le and larger accumulated
> error in counters used to calculate reclaim amount and protections as well
> as rounding overreclaim in larger quanta.
>
> I.e. it would be interesting to see if it reproduces on ppc64le with
>
> > # drop memory_recursivprot
> > mount -t cgroup2 -oremount,nsdelegate none /sys/fs/cgroup
>
> Note that I have observed failures without recursive protection (comment 10)
> but that was because of dirty cache, which should have been eliminated.

The following run dropped memory_recursiveprot but still failed; the log is also attached.
https://openqa.suse.de/tests/14037383#step/memcontrol03/6
Created attachment 874296 [details] drop memory_recursivprot