Bugzilla – Bug 1159882
Excessive swapping when buffers / cache expand beyond free physical RAM
Last modified: 2021-06-24 22:30:49 UTC
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36
Build Identifier:

https://forums.opensuse.org/showthread.php/538586-observing-excessive-swapping-when-copying-large-files

---
Same behavior here: Started dd if=/dev/sdc of=/dev/null bs=4M with no swap active. Everything was fine until I turned swap on. The machine froze immediately. swapoff ran for minutes before finally succeeding. I think it is a kernel bug.
---

Reproducible: Always

Steps to Reproduce:
1. Copy a large file (e.g. 2x RAM size) with cp, or use dd, so that block device caching and eventually swapping are triggered.
2. Watch the system suffer from high iowait on the device holding the root fs and swap.

Actual Results:
The cache grows beyond the free physical RAM size by swapping.

Expected Results:
Do NOT grow the cache beyond the free physical RAM size by swapping. Smooth operation, no lags.
The system I'm currently running the test on:
CPU: Dual Core Intel Core2 T5200
Kernel: 5.3.12-2-default x86_64
Mem: 861.9/1969.0 MiB (43.8%)
Storage: 29.82 GiB (51.0% used)
Drives: Local Storage: total: 29.82 GiB used: 15.20 GiB (51.0%)
  ID-1: /dev/sda vendor: SanDisk model: SDSSDRC032G size: 29.82 GiB speed: 1.5 Gb/s serial: <filter>

Other laptops are affected as well, though less so because they are a whole lot newer, with more RAM and faster storage.
Could you collect /proc/vmstat data while you are running the test? E.g. something like:

    while true; do
        TS="$(date +%s)"
        cp /proc/vmstat vmstat.$TS
        sleep 1s
    done

Also, is this a new problem? When did you notice it? Was it after an update to a maintenance kernel or an upgrade to a new major kernel release?
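A minimal Python sketch for turning the per-second snapshots from that loop into counter deltas. It assumes the files are named vmstat.<epoch seconds>, as the loop writes them, and contain the usual "name value" lines of /proc/vmstat. The helper names (parse_vmstat, load_snapshots, diff_vmstat) are hypothetical and not part of any existing tool.

```python
import glob
import os
from typing import Dict, List, Tuple


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse the 'name value' lines of a /proc/vmstat snapshot into a dict."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that is not exactly "counter_name integer".
        if len(fields) == 2 and fields[1].lstrip("-").isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def load_snapshots(directory: str) -> List[Tuple[int, Dict[str, int]]]:
    """Load vmstat.<epoch> files from a directory, sorted by timestamp."""
    snapshots = []
    for path in glob.glob(os.path.join(directory, "vmstat.*")):
        suffix = os.path.basename(path).split(".", 1)[1]
        if suffix.isdigit():
            with open(path) as f:
                snapshots.append((int(suffix), parse_vmstat(f.read())))
    snapshots.sort(key=lambda item: item[0])
    return snapshots


def diff_vmstat(first: Dict[str, int], last: Dict[str, int]) -> Dict[str, int]:
    """Return last - first for every counter present in both snapshots."""
    return {name: last[name] - value for name, value in first.items() if name in last}


if __name__ == "__main__":
    # pswpout values derived from the cgroup_disable=memory run quoted later
    # in this report (2448, then +3118 one second later).
    before = parse_vmstat("pswpin 0\npswpout 2448\n")
    after = parse_vmstat("pswpin 0\npswpout 5566\n")
    print(diff_vmstat(before, after))  # {'pswpin': 0, 'pswpout': 3118}
```

Diffing the first and last entries returned by load_snapshots() on a trace directory gives whole-run deltas of the kind used in the analyses in this report.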
I first noticed it on my work laptop when copying a VM image (about 30G) from A to B. That machine has an i7, NVMe storage (Samsung 960 EVO) and 16GB of RAM. At one point it was hitting swap massively (high IO, not a large amount swapped out at any given time), and the GUI would freeze for seconds at a time. I was wondering why the heck it would start using swap at all. All of this badness goes away when swap is turned off completely.

I don't have anything useful to say about when this may have started, just that I've never before experienced such pathological behaviour when just copying a file! As described in the forum posts, it just doesn't make any sense whatsoever. I do expect a massive performance penalty when swap is used; that is not the question. The question here is why it hits swap at all. My suspicion is that buffers grow to a point that triggers swapping, which should never happen.

For convenience I started testing on the old laptop (different hardware, slower, much less RAM), just to rule out some freak HW issue or configuration differences. Both run up-to-date TW.
Created attachment 826954 [details] vmstat traces - all swap off - all swap on - all swap off again
I ran this while taking the vmstat traces:

dd if=/dev/sda1 of=/dev/null status=progress

All swap space was turned off initially, and the transfer speed was consistent with a SATA-I connection, about 150MB/s. Then swap was turned on and badness occurred. Transfer speed dropped to less than 50% (swap is on the same device, so 2x IO; that might be OK by the numbers), and the GUI became painfully unresponsive. I ran this with a plasma5 session + tmux running.
(In reply to robert spitzenpfeil from comment #4)
> Created attachment 826954 [details]
> vmstat traces - all swap off - all swap on - all swap off again

This is not very useful without knowing exactly when swap was enabled and disabled. Could you provide a single run of the affected workload without any changes in the configuration, please?
File size is a pretty good indicator. I will run it again.
Yet another process that is affected by this: restoring a VM (VirtualBox) from a disk image (Clonezilla), with host IO cache on. Most of the writes end up in buffers; at one point swapping kicks in, just a few hundred MB, nothing serious. GUI performance gets rather choppy. Turning swap off instantly resolves it on my i7.
Created attachment 827191 [details] vmtraces running dd
(In reply to robert spitzenpfeil from comment #9)
> Created attachment 827191 [details]
> vmtraces running dd

During the 76s captured here, pgalloc_dma 4009 and pgalloc_dma32 771840 pages have been allocated. That means 3G worth of memory allocated. pgfree 564330 pages have been freed during that time period, and memory reclaim has recycled pgsteal_direct 195381 (direct reclaim) and pgsteal_kswapd 294145 (kswapd) pages, which is 1.9G. The reclaim efficiency (steal/scan) was 92% for both direct reclaim and kswapd. This looks reasonable.

53824 pages have been swapped out (210M), which is not that bad. But more importantly, only pswpin 2148 (8M) have been swapped back in. This means there were no real refaults from swap going on, so this is not really any form of swap storm.

In other words, these numbers do not indicate any form of struggling. Either you haven't captured the counters while the system was really struggling, or something else is going on. Can you try to check what those processes are stuck on?
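The unit conversions in this analysis can be reproduced with a short Python sketch. It assumes 4 KiB base pages (as on x86_64); the function names are illustrative only.

```python
from typing import Dict

PAGE_KIB = 4  # assumption: 4 KiB base pages, as on x86_64


def pages_to_mib(pages: int) -> float:
    return pages * PAGE_KIB / 1024


def pages_to_gib(pages: int) -> float:
    return pages * PAGE_KIB / (1024 * 1024)


def summarize(delta: Dict[str, int]) -> Dict[str, float]:
    """Convert vmstat counter deltas (in pages) into the figures used in the analysis."""
    allocated = delta["pgalloc_dma"] + delta["pgalloc_dma32"]
    reclaimed = delta["pgsteal_direct"] + delta["pgsteal_kswapd"]
    return {
        "allocated_gib": pages_to_gib(allocated),
        "reclaimed_gib": pages_to_gib(reclaimed),
        "swapout_mib": pages_to_mib(delta["pswpout"]),
        "swapin_mib": pages_to_mib(delta["pswpin"]),
        # A low swap-in/swap-out ratio means swapped pages are not being faulted back.
        "swapin_ratio": delta["pswpin"] / delta["pswpout"],
    }


if __name__ == "__main__":
    # Deltas quoted in the comment above (76 s capture while running dd).
    stats = summarize({
        "pgalloc_dma": 4009,
        "pgalloc_dma32": 771840,
        "pgsteal_direct": 195381,
        "pgsteal_kswapd": 294145,
        "pswpout": 53824,
        "pswpin": 2148,
    })
    for name, value in stats.items():
        print(f"{name:14} {value:8.2f}")
```

With these inputs it reports roughly 2.96 GiB allocated, 1.87 GiB reclaimed, 210.25 MiB swapped out and about 8.4 MiB swapped in (a swap-in/swap-out ratio of about 4%). These match the rounded 3G / 1.9G / 210M / 8M figures above.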
Could it be answered why it is swapping at all? I question the rationale of swapping to increase buffers. As everything works fluently with swap off, I fail to see the necessity for swapping in the first place. My opinion: "just copy the damn data" and limit buffers to actually free RAM, shrink buffers when RAM is required elsewhere, THEN swap if absolutely necessary. I will acquire some data on my i7 machine while doing the VM restore.
(In reply to robert spitzenpfeil from comment #11)
> Could it be answered why it is swapping at all?
>
> I question the rationale of swapping to increase buffers. As everything
> works fluently with swap off, I fail to see the necessity for swapping in
> the first place.

Well, you are right, and this is what memory reclaim implements. In fact, reclaim is heavily biased towards the page cache. There is a heuristic to detect use-once page cache access patterns. Anonymous memory is usually reclaimed only when the page cache gets really low.

I haven't had a chance to check the details (and won't get to analyze this further before Tuesday), but there are several things that might be tried in the meantime:

a) Rule out the memory cgroup controller. Background: global reclaim tries to spread memory pressure evenly across all cgroups. Some of them might be really low on page cache, so swapout is preferred there. This can be ruled out by booting with the cgroup_disable=memory kernel parameter.
b) There is more going on, and the page cache is really low.
c) There is a lot of dirty page cache accumulated. This is not very likely because the reclaim efficiency is high, but it would be good to double check by reducing the amount of dirty data that can accumulate via /proc/sys/vm/dirty_{background_}bytes to something relatively small (say 300MB for dirty_bytes and 100MB for dirty_background_bytes).
d) There is a bug in the kernel.
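For item c), the suggested limits must be given in bytes. The Python sketch below renders them as sysctl.d-style lines. It assumes that "300MB" and "100MB" mean MiB, and the function name is hypothetical.

```python
MIB = 1024 * 1024  # assumption: "300MB" read as 300 MiB


def dirty_limit_sysctls(dirty_mib: int, background_mib: int) -> str:
    """Render vm.dirty_bytes and vm.dirty_background_bytes as sysctl.d-style lines."""
    # Reject a background threshold that is not positive and below the hard limit.
    if not 0 < background_mib < dirty_mib:
        raise ValueError("background threshold must be positive and below dirty_bytes")
    return (
        f"vm.dirty_bytes = {dirty_mib * MIB}\n"
        f"vm.dirty_background_bytes = {background_mib * MIB}\n"
    )


if __name__ == "__main__":
    # Values suggested in item c): 300MB hard limit, 100MB background threshold.
    print(dirty_limit_sysctls(300, 100), end="")
```

The output could be saved to a file of your choice under /etc/sysctl.d/ and loaded with `sysctl -p <file>`. Alternatively, write the two numbers directly to /proc/sys/vm/dirty_bytes and /proc/sys/vm/dirty_background_bytes as root. Per the kernel's vm sysctl documentation, only one of dirty_bytes/dirty_ratio is in effect at a time, so after dirty_bytes is written, dirty_ratio reads back as 0. The same applies to the background pair.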
This bug report may be related. https://bugzilla.kernel.org/show_bug.cgi?id=196729
Apart from the things to try mentioned in comment 12, it would be interesting to see how the system behaves with upstream commit 2c012a4ad1a2cd3fb5a0f9307b9d219f84eda1fa reverted. Let me know if you need help with that.
It seems there have been improvements! So far I cannot reproduce on my i7 machine, which is great. I will try on the older core2 duo. I've also asked on the forum thread for others to test it again and report back. Maybe it's gone :-)
I cannot reproduce it on my old laptop either. This is looking GOOD!
What's the current kernel version that appears fixed?
I'm on TW 5.6.2-1-default and cannot reproduce it on either of my laptops. Someone in the forum has tested again, and it seems to be OK now. Recently I tested Ubuntu 18.04 LTS, and that is borked.

If it matters, I currently use:
vm.swappiness = 10
vm.vfs_cache_pressure = 50

BTW, these settings had no effect on the problem when it still existed.
Linux erlangen 5.6.2-1-default #1 SMP Thu Apr 2 06:31:32 UTC 2020 (c8170d6) x86_64 x86_64 x86_64 GNU/Linux

https://forums.opensuse.org/showthread.php/538586-observing-excessive-swapping-when-copying-large-files?p=2932840#post2932840
Robert, I guess you haven't encountered the problem again with 5.6 and later. Should we close with WORKSFORME then?
I haven't been affected by this in quite a while. Let's close it.
Per comment 21.
Confirmed this bug exists in OpenSUSE LEAP 15.2.

https://forums.opensuse.org/showthread.php/544618-Hard-Disk-Activity-Memory-Hole?p=2965585#post2965585

Brand new Dell Inspiron 7591 laptop, 16GB RAM, 1TB SSD. dd, rsync, and other operations that use lots of disk I/O cause the system to dig dramatically into swap. Even with swappiness=1, swap use (of 1GB swap) increased to 105 MB (10%).

Transcript of forum post:

----- QUOTE BEGIN
I do not know where else to post this, so here goes.

I have a clean basic XFCE installation of OpenSUSE LEAP 15.2. This behavior happens on both the Asus R541U laptop I used to have (8GB RAM, 512MB Swap) and my new Dell Inspiron 7591 (16GB RAM, 1GB swap).

I would boot into OpenSUSE to do some "hard drive wrangling", i.e. making disk images of hard drives via USB adapters (dd if=/some/device | gzip -c > imagefile) or zeroizing old disks (dd if=/dev/zero of=/some/device). As soon as I begin the dd process, my RAM and swap climb through the roof. Almost no applications are open when this occurs. For example:

1) When I was using my Asus to read the 512GB SSD via an adapter to another USB external hard drive (BACKUP)
   (i.e. dd if=/dev/nvme0n1 | gzip -c > /run/media/robert/BACKUP/Windows/dell7591.img.gz)
2) When I was backing up my files on the Asus
   (rsync -Hav /home/robert/ /run/media/robert/BACKUP/dell/robert/)
3) When I was copying the dd image to the new 1TB SSD upgrade for my dell
   (gunzip -c /run/media/robert/BACKUP/Windows/dell7591.img.gz | dd of=/dev/nvme0n1)
4) When I was just now zeroizing the old 512GB SSD via the same USB adapter
   (dd if=/dev/zero of=/dev/sdb)
5) When I was synchronizing my incremental monthly backups (both 2TB external USB drives running LUKS)
   (rsync -Hav --delete --progress /run/media/robert/BACKUP/ /run/media/robert/BACKUP2/ )

It always seems connected to rsync/gzip/dd, i.e. heavy use of filesystems.
If I boot OpenSUSE and I am just sitting in OpenSUSE using applications, usually it does not cause me to dig into swap.

At the height of the zeroizing action, for example, swap use (16GB RAM, 1GB swap, new Dell Inspiron 7591) climbed to 108MB. It dropped to 11MB. Given that I have 16GB of RAM, such behavior is absolutely unacceptable. All the I/O should be happening on disks.

I have not been able to triangulate, using top, what process is eating RAM so much. I am using EXT4 exclusively, no BTRFS anywhere. I have remounted all tmpfs entries to only give them 1GB of RAM to work with, as in the past this has prevented such excessive swappiness (believe it or not; it's difficult to prove; older versions of OpenSUSE, etc). I am willing to run experiments to see what's going on.

I noticed that there were some btrfs components of systemd that were installed. I uninstalled them, but the problem remains.

I don't understand how even running something complex as rsync + gzip + dd should need to dig into that much system RAM. I mean, I have 16GB! Have any memory leaks been reported on OpenSUSE?
----- END QUOTE

I am very willing to provide any information to help resolve this apparent memory hole or memory leak.
I think I did this properly, but please help me because I'm new to BugZilla. I saved this to OpenSUSE LEAP 15.2 because I experience it "in the wild" in LEAP 15.2. Please forgive me if this is not the right way to do it. Please contact me ASAP if you need anything: I really want to help the community.
see comment 2
I think it would have been better to open a new bug for Leap 15.2 and link it with this one, but whatever. Let's keep it here. It is not surprising to see it strikes 15.2 too. The original bug was reported against 5.3 kernel, which is in 15.2. It got somehow fixed in 5.6 at the latest. We may try to find the fixes but they may be too intrusive to backport. First, it would be nice to walk through the bug and provide the same info Michal and Vlastimil asked for the original report. That is, vmstat logs, swap on/swap off behaviour and such.
I had reinstalled OpenSUSE LEAP 15.2 using the DVD but with network enabled, so what I should've gotten was a fresh installation of the most current stable OpenSUSE LEAP 15.2.

VMSTAT information as requested:

http://www.puresimplicity.net/~delahunt/vmstat/swapon/
http://www.puresimplicity.net/~delahunt/vmstat/swapoff/

Basically, I had reinstalled OpenSUSE LEAP 15.2 with the LUKS-contained LVM of /dev/system/home and /dev/system/swap, but I had deleted the swap LV and ran the system without swap. So the swapoff directory is a recording of vmstat while I was doing

dd if=/dev/zero of=/dev/sdb bs=4K status=progress

and the swapon directory is from after I went back into the partitioner, recreated the swap LV, turned swap on, then ran the dd command above all over again. The system immediately dug into swap to about the 40MB mark. Running free -m second by second, I could see available RAM plummet and swap climb.

I put other assorted diagnostic information, such as dmesg, cpuinfo, lsmod, rpms, etc., in http://www.puresimplicity.net/~delahunt/vmstat

I noticed that the partitioner installed a package called nvme-cli-1.10-lp152.1.3.x86_64 when I clicked "accept" to add the swap LV.

I am very determined to help get this fixed, so please notify me immediately if there's anything else I can help with.
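To record the same thing that `free -m` showed (available RAM falling while swap climbs) alongside the vmstat traces, a small Python sketch can poll /proc/meminfo. It assumes a kernel that reports MemAvailable (any recent one does). The function names are illustrative, not an existing tool.

```python
import os
import time
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo lines like 'SwapFree:  1048572 kB' into integers (kB)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_used_mib(info: Dict[str, int]) -> float:
    return (info["SwapTotal"] - info["SwapFree"]) / 1024


def available_mib(info: Dict[str, int]) -> float:
    return info["MemAvailable"] / 1024


def poll(path: str = "/proc/meminfo", interval: float = 1.0, count: int = 10) -> None:
    """Print available RAM and used swap every `interval` seconds, like free -m in a loop."""
    for i in range(count):
        if i:
            time.sleep(interval)
        with open(path) as f:
            info = parse_meminfo(f.read())
        print(f"available {available_mib(info):8.0f} MiB   swap used {swap_used_mib(info):6.0f} MiB")


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    poll(count=1)
```

Calling poll() with a larger count while dd runs yields one line per second. These lines can be lined up against the vmstat.<epoch> files by wall-clock time.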
Please note that I had set vm.vfs_cache_pressure=200 on this installation, so both the vmstat results above were while vfs_cache_pressure=200. I noticed that, while the system ran mostly fine with this set and with no swap, at boot sometimes the system seems to have some sort of deep process bogging down, as the keyboard (for instance) seems to not register every 5th key or so. I have to very closely watch the asterisks on log-in pages or else it throws off password typing, etc. Might be an unrelated bug, not sure. I expect that, because this laptop is brand new, there might be some early kernel bugs or new hardware issues, and I'd absolutely love to help out in any way I can. Even if it means running a debug kernel, if you can show me how. I have enabled multiple ACPI and other kernel boot time parameters in the boot command line to see if maybe one of these helps us find the problem.
(In reply to Miroslav Beneš from comment #26)
> It is not surprising to see it strikes 15.2 too. The original bug was
> reported against 5.3 kernel, which is in 15.2. It got somehow fixed in 5.6
> at the latest. We may try to find the fixes but they may be too intrusive to
> backport.

The commit 2c012a4ad1a2cd3fb5a0f9307b9d219f84eda1fa mentioned in comment #14 was effectively removed in commit b91ac374346ba206cfd568bb0ab830af6b205cfd, which went into 5.5. I actually observed quite similar symptoms in Ubuntu 18.04 as soon as it bumped the HWE kernel to 5.3, and had to install 5.5 (5.4 had the same issue).

I do not know if b91ac374346ba206cfd568bb0ab830af6b205cfd alone can be backported, but maybe 2c012a4ad1a2cd3fb5a0f9307b9d219f84eda1fa could be reverted, as this is what happened upstream anyway.
Would you recommend the user (me) build or install one of the newer Linux kernels? Do you have a specific version or tree that you would prefer I attempt to build or install? Please let me know. I may decide, some time today, to grab the latest stable Linux kernel and build it anyways, just for "fun," after my class today.
I ran the dd exercise after booting into the OpenSUSE LEAP 15.2 debug kernel. The dmesg output at http://www.puresimplicity.net/~delahunt/vmstat/dmesg_debug.txt is the result of doing so and then running the dd command. The logs filled up pretty quickly. Hope this helps someone debug the problem.
(In reply to Robert Delahunt from comment #30)
> Would you recommend the user (me) build or install one of the newer Linux
> kernels? Do you have a specific version or tree that you would prefer I
> attempt to build or install? Please let me know. I may decide, some time
> today, to grab the latest stable Linux kernel and build it anyways, just for
> "fun," after my class today.

Running the latest Linus' tree with the same config might tell us more. There are certainly other changes in MM that might make a difference. 2c012a4ad1a2 ("mm: vmscan: scan anonymous pages on file refaults") can definitely cause more swapping. I was not particularly happy about the patch (https://lore.kernel.org/linux-mm/20190712071359.GN29483@dhcp22.suse.cz/). Another option would be trying with that one reverted. Let me know if you need help with that.
(In reply to Robert Delahunt from comment #27)
> I had reinstalled OpenSUSE LEAP 15.2 using the DVD but with network enabled,
> so what I should've gotten was a fresh installation of the most current
> stable OpenSUSE LEAP 15.2.
>
> VMSTAT information as requested:
>
> http://www.puresimplicity.net/~delahunt/vmstat/swapon/
>
> http://www.puresimplicity.net/~delahunt/vmstat/swapoff/
>
> Basically, I had reinstalled OpenSUSE LEAP 15.2 with the LUKS-contained LVM
> of /dev/system/home and /dev/system/swap but I had deleted the LV of swap
> and ran the system without swap. So the swapoff is a recording of vmstat
> while I was doing
>
> dd if=/dev/zero of=/dev/sdb bs=4K status=progress
>
> and the swapon directory is after I went back into the partitioner,
> recreated the swap LV, turned swap on, then ran the dd command above all
> over again.

I have looked at the swapon data.

First vmstat / Last vmstat [diff]
pgscan_direct              2383         0
pgscan_kswapd          14565113    424280
pgsteal_kswapd         14513308    423192

No direct reclaim, so kswapd was able to cope with the allocation pace. The overall reclaim efficiency is nice as well (99%), and the reclaim itself has freed 1.6G worth of memory,

pgalloc_dma32           2334299    467232
pgalloc_normal         24091786   3564456

while 15.7G of memory was requested during that time period.

pswpin                        0         0
pswpout                       0     11842

No memory has been swapped in, while 46M has been swapped out. This on its own doesn't sound overly excessive. I would be much more worried if pswpin were high, because that would suggest that memory actively in use had been swapped out, so its owner would see larger latencies on refault.

workingset_activate          86        41
workingset_refault        10237       141
workingset_restore            0         0

These are the stats for refaults of disk-based page cache. workingset_refault tells us how many page cache pages have been reclaimed and then faulted back in again; 141 pages is really minuscule. workingset_activate tells us how many of those pages were reclaimed recently, so we should consider them active. workingset_restore tells us that the refault is happening on a previously active page. All in all, not much refault activity. If there is enough clean page cache, then we shouldn't swap at all.

There are three jumps in swapout:

vmstat.1600702573:pswpout 0
vmstat.1600702574:pswpout 2448
[...]
vmstat.1600702577:pswpout 5555
vmstat.1600702578:pswpout 10605
vmstat.1600702579:pswpout 11330

The biggest one swapped out 5050 pages within one second.

vmstat.1600702577 1600702578 [diff]
nr_active_anon           373934     -3376
nr_active_file           131158         0
nr_inactive_anon          48441      8556
nr_inactive_file        3288371      1169
nr_dirty                 632308        28

There is plenty of inactive page cache. A large part is dirty, but there should still be a lot of clean page cache to reclaim. The file inactive list is clearly not low on the global level.

workingset_refault        10281         0

No refaults were detected, so the heuristic from 2c012a4ad1a2 shouldn't trigger.

These are all global numbers. The picture would be quite different if memory cgroups were deployed, though. I asked earlier for the behavior with cgroup_disable=memory on the kernel command line. See comment 12 for more information.
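The figures in this analysis can be re-derived from the quoted counters with a short Python sketch. It assumes 4 KiB pages. It treats the [...]-elided snapshots as missing, so only pairs of snapshots exactly one second apart are compared. Its clean-cache figure (inactive file pages minus nr_dirty, ignoring pages under writeback) is a rough estimate of ours, not a kernel counter.

```python
from typing import List, Tuple

PAGE_KIB = 4  # assumption: 4 KiB pages (x86_64)


def reclaim_efficiency(scanned: int, stolen: int) -> float:
    """Fraction of scanned pages that were actually reclaimed (steal/scan)."""
    return stolen / scanned


def largest_one_second_jump(series: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (timestamp, delta) of the largest increase between snapshots exactly 1 s apart."""
    best = (0, 0)
    for (t0, v0), (t1, v1) in zip(series, series[1:]):
        if t1 - t0 == 1 and v1 - v0 > best[1]:
            best = (t1, v1 - v0)
    return best


def rough_clean_inactive_gib(inactive_file: int, dirty: int) -> float:
    """Rough clean inactive page cache: inactive file pages minus all dirty pages.

    Ignores pages under writeback and dirty pages on the active list, so it is only an estimate.
    """
    return (inactive_file - dirty) * PAGE_KIB / (1024 * 1024)


if __name__ == "__main__":
    # pswpout samples quoted above; the elided [...] range is simply absent.
    pswpout = [(1600702573, 0), (1600702574, 2448), (1600702577, 5555),
               (1600702578, 10605), (1600702579, 11330)]
    print(f"kswapd efficiency {reclaim_efficiency(424280, 423192):.1%}")
    print("largest one-second jump", largest_one_second_jump(pswpout))
    print(f"swapped out {11842 * PAGE_KIB / 1024:.1f} MiB")
    print(f"rough clean inactive cache {rough_clean_inactive_gib(3288371, 632308):.1f} GiB")
```

It reports about 99.7% kswapd efficiency, the 5050-page jump at 1600702578, about 46 MiB swapped out, and roughly 10 GiB of inactive file cache that was not dirty at the time.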
Would using a tumbleweed kernel give you good information? If so, please tell me what specific tumbleweed packages I need to install and what LEAP 15.2 packages I need to remove in order to test this theory. Also, would you have me grab the latest stable kernel and "yes | make oldconfig" and then see how that goes? I haven't built a kernel for OpenSUSE LEAP 15.2 before, so I might need a tutorial on how to mkinitrd. Please notify me.
(In reply to Robert Delahunt from comment #34)

Please start with cgroup_disable=memory on your current kernel. If this works around the problem, then my theory that proportional reclaim distributes memory pressure to mostly-anonymous cgroups would be a good fit. Once that is confirmed, it would be great to run with a current kernel. Installing one from http://download.opensuse.org/repositories/Kernel:/stable/standard/ should do it.
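Before comparing traces, it helps to confirm that the parameter actually took effect. The Python sketch below checks this. It assumes the four-column /proc/cgroups layout (subsys_name, hierarchy, num_cgroups, enabled), where a controller disabled with cgroup_disable= should show enabled = 0. The helper names are hypothetical.

```python
import os
from typing import List


def disabled_controllers(cmdline: str) -> List[str]:
    """Controllers named in cgroup_disable= parameters on a kernel command line."""
    names: List[str] = []
    for token in cmdline.split():
        if token.startswith("cgroup_disable="):
            # The parameter takes a comma-separated list of controller names.
            names.extend(n for n in token.split("=", 1)[1].split(",") if n)
    return names


def memcg_enabled(proc_cgroups: str) -> bool:
    """True if /proc/cgroups lists the memory controller with enabled == 1."""
    for line in proc_cgroups.splitlines():
        fields = line.split()
        # Columns: subsys_name hierarchy num_cgroups enabled; the header starts with '#'.
        if len(fields) >= 4 and fields[0] == "memory":
            return fields[3] == "1"
    return False


if __name__ == "__main__":
    if os.path.exists("/proc/cmdline") and os.path.exists("/proc/cgroups"):
        with open("/proc/cmdline") as f:
            print("cgroup_disable:", disabled_controllers(f.read()))
        with open("/proc/cgroups") as f:
            print("memory controller enabled:", memcg_enabled(f.read()))
```

After a reboot with the parameter, the first line should list "memory" and the second should print False.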
http://www.puresimplicity.net/~delahunt/vmstat/cgroup_disable/ Here are some vmstats for the stock OpenSUSE LEAP 15.2 kernel. I started the dd command then realized I don't have a swap set, so I created one and added it. As soon as I ran swapon, it climbed to about 20MB or so. Still a bit better than the excessive swapping, but still, with 16GB of main RAM and nothing but Chrome running, that's excessive. Let me know if this is enough or you want me to install the latest kernel. I'm very eager to help.
(In reply to Robert Delahunt from comment #36)
> http://www.puresimplicity.net/~delahunt/vmstat/cgroup_disable/

Thanks. The data seem to be in line with what we have seen previously:

vmstat.1600781117 1600781118 [diff]
nr_active_anon           166943     -3026
nr_active_file            85576         3
nr_inactive_anon          33207      8180
nr_inactive_file        3540169      9504
nr_dirty                 667650      -203
workingset_activate         154         0
workingset_refault         2455         1
workingset_restore            0         0
pswpin                        0         0
pswpout                    2448      3118

The inactive list is really large and mostly clean, so there shouldn't be any reason to swap out. I suspect the reclaim is confused for some reason. Again, the anonymous inactive list is low and needs rotation, but I fail to see any reason why it should get reclaimed. get_scan_count should opt for page cache reclaim only.

Could you give the newer kernel a try as noted in the previous comment, please?
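The reasoning here (swapout while the inactive file list is large, mostly clean and barely refaulting) can be written as a simple check over one interval. This is a hypothetical Python sketch: the thresholds are arbitrary illustration values, not kernel constants, and nr_dirty is treated as if it all sat on the inactive list.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Counter values/deltas for one snapshot interval (all in pages)."""
    nr_inactive_file: int
    nr_dirty: int
    workingset_refault_delta: int
    pswpout_delta: int


def suspicious_swapout(iv: Interval, min_clean_fraction: float = 0.5, max_refaults: int = 100) -> bool:
    """Flag swapout while the inactive file list is mostly clean and barely refaulting.

    The thresholds are arbitrary illustration values, not kernel constants.
    """
    if iv.pswpout_delta <= 0 or iv.nr_inactive_file == 0:
        return False
    clean_fraction = max(iv.nr_inactive_file - iv.nr_dirty, 0) / iv.nr_inactive_file
    return clean_fraction >= min_clean_fraction and iv.workingset_refault_delta <= max_refaults


if __name__ == "__main__":
    # One-second interval from the cgroup_disable=memory run above.
    iv = Interval(nr_inactive_file=3540169, nr_dirty=667650,
                  workingset_refault_delta=1, pswpout_delta=3118)
    print(suspicious_swapout(iv))  # True
```

For the interval above, roughly 81% of the inactive file list is not dirty and there is a single refault, yet 3118 pages were swapped out. That is exactly the situation described as unexpected.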
I will install the kernel in the repository in the link. Please reply soon with what specific packages I need to install from it, and/or any other information, as I have only ever run a kernel other than the stock one twice. "Back in my day" I would compile a static kernel for Slackware-Current. Now, however, I will need a slight bit of coaching. If this needs to come over direct email or text or whatever, please let me know. I will boot back into OpenSUSE LEAP 15.2 and await your instructions while adding the repo.
(In reply to Robert Delahunt from comment #38)
> I will install the kernel in the repository in the link. Please reply soon
> with what specific packages I need to install from it, and/or any other
> information, as I have only ever ran a different kernel than stock twice.
> "Back in my day" I would compile a static kernel for Slackware-Current.
> Now, however, I will need a slight bit of coaching. If this needs to come
> over direct email or text or whatever, please let me know. I will boot back
> into OpenSUSE LEAP 15.2 and await your instructions while adding the repo.

Installing the kernel should be sufficient AFAIK.
http://www.puresimplicity.net/~delahunt/vmstat/suse_stable/ I guess I didn't need help. So I got the new kernel installed and selected it at boot. Ran the same dd test. I left it running and the system never used swap. /proc/sys/vm/swappiness still = 60.
(In reply to Robert Delahunt from comment #40)
> http://www.puresimplicity.net/~delahunt/vmstat/suse_stable/
>
> I guess I didn't need help.
>
> So I got the new kernel installed and selected it at boot. Ran the same dd
> test. I left it running and the system never used swap.
> /proc/sys/vm/swappiness still = 60.

OK, this is good to know. Newer kernels have changes which check refaults on anonymous memory as well, so this has likely changed the balance. These would be out of scope for 15.2, unfortunately.

Vlastimil, I remember we have discussed this problem in upstream some time ago. You've had a patch which has disabled the heuristic (2c012a4ad1a2). Testing with that reverted would sound like a good next step.
(In reply to Michal Hocko from comment #41)
> Vlastimil, I remember we have discussed this problem in upstream some time
> ago. You've had a patch which has disabled the heuristic (2c012a4ad1a2).
> Testing with that reverted would sound like a good next step.

I think the past discussion was about us *not* having 2c012a4ad1a2 (in an older kernel) as the problem was different - file pages thrashing while unused anonymous pages sit idly. See https://lore.kernel.org/linux-mm/b7f5e356-1f0a-98be-4a32-09a766c3949b@suse.cz/

Anyway, what is the actual observed issue here? Is it that part of the swap gets used? I think Michal's analysis in comment 33 shows the swapped out pages are not accessed (no increase in pswpin) so it shouldn't actually cause excessive IO. So is it only that the swap being used looks bad? If there's really an observed performance issue (e.g. the system being sluggish) while doing the operations listed in comment 23, does disabling swap completely make any difference? If not, we might be looking at a red herring here, IMHO.
(In reply to Vlastimil Babka from comment #42)
> (In reply to Michal Hocko from comment #41)
> > Vlastimil, I remember we have discussed this problem in upstream some time
> > ago. You've had a patch which has disabled the heuristic (2c012a4ad1a2).
> > Testing with that reverted would sound like a good next step.
>
> I think the past discussion was about us *not* having 2c012a4ad1a2 (in an
> older kernel) as the problem was different - file pages thrashing while
> unused anonymous pages sit idly. See
> https://lore.kernel.org/linux-mm/b7f5e356-1f0a-98be-4a32-09a766c3949b@suse.cz/

Ahh, I remember now.

> Anyway, what is the actual observed issue here? Is it that part of the swap
> gets used? I think Michal's analysis in comment 33 shows the swapped out
> pages are not accessed (no increase in pswpin) so it shouldn't actually
> cause excessive IO. So is it only that the swap being used looks bad?

Yes, this is the case here. But I am more worried that this is a more general problem that might actually hit somewhere else. There shouldn't really be any reason to swap out anything with that much easily reclaimable page cache which doesn't refault heavily. Remember, this is a simple stream-writer use case. That shouldn't really disrupt anonymous memory users.

I am quite busy now, but I will try to prepare a kernel with 2c012a4ad1a2 reverted, because that might be easier to adopt in the 15.2 or SLE15-SP2 kernels than the current upstream, which is likely fixing the problem by applying the refault logic to anonymous memory as well.

Thanks Vlastimil!
This is a problem because, with default kernel VM settings and swap enabled, a 16GB system using dd/gzip/rsync is heavily impacted. For instance, I can connect my external 1TB hard drive (/home LUKS -> external 1TB LUKS) and have 100MB or more of swap utilization. And that's the first command being run after the system boots. Changing swappiness to 1 and VFS cache pressure to 200 doesn't eliminate the swapping. The system bogs down drastically, even with / housed on a brand new 1TB Kingston SSD.

I understand that maybe some of this is intrinsic to the older kernel plus the brand new hardware, but still, I've never seen previous versions of OpenSUSE dig so heavily into swap just backing up my stuff to my 1TB external, for instance. At some points the system lags so badly that the mouse slows and the system, for all intents and purposes, behaves like it's locked up. Getting to a virtual terminal is possible, so the system isn't locked, but it drags down all of X and XFCE with it. (Which is noteworthy: I am not using a "larger" WM/DE like KDE/Gnome/MATE.)

So it's basically every disk I/O. For instance, I got a new MicroSD to put college stuff on (Windows vs Linux, so that my college documents are "portable" in case of a problem or in case I need to do work at school), and even putting maybe 1GB of documents on that 64GB MicroSD caused the system to dig into swap. So it's literally every disk I/O.

Running the original OpenSUSE LEAP 15.2 kernel with the swappiness and cache pressure variables modified but without swap alleviated half the issues, but it still made the system lag pretty badly when it reached the end of RAM and had to "move things around". These issues seem to be completely gone with the bleeding edge kernel.

Please consider this a serious issue. Maybe on a system this fast, a user would be willing to ignore it.
But it affects OpenSUSE as a whole in that anyone who may be trying OpenSUSE but sees this behavior may just decide to burn a different distribution to DVD and install something else. Which may affect their perception of SUSE Enterprise Linux as a result. For me, I seriously had the thought to switch distributions. And I've been using OpenSUSE since at least 42.3. Of course, I didn't, but still.... I can't tell you what to do, I would just beg you to consider this a serious issue.
(In reply to Andrei Borzenkov from comment #29)
> (In reply to Miroslav Beneš from comment #26)
> > It is not surprising to see it strikes 15.2 too. The original bug was
> > reported against 5.3 kernel, which is in 15.2. It got somehow fixed in 5.6
> > at the latest. We may try to find the fixes but they may be too intrusive to
> > backport.
> >
>
> The commit 2c012a4ad1a2cd3fb5a0f9307b9d219f84eda1fa mentioned in comment #14
> was effectively removed in commit b91ac374346ba206cfd568bb0ab830af6b205cfd
> which went into 5.5. I actually observed quite similar symptoms in Ubuntu
> 18.04 as soon as it bumped HWE kernel to 5.3 and had to install 5.5 (5.4 had
> the same issue).
>
> I do not know if b91ac374346ba206cfd568bb0ab830af6b205cfd alone can be back
> ported but may be 2c012a4ad1a2cd3fb5a0f9307b9d219f84eda1fa could be reverted
> as this is what happened in upstream anyway.

I have read through this bugzilla again and noticed that I missed this comment previously. Reverting 2c012a4ad1a2c is not really straightforward, exactly because of b91ac374346, which the openSUSE-15.2 kernel has as well. And looking closer, it can contribute to the problem itself, mostly because of

+       /*
+        * When refaults are being observed, it means a new
+        * workingset is being established. Deactivate to get
+        * rid of any stale active pages quickly.
+        */
+       refaults = lruvec_page_state(target_lruvec,
+                                    WORKINGSET_ACTIVATE);
+       if (refaults != target_lruvec->refaults ||
+           inactive_is_low(target_lruvec, LRU_INACTIVE_FILE))
+               sc->may_deactivate |= DEACTIVATE_FILE;
+       else
+               sc->may_deactivate &= ~DEACTIVATE_FILE;
[...]
+       if (file >> sc->priority && !(sc->may_deactivate & DEACTIVATE_FILE))
+               sc->cache_trim_mode = 1;
+       else
+               sc->cache_trim_mode = 0;

Note how refaults != target_lruvec->refaults can easily move us to SCAN_FRACT after a single activation, even if there is a lot of page cache. 2c012a4ad1a2c was less aggressive in that regard because it only forced an active -> inactive rebalance on an activation.

I might be misreading this, as the logic is quite convoluted, but it should be pretty straightforward to drop this patch and have you retest it.

Ccing Mel as well.
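To make the single-activation point concrete, here is a simplified Python model of the two conditions in the quoted hunk. It assumes `file` is the inactive file LRU size and that priority starts at DEF_PRIORITY (12). It ignores everything else shrink_node() and get_scan_count() consider, so it illustrates the argument rather than reproducing kernel behaviour. The refault baseline of 146 is a hypothetical value.

```python
DEF_PRIORITY = 12  # reclaim starts at this priority in mm/vmscan.c and counts down


def may_deactivate_file(refaults_now: int, refaults_seen: int, inactive_is_low: bool) -> bool:
    """Simplified model of: refaults != target_lruvec->refaults || inactive_is_low(...)."""
    return refaults_now != refaults_seen or inactive_is_low


def cache_trim_mode(inactive_file: int, priority: int, deactivate_file: bool) -> bool:
    """Simplified model of: file >> sc->priority && !(sc->may_deactivate & DEACTIVATE_FILE)."""
    return (inactive_file >> priority) != 0 and not deactivate_file


if __name__ == "__main__":
    inactive_file = 3540169  # nr_inactive_file from the cgroup_disable=memory run
    baseline = 146  # hypothetical WORKINGSET_ACTIVATE value recorded at the last reclaim pass
    for refaults_now in (146, 147):  # no new activation vs. a single new one
        deactivate = may_deactivate_file(refaults_now, baseline, inactive_is_low=False)
        trim = cache_trim_mode(inactive_file, DEF_PRIORITY, deactivate)
        print(f"WORKINGSET_ACTIVATE={refaults_now}: cache_trim_mode={trim}")
```

With about 3.5 million inactive file pages, the size check passes easily at priority 12 (3540169 >> 12 == 864). In this model, the only thing that turns cache trimming off is the activation counter moving by one.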
The kernel with b91ac374346 dropped should appear in https://download.opensuse.org/repositories/home:/mhocko:/bsc1159882/standard once it has been built. Please give it a try with the same setup as before.
Ran the provided kernel. It seemed to do well. http://www.puresimplicity.net/~delahunt/vmstat/mhock/ Although swappiness = 1 and vfs cache pressure = 200, it didn't seem to go beyond 6 GB of RAM usage. No swap seemed to get used. I was creating 1GB files full of zeros and I ran my rsync command to back up my files.
(In reply to Robert Delahunt from comment #47)
> Ran the provided kernel. It seemed to do well.
>
> http://www.puresimplicity.net/~delahunt/vmstat/mhock/
>
> Although swappiness = 1 and vfs cache pressure = 200, it didn't seem to go
> beyond 6 GB of RAM usage. No swap seemed to get used. I was creating 1GB
> files full of zeros and I ran my rsync command to back up my files.

DISREGARD, I realized I had booted into Linux to copy my Music (13GB) to my external hard drive (USB-C enclosure for my NVME 512GB SSD). As I was doing so, I saw free RAM falling and then fired up vmstat again. Swap usage climbed (literally only used Yast and Chrome after a reboot). Check the second set of vmstat logs after the time delay. Sorry about that, I spoke too soon.
(In reply to Robert Delahunt from comment #48)
> DISREGARD, I realized I had booted into Linux to copy my Music (13GB) to my
> external hard drive (USB-C enclosure for my NVME 512GB SSD).

For reliable testing I always copied a large drive: https://forums.opensuse.org/showthread.php/538586-observing-excessive-swapping-when-copying-large-files?p=2932840#post2932840
Do you have vmstats from the swapping situation?
http://www.puresimplicity.net/~delahunt/vmstat/mhock/ Like I said, the vmstats after the time delay. They are in this directory. Thanks for your diligence! :-)
(In reply to Robert Delahunt from comment #51)
> http://www.puresimplicity.net/~delahunt/vmstat/mhock/
>
> Like I said, the vmstats after the time delay. They are in this directory.

I misunderstood your comment. Anyway, the system started swapping out at 1600908544 and continued until 1600908551, growing to 10194, then stayed there for some time before repeating a similar pattern.

                      vmstat.1600908545   vmstat.1600908546 [diff]
nr_active_anon        200295              -1660
nr_active_file        126906              6879
nr_inactive_anon      35519               7892
nr_inactive_file      3524850             -5187
nr_dirty              27282               -18238
pswpout               271                 6282
workingset_activate   146                 19
workingset_refault    146                 19

So in overall numbers there is a huge amount of clean page cache. There are some refaults, and all of them are also activations. But the number is still very small compared to the page cache in general.

pgscan_kswapd         59636               48139
pgsteal_kswapd        34126               41061
pgscan_direct         0                   0

kswapd has reclaimed 41k pages, but let me point out that the overall number of anonymous pages has increased. So it is not just the streaming IO that is going on. We know that ~15% of the reclaimed memory was anonymous (and swapped out); the rest must have been page cache. If reclaim were fully proportional (swappiness), the percentage would be different. So I suspect that there is still predominantly page-cache-only reclaim happening, with occasional runs driven by refault information. We also age the active anonymous list quite a lot, but that shouldn't really lead to swapout on its own. It does, however, point a finger at 2c012a4ad1a2c. I haven't checked the full data set. It would be worth having another test with 2c012a4ad1a2c reverted before we spend more time on the data. I will upload a new kernel to the same location. Please note that the new kernel will have a different release number (bsc1159882_2).
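The [diff] columns above are simply the later snapshot minus the earlier one. A minimal C sketch of that calculation, assuming each saved vmstat.<timestamp> file has already been read into a string; vmstat_get(), vmstat_delta() and swapout_share_pct() are hypothetical helper names, not part of any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Look up one counter in the text of a /proc/vmstat snapshot, which holds
 * one "name value" pair per line. Returns -1 if the counter is missing. */
static long long vmstat_get(const char *text, const char *name)
{
    const char *line = text;
    size_t len;

    assert(text != NULL && name != NULL);
    len = strlen(name);
    while (line != NULL && *line != '\0') {
        long long value;

        /* Require a space after the name so that a counter whose name
         * merely starts with the same letters does not match. */
        if (strncmp(line, name, len) == 0 && line[len] == ' ' &&
            sscanf(line + len, "%lld", &value) == 1)
            return value;
        line = strchr(line, '\n');
        if (line != NULL)
            line++; /* step over the newline to the next line */
    }
    return -1;
}

/* Change of one counter between two snapshots (later minus earlier).
 * Assumes the counter is present in both snapshots. */
static long long vmstat_delta(const char *earlier, const char *later,
                              const char *name)
{
    return vmstat_get(later, name) - vmstat_get(earlier, name);
}

/* Swapped-out pages as a percentage of the pages kswapd reclaimed. */
static double swapout_share_pct(long long d_pswpout, long long d_pgsteal)
{
    return d_pgsteal > 0 ? 100.0 * (double)d_pswpout / (double)d_pgsteal : 0.0;
}
```

For the interval above, 6282 pages swapped out against 41061 pages reclaimed by kswapd gives about 15.3%, the ~15% anonymous share mentioned in the comment (this treats every swapped-out page as reclaimed anonymous memory).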
(In reply to Michal Hocko from comment #52)
> It would be worth having another test
> with 2c012a4ad1a2c reverted before we spend more time on the data. I will
> upload a new kernel to the same location. Please note that the new kernel
> will have a different release number (bsc1159882_2).

Any news?
Sorry, today is crunch time for my graduate college courses. I should be able to get to it tomorrow, 9/30/20. I'll do my best to get to it ASAP. This laptop has had a RAM upgrade to 32GB, by the way, which you'll probably notice in my next vmstat post.
(In reply to Robert Delahunt from comment #54)
> Sorry, today is crunch time for my graduate college courses. I should be
> able to get to it tomorrow, 9/30/20. I'll do my best to get to it ASAP.

No rush.
I do not see a release that is listed as _2 at the end, not from Yast Software or your direct link. Please advise.
Never mind, I re-checked and saw the date stamp was 25 September, so I reinstalled (what should be) the new kernel. Here are your new vmstats: http://www.puresimplicity.net/~delahunt/vmstat/mhocko2/

I have 32GB of RAM now, but it still dug into about 20 MB of swap, even with swappiness=1. Changing swappiness to 60 during this operation didn't seem to influence how much swap it was using, as it still hovered around 20MB or so. It does this both with a file operation (copying large files to an external 512GB SSD in an enclosure) and when zeroing this drive afterwards (dd if=/dev/zero of=/dev/sdc1 bs=4K count=1024 etc). I also noticed that running sync after the file copy operation (i.e. after terminating the copy command) took a long while and dug into swap. Please advise.
I haven't gotten to processing the data yet and am unlikely to do so before next week. I am quite surprised that you still see swapout, though. I assume you have double-checked that the correct kernel is booted, right? (Sorry about the stupid question, but with several kernels involved this can happen.) Have you tried testing with cgroup_disable=memory as well?
Your most recent kernel with cgroup_disable=memory shows no swap usage when copying 16GB of data between drives. There was plenty of time to observe RAM get used up (monitoring free -m every second using a script), but it did not resort to using swap. I double-checked and swappiness is set to 60 right now, so it would have had plenty of authority to swap out. vfs_cache_pressure=100 as well. Default system values.

Here are the vmstat files for your convenience: http://www.puresimplicity.net/~delahunt/vmstat/mhocko3/

Please let me know what else I can do to help eradicate this bug.
(In reply to Robert Delahunt from comment #59)
> Your most recent kernel with cgroup_disable=memory shows no swap usage when
> copying 16GB of data between drives.

Thanks! Do you happen to use the memory cgroup controller intentionally, or is it being used automagically? I suspect the latter. As already mentioned earlier (comment 12), global memory pressure is spread over all existing memory cgroups. Anyway, your earlier tests suggested that cgroup_disable on its own didn't help and that we need to have the 2 patches reverted. I will mull it over some more, but unless Vlastimil or Mel object, I will go ahead and revert both in the 15sp2 and openSUSE-15.2 branches. For a better experience with cgroups enabled, I would recommend using the most recent kernel (e.g. one from our stable repository).
I'm just an average Joe, I don't even know what cgroups are for.
(In reply to Robert Delahunt from comment #61)
> I'm just an average Joe, I don't even know what cgroups are for.

I would suspect some service has enabled the memory controller. Or maybe systemd on your system does that, but I believe that the version we have in OS15.2 doesn't do that yet. Michal Koutny would know better and may be able to give you better clues on how to find out. For the general cgroups setup, please provide

mount | grep cgroup

Next steps will depend on the output.
(In reply to Michal Hocko from comment #62)
> For the general cgroups setup, please provide
> mount | grep cgroup

Unless explicitly disabled (with the kernel cmdline), the memory hierarchy (root only) would always be mounted. To get information about fine-grained grouping, I suggest

> find /sys/fs/cgroup/memory -type d

Additionally, if there's a non-trivial structure, you can track down the originating service by looking at Memory* directives (MemoryDenyWriteExecute= is irrelevant):

> systemctl cat "*.service" | grep -E "# /|Memory"
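The find check above answers one question: does the memory hierarchy contain anything besides its root? A minimal C sketch of the same check, assuming a cgroup v1 memory hierarchy mounted at /sys/fs/cgroup/memory as on this system; count_child_cgroups() is a hypothetical name. Unlike find, it only looks one level deep, which is enough to tell whether any child cgroup exists at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Count the immediate subdirectories of path. In a cgroup v1 memory
 * hierarchy every subdirectory is a child cgroup; the memory.* entries
 * are plain files and are not counted. Returns -1 if the directory
 * cannot be opened, e.g. when the controller is not mounted at all.
 */
static int count_child_cgroups(const char *path)
{
    DIR *dir;
    struct dirent *ent;
    int count = 0;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        char child[PATH_MAX];
        struct stat st;

        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        snprintf(child, sizeof child, "%s/%s", path, ent->d_name);
        /* lstat() so that symlinks are not followed. */
        if (lstat(child, &st) == 0 && S_ISDIR(st.st_mode))
            count++;
    }
    closedir(dir);
    return count;
}
```

If find prints only /sys/fs/cgroup/memory itself, count_child_cgroups("/sys/fs/cgroup/memory") would return 0; if the hierarchy is not mounted, opendir() fails and it returns -1.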
mount | grep cgroup:

tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)

systemctl cat "*.service" | grep -E "# /|Memory":

# /usr/lib/systemd/system/systemd-update-utmp.service
# /usr/lib/systemd/system/kbdsettings.service
# /usr/lib/systemd/system/auditd.service
## /etc/systemd/system/auditd.service and add network-online.target
# /usr/lib/systemd/system/apparmor.service
# /usr/lib/systemd/system/getty@.service
# /usr/lib/systemd/system/getty@tty1.service.d/noclear.conf
# /usr/lib/systemd/system/systemd-fsck@.service
# /usr/lib/systemd/system/lvm2-pvscan@.service
# /usr/lib/systemd/system/user@.service
# /usr/lib/systemd/system/kmod-static-nodes.service
# /usr/lib/systemd/system/detect-part-label-duplicates.service
# /usr/lib/systemd/system/systemd-user-sessions.service
# /usr/lib/systemd/system/systemd-ask-password-plymouth.service
# /usr/lib/systemd/system/systemd-journald.service
MemoryDenyWriteExecute=yes
# /usr/lib/systemd/system/systemd-fsck@.service
# /usr/lib/systemd/system/systemd-udevd.service
MemoryDenyWriteExecute=yes
# /usr/lib/systemd/system/rtkit-daemon.service
# /usr/lib/systemd/system/accounts-daemon.service
# /usr/lib/systemd/system/upower.service
MemoryDenyWriteExecute=true
# /usr/lib/systemd/system/firewalld.service
# /usr/lib/systemd/system/smartd.service
# /usr/lib/systemd/system/systemd-backlight@.service
# /usr/lib/systemd/system/lvm2-monitor.service
# /usr/lib/systemd/system/bluetooth.service
# /usr/lib/systemd/system/sshd.service
# /usr/lib/systemd/system/dbus.service
# /usr/lib/systemd/system/systemd-remount-fs.service
# /usr/lib/systemd/system/cron.service
# /usr/lib/systemd/system/avahi-daemon.service
# /usr/lib/systemd/system/systemd-journal-flush.service
# /usr/lib/systemd/system/user@.service
# /usr/lib/systemd/system/colord.service
# /usr/lib/systemd/system/systemd-tmpfiles-setup.service
# /usr/lib/systemd/system/display-manager.service
# /usr/lib/systemd/system/cups.service
# /usr/lib/systemd/system/udisks2.service
# /usr/lib/systemd/system/ModemManager.service
# /usr/lib/systemd/system/wpa_supplicant.service
# /usr/lib/systemd/system/postfix.service
# /run/systemd/generator/systemd-cryptsetup@cr\x2dauto\x2d1.service
# /usr/lib/systemd/system/mcelog.service
# /usr/lib/systemd/system/systemd-logind.service
MemoryDenyWriteExecute=yes
# /usr/lib/systemd/system/rsyslog.service
# /usr/lib/systemd/system/NetworkManager.service
# /usr/lib/systemd/system/NetworkManager.service.d/NetworkManager-ovs.conf
# /usr/lib/systemd/system/nscd.service
# /usr/lib/systemd/system/../../dracut/modules.d/98dracut-systemd/dracut-shutdown.service
# /usr/lib/systemd/system/irqbalance.service
# /usr/lib/systemd/system/haveged.service
# /usr/lib/systemd/system/systemd-backlight@.service
# /usr/lib/systemd/system/fwupd.service
# /usr/lib/systemd/system/polkit.service
# /usr/lib/systemd/system/iscsi.service
# /usr/lib/systemd/system/systemd-sysctl.service
# /usr/lib/systemd/system/systemd-sysctl.service.d/50-kernel-uname_r.conf
# /usr/lib/systemd/system/systemd-udev-trigger.service
# /usr/lib/systemd/system/systemd-random-seed.service
# /usr/lib/systemd/system/systemd-fsck-root.service
# /usr/lib/systemd/system/klog.service
# /lib/systemd/system/klog.service
# /usr/lib/systemd/system/systemd-modules-load.service
# /usr/lib/systemd/system/systemd-tmpfiles-setup-dev.service

What are we investigating? By the way this is the stock kernel with cgroup_disable=memory boot parameter and no swap.
(In reply to Robert Delahunt from comment #64)
[...]
> By the way this is the stock kernel with cgroup_disable=memory boot
> parameter and no swap.

Sorry, I should have been more explicit. We are interested in who is using the memory cgroup controller. But the cgroup_disable kernel command line parameter disables it. From the systemctl output it seems that no service is really trying to use it, so I suspect it will be the cgroup v1 hierarchy created automatically, mirroring the systemd organization structure (slices, scopes, etc.). Please boot again with the kernel command line parameter dropped.
With cgroup_disable=memory removed (not active) from the boot parameters... (By the way, I only started using that parameter while testing for this bug, so the initial comments I provided when I first began helping didn't have it.)

This is the stock kernel (i.e. opensuse-update):

Linux desktop-01721d1.lan 5.3.18-lp152.44-default #1 SMP Wed Sep 30 18:51:43 UTC 2020 (914f31e) x86_64 x86_64 x86_64 GNU/Linux

mount | grep cgroup > /tmp/cgroup.txt

tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)

systemctl cat "*.service" | grep -E "# /|Memory"

# /run/systemd/generator/systemd-cryptsetup@cr\x2dauto\x2d1.service
# /usr/lib/systemd/system/kbdsettings.service
# /usr/lib/systemd/system/sshd.service
# /usr/lib/systemd/system/user@.service
# /usr/lib/systemd/system/cups.service
# /usr/lib/systemd/system/../../dracut/modules.d/98dracut-systemd/dracut-shutdown.service
# /usr/lib/systemd/system/systemd-journald.service
MemoryDenyWriteExecute=yes
# /usr/lib/systemd/system/NetworkManager.service
# /usr/lib/systemd/system/NetworkManager.service.d/NetworkManager-ovs.conf
# /usr/lib/systemd/system/auditd.service
## /etc/systemd/system/auditd.service and add network-online.target
# /usr/lib/systemd/system/smartd.service
# /usr/lib/systemd/system/lvm2-pvscan@.service
# /usr/lib/systemd/system/ModemManager.service
# /usr/lib/systemd/system/upower.service
MemoryDenyWriteExecute=true
# /usr/lib/systemd/system/systemd-random-seed.service
# /usr/lib/systemd/system/user@.service
# /usr/lib/systemd/system/postfix.service
# /usr/lib/systemd/system/mcelog.service
# /usr/lib/systemd/system/systemd-fsck@.service
# /usr/lib/systemd/system/apparmor.service
# /usr/lib/systemd/system/systemd-fsck-root.service
# /usr/lib/systemd/system/systemd-modules-load.service
# /usr/lib/systemd/system/cron.service
# /usr/lib/systemd/system/rtkit-daemon.service
# /usr/lib/systemd/system/systemd-fsck@.service
# /usr/lib/systemd/system/lvm2-monitor.service
# /usr/lib/systemd/system/irqbalance.service
# /usr/lib/systemd/system/systemd-ask-password-plymouth.service
# /usr/lib/systemd/system/systemd-tmpfiles-setup.service
# /usr/lib/systemd/system/systemd-tmpfiles-setup-dev.service
# /usr/lib/systemd/system/systemd-update-utmp.service
# /usr/lib/systemd/system/avahi-daemon.service
# /usr/lib/systemd/system/systemd-journal-flush.service
# /usr/lib/systemd/system/dbus.service
# /usr/lib/systemd/system/kmod-static-nodes.service
# /usr/lib/systemd/system/firewalld.service
# /usr/lib/systemd/system/systemd-udev-trigger.service
# /usr/lib/systemd/system/systemd-logind.service
MemoryDenyWriteExecute=yes
# /usr/lib/systemd/system/accounts-daemon.service
# /usr/lib/systemd/system/fwupd.service
# /usr/lib/systemd/system/systemd-backlight@.service
# /usr/lib/systemd/system/rsyslog.service
# /usr/lib/systemd/system/haveged.service
# /usr/lib/systemd/system/systemd-backlight@.service
# /usr/lib/systemd/system/nscd.service
# /usr/lib/systemd/system/polkit.service
# /usr/lib/systemd/system/systemd-udevd.service
MemoryDenyWriteExecute=yes
# /usr/lib/systemd/system/detect-part-label-duplicates.service
# /usr/lib/systemd/system/getty@.service
# /usr/lib/systemd/system/getty@tty1.service.d/noclear.conf
# /usr/lib/systemd/system/display-manager.service
# /usr/lib/systemd/system/udisks2.service
# /usr/lib/systemd/system/bluetooth.service
# /usr/lib/systemd/system/iscsi.service
# /usr/lib/systemd/system/systemd-user-sessions.service
# /usr/lib/systemd/system/klog.service
# /lib/systemd/system/klog.service
# /usr/lib/systemd/system/wpa_supplicant.service
# /usr/lib/systemd/system/systemd-sysctl.service
# /usr/lib/systemd/system/systemd-sysctl.service.d/50-kernel-uname_r.conf
# /usr/lib/systemd/system/colord.service
# /usr/lib/systemd/system/systemd-remount-fs.service

find /sys/fs/cgroup/memory -type d

/sys/fs/cgroup/memory
(In reply to Robert Delahunt from comment #66)
[...]
> cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
[...]
> find /sys/fs/cgroup/memory -type d
>
> /sys/fs/cgroup/memory

This is more than surprising, because there are no cgroups created and as such they cannot really influence the reclaim decisions. Could you try to reproduce your original problem and see whether something or somebody creates any directories (cgroups) in that hierarchy?
I set up one window doing dd if=/dev/zero of=/dev/sdc bs=4K
Another monitoring the output of df -h | grep cgroup every second
Another monitoring directories under /sys/fs/cgroup/memory
Another monitoring free -m

As expected, swap use spiked at roughly 200 MB (keep in mind I have 32 GB of RAM). At no time did cgroup get used or any directories populate under /sys/fs/cgroup/memory.

Understand that I have a long history with OpenSUSE memory issues. In previous versions, I would put entries in /etc/fstab that forcefully remounted all tmpfs mountpoints with a size=512M option to reduce their usage. In the past, it seemed that helped my system use swap less, though I never collected any scientific data that would empirically prove it helped. It just seemed that stock OpenSUSE (previous versions) dug into swap more readily, so forcing tmpfs entries to use less space seemed to help. I have long been suspicious of tmpfs reclaim anyway. But regardless, it seems nothing is using cgroups.

I have also speculated that OpenSUSE needs a laptop-specific kernel which would alter this behavior. I mean, does anyone honestly need cgroups on a laptop?

One minor note: the kernel you made, mhocko, results in me not having a sound card.

But yeah, back to the original topic, cgroups isn't using anything, but the machine dug into swap predictably, just like before.
My experience with OpenSUSE is from v 42.3 through LEAP 15.2 (present).
(In reply to Robert Delahunt from comment #68)
> I set up one window doing dd if=/dev/zero of=/dev/sdc bs=4K
> Another monitoring the output of df -h | grep cgroup every second

cgroup uses a virtual filesystem, so df will not tell you much.

> Another monitoring directories under /sys/fs/cgroup/memory

I simply do not see how a memcg controller that is enabled but not used can make any difference.

[...]
> But yeah, back to the original topic, cgroups isn't using anything, but the
> machine dug into swap predictably, just like before.

All that with the latest kernel I have provided, right? Is this really repeatable, both with the cgroup controller disabled and enabled?
My latest test monitoring /sys/fs/cgroup/memory was with the OpenSUSE LEAP 15.2 stock (update) kernel (see uname -a in previous post). I can run it all again with your kernel, sure. But please reply real quick and tell me exactly what data you want, so that I can make sure I test things exactly as you want, with all the data you want. I don't have your kernel installed (I did a test where I reinstalled LEAP 15.2 without online updates and it seemed to fix most of the other weirdness I experienced in GTK/XFCE apps). So just please tell me everything you want to know. I'll install your latest kernel, reboot without the cgroups command line option, and then run all the tests you need, once I get back from the gym.
(In reply to Robert Delahunt from comment #71)
> My latest test monitoring /sys/fs/cgroup/memory was with the OpenSUSE LEAP
> 15.2 stock (update) kernel (see uname -a in previous post).

OK, that explains it, I guess. Please stick with the test kernel so that we can actually draw conclusions here. So far I believe that the two identified patches have made swapping much more probable under streaming IO. The upstream kernel behaves differently because of later changes. The state we have in the 15.2 kernels is half-baked, and therefore I would rather restore the previous behavior. For that I would like to see confirmed that

a) the test kernel (the latest one) doesn't swap under your streaming IO load without cgroups (cgroup_disable=memory), and that this is the case consistently over several runs
b) the same test run _without_ the cgroup_disable=memory parameter, aka with cgroups enabled by default (check the cgroup hierarchy with find /sys/fs/cgroup/memory -type d)

In both cases, collect /proc/vmstat as before.
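The "collect /proc/vmstat as before" step is the once-per-second copy loop used earlier in this bug. A minimal C sketch of a single iteration, assuming the same vmstat.<unix timestamp> naming; snapshot_vmstat() is a hypothetical helper, and the caller is expected to supply the timestamp and the one-second sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/*
 * Copy one snapshot of src (normally "/proc/vmstat") to dir/vmstat.<ts>.
 * Returns 0 on success and -1 on any open, read or write error.
 */
static int snapshot_vmstat(const char *src, const char *dir, long long ts)
{
    char path[PATH_MAX];
    char buf[4096];
    size_t n;
    int rc = 0;
    FILE *in;
    FILE *out;

    assert(src != NULL && dir != NULL);
    snprintf(path, sizeof path, "%s/vmstat.%lld", dir, ts);

    in = fopen(src, "r");
    if (in == NULL)
        return -1;
    out = fopen(path, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    /* /proc files report a size of 0, so read until EOF, not by size. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}
```

A collector would call snapshot_vmstat("/proc/vmstat", ".", (long long)time(NULL)) in a loop with sleep(1) between calls, and stop it once the test run is over; the resulting files can then be diffed pairwise exactly like the shell-collected ones.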
uname -a
Linux desktop-01721d1.lan 5.3.18-lp152.2.gb85b477-default #1 SMP Fri Sep 25 14:55:58 UTC 2020 (b85b477) x86_64 x86_64 x86_64 GNU/Linux

dd if=/dev/zero of=/dev/sdc bs=4K status=progress

The find command found no directories in the hierarchy other than /sys/fs/cgroup/memory itself. Swap use rose to 130 MB and then fluctuated between 80 and 100 MB. I am in GNOME Classic and literally the only things running are Dropbox and Chrome (and I'm only in one tab replying to this bug report). Your kernel is loaded. cgroup_disable=memory is NOT on the boot line (I removed it using advanced options prior to booting the kernel). How else may I help?
(In reply to Robert Delahunt from comment #73) [...] > How else may I help? Please read comment 72 again.
In about five minutes:

This is with vmstats with your kernel with cgroups enabled:
http://www.puresimplicity.net/~delahunt/vmstat/mhocko4/

This is with vmstats with your kernel with cgroups disabled:
http://www.puresimplicity.net/~delahunt/vmstat/mhocko5/
Note that when I ran your kernel with cgroup memory disabled (mhocko5) and monitored the /sys/fs/cgroup/memory directory for more directories using find, it said /sys/fs/cgroup/memory didn't exist. This is different from the opensuse-update kernel (stock update) which still had the directory but it wasn't being used.
Both data sets are up.
I probably have the same issue. I had already opened a new bug (see bug#1177541) before I was pointed to this one. So here is my issue again; I hope it will be helpful.

"My system freezes for a few seconds when there is high disk usage, like copying large files, or when opening demanding Chrome web pages (due to swapping?). It happens on Ext4, Btrfs and XFS, so the file system doesn't matter. It happens on both GNOME and Xfce, so that also doesn't matter. Windows 10, Fedora and Ubuntu work almost fine on the same device; it's a problem with Leap 15.2 only. So I upgraded my system from Leap 15.2 to TW, and everything works almost fine now. I booted my device into TW but with the Leap kernel (5.3.18-lp152.44-default), and the freezes happened again. So it seems to me that it's a kernel issue.

My search led me to multiple cases of the same issue (different distros). See: https://askubuntu.com/questions/1212736/system-freezes-on-disk-i-o
And: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1861359

It seems like newer kernels have this issue fixed, maybe version 5.5.6 (as stated in the link), and it seems that Ubuntu successfully backported a fix to kernel 5.4. This issue is very annoying; I hope openSUSE can backport a fix from upstream.

Reproducible: Always

Steps to Reproduce:
1. Start a disk-demanding process, like copying large files

Actual Results:
Freezes and a laggy mouse cursor

Expected Results:
Smooth system

Maybe it could be more obvious on devices with low RAM, but Ubuntu 20.04 and Windows 10 work perfectly on the same device."
Can confirm and agree with Abdulrhman Ied. Depending on the day, there would be a 3-5 second lag when my system began digging into swap. I figured it was just the swap factor and/or my system. However, I noticed on the MHocko kernels (see comment history) that it happened far less often.
(In reply to Robert Delahunt from comment #79)
> Can confirm and agree with Abdulrhman Ied. Depending on the day, there
> would be a 3-5 second lag when my system began digging into swap. I figured
> it was just the swap factor and/or my system.
>
> However, I noticed on the MHocko kernels (see comment history) that it
> happened far less often.

I tried the mentioned kernel (5.3.18-lp152.2.gb85b477-preempt) without any further configuration (I just installed the kernel and reboot to it), and the system still freezes.
(In reply to Abdulrhman Ied from comment #80)
> (In reply to Robert Delahunt from comment #79)
> > Can confirm and agree with Abdulrhman Ied. Depending on the day, there
> > would be a 3-5 second lag when my system began digging into swap. I figured
> > it was just the swap factor and/or my system.
> >
> > However, I noticed on the MHocko kernels (see comment history) that it
> > happened far less often.
>
> I tried the mentioned kernel (5.3.18-lp152.2.gb85b477-preempt) without any
> further configuration (I just installed the kernel and reboot to it), and
> the system still freezes.

Please provide system specifications.
(In reply to Robert Delahunt from comment #81)
> Please provide system specifications.

OS: openSUSE Leap 15.2 x86_64
Host: HP 15 Notebook PC 099011000000000000
Kernel: 5.3.18-lp152.44-default
CPU: Intel Celeron N2840 (2) @ 2.582GHz
GPU: Intel Atom Processor Z36xxx/Z37xxx Se
Memory: 1426MiB / 1870MiB

I have installed the latest kernel from http://download.opensuse.org/repositories/Kernel:/stable/standard and things are much improved now.
*** Bug 1177541 has been marked as a duplicate of this bug. ***
Sorry for the late reply. I was busy with other issues.

(In reply to Robert Delahunt from comment #75)
> In about five minutes:
>
> This is with vmstats with your kernel with cgroups enabled:
>
> http://www.puresimplicity.net/~delahunt/vmstat/mhocko4/

Diff between the first and last snapshot:

                    1602004484   1602004607 [diff]
pswpin              0            0
pswpout             0            48703
pgscan_kswapd       0            7177411
pgscan_direct       0            0
pgsteal_kswapd      0            7115294
pgsteal_direct      0            0

> This is with vmstats with your kernel with cgroups disabled:
>
> http://www.puresimplicity.net/~delahunt/vmstat/mhocko5/

                    1602004764   1602004958 [diff]
pswpin              0            0
pswpout             0            75781
pgscan_kswapd       0            13728122
pgscan_direct       0            0
pgsteal_kswapd      0            13664819
pgsteal_direct      0            0

Both do swap out. The latter covers a longer time period (194s vs. 123s) and scans about twice as many pages, which results in about twice as many pages reclaimed and 55% more swapout. From that we can conclude (at a high level) that the swapout reflects the overall reclaim and that having cgroups enabled or disabled doesn't play any major role here. This is a good confirmation, because it would be really curious to see a difference in behavior just from having cgroups enabled without being used.

So let's focus on the cgroups enabled case for now. Let's have a look at

                      1602004840   1602004841 [diff]
pswpin                0            0
pswpout               3103         6801
pgsteal_kswapd        206006       148891
pgscan_kswapd         251736       148891
nr_active_anon        339953       -1840
nr_active_file        147150       7
nr_inactive_anon      38139        9382
nr_inactive_file      7263922      -2703
workingset_activate   170          0
workingset_refault    170          57
workingset_restore    0            0

From this we can conclude that
- some active anonymous pages have been rotated to the inactive list, which has grown much larger, even when we account for the swapout. So there must be some process allocating a nontrivial amount of anonymous memory, and there is more going on than just the IO test case
- there is a ton of inactive page cache to reclaim from
- refaults are quite marginal

So this is in line with previous observations.
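The comparison above reduces to a few ratios. A minimal C sketch that reproduces them from the [diff] values, assuming the deltas have already been extracted; struct run_delta and the helper names are hypothetical. The last helper is a rough lower bound under stated assumptions (every swapped-out page leaves the anon LRU lists within the interval, and pages freed by processes are ignored), included only to put a number on the anonymous allocation mentioned in the first bullet.

```c
#include <assert.h>
#include <stddef.h>

/* Counter deltas over one test run (first vs. last snapshot). */
struct run_delta {
    long long seconds;        /* length of the run */
    long long pgscan_kswapd;  /* pages scanned by kswapd */
    long long pgsteal_kswapd; /* pages reclaimed by kswapd */
    long long pswpout;        /* pages written out to swap */
};

/* How many times larger value is than base. */
static double ratio_of(long long value, long long base)
{
    assert(base != 0);
    return (double)value / (double)base;
}

/* Swapped-out pages per 1000 pages reclaimed by kswapd. */
static double swapout_per_mille(const struct run_delta *r)
{
    assert(r != NULL && r->pgsteal_kswapd > 0);
    return 1000.0 * (double)r->pswpout / (double)r->pgsteal_kswapd;
}

/*
 * Rough lower bound on newly faulted anonymous pages in one interval:
 * net growth of both anon LRU lists plus the pages that left them via
 * swapout. Pages freed by exiting processes are ignored, which is why
 * this can only underestimate.
 */
static long long anon_faulted_lower_bound(long long d_active_anon,
                                          long long d_inactive_anon,
                                          long long d_pswpout)
{
    return d_active_anon + d_inactive_anon + d_pswpout;
}
```

With the two runs' numbers, pgscan_kswapd grows by a factor of about 1.91 and pswpout by about 1.56 (the "55% more"), while both runs swap out roughly 5.5 to 6.8 pages per 1000 reclaimed, consistent with swapout tracking overall reclaim. For the one-second interval, -1840 + 9382 + 6801 gives at least 14343 new anonymous pages, about 56 MiB assuming 4 KiB pages.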
I am inclined to drop the two patches mentioned earlier (comment 60), as they are known to contribute considerably, unless Vlastimil or Mel speak up. At this moment I am not sure how much more time I can spend on this so I would recommend to use a more recent kernel.

Btw, regarding the stalls: the data I have seen so far doesn't indicate any reclaim-induced source of a potential stall. There is neither swap-in nor direct reclaim, so the existing reclaim decisions do not look like the cause. Maybe in your regular workload there is considerable swap-in (pswpin) going on.
> At this moment I am not sure how much more time I can spend on this so I
> would recommend to use a more recent kernel.

Yeah, I am already using a more recent kernel and it's much better. Thanks for all your time. I hope that this is a temporary situation and that fixes will be backported soon, so we can go back to a higher standard of stability.
Hello,

After I upgraded to openSUSE Leap 15.3, the issue got resolved.

lsb_release -d
Description: openSUSE Leap 15.3

uname -r
5.3.18-59.5-default

Thank you very much for all of your efforts.