Bug 1206937 - Enable CONFIG_LRU_GEN_ENABLED by default on x86_64
Summary: Enable CONFIG_LRU_GEN_ENABLED by default on x86_64
Status: NEW
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
Version: Current
Hardware: x86-64
OS: openSUSE Tumbleweed
Priority: P5 - None
Severity: Enhancement (1 vote)
Target Milestone: ---
Assignee: Michal Hocko
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-06 19:00 UTC by John Doe
Modified: 2023-09-22 10:46 UTC
CC List: 8 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
dmueller: needinfo? (mgorman)


Attachments
A systemd service file to enable Multi-Gen LRU. (433 bytes, text/x-dbus-service)
2023-03-30 00:38 UTC, Archer Allstars

Description John Doe 2023-01-06 19:00:06 UTC
Enabling MGLRU could give some performance benefits. Right now it is compiled into the kernel but not enabled by default.
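
In Kconfig terms the request is roughly the following (a sketch inferred from the summary; the exact openSUSE config fragment is assumed, not quoted):

# status quo: MGLRU is built (CONFIG_LRU_GEN=y) but off at boot
# CONFIG_LRU_GEN_ENABLED is not set

# requested default
CONFIG_LRU_GEN_ENABLED=y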
Comment 2 Michal Hocko 2023-01-09 15:54:08 UTC
(In reply to John Doe from comment #0)
> Enabling MGLRU could give some performance benefits. Right now it is
> compiled into the kernel but not enabled by default

Do you have any specific use case in mind, or is this more of an "I want to play with this new reclaim implementation"?
Comment 5 John Doe 2023-01-09 17:54:35 UTC
(In reply to Michal Hocko from comment #2)
> (In reply to John Doe from comment #0)
> > Enabling MGLRU could give some performance benefits. Right now it is
> > compiled into the kernel but not enabled by default
> 
> Do you have any specific use case in mind, or is this more of an "I want to
> play with this new reclaim implementation"?

No specific use case, but others might find this useful due to the performance benefits. Arch enabled it already, and I presume Fedora will have it enabled when 6.1 reaches Fedora 37.
Comment 6 Michal Hocko 2023-01-09 19:11:28 UTC
(In reply to John Doe from comment #5)
> (In reply to Michal Hocko from comment #2)
> > (In reply to John Doe from comment #0)
> > > Enabling MGLRU could give some performance benefits. Right now it is
> > > compiled into the kernel but not enabled by default
> > 
> > Do you have any specific use case in mind, or is this more of an "I want to
> > play with this new reclaim implementation"?
> 
> No specific use case but others might find this useful due to the
> performance benefits. Arch enabled it already and I presume Fedora will have
> it enabled when 6.1 reaches Fedora 37

MGLRU can be enabled at runtime by
echo y >/sys/kernel/mm/lru_gen/enabled
More on that is in Documentation/admin-guide/mm/multigen_lru.rst in the kernel source tree. I would recommend enabling it and reporting back any noticeable improvements; I would much rather enable MGLRU by default based on actual numbers.
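
For reference, the knob reports its state as a bitmask (per multigen_lru.rst, 0x0000 means disabled and 0x0007 means all components are enabled), so you can verify that the change took effect:
cat /sys/kernel/mm/lru_gen/enabled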
Comment 7 Mel Gorman 2023-01-10 09:43:21 UTC
(In reply to John Doe from comment #5)
> (In reply to Michal Hocko from comment #2)
> > (In reply to John Doe from comment #0)
> > > Enabling MGLRU could give some performance benefits. Right now it is
> > > compiled into the kernel but not enabled by default
> > 
> > Do you have any specific use case in mind, or is this more of an "I want to
> > play with this new reclaim implementation"?
> 
> No specific use case but others might find this useful due to the
> performance benefits. Arch enabled it already and I presume Fedora will have
> it enabled when 6.1 reaches Fedora 37

As far as I'm aware, the main benefits have been demonstrated on relatively small machines with desktop-class workloads -- primarily workloads interesting to a chromebook, where interactive tasks perform better when the total working set of foreground and background tasks exceeds total memory. That's a less common scenario than what is seen with server-class workloads. For example, one of our biggest customers by revenue has applications that carefully size their workload to avoid swapping and reclaim decisions as much as possible, and MGLRU would have limited to no benefit there. Hence the caution about enabling it by default: it could take years to shake out the issues. However, the option to enable it and experiment with it is available, so if a few requests were made along the lines of "Workload X benefits from MGLRU and we should not have to tune it every time", then there would be greater confidence in enabling it by default.
Comment 8 Archer Allstars 2023-03-30 00:38:15 UTC
Created attachment 865974 [details]
A systemd service file to enable Multi-Gen LRU.

I also vote for this feature, as the OOM issue has been with Linux for so long. I moved from Windows 11 to full Linux, openSUSE Tumbleweed, on my laptop (with limited RAM). OOM management on Windows is far better. On Linux, without zram and (hopefully, one day) better OOM management, it's almost guaranteed that the system will freeze when my RAM runs out.

In the meantime, you can easily enable this nice feature at boot time with the attached service file. Put the file in /etc/systemd/system, then open the YaST Services Manager and set the mglru service to run on boot.
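
For readers without the attachment, a minimal oneshot unit along these lines should do the same job (a sketch, not the attached file verbatim; the description is illustrative):

[Unit]
Description=Enable Multi-Gen LRU

[Service]
Type=oneshot
# a shell is needed because ExecStart= itself cannot perform redirection
ExecStart=/bin/sh -c 'echo y > /sys/kernel/mm/lru_gen/enabled'

[Install]
WantedBy=multi-user.target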
Comment 9 Michal Hocko 2023-03-30 06:50:42 UTC
(In reply to Archer Allstars from comment #8)
> Created attachment 865974 [details]
> A systemd service file to enable Multi-Gen LRU.
> 
> I also vote for this feature, as the OOM issue has been with Linux for so
> long. I moved from Windows 11 to full Linux, openSUSE Tumbleweed, on my
> laptop (with limited RAM). OOM management on Windows is far better. On
> Linux, without zram and (hopefully, one day) better OOM management, it's
> almost guaranteed that the system will freeze when my RAM runs out.

I would be really interested in a comparison of the same workload with both the default reclaim and MGLRU where you see system freezes on OOM. Ideally, collect /proc/vmstat data [1]. My assumption is that MGLRU would trigger the OOM killer sooner, while the traditional reclaim would keep refaulting and thrashing over page cache.
 
> In the meantime, you can easily enable this nice feature at boot time with
> the attached service file. Put the file in /etc/systemd/system, then open
> the YaST Services Manager and set the mglru service to run on boot.

Thanks, that is certainly an option. I would recommend talking to the systemd people about integrating something like this into systemd (with a high-level configuration knob to easily enable or disable the feature).
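
Until something like that exists, an even lighter-weight alternative to a full service would be a tmpfiles.d drop-in (assuming systemd's tmpfiles.d mechanism; the file name is illustrative), e.g. /etc/tmpfiles.d/mglru.conf:

# type  path                            mode  user  group  age  argument
w       /sys/kernel/mm/lru_gen/enabled  -     -     -      -    y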
Comment 10 Michal Hocko 2023-03-30 06:57:29 UTC
(In reply to Michal Hocko from comment #9)
[...]
> I would be really interested in a comparison of the same workload with both
> the default reclaim and MGLRU where you see system freezes on OOM. Ideally,
> collect /proc/vmstat data [1].

I forgot about [1].
You can use https://build.opensuse.org/package/show/home:mhocko/mmdebug-tools
and collect_logs specifically to do that safely, even under heavy memory pressure. Run the tool as root or configure a sufficient mlock rlimit.
collect_logs -i /proc/vmstat -o $OUTPUT_FILE -T $TIME_PERIOD
TIME_PERIOD depends on the runtime of your experiment: the longer it takes, the more data will be collected (and the bigger the mlock rlimit would need to be).
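
If you only need rough periodic snapshots and the system is not yet under extreme memory pressure, a plain shell loop is a crude stand-in (collect_logs exists precisely because a loop like this can itself stall when memory gets tight):

# append a timestamped /proc/vmstat snapshot every 10 seconds
while sleep 10; do date; cat /proc/vmstat; done >> /tmp/vmstat.log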
Comment 11 Dirk Mueller 2023-05-09 06:49:13 UTC
Is there any update on this? As far as I can see, Arch and Fedora have enabled it by default. Did we finish benchmarking it? It appears stable enough for other distros.
Comment 12 Dirk Mueller 2023-05-15 08:03:24 UTC
Mel, any tests that would indicate either way? The upstream-posted numbers looked compelling to me.

I've tested it locally in my personal use, and I couldn't spot a difference. The only thing I've noticed is that the total CPU time of kswapd0 is a bit lower than before (~10% lower) over periods of use, but that could be fluctuation.
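
(For anyone wanting to compare: kswapd's accumulated CPU time can be read with standard procps, nothing MGLRU-specific.)

ps -o pid,cputime,comm -C kswapd0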
Comment 13 Mel Gorman 2023-05-25 14:18:20 UTC
(In reply to Dirk Mueller from comment #12)
> Mel, any tests that would indicate either way? The upstream-posted numbers
> looked compelling to me.
> 
> I've tested it locally in my personal use, and I couldn't spot a difference.
> The only thing I've noticed is that the total CPU time of kswapd0 is a bit
> lower than before (~10% lower) over periods of use, but that could be
> fluctuation.

There is no list of tests that I can provide that would give a definitive answer one way or the other. Even if I had such a list for a single machine, it would not follow that the results would be the same for all machines. No matter what, this will not have universal benefit and will instead be a mix of wins and losses. An example of why this is true is that part of lru_gen involves walking address spaces and page tables. On really small machines, that may work great at managing residency for small memory sizes, but is the same true for multi-terabyte machines and all workloads? We don't know and can't know, because for every example where MGLRU helps, there will be a counter-example.

If the option can be enabled at runtime, then there is freedom to evaluate it on a case-by-case basis. There is no available data on what the Arch experience has been or what Fedora's experience may be in the future. The original bug description states "could give some performance benefits", which is ambiguous as it implies it could perform better or spontaneously combust. The best supporting data on this bug was comment 8 stating that OOM handling may be smoother, but it also did not state explicitly whether enabling MGLRU solved the problem, was simply expected to help, or whether changing min_ttl_ms was the crucial change. OOM handling is also a corner case when the system is under stress and something is about to be killed, not a statement on general performance in normal scenarios (e.g. launching a new large application when memory is low, or application behavior when there is a large background cp in progress). That said, upstream commits also included testimonials stating that OOM handling was better.
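
For context, min_ttl_ms is the working-set protection knob described in multigen_lru.rst: writing N prevents the working set of the last N milliseconds from being evicted, and the OOM killer is triggered instead if that working set cannot be kept in memory, e.g.:
echo 1000 >/sys/kernel/mm/lru_gen/min_ttl_ms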

Upstream, the picture is different. The bulk of the data is based on Android or chromebooks, and it is not necessarily applicable to a standard Linux distribution. The server workloads were compelling, with the caveat that the deployment environment was one that is very careful about placing workloads, with precise estimates of the resource requirements before starting. There were other positive results of interest. For example, the fio improvements for random access distributions are notable: "random" implies that clever page selection should make no difference, so the result indicates that, for fio, the CPU cycles spent on reclaim decisions were lower with MGLRU.

Unfortunately, none of it changes the expectation that MGLRU will "win some, lose some".
Comment 14 Michal Hocko 2023-05-26 08:12:03 UTC
I fully agree with Mel (comment 13). I haven't heard any big stories about MGLRU outside of specific desktop workloads (even Android was waiting for MGLRU to land upstream before they were allowed to start deploying it at scale, so more data is to come, I guess). As Mel said, the experience with server workloads is still lagging behind.

I can see only one reason to enable MGLRU by default at this stage, and that is to increase testing coverage. This would certainly help to get more data. If that is really the goal (I would find it brave and supportive of future MGLRU development), then a userspace service/config package would be a better option than changing the kernel config, IMHO, as it would make falling back to the default reclaim method easier.