Bug 1219284

Summary: increased ram usage after upgrading from 6.6.x to 6.7.x
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Andrea Manzini <andrea.manzini>
Component: Kernel
Assignee: openSUSE Kernel Bugs <kernel-bugs>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: daniel.wagner, dimstar, martin.wilck, mhocko, tiwai, vbabka
Version: Current
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: measures with kernel 6.6.7
measures with kernel 6.7.1
screenshot of free command 1
screenshot of free command 2
screenshot of free command 2
/proc/slabinfo content on kernel 6.6
/proc/slabinfo content on kernel 6.7
free command output with kernel 6.7.6

Description Andrea Manzini 2024-01-29 11:27:00 UTC
Created attachment 872257 [details]
measures with kernel 6.6.7

After upgrading just the kernel-default package, the system reports ~450 MB more RAM in use with the same settings, same desktop environment, and same configuration.
Comment 1 Andrea Manzini 2024-01-29 11:29:15 UTC
Created attachment 872258 [details]
measures with kernel 6.7.1
Comment 2 Andrea Manzini 2024-01-29 12:50:22 UTC
Created attachment 872262 [details]
screenshot of free command 1
Comment 3 Andrea Manzini 2024-01-29 12:50:45 UTC
Created attachment 872263 [details]
screenshot of free command 2
Comment 4 Andrea Manzini 2024-01-29 12:51:47 UTC
Created attachment 872264 [details]
screenshot of free command 2
Comment 5 Martin Wilck 2024-01-30 08:37:35 UTC
Any significant differences in /proc/slabinfo?
Comment 6 Andrea Manzini 2024-01-30 08:53:10 UTC
Created attachment 872295 [details]
/proc/slabinfo content on kernel 6.6
Comment 7 Andrea Manzini 2024-01-30 08:53:38 UTC
Created attachment 872296 [details]
/proc/slabinfo content on kernel 6.7
Comment 8 Andrea Manzini 2024-01-30 09:03:04 UTC
At first glance I can't see any significant difference, apart from user_namespace, which roughly doubles on 6.7; some values, like inode_cache or dentry, are actually higher on 6.6 than on 6.7.

I attached the content of both measurements if you want to take a look.

$ grep user_namespace slabinfo_6.*

slabinfo_6.6.txt:user_namespace        13     13    624   13    2 : tunables    0    0    0 : slabdata      1      1      0

slabinfo_6.7.txt:user_namespace        25     25    632   25    4 : tunables    0    0    0 : slabdata      1      1      0
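For reference, a cache's footprint can be estimated from /proc/slabinfo columns 3 (num_objs) and 4 (objsize). A minimal awk sketch over the two user_namespace lines quoted above (run the same awk over the full attachments to rank every cache):

```shell
# Approximate each cache's footprint as num_objs * objsize (columns 3 and 4
# of /proc/slabinfo). Demonstrated here on the two user_namespace lines
# quoted above.
printf '%s\n' \
  'user_namespace 13 13 624 13 2' \
  'user_namespace 25 25 632 25 4' |
awk '{ printf "%s %d bytes\n", $1, $3 * $4 }'
# user_namespace 8112 bytes    (6.6: 13 * 624)
# user_namespace 15800 bytes   (6.7: 25 * 632)
```

Even doubled, user_namespace accounts for well under 16 KB, so the slab caches alone cannot explain a ~450 MB delta.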
Comment 9 Daniel Wagner 2024-01-30 09:34:34 UTC
It seems the Arch folks have done some debugging:

  https://bbs.archlinux.org/viewtopic.php?id=292086&p=3

and from a quick look, this might help:

  transparent_hugepage=madvise
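For anyone wanting to try this without rebooting, the THP policy is also exposed via sysfs (a sketch; the boot parameter above is the persistent equivalent):

```shell
# Show the current THP policy; the active mode is printed in [brackets],
# e.g. "always [madvise] never".
cat /sys/kernel/mm/transparent_hugepage/enabled

# Switch to madvise on the running kernel (needs root); passing
# transparent_hugepage=madvise on the kernel command line makes it
# persist across reboots.
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
```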
Comment 10 Takashi Iwai 2024-01-30 10:04:49 UTC
If that's the case, let's invite MM people to the tea party.
Comment 11 Michal Hocko 2024-01-30 10:24:06 UTC
6.6
AnonPages:        492536 kB
AnonHugePages:    264192 kB

6.7
AnonPages:        770116 kB
AnonHugePages:    399360 kB

There is much more anonymous memory used by the 6.7 kernel (~270MB). Daniel has already mentioned that this might be THP related, but only part of it can be attributed to that (135MB). So some userspace had to consume more memory with the newer kernel, which is rather unexpected. Are you sure you are comparing exactly the same userspace here?
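The deltas quoted above, worked out explicitly (just arithmetic on the /proc/meminfo values, kB to MiB):

```shell
# Difference in anonymous memory between the 6.6 and 6.7 snapshots above:
# only about half of the growth is huge-page backed.
awk 'BEGIN {
  anon = (770116 - 492536) / 1024   # AnonPages delta, MiB
  thp  = (399360 - 264192) / 1024   # AnonHugePages delta, MiB
  printf "AnonPages: +%.0f MiB, AnonHugePages: +%.0f MiB\n", anon, thp
}'
# AnonPages: +271 MiB, AnonHugePages: +132 MiB
```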
Comment 12 Andrea Manzini 2024-01-30 10:28:20 UTC
All the measurements were taken on the same freshly installed Tumbleweed virtual machine with the XFCE desktop environment, so the userspace is identical. Steps to reproduce:

- boot choosing the 6.6 kernel
- start the DE 
- open a terminal to run commands 
- reboot
- choose 6.7 kernel
- start the DE 
- open a terminal to run commands
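The "run commands" step can be captured in a small sketch so both boots produce comparable files (file names are arbitrary; the counters are the ones compared in this report):

```shell
# Save the anonymous-memory counters tagged with the running kernel
# version, so the 6.6 and 6.7 boots can be diffed afterwards.
K=$(uname -r)
grep -E '^(MemAvailable|AnonPages|AnonHugePages):' /proc/meminfo \
  | tee "meminfo-$K.txt"
```

Booting 6.6 and 6.7 in turn leaves two files that can be compared with diff.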
Comment 13 Daniel Wagner 2024-01-30 13:46:35 UTC
Apparently the fix is:

c4608d1bf7c653 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE")
Comment 14 Daniel Wagner 2024-01-30 13:51:43 UTC
Ah, Thorsten Leemhuis corrected himself; it seems to be

  4ef9ad19e17676 ("mm: huge_memory: don't force huge page alignment on 32 bit")

but according to some testing by the Arch folks, it is not a complete fix.
Comment 15 Michal Hocko 2024-01-30 14:02:18 UTC
(In reply to Daniel Wagner from comment #14)
> Ah, Thorsten Leemhuis corrected himself; it seems to be
> 
>   4ef9ad19e17676 ("mm: huge_memory: don't force huge page alignment on 32
> bit")
> 
> but according to some testing by the Arch folks, it is not a complete fix.

I can see how this can over-consume on thread stacks, because those are usually 2MB in size by default (set by glibc), and the alignment would make them a more likely THP target. I do not see how THP would consume all the additional memory, though. Runtime overhead is probably the more visible effect.

That being said, no objection to taking the patch from my end.
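One way to see how much of a given process is THP-backed is the per-process rollup (a sketch; `$$` here is just the shell itself, substitute any PID of interest):

```shell
# AnonHugePages in smaps_rollup shows how much of this process's anonymous
# memory (thread stacks included) is currently backed by transparent huge
# pages.
grep AnonHugePages "/proc/$$/smaps_rollup"
```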
Comment 16 Michal Hocko 2024-01-30 14:05:40 UTC
(In reply to Daniel Wagner from comment #14)
> Ah, Thorsten Leemhuis corrected himself; it seems to be
> 
>   4ef9ad19e17676 ("mm: huge_memory: don't force huge page alignment on 32
> bit")
> 
> but according to some testing by the Arch folks, it is not a complete fix.

Nope, that one only affects 32-bit apps on 64-bit systems, and it mostly breaks ASLR (see bug 1218800). It should already be in the stable kernel branch.
Comment 17 Andrea Manzini 2024-02-28 08:17:25 UTC
An update: it seems the problem has been found and fixed:

https://fosstodon.org/@kernellogger/111957388343037784
Comment 18 Michal Hocko 2024-02-28 11:57:56 UTC
(In reply to Andrea Manzini from comment #17)
> An update: it seems the problem has been found and fixed:
> 
> https://fosstodon.org/@kernellogger/111957388343037784

c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE") should be part of our TW within about a week.

Is the problem still reproducible? I am mostly asking because I have some reservations that this is the whole story, as per comment 11.
Comment 19 Andrea Manzini 2024-02-28 15:10:07 UTC
Created attachment 873086 [details]
free command output with kernel 6.7.6
Comment 20 Andrea Manzini 2024-02-28 15:12:23 UTC
Did a quick test updating the same system to the latest kernel, 6.7.6 (screenshot attached); I can confirm the reported memory usage is lower.
Comment 21 Daniel Wagner 2024-04-03 14:30:09 UTC
Are we good to close this one?
Comment 22 Andrea Manzini 2024-04-11 15:07:41 UTC
Closing this as fixed.