Bug 1213150 - After update from 15.4 to 15.5, lots of vmalloc error reports in the system journal
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Kernel
Version: Leap 15.5
Hardware: x86-64 openSUSE Leap 15.5
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: openSUSE Kernel Bugs
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-07-07 20:43 UTC by James Moe
Modified: 2023-07-17 10:00 UTC
CC List: 2 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description James Moe 2023-07-07 20:43:23 UTC
I use fail2ban to identify and block villainous IPs in iptables. In Leap 15.4 there was nary a memory problem; the OS has lots of free memory. In Leap 15.5 the same configuration generates numerous memory allocation errors in the system journal (see below for a sample).

The error occurs for both of fail2ban's "ban" and "unban" operations, which is quite odd for "unban." Why would releasing an IP need more memory? Why would banning an IP need 2MB?

I have tried to duplicate the error by manually banning and unbanning an IP; the operations completed without error.

----[ lsmem ]----
$  lsmem --output SIZE,STATE,REMOVABLE,BLOCK,NODE,ZONES
 SIZE  STATE REMOVABLE BLOCK NODE  ZONES
 128M online       yes     0    0   None
   3G online       yes  1-24    0  DMA32
 4.9G online       yes 32-70    0 Normal

Memory block size:       128M
Total online memory:       8G
Total offline memory:      0B


----[ typical memory allocation error ]----
2023-07-07T12:31:36-0700 sma-server3 kernel: iptables: vmalloc error: size 0, page order 9, failed to allocate pages, mode:0x400cc0(GFP_KERNEL_ACCOUNT), nodemask=(null),cpuset=/,mems_allowed=0
2023-07-07T12:31:36-0700 sma-server3 kernel: CPU: 3 PID: 25291 Comm: iptables Tainted: G                  N 5.14.21-150500.53-default #1 SLE15-SP5 3b90198179ad2dbddc570cfe6efd7895c9be3e4a
2023-07-07T12:31:36-0700 sma-server3 kernel: Hardware name: System manufacturer System Product Name/M3A78-EM, BIOS 1602    03/27/2009
2023-07-07T12:31:36-0700 sma-server3 kernel: Call Trace:
2023-07-07T12:31:36-0700 sma-server3 kernel:  <TASK>
2023-07-07T12:31:36-0700 sma-server3 kernel:  dump_stack_lvl+0x45/0x5b
2023-07-07T12:31:36-0700 sma-server3 kernel:  warn_alloc+0x116/0x180
2023-07-07T12:31:36-0700 sma-server3 kernel:  __vmalloc_node_range+0x390/0x4a0
2023-07-07T12:31:36-0700 sma-server3 kernel:  __vmalloc_node+0x57/0x70
2023-07-07T12:31:36-0700 sma-server3 kernel:  ? xt_alloc_table_info+0x26/0x70 [x_tables e20979056ab8b8537ed985ce4d87d9ec0f6393cb]
2023-07-07T12:31:36-0700 sma-server3 kernel:  xt_alloc_table_info+0x26/0x70 [x_tables e20979056ab8b8537ed985ce4d87d9ec0f6393cb]
2023-07-07T12:31:36-0700 sma-server3 kernel:  do_ipt_set_ctl+0x191/0x3bf [ip_tables fc299e32f3942b3711f7eeaf1a8aa3a911438972]
2023-07-07T12:31:36-0700 sma-server3 kernel:  nf_setsockopt+0x57/0x80
2023-07-07T12:31:36-0700 sma-server3 kernel:  ip_setsockopt+0x2cf/0x12f0
2023-07-07T12:31:36-0700 sma-server3 kernel:  __sys_setsockopt+0xf3/0x1e0
2023-07-07T12:31:36-0700 sma-server3 kernel:  __x64_sys_setsockopt+0x20/0x30
2023-07-07T12:31:36-0700 sma-server3 kernel:  do_syscall_64+0x5b/0x80
2023-07-07T12:31:36-0700 sma-server3 kernel:  ? do_user_addr_fault+0x1ff/0x730
2023-07-07T12:31:36-0700 sma-server3 kernel:  ? exc_page_fault+0x67/0x150
2023-07-07T12:31:36-0700 sma-server3 kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xcb
2023-07-07T12:31:36-0700 sma-server3 kernel: RIP: 0033:0x7f85c6c9607a
2023-07-07T12:31:36-0700 sma-server3 kernel: Code: ff ff ff c3 48 8b 15 15 fe 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b1 0f 1f 80 00 00 00 00 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e6 fd 0c 00 f7 d8 64 89 01 48
2023-07-07T12:31:36-0700 sma-server3 kernel: RSP: 002b:00007ffd7c7d1b28 EFLAGS: 00000206 ORIG_RAX: 0000000000000036
2023-07-07T12:31:36-0700 sma-server3 kernel: RAX: ffffffffffffffda RBX: 0000559b38c81e80 RCX: 00007f85c6c9607a
2023-07-07T12:31:36-0700 sma-server3 kernel: RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000005
2023-07-07T12:31:36-0700 sma-server3 kernel: RBP: 0000559b38c81e88 R08: 0000000000210248 R09: 00007f85c6761080
2023-07-07T12:31:36-0700 sma-server3 kernel: R10: 00007f85c6551010 R11: 0000000000000206 R12: 0000559b38c81e88
2023-07-07T12:31:36-0700 sma-server3 kernel: R13: 00000000002101e8 R14: 00007f85c6551070 R15: 00007f85c6551010
2023-07-07T12:31:36-0700 sma-server3 kernel:  </TASK>
2023-07-07T12:31:36-0700 sma-server3 kernel: Mem-Info:
2023-07-07T12:31:36-0700 sma-server3 kernel: active_anon:549150 inactive_anon:290814 isolated_anon:0
                                              active_file:717441 inactive_file:158430 isolated_file:0
                                              unevictable:20 dirty:72 writeback:0
                                              slab_reclaimable:107359 slab_unreclaimable:44906
                                              mapped:37366 shmem:260 pagetables:18335 bounce:0
                                              free:95414 free_pcp:0 free_cma:0
2023-07-07T12:31:36-0700 sma-server3 kernel: Node 0 active_anon:2196600kB inactive_anon:1163256kB active_file:2869764kB inactive_file:633720kB unevictable:80kB isolated(anon):0kB isolated(file):0kB mapped:149464kB dirty:288kB writeback:0kB shmem:1040kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 2224128kB writeback_tmp:0kB kernel_stack:18176kB pagetables:73340kB all_unreclaimable? no
2023-07-07T12:31:36-0700 sma-server3 kernel: Node 0 DMA free:14304kB boost:0kB min:128kB low:160kB high:192kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
2023-07-07T12:31:36-0700 sma-server3 kernel: lowmem_reserve[]: 0 2969 7819 7819 7819
2023-07-07T12:31:36-0700 sma-server3 kernel: Node 0 DMA32 free:311028kB boost:30720kB min:56332kB low:62732kB high:69132kB reserved_highatomic:0KB active_anon:958788kB inactive_anon:564104kB active_file:732496kB inactive_file:159544kB unevictable:0kB writepending:248kB present:3259904kB managed:3093632kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
2023-07-07T12:31:36-0700 sma-server3 kernel: lowmem_reserve[]: 0 0 4850 4850 4850
2023-07-07T12:31:36-0700 sma-server3 kernel: Node 0 Normal free:56324kB boost:0kB min:41840kB low:52300kB high:62760kB reserved_highatomic:2048KB active_anon:1237812kB inactive_anon:599156kB active_file:2137236kB inactive_file:473832kB unevictable:80kB writepending:40kB present:5111808kB managed:4967188kB mlocked:80kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
2023-07-07T12:31:36-0700 sma-server3 kernel: lowmem_reserve[]: 0 0 0 0 0
2023-07-07T12:31:36-0700 sma-server3 kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 1*1024kB (U) 2*2048kB (UM) 2*4096kB (M) = 14304kB
2023-07-07T12:31:36-0700 sma-server3 kernel: Node 0 DMA32: 3322*4kB (UME) 3359*8kB (UME) 1805*16kB (UME) 2459*32kB (UME) 1355*64kB (UME) 468*128kB (UME) 67*256kB (UME) 1*512kB (U) 0*1024kB 0*2048kB 0*4096kB = 312016kB
2023-07-07T12:31:36-0700 sma-server3 kernel: Node 0 Normal: 1047*4kB (UMEH) 383*8kB (UMEH) 216*16kB (UMEH) 276*32kB (UMEH) 120*64kB (UME) 36*128kB (UE) 19*256kB (UE) 20*512kB (UE) 10*1024kB (UE) 0*2048kB 0*4096kB = 57172kB
2023-07-07T12:31:36-0700 sma-server3 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
2023-07-07T12:31:36-0700 sma-server3 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
2023-07-07T12:31:36-0700 sma-server3 kernel: 881462 total pagecache pages
2023-07-07T12:31:36-0700 sma-server3 kernel: 5289 pages in swap cache
2023-07-07T12:31:36-0700 sma-server3 kernel: Swap cache stats: add 370712, delete 365424, find 1622097/1631716
2023-07-07T12:31:36-0700 sma-server3 kernel: Free swap  = 7120892kB
2023-07-07T12:31:36-0700 sma-server3 kernel: Total swap = 8384508kB
2023-07-07T12:31:36-0700 sma-server3 kernel: 2096925 pages RAM
2023-07-07T12:31:36-0700 sma-server3 kernel: 0 pages HighMem/MovableOnly
2023-07-07T12:31:36-0700 sma-server3 kernel: 77880 pages reserved
2023-07-07T12:31:36-0700 sma-server3 kernel: 0 pages cma reserved
2023-07-07T12:31:36-0700 sma-server3 kernel: 0 pages hwpoisoned
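
A note for readers decoding the log above: "page order 9" denotes a block of 2^9 contiguous pages. The following is a minimal sketch, assuming 4 KiB base pages (the x86-64 default), that computes that block size and counts how many free blocks of at least that size each zone reported. The free-list strings are copied from the log with timestamps and migrate-type flags trimmed; the helper names are illustrative and do not come from any kernel tool.

```python
import re

PAGE_KB = 4  # assumption: 4 KiB base pages, the x86-64 default

# Buddy free-list lines copied from the log above (timestamps and flags trimmed).
FREE_LISTS = {
    "DMA": "0*4kB 0*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB "
           "1*1024kB 2*2048kB 2*4096kB",
    "DMA32": "3322*4kB 3359*8kB 1805*16kB 2459*32kB 1355*64kB 468*128kB "
             "67*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB",
    "Normal": "1047*4kB 383*8kB 216*16kB 276*32kB 120*64kB 36*128kB "
              "19*256kB 20*512kB 10*1024kB 0*2048kB 0*4096kB",
}


def order_size_kb(order: int) -> int:
    """Size in kB of one block of the given page order."""
    return PAGE_KB << order


def parse_free_list(line: str) -> dict[int, int]:
    """Map block size in kB to the number of free blocks of that size."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def blocks_at_least(line: str, min_kb: int) -> int:
    """Count free blocks whose size is at least min_kb."""
    return sum(n for size, n in parse_free_list(line).items() if size >= min_kb)


need_kb = order_size_kb(9)
print(f"order 9 = {need_kb} kB")
for zone, line in FREE_LISTS.items():
    total_kb = sum(size * n for size, n in parse_free_list(line).items())
    print(f"{zone}: {total_kb} kB free, "
          f"{blocks_at_least(line, need_kb)} blocks >= {need_kb} kB")
```

With the values from this log, an order-9 block is 2048 kB, and only the small DMA zone lists any free block of that size or larger; DMA32 and Normal list none, although they report 312016 kB and 57172 kB free. That is consistent with a large contiguous request failing despite ample free memory, but it is an interpretation of the log and not a confirmed root cause.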
Comment 1 Takashi Iwai 2023-07-10 07:32:32 UTC
Just to be sure: could you check with the recent upstream kernel, too?  The package is found in OBS Kernel:stable:Backport repo.
Comment 2 Takashi Iwai 2023-07-10 07:37:02 UTC
Also, a 6.3.x kernel is found in OBS home:tiwai:kernel:6.3 (the "backport" repo), too.

I'm asking because 6.3 seemed to have some issues indicating vmalloc problems, and I wonder whether they have any relevance here.
Comment 3 James Moe 2023-07-10 19:36:18 UTC
> Just to be sure: could you check with the recent upstream kernel, too?

1. What version is in upstream?
2. The URL for OBS?
3. How do I install a different kernel?
4. Then remove it?
Comment 4 Takashi Iwai 2023-07-11 06:27:06 UTC
(In reply to James Moe from comment #3)
> > Just to be sure: could you check with the recent upstream kernel, too?
> 
> 1. What version is in upstream?

6.4.x in OBS Kernel:stable:Backport and 6.3 in OBS home:tiwai:kernel:6.3.

> 2. The URL for OBS?

In general, the repository of an OBS project is found under the corresponding subdirectory of http://download.opensuse.org/repositories/

For OBS Kernel:stable:Backport, it's:
  http://download.opensuse.org/repositories/Kernel:/stable:/Backport/standard/

For OBS home:tiwai:kernel:6.3 (backport repo):
  http://download.opensuse.org/repositories/home:/tiwai:/kernel:/6.3/backport/
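
The two URLs above follow one pattern: each ":" in the project name becomes ":/" in the path, followed by the repository name. A minimal sketch of that mapping, inferred only from these two examples (the function name `obs_repo_url` is hypothetical and not part of any OBS tool):

```python
BASE = "http://download.opensuse.org/repositories/"


def obs_repo_url(project: str, repo: str) -> str:
    """Build the download URL for an OBS project and repository name."""
    # Each ':' in the project name becomes ':/' in the directory path.
    return BASE + project.replace(":", ":/") + "/" + repo + "/"


print(obs_repo_url("Kernel:stable:Backport", "standard"))
print(obs_repo_url("home:tiwai:kernel:6.3", "backport"))
```

Both calls reproduce the URLs quoted in this comment; other projects may use different repository names, so check the directory listing before relying on the result.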

> 3. How do I install a different kernel?

Just download kernel-default.rpm, and install it via "zypper install".
You might need to pass the --oldpackage option.
The package is an unofficial build, and zypper may ask about the unknown cert, but you can simply proceed.  Also, you'd need to turn off Secure Boot for such a kernel.

The kernel package is treated as "multiversion", so you can keep multiple kernel packages on the same system.

It'd be better to increase the number of installable kernels beforehand, too.
Edit /etc/zypp/zypp.conf, and give more entries to multiversion.kernels, e.g.
  multiversion.kernels = latest,latest-1,latest-2,latest-3,running
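
For anyone who prefers to script that edit, here is a minimal sketch that rewrites the setting in a copy of the config text. It assumes the option appears at most once as an uncommented line; the function name and the sample text are hypothetical, and the real file should be backed up before writing anything to it.

```python
import re

NEW_VALUE = "latest,latest-1,latest-2,latest-3,running"

# Matches an uncommented "multiversion.kernels = ..." line only.
_PATTERN = re.compile(r"^[ \t]*multiversion\.kernels[ \t]*=.*$", re.MULTILINE)


def set_multiversion_kernels(conf_text: str, value: str = NEW_VALUE) -> str:
    """Return conf_text with multiversion.kernels set to value."""
    new_line = f"multiversion.kernels = {value}"
    if _PATTERN.search(conf_text):
        # Replace the existing setting in place (first match only).
        return _PATTERN.sub(lambda _m: new_line, conf_text, count=1)
    # No existing setting: append one, keeping the file newline-terminated.
    sep = "" if (not conf_text or conf_text.endswith("\n")) else "\n"
    return conf_text + sep + new_line + "\n"


# Hypothetical sample text, not the contents of a real zypp.conf.
sample = "[main]\nmultiversion.kernels = latest,latest-1,running\n"
print(set_multiversion_kernels(sample))
```

The function works on a string so it can be tried on a copy first; applying it to /etc/zypp/zypp.conf would require root and is left to the reader.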

> 4. Then remove it?

Simply remove the package via "zypper remove".
Comment 5 James Moe 2023-07-12 05:48:21 UTC
Leap 15.5 is currently using Linux "5.14.21-150500.53-default x86_64." Why would loading v6.4 be enlightening?

I am reluctant to mess with a production server.
Comment 6 Takashi Iwai 2023-07-12 06:23:29 UTC
To verify whether this is a recent upstream behavior change or not.

But if you'd rather stick with Leap 15.5, the first thing would be to check the latest kernel in the OBS Kernel:SLE15-SP5 repo.  That already contains tons of fixes for the maintenance update.
Comment 7 James Moe 2023-07-16 04:25:27 UTC
The blitz of vmalloc errors has stopped as mysteriously as it began. The last error message was about 2PM 13-Jul-2023. Fail2ban continues as before. There was no change to the system during that time.

I suppose we can mark this as Closed?
Comment 8 Takashi Iwai 2023-07-17 10:00:58 UTC
Then let's close :)  Feel free to reopen if you encounter the same problem again.