Bug 1178359 - Unexplainable high load average
Status: RESOLVED NORESPONSE
Classification: openSUSE
Product: openSUSE Distribution
Component: Kernel
Version: Leap 15.2
Hardware: Other openSUSE Leap 15.2
Priority: P5 - None
Severity: Normal
Assigned To: openSUSE Kernel Bugs
Reported: 2020-11-02 13:49 UTC by itteam itteam
Modified: 2022-01-21 12:12 UTC (History)
tiwai: needinfo? (itteam)


Attachments
fig1 (62.85 KB, image/png)
2020-11-02 13:49 UTC, itteam itteam

Description itteam itteam 2020-11-02 13:49:45 UTC
Created attachment 843219 [details]
fig1

Overview:
We've witnessed an unusually high system load average on several recent Leap 15.2 virtual machine builds which reside under a Nutanix AHV hypervisor.

Steps to reproduce:
It's difficult to reproduce as it seems to start after several days of uptime, if at all. But I have seen this happen on 4 systems so far.

Actual results:
On an affected VM, after some time (days or weeks), the system load average starts jumping up in steps. On one system, the load average rose from ~0.01 (normal) to ~0.5, then to ~1.0, and so on over several days until it reached its current value of ~5.5. The attached image (fig1) charts the 5-minute load average over time (note: the unit on the chart is %, where 100% = nproc). A reboot clears the problem, although I have yet to see whether it returns.
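As background for the stepwise pattern: Linux derives each load average as an exponentially damped average of the number of tasks it counts as active (runnable or in uninterruptible sleep), sampled roughly every 5 seconds. The sketch below uses the textbook floating-point form of that update; the kernel itself uses fixed-point arithmetic, and the function names here are hypothetical. It illustrates that one extra task counted permanently would lift the 5-minute average by about 1.0 and hold it there.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between load samples (approximate kernel value)


def step(load, active, window):
    """One damped-average update toward the current active-task count."""
    decay = math.exp(-SAMPLE_INTERVAL / window)
    return load * decay + active * (1.0 - decay)


def simulate(active_counts, window=300.0, start=0.0):
    """Feed a sequence of active-task counts through the damped average.

    window is the averaging period in seconds (60, 300 or 900 for the
    1-, 5- and 15-minute averages).
    """
    load = start
    history = []
    for n in active_counts:
        load = step(load, n, window)
        history.append(load)
    return history


# One task counted as active for an hour (720 samples of 5 s each):
# the 5-minute average climbs toward 1.0 and stays there.
hour_of_one_task = simulate([1] * 720)
print(round(hour_of_one_task[-1], 3))
```

Under this model, a load average that settles at successive plateaus points to the counted number of active tasks increasing one at a time and never decreasing, rather than to bursts of real work.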

Some output from another VM, this one is essentially as close to a bare 15.2 installation as we have:

# cat /proc/loadavg
2.04 2.01 2.00 1/228 4319
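
For readers unfamiliar with the format, the five fields above are the 1-, 5- and 15-minute load averages, the running/total task counts, and the most recently assigned PID. A minimal sketch for parsing the line and converting a load value to the percentage unit used on the chart (100% = nproc) follows; the function names are hypothetical, and the CPU count of 2 is an assumption for illustration only, since the VMs have varying core counts.

```python
def parse_loadavg(text):
    """Split one /proc/loadavg line into its five fields."""
    one, five, fifteen, tasks, last_pid = text.split()
    running, total = tasks.split("/")
    return {
        "load1": float(one),
        "load5": float(five),
        "load15": float(fifteen),
        "running": int(running),   # tasks currently runnable
        "total": int(total),       # tasks that exist on the system
        "last_pid": int(last_pid),
    }


def load_percent(load, ncpu):
    """Express a load average as a percentage of the CPU count."""
    return 100.0 * load / ncpu


sample = "2.04 2.01 2.00 1/228 4319"  # the line shown above
info = parse_loadavg(sample)
ASSUMED_NCPU = 2  # assumption: not stated in the report
print(info["running"], info["total"])
print(f"{load_percent(info['load5'], ASSUMED_NCPU):.1f}%")
```

On a live system the same function can be fed the contents of /proc/loadavg, with os.cpu_count() in place of the assumed CPU count.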

As the output below shows, no processes are listed as runnable (r) or in uninterruptible sleep (b), nothing is waiting for I/O, and there is no swap activity.

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0  19200 248944   1060 1337940    0    0     3    27   11   20  1  0 99  0  0
 0  0  19200 248912   1060 1337940    0    0     0     0  349  260  1  0 100  0  0
 0  0  19200 248912   1060 1337940    0    0     0     0  318  249  0  0 100  0  0
 0  0  19200 248912   1060 1337940    0    0     0     0  338  254  0  0 100  0  0
 0  0  19200 248976   1060 1337940    0    0     0     0  287  246  0  1 100  0  0
 0  0  19200 248944   1060 1337940    0    0     0     0  262  214  1  0 100  0  0
 0  0  19200 248912   1060 1337940    0    0     0     0  320  248  0  1 100  0  0
 0  0  19200 248944   1060 1337940    0    0     0     0  315  243  1  0 100  0  0
 0  0  19200 248912   1060 1337940    0    0     0     0  284  236  0  0 100  0  0
 0  0  19200 248912   1060 1337940    0    0     0     0  362  276  0  0 100  0  0
 0  0  19200 248944   1060 1337940    0    0     0     0  377  270  0  0 100  0  0
 0  0  19200 248912   1060 1337940    0    0     0     0  335  233  0  0 100  0  0
 0  0  19200 248912   1060 1337940    0    0     0     0  316  271  0  0 99  0  0
 0  0  19200 248944   1060 1337940    0    0     0     0  321  258  0  0 100  0  0
 0  0  19200 248912   1060 1337940    0    0     0     0  297  228  0  0 99  0  0
 0  0  19200 248944   1060 1337940    0    0     0     0  319  241  0  1 100  0  0
 0  0  19200 248912   1060 1337940    0    0     0     0  309  250  1  0 100  0  0
 0  0  19200 248912   1060 1337940    0    0     0     0  328  249  0  0 100  0  0
 0  0  19200 248944   1060 1337940    0    0     0     0  339  267  0  0 99  0  0
 0  0  19200 248912   1060 1337940    0    0     0     0  360  245  0  0 100  0  0
 0  0  19200 248944   1060 1337940    0    0     0     0  288  238  0  0 100  0  0
 0  0  19200 248944   1060 1337940    0    0     0     0  338  246  0  0 100  0  0
 0  0  19200 248944   1060 1337940    0    0     0     0  348  263  1  0 100  0  0
 0  0  19200 248660   1060 1338132    0    0     0  1288  326  288  0  0 100  0  0
 0  0  19200 248660   1060 1338132    0    0     0     0  341  258  0  1 100  0  0
 0  0  19200 248692   1060 1338132    0    0     0     0  369  267  0  0 100  0  0
 0  0  19200 248692   1060 1338132    0    0     0     0  330  265  0  0 100  0  0
 0  0  19200 248660   1060 1338132    0    0     0     0  337  263  0  0 99  0  0
 0  0  19200 248692   1060 1338132    0    0     0     0  331  258  0  0 100  0  0
 0  0  19200 248660   1060 1338132    0    0     0     0  309  223  0  0 100  0  0
 0  0  19200 248912   1060 1338132    0    0     0     0  306  262  0  0 100  0  0
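
To cross-check the r and b columns above at the level of individual tasks, one option is to scan the per-thread stat files under /proc for tasks in state R (running or runnable) or D (uninterruptible sleep), the two states that contribute to the Linux load average. This is a sketch only, with hypothetical function names; it was not part of the original investigation.

```python
import glob


def parse_stat(line):
    """Return (id, comm, state) from one /proc/<pid>/stat-style line.

    The command name sits in parentheses and may itself contain spaces
    or parentheses, so split on the first "(" and the last ")".
    """
    lpar = line.index("(")
    rpar = line.rindex(")")
    task_id = int(line[:lpar])
    comm = line[lpar + 1:rpar]
    state = line[rpar + 1:].split()[0]
    return task_id, comm, state


def load_contributors(stat_lines):
    """Keep only tasks in state R or D."""
    return [t for t in map(parse_stat, stat_lines) if t[2] in ("R", "D")]


def read_all_stat_lines():
    """Read the stat line of every thread; each thread counts separately."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            continue  # the task exited while we were scanning
    return lines
```

Calling load_contributors(read_all_stat_lines()) on an affected VM would normally list at least the scanning process itself in state R. If repeated scans show nothing else while the load average stays at 2.00, that would suggest the kernel's accounting, rather than real tasks, is responsible.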

Expected results:
System load average stays within expected levels which allows for effective monitoring.

Build:
Linux 5.3.18-lp152.26-default #1 SMP Mon Jun 29 14:58:38 UTC 2020 (2a0430f) x86_64 x86_64 x86_64 GNU/Linux
Nutanix AHV virtual machine, Intel(R) Xeon(R) Gold 6150 CPU, variable memory/core count for VMs.

Additional Builds and platforms:
We have not witnessed this on Leap 15.1
Comment 1 Takashi Iwai 2020-11-05 09:47:19 UTC
We've seen a similar bogus loadavg problem on the recent Tumbleweed and Leap 15.2 kernels, but that was on certain ARM64 bare-metal machines.

This report, however, is on an x86-64 VM and with a much older kernel (released in July, corresponding to SLE15-SP2 commit 72557bb644c5), so I'm not sure whether it's the same problem.

Adding Mel and Rudi to Cc, who have already been involved with the other bug.
Comment 2 Takashi Iwai 2020-12-01 16:10:40 UTC
Could you test with the latest openSUSE-15.2 KOTD?
  http://download.opensuse.org/repositories/Kernel:/openSUSE-15.2/standard/

This contains the recent fix for the loadavg bug.  Although that bug was hit on Arm and similar platforms, it might also occur under a specific hypervisor.
Comment 3 Miroslav Beneš 2022-01-21 12:12:02 UTC
No response, closing.