Bug 1214307 - System hangs on stop job for irqbalance after Upgrade from 15.4 to 15.5
Summary: System hangs on stop job for irqbalance after Upgrade from 15.4 to 15.5
Status: RESOLVED INVALID
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Upgrade Problems
Version: Leap 15.5
Hardware: x86-64 openSUSE Leap 15.5
Importance: P5 - None : Minor
Target Milestone: ---
Assignee: Thomas Renninger
QA Contact: Jiri Srain
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-16 06:56 UTC by Rainer Kaluscha
Modified: 2023-12-21 07:36 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
aschnell: needinfo? (rainer.kaluscha)


Attachments
strace-log for irqbalance 1.9.2 after systemctl stop - takes 30 sec (12.54 KB, text/plain)
2023-12-14 10:48 UTC, Rainer Kaluscha
strace-log for irqbalance 1.9.3 after systemctl stop - no delay (21.02 KB, text/plain)
2023-12-14 10:49 UTC, Rainer Kaluscha

Description Rainer Kaluscha 2023-08-16 06:56:05 UTC
I experienced the same problem on three machines (with completely different hardware) after upgrading from openSUSE Leap 15.4 to 15.5:

On shutdown, the system hangs for 90 seconds on a stop job for irqbalance.

During normal operation, systemctl restart irqbalance also takes a very long time. BTW: I don't use thermald, so irqbalance complains about not being able to bind to its socket.

I tested with different kernels (including the current distro kernel and 6.4.10); it made no difference.

P.S. Downgrading irqbalance from the current 1.9.2 to 1.9.1 or 1.8.1 (from openSUSE Leap 15.4) resolved the issue for me.
Comment 1 Thomas Renninger 2023-08-22 05:18:48 UTC
Can you try a build from here please:
https://build.opensuse.org/package/show/home:trenn/irqbalance.SUSE_SLE-15-SP5_Update
Comment 2 Thomas Renninger 2023-08-22 05:19:56 UTC
FYI: I'll be on vacation from tomorrow on.
Comment 3 Rainer Kaluscha 2023-08-22 16:14:52 UTC
Have a nice vacation.

Unfortunately, irqbalance-1.9.2-150500.1.1.x86_64.rpm from your repo also hangs on systemctl stop irqbalance.service :-(

Going back to 1.9.1 ...

So long,
Rainer
Comment 4 Rainer Kaluscha 2023-10-12 17:16:05 UTC
And irqbalance-1.9.2-150500.1.2 hangs, too ...
Comment 5 Arvin Schnell 2023-12-13 08:15:31 UTC
If you want you can try the latest version from https://build.opensuse.org/package/show/home:aschnell:branches:Base:System/irqbalance.

And when you give version numbers, please include the complete version, e.g. on
15.4 we have 1.8.0.18.git+2435e8d.

Can you attach strace to the running irqbalance process and then use 'systemctl
stop'? E.g. 'strace -o irqbalance.log -tt -p <pid>', where you get the pid from
'systemctl status irqbalance.service'.
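
The attach step requested above can be scripted. The following is only a sketch, assuming that `systemctl show --property=MainPID --value` is available on the system and that the trace should go to `irqbalance.log`; the helper names `parse_main_pid`, `main_pid` and `build_strace_argv` are made up for illustration and are not part of irqbalance or systemd.

```python
import subprocess
from typing import List


def parse_main_pid(show_output: str) -> int:
    """Parse the output of `systemctl show --property=MainPID --value <unit>`."""
    pid = int(show_output.strip())
    if pid <= 0:
        # systemd reports MainPID=0 when the unit has no running main process.
        raise ValueError("unit has no running main process")
    return pid


def main_pid(unit: str = "irqbalance.service") -> int:
    """Ask systemd for the main PID of the unit (needs a running systemd)."""
    result = subprocess.run(
        ["systemctl", "show", "--property=MainPID", "--value", unit],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_main_pid(result.stdout)


def build_strace_argv(pid: int, log_path: str = "irqbalance.log") -> List[str]:
    """Build the strace command line for attaching to an existing process.

    -o writes the trace to a file, -tt adds wall-clock timestamps with
    microseconds, -p attaches to the given PID. No program path follows -p,
    because strace would otherwise start a second process as well.
    """
    return ["strace", "-o", log_path, "-tt", "-p", str(pid)]
```

A possible use, as root: run `subprocess.run(build_strace_argv(main_pid()))` in one terminal, then issue `systemctl stop irqbalance.service` in another and attach the resulting log.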
Comment 6 Rainer Kaluscha 2023-12-13 20:26:26 UTC
irqbalance-1.9.3.10.git+1a7d461-150500.260.1.x86_64.rpm works on my primary Linux box - no hang / delay when stopping the service. 

Great, thanks!

P.S. I will also test on my laptop in the next few days ...
Comment 7 Arvin Schnell 2023-12-14 08:01:24 UTC
OK, that is good to hear. Unfortunately we still do not know what actually
caused the hang. I cannot say whether we can make an update for 15.5 based
on that.
Comment 8 Rainer Kaluscha 2023-12-14 08:17:28 UTC
Fortunately, the error is reproducible. If it helps, I can still do an strace on a buggy version, e.g. the current version from OSS 15.5 main repo (1.9.2-150500.1.3).
Comment 9 Rainer Kaluscha 2023-12-14 10:48:01 UTC
Created attachment 871348
strace-log for irqbalance 1.9.2 after systemctl stop - takes 30 sec
Comment 10 Rainer Kaluscha 2023-12-14 10:49:33 UTC
Created attachment 871349
strace-log for irqbalance 1.9.3 after systemctl stop - no delay

irqbalance-1.9.3.10.git+1a7d461-150500.260.1.x86_64.rpm
Comment 11 Rainer Kaluscha 2023-12-14 10:50:30 UTC
Comment on attachment 871348
strace-log for irqbalance 1.9.2 after systemctl stop - takes 30 sec

irqbalance-1.9.2-150500.1.3.x86_64.rpm
Comment 12 Arvin Schnell 2023-12-19 08:30:38 UTC
Thanks for the logs. AFAIS irqbalance is stuck in recvmsg() which is used
when communicating with irqbalance-ui. Are you using irqbalance-ui or any
other UI connected to irqbalance?
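
One way to spot where a traced process sits idle is to look for the largest jump between consecutive timestamps in a `strace -tt` log. The sketch below assumes the line layout strace normally writes with `-tt` (an optional PID column, then `HH:MM:SS.microseconds`, then the call); `longest_gap` is a hypothetical helper, and the gap it reports is only an approximation of the time spent in the preceding call.

```python
import re
from typing import Iterable, Optional, Tuple

# Optional PID column, then a -tt timestamp, then the rest of the line.
_LINE = re.compile(r"^(?:\d+\s+)?(\d{2}):(\d{2}):(\d{2})\.(\d+)\s+(.*)$")


def _parse(line: str) -> Optional[Tuple[float, str]]:
    """Return (seconds since midnight, rest of line), or None if no timestamp."""
    match = _LINE.match(line)
    if match is None:
        return None
    hours, minutes, seconds, fraction, rest = match.groups()
    stamp = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return stamp + int(fraction) / 10 ** len(fraction), rest


def longest_gap(lines: Iterable[str]) -> Optional[Tuple[float, str]]:
    """Return (gap in seconds, line that preceded the gap), or None."""
    best = None
    previous = None
    for line in lines:
        parsed = _parse(line)
        if parsed is None:
            continue  # skip lines without a timestamp
        if previous is not None:
            gap = parsed[0] - previous[0]
            if gap < 0:
                gap += 86400  # the trace crossed midnight
            if best is None or gap > best[0]:
                best = (gap, previous[1])
        previous = parsed
    return best
```

For example, `longest_gap(open("irqbalance.log"))` returns the longest pause together with the call that was logged just before it, which is a reasonable first candidate for the blocking call.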
Comment 13 Rainer Kaluscha 2023-12-19 17:40:13 UTC
"systemctl stop irqbalance.service" hangs with irqbalance-1.9.2-150500.1.3 even if irqbalance-ui isn't installed. The service comes up without errors, it just warns about "thermal: received a netlink error (Interrupted system call)".

P.S. Did I mention that I disabled IPv6 on my box (using sysctl: net.ipv6.conf.all.disable_ipv6 = 1)?
Comment 14 Arvin Schnell 2023-12-20 08:42:26 UTC
Looks to be related to https://github.com/Irqbalance/irqbalance/issues/259.
Unfortunately the fix is spread over several commits.

Still, I wonder why I cannot reproduce it. Do you have the standard SUSE
kernel (with CONFIG_THERMAL_NETLINK=y)?
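
Whether the running kernel was built with this option can be checked from its config file. This is a minimal sketch, assuming the config is installed as `/boot/config-<release>` or exposed as `/proc/config.gz` (the latter only exists if the kernel was built with that feature); `kernel_option` and `read_running_config` are illustrative names, not an existing tool.

```python
import gzip
import os
import re
from typing import Optional


def kernel_option(config_text: str, name: str) -> Optional[str]:
    """Return the value of a kernel config option ('y', 'm', ...).

    Returns None when the option is absent or listed as
    '# CONFIG_... is not set'.
    """
    pattern = re.compile(r"^%s=(.*)$" % re.escape(name), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


def read_running_config() -> str:
    """Read the config of the running kernel from the usual locations."""
    boot_path = "/boot/config-" + os.uname().release
    if os.path.exists(boot_path):
        with open(boot_path, "r") as handle:
            return handle.read()
    # Fallback; raises FileNotFoundError if the kernel does not expose it.
    with gzip.open("/proc/config.gz", "rt") as handle:
        return handle.read()
```

Calling `kernel_option(read_running_config(), "CONFIG_THERMAL_NETLINK")` should give `"y"` on a kernel built with the option and `None` on a custom kernel built without it.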
Comment 15 Rainer Kaluscha 2023-12-20 14:09:09 UTC
Bingo :-)

Enabling CONFIG_THERMAL_NETLINK=y resolved the issue ...

The startup message changed from "thermal: received a netlink error" to "thermal: received group id (3)", and the service now stops without delay when asked to do so.
Comment 16 Arvin Schnell 2023-12-21 07:36:17 UTC
So in the initial report you wrote that you tested with different
kernels including the "distro" kernel. But apparently that did not
mean the 15.5 kernel RPM which has CONFIG_THERMAL_NETLINK=y. Looks
invalid to me.