Bugzilla – Bug 1214307
System hangs on stop job for irqbalance after Upgrade from 15.4 to 15.5
Last modified: 2023-12-21 07:36:17 UTC
I experienced the same problem on three machines (with completely different hardware) after upgrading from OSS 15.4 to 15.5: On shutdown, the system hangs for 90 seconds on a stop job for irqbalance. During normal operation, systemctl restart irqbalance also takes a very long time. BTW: I don't use thermald, so irqbalance complains about not being able to bind to its socket. I tested with different kernels (including the current distro kernel and 6.4.10) - no difference. P.S. Downgrading irqbalance from the current 1.9.2 to 1.9.1 or 1.8.1 (from OSS 15.4) resolved the issue for me.
Can you try a build from here please: https://build.opensuse.org/package/show/home:trenn/irqbalance.SUSE_SLE-15-SP5_Update
FYI: I'll be on vacation from tomorrow on.
Have a nice vacation. Unfortunately, irqbalance-1.9.2-150500.1.1.x86_64.rpm from your repo also hangs on systemctl stop irqbalance.service :-( Going back to 1.9.1 ... So long, Rainer
And irqbalance-1.9.2-150500.1.2 hangs, too ...
If you want you can try the latest version from https://build.opensuse.org/package/show/home:aschnell:branches:Base:System/irqbalance. And please, if you give version numbers, include the complete version, e.g. on 15.4 we have 1.8.0.18.git+2435e8d. Can you try to attach strace to the irqbalance process and then use 'systemctl stop'? E.g. 'strace -o irqbalance.log -tt -p <pid>', where you get the pid from 'systemctl status irqbalance.service'?
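A minimal sketch of the strace request above as shell functions, in case it helps to script the reproduction. The function names and the log file name are illustrative assumptions, not part of irqbalance or systemd. The sketch passes only -p to strace, so strace attaches to the running daemon rather than starting a second irqbalance, and it assumes a systemctl that supports 'show --value'.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not shipped with any package.

# Print the main PID of a systemd unit (systemd prints 0 if it is not running).
unit_main_pid() {
    systemctl show --property MainPID --value "$1"
}

# Attach strace to the unit's main process in the background, stop the unit,
# then wait for strace to finish once the traced process has exited.
trace_unit_stop() {
    unit=$1
    log=$2
    pid=$(unit_main_pid "$unit")
    if [ -z "$pid" ] || [ "$pid" = "0" ]; then
        echo "$unit is not running" >&2
        return 1
    fi
    strace -o "$log" -tt -p "$pid" &
    strace_job=$!
    sleep 1   # give strace a moment to attach before stopping the service
    systemctl stop "$unit"
    wait "$strace_job"
}

# Usage (as root):
#   trace_unit_stop irqbalance.service irqbalance.log
```

The timestamps written by -tt make it possible to see in the log where the time between the stop request and the process exit is spent.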
irqbalance-1.9.3.10.git+1a7d461-150500.260.1.x86_64.rpm works on my primary Linux box - no hang / delay when stopping the service. Great, thnx ! P.S. I will test also on my laptop in the next days ...
OK, that is good to hear. Unfortunately we still do not know what actually caused the hang. I cannot say whether we can make an update for 15.5 based on that.
Fortunately, the error is reproducible. If it helps, I can still do an strace on a buggy version, e.g. the current version from OSS 15.5 main repo (1.9.2-150500.1.3).
Created attachment 871348 [details] strace-log for irqbalance 1.9.2 after systemctl stop - takes 30 sec
Created attachment 871349 [details] strace-log for irqbalance 1.9.3 after systemctl stop - no delay irqbalance-1.9.3.10.git+1a7d461-150500.260.1.x86_64.rpm
Comment on attachment 871348 [details] strace-log for irqbalance 1.9.2 after systemctl stop - takes 30 sec irqbalance-1.9.2-150500.1.3.x86_64.rpm
Thanks for the logs. AFAIS irqbalance is stuck in recvmsg() which is used when communicating with irqbalance-ui. Are you using irqbalance-ui or any other UI connected to irqbalance?
"systemctl stop irqbalance.service" hangs with irqbalance-1.9.2-150500.1.3 even if irqbalance-ui isn't installed. The service comes up without errors, it just warns about "thermal: received a netlink error (Interrupted system call)". P.S. Did I mention that I disabled IPv6 on my box (using sysctl: net.ipv6.conf.all.disable_ipv6 = 1)?
Looks to be related to https://github.com/Irqbalance/irqbalance/issues/259. Unfortunately the fix is spread over several commits. Still I wonder why I cannot reproduce it. Do you have the standard SUSE kernel (with CONFIG_THERMAL_NETLINK=y)?
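For anyone answering the kernel question above on a self-built kernel, here is a rough sketch of a check for the option. The function name is hypothetical, and it assumes the config is readable at /boot/config-$(uname -r) or /proc/config.gz, which is not guaranteed on every kernel.

```shell
#!/bin/sh
# Sketch only: kconfig_is_enabled is an illustrative helper, not a standard tool.

# kconfig_is_enabled OPTION [FILE]
# Returns 0 if the config contains "OPTION=y", 1 if it does not,
# and 2 if no readable kernel config was found.
kconfig_is_enabled() {
    option=$1
    file=${2:-/boot/config-$(uname -r)}
    if [ -r "$file" ]; then
        grep -q "^${option}=y\$" "$file"
    elif [ -z "${2:-}" ] && [ -r /proc/config.gz ]; then
        # Fall back to the in-kernel copy only when no FILE was given.
        zcat /proc/config.gz | grep -q "^${option}=y\$"
    else
        return 2
    fi
}

# Usage:
#   kconfig_is_enabled CONFIG_THERMAL_NETLINK && echo "thermal netlink enabled"
```

Matching on "=y" is enough here under the assumption that the option is a plain on/off switch; a disabled option appears as a "# ... is not set" comment or is absent.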
Bingo :-) Enabling CONFIG_THERMAL_NETLINK=y resolved the issue ... The startup message changed from "thermal: received a netlink error" to "thermal: received group id (3)" and the service stops without delay when asked to do so.
So in the initial report you wrote that you tested with different kernels, including the "distro" kernel. But apparently that did not mean the 15.5 kernel RPM, which has CONFIG_THERMAL_NETLINK=y. Looks invalid to me.