Bugzilla – Bug 115115
irqbalance causes system lockup
Last modified: 2005-09-12 11:01:42 UTC
Shortly after starting /usr/sbin/irqbalance, the system locks up.
Hardware: Asus P4S800D-E motherboard, Intel P4 CPU, hyper-threading enabled
Package: irqbalance, version 0.09, release 42
Kernel: Linux 2.6.13-3-smp #1 SMP Mon Aug 29 19:48:23 UTC 2005
The system would lock up shortly after irqbalance was started during boot. Booting in safe mode disabled SMP and irqbalance, which allowed the system to boot.
-----
Disabled start of irqbalance on boot: "chkconfig --del irq_balancer"
The system was then able to boot normally with hyper-threading enabled.
The system ran normally until irqbalance was started manually with "rcirq_balancer start". Shortly afterward the system would lock up. This is repeatable.
What kind of lockup? When you run in console mode and do klogconsole -l8 -r0 first do you see something? Does the system still react to console switches then? Also please attach hwinfo output.
Created attachment 48861 [details] Output from hwinfo
Hmm, you seem to have a SiS chipset. Maybe those don't like setting the IRQ affinity and have some APIC bugs. Did earlier releases work? I guess on an HT system like this we can just disable it, because irq balancing only really makes sense on a real multi-socket SMP system.
> What kind of lockup? Does the system still react to console switches then?
Within a few seconds of starting irqbalance:
System fails to respond to keyboard inputs.
Unable to switch between VTTYs.
System stops displaying output to attached monitor.
Further testing shows existing SSH connections continue to function and NEW connections may be established. Attempted to execute "top" from an SSH connection and the session appeared to hang. Had to kill the "top" process from another session to get back to a command prompt.
> When you run in console mode and do klogconsole -l8 -r0 first do you see something?
No. Booted system with "console=ttyS0,115200n8" kernel arg. Then ran "klogconsole -l8 -r0" before starting irqbalance. No output to VTTY0, serial port, dmesg or /var/log/messages.
> Did earlier releases work?
Unknown. Did not try with earlier 10.0x releases.
Ok. Wild theory: irqbalanced touches the keyboard interrupt and that chipset doesn't like that. What happens when you don't start irqbalanced but just do over a ssh connection as root
    cut -d: -f1 /proc/interrupts | while read i ; do
        echo $i 1
        echo 1 > /proc/irq/$i/smp_affinity
        sleep 1
        echo $i 2
        echo 2 > /proc/irq/$i/smp_affinity
        sleep 1
    done
Does that lock up too? What is the last output you see? Also attach /proc/interrupts.
> Unknown. Did not try with earlier 10.0x releases
I meant releases before 10 like 9.3 or earlier
Re: Behavior on pre 10.0 releases
Unknown. irqbalance was not installed with 9.3 on this hardware. As 9.3 is no longer installed on this hardware this would be difficult to test.
Re: test script
    linux:~ # ./test.sh
    CPU0 CPU1 1
    ./test.sh: line 4: /proc/irq/CPU0 CPU1/smp_affinity: No such file or directory
    CPU0 CPU1 2
    ./test.sh: line 7: /proc/irq/CPU0 CPU1/smp_affinity: No such file or directory
    0 1
    0 2
    <Hung. Needed to kill test.sh process from another ssh session>
Ran a second time, output was:
    linux:~ # ./test.sh
    CPU0 CPU1 1
    ./test.sh: line 4: /proc/irq/CPU0 CPU1/smp_affinity: No such file or directory
    <Hung. Needed to kill test.sh process from another ssh session>
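Note on the test script: the "No such file or directory" errors come from the header line of /proc/interrupts, which contains no colon and therefore passes through cut unchanged. Non-numeric labels such as NMI or LOC, where present, would fail the same way. A minimal sketch of a variant that only touches numeric IRQs is shown here. The function names list_irqs and cycle_affinity are hypothetical, and the masks 1 and 2 assume the same two logical CPUs as the original test. It needs root, and on affected hardware it may still hang when it reaches IRQ 0.

```shell
#!/bin/sh

# list_irqs [FILE]: print the numeric IRQ numbers found in a
# /proc/interrupts-style file, one per line. The header line and
# non-numeric labels (NMI, LOC, ERR, ...) are skipped.
list_irqs() {
    awk -F: '$1 ~ /^[ \t]*[0-9]+$/ { gsub(/[ \t]/, "", $1); print $1 }' \
        "${1:-/proc/interrupts}"
}

# cycle_affinity [FILE]: for each numeric IRQ, write CPU mask 1 and then
# CPU mask 2 to its smp_affinity file, printing progress first so the
# last line of output identifies the write that hung.
cycle_affinity() {
    list_irqs "$1" | while read -r irq; do
        for mask in 1 2; do
            echo "$irq $mask"
            echo "$mask" > "/proc/irq/$irq/smp_affinity"
            sleep 1
        done
    done
}

# Nothing runs automatically. To reproduce as root: cycle_affinity
```

Printing the IRQ and mask before each write keeps the original script's property that the last visible line names the operation in progress when the hang occurs.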
Created attachment 48864 [details] contents of /proc/interrupts
It looks like it locks up when trying to change irq 0. Stefan, can you just make irqbalanced ignore irq 0?
Paul, does this work with RC1? In RC1, irqbalance is not installed on P4. (doesn't make sense with 1 HT CPU anyways)
The exact rule is: It is only installed on 64-bit x86-64 SMP systems. So, after an rpm -e irqbalance, an update should not install it again. Lowering priority.
> The exact rule is: It is only installed on 64-bit x86-64 SMP systems. In that case part of the fault is with the installer as irqbalance was installed as part of a "Standard system with KDE" install. As an error proofing measure would it be difficult to modify irqbalance to verify it is on appropriate hardware before continuing to run? I plan on doing a clean install of RC1 later today on the same hardware and will report back on what the installer does with irqbalance.
That it causes a lockup on your system is a hardware bug in your chipset. It is hard for irqbalanced to predict hardware bugs like this. irqbalanced itself is not to blame.
> irqbalanced itself is not to blame.
I agree, irqbalance should not have been installed or run on this hardware to begin with, so perhaps this is better described as an installer bug. However, as you mentioned above, it doesn't make much sense to run it on a single CPU system. What I was suggesting is that perhaps irqbalanced could do a sanity check for this condition before it runs. I am NOT suggesting that there should be some sort of test/exception table for every chipset. In any event, I appreciate everyone's time on this and will report back on how RC1 deals with the install issue.
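A sanity check of the kind suggested above could look roughly like the following sketch. It assumes /proc/cpuinfo exposes a "physical id" line for each logical CPU, with hyper-threaded siblings sharing the same value. The function names count_packages and should_balance are hypothetical and are not part of irqbalance.

```shell
#!/bin/sh

# count_packages [FILE]: print the number of distinct "physical id"
# values in a /proc/cpuinfo-style file. If the field is absent, assume
# a single physical package and print 1.
count_packages() {
    n=$(awk -F: '
        /^physical id/ { gsub(/[ \t]/, "", $2); ids[$2] = 1 }
        END { c = 0; for (k in ids) c++; print c }
    ' "${1:-/proc/cpuinfo}")
    [ "$n" -gt 0 ] && echo "$n" || echo 1
}

# should_balance [FILE]: succeed only when more than one physical
# package is present, i.e. a real multi-socket system.
should_balance() {
    [ "$(count_packages "$1")" -gt 1 ]
}

# Possible use at the top of an init script (hypothetical):
#   should_balance || exit 0
```

This only distinguishes one socket from several. It would not detect the chipset problem itself, which matches the point that no per-chipset exception table is being proposed.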
There are plans to make irqbalanced hyperthreading/dual core aware, but not for 10.0
I have reinstalled the system with 10.0-RC1 using the same autoyast control file as before. This time irqbalance was NOT installed. The problem appears to have been resolved. Thanks to everyone for their time and effort on this issue.
Thanks for testing again. As this seems to be the only system with this problem, I'll close it now.