Bugzilla – Full Text Bug Listing

| Summary: | irqbalance causes system lockup | ||
|---|---|---|---|
| Product: | [openSUSE] SUSE LINUX 10.0 | Reporter: | Paul Beltrani <echo> |
| Component: | Other | Assignee: | Stefan Fent <stefan.fent> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Major | ||
| Priority: | P5 - None | ||
| Version: | Beta 4 | ||
| Target Milestone: | --- | ||
| Hardware: | i686 | ||
| OS: | SUSE Other | ||
| Whiteboard: | |||
| Found By: | Other | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | Output from hwinfo; contents of /proc/interrupts | | |

Description
Paul Beltrani
2005-09-03 01:23:44 UTC
What kind of lockup? When you run in console mode and do klogconsole -l8 -r0 first do you see something? Does the system still react to console switches then? Also please attach hwinfo output.

Created attachment 48861 [details]
Output from hwinfo

Hmm, you seem to have an SIS chipset. Maybe those don't like setting the IRQ affinity and have some APIC bugs. Did earlier releases work?

I guess on a HT system like this we can just disable it because irq balancing only really makes sense on a real multi socket SMP system.

> What kind of lockup? Does the system still react to console switches then?
Within a few seconds of starting irqbalance:
System fails to respond to keyboard inputs.
Unable to switch between VTTYs.
System stops displaying output to attached monitor.
Further testing shows existing SSH connections continue to function and NEW connections may be established. Attempted to execute "top" from an SSH connection and the session appeared to hang. Had to kill the "top" process from another session to get back to a command prompt.

> When you run in console mode and do klogconsole -l8 -r0 first do you see something?
No. Booted system with "console=ttyS0,115200n8" kernel arg. Then ran "klogconsole -l8 -r0" before starting irqbalance. No output to VTTY0, serial port, dmesg or /var/log/messages.

> Did earlier releases work?
Unknown. Did not try with earlier 10.0x releases

Ok. Wild theory: irqbalanced touches the keyboard interrupt and that chipset
doesn't like that.
What happens when you don't start irqbalanced but just do over a ssh connection
as root
cut -d: -f1 /proc/interrupts | while read i ; do
    echo $i 1
    echo 1 > /proc/irq/$i/smp_affinity
    sleep 1
    echo $i 2
    echo 2 > /proc/irq/$i/smp_affinity
    sleep 1
done
Does that lock up too? What is the last output you see? Also
attach /proc/interrupts.
>Unknown. Did not try with earlier 10.0x releases
I meant releases before 10 like 9.3 or earlier

Re: Behavior on pre 10.0 releases
Unknown. irqbalance was not installed with 9.3 on this hardware. As 9.3 is no longer installed on this hardware this would be difficult to test.

Re: test script
linux:~ # ./test.sh
CPU0 CPU1 1
./test.sh: line 4: /proc/irq/CPU0 CPU1/smp_affinity: No such file or directory
CPU0 CPU1 2
./test.sh: line 7: /proc/irq/CPU0 CPU1/smp_affinity: No such file or directory
0 1
0 2
<Hung. Needed to kill test.sh process from another ssh session>

Ran a second time, output was:
linux:~ # ./test.sh
CPU0 CPU1 1
./test.sh: line 4: /proc/irq/CPU0 CPU1/smp_affinity: No such file or directory
<Hung. Needed to kill test.sh process from another ssh session>

Created attachment 48864 [details]
contents of /proc/interrupts
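
The "No such file or directory" errors in the test output come from the "CPU0 CPU1" header row of /proc/interrupts, which the cut pipeline passes through as if it were an IRQ number. A minimal sketch of a variant that only visits numeric IRQ rows is shown here. It is not the script that was actually run on the reporter's machine; the function names and the DRY_RUN switch are hypothetical and only serve the illustration. With DRY_RUN left at its default of 1, nothing is written to smp_affinity.

```shell
#!/bin/sh
# Sketch only. list_numeric_irqs, probe_affinity and DRY_RUN are
# hypothetical names, not part of the original test script.

list_numeric_irqs() {
    # $1: path to a file in /proc/interrupts format
    cut -d: -f1 "$1" | while read -r irq; do
        case "$irq" in
            ''|*[!0-9]*) continue ;;  # header row, blank line, or named row (NMI, LOC, ERR)
            *) echo "$irq" ;;
        esac
    done
}

probe_affinity() {
    # $1: interrupts file, $2: directory with per-IRQ entries (normally /proc/irq)
    for irq in $(list_numeric_irqs "$1"); do
        for mask in 1 2; do
            # Print before writing, so the last line seen identifies the IRQ that hung.
            echo "$irq $mask"
            if [ "${DRY_RUN:-1}" = "0" ]; then
                echo "$mask" > "$2/$irq/smp_affinity"
                sleep 1
            fi
        done
    done
}
```

To repeat the original experiment as root, one would call `DRY_RUN=0 probe_affinity /proc/interrupts /proc/irq`; on affected hardware this is expected to hang in the same way, so it should be run from an SSH session that can be abandoned.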

It looks like it locks up when trying to change irq 0. Stefan, can you just make irqbalanced ignore irq 0?

Paul, does this work with RC1?

In RC1, irqbalance is not installed on P4. (doesn't make sense with 1 HT CPU anyways)

The exact rule is: It is only installed on 64-bit x86-64 SMP systems. So, after an rpm -e irqbalance, an update should not install it again.

Lowering priority.

> The exact rule is: It is only installed on 64-bit x86-64 SMP systems.
In that case part of the fault is with the installer as irqbalance was installed
as part of a "Standard system with KDE" install.
As an error proofing measure would it be difficult to modify irqbalance to
verify it is on appropriate hardware before continuing to run?
I plan on doing a clean install of RC1 later today on the same hardware and will
report back on what the installer does with irqbalance.

That it causes a lockup on your system is a hardware bug in your chipset. It is hard for irqbalanced to predict hardware bugs like this. irqbalanced itself is not to blame.

> irqbalanced itself is not to blame.
I agree, irqbalance should not have been installed or run on this hardware to
begin with so perhaps this is better described as an installer bug.
However as you mentioned above, it doesn't make much sense to run it on a single
CPU system. What I was suggesting is that perhaps irqbalanced could do a sanity
check for this condition before it runs. I am NOT suggesting that there should
be some sort of test/exception table for every chipset.
In any event, I appreciate everyone's time on this and will report back on how
RC1 deals with the install issue.
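
The sanity check suggested above could look roughly like the following sketch. It is not what irqbalance does; it only illustrates the idea of refusing to start unless more than one physical CPU package is present. It assumes /proc/cpuinfo carries one "physical id" line per logical CPU, as x86 kernels usually provide, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only. count_packages and should_balance are hypothetical helpers.

count_packages() {
    # $1: path to a file in /proc/cpuinfo format
    # Prints the number of distinct "physical id" lines (0 if the field is absent).
    grep '^physical id' "$1" | sort -u | wc -l | tr -d ' '
}

should_balance() {
    # Succeeds only when more than one physical package is visible,
    # so a single hyperthreaded CPU (two logical CPUs, one package) fails the check.
    [ "$(count_packages "$1")" -gt 1 ]
}
```

An init script could then guard the daemon with something like `should_balance /proc/cpuinfo && irqbalance`. A kernel that does not report "physical id" at all would count as zero packages here, so the daemon would simply not start.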

There are plans to make irqbalanced hyperthreading/dual core aware, but not for 10.0

I have reinstalled the system with 10.0-RC1 using the same autoyast control file as before. This time irqbalance was NOT installed. The problem appears to have been resolved. Thanks to everyone for their time and effort on this issue.

Thanks for testing again, as this seems to be the only system with this problem, I'll close it now.