Bugzilla – Bug 155767
Thermal polling badness on Sun W2100z and Sun W1100z boxes
Last modified: 2006-03-13 14:48:41 UTC
This problem has been around in various forms since 9.3. It was especially bad in 10.0. I just installed 10.1beta6 and it is still there.

Every few seconds the keyboard and mouse freeze. This makes it very difficult to type in your password, type any commands, etc. My typing is bad enough without the OS stealing a random character every few seconds.

Andi had a kernel patch. I thought it went in?

The workaround is to change /etc/sysconfig/powersave/thermal:

THERMAL_POLLING_FREQUENCY="1000000"

Please check this out and consider a fix for 10.0 final. Sun sold many of these boxes. I have received many requests from people outside my organization for help on fixing this.
On Thursday 12 January 2006 11:20, Thomas Renninger wrote:
> Andi Kleen wrote:
>> On Wednesday 11 January 2006 15:39, Eric Whiting wrote:
>>> Yes -- major problem. Should be fixed in new kernels (right Andi?)
>>
>> Somehow my patch to not turn off interrupts and use sleeping delays
>> during polling doesn't seem to have made it anywhere. Thomas, did you
>> follow it? burst mode didn't work on that box for some reason.
>
> This one seem to fix it for some machines, but not this one?
> The patch is in 10.0...

Yes, but how about mainline? I guess I need to resubmit it there.

>>> (115459 in bugzilla has details -- the fix was supposed to be in
>>> 10.0RC3 but has not made it into the 10.0 kernels yet.)
>
> I thought it made it into the goldmaster?
> If not it must be in the last YOU update kernel, if this one has the same
> problem, the patch does not help for these machines.

It should be in YOU at least, right.

-Andi
Hmm, I guess I need to resurrect the patch to do the polling with interrupts on. It might not have made it into the 10.1 codebase. Or didn't you have a different fix for the bug at some point, Thomas?
Another fix: there is a new BIOS from Sun that fixes the issue. 2.3 fixes it. http://www.sun.com/desktop/workstation/w1100z/downloads.jsp

In the earlier thread regarding this issue I had updated my BIOS to whatever was current at the time. That link from Sun mentions 'jittery' performance.
Ahhh, that makes sense. AFAIK Andi's patch was integrated in some form. There is now an ec_intr=0/1 switch: ec_intr=1 should be the default, ec_intr=0 will force polling. "Burst" and "Interrupt" were mixed up here and the naming has been corrected; ec_burst has been marked for future usage and the parts making use of burst mode have been ripped out again, IIRC.

-> No action needed? Closing ...

Thanks Eric for keeping an eye on this.
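For readers unfamiliar with the ec_intr switch mentioned above, here is a minimal userspace sketch of the selection logic as described in this comment: interrupt mode unless ec_intr=0 is given. It assumes the option is passed as ec_intr=<int> on a space-separated command line. The enum and function names are hypothetical and are not the kernel's own identifiers, and this is not the kernel's parsing code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for illustration; not the kernel's identifiers. */
enum ec_mode { EC_MODE_POLL = 0, EC_MODE_INTR = 1 };

/*
 * Scan a space-separated command line for a token "ec_intr=<int>".
 * Returns EC_MODE_INTR when the option is absent or has no number
 * (the default described in the comment above), EC_MODE_POLL for 0,
 * and EC_MODE_INTR for any non-zero value. The first match wins.
 */
enum ec_mode ec_mode_from_cmdline(const char *cmdline)
{
    static const char key[] = "ec_intr=";
    const size_t keylen = sizeof(key) - 1;
    const char *p = cmdline;

    while (p && *p) {
        /* Skip separators so p points at the start of a token. */
        while (*p == ' ')
            p++;
        if (strncmp(p, key, keylen) == 0) {
            char *end;
            long v = strtol(p + keylen, &end, 10);
            if (end != p + keylen)      /* at least one digit was parsed */
                return v ? EC_MODE_INTR : EC_MODE_POLL;
        }
        /* Advance to the end of the current token. */
        while (*p && *p != ' ')
            p++;
    }
    return EC_MODE_INTR;
}
```

With this sketch, a command line containing ec_intr=0 selects polling, and anything else leaves interrupt mode in place.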
Bugzilla has the original Andi patch: https://bugzilla.novell.com/show_bug.cgi?id=115459#c14

Checking against 2.6.15.6, I don't see any of Andi's diffs in the main kernel. There were some good fixes in there... ???

At the time of the original report, toggling ec_intr did not help much.

Fixes:
1. Change thermal polling to 100000s (not a good fix)
2. Update the Sun BIOS (not always possible)
3. Reconsider the kernel patch -- your decision on that.
We should probably reconsider the kernel patch to not turn off interrupts during EC reading. Luming, can you perhaps review it? It's the patch in https://bugzilla.novell.com/show_bug.cgi?id=115459#c14
This is the similar patch: http://bugzilla.kernel.org/show_bug.cgi?id=5764#c1

I think it is OK. But the question is why ec_read/write/query get so many chances to run on these boxes. So I think this patch is just a workaround; we still need to figure out whether there are other unknown issues. acpidump will help me understand what's going on when evaluating _TZP.

The weird thing is that ec_intr mode doesn't disable interrupts, so it should help. But it doesn't. So I think some other unknown issue has a significant impact on the results.
I compared your (Andi's) patch with what is currently in mainline: ec_intr=0/1 is the switch to turn off/on Andi's "not disabling irqs" patch. That means we have ec_intr=1 by default and Andi's patch should already be active.

Only these changes are missing (not sure what kind of impact that might have here):

The in_interrupt checks that avoid waiting on a mutex when in interrupt context are missing:

    if (in_interrupt())
        return_VALUE(-ENODEV);

There are some very small changes (udelay/mdelay - Andi is using msleep instead of usleep) but I doubt they cause any functional change ...

I believe:
a) This board was always broken with the old BIOS.
b) If it ever worked: did disabling interrupts and the old ec_burst changes make it work? But the burst changes have been thrown out and marked ACPI_FUTURE_USAGE, so there is no chance of getting this in for recent kernels.
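To make the two differences discussed above concrete, here is a rough userspace simulation of a status poll that (1) refuses to wait when called from interrupt context and (2) sleeps between polls instead of busy-waiting. It is only a sketch under stated assumptions: struct fake_ec, its fields, EC_FLAG_OBF, and ec_wait_sleeping are hypothetical stand-ins, nanosleep() plays the role that msleep() would play in the kernel, and none of this is taken from Andi's patch or from mainline.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical stand-ins for the hardware and for kernel state. */
struct fake_ec {
    unsigned char status;   /* simulated EC status register */
    int polls_until_ready;  /* polls left before the flag appears; 0 = never */
    int in_interrupt;       /* stand-in for the kernel's in_interrupt() */
};

#define EC_FLAG_OBF 0x01    /* illustrative "output buffer full" bit */

/* Simulated register read: the flag shows up after a fixed number of polls. */
static unsigned char fake_ec_read_status(struct fake_ec *ec)
{
    if (ec->polls_until_ready > 0 && --ec->polls_until_ready == 0)
        ec->status |= EC_FLAG_OBF;
    return ec->status;
}

/* Sleep instead of spinning, so other work can run while we wait. */
static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/*
 * Wait until any bit in mask is set in the status register.
 * Returns 0 on success, -ENODEV if called from (simulated) interrupt
 * context where sleeping is not allowed, -ETIMEDOUT on timeout.
 */
int ec_wait_sleeping(struct fake_ec *ec, unsigned char mask, int timeout_ms)
{
    if (ec->in_interrupt)
        return -ENODEV;

    for (int waited = 0; waited <= timeout_ms; waited++) {
        if (fake_ec_read_status(ec) & mask)
            return 0;
        sleep_ms(1);
    }
    return -ETIMEDOUT;
}
```

The guard comes first because a sleeping wait is only safe in process context. A busy-wait with interrupts disabled would avoid that restriction but would stall keyboard and mouse input for the duration of the poll, which matches the symptom reported in this bug.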