Bug 155767 - Thermal polling badness on Sun W2100z and Sun W1100z boxes
Status: RESOLVED FIXED
Alias: None
Product: SUSE Linux 10.1
Classification: openSUSE
Component: Kernel
Version: Beta 6
Hardware: x86-64 Other
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: E-mail List
QA Contact: E-mail List
Reported: 2006-03-07 16:25 UTC by Eric Whiting
Modified: 2006-03-13 14:48 UTC

Found By: Other
Description Eric Whiting 2006-03-07 16:25:58 UTC
This problem has been around in various forms since 9.3. It was especially bad in 10.0. I just installed 10.1beta6 and it is still there...

Every few seconds the keyboard and mouse freeze. This makes it very difficult to type in your password, type any commands, etc. My typing is bad enough without the OS stealing a random character every few seconds.

Andi had a kernel patch. I thought it went in?

The workaround is to change this setting in /etc/sysconfig/powersave/thermal:

THERMAL_POLLING_FREQUENCY="1000000"

Please check this out and consider a fix for 10.0 final. 

Sun sold many of these boxes. I have received many requests from people outside my organization for help on fixing this.
Comment 1 Eric Whiting 2006-03-07 16:27:51 UTC
On Thursday 12 January 2006 11:20, Thomas Renninger wrote:
> Andi Kleen wrote:
> > On Wednesday 11 January 2006 15:39, Eric Whiting wrote:
> >> Yes -- major problem. Should be fixed in new kernels (right Andi?)
> >
> > Somehow my patch to not turn off interrupts and use sleeping delays
> > during polling doesn't seem to have made it anywhere. Thomas, did you
> > follow it? burst mode didn't work on that box for some reason.
>
> This one seems to fix it for some machines, but not this one?
> The patch is in 10.0...

Yes, but how about mainline? I guess I need to resubmit it there.

> >> (115459 in bugzilla has details -- the fix was supposed to be in
> >> 10.0RC3 but has not made it into the 10.0 kernels yet.)
>
> I thought it made it into the goldmaster?
> If not it must be in the last YOU update kernel, if this one has the same
> problem, the patch does not help for these machines.

It should be in YOU at least, right.

-Andi
Comment 2 Andreas Kleen 2006-03-07 20:01:35 UTC
Hmm, I guess I need to resurrect the patch to do the polling with interrupts
on. It might not have made it into the 10.1 codebase.

Or didn't you have a different fix for the bug at some point, Thomas?


Comment 3 Eric Whiting 2006-03-08 00:36:24 UTC
Another fix: Sun has released a new BIOS, version 2.3, that fixes the issue.

http://www.sun.com/desktop/workstation/w1100z/downloads.jsp

In the earlier thread regarding this issue I had updated my BIOS to whatever was current at the time. That link from Sun mentions 'jittery' performance.


Comment 4 Thomas Renninger 2006-03-08 05:45:43 UTC
Ahhh, that makes sense.
AFAIK Andi's patch was integrated in some form. There is now an ec_intr=0/1 switch:
ec_intr=1 should be the default, and ec_intr=0 will force polling.
"Burst" and "Interrupt" were mixed up here and the naming has been corrected. ec_burst has been marked for future usage, and the parts making use of burst mode were ripped out again IIRC.
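For readers unfamiliar with the switch, here is a minimal user-space sketch of how an ec_intr=<int> boot option maps to a mode, assuming the semantics described above (option absent or non-zero means interrupt mode, 0 means polling). The names ec_mode, EC_MODE_POLL, EC_MODE_INTR and ec_mode_from_cmdline are hypothetical and are not the kernel's identifiers; the kernel's real option handling is not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for illustration; not the kernel's own identifiers. */
enum ec_mode { EC_MODE_POLL = 0, EC_MODE_INTR = 1 };

/*
 * Scan a space-separated command line for "ec_intr=<int>".
 * Returns EC_MODE_INTR when the option is absent (the default described
 * in this thread), EC_MODE_POLL for ec_intr=0, and EC_MODE_INTR for any
 * other numeric value. Assumption of this sketch: a value that does not
 * parse as a number is treated like 0.
 */
enum ec_mode ec_mode_from_cmdline(const char *cmdline)
{
    static const char key[] = "ec_intr=";
    const size_t key_len = sizeof(key) - 1;
    const char *p = cmdline;

    assert(cmdline != NULL);

    while ((p = strstr(p, key)) != NULL) {
        /* Accept the match only at the start of a token. */
        if (p == cmdline || p[-1] == ' ')
            return strtol(p + key_len, NULL, 10) == 0 ? EC_MODE_POLL
                                                      : EC_MODE_INTR;
        p += key_len;
    }
    return EC_MODE_INTR;
}
```

With this sketch, a command line without the option yields interrupt mode, and only an explicit ec_intr=0 selects polling.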

->No action needed? Closing ...
Thanks Eric for keeping an eye on this.
Comment 5 Eric Whiting 2006-03-08 18:05:12 UTC
Bugzilla has Andi's original patch:
https://bugzilla.novell.com/show_bug.cgi?id=115459#c14

Checking against 2.6.15.6, I don't see any of Andi's diffs in the mainline kernel. There were some good fixes in there...?

At the time of the original report toggling ec_intr did not help much. 

Fixes:
1. Change thermal polling to 100000s (not a good fix)
2. Update the Sun BIOS (not always possible)
3. Reconsider the kernel patch -- your decision on that.

Comment 6 Andreas Kleen 2006-03-08 18:14:06 UTC
We should probably reconsider the kernel patch to not turn off
interrupts during EC reads.

Luming, can you perhaps review it? It's the patch in
https://bugzilla.novell.com/show_bug.cgi?id=115459#c14
Comment 7 Luming Yu 2006-03-09 06:05:21 UTC
This is a similar patch: http://bugzilla.kernel.org/show_bug.cgi?id=5764#c1
I think it is OK. But the question is why ec_read/write/query get so many
chances to run on these boxes.

So I think this patch is just a workaround; we still need to figure out whether there are other unknown issues. An acpidump will help me understand what's going
on when evaluating _TZP.

The weird thing is that ec_intr mode doesn't disable interrupts, so it should help.
But it doesn't. So I think some other unknown issue has a significant impact on the results.
Comment 8 Thomas Renninger 2006-03-13 14:48:41 UTC
I compared your (Andi's) patch with what is currently in mainline:
ec_intr=0/1 is the switch to turn Andi's "not disabling irqs" patch off/on.
That means we have ec_intr=1 by default, so Andi's patch should already be active. Only these changes are missing (not sure what kind of impact that might have here):
The in_interrupt checks that avoid waiting on a mutex in interrupt context are missing:
	if (in_interrupt())
		return_VALUE(-ENODEV);

There are some very small changes (udelay/mdelay - Andi is using msleep instead of usleep), but I doubt they make any functional difference...
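To illustrate the two ideas discussed here (sleeping between status reads instead of busy-waiting, and refusing to wait at all when sleeping is not allowed), here is a minimal user-space sketch. It is not the kernel patch: nanosleep stands in for msleep, the may_sleep flag stands in for the in_interrupt() check, and status_reader, poll_status_sleeping and countdown_reader are hypothetical names invented for this example.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback: returns the current status byte of the device. */
typedef unsigned char (*status_reader)(void *ctx);

/*
 * Wait until (status & mask) == want, sleeping between reads instead of
 * spinning. Returns 0 on success, -ENODEV if the caller may not sleep
 * (standing in for the in_interrupt() check), -ETIMEDOUT after max_tries.
 */
int poll_status_sleeping(status_reader read_status, void *ctx,
                         unsigned char mask, unsigned char want,
                         int max_tries, long sleep_ms, bool may_sleep)
{
    assert(read_status != NULL);

    if (!may_sleep)
        return -ENODEV;

    for (int i = 0; i < max_tries; i++) {
        if ((read_status(ctx) & mask) == want)
            return 0;

        /* Give up the CPU here; a busy-wait would burn it instead. */
        struct timespec ts = { .tv_sec = sleep_ms / 1000,
                               .tv_nsec = (sleep_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }
    return -ETIMEDOUT;
}

/* Demo reader: reports "ready" (bit 0 set) once the counter reaches zero. */
unsigned char countdown_reader(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return 0x00;
    }
    return 0x01;
}
```

The point of the design is that the waiting side yields the CPU between reads, so other work (such as keyboard and mouse handling) can run while the device is slow to respond.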

I believe:
 a) this board was always broken with old BIOS
 b) If it ever worked, then disabling interrupts plus the old ec_burst changes
    made it work? But the burst changes have been thrown out and marked
    ACPI_FUTURE_USAGE, so there is no chance to get this in for recent kernels.