Bug 138011 - thermal management prefers to throttle even if ALLOW_THROTTLING="no"
Summary: thermal management prefers to throttle even if ALLOW_THROTTLING="no"
Status: RESOLVED FIXED
Alias: None
Product: SUSE Linux 10.1
Classification: openSUSE
Component: Kernel (show other bugs)
Version: Alpha 3plus
Hardware: Other Other
: P5 - None : Critical (vote)
Target Milestone: ---
Assignee: Thomas Renninger
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-12-12 10:04 UTC by Dirk Mueller
Modified: 2005-12-21 12:40 UTC (History)
0 users

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
the patch (374 bytes, patch)
2005-12-20 23:20 UTC, Dirk Mueller
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dirk Mueller 2005-12-12 10:04:15 UTC
Hi, 

for some reason powersave prefers acpi-cpufreq now, but even the old speedstep-centrino exposes the same behaviour: the processor is throttled after a while down to 87% without first going through speed-stepping (actually, speed stepping never occurs it seems). even worse, throttling is disabled in the performance scheme of powersave. 

the resulting system is very slow (and doesn't really manage to reduce the heat, since processor clock isn't stepped down at all).
Comment 3 Thomas Renninger 2005-12-13 15:36:45 UTC
Arggh, this probably was the SL10.0-RCx fix thermal passive cooling patch that also went mainline.
IIRC, I also tested the throttling case, but it seems one if/else path still is wrong ...
Dirk, you also know the code very well, could you have a look at it, I don't know what to do first at the moment and this one is really critical.
Ohh, you are the reporter..., so it worked for 10.0, but it does not for 10.1? That's strange...

> even worse, throttling is disabled in the performance scheme of powersave.
This is done by kernel, no influence possible, still default behaviour should be if the passive trip point is exceeded: first cpufreq, then throttling
and if the passive trip point is sub-ceeded: first go up with throttling again, then cpufreq.
Comment 4 Dirk Mueller 2005-12-19 10:25:10 UTC
hmm, ok, I've looked into it a bit. The good news is: there is no bug. The bad news is: That behaviour is suboptimal. 

The thermal code tries to throttle first and then later tries to reduce cpu frequency it seems (BTW, I was unable to find out if your ondemand-kernel-handler-always-caches-cpufreq-policy patch is still in our suse tree (couldn't figure out how to access kerncvs) or not). 

Anyway, I think this behaviour is suboptimal, because throttling only leads to linear heat reduction, while speed stepping leads to quadratic heat reduction. If I run compilation jobs, it before reduced the cpu frequency to 1.2GHz after a while and ran stable (without overheating) with that. Now it stays at 1.7GHz and throttles down by 93%, which leads to a system that is as slow as a 486 without turbo button, but still scratches the overheating limit all the time). 

I couldn't find documentation though.. is it "faster" for the cpu to reduce throttling or stepping up the speed? which method has more overhead? It seems to me it should prefer whatever can react quicker to changes. 

Comment 5 Dirk Mueller 2005-12-19 10:29:41 UTC
ok, there could be one bug: even though it throttles down maximum, it never actually speed steps down (even though it should, because it is above the configured trip point). 

will have a look at this maybe tonight. 
Comment 6 Thomas Renninger 2005-12-20 09:09:19 UTC
Frequency scaling must be the preferred and the first method to cool down.
Tell me if you found anything, I have a machine with throttling and cpufreq ATM and could also have a look.
I could reproduce this..., the behaviour changed to older kernels and there must definitley be a bug somewhere.
Comment 7 Dirk Mueller 2005-12-20 16:36:05 UTC
I haven't looked more closely yet, but indeed, rereading the source confirms that it should first speed-step and later throttle. adding some more printk's..
 

Comment 8 Thomas Renninger 2005-12-20 18:53:33 UTC
the:
if (!cpu_has_cpufreq)

checks in acpi_thermal_cpufreq_increase/decrease (processor_thermal.c)

must be
if (cpu_has_cpufreq)

This was the change in mainline (reworked by someone) that I think causes the grief (have a closer look to changing "!"):
 static int cpu_has_cpufreq(unsigned int cpu)
{
struct cpufreq_policy policy;
- if (!acpi_thermal_cpufreq_is_init)
- return -ENODEV;
- if (!cpufreq_get_policy(&policy, cpu))
+ if (!acpi_thermal_cpufreq_is_init || cpufreq_get_policy(&policy, cpu))
return -ENODEV;
return 0;

Am I right, or have i overseen something?
I am in a hurry currently, I come up with a patch tomorrow.
Comment 9 Thomas Renninger 2005-12-20 18:54:21 UTC
BTW: I am in a real hurry... not tested yet, only assumptions..., bye.
Comment 10 Dirk Mueller 2005-12-20 23:19:45 UTC
indeed, that seems to work. I decided to fix the return values instead, which seems to make the stuff more logical to me (all callers of the static expect the inverse meaning)

Comment 11 Dirk Mueller 2005-12-20 23:20:14 UTC
Created attachment 61508 [details]
the patch
Comment 12 Thomas Renninger 2005-12-21 10:09:31 UTC
Thanks.
Patch retestet, added to stable and applied for mainline.
Dirk: I forget to mention you, wasn't meant to, I just forgot it, sorry.
Comment 13 Dirk Mueller 2005-12-21 12:34:44 UTC
No problem, I already posted the stuff to acpi-devel@ last night:

http://sourceforge.net/mailarchive/forum.php?thread_id=9289494&forum_id=6102

Though, no reaction so far.