Bugzilla – Bug 138011
thermal management prefers to throttle even if ALLOW_THROTTLING="no"
Last modified: 2005-12-21 12:40:08 UTC
Hi, for some reason powersave prefers acpi-cpufreq now, but even the old speedstep-centrino exposes the same behaviour: the processor is throttled after a while down to 87% without first going through speed-stepping (actually, speed stepping never occurs it seems). even worse, throttling is disabled in the performance scheme of powersave. the resulting system is very slow (and doesn't really manage to reduce the heat, since processor clock isn't stepped down at all).
Arggh, this probably was the SL10.0-RCx fix thermal passive cooling patch that also went mainline. IIRC, I also tested the throttling case, but it seems one if/else path still is wrong ... Dirk, you also know the code very well, could you have a look at it, I don't know what to do first at the moment and this one is really critical. Ohh, you are the reporter..., so it worked for 10.0, but it does not for 10.1? That's strange... > even worse, throttling is disabled in the performance scheme of powersave. This is done by kernel, no influence possible, still default behaviour should be if the passive trip point is exceeded: first cpufreq, then throttling and if the passive trip point is sub-ceeded: first go up with throttling again, then cpufreq.
hmm, ok, I've looked into it a bit. The good news is: there is no bug. The bad news is: That behaviour is suboptimal. The thermal code tries to throttle first and then later tries to reduce cpu frequency it seems (BTW, I was unable to find out if your ondemand-kernel-handler-always-caches-cpufreq-policy patch is still in our suse tree (couldn't figure out how to access kerncvs) or not). Anyway, I think this behaviour is suboptimal, because throttling only leads to linear heat reduction, while speed stepping leads to quadratic heat reduction. If I run compilation jobs, it before reduced the cpu frequency to 1.2GHz after a while and ran stable (without overheating) with that. Now it stays at 1.7GHz and throttles down by 93%, which leads to a system that is as slow as a 486 without turbo button, but still scratches the overheating limit all the time). I couldn't find documentation though.. is it "faster" for the cpu to reduce throttling or stepping up the speed? which method has more overhead? It seems to me it should prefer whatever can react quicker to changes.
ok, there could be one bug: even though it throttles down maximum, it never actually speed steps down (even though it should, because it is above the configured trip point). will have a look at this maybe tonight.
Frequency scaling must be the preferred and the first method to cool down. Tell me if you found anything, I have a machine with throttling and cpufreq ATM and could also have a look. I could reproduce this..., the behaviour changed to older kernels and there must definitley be a bug somewhere.
I haven't looked more closely yet, but indeed, rereading the source confirms that it should first speed-step and later throttle. adding some more printk's..
the: if (!cpu_has_cpufreq) checks in acpi_thermal_cpufreq_increase/decrease (processor_thermal.c) must be if (cpu_has_cpufreq) This was the change in mainline (reworked by someone) that I think causes the grief (have a closer look to changing "!"): static int cpu_has_cpufreq(unsigned int cpu) { struct cpufreq_policy policy; - if (!acpi_thermal_cpufreq_is_init) - return -ENODEV; - if (!cpufreq_get_policy(&policy, cpu)) + if (!acpi_thermal_cpufreq_is_init || cpufreq_get_policy(&policy, cpu)) return -ENODEV; return 0; Am I right, or have i overseen something? I am in a hurry currently, I come up with a patch tomorrow.
BTW: I am in a real hurry... not tested yet, only assumptions..., bye.
indeed, that seems to work. I decided to fix the return values instead, which seems to make the stuff more logical to me (all callers of the static expect the inverse meaning)
Created attachment 61508 [details] the patch
Thanks. Patch retestet, added to stable and applied for mainline. Dirk: I forget to mention you, wasn't meant to, I just forgot it, sorry.
No problem, I already posted the stuff to acpi-devel@ last night: http://sourceforge.net/mailarchive/forum.php?thread_id=9289494&forum_id=6102 Though, no reaction so far.