Bugzilla – Bug 105981
false acpi_thermal_critical alert
Last modified: 2005-08-21 12:16:11 UTC
My ThinkPad T42p received an acpi_thermal_critical and thus rebooted. The syslog says:

  Aug 20 01:07:42 sighup syslog-ng[3891]: Changing permissions on special file /dev/xconsole
  Aug 20 01:07:42 sighup syslog-ng[3891]: Changing permissions on special file /dev/tty10
  Aug 20 01:07:42 sighup kernel: acpi_thermal-0463 [861] acpi_thermal_critical : Critical trip point
  Aug 20 01:07:42 sighup kernel: Critical temperature reached (95 C), shutting down.
  Aug 20 01:07:42 sighup init: Switching to runlevel: 0

Actually I am pretty sure that this was a false alert because:
1. This machine has never come anywhere near this limit before, even under very heavy load.
2. At that moment I was playing Lincity-NG and nothing else was running that could have led to heavy load.
3. The machine did not really feel hot.
4. After immediately rebooting the machine, /proc/acpi/thermal_zone/THM0/temperature said that the machine was at 50 C. It seems impossible that the machine could have cooled down from 95 C to 50 C within the timeframe of one boot cycle.

Unfortunately this sort of problem is almost impossible to reproduce, so I don't know whether you can do anything about it.
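For anyone collecting similar reports, here is a minimal sketch of how the reported temperature could be pulled out of the kernel message quoted above. The regular expression is based only on the single log line in this report; the function name is hypothetical, and other kernel versions may word the message differently.

```python
import re

# Pattern derived from the one kernel message quoted in this report
# (assumption: other kernels may format the message differently).
CRITICAL_RE = re.compile(r"Critical temperature reached \((\d+) C\)")


def critical_temperature(line):
    """Return the temperature in degrees C from a critical message, or None."""
    match = CRITICAL_RE.search(line)
    return int(match.group(1)) if match else None


line = ("Aug 20 01:07:42 sighup kernel: "
        "Critical temperature reached (95 C), shutting down.")
print(critical_temperature(line))
```

Lines that do not contain the critical message simply yield None, so the function can be run over a whole syslog file to list every shutdown and its reported temperature.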
Did some further investigation under heavy load. When monitoring /proc/acpi/thermal_zone/THM0/temperature on older SUSE releases (e.g. 9.3), the temperature went up slowly under heavy load and went down slowly once the heavy load was gone. Under 10.0 the value still seems reasonable when there is no heavy load, but as soon as I put the machine under heavy load the value starts jumping up and down at a speed that is almost impossible from a physical point of view. While jumping up and down, it seems that at some random time it accidentally hits the critical limit and thus shuts the system down.
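The "physically impossible" jumps described above could be made measurable with something like the following sketch. It flags consecutive readings whose rate of change exceeds a threshold. The threshold of 2 C per second and the sample values are assumptions chosen for illustration, not measurements from this machine.

```python
def implausible_jumps(samples, max_rate=2.0):
    """Return pairs of consecutive readings that change faster than max_rate.

    samples:  list of (seconds, degrees_c) tuples, sorted by time.
    max_rate: assumed upper bound in degrees C per second (hypothetical).
    """
    flagged = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        if abs(c1 - c0) / dt > max_rate:
            flagged.append(((t0, c0), (t1, c1)))
    return flagged


# Made-up readings: a slow rise with one spike to 94 C and back.
readings = [(0, 52), (1, 53), (2, 94), (3, 55), (4, 56)]
for before, after in implausible_jumps(readings):
    print(before, "->", after)
```

A log of such flagged pairs, taken under load on 9.3 and on 10.0, would show whether the sensor values really behave differently between the two releases.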
Sounds like an ACPI kernel problem.
It seems as if current kernels or Xorg configurations cause a lot of ThinkPads to overheat quite quickly. Even though I still have no idea what the cause could be, or whether this is really related to kernel ACPI, I would like to find out. Here is a short discussion among other ThinkPad users suffering from the same problem: http://mailman.linux-thinkpad.org/pipermail/linux-thinkpad/2005-August/thread.html If your graphics card/driver matches, can you try this (forwarded message; in xorg.conf): ATI FireGL Mobility T2, Option "DynamicClocks" "on". This way the temperature of the gfx card goes down from ~98°C to 56°C; when idling overnight it goes down to 48°C and the fan stops completely. Even though this one and #98178 are duplicates, I would like to leave this one open as the general ThinkPad overheating bug and the other one as the "ondemand passive thermal policy broken" bug.
Ok, will try this. But note: When the system is idle, I am at about 45°C even without that option although the fan is still running.
Not sure whether this is fixable in time, but if this is really caused by a regression in current Linux kernels (probably ACPI), I consider it severe, as ThinkPads are the laptops that are known to work really well with Linux. I have no idea, but I could imagine that something in fan control does not work as expected. Even when the fan is on, Dirk reports that it could run much faster (e.g. at boot time, when it is BIOS controlled).

Robert: Can you send me the output of acpidmp, please? Does it make a difference when the ibm_acpi module is not loaded? Try something that produces load to get full performance, e.g.:
  cat /dev/zero >/dev/null
You can then watch your temperature increase nicely with e.g.:
  watch -n1 cat /proc/acpi/thermal_zone/*/temperature
or, with the ibm_acpi module:
  watch -n1 cat /proc/acpi/*ibm*/thermal
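As an alternative to watch, the readings from the thermal zone files could be collected by a small script so they can be logged and compared later. This is only a sketch: it assumes each temperature file contains a line of the form "temperature: 50 C", and the function names are hypothetical.

```python
import glob
import re

# Assumed file content: a line such as "temperature:             50 C".
TEMP_RE = re.compile(r"temperature:\s*(\d+)\s*C")


def parse_temperature(text):
    """Return the temperature in degrees C from the file content, or None."""
    match = TEMP_RE.search(text)
    return int(match.group(1)) if match else None


def read_zones(pattern="/proc/acpi/thermal_zone/*/temperature"):
    """Return {path: degrees_c} for every thermal zone file matching pattern."""
    readings = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            readings[path] = parse_temperature(handle.read())
    return readings


if __name__ == "__main__":
    for path, temp in read_zones().items():
        print(path, temp)
```

Calling read_zones() once per second with a timestamp and writing the result to a file would give a record of the temperature curve under load, with and without ibm_acpi loaded.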
This is not the first report of this kind, adding behlert, increasing severity.
Have you already updated the BIOS and embedded controller (http://www-307.ibm.com/pc/support/site.wss/document.do?sitestyle=ibm&lndocid=MIGR-50277) firmware? If not, could you please send me the output of acpidmp before and after updating to the latest BIOS/embedded controller firmware? Does updating solve this issue?
This should be an exact duplicate of my bug report; Robert also uses a ThinkPad p-series laptop. Robert, try
  modprobe ibm_acpi experimental=1
then you can watch the behaviour more closely via:
  watch "cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq /proc/acpi/ibm/fan /proc/acpi/ibm/thermal"
For more details, see bug report 98178. What's your trip_point polling frequency? Does the problem also go away if you lower it to 1 s or something like that?
There's a new BIOS released just yesterday, but it doesn't seem to fix anything relevant: http://www-307.ibm.com/pc/support/site.wss/document.do?sitestyle=ibm&lndocid=MIGR-50275
I wonder whether the new embedded controller firmware causes this. The description is talking about getting rid of nasty fan noise. Do you both have new embedded controller firmware? Can someone confirm that this only happened with the new firmware?
Created attachment 46766: Output of acpidmp
Ok, first the output of acpidmp. Have not yet loaded ibm_acpi. Will do so now and evaluate...
The system has the latest BIOS and the latest embedded controller firmware. I don't think that the embedded controller firmware is responsible because I never had problems with that in 9.3 or before.
OK, 9.3 and older didn't use the kernel ondemand frequency scaler but a userspace implementation that was far more conservative and therefore didn't trigger the overheating. I already downgraded to the kernel from 9.3 and it doesn't make a difference; kernel ondemand is broken there as well. Anyway, can we agree that this is a duplicate of 98178?
Yes, most likely it is a duplicate. I will do more of the investigation suggested above later. At the moment I have to do some other stuff.
*** This bug has been marked as a duplicate of 98178 ***
Ok, I have now tried Option "DynamicClocks" "on". While there is no significant change when the system is idle, it seems that applications with high graphics activity no longer trigger the alert.