Bugzilla – Bug 1203748
Laptop very slow in battery mode (Lenovo Thinkpad T470) - needs BIOS update to version >= 1.42
Last modified: 2023-02-03 10:56:30 UTC
For some reason my laptop gets very slow in battery-only mode. I notice this when watching videos. It makes the laptop rather unusable when it is not connected to power. It looks like it goes into some deep powersave mode. I would like to avoid this even if it would halve the running time in battery mode. Which information do you need?
For example when watching video https://www.youtube.com/watch?v=hZ2joF8_1QY in 1440p60 when switching from power (running absolutely fluently) to battery mode this is switching to 60 sec/frame (!) or even worse!
The question is which configuration really matters and what changed the setup. I suppose you're using XFCE? Do you run thermald? Or is it only with upowerd?
As a test, you can try to change the cpufreq governor by manually writing to the /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor files, e.g.
  # echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
and so on for each CPU. The default value for the recent Intel CPUs is powersave, IIRC.
About the cpufreq, Arch has a nice web page: https://wiki.archlinux.org/title/CPU_frequency_scaling
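Writing the governor for each CPU by hand gets tedious, so a small loop can help. This is only a sketch: the function name set_governor and its optional second argument (the sysfs directory to operate on) are made up for illustration, and the function has to be run as root to write to the real sysfs files.

```shell
# Write one governor name to every CPU's scaling_governor file.
# $1: governor name, e.g. "performance" or "powersave"
# $2: CPU sysfs directory (optional, defaults to /sys/devices/system/cpu);
#     this parameter is a hypothetical convenience for trying things out
set_governor() {
    governor=$1
    cpu_root=${2:-/sys/devices/system/cpu}
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded pattern if no CPU exposes cpufreq
        [ -e "$f" ] || continue
        printf '%s\n' "$governor" > "$f"
    done
}
```

Called as "set_governor performance" from a root shell, it should do the same as repeating the echo command above for cpu0, cpu1 and so on.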
Yes, I'm using XFCE.
thermald is installed but apparently not running (it doesn't occur in the 'ps' output).
upowerd is indeed running.
/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor is set to 'powersave', no matter if the laptop runs powered or on battery.
> About the cpufreq, Arch has a nice web page:
> https://wiki.archlinux.org/title/CPU_frequency_scaling
Thanks!
I can set /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor manually to 'performance' and this stays when switching between powered and battery. Which is good. So I can test whether this helps.
(In reply to Stefan Dirsch from comment #1)
> For example when watching video
>
> https://www.youtube.com/watch?v=hZ2joF8_1QY
>
> in 1440p60 when switching from power (running absolutely fluently) to
> battery mode this is switching to 60 sec/frame (!) or even worse!
This is weird. Sometimes things work fine when running on battery. When they don't, the CPUs are running at exactly 400 MHz on battery. Otherwise they run at about 3000 MHz, no matter if powered or on battery.
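To catch the 400 MHz state, the current frequency of each CPU can be read from the cpufreq sysfs attribute scaling_cur_freq, which reports kHz. The helper below is a sketch; its name cpu_freqs_mhz and the optional directory argument are invented here and are not part of any tool.

```shell
# Print "cpuN <MHz>" for every CPU, based on scaling_cur_freq (reported in kHz).
# $1: CPU sysfs directory (optional, defaults to /sys/devices/system/cpu);
#     hypothetical parameter, only there to make the helper easy to try out
cpu_freqs_mhz() {
    cpu_root=${1:-/sys/devices/system/cpu}
    for d in "$cpu_root"/cpu[0-9]*; do
        f=$d/cpufreq/scaling_cur_freq
        [ -r "$f" ] || continue
        khz=$(cat "$f")
        # Integer division is good enough to tell 400 MHz from 3000 MHz
        printf '%s %s\n' "${d##*/}" "$((khz / 1000))"
    done
}
```

Running it in a loop, for example via "watch", while unplugging the power supply should show whether all CPUs drop to 400 at the same moment.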
Try to disable intel_idle by passing the intel_idle.max_cstate=0 boot option. This should avoid the deep state. If it works, something doesn't work as expected in intel_idle. You can also try to compare the behavior of the SLE15-SP4 kernel and the recent upstream kernel from OBS Kernel:stable:Backport.
Tried this. Unfortunately with this setting I still see the CPUs going down to 400 MHz, rendering the machine unusable. Sometimes, after reconnecting to power, waiting a bit and disconnecting again, things are back to normal.
Found this in journalctl, but I'm not sure if this happened together with switching CPUs to 400 MHz
  Oct 04 15:13:00 linux-3j75.suse kernel: ACPI: EC: interrupt blocked
  Oct 04 15:13:00 linux-3j75.suse kernel: ACPI: Preparing to enter system sleep state S3
  Oct 04 15:13:00 linux-3j75.suse kernel: ACPI: EC: event blocked
  Oct 04 15:13:00 linux-3j75.suse kernel: ACPI: EC: EC stopped
  Oct 04 15:13:00 linux-3j75.suse kernel: PM: Saving platform NVS memory
  Oct 04 15:13:00 linux-3j75.suse kernel: Disabling non-boot CPUs ...
  Oct 04 15:13:00 linux-3j75.suse kernel: IRQ 132: no longer affine to CPU1
  Oct 04 15:13:00 linux-3j75.suse kernel: smpboot: CPU 1 is now offline
  Oct 04 15:13:00 linux-3j75.suse kernel: smpboot: CPU 2 is now offline
  Oct 04 15:13:00 linux-3j75.suse kernel: IRQ 16: no longer affine to CPU3
  Oct 04 15:13:00 linux-3j75.suse kernel: smpboot: CPU 3 is now offline
  Oct 04 15:13:00 linux-3j75.suse kernel: ACPI: Low-level resume complete
  Oct 04 15:13:00 linux-3j75.suse kernel: ACPI: EC: EC started
  Oct 04 15:13:00 linux-3j75.suse kernel: PM: Restoring platform NVS memory
  Oct 04 15:13:00 linux-3j75.suse kernel: Enabling non-boot CPUs ...
  Oct 04 15:13:00 linux-3j75.suse kernel: x86: Booting SMP configuration:
  Oct 04 15:13:00 linux-3j75.suse kernel: smpboot: Booting Node 0 Processor 1 APIC 0x2
  Oct 04 15:13:00 linux-3j75.suse kernel: ACPI: \_PR_.CPU1: Found 3 idle states
  Oct 04 15:13:00 linux-3j75.suse kernel: CPU1 is up
  Oct 04 15:13:00 linux-3j75.suse kernel: smpboot: Booting Node 0 Processor 2 APIC 0x1
  Oct 04 15:13:00 linux-3j75.suse kernel: ACPI: \_PR_.CPU2: Found 3 idle states
  Oct 04 15:13:00 linux-3j75.suse kernel: CPU2 is up
  Oct 04 15:13:00 linux-3j75.suse kernel: smpboot: Booting Node 0 Processor 3 APIC 0x3
  Oct 04 15:13:00 linux-3j75.suse kernel: ACPI: \_PR_.CPU3: Found 3 idle states
  Oct 04 15:13:00 linux-3j75.suse kernel: CPU3 is up
  Oct 04 15:13:00 linux-3j75.suse kernel: ACPI: Waking up from system sleep state S3
  Oct 04 15:13:00 linux-3j75.suse kernel: ACPI: EC: interrupt unblocked
  Oct 04 15:13:00 linux-3j75.suse kernel: ACPI: EC: event unblocked
(In reply to Stefan Dirsch from comment #8)
> Found this in journalctl, but I'm not sure if this happened together with
> switching CPUs to 400 MHz
Those look like normal messages about suspend/resume; the CPUs are powered down and up.
BTW, with this setting intel_idle.max_cstate=0 I no longer can switch to 'performance'.
  ~ sudo echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
  bash: /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor: Permission denied
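One likely explanation for this "Permission denied" that does not involve the boot option: in "sudo echo ... > file" the redirection is performed by the calling, unprivileged shell, and only echo itself runs under sudo. A common workaround is to let tee do the write under sudo. The sketch below assumes that; the function name write_privileged and the PRIV variable are hypothetical helpers, not part of any tool.

```shell
# Write a value to a file with elevated privileges.
# The redirection in "sudo echo x > file" is done by the calling shell,
# so tee is run under sudo instead and performs the write itself.
# PRIV is a hypothetical helper variable; set it to empty when already root.
PRIV=${PRIV-sudo}

write_privileged() {
    # $1: value to write, $2: target file
    # $PRIV is left unquoted on purpose so that an empty value disappears
    printf '%s\n' "$1" | $PRIV tee "$2" > /dev/null
}
```

For example, "write_privileged performance /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor" should succeed where the command above fails, provided the governor file is writable for root at all.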
Might be related https://bugzilla.opensuse.org/show_bug.cgi?id=1203991
(In reply to Frank Krüger from comment #11)
> Might be related https://bugzilla.opensuse.org/show_bug.cgi?id=1203991
I'm afraid it's not. thermald is not running on my system, and "--disable-active-power" is not set in the config. Also thermald on Leap 15.3 is much older than on TW (1.6 vs 2.5).
BTW I opened the bug for the wrong Leap version. It's 15.3, not 15.4. Sorry for that!
(In reply to Stefan Dirsch from comment #12)
> BTW I opened the bug for the wrong
> Leap version. It's 15.3, not 15.4. Sorry for that!
Could you try to boot with the Leap 15.4 kernel on top of the same system? That'll be interesting.
With the Leap 15.4 kernel (5.14.21-150400.24.21-default) it seems to happen less often. It is much better now, but this needs more testing. Still, sometimes things get slow and the CPU speed stays at 400 MHz. This is with "intel_idle.max_cstate=0". Apparently no thermald is running, but I found this in 'ps' (I didn't see it before because I grepped for thermald):
  root 801 0.0 0.0 0 0 ? I< 10:33 0:00 [acpi_thermal_pm]
/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor is set to 'powersave' in power and battery mode.
I also tried with kernel 6.0 in the meantime. About the same results. The machine still rather often switches into this slow 400 MHz mode. :-( This is still with "intel_idle.max_cstate=0".
Interesting. Giovanni, any clue about the behavior? Also, Stefan, please give the hardware details of your machine (e.g. hwinfo).
Created attachment 862098 [details] hwinfo.out
Giovanni, any chance to look into this? What else do you need?
Hello Stefan,
I'd like two pieces of information:
(1) Boot with dynamic debugging enabled on the cpufreq drivers (intel_pstate in fact) by adding this to the kernel command line:
  dyndbg="file drivers/cpufreq/* +pf"
Then try to trigger both the good behavior (fast laptop, AC power) and the bad behavior (slow laptop, on battery) and attach the kernel logs here.
(2) Install the tool "turbostat" from the package "cpupower" if you don't have it already, and run the following command twice, i.e.
  --> when the laptop is on AC power (not battery) and is running well
  --> when the laptop is on battery and is running slow
command:
  turbostat --interval 1 --num_iterations 10 2>&1 | tee turbostat.$(date +%F.%H-%M-%S).txt
Then attach the resulting turbostat.XXX.txt files here. This will collect 10 samples, one sample per second, of frequency and power activity. More importantly, it will collect all power management settings from various MSRs in the header of the output. I want to examine and compare the MSR values during the good and bad behavior.
This is the reasoning: what I think is happening is that when you disconnect the AC power, the firmware sends an ACPI message to the driver that limits the speed of the processor; this could happen via the ACPI _PPC object (Performance Present Capabilities). That's essentially a table of values that the driver uses to update min and max clock frequencies. It wouldn't make much sense for the firmware to limit the driver to 400 MHz when on battery, but let's see what the kernel log says with dynamic debugging enabled, and the power management config shown by turbostat.
Your i7-7600U CPU is a Kaby Lake from 2017, which has autonomous frequency scaling (the feature is called HWP, or Intel Speed Shift). The governor and driver don't do much, other than setting some value in an MSR at boot and letting it do its thing. That's why I think it's reacting to an ACPI event, and the data contained in the ACPI object isn't very good.
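The comparison of MSR values between the two captures can be roughly automated. The sketch below assumes that the settings of interest are exactly the lines of the turbostat output containing the string "MSR_" (as in MSR_PM_ENABLE or MSR_HWP_REQUEST); the function names and file names are made up for illustration.

```shell
# Print the MSR setting lines of a turbostat capture, sorted.
# Assumption: every relevant line contains the string "MSR_".
msr_lines() {
    grep 'MSR_' "$1" | sort
}

# Show which MSR lines differ between a "good" and a "bad" capture.
# Prints nothing and returns 0 when the settings are identical.
compare_msr_settings() {
    good_tmp=$(mktemp)
    bad_tmp=$(mktemp)
    msr_lines "$1" > "$good_tmp"
    msr_lines "$2" > "$bad_tmp"
    status=0
    diff "$good_tmp" "$bad_tmp" || status=$?
    rm -f "$good_tmp" "$bad_tmp"
    return "$status"
}
```

A call such as "compare_msr_settings turbostat-good.txt turbostat-bad.txt" (hypothetical file names) prints only the settings that changed; values that legitimately vary between runs, such as temperature readings, will also show up and need to be judged by eye.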
Thanks for looking into this. I definitely wrote
  dyndbg="file drivers/cpufreq/* +pf"
in grub during boot. Unfortunately this results in
  # cat /proc/cmdline
  BOOT_IMAGE=/vmlinuz-6.0.0-lp153.3.g1195759-default root=/dev/mapper/system-root resume=/dev/system/swap splash=silent quiet showopts intel_idle.max_cstate=0 "dyndbg=file drivers/cpufreq/* +pf"
So the following results might be completely useless. :-( Nevertheless I will try my best.
(In reply to Stefan Dirsch from comment #22)
> Thanks for looking into this. I definitely wrote
>
> dyndbg="file drivers/cpufreq/* +pf"
>
> in grub during boot. Unfortunately this results in
>
> # cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-6.0.0-lp153.3.g1195759-default
> root=/dev/mapper/system-root resume=/dev/system/swap splash=silent quiet
> showopts intel_idle.max_cstate=0 "dyndbg=file drivers/cpufreq/* +pf"
>
> So the following results might be completely useless. :-(
This should be fine, it's just how the kernel handles it. Check the dmesg output -- you should now have more lines about cpufreq there.
Created attachment 863065 [details] dmesg-good-ac.log dmesg - that's how I booted with AC power connected - good
Created attachment 863066 [details] dmesg-bad-battery.log dmesg - After switching to battery - in a bad/slow state (unfortunately no relevant difference in dmesg to good state)
Created attachment 863067 [details] turbostat-good-ac.2022-11-23.15-52-25.txt turbostat in good state
Created attachment 863068 [details] turbostat-bad-battery.2022-11-23.15-51-44.txt turbostat in bad state (slow battery)
(In reply to Stefan Dirsch from comment #25)
> Created attachment 863066 [details]
> dmesg-bad-battery.log
>
> dmesg - After switching to battery - in a bad/slow state (unfortunately no
> relevant difference in dmesg to good state)
Maybe I was wrong. There is at least this
  -[ 2200.120064] perf: interrupt took too long (2540 > 2500), lowering kernel.perf_event_max_sample_rate to 78500
  -[ 2307.480111] perf: interrupt took too long (3242 > 3175), lowering kernel.perf_event_max_sample_rate to 61500
Maybe that's related.
Let me know if I did something wrong/stupid, or if you need any more info.
Hi Stefan,
question: did this laptop ever work well (wrt this problem)? Initially I was thinking the problem could be a kernel update but I'm trying to validate that assumption. If it used to work in the past, do you recall any hardware or software update that happened before it broke?
From your dmesg and turbostat attachments the problem looks both serious and difficult to solve. I've found this thread from 2017 in the Lenovo forum that seems to describe exactly your symptoms:
  "T470 Heavy CPU Throttling on Battery"
  https://forums.lenovo.com/t5/ThinkPad-T400-T500-and-newer-T/T470-Heavy-CPU-Throttling-on-Battery/td-p/3783920
For posterity I'll copy-paste both the question and the alleged solution from that thread (update the BIOS to at least version 1.42, currently the latest is 1.72)
;; Hi,
;;
;; My CPU has been throttling down to 400 MHz when on battery. This does not
;; necessarily happen on low battery - it sometimes happens when I have over
;; 50% battery life left (albeit the removable battery is low), and sometimes
;; when i have less than 25% battery life left.
;;
;; My temperatures appear to be good (according to Core Temp), usually around
;; 40-50C.
;;
;; Has anyone encountered this problem before? All of my drivers are up to
;; date according to Lenovo companion, and I have placed power settings in
;; windows to High Performance, and changed the Intel Speedstep settings in
;; the BIOS to max performance as well.
;; Dear all,
;;
;; ThinkPad T470 development team identified the root cause and released the
;; fix as the latest BIOS Update package at the Lenovo web site;
;;
;; https://pcsupport.lenovo.com/us/en/products/Laptops-and-netbooks/ThinkPad-T-Series-laptops/ThinkPad-T470/downloads/DS120429
;;
;; UEFI: 1.42 / ECP: 1.27
;;
;; (New) Support interface of TPM firmware update.
;; (New) Updated the Diagnostics module to version 03.11.001.
;; (New) Updated the CPU microcode.
;; (Fix) Fixed an issue where CPU clock stays low with FRU P/N 01AV452 battery.
The forum mentions the external battery, part number 01AV452, which I can see in your hwinfo output:
  P: /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:18/PNP0C09:00/PNP0C0A:01/power_supply/BAT1
  L: 0
  E: DEVPATH=/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:18/PNP0C09:00/PNP0C0A:01/power_supply/BAT1
  E: SUBSYSTEM=power_supply
  E: POWER_SUPPLY_NAME=BAT1
  E: POWER_SUPPLY_TYPE=Battery
  E: POWER_SUPPLY_STATUS=Full
  E: POWER_SUPPLY_PRESENT=1
  E: POWER_SUPPLY_TECHNOLOGY=Li-poly
  E: POWER_SUPPLY_CYCLE_COUNT=866
  E: POWER_SUPPLY_VOLTAGE_MIN_DESIGN=11460000
  E: POWER_SUPPLY_VOLTAGE_NOW=12715000
  E: POWER_SUPPLY_POWER_NOW=0
  E: POWER_SUPPLY_ENERGY_FULL_DESIGN=24000000
  E: POWER_SUPPLY_ENERGY_FULL=19870000
  E: POWER_SUPPLY_ENERGY_NOW=19870000
  E: POWER_SUPPLY_CAPACITY=100
  E: POWER_SUPPLY_CAPACITY_LEVEL=Full
  E: POWER_SUPPLY_MODEL_NAME=01AV452
                             ^^^^^^^
  E: POWER_SUPPLY_MANUFACTURER=SMP
  E: POWER_SUPPLY_SERIAL_NUMBER= 1935
I archived the T470 parts diagram here in the corporate confluence, you should recognize the optional external battery (requires confluence login):
  https://confluence.suse.com/download/attachments/515801221/t470-system-service-parts.png?api=v2
So, in case you have a firmware version older than 1.42, I think updating it is worth a shot. When I do that on my laptop I use the "fwupdmgr" command (from within Linux!) which makes it all very smooth, see https://wiki.archlinux.org/title/fwupd . Generally it's a matter of
  fwupdmgr get-devices
  fwupdmgr refresh
  fwupdmgr get-updates
  fwupdmgr update
Regarding your turbostat and dmesg attachments: the first thing I did was to compare the header of the turbostat output (all the power management settings). They're identical in both "good" and "bad". Only the temperature reading changes (MSR_IA32_PACKAGE_THERM_STATUS), and that doesn't mean much.
The settings I looked at are:
  MSR_PM_ENABLE: 0x00000001 (HWP)
which means the CPU is doing autonomous frequency scaling (no OS involved) and
  MSR_HWP_REQUEST: 0x80002704 (min 4 max 39 des 0 epp 0x80 window 0x0 pkg 0x0)
which means the configuration allows the CPU the full range of speeds, from "min 4" (400 MHz) to "max 39" (3900 MHz). What I was suspecting was a situation like "min 4" and "max 4" in the "bad" situation, but it isn't like that. In both scenarios the full range is allowed. "epp 0x80" is also important, as that's a number ranging from 0x0 (maximum performance setting) to 0xff (maximum efficiency). Again, you have 0x80 in both cases as expected, which is mid-range, nothing to see there. Yet turbostat shows your clock squarely at 400 MHz when on battery, which is a serious problem.
As per dmesg, as you observed, we have some cpufreq messages at boot (showing the dyndbg parameter works) but no logs during normal operations. If the intel_pstate driver were to impose some limits on the frequency it would write something in the logs.
One minor remark, unrelated to your problem: I see you have intel_idle.max_cstate=0 in your command line. Are you aware that this parameter prevents the intel_idle driver from loading, falling back to the acpi_idle driver? Generally people think "=0" means "don't go idle", but unfortunately the parameter works differently.
  intel_idle.max_cstate= [KNL,HW,ACPI,X86]
      0 disables intel_idle and fall back on acpi_idle.
      1 to 9 specify maximum depth of C-state.
In case the intent is to not go idle, the correct parameter is "idle=poll".
Let me know if you have a BIOS version <1.42 on the T470 and what happens if you update it.
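For reference, the fields that turbostat prints for MSR_HWP_REQUEST can be recomputed from the raw value. The sketch below decodes only the low 32 bits, using the IA32_HWP_REQUEST layout (bits 7:0 minimum, 15:8 maximum, 23:16 desired, 31:24 energy/performance preference); the function name decode_hwp_request is invented for this example.

```shell
# Decode the low 32 bits of an IA32_HWP_REQUEST value.
# $1: value as printed by turbostat, e.g. 0x80002704
decode_hwp_request() {
    v=$(( $1 ))
    # Each field is one byte wide; shift it down and mask it out
    printf 'min %d max %d des %d epp 0x%x\n' \
        "$(( v & 0xff ))" \
        "$(( (v >> 8) & 0xff ))" \
        "$(( (v >> 16) & 0xff ))" \
        "$(( (v >> 24) & 0xff ))"
}

decode_hwp_request 0x80002704
```

For the value from this report it prints "min 4 max 39 des 0 epp 0x80". A pinned configuration of the kind suspected above would show equal min and max, for example the hypothetical value 0x80000404.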
dmidecode -t bios
  # dmidecode 3.2
  Getting SMBIOS data from sysfs.
  SMBIOS 3.0.0 present.
  Handle 0x000B, DMI type 0, 24 bytes
  BIOS Information
    Vendor: LENOVO
    Version: N1QET55W (1.30 )
    Release Date: 05/23/2017
    Address: 0xE0000
    Runtime Size: 128 kB
    ROM Size: 16 MB
    Characteristics:
      PCI is supported
      PNP is supported
      BIOS is upgradeable
      BIOS shadowing is allowed
      Boot from CD is supported
      Selectable boot is supported
      EDD is supported
      3.5"/720 kB floppy services are supported (int 13h)
      Print screen service is supported (int 5h)
      8042 keyboard services are supported (int 9h)
      Serial services are supported (int 14h)
      Printer services are supported (int 17h)
      CGA/mono video services are supported (int 10h)
      ACPI is supported
      USB legacy is supported
      BIOS boot specification is supported
      Targeted content distribution is supported
      UEFI is supported
    BIOS Revision: 1.30
    Firmware Revision: 1.13
  Handle 0x0022, DMI type 13, 22 bytes
  BIOS Language Information
    Language Description Format: Abbreviated
    Installable Languages: 1
      en-US
    Currently Installed Language: en-US
  Wrong DMI structures length: 3005 bytes announced, structures occupy 3020 bytes.
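The installed version (1.30) is older than the 1.42 that carries the fix. A check like this can be scripted; the sketch below assumes the Lenovo-style "Version: N1QET55W (1.30 )" line seen in the output above and GNU "sort -V" for the comparison. Both function names are made up for illustration.

```shell
# Extract the version in parentheses from dmidecode output read on stdin,
# e.g. "1.30" from the line "Version: N1QET55W (1.30 )".
bios_version_from_dmidecode() {
    sed -n 's/.*Version:[^(]*(\([0-9][0-9.]*\)[^)]*).*/\1/p' | head -n 1
}

# Succeed if version $1 is strictly older than version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

Used as "v=$(dmidecode -t bios | bios_version_from_dmidecode); version_lt "$v" 1.42 && echo update needed", it would report that an update is needed on this machine.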
I think I never updated the BIOS.
  # fwupdmgr get-updates
  WARNING: Firmware can not be updated in legacy BIOS mode
  See https://github.com/fwupd/fwupd/wiki/PluginFlag:legacy-bios for more information.
So I'm still running either a legacy BIOS or a UEFI BIOS in "CSM compatibility mode". And switching to UEFI would probably mean I would need to reinstall the whole system. So I guess I need to update the BIOS by other means. I hope I can still find an update somewhere. Maybe this is then doable with a USB stick with FreeDOS on it. But first I need to make a backup of my data.
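Whether the running system was booted via UEFI can be checked without fwupdmgr: the kernel only creates /sys/firmware/efi on a UEFI boot. The helper below is a small sketch around that check; the name boot_mode and its optional argument are invented here.

```shell
# Report how the running system was booted.
# $1: sysfs mount point (optional, defaults to /sys); hypothetical parameter
boot_mode() {
    if [ -d "${1:-/sys}/firmware/efi" ]; then
        echo "UEFI"
    else
        echo "legacy BIOS (CSM)"
    fi
}
```

On the installation described here, "boot_mode" would be expected to print the legacy variant, matching the fwupdmgr warning.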
How long have I been seeing this issue? Definitely not from the beginning. Maybe for about a year or so. I think it's new since Leap 15.2/15.3 (currently running 15.3). But I can't say for sure ...
I removed the intel_idle.max_cstate=0 option. It was Takashi's first idea to try to address the issue (comment#6). I forgot to remove it again. Done now.
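After the next reboot, the removal can be double-checked by looking at the kernel command line and at the active cpuidle driver, which should be intel_idle again rather than acpi_idle. A minimal sketch, with hypothetical function names and optional path arguments:

```shell
# Print the active cpuidle driver, e.g. "intel_idle" or "acpi_idle".
# $1: CPU sysfs directory (optional, defaults to /sys/devices/system/cpu)
current_idle_driver() {
    cat "${1:-/sys/devices/system/cpu}/cpuidle/current_driver"
}

# Succeed if a kernel command line still contains intel_idle.max_cstate=.
# $1: command line file (optional, defaults to /proc/cmdline)
has_max_cstate_option() {
    grep -q 'intel_idle\.max_cstate=' "${1:-/proc/cmdline}"
}
```

For example: "has_max_cstate_option || current_idle_driver".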
Backup is done. Unfortunately I had to learn that BIOS updates via fwupdmgr/fwupdate only work in UEFI mode. But I installed in CSM mode. :-( I tried to switch to UEFI mode in the BIOS, but this means that my installed system can't be booted at all - as I expected. :-(
fwupdate does not work in CSM mode (same as fwupdmgr, see comment#32)
  linux-3j75:/tmp/tmp # ls
  N1QET96W.cab N1QHT54W.cab
  linux-3j75:/tmp/tmp # fwupdate -l
  failed: no volumes of type c12a7328-f81f-11d2-ba4b-00a0c93ec93b: no volumes of type ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
  https://bugzilla.redhat.com/show_bug.cgi?id=1904779
I'm not aware of any other means to update the BIOS. If I ever reinstall my system, I will switch to UEFI mode first to update the BIOS. For the time being I need to live with that issue.
Thanks a lot for the investigation and all the hints. Learned a lot! I will reassign the bug to myself, so you no longer need to think about it. Feel free to "unsubscribe" from it.
Takashi had a good proposal. Creating a Live Image for a USB stick with fwupdmgr/fwupdate included, so I can switch to UEFI mode in firmware and boot with this Live image in UEFI mode and run fwupdmgr/fwupdate.
(In reply to Stefan Dirsch from comment #36)
> Takashi had a good proposal. Creating a Live Image for a USB stick with
> fwupdmgr/fwupdate included, so I can switch to UEFI mode in firmware and
> boot with this Live image in UEFI mode and run fwupdmgr/fwupdate.
So this became my hackweek project. ;-)
https://hackweek.opensuse.org/22/projects/thinkpad-t470-bios-update-in-live-system
(In reply to Stefan Dirsch from comment #37)
> (In reply to Stefan Dirsch from comment #36)
> > Takashi had a good proposal. Creating a Live Image for a USB stick with
> > fwupdmgr/fwupdate included, so I can switch to UEFI mode in firmware and
> > boot with this Live image in UEFI mode and run fwupdmgr/fwupdate.
>
> So this became my hackweek project. ;-)
>
> https://hackweek.opensuse.org/22/projects/thinkpad-t470-bios-update-in-live-system
Successfully done. :-) Closing ...
(In reply to Stefan Dirsch from comment #38)
>
> Successfully done. :-) Closing ...
Nicely done!
To recap: your system is legacy BIOS, the vendor offers BIOS updates only via UEFI capsules and their only documented update method is with the Linux tool fwupdate(1). fwupdate doesn't work if the system uses legacy mode. So what you did is use a (UEFI) live image from a USB disk with fwupdate, run the update, and then continue to use legacy mode on the updated firmware. Am I getting this right?
This kind of shows that there is only one firmware; I somehow always assumed there are two, one legacy and one UEFI. But apparently you're always running the UEFI BIOS, only it has a thin compatibility layer on top so you can keep using the old interface. Does that make sense?
(In reply to Giovanni Gherdovich from comment #39)
> To recap: your system is legacy BIOS,
Yes, kind of, i.e. openSUSE Leap 15.3 was still installed using legacy BIOS. In more detail: it runs in CSM mode, i.e. the legacy BIOS compatibility layer on top of UEFI. At least that's how I understood it.
> the vendor offers BIOS updates only
> via UEFI capsules and their only documented update method is with the Linux
> tool fwupdate(1). fwupdate doesn't work if the system uses legacy mode. So
> what you did is use a (UEFI) live image from a USB disk with fwupdate, run
> the update, and then continue to use legacy mode on the updated firmware. Am I
> getting this right?
Yes, in legacy BIOS compatibility mode (CSM). See above.
> This kind of shows that there is only one firmware; I somehow always assumed
> there are two, one legacy and one UEFI. But apparently you're always running
> the UEFI BIOS, only it has a thin compatibility layer on top so you can keep
> using the old interface. Does that make sense?
Yes, definitely. Only one (UEFI) BIOS.
Well, in more detail: first an update for the ECP (Embedded Controller Program) was installed. On top of that came the update for the UEFI BIOS (first step, to an intermediate version), then another update for the UEFI BIOS (second and last step, to the final version). For each of these a boot entry was generated, and the update didn't happen until that boot entry was selected. Then the firmware for the NVMe was updated as well. This happened immediately.