Bugzilla – Bug 103028
rcpowersave start gives strange messages
Last modified: 2005-11-21 15:12:28 UTC
rcpowersave start shows the following in /var/log/messages: cpu_init done, current fid 0xe, vid 0x8 limiting to cpu 4 failed limiting to cpu 5 failed limiting to cpu 6 failed limiting to cpu 7 failed limiting to cpu 8 failed limiting to cpu 9 failed limiting to cpu 10 failed limiting to cpu 11 failed limiting to cpu 12 failed limiting to cpu 13 failed limiting to cpu 14 failed limiting to cpu 15 failed limiting to cpu 16 failed limiting to cpu 17 failed limiting to cpu 18 failed limiting to cpu 19 failed limiting to cpu 20 failed limiting to cpu 21 failed limiting to cpu 22 failed limiting to cpu 23 failed limiting to cpu 24 failed limiting to cpu 25 failed limiting to cpu 26 failed limiting to cpu 27 failed limiting to cpu 28 failed limiting to cpu 29 failed limiting to cpu 30 failed limiting to cpu 31 failed limiting to cpu 32 failed limiting to cpu 33 failed limiting to cpu 34 failed limiting to cpu 35 failed limiting to cpu 36 failed limiting to cpu 37 failed limiting to cpu 38 failed limiting to cpu 39 failed limiting to cpu 40 failed limiting to cpu 41 failed limiting to cpu 42 failed limiting to cpu 43 failed limiting to cpu 44 failed limiting to cpu 45 failed limiting to cpu 46 failed limiting to cpu 47 failed limiting to cpu 48 failed limiting to cpu 49 failed limiting to cpu 50 failed limiting to cpu 51 failed limiting to cpu 52 failed limiting to cpu 53 failed limiting to cpu 54 failed limiting to cpu 55 failed limiting to cpu 56 failed limiting to cpu 57 failed limiting to cpu 58 failed limiting to cpu 59 failed limiting to cpu 60 failed limiting to cpu 61 failed limiting to cpu 62 failed limiting to cpu 63 failed limiting to cpu 64 failed limiting to cpu 65 failed limiting to cpu 66 failed limiting to cpu 67 failed limiting to cpu 68 failed limiting to cpu 69 failed limiting to cpu 70 failed limiting to cpu 71 failed limiting to cpu 72 failed limiting to cpu 73 failed limiting to cpu 74 failed limiting to cpu 75 failed limiting to cpu 76 failed 
limiting to cpu 77 failed limiting to cpu 78 failed limiting to cpu 79 failed limiting to cpu 80 failed limiting to cpu 81 failed limiting to cpu 82 failed limiting to cpu 83 failed limiting to cpu 84 failed limiting to cpu 85 failed limiting to cpu 86 failed limiting to cpu 87 failed limiting to cpu 88 failed limiting to cpu 89 failed limiting to cpu 90 failed limiting to cpu 91 failed limiting to cpu 92 failed limiting to cpu 93 failed limiting to cpu 94 failed limiting to cpu 95 failed limiting to cpu 96 failed limiting to cpu 97 failed limiting to cpu 98 failed limiting to cpu 99 failed limiting to cpu 100 failed limiting to cpu 101 failed limiting to cpu 102 failed limiting to cpu 103 failed limiting to cpu 104 failed limiting to cpu 105 failed limiting to cpu 106 failed limiting to cpu 107 failed limiting to cpu 108 failed limiting to cpu 109 failed limiting to cpu 110 failed limiting to cpu 111 failed limiting to cpu 112 failed limiting to cpu 113 failed limiting to cpu 114 failed limiting to cpu 115 failed limiting to cpu 116 failed limiting to cpu 117 failed limiting to cpu 118 failed limiting to cpu 119 failed limiting to cpu 120 failed limiting to cpu 121 failed limiting to cpu 122 failed limiting to cpu 123 failed limiting to cpu 124 failed limiting to cpu 125 failed limiting to cpu 126 failed limiting to cpu 127 failed
Looks like a kernel bug. Which processor? Which chipset? Which cpufreq driver? (Install cpufrequtils and run cpufreq-info.)
2 dual core opteron processors
tyan S2885

cpufreq-info reports:

cpufrequtils 0.3: cpufreq-info (C) Dominik Brodowski 2004
Report errors and bugs to linux@brodo.de, please.
analyzing CPU 0:
  driver: powernow-k8
  CPUs which need to switch frequency at the same time: 0 1
  hardware limits: 1.80 GHz - 2.20 GHz
  available frequency steps: 2.20 GHz, 2.00 GHz, 1.80 GHz
  available cpufreq governors: ondemand, userspace, powersave, performance
  current policy: frequency should be within 1.80 GHz and 2.20 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1.80 GHz (asserted by call to hardware).
analyzing CPU 1:
  driver: powernow-k8
  CPUs which need to switch frequency at the same time: 0 1
  hardware limits: 1.80 GHz - 2.20 GHz
  available frequency steps: 2.20 GHz, 2.00 GHz, 1.80 GHz
  available cpufreq governors: ondemand, userspace, powersave, performance
  current policy: frequency should be within 1.80 GHz and 2.20 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1.80 GHz (asserted by call to hardware).
analyzing CPU 2:
  driver: powernow-k8
  CPUs which need to switch frequency at the same time: 2 3
  hardware limits: 1.80 GHz - 2.20 GHz
  available frequency steps: 2.20 GHz, 2.00 GHz, 1.80 GHz
  available cpufreq governors: ondemand, userspace, powersave, performance
  current policy: frequency should be within 1.80 GHz and 2.20 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1.80 GHz (asserted by call to hardware).
analyzing CPU 3:
  driver: powernow-k8
  CPUs which need to switch frequency at the same time: 2 3
  hardware limits: 1.80 GHz - 2.20 GHz
  available frequency steps: 2.20 GHz, 2.00 GHz, 1.80 GHz
  available cpufreq governors: ondemand, userspace, powersave, performance
  current policy: frequency should be within 1.80 GHz and 2.20 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1.80 GHz (asserted by call to hardware).
This is a kernel bug in powernow-k8 (or maybe in a layer above it?). It seems the code iterates over all theoretically possible CPUs instead of the CPUs actually found. Mark, do you know about this one?
Btw.

$ ls /sys/devices/system/cpu/
cpu0 cpu104 cpu110 cpu117 cpu123 cpu15 cpu21 cpu28 cpu34 cpu40 cpu47 cpu53 cpu6 cpu66 cpu72 cpu79 cpu85 cpu91 cpu98 cpu1 cpu105 cpu111 cpu118 cpu124 cpu16 cpu22 cpu29 cpu35 cpu41 cpu48 cpu54 cpu60 cpu67 cpu73 cpu8 cpu86 cpu92 cpu99 cpu10 cpu106 cpu112 cpu119 cpu125 cpu17 cpu23 cpu3 cpu36 cpu42 cpu49 cpu55 cpu61 cpu68 cpu74 cpu80 cpu87 cpu93 cpu100 cpu107 cpu113 cpu12 cpu126 cpu18 cpu24 cpu30 cpu37 cpu43 cpu5 cpu56 cpu62 cpu69 cpu75 cpu81 cpu88 cpu94 cpu101 cpu108 cpu114 cpu120 cpu127 cpu19 cpu25 cpu31 cpu38 cpu44 cpu50 cpu57 cpu63 cpu7 cpu76 cpu82 cpu89 cpu95 cpu102 cpu109 cpu115 cpu121 cpu13 cpu2 cpu26 cpu32 cpu39 cpu45 cpu51 cpu58 cpu64 cpu70 cpu77 cpu83 cpu9 cpu96 cpu103 cpu11 cpu116 cpu122 cpu14 cpu20 cpu27 cpu33 cpu4 cpu46 cpu52 cpu59 cpu65 cpu71 cpu78 cpu84 cpu90 cpu97

Why do I have 128 CPUs there?
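For anyone who wants to check the same thing on another machine, here is a minimal user-space sketch in C that counts the cpuN entries under a directory such as /sys/devices/system/cpu/. The helper names is_cpu_entry and count_cpu_entries are made up for this illustration and are not part of any kernel or cpufrequtils API.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if name looks like "cpu<digits>"
 * (for example "cpu12"), otherwise 0. */
int is_cpu_entry(const char *name)
{
    assert(name != NULL);
    if (strncmp(name, "cpu", 3) != 0 || name[3] == '\0')
        return 0;
    for (const char *p = name + 3; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: counts cpu<N> entries in dir.
 * Returns -1 if the directory cannot be opened. */
int count_cpu_entries(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        if (is_cpu_entry(entry->d_name))
            count++;
    }
    closedir(d);
    return count;
}
```

Calling count_cpu_entries("/sys/devices/system/cpu") and comparing the result with the number of CPUs that cpufreq-info actually analyzes (four in the output above) would show the same mismatch as the listing.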
I expect that either the powernow-k8 driver uses current->cpus_allowed in the wrong way, or this is broken somewhere higher up. Could one of the kernel hackers have a look at that? I do not have much time at the moment -> need to set up a lot of new hardware. You can use *willimas* to test. Serial console for the machine: *sconsole1* -> type in: cscreen. The remote powersave switch is rpower3 -> telnet. As this seems to have to do with a scheduler variable -> assigning to Nick. Ask me if you have problems with the machine.
I noticed it myself on willimas before, but never acted on it ;-) NR_CPUS is 128, so it looks like it is trying all possible CPUs. It could be this loop:

    for (i=0; i<NR_CPUS; i++) {
        if (!cpu_online(i))
            continue;
        if (check_supported_cpu(i))
            supported_cpus++;
    }

However, this would imply that cpu_online() is broken, which would be worrying. There were recently some changes in this area for CPU hotplug support.
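To make the suspicion concrete, here is a small user-space model in C of the two ways such a loop can behave. It is only a sketch under stated assumptions: cpu_online_sim and try_limit_to_cpu are hypothetical stand-ins invented for this illustration, not the kernel's cpu_online() or anything from powernow-k8, and the machine is assumed to have CPUs 0-3 online with NR_CPUS set to 128, as in this report.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 128      /* compile-time maximum, as in this report */
#define ONLINE_CPUS 4    /* assumption: CPUs 0-3 are really present */

/* Hypothetical stand-in for the kernel's online check. */
bool cpu_online_sim(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return cpu < ONLINE_CPUS;
}

/* Hypothetical stand-in for binding the current task to one CPU.
 * Returns 0 on success and -1 if the CPU is not online. */
int try_limit_to_cpu(unsigned int cpu)
{
    return cpu_online_sim(cpu) ? 0 : -1;
}

/* Tries every possible CPU slot and returns how many attempts fail,
 * i.e. how many "limiting to cpu N failed" lines would be logged. */
unsigned int failures_without_online_check(void)
{
    unsigned int failures = 0;
    for (unsigned int i = 0; i < NR_CPUS; i++) {
        if (try_limit_to_cpu(i) != 0)
            failures++;
    }
    return failures;
}

/* Same walk, but skips slots that are not online first, in the
 * style of the loop quoted above. */
unsigned int failures_with_online_check(void)
{
    unsigned int failures = 0;
    for (unsigned int i = 0; i < NR_CPUS; i++) {
        if (!cpu_online_sim(i))
            continue;
        if (try_limit_to_cpu(i) != 0)
            failures++;
    }
    return failures;
}
```

With these assumptions the unfiltered walk fails for CPUs 4 through 127, which is 124 attempts and matches the number of messages in the log, while the filtered walk fails for none. So either the online check is skipped somewhere, or it reports CPUs as online that are not there.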
That loop has been in the driver for two years without changes and without error messages from CPU 127. I suspect cpu_online() needs to be investigated.
It's related to CPU hotplug (128 CPUs) and sysfs. I just disabled the printk for now. That's a kludge; in mainline we can fix it better.